
[jira] [Updated] (SOLR-13663) XML Query Parser to Support SpanPositionRangeQuery

2019-07-31 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated SOLR-13663:

Description: 
Currently the XML Query Parser supports a vast array of span queries, including 
the SpanFirstQuery, but it doesn't support the generic SpanPositionRangeQuery.

<SpanPositionRange start="2" end="3">
  <SpanTerm>prejudice</SpanTerm>
</SpanPositionRange>

The scope of this issue is to introduce the related builder, making it 
possible to build such queries.
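To make the requested semantics concrete, here is a minimal Python sketch (not the Lucene builder itself). It assumes the wrapped span is a single `<SpanTerm>` and that positions are 0-based, and it applies the acceptance rule of Lucene's `SpanPositionRangeQuery`: a span matches when its start position is >= `start` and its end position is <= `end`. With `start="2" end="3"`, only a single-term match at position 2 qualifies; `SpanFirstQuery` corresponds to the special case `start=0`. The helper names `parse_span_position_range` and `span_position_range_matches` are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_span_position_range(xml_text):
    """Parse <SpanPositionRange start=".." end=".."><SpanTerm>..</SpanTerm></SpanPositionRange>.

    Hypothetical helper: assumes the nested span is a single <SpanTerm>.
    Returns (term, start, end).
    """
    root = ET.fromstring(xml_text)
    if root.tag != "SpanPositionRange":
        raise ValueError(f"expected <SpanPositionRange>, got <{root.tag}>")
    start = int(root.attrib["start"])
    end = int(root.attrib["end"])
    inner = root.find("SpanTerm")
    if inner is None or not (inner.text or "").strip():
        raise ValueError("expected a nested <SpanTerm> with text")
    return inner.text.strip(), start, end


def span_position_range_matches(tokens, term, start, end):
    """Return the 0-based positions where `term` matches inside [start, end).

    A single-term span at position p covers [p, p + 1), so it is accepted
    when p >= start and p + 1 <= end (the SpanPositionRangeQuery rule).
    """
    return [pos for pos, tok in enumerate(tokens)
            if tok == term and pos >= start and pos + 1 <= end]


query = """<SpanPositionRange start="2" end="3">
  <SpanTerm>prejudice</SpanTerm>
</SpanPositionRange>"""
term, start, end = parse_span_position_range(query)
print(span_position_range_matches("pride and prejudice".split(), term, start, end))  # [2]
```

In the same model, "prejudice and pride" would not match, because the term sits at position 0, outside the [2, 3) window.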

 

  was:
Currently the XML Query Parser supports a vast array of span queries, including 
the SpanFirstQuery, but it doesn't support the generic SpanPositionRangeQuery.

<SpanPositionRange start="2" end="3">
  <SpanTerm>prejudice</SpanTerm>
</SpanPositionRange>


> XML Query Parser to Support SpanPositionRangeQuery
> --
>
> Key: SOLR-13663
> URL: https://issues.apache.org/jira/browse/SOLR-13663
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-13663.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the XML Query Parser supports a vast array of span queries, 
> including the SpanFirstQuery, but it doesn't support the generic 
> SpanPositionRangeQuery.
> <SpanPositionRange start="2" end="3">
>   <SpanTerm>prejudice</SpanTerm>
> </SpanPositionRange>
> The scope of this issue is to introduce the related builder, making it 
> possible to build such queries.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-13663) XML Query Parser to Support SpanPositionRangeQuery

2019-07-30 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896141#comment-16896141
 ] 

Alessandro Benedetti commented on SOLR-13663:
-

Ready for review

> XML Query Parser to Support SpanPositionRangeQuery
> --
>
> Key: SOLR-13663
> URL: https://issues.apache.org/jira/browse/SOLR-13663
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-13663.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the XML Query Parser supports a vast array of span queries, 
> including the SpanFirstQuery, but it doesn't support the generic 
> SpanPositionRangeQuery.
> <SpanPositionRange start="2" end="3">
>   <SpanTerm>prejudice</SpanTerm>
> </SpanPositionRange>




[jira] [Updated] (SOLR-13663) XML Query Parser to Support SpanPositionRangeQuery

2019-07-30 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated SOLR-13663:

Attachment: SOLR-13663.patch

> XML Query Parser to Support SpanPositionRangeQuery
> --
>
> Key: SOLR-13663
> URL: https://issues.apache.org/jira/browse/SOLR-13663
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-13663.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the XML Query Parser supports a vast array of span queries, 
> including the SpanFirstQuery, but it doesn't support the generic 
> SpanPositionRangeQuery.
> <SpanPositionRange start="2" end="3">
>   <SpanTerm>prejudice</SpanTerm>
> </SpanPositionRange>




[jira] [Updated] (SOLR-13663) XML Query Parser to Support SpanPositionRangeQuery

2019-07-30 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated SOLR-13663:

Status: Patch Available  (was: Open)

> XML Query Parser to Support SpanPositionRangeQuery
> --
>
> Key: SOLR-13663
> URL: https://issues.apache.org/jira/browse/SOLR-13663
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.2
>Reporter: Alessandro Benedetti
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the XML Query Parser supports a vast array of span queries, 
> including the SpanFirstQuery, but it doesn't support the generic 
> SpanPositionRangeQuery.
> <SpanPositionRange start="2" end="3">
>   <SpanTerm>prejudice</SpanTerm>
> </SpanPositionRange>




[jira] [Created] (SOLR-13663) XML Query Parser to Support SpanPositionRangeQuery

2019-07-30 Thread Alessandro Benedetti (JIRA)

Alessandro Benedetti created SOLR-13663:
---

 Summary: XML Query Parser to Support SpanPositionRangeQuery
 Key: SOLR-13663
 URL: https://issues.apache.org/jira/browse/SOLR-13663
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: query parsers
Affects Versions: 8.2
Reporter: Alessandro Benedetti


Currently the XML Query Parser supports a vast array of span queries, including 
the SpanFirstQuery, but it doesn't support the generic SpanPositionRangeQuery.

<SpanPositionRange start="2" end="3">
  <SpanTerm>prejudice</SpanTerm>
</SpanPositionRange>




[jira] [Commented] (SOLR-9095) ReRanker should gracefully handle sorts without score

2019-07-05 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879398#comment-16879398
 ] 

Alessandro Benedetti commented on SOLR-9095:


Brilliant, thank you very much!

 

> ReRanker should gracefully handle sorts without score
> -
>
> Key: SOLR-9095
> URL: https://issues.apache.org/jira/browse/SOLR-9095
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.10.4
> Environment: Solr 4.10.4 
> CentOS 6.5 64 bit
> Java 1.8.0_51 
>Reporter: Andrea Gazzarini
>Priority: Minor
>  Labels: re-ranking
>
> I have a Solr 4.10.4 instance with a RequestHandler that has a re-ranking 
> query configured like this:
> {code:title=solrconfig.xml|borderStyle=solid}
> 
>   <str name="defType">dismax</str>
>   ...
>   <str name="rqq">{!boost b=someFunction() v=$q}</str>
>   <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=60 reRankWeight=1.2}</str>
>   <str name="sort">score desc</str>
> 
> {code}
> Everything works until the client sends a sort parameter that doesn't 
> include the score field. For example, if the request contains
> "sort=price asc", a NullPointerException is thrown:
> {code}
> 09:46:08,548 ERROR [org.apache.solr.core.SolrCore] 
> java.lang.NullPointerException
> [INFO] [talledLocalContainer] at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> [INFO] [talledLocalContainer] at 
> org.apache.solr.search.ReRankQParserPlugin$ReRankCollector.collect(ReRankQParserPlugin.java:263)
> [INFO] [talledLocalContainer] at 
> org.apache.solr.search.SolrIndexSearcher.sortDocSet(SolrIndexSearcher.java:1999)
> [INFO] [talledLocalContainer] at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1423)
> [INFO] [talledLocalContainer] at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
> [INFO] [talledLocalContainer] at 
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:484)
> [INFO] [talledLocalContainer] at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
> [INFO] [talledLocalContainer] at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> {code}
> The only way to avoid this exception is to explicitly add the "score desc" 
> value to the incoming sort parameter; that is  
> {code}
> ?q=...&sort=price asc, score desc 
> {code}
> This way I get no exception. I say "explicitly" because adding an 
> "appends" section to my handler
> {code}
> <lst name="appends">
>   <str name="sort">score desc</str>
> </lst>
> {code}
> doesn't help: even though I'm not sure it could solve my problem, in practice 
> it is completely ignored (i.e. I still get the NPE above).
> However, when I explicitly add "sort=price asc, score desc", the re-ranking 
> still shuffles the top 60 results, even though I asked Solr to order by 
> price, and that's not what I want.
> So, in the end, the issue is about the following two points: 
> 1. the NullPointerException above 
> 2. a way to disable the re-ranking (automatically or not)
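The shuffling in point 2 can be reproduced with a simplified model. This is a Python sketch, not Solr's `ReRankCollector`: it assumes the first pass honours the request sort (`price asc`), that only the top `reRankDocs` hits are re-scored as `score + reRankWeight * rerankScore` when a document matches the re-rank query, and that those hits are then re-sorted by the new score while the rest keep their order. The `Doc` fields and the `rerank` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    doc_id: str
    price: float
    score: float                          # first-pass (main query) score
    rerank_score: Optional[float] = None  # None: doc does not match the re-rank query


def rerank(docs: List[Doc], rerank_docs: int, rerank_weight: float) -> List[str]:
    """Simplified model: sort by price, then re-sort the top `rerank_docs` by combined score."""
    # First pass: honour the request sort (price asc).
    first_pass = sorted(docs, key=lambda d: d.price)
    head, tail = first_pass[:rerank_docs], first_pass[rerank_docs:]

    def combined(d: Doc) -> float:
        # Docs that do not match the re-rank query keep their first-pass score.
        if d.rerank_score is None:
            return d.score
        return d.score + rerank_weight * d.rerank_score

    # Second pass: the head is ordered by combined score only, so price order is lost.
    head = sorted(head, key=combined, reverse=True)
    return [d.doc_id for d in head + tail]


docs = [Doc("a", 10, 1.0), Doc("b", 20, 2.0, 1.0), Doc("c", 30, 0.5), Doc("d", 40, 3.0)]
print(rerank(docs, rerank_docs=3, rerank_weight=1.2))  # ['b', 'a', 'c', 'd']
print(rerank(docs, rerank_docs=0, rerank_weight=1.2))  # ['a', 'b', 'c', 'd']
```

In this model, the price order survives only outside the re-ranked window, or when the window is empty.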




[jira] [Comment Edited] (SOLR-9095) ReRanker should gracefully handle sorts without score

2019-07-05 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879335#comment-16879335
 ] 

Alessandro Benedetti edited comment on SOLR-9095 at 7/5/19 2:47 PM:


[~munendrasn], you resolved this as a duplicate, but a duplicate of what?
 Can you please update the Jira ticket with a link to the duplicate when closing it?
 Thanks,


was (Author: alessandro.benedetti):
[~munendrasn] , tou resolved as duplicate, but duplicate of what?
Can you please update the Jira ticket with link to the duplicate on closure?
Thanks,





[jira] [Commented] (SOLR-7830) topdocs facet function

2019-07-05 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879337#comment-16879337
 ] 

Alessandro Benedetti commented on SOLR-7830:


Any update on this issue? It is a very interesting feature; it's a shame it 
hasn't made it to master after so many years!
Is there anything we could do to help?

> topdocs facet function
> --
>
> Key: SOLR-7830
> URL: https://issues.apache.org/jira/browse/SOLR-7830
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Yonik Seeley
>Priority: Major
> Attachments: ALT-SOLR-7830.patch, SOLR-7830.patch, SOLR-7830.patch
>
>
> A topdocs() facet function would return the top N documents per facet bucket.
> This would be a big step toward unifying grouping and the new facet module.
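As an illustration of the intended result shape (not of the attached patches), "top N documents per bucket" can be sketched in Python as grouping hits by their bucket value and keeping the N highest-scoring hits in each, which is roughly what result grouping returns with a per-group limit. The hit dictionaries, the `category` field and `top_docs_per_bucket` are hypothetical.

```python
import heapq
from collections import defaultdict


def top_docs_per_bucket(hits, bucket_of, n):
    """Group hits by facet bucket and keep the n highest-scoring hits per bucket."""
    buckets = defaultdict(list)
    for hit in hits:
        buckets[bucket_of(hit)].append(hit)
    # heapq.nlargest returns the n best hits in descending score order.
    return {bucket: heapq.nlargest(n, members, key=lambda h: h["score"])
            for bucket, members in buckets.items()}


hits = [
    {"id": "1", "category": "books", "score": 0.9},
    {"id": "2", "category": "books", "score": 0.4},
    {"id": "3", "category": "books", "score": 0.7},
    {"id": "4", "category": "music", "score": 0.8},
]
top = top_docs_per_bucket(hits, lambda h: h["category"], n=2)
print({b: [h["id"] for h in docs] for b, docs in top.items()})
# {'books': ['1', '3'], 'music': ['4']}
```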




[jira] [Commented] (SOLR-9095) ReRanker should gracefully handle sorts without score

2019-07-05 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879335#comment-16879335
 ] 

Alessandro Benedetti commented on SOLR-9095:


[~munendrasn], you resolved this as a duplicate, but a duplicate of what?
Can you please update the Jira ticket with a link to the duplicate when closing it?
Thanks,





[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-05-31 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853002#comment-16853002
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

That's great!
Thank you [~dsmiley] for your support!
The MLT looks a bit abandoned to me, but I believe it's an important feature.
I will keep working on it, step by step.

I will provide a patch to support sharding (for the seed document retrieval) 
soon.

Cheers

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.2
>
> Attachments: SOLR-12304.patch, SOLR-12304.patch, SOLR-12304.patch, 
> SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.interestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> pass after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and the 
> tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is also out of scope for this issue; it is much 
> more complicated and requires much deeper investigation.




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2019-05-16 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841220#comment-16841220
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

Is anyone interested in helping me refactor and improve the MLT step by 
step?
This is the first step toward cleaning up the code and making it 
more maintainable and extensible.
I'm happy to help and to support the necessary changes.

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch, 
> LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This; it will just keep the same 
> functionality with refactored code.
> Other Jira issues will follow, improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (LUCENE-6687) MLT term frequency calculation bug

2019-05-10 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837163#comment-16837163
 ] 

Alessandro Benedetti commented on LUCENE-6687:
--

Thanks [~teofili] and [~mikemccand] and all the people that helped with this 
contribution!

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Assignee: Tommaso Teofili
>Priority: Major
> Fix For: 5.2.2, 8.1, master (9.0)
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, LUCENE-6687.patch, 
> LUCENE-6687.patch, buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from fields 
> that we provide in the MLT QP {{qf}} parameter. 
> It basically goes over the list of fields twice and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the overload of {{like}} that receives a Map, so the 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear fewer than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator in only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?
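The doubling can be reproduced outside Lucene with a small Python model. This is a simplified sketch, not the actual `retrieveTerms` code: the outer loop stands for the iteration over `fieldNames` and the inner loop for the iteration over the same `fields`, so every term count is multiplied by the number of fields (two here). The split of the 7 occurrences of "accumulator" between `title_mlt` and `pagetext_mlt` is hypothetical; only the total of 7 comes from the report.

```python
from collections import Counter


def retrieve_terms_buggy(fields, field_names):
    """Model of the reported bug: both loops walk the same set of fields."""
    term_freq = Counter()
    for _ in field_names:              # outer loop over fieldNames
        for field in fields:           # inner loop over the same fields
            term_freq.update(fields[field])
    return term_freq


def retrieve_terms_fixed(fields):
    """Single pass: each field's tokens are counted exactly once."""
    term_freq = Counter()
    for field in fields:
        term_freq.update(fields[field])
    return term_freq


def passes_mintf(term_freq, term, mintf):
    """Mimic the mintf filter: keep a term only if its merged frequency >= mintf."""
    return term_freq[term] >= mintf


# Hypothetical split of the 7 occurrences of "accumulator" in TID0009.
fields = {"title_mlt": ["accumulator"] * 2, "pagetext_mlt": ["accumulator"] * 5}
buggy = retrieve_terms_buggy(fields, list(fields))
fixed = retrieve_terms_fixed(fields)
print(buggy["accumulator"], fixed["accumulator"])  # 14 7
print(passes_mintf(buggy, "accumulator", 14),
      passes_mintf(buggy, "accumulator", 15))      # True False
```

With the single-pass version, `mintf=14` already excludes the term, which matches the expectation for a term that really appears 7 times.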




[jira] [Commented] (SOLR-12812) Add support for arbitrary field:text pairs to streaming similarity calculation in MoreLikeThisHandler

2019-01-28 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753905#comment-16753905
 ] 

Alessandro Benedetti commented on SOLR-12812:
-

Hi [~dweiss], the Cloud MLT Query parser does a similar job.
It is executed on one Solr instance, but the seed document could be in 
other instances (similar to the multi-core use case you mentioned).

It handles this by using the real-time GET to fetch the seed document and 
then calling 
org.apache.lucene.queries.mlt.MoreLikeThis#like(java.util.Map<java.lang.String, java.util.Collection<java.lang.Object>>).

So I guess this modification should make it possible to use that Lucene method, 
passing the input document as the payload.
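A minimal sketch of the input shape that `MoreLikeThis#like(Map)` expects: a map from field name to a collection of values. In this Python sketch the seed document is a plain dict standing in for the document returned by the real-time GET, and only the requested MLT fields are kept, with single values wrapped in a list. `to_like_input` and the field names are hypothetical; the actual Cloud MLT parser code may differ.

```python
def to_like_input(seed_doc, mlt_fields):
    """Build a field -> list-of-values map from a fetched seed document.

    Fields not listed in mlt_fields are dropped; scalar values are wrapped
    in a list so that every entry is a collection, like Map<String, Collection<Object>>.
    """
    like_input = {}
    for field in mlt_fields:
        if field not in seed_doc:
            continue  # the seed document may not have every requested field
        value = seed_doc[field]
        like_input[field] = list(value) if isinstance(value, (list, tuple)) else [value]
    return like_input


seed = {"id": "doc-1", "title": "foo bar", "summary": ["baz bar"], "abstract": "ping ping"}
print(to_like_input(seed, ["title", "summary", "keywords"]))
# {'title': ['foo bar'], 'summary': ['baz bar']}
```

With an explicit field:text payload in the request, the same map could be built directly from the request instead of from a fetched document.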

> Add support for arbitrary field:text pairs to streaming similarity 
> calculation in MoreLikeThisHandler
> -
>
> Key: SOLR-12812
> URL: https://issues.apache.org/jira/browse/SOLR-12812
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Dawid Weiss
>Priority: Minor
>
> In this issue I would like to add support for streaming MLT case where the 
> content of the request specifies explicitly the field:text pairs to be used 
> for MLT lookup.
> Here is a longer explanation, based on a real use case, of why the current 
> solutions don't work.
> Let's say a solr instance has multiple cores (collections of documents). We'd 
> like to search for similar documents between these cores. Let's assume each 
> collection of documents has three fields: title, summary and abstract.
> At the moment Solr has two MLT handler options: the query-based (similarity 
> to an indexed document) and the free-text based (similarity to an arbitrary 
> text).
> 1) The first MLT pipeline in Solr looks for documents similar to the given 
> one 
> (I'll assume a single document as input, to keep things simple). This
> pipeline reads the content of the document from the existing index and creates
> a mapping between fields and actual values stored in that document.
> Let's say the document looks like this:
> title: foo bar
> summary: baz bar
> abstract: ping ping
> The "interesting term" extraction routine in MoreLikeThisHelper will extract 
> those terms and
> score them against each field's statistics, then take top-N best scoring 
> terms (and fields they're assigned to) and create a Boolean query from it. It 
> could go something like this:
> title:foo^1.5 summary:bar^0.5
> When this query is applied against the collection it would *not* match "bar" 
> in the title or abstract (because the weighted "important" term wasn't 
> selected in that field). That's the way it should be.
> 2) In the second pipeline, we give the full "text" for which we wish to 
> obtain similar documents. If we were to emulate scenario (1), we'd have to 
> cram the content of each field into a single blob of text, so it'd become 
> something like:
> foo bar, baz bar, ping ping
> Solr takes this text and creates a pseudo-document that maps the
> provided set of fields (mlt.fl) to this value. So effectively it
> creates a pseudo-document like this:
> title: foo bar, baz bar, ping ping
> summary: foo bar, baz bar, ping ping
> abstract: foo bar, baz bar, ping ping
> What follows is identical to scenario (1), but note that this time the
> set of terms for each field (and their scores) is much broader. This
> means that the final query can look like this:
> title:foo^1.5 summary:foo^0.5 title:bar^1 summary:bar^0.5
> This results in severely skewed MLT results (for example shorter fields will 
> have drastically different term statistics).
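The difference between the two pipelines can be shown with a small Python sketch that only looks at which candidate terms each field receives, before any scoring. `free_text_pseudo_doc` models the current free-text behaviour described in (2), copying the same blob into every `mlt.fl` field, while `explicit_pairs_doc` models the proposed explicit field:text pairs. Both names are hypothetical, and the tokenization (whitespace split, trailing punctuation stripped, lower-cased) is an assumption, not Solr's analysis chain.

```python
def free_text_pseudo_doc(text, mlt_fl):
    """Current free-text behaviour: the same blob is mapped to every mlt.fl field."""
    return {field: text for field in mlt_fl}


def explicit_pairs_doc(pairs):
    """Proposed behaviour: each field receives only the text supplied for it."""
    return dict(pairs)


def candidate_terms(doc):
    """Per-field candidate terms (naive tokenization, before any scoring)."""
    result = {}
    for field, text in doc.items():
        tokens = {tok.strip(",.").lower() for tok in text.split()}
        result[field] = sorted(tokens - {""})
    return result


fields = ["title", "summary", "abstract"]
merged = candidate_terms(free_text_pseudo_doc("foo bar, baz bar, ping ping", fields))
explicit = candidate_terms(explicit_pairs_doc(
    [("title", "foo bar"), ("summary", "baz bar"), ("abstract", "ping ping")]))
print(merged["title"])    # ['bar', 'baz', 'foo', 'ping']
print(explicit["title"])  # ['bar', 'foo']
```

In the first case every field competes with the vocabulary of the whole blob, which is where the skewed per-field statistics come from.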




[jira] [Commented] (SOLR-13172) Deprecate MoreLikeTHisHandler

2019-01-28 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753871#comment-16753871
 ] 

Alessandro Benedetti commented on SOLR-13172:
-

Hi [~dweiss], this is a good call.
When I checked whether it was ready for deprecation, I didn't actually check 
that sub-feature of the MLT, my bad.
And indeed that Lucene MLT feature is only accessible via the handler.

But I just checked the MLT query parser(s), and we do have access to the Solr 
request there (so the reader with the input text is accessible).
Having that additional logic in the query parser can make sense (after 
all, you use an additional parameter provided by the user to build the 
query).

In my opinion it would still be possible to deprecate it and add the feature to 
the MLT Query Parser (actually, we can add the feature there anyway).

To conclude, I don't have a strong opinion on deprecating the handler, but if 
we confirm all its features are achievable through the query parser, giving 
the user fewer choices could help.

> Deprecate MoreLikeTHisHandler
> -
>
> Key: SOLR-13172
> URL: https://issues.apache.org/jira/browse/SOLR-13172
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Reporter: Alessandro Benedetti
>Priority: Major
>
> Following the discussions with [~dsmi...@mac.com]
> Currently the Lucene More Like This functionality is offered in Apache Solr 
> through:
>  * More Like This Handler
>  * More Like This Component
>  * More Like This Query Parser
> The query parser is the most flexible approach and is well supported, so it 
> is a good candidate to become the main entry point for users who want the MLT 
> functionality.
> The More Like This component is quite coupled with the others, but it makes 
> sense and offers slightly different features from the query parser (*Using 
> MoreLikeThis as a search component returns similar documents for each 
> document in the response set.*)
> So the proposal here is to deprecate and remove the More Like This Handler, 
> to ease the maintenance of the functionality and to simplify the way new 
> users approach it.




[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-28 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753857#comment-16753857
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

The bot doesn't check the Pull Request code, and the patch was somewhat corrupt.
I just uploaded a healthy one.

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch, SOLR-12304.patch, 
> SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires deeper investigation.
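
The description does not show how the component should consume the parameter. As a minimal sketch, assuming the only missing step is turning the raw `mlt.interestingTerms` value (`none`, `list` or `details`, as accepted by the MoreLikeThisHandler) into a style before deciding whether to return interesting terms, the parsing could look like this. The class, enum and method names are hypothetical illustrations; Solr's `MoreLikeThisParams.TermStyle` plays a comparable role in the real code base.

```java
import java.util.Locale;
import java.util.Map;

public class InterestingTermsSketch {

    // Hypothetical mirror of the three styles the handler accepts.
    enum TermStyle { NONE, LIST, DETAILS }

    // Request parameter name as read by the MoreLikeThisHandler.
    static final String INTERESTING_TERMS = "mlt.interestingTerms";

    // Maps the raw request value onto a style; a missing or unknown value means NONE.
    static TermStyle parseStyle(Map<String, String> params) {
        String raw = params.get(INTERESTING_TERMS);
        if (raw == null) {
            return TermStyle.NONE;
        }
        switch (raw.trim().toUpperCase(Locale.ROOT)) {
            case "LIST":
                return TermStyle.LIST;
            case "DETAILS":
                return TermStyle.DETAILS;
            default:
                return TermStyle.NONE;
        }
    }

    // The component would only add interesting terms to the response when asked to.
    static boolean shouldReturnInterestingTerms(Map<String, String> params) {
        return parseStyle(params) != TermStyle.NONE;
    }

    public static void main(String[] args) {
        System.out.println(parseStyle(Map.of(INTERESTING_TERMS, "details"))); // DETAILS
        System.out.println(parseStyle(Map.of(INTERESTING_TERMS, "List")));    // LIST
        System.out.println(shouldReturnInterestingTerms(Map.of()));           // false
    }
}
```

Falling back to `NONE` keeps the default behaviour unchanged for requests that never set the parameter.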



[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-28 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753858#comment-16753858
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

The bot doesn't check the Pull Request code, and the patch was somewhat corrupt.
I just uploaded a healthy one.

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch, SOLR-12304.patch, 
> SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires deeper investigation.



[jira] [Commented] (LUCENE-6687) MLT term frequency calculation bug

2019-01-28 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753856#comment-16753856
 ] 

Alessandro Benedetti commented on LUCENE-6687:
--

The bot doesn't check the Pull Request code, and the patch was somewhat corrupt.
I just uploaded a healthy one.

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, LUCENE-6687.patch, 
> LUCENE-6687.patch, buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from the fields 
> that we provide in the MLT QP {{qf}} parameter. 
> It basically goes over the list of fields twice and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the overloaded version of {{like}} that receives a Map, so the 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in the {{TID0009}} doc, but {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear fewer than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator to only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?
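
The effect described above can be reproduced with a small stand-alone sketch. It does not use Lucene: a document is modeled as a map from field name to already-tokenized terms, and all class and method names are hypothetical. The nested loop mirrors the report's description of {{retrieveTerms}} (an outer and an inner loop over the same field list), so with the two fields {{title_mlt}} and {{pagetext_mlt}} every count is doubled, and a term that really occurs 7 times survives a {{mintf=14}} cut but not {{mintf=15}}. The 1/6 split of {{accumulator}} across the two fields is invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermFreqDoublingSketch {

    // Buggy shape: an outer loop over the configured fields wraps an inner loop
    // over the same fields, so each field's terms are counted once per configured
    // field (i.e. twice with two fields).
    static Map<String, Integer> buggyTermFreqs(Map<String, List<String>> doc, List<String> fieldNames) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (String outerField : fieldNames) {      // outer pass, one per field
            for (String field : fieldNames) {        // inner pass over the same fields
                for (String term : doc.getOrDefault(field, List.of())) {
                    termFreqMap.merge(term, 1, Integer::sum);
                }
            }
        }
        return termFreqMap;
    }

    // Fixed shape: each field is visited exactly once.
    static Map<String, Integer> fixedTermFreqs(Map<String, List<String>> doc, List<String> fieldNames) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (String field : fieldNames) {
            for (String term : doc.getOrDefault(field, List.of())) {
                termFreqMap.merge(term, 1, Integer::sum);
            }
        }
        return termFreqMap;
    }

    // Mirrors the mintf cut: a term is kept only if its frequency is at least mintf.
    static boolean keptByMinTf(Map<String, Integer> termFreqs, String term, int mintf) {
        return termFreqs.getOrDefault(term, 0) >= mintf;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("title_mlt", "pagetext_mlt");
        // Toy document: "accumulator" occurs 7 times in total; the 1/6 split is made up.
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("title_mlt", List.of("accumulator"));
        doc.put("pagetext_mlt", Collections.nCopies(6, "accumulator"));

        Map<String, Integer> buggy = buggyTermFreqs(doc, fields);
        Map<String, Integer> fixed = fixedTermFreqs(doc, fields);
        System.out.println(buggy.get("accumulator"));              // 14
        System.out.println(fixed.get("accumulator"));              // 7
        System.out.println(keptByMinTf(buggy, "accumulator", 14)); // true
        System.out.println(keptByMinTf(buggy, "accumulator", 15)); // false
    }
}
```

With the fixed loop the count stays at 7, so {{accumulator}} would be dropped by both {{mintf=14}} and {{mintf=15}}. In this sketch the inflation factor equals the number of configured fields, so a single-field {{qf}} would hide the bug entirely.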



[jira] [Updated] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-28 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated SOLR-12304:

Attachment: SOLR-12304.patch

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch, SOLR-12304.patch, 
> SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires deeper investigation.



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2019-01-28 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-6687:
-
Attachment: LUCENE-6687.patch

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, LUCENE-6687.patch, 
> LUCENE-6687.patch, buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from the fields 
> that we provide in the MLT QP {{qf}} parameter. 
> It basically goes over the list of fields twice and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the overloaded version of {{like}} that receives a Map, so the 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in the {{TID0009}} doc, but {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear fewer than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator to only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2019-01-27 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-6687:
-
Attachment: LUCENE-6687.patch

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, LUCENE-6687.patch, 
> buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from the fields 
> that we provide in the MLT QP {{qf}} parameter. 
> It basically goes over the list of fields twice and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the overloaded version of {{like}} that receives a Map, so the 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in the {{TID0009}} doc, but {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear fewer than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator to only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?



[jira] [Commented] (LUCENE-6687) MLT term frequency calculation bug

2019-01-27 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753458#comment-16753458
 ] 

Alessandro Benedetti commented on LUCENE-6687:
--

Pull request and patch have been updated, following up on the work done at the 
Lucene/Solr London hackathon in October 2018.

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, LUCENE-6687.patch, 
> buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from the fields 
> that we provide in the MLT QP {{qf}} parameter. 
> It basically goes over the list of fields twice and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the overloaded version of {{like}} that receives a Map, so the 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in the {{TID0009}} doc, but {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear fewer than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator to only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?



[jira] [Updated] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-27 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated SOLR-12304:

Attachment: SOLR-12304.patch

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires deeper investigation.



[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-27 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753377#comment-16753377
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

Hi [~dsmiley], I created the separate issue that will take care of removing the 
More Like This Handler: https://issues.apache.org/jira/browse/SOLR-13172 .

Then I updated the Pull Request with the minor change you proposed and merged 
upstream master back in: it is up to date, there are no conflicts, and the 
related tests are green.

I considered it a bug (or, more precisely, a half-baked feature) because the 
Lucene implementation was available and Apache Solr had the request parameters 
meant to provide the functionality, but those parameters were simply not used 
to generate the Interesting Terms properly.

Ready for review!

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires deeper investigation.



[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-27 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753375#comment-16753375
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

[~dsmiley], I just created the separate issue for the deprecation of the 
handler.

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires deeper investigation.



[jira] [Created] (SOLR-13172) Deprecate MoreLikeTHisHandler

2019-01-27 Thread Alessandro Benedetti (JIRA)

Alessandro Benedetti created SOLR-13172:
---

 Summary: Deprecate MoreLikeTHisHandler
 Key: SOLR-13172
 URL: https://issues.apache.org/jira/browse/SOLR-13172
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: MoreLikeThis
Reporter: Alessandro Benedetti


Following the discussions with [~dsmi...@mac.com]
Currently the Lucene More Like This functionality is offered in Apache Solr 
through:
 * More Like This Handler
 * More Like This Component
 * More Like This Query Parser

The query parser is the most flexible approach and it is well supported, so it 
is a good candidate to become the main entry point if a user wants the MLT 
functionality.

The More Like This component is quite tightly coupled with the others, but it 
makes sense and offers slightly different features from the query parser 
(*Using MoreLikeThis as a search component returns similar documents for each 
document in the response set.*)

So the proposal here is to deprecate and remove the More Like This Handler, to 
ease the maintenance of the functionality and to simplify the way new users 
approach it.



[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-09 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738355#comment-16738355
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

Thanks for your response, David. Investigating a bit more, I found that the 
component offers functionality that is not achievable through the query 
parser:

*Using MoreLikeThis as a search component returns similar documents for each 
document in the response set.*

Given that, it should be fine to keep the component, and this patch is still 
valid.
Deprecating the MLT handler is still recommended, though.
Ideally I would prefer to remove it, as I fear deprecated code could linger 
for a while.
I will work on that next month; I will open a Jira issue and track the progress 
there.
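
To make the difference concrete: with only the query parser, a client that wants similar documents for every hit would have to issue one extra {!mlt} query per document in the result set, whereas the component does this server-side within the original request. The sketch below only builds those per-document query strings (before URL-encoding); the class and method names are hypothetical, and the qf/mintf values are placeholders borrowed from the LUCENE-6687 example query.

```java
import java.util.ArrayList;
import java.util.List;

public class PerDocumentMltSketch {

    // Builds a {!mlt} local-params query for a single seed document.
    static String mltQueryFor(String docId, String qf, int mintf) {
        return "{!mlt qf=" + qf + " mintf=" + mintf + "}" + docId;
    }

    // One extra query per document of the main result set: this is the
    // client-side work the MLT component saves by running inside the request.
    static List<String> perDocumentQueries(List<String> resultIds, String qf, int mintf) {
        List<String> queries = new ArrayList<>();
        for (String id : resultIds) {
            queries.add(mltQueryFor(id, qf, mintf));
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("TID0004", "TID0009");
        for (String q : perDocumentQueries(ids, "pagetext_mlt,title_mlt", 2)) {
            System.out.println(q);
        }
        // {!mlt qf=pagetext_mlt,title_mlt mintf=2}TID0004
        // {!mlt qf=pagetext_mlt,title_mlt mintf=2}TID0009
    }
}
```

For a page of N results this means N additional round trips, which is why keeping the component alongside the query parser seems reasonable.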

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires deeper investigation.



[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-08 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737213#comment-16737213
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

[~dsmiley], any feedback on the latest messages? I would be happy to help, but 
this issue seems to have been forgotten.
Should we proceed along the deprecation path?

Or should we just keep the component and handler for backward compatibility?

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires deeper investigation.




[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2018-12-17 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722863#comment-16722863
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

I will just repeat myself: I don't have anything to add, apart from the fact that I am 
happy to contribute a different patch or help.
So I would love an update on this from the community.

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.interestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled and the 
> tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue: it is much 
> more complicated and requires much deeper investigation.




[jira] [Commented] (LUCENE-8539) Fix typos and style in TestStopFilter

2018-11-19 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691565#comment-16691565
 ] 

Alessandro Benedetti commented on LUCENE-8539:
--

I have reviewed the Pull Request and it is a big +1 from my side.
Can anyone from the community pick it up and give us their opinion?

> Fix typos and style in TestStopFilter
> -
>
> Key: LUCENE-8539
> URL: https://issues.apache.org/jira/browse/LUCENE-8539
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Diego Ceccarelli
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This patch fixes some typos in TestStopFilter; it also contains some 
> refactoring of the tests to make them clearer. 




[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

2018-11-15 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687952#comment-16687952
 ] 

Alessandro Benedetti commented on SOLR-12238:
-

Thanks [~softwaredoug] for the support here.
I see the Pull Request is currently out of date.
I will set a note in my agenda to bring it up to date with the current master.

It would be brilliant if any of the committers could take a look at this; even 
a quick review could help establish whether the patch is ready or needs changes.

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modes for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
>  delimiter="|"/>
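To illustrate how the two proposed styles would differ, here is a small Java sketch. It is not the patch's code: it parses the payload syntax of the synonym.txt example above into per-term boosts, then combines hypothetical clause scores either as a disjunction (only the maximum counts, as a dismax-style query with tie breaker 0 would do) or as boolean SHOULD clauses (scores are summed). Treating a synonym without a payload (such as "leopard") as boost 1.0 is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PayloadSynonymSketch {

    // Parses the right-hand side of a rule such as
    // "tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9" into term -> boost.
    // Assumption: a term without "|payload" (e.g. "leopard") gets boost 1.0.
    static Map<String, Float> parseBoosts(String rhs) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String entry : rhs.split(",")) {
            String trimmed = entry.trim();
            int bar = trimmed.lastIndexOf('|');
            if (bar < 0) {
                boosts.put(trimmed, 1.0f);
            } else {
                boosts.put(trimmed.substring(0, bar),
                        Float.parseFloat(trimmed.substring(bar + 1)));
            }
        }
        return boosts;
    }

    // PICK_BEST_BOOST_BY_PAYLOAD, modelled as a disjunction with tie breaker 0:
    // only the best boosted clause score counts.
    static float pickBest(Map<String, Float> boosts, Map<String, Float> clauseScores) {
        float best = 0f;
        for (Map.Entry<String, Float> e : clauseScores.entrySet()) {
            best = Math.max(best, e.getValue() * boosts.getOrDefault(e.getKey(), 1.0f));
        }
        return best;
    }

    // AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD, modelled as boolean SHOULD clauses:
    // boosted clause scores are summed.
    static float asDistinctTerms(Map<String, Float> boosts, Map<String, Float> clauseScores) {
        float sum = 0f;
        for (Map.Entry<String, Float> e : clauseScores.entrySet()) {
            sum += e.getValue() * boosts.getOrDefault(e.getKey(), 1.0f);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = parseBoosts("tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9");
        // Hypothetical raw scores of the clauses that matched one document.
        Map<String, Float> scores = Map.of("Big_Cat", 2.0f, "Shere_Khan", 1.0f);
        System.out.println(pickBest(boosts, scores));        // 1.6 (2.0 * 0.8)
        System.out.println(asDistinctTerms(boosts, scores)); // 2.5 (1.6 + 0.9)
    }
}
```

The design choice mirrors the issue text: the pick-best style rewards the single strongest synonym, while the distinct-terms style rewards documents matching several synonyms.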




[jira] [Commented] (LUCENE-8347) BlendedInfixSuggester to handle multi term matches better

2018-09-10 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609753#comment-16609753
 ] 

Alessandro Benedetti commented on LUCENE-8347:
--

I agree,

let's finalise LUCENE-8343 first and then I will take another look at this 
contribution and update it!

> BlendedInfixSuggester to handle multi term matches better
> -
>
> Key: LUCENE-8347
> URL: https://issues.apache.org/jira/browse/LUCENE-8347
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8347.patch, LUCENE-8347.patch
>
>
> Currently the blendedInfix suggester considers just the first match position 
> when scoring a suggestion.
> From the lucene-dev mailing list:
> "
> If I write more than one term in the query, let's say 
>  
> "Mini Bar Fridge" 
>  
> I would expect in the results something like (note that allTermsRequired=true 
> and the schema weight field always returns 1000)
>  
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini* something *Bar Fridge*
> ...
>  
> Instead I see this: 
>  
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini* something *Bar Fridge*
> ...
>  
> After having a look at the suggester code 
> (BlendedInfixSuggester.createCoefficient), I see that the component takes into 
> account only one position, which is the lowest position (among the three 
> matching terms) within the term vector ("mini" in the example above), so all 
> the suggestions above have the same weight 
> "
> The scope of this Jira issue is to improve the BlendedInfixSuggester so that it 
> better manages those scenarios.
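The tie described above can be reproduced with a short Java sketch. It assumes a linear position blender (coefficient = 1 - 0.10 * position) and models term positions with a plain token list instead of the term vectors the suggester actually reads. The averaged variant is a hypothetical alternative for illustration only, not necessarily what the attached patches do.

```java
import java.util.List;

public class BlendedCoefficientSketch {

    // Assumed linear position blender: coefficient = 1 - 0.10 * position.
    static final double LINEAR_COEF = 0.10;

    static double coefficient(int position) {
        return 1 - LINEAR_COEF * position;
    }

    // Behaviour described in the issue: only the lowest position among
    // the matched query terms contributes to the coefficient.
    static double firstPositionCoefficient(List<String> suggestion, List<String> query) {
        int first = Integer.MAX_VALUE;
        for (String term : query) {
            int pos = suggestion.indexOf(term);
            if (pos >= 0) {
                first = Math.min(first, pos);
            }
        }
        return first == Integer.MAX_VALUE ? 0.0 : coefficient(first);
    }

    // Hypothetical alternative, for illustration only: average the
    // coefficients of every matched term position.
    static double meanPositionCoefficient(List<String> suggestion, List<String> query) {
        double sum = 0;
        int matches = 0;
        for (String term : query) {
            int pos = suggestion.indexOf(term);
            if (pos >= 0) {
                sum += coefficient(pos);
                matches++;
            }
        }
        return matches == 0 ? 0.0 : sum / matches;
    }

    public static void main(String[] args) {
        List<String> query = List.of("mini", "bar", "fridge");
        List<String> contiguous = List.of("mini", "bar", "fridge", "something");
        List<String> split = List.of("mini", "bar", "something", "fridge");
        // Both print 1.0: "mini" sits at position 0 in each suggestion,
        // so with the same weight (1000) both get the same score.
        System.out.println(firstPositionCoefficient(contiguous, query));
        System.out.println(firstPositionCoefficient(split, query));
        // The averaged variant ranks the contiguous match higher: prints true.
        System.out.println(meanPositionCoefficient(contiguous, query)
                > meanPositionCoefficient(split, query));
    }
}
```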




[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-09-10 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609752#comment-16609752
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Hi [~mikemccand], sorry for the huge delay, but I have been busy/on 
holiday and I missed the August comment!
I just updated the PR and attached a new patch.
The changes are:
- AnalysingSuggester weight encoded to 0 in the FST when the weight is null (this 
should be OK, as the AnalysingSuggester weights the suggestion with no positional 
information, so in this case a null or 0 weight should have the same semantics)
- SolrJ suggestion weight changed from long to double

Let me know!
Happy to work more on it if necessary!

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch, 
> LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated based on the match position; 
> the earlier, the better
> The resulting score is a long, which means that at the moment any weight<10 
> can lead to inconsistencies.
> *Edge cases* 
> Weight = 1
> Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (independently of the position match coefficient)
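The truncation is easy to see in plain Java, where narrowing a double to a long drops the fractional part. The sketch below is a minimal illustration, not the suggester's code; the coefficients 0.9 and 0.8 are hypothetical values standing for two different match positions.

```java
public class BlendedScoreSketch {

    // Score as described in the issue: a long weight multiplied by a double
    // coefficient in [0, 1], then narrowed back to a long (truncation).
    static long longScore(long weight, double coefficient) {
        return (long) (weight * coefficient);
    }

    // Keeping the product as a double preserves the positional information.
    static double doubleScore(long weight, double coefficient) {
        return weight * coefficient;
    }

    public static void main(String[] args) {
        System.out.println(longScore(1, 1.0));   // 1: match at the beginning
        System.out.println(longScore(1, 0.9));   // 0: any later match collapses to 0
        System.out.println(longScore(0, 1.0));   // 0: the coefficient is irrelevant
        System.out.println(longScore(5, 0.9) == longScore(5, 0.8));   // true: both 4
        System.out.println(doubleScore(5, 0.9) > doubleScore(5, 0.8)); // true
    }
}
```

With weight 5, two different match positions end up with the same long score, which is the kind of inconsistency the description attributes to small weights.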




[jira] [Updated] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-09-10 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8343:
-
Attachment: LUCENE-8343.patch

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch, 
> LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated based on the match position; 
> the earlier, the better
> The resulting score is a long, which means that at the moment any weight<10 
> can lead to inconsistencies.
> *Edge cases* 
> Weight = 1
> Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (independently of the position match coefficient)




[jira] [Comment Edited] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-07-30 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561918#comment-16561918
 ] 

Alessandro Benedetti edited comment on LUCENE-8343 at 7/30/18 1:25 PM:
---

Hi [~mikemccand], I just updated the Pull Request and patch.

I have checked ant precommit and it seems fine to me.
I have also executed some of the tests (the ones related to the 
Suggesters).
Some Solr tests were failing, so I addressed those as well.
Let me know, happy to take care of anything that is missing.
I will monitor the Jira issue and check when the robot returns the checks and 
tests.


was (Author: alessandro.benedetti):
Hi [~mikemccand], I just updated the Pull Request and patch.

I have checked ant precommit and it seems fine to me.
I have also executed some of the tests (the ones related to the 
Suggesters).
Some Solr tests were failing, so I addressed those as well.
Let me know, happy to take care of anything that is missing.

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch, 
> LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated based on the match position; 
> the earlier, the better
> The resulting score is a long, which means that at the moment any weight<10 
> can lead to inconsistencies.
> *Edge cases* 
> Weight = 1
> Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (independently of the position match coefficient)




[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-07-30 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561918#comment-16561918
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Hi [~mikemccand], I just updated the Pull Request and patch.

I have checked ant precommit and it seems fine to me.
I have also executed some of the tests (the ones related to the 
Suggesters).
Some Solr tests were failing, so I addressed those as well.
Let me know, happy to take care of anything that is missing.

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch, 
> LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated based on the match position; 
> the earlier, the better
> The resulting score is a long, which means that at the moment any weight<10 
> can lead to inconsistencies.
> *Edge cases* 
> Weight = 1
> Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (independently of the position match coefficient)




[jira] [Updated] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-07-30 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8343:
-
Attachment: LUCENE-8343.patch

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch, 
> LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated based on the match position; 
> the earlier, the better
> The resulting score is a long, which means that at the moment any weight<10 
> can lead to inconsistencies.
> *Edge cases* 
> Weight = 1
> Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (independently of the position match coefficient)




[jira] [Comment Edited] (SOLR-12238) Synonym Query Style Boost By Payload

2018-07-27 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559571#comment-16559571
 ] 

Alessandro Benedetti edited comment on SOLR-12238 at 7/27/18 11:06 AM:
---

Hi, is anyone from the community interested in moving this forward?
I can guarantee support from my side and do all the necessary bug fixes/ 
improvements if any committer is interested in bringing this in, OR 
explain to me why it's not necessary :)


was (Author: alessandro.benedetti):
Hi, is anyone from the community interested in moving this forward?
I can guarantee support from my side and do all the necessary bug fixes/ 
improvements if any committer is interested in bringing this in, OR 
explain to me why it's not necessary :)

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modes for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
>  delimiter="|"/>




[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

2018-07-27 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559571#comment-16559571
 ] 

Alessandro Benedetti commented on SOLR-12238:
-

Hi, is anyone from the community interested in moving this forward?
I can guarantee support from my side and do all the necessary bug fixes/ 
improvements if any committer is interested in bringing this in, OR 
explain to me why it's not necessary :)

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modes for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
>  delimiter="|"/>




[jira] [Commented] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2018-07-20 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550867#comment-16550867
 ] 

Alessandro Benedetti commented on SOLR-12243:
-

Hi community,

if this bug fix is not of interest, could we have an explanation of why, and have 
this Jira issue closed?
Thanks

> Edismax missing phrase queries when phrases contain multiterm synonyms
> --
>
> Key: SOLR-12243
> URL: https://issues.apache.org/jira/browse/SOLR-12243
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.1
> Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>Reporter: Elizabeth Haubert
>Priority: Major
> Attachments: SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> SOLR-12243.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> synonyms.txt:
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> request handler:
> 
>  
> 
>  edismax
>   0.4
>  title^100
>  title~20^5000
>  title~11
>  title~22^1000
>  text
>  
>  3-1 6-3 930%
>  *:*
>  25
> 
>  
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin"  against the 
> above list will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not 
> pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction 
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin 
> dose" or pf3:"aspirin dose ?"
>  
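For reference, pf boosts the whole query as a phrase, while pf2 and pf3 boost its word bigrams and trigrams. The Java sketch below is a simplified model (whitespace splitting only; no analysis, stopword holes or synonym expansion) of the phrases one would expect for the first example above; the issue reports that only "allergic reaction" is actually generated when a term has a multi-term synonym.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseShingleSketch {

    // Returns the word n-grams of size n, joined with spaces.
    // Simplified model: edismax works on analysed query clauses, and this
    // sketch ignores analysis, stopwords and synonyms entirely.
    static List<String> shingles(String query, int n) {
        String[] words = query.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        String q = "allergic reaction dog";
        System.out.println("pf:  " + q);
        System.out.println("pf2: " + shingles(q, 2)); // [allergic reaction, reaction dog]
        System.out.println("pf3: " + shingles(q, 3)); // [allergic reaction dog]
    }
}
```

For "aspirin dose in rats" the sketch yields the trigrams "aspirin dose in" and "dose in rats"; in the real query the stopword "in" becomes the "?" gap seen in the issue.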




[jira] [Updated] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-07-18 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8343:
-
Attachment: LUCENE-8343.patch

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch, 
> LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated based on the match position; 
> the earlier, the better
> The resulting score is a long, which means that at the moment any weight<10 
> can lead to inconsistencies.
> *Edge cases* 
> Weight = 1
> Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (independently of the position match coefficient)




[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-07-18 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547600#comment-16547600
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Hi [~mikemccand],
I just merged the branch with the latest master, no conflicts: 
[https://github.com/apache/lucene-solr/pull/398]

I am now pushing the patch here so that all tests will be executed 
automatically by Jenkins.
If this doesn't work in the next day or so, I will run the tests 
locally as well.

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch, 
> LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated based on the match position; 
> the earlier, the better
> The resulting score is a long, which means that at the moment any weight<10 
> can lead to inconsistencies.
> *Edge cases* 
> Weight = 1
> Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (independently of the position match coefficient)




[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-07-06 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534631#comment-16534631
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Any update on this? Can I help in any way to move this forward?

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated based on the match position; 
> the earlier, the better
> The resulting score is a long, which means that at the moment any weight<10 
> can lead to inconsistencies.
> *Edge cases* 
> Weight = 1
> Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (independently of the position match coefficient)




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-06-19 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516909#comment-16516909
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

The latest Jenkins failure doesn't seem to be related to the latest updates to 
the patch.

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch, 
> LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This but will just keep the same 
> functionality with refactored code.
> Other Jira issues will follow, improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

2018-06-19 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516906#comment-16516906
 ] 

Alessandro Benedetti commented on SOLR-12238:
-

Still happy to work on this if anyone can give me any review, recommendation or 
even rejection.
Happy to contribute to the community if possible :)

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modes for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such synonym.txt is :
>  delimiter="|"/>
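To make the `term|payload` notation in the synonym.txt example concrete, here is a minimal Java sketch that splits the right-hand side of one rule into term/boost pairs. It is not the Solr token filter: the class and method names are hypothetical, and giving entries without a `|` (such as `leopard` in the second rule) a default boost of 1.0 is an assumption of this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SynonymPayloadParser {

    // Assumed boost for entries that carry no "|payload" suffix (sketch-only choice).
    static final float DEFAULT_BOOST = 1.0f;

    // Parses e.g. "tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9" into term -> boost,
    // preserving the order in which the terms appear in the rule.
    static Map<String, Float> parseRightHandSide(String rhs, char delimiter) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String entry : rhs.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int idx = trimmed.lastIndexOf(delimiter);
            if (idx < 0) {
                boosts.put(trimmed, DEFAULT_BOOST);
            } else {
                String term = trimmed.substring(0, idx).trim(); // may be multi-word
                float boost = Float.parseFloat(trimmed.substring(idx + 1).trim());
                boosts.put(term, boost);
            }
        }
        return boosts;
    }

    public static void main(String[] args) {
        String rule = "leopard => leopard, Big_Cat|0.8, Bagheera|0.9";
        String rhs = rule.substring(rule.indexOf("=>") + 2);
        // Prints {leopard=1.0, Big_Cat=0.8, Bagheera=0.9}
        System.out.println(parseRightHandSide(rhs, '|'));
    }
}
```

Multi-word synonyms such as `panthera uncia|0.99` keep their spaces because only the last `|` is treated as the delimiter.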




[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2018-06-19 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516905#comment-16516905
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

Any update on this? I am happy to contribute a different patch or help.
If deprecation is the way to go, I am happy to contribute a patch for that instead.

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is supported by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> pass after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and the 
> tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is out of scope for this issue; it is much 
> more complicated and requires much deeper investigation.




[jira] [Commented] (LUCENE-7161) TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery assertion error

2018-06-18 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516328#comment-16516328
 ] 

Alessandro Benedetti commented on LUCENE-7161:
--

I agree with this; I am happy to review and debug the issue if we are able to 
reproduce it again.

> TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery assertion 
> error
> ---
>
> Key: LUCENE-7161
> URL: https://issues.apache.org/jira/browse/LUCENE-7161
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Tommaso Teofili
>Priority: Major
> Fix For: 6.7, 7.0
>
>
> I just hit this unrelated but reproducible failure on master 
> #cc75be53f9b3b86ec59cb93896c4fd5a9a5926b2 while tweaking earth's radius:
> {noformat}
>[junit4] Suite: org.apache.lucene.queries.mlt.TestMoreLikeThis
>[junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestMoreLikeThis 
> -Dtests.method=testMultiFieldShouldReturnPerFieldBooleanQuery 
> -Dtests.seed=794526110651C8E6 -Dtests.locale=es-HN 
> -Dtests.timezone=Brazil/West -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>[junit4] FAILURE 0.25s | 
> TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery <<<
>[junit4]> Throwable #1: java.lang.AssertionError
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([794526110651C8E6:1DF67ED7BBBF4E1D]:0)
>[junit4]>  at 
> org.apache.lucene.queries.mlt.TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery(TestMoreLikeThis.java:320)
>[junit4]>  at java.lang.Thread.run(Thread.java:745)
>[junit4]   2> NOTE: test params are: codec=CheapBastard, 
> sim=ClassicSimilarity, locale=es-HN, timezone=Brazil/West
>[junit4]   2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 
> 1.8.0_60 (64-bit)/cpus=8,threads=1,free=409748864,total=504889344
>[junit4]   2> NOTE: All tests run in this JVM: [TestMoreLikeThis]
>[junit4] Completed [1/1 (1!)] in 0.45s, 1 test, 1 failure <<< FAILURES!
>[junit4] 
>[junit4] 
>[junit4] Tests with failures [seed: 794526110651C8E6]:
>[junit4]   - 
> org.apache.lucene.queries.mlt.TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery
> {noformat}
> Likely related to LUCENE-6954?




[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-15 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513663#comment-16513663
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Hi [~mikemccand], thanks for your review!
I followed your suggestions and updated the Pull Request (also fixing a recent 
merge conflict).
Feel free to check the additional comments there.

I agree with bringing this to 8.x.
When we are close to an acceptable state, let me know and I will go on with 
refinements and double checks to make it production ready.

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated from the position of the match 
> (the earlier, the better)
> The resulting score is a long, which means that at the moment any weight<10 
> can produce inconsistent scores.
> *Edge cases* 
> Weight = 1
> Score = 1 (if the match is at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (regardless of the position match coefficient)
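The edge cases above follow directly from truncating `weight * coefficient` to a long. A minimal Java sketch, assuming the linear coefficient `1 - 0.10 * position` discussed later in this thread and a plain `(long)` cast; the class and method names are hypothetical, not the actual BlendedInfixSuggester code:

```java
class BlendedScoreTruncation {

    // Linear positional coefficient (assumed here): 1 - 0.10 * position.
    static double linearCoefficient(int position) {
        return 1 - 0.10 * position;
    }

    // Score kept as a long: the fractional part of weight * coefficient is dropped.
    static long longScore(long weight, double coefficient) {
        return (long) (weight * coefficient);
    }

    public static void main(String[] args) {
        for (long weight : new long[] {0, 1}) {
            StringBuilder row = new StringBuilder("weight=" + weight + " scores by position:");
            for (int position = 0; position <= 3; position++) {
                row.append(' ').append(longScore(weight, linearCoefficient(position)));
            }
            // weight=0 gives 0 at every position; weight=1 gives 1 only at position 0.
            System.out.println(row);
        }
    }
}
```

With weight 1, every coefficient below 1.0 truncates to 0, so all non-initial matches tie.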




[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-13 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510863#comment-16510863
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Hi [~mikemccand], thank you for your feedback!
I definitely agree that the data type migration is the best route to solve 
the bug(s) elegantly.
I have already attached a Pull Request: 
[https://github.com/apache/lucene-solr/pull/398]

The Pull Request:
- changes the weight from long to Long (so that null values are distinguished 
from 0 values)
- changes the suggestion score to double (preserving the precision)

It is ready for review, and after some initial feedback I can work more on it to 
make it production ready!
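The difference between the two representations can be sketched in Java. This is a hypothetical illustration, not the code in the Pull Request: the weight is a nullable `Long` and the score a `double`, and treating a missing weight as "rank by the positional coefficient alone" is an assumption made only to show that null and 0 can now behave differently.

```java
class NullableWeightScore {

    // Hypothetical post-migration scoring: nullable Long weight, double score.
    static double score(Long weight, double coefficient) {
        if (weight == null) {
            // Assumption for this sketch: no weight configured, so rank by position only.
            return coefficient;
        }
        // An explicit weight of 0 still yields 0, but fractional scores are no longer truncated.
        return weight * coefficient;
    }

    public static void main(String[] args) {
        double coefficient = 0.9; // e.g. the linear coefficient for a match at position 1
        System.out.println(score(null, coefficient)); // missing weight
        System.out.println(score(0L, coefficient));   // explicit weight 0
        System.out.println(score(1L, coefficient));   // weight 1 keeps the fraction
    }
}
```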


[jira] [Commented] (LUCENE-6687) MLT term frequency calculation bug

2018-06-11 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507863#comment-16507863
 ] 

Alessandro Benedetti commented on LUCENE-6687:
--

Is anyone interested in committing the bug fix?
The 2018 patch is attached, and this is the associated Pull Request:

[GitHub Pull Request #389|https://github.com/apache/lucene-solr/pull/389]

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, 
> buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from the fields 
> that we provide in the MLT QP {{qf}} parameter.
> It basically goes over the list of fields twice and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the overload of {{like}} that receives a Map, so the 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time the {{retrieveTerms}} method gets called, its parameter {{fields}} and 
> the private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term {{accumulator}} appears {{7}} 
> times in the {{TID0009}} doc, but the TF of {{accumulator}} was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear fewer than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term {{accumulator}} to only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}.
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?
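The doubling and its effect on {{mintf}} can be reproduced with a simplified Java model of the nested loops described above. The per-field term lists are made up (one occurrence of `accumulator` in `title_mlt` and six in `pagetext_mlt`, matching only the total of 7), and the methods are hypothetical stand-ins, not the actual `MoreLikeThis` code. In this model each term is counted once per entry in `fieldNames`, which with the two `qf` fields doubles it:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TermFreqDoubling {

    // Buggy structure: an outer loop over fieldNames and an inner loop over the
    // same fields, so every term is counted once per outer iteration.
    static Map<String, Integer> buggyTermFreqs(List<String> fieldNames, Map<String, List<String>> doc) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (String outerField : fieldNames) {
            for (String field : fieldNames) {
                for (String term : doc.getOrDefault(field, Collections.emptyList())) {
                    termFreqMap.merge(term, 1, Integer::sum);
                }
            }
        }
        return termFreqMap;
    }

    // Fixed structure: a single pass over the fields.
    static Map<String, Integer> fixedTermFreqs(List<String> fieldNames, Map<String, List<String>> doc) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (String field : fieldNames) {
            for (String term : doc.getOrDefault(field, Collections.emptyList())) {
                termFreqMap.merge(term, 1, Integer::sum);
            }
        }
        return termFreqMap;
    }

    // A term is kept only if its frequency reaches mintf.
    static boolean keptByMinTf(Map<String, Integer> termFreqs, String term, int mintf) {
        return termFreqs.getOrDefault(term, 0) >= mintf;
    }

    public static void main(String[] args) {
        List<String> fieldNames = List.of("title_mlt", "pagetext_mlt");
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("title_mlt", Collections.nCopies(1, "accumulator"));
        doc.put("pagetext_mlt", Collections.nCopies(6, "accumulator"));

        Map<String, Integer> buggy = buggyTermFreqs(fieldNames, doc);
        Map<String, Integer> fixed = fixedTermFreqs(fieldNames, doc);
        System.out.println("buggy tf=" + buggy.get("accumulator") + ", fixed tf=" + fixed.get("accumulator"));
        System.out.println("mintf=14 keeps it? buggy=" + keptByMinTf(buggy, "accumulator", 14)
                + ", fixed=" + keptByMinTf(fixed, "accumulator", 14));
        System.out.println("mintf=15 keeps it? buggy=" + keptByMinTf(buggy, "accumulator", 15));
    }
}
```

The buggy count of 14 passes {{mintf=14}} but not {{mintf=15}}, which is the behaviour shown in the screenshots; the single-pass count of 7 passes neither.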




[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-08 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506129#comment-16506129
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

First of all, thanks again, Adrien, for your time.
I have done the work for the data type migration approach here:
[https://github.com/apache/lucene-solr/pull/398]
As expected, the patch affects many more files, but the fix strictly inside 
BlendedInfixSuggester is much more elegant.

The drawbacks are:
- much more attention is needed to review the new patch
- it should be pretty safe, but when introducing nulls around I never feel fully 
comfortable unless I am very confident in my tests

In case this approach is preferred and someone from the community commits to 
taking care of the review process, I am more than happy to spend more effort on 
this and make it production ready!


[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-08 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505885#comment-16505885
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Hi Adrien,
In theory I agree with you.
I structured the patch this way because, in my contributions so far, I noticed 
that a contribution is much more likely to be reviewed and accepted if it fixes 
a bug with the minimal possible impact, involving as few classes as possible.

The problem here is indeed related to the data types of:
- the suggestion score (should be double)
- the weight (should be Long, as 0 must be considered different from null)

I would be more than happy to contribute that, but my feeling is that a patch 
spanning a lot of different classes and areas would be ignored, with the 
end result that the bug(s) would remain.
Happy if you (the community in general) contradict me, and I will proceed with 
the data type change approach :)


[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505016#comment-16505016
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

In the meantime, I attached the patch with the overflow edge case fixed and 
better handling of the weight only when it is too small.

Happy to discuss the implications of "turning weight=0 into 1" with the 
community!


[jira] [Updated] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8343:
-
Attachment: LUCENE-8343.patch


[jira] [Comment Edited] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504909#comment-16504909
 ] 

Alessandro Benedetti edited comment on LUCENE-8343 at 6/7/18 5:42 PM:
--

Hi [~jpountz],

Thanks for your time; here is a quick explanation:

The (positional) coefficient should be a double 0<=x<=1, calculated with one of 3 
possible formulas from the position of the first matching query term in the 
suggestion (linear doesn't respect that constraint and goes negative for 
positions more than 10 positions from the beginning):
 * *position_linear*: (1 - 0.10*position): matches closer to the start are given a 
higher score (Default)
 * *position_reciprocal*: 1/(1+position): matches closer to the start are given a 
higher score, which decays faster than linear
 * *position_exponential_reciprocal*: 1/pow(1+position,exponent): matches closer to 
the start are given a higher score, which decays faster than reciprocal
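The three formulas can be compared side by side with a small Java sketch. The method names are hypothetical, and the exponent of 2.0 is an assumed value for illustration, not necessarily the suggester's default:

```java
class PositionalCoefficients {

    // position_linear: 1 - 0.10 * position (negative beyond position 10).
    static double linear(int position) {
        return 1 - 0.10 * position;
    }

    // position_reciprocal: 1 / (1 + position).
    static double reciprocal(int position) {
        return 1.0 / (1 + position);
    }

    // position_exponential_reciprocal: 1 / (1 + position)^exponent.
    static double exponentialReciprocal(int position, double exponent) {
        return 1.0 / Math.pow(1 + position, exponent);
    }

    public static void main(String[] args) {
        double exponent = 2.0; // assumed value, for illustration only
        for (int position : new int[] {0, 1, 5, 10, 15}) {
            System.out.printf("position=%d linear=%.2f reciprocal=%.3f exponential_reciprocal=%.4f%n",
                    position, linear(position), reciprocal(position),
                    exponentialReciprocal(position, exponent));
        }
    }
}
```

All three start at 1.0 for position 0; only the linear one leaves the [0, 1] range, going negative at position 15.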

To answer your questions:

1) "turning weight=0 into 1": this is an interesting one.
You don't want all your weights to be 0 for the BlendedInfixSuggester, because 
that would flatten the positional score of every suggestion to 0, and the 
positional score is the only reason to use the Blended Infix suggester (if you 
are not interested in the positional score of the suggestion, you should use the 
parent suggester: AnalyzingInfixSuggester).
If you don't configure the weight field (which is not, and shouldn't be, 
mandatory), all your weights become 0 
(org.apache.lucene.search.suggest.DocumentDictionary.DocumentInputIterator#getWeight) 
and your BlendedInfixSuggester no longer blends anything, scoring each 
suggestion a constant 0.
That was the reason to move weight 0 to the next larger value (which for a long 
data type is 1).
With that fix you limit the ability of a user to push certain suggestions down to 
weight 0 (they can only drop them to weight 1), but you gain a good fix for the 
missing weight field scenario.

2) The choice of 10 was completely arbitrary, to get at least 10 possible 
ranked outcomes out of the positional coefficient.
You may end up with overflows if:
 - the weight is already big enough.
You are right; maybe we can apply that scaling factor only if the weight is 
small.

According to my analysis, the overflow cannot come from the coefficient, 
because the edge cases for linear are:
1 - where the input position is 0
-2.147483637002E8 - where the input position is 
[Integer.MAX_VALUE|http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Integer.html#MAX_VALUE]
 (which is not going to be reachable, as a String's length is capped at that 
value)
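The mitigations discussed in this comment (weight 0 becoming 1, and applying the arbitrary factor of 10 only to small weights), together with clamping the linear coefficient at 0 (an option also raised in this thread), can be combined in a minimal Java sketch. The class name and the `SMALL_WEIGHT_THRESHOLD` value are hypothetical; this is not the attached patch.

```java
class BlendedScoreFixSketch {

    // The arbitrary scaling factor mentioned in the comment.
    static final long SCALING_FACTOR = 10;
    // Hypothetical threshold below which a weight counts as "small".
    static final long SMALL_WEIGHT_THRESHOLD = 10;

    // Weight 0 (e.g. no weight field configured) becomes 1, so the positional
    // coefficient can still rank suggestions.
    static long normalizeWeight(long weight) {
        return weight == 0 ? 1 : weight;
    }

    // Linear coefficient clamped to [0, 1], like the reciprocal variants.
    static double clampedLinear(int position) {
        return Math.max(0.0, Math.min(1.0, 1 - 0.10 * position));
    }

    static long score(long rawWeight, int position) {
        long weight = normalizeWeight(rawWeight);
        // Scale only small positive weights, so large weights are not pushed towards overflow.
        if (weight > 0 && weight < SMALL_WEIGHT_THRESHOLD) {
            weight *= SCALING_FACTOR;
        }
        // The cast truncates; a product beyond the long range saturates at Long.MAX_VALUE.
        return (long) (weight * clampedLinear(position));
    }

    public static void main(String[] args) {
        System.out.println(score(0, 0));              // missing weight, match at position 0
        System.out.println(score(0, 1));              // missing weight, match at position 1
        System.out.println(score(1, 20));             // linear coefficient clamped to 0
        System.out.println(score(Long.MAX_VALUE, 0)); // large weight is not scaled
    }
}
```

With the scaling applied, a missing weight still yields distinct scores (10 at position 0, 9 at position 1) instead of collapsing to 0.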



[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504909#comment-16504909
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Hi [~jpountz],

Thanks for your time; here is a quick explanation:

The (positional) coefficient should be a double 0<=x<=1, calculated with one of 3 
possible formulas from the position of the first matching query term in the 
suggestion (linear doesn't respect that constraint and goes negative for 
positions more than 10 positions from the beginning):
 * *position_linear*: (1 - 0.10*position): matches closer to the start are given a 
higher score (Default)
 * *position_reciprocal*: 1/(1+position): matches closer to the start are given a 
higher score, which decays faster than linear
 * *position_exponential_reciprocal*: 1/pow(1+position,exponent): matches closer to 
the start are given a higher score, which decays faster than reciprocal

To answer your questions:

1) "turning weight=0 into 1": this is an interesting one.
You don't want all your weights to be 0 for the BlendedInfixSuggester, because 
that would flatten the positional score of every suggestion to 0, and the 
positional score is the only reason to use the Blended Infix suggester (if you 
are not interested in the positional score of the suggestion, you should use the 
parent suggester: AnalyzingInfixSuggester).
If you don't configure the weight field (which is not, and shouldn't be, 
mandatory), all your weights become 0 
(org.apache.lucene.search.suggest.DocumentDictionary.DocumentInputIterator#getWeight) 
and your BlendedInfixSuggester no longer blends anything, scoring each 
suggestion a constant 0.
That was the reason to move weight 0 to the next larger value (which for a long 
data type is 1).
With that fix you limit the ability of a user to push certain suggestions down to 
weight 0 (they can only drop them to weight 1), but you gain a good fix for the 
missing weight field scenario.

2) The choice of 10 was completely arbitrary, to get at least 10 possible 
ranked outcomes out of the positional coefficient.
You may end up with overflows if:
- the weight is already big enough.
You are right; maybe we can apply that scaling factor only if the weight is 
small.
- the linear coefficient goes deeply negative (we can limit the coefficient 
score to a minimum of 0, which would also give linear a behaviour similar to its 
sibling blender types)

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as the product of:
> long *Weight*: the suggestion weight, coming from a document field; it can 
> be any long value (including 1, 0, ...)
> double *Coefficient*: 0<=x<=1, calculated from the position of the match 
> (the earlier, the better)
> The resulting score is a long, which means that at the moment any weight<10 
> can produce inconsistent scores.
> *Edge cases* 
> Weight = 1
> Score = 1 (if the match is at the beginning of the suggestion) or 0 (for 
> any other match)
> Weight = 0
> Score = 0 (regardless of the position match coefficient)




[jira] [Comment Edited] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504852#comment-16504852
 ] 

Alessandro Benedetti edited comment on LUCENE-8343 at 6/7/18 4:00 PM:
--

Hi [~ctargett],

I made the changes and pushed them, so they were only in the Pull Request 
associated with this Jira:
[GitHub Pull Request #391|https://github.com/apache/lucene-solr/pull/391]
I have now uploaded the patch as well.

You can take a look now (I think the GitHub Pull Request is easier to read, 
but feel free to use the patch at your convenience).



> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as a multiplication between :
> long *Weight* : the suggestion weight, coming from a document field, it can 
> be any long value ( including 1, 0,.. )
> double *Coefficient* : 0<=x<=1, calculated based on the position match, 
> earlier the better
> The resulting score is a long, which means that at the moment, any weight<10 
> can bring inconsistencies.
> *Edge cases* 
> Weight =1
> Score = 1( if we have a match at the beginning of the suggestion) or 0 ( for 
> any other match)
> Weight =0
> Score = 0 ( independently of the position match coefficient)




[jira] [Commented] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504852#comment-16504852
 ] 

Alessandro Benedetti commented on LUCENE-8343:
--

Hi Cassandra,

I made the change and pushed it, so it was only in the Pull Request associated 
with this Jira issue:
[GitHub Pull Request #391|https://github.com/apache/lucene-solr/pull/391]
I have just uploaded the patch as well.
You can take a look now (I think the GitHub Pull Request is easier to read, 
but feel free to use the patch at your convenience).

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as a multiplication between :
> long *Weight* : the suggestion weight, coming from a document field, it can 
> be any long value ( including 1, 0,.. )
> double *Coefficient* : 0<=x<=1, calculated based on the position match, 
> earlier the better
> The resulting score is a long, which means that at the moment, any weight<10 
> can bring inconsistencies.
> *Edge cases* 
> Weight =1
> Score = 1( if we have a match at the beginning of the suggestion) or 0 ( for 
> any other match)
> Weight =0
> Score = 0 ( independently of the position match coefficient)




[jira] [Updated] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-07 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8343:
-
Attachment: LUCENE-8343.patch

> BlendedInfixSuggester bad score calculus for certain suggestion weights
> ---
>
> Key: LUCENE-8343
> URL: https://issues.apache.org/jira/browse/LUCENE-8343
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8343.patch, LUCENE-8343.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the BlendedInfixSuggester returns a (long) score to rank the 
> suggestions.
> This score is calculated as a multiplication between :
> long *Weight* : the suggestion weight, coming from a document field, it can 
> be any long value ( including 1, 0,.. )
> double *Coefficient* : 0<=x<=1, calculated based on the position match, 
> earlier the better
> The resulting score is a long, which means that at the moment, any weight<10 
> can bring inconsistencies.
> *Edge cases* 
> Weight =1
> Score = 1( if we have a match at the beginning of the suggestion) or 0 ( for 
> any other match)
> Weight =0
> Score = 0 ( independently of the position match coefficient)




[jira] [Commented] (LUCENE-8347) BlendedInfixSuggester to handle multi term matches better

2018-06-07 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504752#comment-16504752
 ] 

Alessandro Benedetti commented on LUCENE-8347:
--

Added some additional edge-case tests and bug fixes:

- assertThat(responses.get(8).key, is("Bar Fridge Mini"));
  handling of suggestions where all query terms match, but in shuffled positions

- repeated terms in the query are now handled properly
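The two edge cases listed above can be illustrated with a small, hypothetical sketch: repeated query terms are collapsed before matching, and a suggestion counts as a full match when every distinct query term occurs in it, whatever the order. The class and its methods are invented for illustration, use naive lowercase whitespace tokenization instead of an analyzer, and are not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of the LUCENE-8347 edge cases; not the patch code. */
final class QueryTermEdgeCases {

    // Lowercases and splits on whitespace; a stand-in for a real analyzer.
    static List<String> tokenize(String text) {
        String trimmed = text.toLowerCase(Locale.ROOT).trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Collapses repeated query terms ("mini mini bar" -> [mini, bar]), keeping
    // first-seen order, so a repeated term is not counted twice.
    static Set<String> distinctTerms(String query) {
        return new LinkedHashSet<>(tokenize(query));
    }

    // True when every distinct query term occurs in the suggestion, in any order,
    // e.g. query "Mini Bar Fridge" against suggestion "Bar Fridge Mini".
    static boolean allTermsMatch(String query, String suggestion) {
        return tokenize(suggestion).containsAll(distinctTerms(query));
    }

    public static void main(String[] args) {
        System.out.println(distinctTerms("Mini mini Bar"));                    // [mini, bar]
        System.out.println(allTermsMatch("Mini Bar Fridge", "Bar Fridge Mini")); // true
        System.out.println(allTermsMatch("Mini Bar Fridge", "Mini Bar"));        // false
    }
}
```

Deduplicating with a `LinkedHashSet` keeps the original query order, which matters if a later step weighs terms by their position in the query.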

> BlendedInfixSuggester to handle multi term matches better
> -
>
> Key: LUCENE-8347
> URL: https://issues.apache.org/jira/browse/LUCENE-8347
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8347.patch, LUCENE-8347.patch
>
>
> Currently the blendedInfix suggester considers just the first match position 
> when scoring a suggestion.
> From the lucene-dev mailing list :
> "
> If I write more than one term in the query, let's say 
>  
> "Mini Bar Fridge" 
>  
> I would expect in the results something like (note that allTermsRequired=true 
> and the schema weight field always returns 1000)
>  
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini* something *Bar Fridge*
> ...
>  
> Instead I see this: 
>  
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini* something *Bar Fridge*
> ...
>  
> After having a look at the suggester code 
> (BlendedInfixSuggester.createCoefficient), I see that the component takes into 
> account only one position, which is the lowest position (among the three 
> matching terms) within the term vector ("mini" in the example above) so all 
> the suggestions above have the same weight 
> "
> Scope of this Jira issue is to improve the BlendedInfix to better manage 
> those scenarios.




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-06-06 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503705#comment-16503705
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

I just attached a revised patch and Pull Request:

1) The impact on the user side has been considerably reduced by keeping the getters and 
setters in the main MLT class, while the parameters live in the external class.
2) Parameters with defaults have their own class, which makes them easy to maintain 
and read; it will also be easy to pass them to the inner MLT modules as the refactor 
continues.
3) Boost logic is still kept away from users of the MLT: the responsibility for 
managing boost stays on the MLT side (open to discussing this).
4) The patch is built on top of the SOLR-12304 bug fix, which should go in first 
anyway.
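Points 1) and 2) describe a delegation pattern: the parameters and their defaults move into a dedicated class, while the main MLT class keeps its public getters and setters and forwards to it. A minimal Java sketch of that shape follows; `MltParams`, `Mlt`, the selection of parameters and their default values are all hypothetical and are not taken from the patch.

```java
/**
 * Hypothetical parameters holder: each parameter and its default live in one
 * place, so the object can be handed to inner modules as a whole.
 */
final class MltParams {
    static final int DEFAULT_MIN_TERM_FREQ = 2;
    static final int DEFAULT_MIN_DOC_FREQ = 5;
    static final int DEFAULT_MAX_QUERY_TERMS = 25;

    private int minTermFreq = DEFAULT_MIN_TERM_FREQ;
    private int minDocFreq = DEFAULT_MIN_DOC_FREQ;
    private int maxQueryTerms = DEFAULT_MAX_QUERY_TERMS;

    int getMinTermFreq() { return minTermFreq; }
    void setMinTermFreq(int minTermFreq) { this.minTermFreq = minTermFreq; }

    int getMinDocFreq() { return minDocFreq; }
    void setMinDocFreq(int minDocFreq) { this.minDocFreq = minDocFreq; }

    int getMaxQueryTerms() { return maxQueryTerms; }
    void setMaxQueryTerms(int maxQueryTerms) { this.maxQueryTerms = maxQueryTerms; }
}

/** Stand-in for the main MLT class; only the delegation pattern is shown. */
final class Mlt {
    private final MltParams params = new MltParams();

    // Existing user-facing accessors stay, so callers are unaffected by the split.
    public int getMinTermFreq() { return params.getMinTermFreq(); }
    public void setMinTermFreq(int value) { params.setMinTermFreq(value); }

    public int getMinDocFreq() { return params.getMinDocFreq(); }
    public void setMinDocFreq(int value) { params.setMinDocFreq(value); }

    public int getMaxQueryTerms() { return params.getMaxQueryTerms(); }
    public void setMaxQueryTerms(int value) { params.setMaxQueryTerms(value); }

    // Inner modules (e.g. term retrieval or scoring) could receive the whole object.
    MltParams getParameters() { return params; }

    public static void main(String[] args) {
        Mlt mlt = new Mlt();
        mlt.setMinTermFreq(14);
        System.out.println(mlt.getParameters().getMinTermFreq()); // 14
        System.out.println(mlt.getMinDocFreq());                   // 5
    }
}
```

Because the public accessors are unchanged, existing callers keep compiling, while the defaults are no longer scattered across the main class.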

Happy to revise further if necessary, and to follow up with the subsequent 
refactors (in separate Jira issues).

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch, 
> LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> Scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This but just keep the same 
> functionality with a refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Updated] (LUCENE-8326) More Like This Params Refactor

2018-06-06 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8326:
-
Attachment: LUCENE-8326.patch

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch, 
> LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> Scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This but just keep the same 
> functionality with a refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (LUCENE-8347) BlendedInfixSuggester to handle multi term matches better

2018-06-06 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503177#comment-16503177
 ] 

Alessandro Benedetti commented on LUCENE-8347:
--

New patch attached to fix the minor precommit issues in the comments.

> BlendedInfixSuggester to handle multi term matches better
> -
>
> Key: LUCENE-8347
> URL: https://issues.apache.org/jira/browse/LUCENE-8347
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8347.patch, LUCENE-8347.patch
>
>
> Currently the blendedInfix suggester considers just the first match position 
> when scoring a suggestion.
> From the lucene-dev mailing list :
> "
> If I write more than one term in the query, let's say 
>  
> "Mini Bar Fridge" 
>  
> I would expect in the results something like (note that allTermsRequired=true 
> and the schema weight field always returns 1000)
>  
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini* something *Bar Fridge*
> ...
>  
> Instead I see this: 
>  
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini* something *Bar Fridge*
> ...
>  
> After having a look at the suggester code 
> (BlendedInfixSuggester.createCoefficient), I see that the component takes into 
> account only one position, which is the lowest position (among the three 
> matching terms) within the term vector ("mini" in the example above) so all 
> the suggestions above have the same weight 
> "
> Scope of this Jira issue is to improve the BlendedInfix to better manage 
> those scenarios.




[jira] [Updated] (LUCENE-8347) BlendedInfixSuggester to handle multi term matches better

2018-06-06 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8347:
-
Attachment: LUCENE-8347.patch

> BlendedInfixSuggester to handle multi term matches better
> -
>
> Key: LUCENE-8347
> URL: https://issues.apache.org/jira/browse/LUCENE-8347
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8347.patch, LUCENE-8347.patch
>
>
> Currently the blendedInfix suggester considers just the first match position 
> when scoring a suggestion.
> From the lucene-dev mailing list :
> "
> If I write more than one term in the query, let's say 
>  
> "Mini Bar Fridge" 
>  
> I would expect in the results something like (note that allTermsRequired=true 
> and the schema weight field always returns 1000)
>  
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini* something *Bar Fridge*
> ...
>  
> Instead I see this: 
>  
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini* something *Bar Fridge*
> ...
>  
> After having a look at the suggester code 
> (BlendedInfixSuggester.createCoefficient), I see that the component takes into 
> account only one position, which is the lowest position (among the three 
> matching terms) within the term vector ("mini" in the example above) so all 
> the suggestions above have the same weight 
> "
> Scope of this Jira issue is to improve the BlendedInfix to better manage 
> those scenarios.




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-06-06 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503148#comment-16503148
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

Ok Robert,
I see your point.

I wouldn't say that splitting up the parameters is a critical part (while I do 
believe that the interesting terms retrieval and interesting terms scoring are, 
but that will be a later discussion).

Let me spend some time looking for a more balanced and convenient solution 
that makes a good compromise.
I will update this Jira as soon as I have a new patch/pull request.
Thank you again for your time!

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> Scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This but just keep the same 
> functionality with a refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-06-06 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503094#comment-16503094
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

Any thoughts on my answer from 24/05?
I would like to move this forward, and I'm happy to have a constructive discussion 
and to revisit the approach if necessary :)

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> Scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This but just keep the same 
> functionality with a refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (LUCENE-8347) BlendedInfixSuggester to handle multi term matches better

2018-06-04 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500554#comment-16500554
 ] 

Alessandro Benedetti commented on LUCENE-8347:
--

It is recommended to merge this one first

> BlendedInfixSuggester to handle multi term matches better
> -
>
> Key: LUCENE-8347
> URL: https://issues.apache.org/jira/browse/LUCENE-8347
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8347.patch
>
>
> Currently the blendedInfix suggester considers just the first match position 
> when scoring a suggestion.
> From the lucene-dev mailing list :
> "
> If I write more than one term in the query, let's say 
>  
> "Mini Bar Fridge" 
>  
> I would expect in the results something like (note that allTermsRequired=true 
> and the schema weight field always returns 1000)
>  
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini* something *Bar Fridge*
> ...
>  
> Instead I see this: 
>  
> - *Mini Bar* something *Fridge*        
> - *Mini Bar* something else *Fridge*
> - *Mini Bar Fridge* something
> - *Mini Bar Fridge* something else
> - *Mini* something *Bar Fridge*
> ...
>  
> After having a look at the suggester code 
> (BlendedInfixSuggester.createCoefficient), I see that the component takes into 
> account only one position, which is the lowest position (among the three 
> matching terms) within the term vector ("mini" in the example above) so all 
> the suggestions above have the same weight 
> "
> Scope of this Jira issue is to improve the BlendedInfix to better manage 
> those scenarios.




[jira] [Created] (LUCENE-8347) BlendedInfixSuggester to handle multi term matches better

2018-06-04 Thread Alessandro Benedetti (JIRA)

Alessandro Benedetti created LUCENE-8347:


 Summary: BlendedInfixSuggester to handle multi term matches better
 Key: LUCENE-8347
 URL: https://issues.apache.org/jira/browse/LUCENE-8347
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Alessandro Benedetti


Currently the BlendedInfixSuggester considers just the first match position 
when scoring a suggestion.
From the lucene-dev mailing list:
"
If I write more than one term in the query, let's say 
 
"Mini Bar Fridge" 
 
I would expect in the results something like (note that allTermsRequired=true 
and the schema weight field always returns 1000)
 
- *Mini Bar Fridge* something
- *Mini Bar Fridge* something else
- *Mini Bar* something *Fridge*        
- *Mini Bar* something else *Fridge*
- *Mini* something *Bar Fridge*
...
 
Instead I see this: 
 
- *Mini Bar* something *Fridge*        
- *Mini Bar* something else *Fridge*
- *Mini Bar Fridge* something
- *Mini Bar Fridge* something else
- *Mini* something *Bar Fridge*
...
 
After having a look at the suggester code 
(BlendedInfixSuggester.createCoefficient), I see that the component takes into 
account only one position, which is the lowest position (among the three 
matching terms) within the term vector ("mini" in the example above) so all the 
suggestions above have the same weight 
"
The scope of this Jira issue is to improve the BlendedInfixSuggester so that it 
handles those scenarios better.
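The ranking problem described above can be illustrated by comparing a coefficient that only uses the lowest matching position with one that considers every query term. The Java sketch below is hypothetical: it assumes a reciprocal coefficient `1 / (position + 1)`, naive whitespace tokenization, and an averaging strategy chosen purely for illustration; it is not the BlendedInfixSuggester code or the approach taken in the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical comparison of position coefficients for multi-term matches. */
final class MultiTermCoefficient {

    // First 0-based position of each query term in the suggestion, or -1 if absent.
    static int[] matchPositions(String suggestion, String... queryTerms) {
        List<String> tokens = Arrays.asList(suggestion.toLowerCase(Locale.ROOT).split("\\s+"));
        int[] positions = new int[queryTerms.length];
        for (int i = 0; i < queryTerms.length; i++) {
            positions[i] = tokens.indexOf(queryTerms[i].toLowerCase(Locale.ROOT));
        }
        return positions;
    }

    // Behaviour described in the issue: only the lowest matching position counts.
    static double lowestPositionCoefficient(int[] positions) {
        int lowest = Integer.MAX_VALUE;
        for (int p : positions) {
            if (p >= 0 && p < lowest) {
                lowest = p;
            }
        }
        return lowest == Integer.MAX_VALUE ? 0.0 : 1.0 / (lowest + 1);
    }

    // Hypothetical alternative: average the reciprocal coefficient over all
    // query terms, with unmatched terms contributing 0.
    static double averagePositionCoefficient(int[] positions) {
        if (positions.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int p : positions) {
            if (p >= 0) {
                sum += 1.0 / (p + 1);
            }
        }
        return sum / positions.length;
    }

    public static void main(String[] args) {
        String[] suggestions = {
            "Mini Bar something Fridge",
            "Mini Bar Fridge something",
            "Mini something Bar Fridge",
        };
        for (String s : suggestions) {
            int[] pos = matchPositions(s, "mini", "bar", "fridge");
            System.out.printf(Locale.ROOT, "%-28s lowest=%.3f average=%.3f%n",
                    s, lowestPositionCoefficient(pos), averagePositionCoefficient(pos));
        }
    }
}
```

With the lowest-position coefficient all three suggestions tie at 1.0, matching the behaviour reported on the mailing list; the averaged coefficient ranks "Mini Bar Fridge something" first and "Mini something Bar Fridge" last, as the reporter expected. Averaging is only one possible design.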




[jira] [Created] (LUCENE-8343) BlendedInfixSuggester bad score calculus for certain suggestion weights

2018-06-01 Thread Alessandro Benedetti (JIRA)

Alessandro Benedetti created LUCENE-8343:


 Summary: BlendedInfixSuggester bad score calculus for certain 
suggestion weights
 Key: LUCENE-8343
 URL: https://issues.apache.org/jira/browse/LUCENE-8343
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 7.3.1
Reporter: Alessandro Benedetti


Currently the BlendedInfixSuggester returns a (long) score to rank the 
suggestions.
This score is calculated as the product of:

long *Weight*: the suggestion weight, coming from a document field; it can be 
any long value (including 1, 0, ...)

double *Coefficient*: 0 <= x <= 1, calculated based on the match position (the 
earlier, the better)

The resulting score is a long, which means that at the moment any weight < 10 
can produce inconsistencies, because the fractional part of Weight * Coefficient 
is lost.

*Edge cases*
Weight = 1
Score = 1 (if we have a match at the beginning of the suggestion) or 0 (for 
any other match)

Weight = 0
Score = 0 (independently of the position match coefficient)
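The edge cases above follow from truncating a fractional product to a long. Below is a minimal Java sketch of that arithmetic, assuming the score is computed as `(long) (weight * coefficient)` and assuming a reciprocal coefficient `1 / (position + 1)`; both the class and the coefficient formula are illustrative assumptions, not the BlendedInfixSuggester source.

```java
/**
 * Hypothetical illustration of LUCENE-8343: multiplying a long weight by a
 * coefficient in [0, 1] and casting back to long loses the ranking signal
 * for small weights.
 */
final class BlendedScoreTruncation {

    // Assumed reciprocal coefficient: 1.0 at position 0, 0.5 at position 1,
    // 0.333... at position 2, and so on.
    static double coefficient(int position) {
        return 1.0 / (position + 1);
    }

    // Score as described in the issue: long weight times double coefficient,
    // truncated toward zero by the cast to long.
    static long score(long weight, int position) {
        return (long) (weight * coefficient(position));
    }

    public static void main(String[] args) {
        long[] weights = {0, 1, 5, 10, 1000};
        for (long w : weights) {
            StringBuilder row = new StringBuilder("weight=" + w + ":");
            for (int pos = 0; pos < 4; pos++) {
                row.append(" pos").append(pos).append('=').append(score(w, pos));
            }
            System.out.println(row);
        }
    }
}
```

With weight 1 every non-initial match collapses to 0, and with weight 5 matches at positions 2 and 3 both score 1, so the position coefficient no longer separates those suggestions. Keeping the blended score as a double, or scaling weights up before blending, would be possible ways to avoid the truncation.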




[jira] [Commented] (LUCENE-6687) MLT term frequency calculation bug

2018-05-31 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496744#comment-16496744
 ] 

Alessandro Benedetti commented on LUCENE-6687:
--

I don't have the privileges to change the status; it should be moved to "Patch 
Available".

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, 
> buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from fields 
> that we provide in MLT QP {{qf}} parameter. 
> It basically goes two times over the list of fields and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the version of overloaded method {{like}} that receives a Map: so that 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear less than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator in only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?
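The doubling can be reproduced with a simplified model of the loop shape described above. This Java sketch is hypothetical: it is not the MoreLikeThis source, it treats each field as a pre-tokenized list of terms instead of running an analyzer, and the 1 + 6 split of the seven "accumulator" occurrences across the two fields is invented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of the nested-loop term frequency bug. */
final class TermFreqDoubling {

    // Buggy shape: the outer loop over fieldNames does not restrict the inner
    // loop, so the terms of every field are counted once per outer iteration.
    // With two fields in qf, every frequency is doubled.
    static Map<String, Integer> buggyTermFreqs(Map<String, List<String>> fields, List<String> fieldNames) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (String unusedOuterField : fieldNames) {
            for (Map.Entry<String, List<String>> field : fields.entrySet()) {
                for (String term : field.getValue()) {
                    termFreqMap.merge(term, 1, Integer::sum);
                }
            }
        }
        return termFreqMap;
    }

    // Fixed shape: each requested field is visited exactly once.
    static Map<String, Integer> fixedTermFreqs(Map<String, List<String>> fields, List<String> fieldNames) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (String fieldName : fieldNames) {
            List<String> terms = fields.getOrDefault(fieldName, Collections.emptyList());
            for (String term : terms) {
                termFreqMap.merge(term, 1, Integer::sum);
            }
        }
        return termFreqMap;
    }

    public static void main(String[] args) {
        // "accumulator" appears 7 times in total across the two fields.
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("title_mlt", Arrays.asList("accumulator", "glass"));
        doc.put("pagetext_mlt", Collections.nCopies(6, "accumulator"));
        List<String> qf = Arrays.asList("pagetext_mlt", "title_mlt");

        System.out.println(buggyTermFreqs(doc, qf).get("accumulator")); // 14
        System.out.println(fixedTermFreqs(doc, qf).get("accumulator")); // 7
    }
}
```

In this model the inflation factor equals the number of fields in `qf`, so with the two fields used above a term seen 7 times is counted 14 times, consistent with the `mintf=14` observation.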




[jira] [Commented] (LUCENE-6687) MLT term frequency calculation bug

2018-05-31 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496736#comment-16496736
 ] 

Alessandro Benedetti commented on LUCENE-6687:
--

I double-checked and the issue was still there.
The bug manifests when using the MLT with a "live Lucene Document", which 
is basically a map passed as a parameter to the like method of the MLT (this 
is used, for example, by SolrCloud and the distributed More Like This).
A minimal patch + test is attached.

I didn't include any refactoring here.
A complete refactor with better modularity and proper testing is under way, 
though unfortunately it is proceeding slowly.
The first piece is here:
https://issues.apache.org/jira/browse/LUCENE-8326

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, 
> buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from fields 
> that we provide in MLT QP {{qf}} parameter. 
> It basically goes two times over the list of fields and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the version of overloaded method {{like}} that receives a Map: so that 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear less than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator in only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?




[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2018-05-31 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-6687:
-
Attachment: LUCENE-6687.patch

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, LUCENE-6687.patch, 
> buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from fields 
> that we provide in MLT QP {{qf}} parameter. 
> It basically goes two times over the list of fields and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the version of overloaded method {{like}} that receives a Map: so that 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear less than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator in only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?
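The nested-loop pattern described above can be sketched in plain Java. This is a minimal, hypothetical reconstruction, not the actual {{MoreLikeThis.retrieveTerms}} source. It assumes pre-tokenized field values ({{Map<String, List<String>>}}) instead of analyzed fields, and it invents a 6/1 split of the seven {{accumulator}} occurrences between {{pagetext_mlt}} and {{title_mlt}}.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of Lucene.
class TermFreqDoublingDemo {

    // Buggy shape described in the report: the outer loop walks fieldNames and
    // the inner loop walks every field in the document map. Because both hold
    // the same fields, each token is counted fieldNames.size() times
    // (2 for qf=pagetext_mlt,title_mlt, hence the "doubling").
    static Map<String, Integer> buggyRetrieveTerms(Map<String, List<String>> fields,
                                                   List<String> fieldNames) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (String ignored : fieldNames) {
            for (List<String> tokens : fields.values()) {
                for (String token : tokens) {
                    termFreqMap.merge(token, 1, Integer::sum);
                }
            }
        }
        return termFreqMap;
    }

    // Corrected shape: each requested field is visited exactly once.
    static Map<String, Integer> fixedRetrieveTerms(Map<String, List<String>> fields,
                                                   List<String> fieldNames) {
        Map<String, Integer> termFreqMap = new HashMap<>();
        for (String fieldName : fieldNames) {
            for (String token : fields.getOrDefault(fieldName, Collections.emptyList())) {
                termFreqMap.merge(token, 1, Integer::sum);
            }
        }
        return termFreqMap;
    }

    public static void main(String[] args) {
        // Hypothetical document: 7 occurrences of "accumulator" over the two qf fields.
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("title_mlt", Arrays.asList("accumulator"));
        doc.put("pagetext_mlt", Collections.nCopies(6, "accumulator"));
        List<String> qf = Arrays.asList("pagetext_mlt", "title_mlt");

        System.out.println("buggy: " + buggyRetrieveTerms(doc, qf).get("accumulator")); // buggy: 14
        System.out.println("fixed: " + fixedRetrieveTerms(doc, qf).get("accumulator")); // fixed: 7
    }
}
```

With two fields in {{qf}} the buggy loop reports 14 for a term that really occurs 7 times, which matches the TF reported for {{accumulator}} in {{TID0009}}.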




[jira] [Issue Comment Deleted] (LUCENE-6687) MLT term frequency calculation bug

2018-05-31 Thread Alessandro Benedetti (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-6687:
-
Comment: was deleted

(was: I just checked the source code and found different logic in place; I 
assume this bug was fixed a long time ago.
Can anyone close this Jira?)

> MLT term frequency calculation bug
> --
>
> Key: LUCENE-6687
> URL: https://issues.apache.org/jira/browse/LUCENE-6687
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring, core/queryparser
>Affects Versions: 5.2.1, 6.0
> Environment: OS X v10.10.4; Solr 5.2.1
>Reporter: Marko Bonaci
>Priority: Major
> Fix For: 5.2.2
>
> Attachments: LUCENE-6687.patch, buggy-method-usage.png, 
> solr-mlt-tf-doubling-bug-results.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
> solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
> solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
> terms-glass.png, terms-how.png
>
>
> In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
> {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
> basically, but it doesn't have to be an existing doc.
> !solr-mlt-tf-doubling-bug.png|height=500!
> There are 2 for loops, one inside the other, which both loop through the same 
> set of fields.
> That effectively doubles the term frequency for all the terms from fields 
> that we provide in MLT QP {{qf}} parameter. 
> It basically goes two times over the list of fields and accumulates the term 
> frequencies from all fields into {{termFreqMap}}.
> The private method {{retrieveTerms}} is only called from one public method, 
> the version of overloaded method {{like}} that receives a Map: so that 
> private class member {{fieldNames}} is always derived from 
> {{retrieveTerms}}'s argument {{fields}}.
>  
> Uh, I don't understand what I wrote myself, but that basically means that, by 
> the time {{retrieveTerms}} method gets called, its parameter fields and 
> private member {{fieldNames}} always contain the same list of fields.
> Here's the proof:
> These are the final results of the calculation:
> !solr-mlt-tf-doubling-bug-results.png|height=700!
> And this is the actual {{thread_id:TID0009}} document, where those values 
> were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
> !terms-glass.png|height=100!
> !terms-angry.png|height=100!
> !terms-how.png|height=100!
> !terms-accumulator.png|height=100!
> Now, let's further test this hypothesis by seeing MLT QP in action from the 
> AdminUI.
> Let's try to find docs that are More Like doc {{TID0009}}. 
> Here's the interesting part, the query:
> {code}
> q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
> {code}
> We just saw, in the last image above, that the term accumulator appears {{7}} 
> times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
> {{14}}.
> By using {{mintf=14}}, we say that, when calculating similarity, we don't 
> want to consider terms that appear less than 14 times (when terms from fields 
> {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
> I added the term accumulator in only one other document ({{TID0004}}), where 
> it appears only once, in the field {{title_mlt}}. 
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
> Let's see what happens when we use {{mintf=15}}:
> !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
> I should probably mention that multiple fields ({{qf}}) work because I 
> applied the patch: 
> [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
> Bug, no?
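The arithmetic behind the {{mintf}} experiment can be checked with a small Java sketch. It assumes, as the report states, that terms whose accumulated frequency is below {{mintf}} are ignored. The {{termsPassingMinTf}} helper is hypothetical and reproduces only that filter, not the rest of the MoreLikeThis scoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Lucene or Solr.
class MinTermFreqDemo {

    // Keep only terms whose accumulated frequency is at least minTermFreq,
    // i.e. drop terms that appear fewer than mintf times.
    static List<String> termsPassingMinTf(Map<String, Integer> termFreqMap, int minTermFreq) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqMap.entrySet()) {
            if (e.getValue() >= minTermFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> doubled = new TreeMap<>();
        doubled.put("accumulator", 14); // 7 real occurrences, doubled by the bug
        Map<String, Integer> correct = new TreeMap<>();
        correct.put("accumulator", 7);  // what a single pass over the fields would count

        System.out.println(termsPassingMinTf(doubled, 14)); // [accumulator]
        System.out.println(termsPassingMinTf(doubled, 15)); // []
        System.out.println(termsPassingMinTf(correct, 14)); // []
    }
}
```

With the doubled count the term survives {{mintf=14}} and disappears at {{mintf=15}}; with the true count of 7 it would already be filtered out at {{mintf=14}}.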




[jira] [Commented] (LUCENE-6687) MLT term frequency calculation bug

2018-05-31 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496652#comment-16496652
 ] 

Alessandro Benedetti commented on LUCENE-6687:
--

I just checked the source code and found different logic in place; I assume 
this bug was fixed a long time ago.
Can anyone close this Jira?





[jira] [Comment Edited] (SOLR-12370) NullPointerException on MoreLikeThisComponent

2018-05-31 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496646#comment-16496646
 ] 

Alessandro Benedetti edited comment on SOLR-12370 at 5/31/18 2:42 PM:
--

I don't think this is a bug at all.
You are misusing the suggester component.
Including the suggester radically changes your response structure, making it 
incompatible with the More Like This component.

Calling a request handler with the suggest component returns suggestions, 
while calling a classic request handler returns documents (and potentially 
MLT sections, spellchecking and whatever else is configured).

Possibly the problem is the documentation, where the suggest component could 
be explained a little better.
If you remove the suggest component, you should be fine.
Hope it helps.

P.S. In the future, I recommend creating a Jira issue only if you are sure it 
is a bug.
So I suggest first sending an email to the solr-user mailing list.


was (Author: alessandro.benedetti):
I don't think this is a bug at all.
You are misusing the suggester component.
Including the suggester radically changes your response structure, making it 
incompatible with the More Like This component.

Calling a request handler with the suggest component returns suggestions, 
while calling a classic request handler returns documents (and potentially 
MLT sections, spellchecking and whatever else is configured).

Possibly the problem is the documentation, where the suggest component could 
be explained a little better.
If you remove the suggest component, you should be fine.
Hope it helps.

> NullPointerException on MoreLikeThisComponent
> -
>
> Key: SOLR-12370
> URL: https://issues.apache.org/jira/browse/SOLR-12370
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.3.1
>Reporter: Gilles Bodart
>Priority: Blocker
>
> I'm trying to use the MoreLikeThis component under a suggest call, but I 
> receive a npe every time (here's the stacktrace)
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.solr.handler.component.MoreLikeThisComponent.process(MoreLikeThisComponent.java:127)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> ...{code}
> and here's the config of my requestHandlers:
> {code:java}
> 
> 
> true
> 10
> default
> true
> default
> wordbreak
> true
> true
> 10
> true
> true
> 5
> 5
> 10
> 5
> true
> _text_
> on
> content description title
> true
> html
> b
> /b
> 
> 
> suggest
> spellcheck
> mlt
> highlight
> 
> 
> 
> {code}
> I also tried with 
> {code:java}
> on{code}
> When I call
> {code:java}
> /mlt?df=_text_=pann=_text_
> {code}
>  it works fine but with
> {code:java}
> /suggest?df=_text_=pann=_text_
> {code}
> I got the npe
>  
>  




[jira] [Commented] (SOLR-12370) NullPointerException on MoreLikeThisComponent

2018-05-31 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496646#comment-16496646
 ] 

Alessandro Benedetti commented on SOLR-12370:
-

I don't think this is a bug at all.
You are misusing the suggester component.
Including the suggester radically changes your response structure, making it 
incompatible with the More Like This component.

Calling a request handler with the suggest component returns suggestions, 
while calling a classic request handler returns documents (and potentially 
MLT sections, spellchecking and whatever else is configured).

Possibly the problem is the documentation, where the suggest component could 
be explained a little better.
If you remove the suggest component, you should be fine.
Hope it helps.





[jira] [Commented] (LUCENE-7161) TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery assertion error

2018-05-31 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496438#comment-16496438
 ] 

Alessandro Benedetti commented on LUCENE-7161:
--

I just tried, on current master, all the seeds that failed in various branches 
here:

_794526110651C8E6 - OK_
25E751FED53FC993 - OK
60AAD450C5F7A579 - OK
E467DF1643BE90EA - OK
12E0331668C5EB5 - OK
3FA5C26ECE58C917 - OK
116FB7FCD72BFF28 - OK
F41FAA899068BC32 - OK
C802AA860A1EAE50 - OK

I also ran the test several times myself with different random seeds and was 
not able to reproduce any failure.
Is anybody still able to reproduce this issue?
I would be more than happy to debug and fix it.
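Re-running the seeds listed above one by one is easier if the commands are generated. The following is a minimal Java sketch that prints one {{ant test}} command per seed, reusing the {{-Dtestcase}} and {{-Dtests.method}} values from the reproduce lines in this issue. The class and method names are hypothetical. The original reproduce lines also pin {{-Dtests.locale}}, {{-Dtests.timezone}} and other properties; this sketch varies only the seed, so a generated run may not match the original failing run exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the Lucene build.
class ReproduceCommands {

    // Base command taken from the reproduce lines quoted in this issue.
    static final String BASE = "ant test -Dtestcase=TestMoreLikeThis"
            + " -Dtests.method=testMultiFieldShouldReturnPerFieldBooleanQuery";

    // One command per seed; locale, timezone and other properties are not pinned.
    static List<String> commandsFor(List<String> seeds) {
        List<String> commands = new ArrayList<>();
        for (String seed : seeds) {
            commands.add(BASE + " -Dtests.seed=" + seed);
        }
        return commands;
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList(
                "794526110651C8E6", "25E751FED53FC993", "60AAD450C5F7A579",
                "E467DF1643BE90EA", "12E0331668C5EB5", "3FA5C26ECE58C917",
                "116FB7FCD72BFF28", "F41FAA899068BC32", "C802AA860A1EAE50");
        commandsFor(seeds).forEach(System.out::println);
    }
}
```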

> TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery assertion 
> error
> ---
>
> Key: LUCENE-7161
> URL: https://issues.apache.org/jira/browse/LUCENE-7161
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Tommaso Teofili
>Priority: Major
> Fix For: 6.7, 7.0
>
>
> I just hit this unrelated but reproducible on master 
> #cc75be53f9b3b86ec59cb93896c4fd5a9a5926b2 while tweaking earth's radius:
> {noformat}
>[junit4] Suite: org.apache.lucene.queries.mlt.TestMoreLikeThis
>[junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestMoreLikeThis 
> -Dtests.method=testMultiFieldShouldReturnPerFieldBooleanQuery 
> -Dtests.seed=794526110651C8E6 -Dtests.locale=es-HN 
> -Dtests.timezone=Brazil/West -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>[junit4] FAILURE 0.25s | 
> TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery <<<
>[junit4]> Throwable #1: java.lang.AssertionError
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([794526110651C8E6:1DF67ED7BBBF4E1D]:0)
>[junit4]>  at 
> org.apache.lucene.queries.mlt.TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery(TestMoreLikeThis.java:320)
>[junit4]>  at java.lang.Thread.run(Thread.java:745)
>[junit4]   2> NOTE: test params are: codec=CheapBastard, 
> sim=ClassicSimilarity, locale=es-HN, timezone=Brazil/West
>[junit4]   2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 
> 1.8.0_60 (64-bit)/cpus=8,threads=1,free=409748864,total=504889344
>[junit4]   2> NOTE: All tests run in this JVM: [TestMoreLikeThis]
>[junit4] Completed [1/1 (1!)] in 0.45s, 1 test, 1 failure <<< FAILURES!
>[junit4] 
>[junit4] 
>[junit4] Tests with failures [seed: 794526110651C8E6]:
>[junit4]   - 
> org.apache.lucene.queries.mlt.TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery
> {noformat}
> Likely related to LUCENE-6954?




[jira] [Commented] (SOLR-8776) Support RankQuery in grouping

2018-05-31 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496289#comment-16496289
 ] 

Alessandro Benedetti commented on SOLR-8776:


Hi all, I am quite interested in this Jira;
is there any update on it?
I believe it would be quite handy to have the Learning To Rank module working 
with grouping!

> Support RankQuery in grouping
> -
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 6.0
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> attaching a patch with a test that fails because grouping does not support 
> the rank query and then I'll try to fix the problem, starting from the non 
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results




[jira] [Comment Edited] (LUCENE-7161) TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery assertion error

2018-05-30 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495411#comment-16495411
 ] 

Alessandro Benedetti edited comment on LUCENE-7161 at 5/30/18 4:42 PM:
---

While refactoring MoreLikeThis [1], I just found this test awaiting a fix:
@AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-7161")
public void testMultiFieldShouldReturnPerFieldBooleanQuery

Happy to help: where can I find a seed to reproduce and debug the test failure?
I tried the test seeds in this Jira, but all of them succeed on my local 
machine, e.g.:

ant test -Dtestcase=TestMoreLikeThis 
-Dtests.method=testMultiFieldShouldReturnPerFieldBooleanQuery 
-Dtests.seed=C802AA860A1EAE50 -Dtests.slow=true 
-Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt 
-Dtests.locale=hi -Dtests.timezone=MST7MDT -Dtests.asserts=true 
-Dtests.file.encoding=US-ASCII

[1] https://issues.apache.org/jira/browse/LUCENE-8326


was (Author: alessandro.benedetti):
While refactoring MoreLikeThis [1], I just found this test awaiting a fix:
@AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-7161")
public void testMultiFieldShouldReturnPerFieldBooleanQuery

Happy to help: where can I find a seed to reproduce and debug the test failure?
I tried the test seeds in this Jira, but all of them succeed on my local 
machine, e.g.:

ant test -Dtestcase=TestMoreLikeThis 
-Dtests.method=testMultiFieldShouldReturnPerFieldBooleanQuery 
-Dtests.seed=C802AA860A1EAE50 -Dtests.slow=true 
-Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt 
-Dtests.locale=hi -Dtests.timezone=MST7MDT -Dtests.asserts=true 
-Dtests.file.encoding=US-ASCII

[1] [#https://issues.apache.org/jira/browse/LUCENE-8326]





[jira] [Commented] (LUCENE-7161) TestMoreLikeThis.testMultiFieldShouldReturnPerFieldBooleanQuery assertion error

2018-05-30 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495411#comment-16495411
 ] 

Alessandro Benedetti commented on LUCENE-7161:
--

While refactoring MoreLikeThis [1], I just found this test awaiting a fix:
@AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-7161")
public void testMultiFieldShouldReturnPerFieldBooleanQuery

Happy to help: where can I find a seed to reproduce and debug the test failure?
I tried the test seeds in this Jira, but all of them succeed on my local 
machine, e.g.:

ant test -Dtestcase=TestMoreLikeThis 
-Dtests.method=testMultiFieldShouldReturnPerFieldBooleanQuery 
-Dtests.seed=C802AA860A1EAE50 -Dtests.slow=true 
-Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt 
-Dtests.locale=hi -Dtests.timezone=MST7MDT -Dtests.asserts=true 
-Dtests.file.encoding=US-ASCII

[1] https://issues.apache.org/jira/browse/LUCENE-8326





[jira] [Commented] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2018-05-29 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493928#comment-16493928
 ] 

Alessandro Benedetti commented on SOLR-12243:
-

Sorry to be a bore, but is there anything we can do to speed up the review and 
contribution of this one?

> Edismax missing phrase queries when phrases contain multiterm synonyms
> --
>
> Key: SOLR-12243
> URL: https://issues.apache.org/jira/browse/SOLR-12243
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.1
> Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>Reporter: Elizabeth Haubert
>Priority: Major
> Attachments: SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> SOLR-12243.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> synonyms.txt:
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> request handler:
> 
>  
> 
>  edismax
>   0.4
>  title^100
>  title~20^5000
>  title~11
>  title~22^1000
>  text
>  
>  3-1 6-3 930%
>  *:*
>  25
> 
>  
> With the above synonym list, phrase queries (pf, pf2, pf3) containing "dog" or 
> "aspirin" will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not 
> pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction 
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin 
> dose" or pf3:"aspirin dose ?"
>  
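For reference, the phrase clauses the report expects can be enumerated with a short sketch. It assumes plain whitespace tokenization and ignores stopwords, synonyms, slop and boosts, so it only shows which word sequences pf (the whole query), pf2 (adjacent pairs) and pf3 (adjacent triples) are built from. It is not edismax's implementation, and `PhraseShingles` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: enumerates the word n-grams that pf/pf2/pf3-style
// phrase boosting would be built from, before any analysis or synonym expansion.
class PhraseShingles {

  // Returns every run of `size` consecutive words, joined by single spaces.
  static List<String> shingles(String query, int size) {
    String[] words = query.trim().split("\\s+");
    List<String> out = new ArrayList<>();
    for (int i = 0; i + size <= words.length; i++) {
      out.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
    }
    return out;
  }

  public static void main(String[] args) {
    String q = "allergic reaction dog";
    int allWords = q.trim().split("\\s+").length;
    System.out.println("pf:  " + shingles(q, allWords));
    System.out.println("pf2: " + shingles(q, 2));
    System.out.println("pf3: " + shingles(q, 3));
  }
}
```

For "allergic reaction dog" this yields the pf2 phrases "allergic reaction" and "reaction dog" and the single pf3 phrase "allergic reaction dog", i.e. the clauses the report says go missing once "dog" has a multi-term synonym.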




[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2018-05-29 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493924#comment-16493924
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

Any update on this? I am happy to contribute a different patch or help.

 

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.interestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> pass after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and the 
> tests for the MoreLikeThisHandler overlap with the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* The distributed case is also out of scope for this issue; it is much 
> more complicated and requires deeper investigation.
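As background, the MoreLikeThisHandler accepts `mlt.interestingTerms` values of `none`, `list` and `details`; `none` is assumed here to be the default. The sketch below models how such a parameter could be resolved and applied to a ranked term-to-boost map. `InterestingTermsStyle` and `render` are hypothetical names, not Solr's API, and the real handler's response format differs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the mlt.interestingTerms values; not Solr's own enum.
enum InterestingTermsStyle {
  NONE, LIST, DETAILS;

  // Case-insensitive parse; missing or unrecognised values fall back to NONE.
  static InterestingTermsStyle parse(String value) {
    if (value == null) {
      return NONE;
    }
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "list":
        return LIST;
      case "details":
        return DETAILS;
      default:
        return NONE;
    }
  }

  // Renders ranked interesting terms (term -> boost, in rank order):
  // NONE returns null, LIST only the terms, DETAILS the terms with their boosts.
  Object render(Map<String, Float> rankedTerms) {
    switch (this) {
      case LIST:
        return new ArrayList<>(rankedTerms.keySet());
      case DETAILS:
        return new LinkedHashMap<>(rankedTerms);
      default:
        return null;
    }
  }
}
```

With this shape, a component that ignores the parameter is simply one that never calls `parse` and always behaves as `NONE`.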




[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

2018-05-29 Thread Alessandro Benedetti (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493922#comment-16493922
 ] 

Alessandro Benedetti commented on SOLR-12238:
-

Anything I can do to speed up a possible review?

 

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> makes it possible to boost synonym queries using the associated payloads.
> It introduces two new modes for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> An example synonym.txt could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
>  delimiter="|"/>
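To make the two styles concrete, here is a sketch that parses one payload-annotated rule and combines per-clause scores the way a disjunction (best clause wins) and a boolean query (clauses summed) would. Assumptions: an entry without a `|weight` payload (such as `leopard`) gets a boost of 1.0, each comma-separated entry is treated as a single unit even when it contains spaces, and `PayloadSynonyms` with its methods is hypothetical, not the patch's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of payload-boosted synonym expansion.
class PayloadSynonyms {

  // Boost assumed for an entry without a "|weight" payload (e.g. "leopard").
  static final float DEFAULT_BOOST = 1.0f;

  // Parses "lhs => term|w, term|w, ..." into an ordered term -> boost map.
  // Simplification: each comma-separated entry is kept as one unit, whereas a
  // real analysis chain tokenizes multi-word entries such as "panthera leo".
  static Map<String, Float> parseRule(String rule) {
    String[] sides = rule.split("=>", 2);
    if (sides.length != 2) {
      throw new IllegalArgumentException("expected 'lhs => rhs': " + rule);
    }
    Map<String, Float> boosts = new LinkedHashMap<>();
    for (String entry : sides[1].split(",")) {
      String e = entry.trim();
      int bar = e.lastIndexOf('|');
      if (bar < 0) {
        boosts.put(e, DEFAULT_BOOST);
      } else {
        boosts.put(e.substring(0, bar), Float.parseFloat(e.substring(bar + 1)));
      }
    }
    return boosts;
  }

  // PICK_BEST-style combination: the best boosted clause wins
  // (like a disjunction max query with no tie-breaker).
  static float pickBest(Map<String, Float> boosts, Map<String, Float> clauseScores) {
    float best = 0f;
    for (Map.Entry<String, Float> e : boosts.entrySet()) {
      best = Math.max(best, e.getValue() * clauseScores.getOrDefault(e.getKey(), 0f));
    }
    return best;
  }

  // AS_DISTINCT_TERMS-style combination: boosted clause scores are summed
  // (like SHOULD clauses in a boolean query).
  static float sumDistinct(Map<String, Float> boosts, Map<String, Float> clauseScores) {
    float sum = 0f;
    for (Map.Entry<String, Float> e : boosts.entrySet()) {
      sum += e.getValue() * clauseScores.getOrDefault(e.getKey(), 0f);
    }
    return sum;
  }
}
```

The design point the sketch highlights is that both styles share the same per-clause boosts; only the way the boosted clauses are combined differs.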




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-05-26 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491601#comment-16491601
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

{color:#FF}Failed{color} - 
org.apache.solr.handler.component.SearchHandlerTest.testRequireZkConnectedDistrib
It's not marked as BadApple, but it doesn't seem related to this patch at all.

{color:#FF}Regression{color} - 
org.apache.solr.cloud.TestSolrCloudWithDelegationTokens.testDelegationTokenRenew
It's not marked as BadApple, but it doesn't seem related to this patch at all.

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-05-25 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490843#comment-16490843
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

[https://github.com/apache/lucene-solr/pull/380] has been updated.

New patch attached.
Thanks [~dweiss] for letting us know!
If you are interested in this issue, any feedback is more than welcome!

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Updated] (LUCENE-8326) More Like This Params Refactor

2018-05-25 Thread Alessandro Benedetti (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8326:
-
Attachment: LUCENE-8326.patch

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Updated] (LUCENE-8326) More Like This Params Refactor

2018-05-25 Thread Alessandro Benedetti (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8326:
-
Component/s: core/query/scoring

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-05-25 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490575#comment-16490575
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

Hi [~dweiss], I am fixing the patch!
In the meantime, what does this mean?



/**
 * This method calls {@link #setMaxDocFreq(int)} internally (both conditions cannot
 * be used at the same time).
 */
public void setMaxDocFreqPct(int maxPercentage) {
  setMaxDocFreq(Math.toIntExact((long) maxPercentage * ir.maxDoc() / 100));
}

Cheers
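For what it's worth, the method turns a percentage of the index into an absolute document-frequency cap, maxDocFreq = maxPercentage * maxDoc / 100, and stores it through setMaxDocFreq. Because both setters end up writing the same setting, whichever is called last wins, which is presumably what "both conditions cannot be used at the same time" refers to. The `(long)` cast keeps the multiplication from overflowing an int on large indexes, and `Math.toIntExact` throws an ArithmeticException instead of silently wrapping if the result does not fit. The standalone sketch below reproduces only that arithmetic, with maxDoc passed in directly rather than read from an IndexReader; `MaxDocFreqPct` is a hypothetical name.

```java
// Hypothetical standalone version of the percentage-to-count conversion.
class MaxDocFreqPct {

  // Converts a percentage of maxDoc into an absolute maximum document frequency.
  static int toMaxDocFreq(int maxPercentage, int maxDoc) {
    // Multiply as long: maxPercentage * maxDoc can exceed Integer.MAX_VALUE.
    long product = (long) maxPercentage * maxDoc;
    // Integer division truncates; toIntExact throws rather than wrapping around.
    return Math.toIntExact(product / 100);
  }

  // The same arithmetic in plain int, shown only to illustrate the overflow.
  static int toMaxDocFreqIntOnly(int maxPercentage, int maxDoc) {
    return maxPercentage * maxDoc / 100;
  }

  public static void main(String[] args) {
    System.out.println(toMaxDocFreq(50, 100_000_000));        // 50000000
    System.out.println(toMaxDocFreqIntOnly(50, 100_000_000)); // wrapped, wrong value
  }
}
```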

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-05-24 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488772#comment-16488772
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

Hi Robert,

First of all, thanks for your feedback, much appreciated.

My initial goal was to make the original 1000-line More Like This class:
 * More Readable
 * Easier to Maintain and Extend
 * More Testable

So I started by identifying the different responsibilities in the More Like This 
class in order to separate them, so that if I just need to change the scoring 
algorithm for the interesting terms, I only touch the TermScorer, etc.
You can see the overall idea in these slides: 
[https://www.slideshare.net/AlessandroBenedetti/advanced-document-similarity-with-apache-lucene]
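A minimal sketch of that separation, assuming a pluggable scorer for interesting terms: swapping the scoring algorithm means passing a different implementation, while term selection stays untouched. `InterestingTermScorer`, `TfIdfTermScorer`, `TermStats` and `InterestingTerms` are hypothetical names, not classes from the patch, and the idf formula is the classic Lucene tf-idf one, chosen only as a plausible default.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pluggable scorer for "interesting" terms.
interface InterestingTermScorer {
  // tf: occurrences in the source document; docFreq: documents containing
  // the term; numDocs: documents in the index.
  float score(int tf, long docFreq, long numDocs);
}

// tf * idf, with idf = ln((numDocs + 1) / (docFreq + 1)) + 1.
class TfIdfTermScorer implements InterestingTermScorer {
  @Override
  public float score(int tf, long docFreq, long numDocs) {
    double idf = Math.log((numDocs + 1) / (double) (docFreq + 1)) + 1.0;
    return (float) (tf * idf);
  }
}

// Per-term statistics gathered from the source document and the index.
final class TermStats {
  final int tf;
  final long docFreq;

  TermStats(int tf, long docFreq) {
    this.tf = tf;
    this.docFreq = docFreq;
  }
}

class InterestingTerms {
  // Scores every candidate with the supplied scorer and keeps the top maxTerms.
  static List<String> top(Map<String, TermStats> stats, long numDocs,
                          InterestingTermScorer scorer, int maxTerms) {
    Map<String, Float> scores = new HashMap<>();
    for (Map.Entry<String, TermStats> e : stats.entrySet()) {
      TermStats s = e.getValue();
      scores.put(e.getKey(), scorer.score(s.tf, s.docFreq, numDocs));
    }
    List<String> terms = new ArrayList<>(scores.keySet());
    // Highest score first.
    terms.sort((a, b) -> Float.compare(scores.get(b), scores.get(a)));
    return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
  }
}
```

Passing a lambda such as `(tf, df, n) -> tf` instead of `new TfIdfTermScorer()` changes the ranking without modifying `InterestingTerms` at all.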

I tried to modify the logic and tests as little as possible at this stage.

Let me group my considerations under different topics:

*Parameters Abstraction*
I see your point about it being just additional indirection/abstraction (it is 
exactly that, with readability in mind).
My goal for this was:
"We have 600 lines of code of defaults and parameters for the MLT. How can we 
make them isolated, more readable and extendable?"
So I initially thought to just put them in a separate class, to remove that 
responsibility from the original MLT.
The focus at this stage was exclusively better readability and easier 
maintenance.
Can you elaborate on why you think this is undesirable?
I don't have any strong feelings regarding this bit, so I am open to suggestions 
with the aforementioned objective in mind.

*MLT Immutable*
I didn't consider it, but I am completely open to doing that.
In that case, could it be worth adding a MoreLikeThis factory that manages the 
parameters and creates the immutable MLT object?
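One possible shape for that factory is a builder that collects the parameters and produces an immutable object. This is only a sketch: `MltConfig` and its builder methods are hypothetical, and the defaults are intended to mirror what I understand the current MoreLikeThis defaults to be (minimum term frequency 2, minimum document frequency 5, no maximum document frequency, 25 query terms, field `contents`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable holder for More Like This parameters.
final class MltConfig {
  final int minTermFreq;
  final int minDocFreq;
  final int maxDocFreq;
  final int maxQueryTerms;
  final List<String> fieldNames;

  private MltConfig(Builder b) {
    this.minTermFreq = b.minTermFreq;
    this.minDocFreq = b.minDocFreq;
    this.maxDocFreq = b.maxDocFreq;
    this.maxQueryTerms = b.maxQueryTerms;
    // Defensive, unmodifiable copy: later builder changes cannot leak in.
    this.fieldNames = Collections.unmodifiableList(new ArrayList<>(b.fieldNames));
  }

  static Builder builder() {
    return new Builder();
  }

  // Mutable builder; only build() produces the immutable configuration.
  static final class Builder {
    private int minTermFreq = 2;
    private int minDocFreq = 5;
    private int maxDocFreq = Integer.MAX_VALUE;
    private int maxQueryTerms = 25;
    private List<String> fieldNames = Arrays.asList("contents");

    Builder minTermFreq(int v) { this.minTermFreq = v; return this; }
    Builder minDocFreq(int v) { this.minDocFreq = v; return this; }
    Builder maxDocFreq(int v) { this.maxDocFreq = v; return this; }
    Builder maxQueryTerms(int v) { this.maxQueryTerms = v; return this; }
    Builder fieldNames(String... names) { this.fieldNames = Arrays.asList(names); return this; }

    // Validates once, then freezes the values.
    MltConfig build() {
      if (maxQueryTerms <= 0) {
        throw new IllegalArgumentException("maxQueryTerms must be positive");
      }
      if (minDocFreq > maxDocFreq) {
        throw new IllegalArgumentException("minDocFreq must not exceed maxDocFreq");
      }
      return new MltConfig(this);
    }
  }
}
```

Once built, an `MltConfig` can be shared safely between threads, since none of its fields can change, and validation happens in exactly one place.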

*MoreLikeThisQuery*
It was not in the scope of this refactor, but I am absolutely happy to tackle 
that immediately as a next step; I could give it a try to see if there is 
room for improvement there.

*Solr Tests*

I completely agree; indeed, as part of the additional tests I have ready for the 
subsequent refactors, I introduced many more tests on the Lucene side than on the 
Solr side.
We can elaborate on this further at the right moment.

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Comment Edited] (LUCENE-8326) More Like This Params Refactor

2018-05-23 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487484#comment-16487484
 ] 

Alessandro Benedetti edited comment on LUCENE-8326 at 5/23/18 3:40 PM:
---

It is very annoying that the patch automatically generated from the GitHub Pull 
Request (which is green to merge with master) actually ends up being 
malformed...

I am attaching a fixed patch now, built straight from the command line.
The Pull Request is still valid from a code review perspective.


was (Author: alessandro.benedetti):
It is very annoying that the patch automatically generated from the GitHub Pull 
Request (which is green to merge with master) actually ends up being 
malformed...

I am attaching a fixed patch now, built straight from the command line.
The Pull Request is still valid from a code review perspective.

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Commented] (LUCENE-8326) More Like This Params Refactor

2018-05-23 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487484#comment-16487484
 ] 

Alessandro Benedetti commented on LUCENE-8326:
--

It is very annoying that the patch automatically generated from the GitHub Pull 
Request (which is green to merge with master) actually ends up being 
malformed...

I am attaching a fixed patch now, built straight from the command line.
The Pull Request is still valid from a code review perspective.

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Updated] (LUCENE-8326) More Like This Params Refactor

2018-05-23 Thread Alessandro Benedetti (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8326:
-
Attachment: LUCENE-8326.patch

> More Like This Params Refactor
> --
>
> Key: LUCENE-8326
> URL: https://issues.apache.org/jira/browse/LUCENE-8326
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: LUCENE-8326.patch, LUCENE-8326.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> More Like This can be refactored to improve the code readability, test 
> coverage and maintenance.
> The scope of this Jira issue is to start the More Like This refactor from the 
> More Like This Params.
> This Jira will not improve the current More Like This, but just keep the same 
> functionality with refactored code.
> Other Jira issues will follow improving the overall code readability, test 
> coverage and maintenance.




[jira] [Comment Edited] (LUCENE-8329) Size Estimator wrongly calculate Disk space in MB

2018-05-23 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487045#comment-16487045
 ] 

Alessandro Benedetti edited comment on LUCENE-8329 at 5/23/18 10:27 AM:


Hi Adrien, 
 I am talking about the one included in the dev-tools in the Apache Lucene/Solr 
project:

dev-tools/size-estimator-lucene-solr.xls

I understand it is an old tool, but someone is still using it, so I thought I 
would contribute back these simple bug fixes.

Sure, that xls could be rewritten, but it's out of scope for this simple 
Jira :)

P.S. I attached the patch, but unfortunately it is unreadable: being a binary 
file, the patch simply replaces it.
This is annoying, as I made a minimal fix to the XLS but, being on a Mac, I 
had to export it via Numbers.
So I am not sure whether I introduced any OS compatibility issues.


was (Author: alessandro.benedetti):
Hi Adrien, 
I am talking about the one included in the dev-tools in the Apache Lucene/Solr 
project:

dev-tools/size-estimator-lucene-solr.xls

I understand it is an old tool, but someone is still using it, so I thought I 
would contribute back these simple bug fixes.

Sure, that xls could be rewritten, but it's out of scope for this simple 
Jira :)

> Size Estimator wrongly calculate Disk space in MB
> -
>
> Key: LUCENE-8329
> URL: https://issues.apache.org/jira/browse/LUCENE-8329
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: -tools
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Minor
> Attachments: LUCENE-8329.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The size estimator dev tool (dev-tools/size-estimator-lucene-solr.xls) 
> currently:
>  * Wrongly calculates disk size in MB (showing GB)
>  * Doesn't clearly specify that the space needed by the optimize is FREE space
>  * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
> Size (KB)
> The scope of this issue is just to fix these small mistakes.
>  Any improvement to the tool is out of scope (separate Jira issues may 
> follow)
>  




[jira] [Updated] (LUCENE-8329) Size Estimator wrongly calculate Disk space in MB

2018-05-23 Thread Alessandro Benedetti (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8329:
-
Attachment: LUCENE-8329.patch

> Size Estimator wrongly calculate Disk space in MB
> -
>
> Key: LUCENE-8329
> URL: https://issues.apache.org/jira/browse/LUCENE-8329
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: -tools
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Minor
> Attachments: LUCENE-8329.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The size estimator dev tool (dev-tools/size-estimator-lucene-solr.xls) 
> currently:
>  * Wrongly calculates disk size in MB (showing GB)
>  * Doesn't clearly specify that the space needed by the optimize is FREE space
>  * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
> Size (KB)
> The scope of this issue is just to fix these small mistakes.
>  Any improvement to the tool is out of scope (separate Jira issues may 
> follow)
>  




[jira] [Updated] (LUCENE-8329) Size Estimator wrongly calculate Disk space in MB

2018-05-23 Thread Alessandro Benedetti (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8329:
-
Description: 
The size estimator dev tool (dev-tools/size-estimator-lucene-solr.xls) 
currently:
 * Wrongly calculates disk size in MB (showing GB)
 * Doesn't clearly specify that the space needed by the optimize is FREE space
 * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
Size (KB)

The scope of this issue is just to fix these small mistakes.
 Any improvement to the tool is out of scope (separate Jira issues may 
follow)

 

  was:
The size estimator dev tool currently:
 * Wrongly calculates disk size in MB (showing GB)
 * Doesn't clearly specify that the space needed by the optimize is FREE space
 * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
Size (KB)

The scope of this issue is just to fix these small mistakes.
 Any improvement to the tool is out of scope (separate Jira issues may 
follow)

 


> Size Estimator wrongly calculate Disk space in MB
> -
>
> Key: LUCENE-8329
> URL: https://issues.apache.org/jira/browse/LUCENE-8329
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: -tools
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Minor
>
> The size estimator dev tool (dev-tools/size-estimator-lucene-solr.xls) 
> currently:
>  * Wrongly calculates disk size in MB (showing GB)
>  * Doesn't clearly specify that the space needed by the optimize is FREE space
>  * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
> Size (KB)
> The scope of this issue is just to fix these small mistakes.
>  Any improvement to the tool is out of scope (separate Jira issues may 
> follow)
>  




[jira] [Commented] (LUCENE-8329) Size Estimator wrongly calculate Disk space in MB

2018-05-23 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487045#comment-16487045
 ] 

Alessandro Benedetti commented on LUCENE-8329:
--

Hi Adrien, 
I am talking about the one included in the dev-tools in the Apache Lucene/Solr 
project:

dev-tools/size-estimator-lucene-solr.xls

I understand it is an old tool, but someone is still using it, so I thought I 
would contribute back these simple bug fixes.

Sure, that xls could be rewritten, but it's out of scope for this simple 
Jira :)

> Size Estimator wrongly calculate Disk space in MB
> -
>
> Key: LUCENE-8329
> URL: https://issues.apache.org/jira/browse/LUCENE-8329
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: -tools
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Minor
>
> The size estimator dev tool currently:
>  * Wrongly calculates disk size in MB (showing GB)
>  * Doesn't clearly specify that the space needed by the optimize is FREE space
>  * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
> Size (KB)
> The scope of this issue is just to fix these small mistakes.
>  Any improvement to the tool is out of scope (separate Jira issues may 
> follow)
>  




[jira] [Updated] (LUCENE-8329) Size Estimator wrongly calculate Disk space in MB

2018-05-23 Thread Alessandro Benedetti (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8329:
-
Environment: (was: The size estimator dev tool currently:
 * Wrongly calculates disk size in MB (showing GB)
 * Doesn't clearly specify that the space needed by the optimize is FREE space
 * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
Size (KB)


The scope of this issue is just to fix these small mistakes.
Any improvement to the tool is out of scope (separate Jira issues may 
follow))

> Size Estimator wrongly calculate Disk space in MB
> -
>
> Key: LUCENE-8329
> URL: https://issues.apache.org/jira/browse/LUCENE-8329
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: -tools
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Minor
>





[jira] [Updated] (LUCENE-8329) Size Estimator wrongly calculate Disk space in MB

2018-05-23 Thread Alessandro Benedetti (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated LUCENE-8329:
-
Description: 
The size estimator dev tool currently:
 * Wrongly calculates disk size in MB (showing GB)
 * Doesn't clearly specify that the space needed by the optimize is FREE space
 * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
Size (KB)

The scope of this issue is just to fix these small mistakes.
 Any improvement to the tool is out of scope (separate Jira issues may 
follow)

 

> Size Estimator wrongly calculate Disk space in MB
> -
>
> Key: LUCENE-8329
> URL: https://issues.apache.org/jira/browse/LUCENE-8329
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: -tools
>Affects Versions: 7.3.1
>Reporter: Alessandro Benedetti
>Priority: Minor
>
> The size estimator dev tool currently:
>  * Wrongly calculates disk size in MB (showing GB)
>  * Doesn't clearly specify that the space needed by the optimize is FREE space
>  * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
> Size (KB)
> The scope of this issue is just to fix these small mistakes.
>  Any improvement to the tool is out of scope (separate Jira issues may 
> follow)
>  




[jira] [Created] (LUCENE-8329) Size Estimator wrongly calculate Disk space in MB

2018-05-23 Thread Alessandro Benedetti (JIRA)

Alessandro Benedetti created LUCENE-8329:


 Summary: Size Estimator wrongly calculate Disk space in MB
 Key: LUCENE-8329
 URL: https://issues.apache.org/jira/browse/LUCENE-8329
 Project: Lucene - Core
  Issue Type: Bug
  Components: -tools
Affects Versions: 7.3.1
 Environment: The size estimator dev tool currently:
 * Wrongly calculates disk size in MB (showing GB)
 * Doesn't clearly specify that the space needed by the optimize is FREE space
 * Uses Avg. Document Size (KB) for what is more correctly Avg. Document Field 
Size (KB)


The scope of this issue is just to fix these small mistakes.
Any improvement to the tool is out of scope (separate Jira issues may 
follow)
Reporter: Alessandro Benedetti
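The unit error can be illustrated with a small conversion sketch. It assumes binary units (1 MB = 1024 KB) and a deliberately simplified estimate (documents times average per-document field size), which is not the spreadsheet's actual formula; `SizeUnits` is a hypothetical name. For 1,000,000 documents at 1 KB each the raw size is 976.5625 MB, or about 0.95 GB, so a cell labelled MB that divides by 1024 once too often would show the GB figure instead.

```java
// Hypothetical unit helpers for a raw index-size estimate (binary units).
class SizeUnits {

  // 1 MB = 1024 KB.
  static double kbToMb(double kb) {
    return kb / 1024.0;
  }

  // 1 GB = 1024 MB.
  static double mbToGb(double mb) {
    return mb / 1024.0;
  }

  // Simplified estimate: number of documents times average per-document field size.
  static double estimateMb(long numDocs, double avgDocFieldSizeKb) {
    return kbToMb(numDocs * avgDocFieldSizeKb);
  }

  public static void main(String[] args) {
    double mb = estimateMb(1_000_000L, 1.0);
    System.out.println(mb + " MB = " + mbToGb(mb) + " GB");
  }
}
```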







[jira] [Commented] (SOLR-9480) Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)

2018-05-22 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484313#comment-16484313
 ] 

Alessandro Benedetti commented on SOLR-9480:


+1, very interesting!
A long time ago I opened a Jira issue that seems quite related [1] (though I never worked on it).
I remember that at the time I investigated several different relatedness metrics (some of them are available in Elasticsearch [2]).

Great work; I am curious to take a look at the implementation!

[1] https://issues.apache.org/jira/browse/SOLR-9851
[2] https://www.elastic.co/blog/significant-terms-aggregation

> Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)
> --
>
> Key: SOLR-9480
> URL: https://issues.apache.org/jira/browse/SOLR-9480
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Trey Grainger
>Assignee: Hoss Man
>Priority: Major
> Attachments: SOLR-9480.patch, SOLR-9480.patch, SOLR-9480.patch, 
> SOLR-9480.patch, SOLR-9480.patch, SOLR-9480.patch
>
>
> This issue is to track the contribution of the Semantic Knowledge Graph Solr 
> Plugin (request handler), which exposes a graph-like interface for 
> discovering and traversing significant relationships between entities within 
> an inverted index.
> This data model has been described in the following research paper: [The 
> Semantic Knowledge Graph: A compact, auto-generated model for real-time 
> traversal and ranking of any relationship within a 
> domain|https://arxiv.org/abs/1609.00464], as well as in presentations I gave 
> in October 2015 at [Lucene/Solr 
> Revolution|http://www.slideshare.net/treygrainger/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine]
>  and November 2015 at the [Bay Area Search 
> Meetup|http://www.treygrainger.com/posts/presentations/searching-on-intent-knowledge-graphs-personalization-and-contextual-disambiguation/].
> The source code for this project is currently available at 
> [https://github.com/careerbuilder/semantic-knowledge-graph], and the folks at 
> CareerBuilder (where this was built) have given me the go-ahead to now 
> contribute this back to the Apache Solr Project, as well.
> Check out the GitHub repository, research paper, or presentations for a more 
> detailed description of this contribution. Initial patch coming soon.




[jira] [Comment Edited] (SOLR-9480) Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)

2018-05-22 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484313#comment-16484313
 ] 

Alessandro Benedetti edited comment on SOLR-9480 at 5/22/18 5:21 PM:
-

+1, very interesting!
A long time ago I opened a Jira issue that seems quite related [1] (though I never worked on it).
I remember that at the time I investigated several different relatedness metrics (some of them are available in Elasticsearch [2]).

Great work; I am curious to take a look at the implementation!

[1] https://issues.apache.org/jira/browse/SOLR-9851
[2] https://www.elastic.co/blog/significant-terms-aggregation


was (Author: alessandro.benedetti):
+1, very interesting!
A long time ago I opened a Jira issue that seems quite related [1] (though I never worked on it).
I remember that at the time I investigated several different relatedness metrics (some of them are available in Elasticsearch [2]).

Great work; I am curious to take a look at the implementation!

[1] https://issues.apache.org/jira/browse/SOLR-9851
[2] https://www.elastic.co/blog/significant-terms-aggregation




