[jira] [Commented] (LUCENE-4825) PostingsHighlighter support for positional queries

2013-03-13 Thread Luca Cavanna (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600918#comment-13600918
 ] 

Luca Cavanna commented on LUCENE-4825:
--

Hey Robert,
sorry, but I don't quite understand why it would become an orange :)

I mean, among other things, the PostingsHighlighter does two great things:
1) reads offsets from the postings list, as its name says
2) summarizes the content giving nice sentences as output

I think the two above features are a great improvement and pretty much what 
everybody would like to have!
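A minimal sketch of point 1), assuming a toy single-document index rather than Lucene's real postings format: positions and character offsets are recorded once at index time, so a highlighter can later find every occurrence of a term without re-analyzing the text. `Posting` and `ToyPostings` are hypothetical names; in Lucene 4.x the PostingsHighlighter instead requires the field to be indexed with offsets in the postings (`DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One occurrence of a term: token position plus character offsets into the text. */
final class Posting {
    final int position;
    final int startOffset;
    final int endOffset;

    Posting(int position, int startOffset, int endOffset) {
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

/** Hypothetical toy inverted index for a single document (not Lucene's API). */
final class ToyPostings {
    // Crude tokenizer: runs of word characters, lowercased.
    private static final Pattern TOKEN = Pattern.compile("\\w+");

    /** "Index time": record every term's positions and offsets exactly once. */
    static Map<String, List<Posting>> index(String text) {
        Map<String, List<Posting>> postings = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(text);
        int position = 0;
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, t -> new ArrayList<>())
                    .add(new Posting(position++, m.start(), m.end()));
        }
        return postings;
    }
}
```

At "highlight time", a lookup such as `index(text).get("lucene")` yields each occurrence's offsets directly, which is the property point 1) relies on.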

I'm proposing to add support for positional queries as a third, optional 
feature. We would need to read the spans from the positional queries in order 
to highlight only the proper terms; otherwise the output is wrong from a user's 
perspective. Would this make it that much slower? I don't mean to reanalyze the 
text...
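For the simplest positional query, an exact phrase, "reading the spans" can be sketched from term positions alone, which a postings list already stores, so no re-analysis is needed. This is plain Java under that assumption, not Lucene's SpanQuery/Spans API; `PhraseSpans.matchStarts` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PhraseSpans {
    /**
     * Returns the start position of every exact occurrence of {@code phrase},
     * given term -> sorted positions. A phrase term missing from the map means no match.
     */
    static List<Integer> matchStarts(Map<String, List<Integer>> positions, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        if (phrase.isEmpty()) return starts;
        List<Integer> first = positions.get(phrase.get(0));
        if (first == null) return starts;
        // Position sets for the remaining phrase terms, for O(1) membership checks.
        List<Set<Integer>> rest = new ArrayList<>();
        for (int i = 1; i < phrase.size(); i++) {
            List<Integer> p = positions.get(phrase.get(i));
            if (p == null) return starts;
            rest.add(new HashSet<>(p));
        }
        // A match starts at s if term i+1 of the phrase occurs at position s + i + 1.
        for (int start : first) {
            boolean ok = true;
            for (int i = 0; i < rest.size() && ok; i++) {
                ok = rest.get(i).contains(start + i + 1);
            }
            if (ok) starts.add(start);
        }
        return starts;
    }
}
```

With the text `the quick fox saw a quick brown fox`, the positions are quick = [1, 5] and brown = [6], so the phrase `quick brown` yields only start position 5. A term-level highlighter would also mark the `quick` at position 1, which did not contribute to the match.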

Don't get me wrong, you must be right, but I would like to understand more. 

You're saying that instead of adding 3) to 2) and 1) we should have another 
highlighter that does 1) 2) and 3)?





 PostingsHighlighter support for positional queries
 --

 Key: LUCENE-4825
 URL: https://issues.apache.org/jira/browse/LUCENE-4825
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 4.2
Reporter: Luca Cavanna

 I've been playing around with the brand new PostingsHighlighter. I'm really 
 happy with the result in terms of quality of the snippets and performance.
 On the other hand, I noticed it doesn't support positional queries. If you 
 make a span query, for example, all the single terms will be highlighted, 
 even though they haven't contributed to the match. That reminds me of the 
 difference between the QueryTermScorer and the QueryScorer (using the 
 standard Highlighter).
 I've been trying to adapt what the QueryScorer does, especially the 
 extraction of the query terms together with their positions (what 
 WeightedSpanTermExtractor does). Next step would be to take that information 
 into account within the formatter and highlight only the terms that actually 
 contributed to the match. I'm not quite ready with a patch yet, but I 
 certainly intend to contribute one. That's why I opened the issue; in the 
 meantime I would like to hear what you guys think about it and discuss how 
 best to fix it. I think it would be a big improvement for 
 this new highlighter, which is already great!
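The formatter step described above (highlight only the terms that actually contributed to the match) might look roughly like this once the matching offsets are known. It is a sketch only: the `<b>` tags, the `int[]{start, end}` range representation and the `MatchFormatter` name are assumptions, not the PostingsHighlighter formatter API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MatchFormatter {
    /**
     * Wraps each [start, end) range in <b>...</b> and leaves all other text,
     * including other occurrences of the same terms, untouched. Ranges are
     * assumed not to overlap; they are sorted by start offset first.
     */
    static String highlight(String text, List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(r -> r[0]));
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] r : sorted) {
            sb.append(text, last, r[0])          // plain text before the match
              .append("<b>").append(text, r[0], r[1]).append("</b>");
            last = r[1];
        }
        return sb.append(text.substring(last)).toString();
    }
}
```

Given the text `the quick fox saw a quick brown fox` and the phrase-match ranges [20, 25) and [26, 31), the result is `the quick fox saw a <b>quick</b> <b>brown</b> fox`; the first `quick` is left plain.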

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4825) PostingsHighlighter support for positional queries

2013-03-13 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13601017#comment-13601017
 ] 

Robert Muir commented on LUCENE-4825:
-

I don't see this highlighter as doing that, I guess.

I see it as taking query *terms* (not matches) and intersecting them with a 
BreakIterator in increasing offset order, ranking these passages as it goes.
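Robert's description (query terms intersected with a BreakIterator in increasing offset order, ranking passages along the way) can be approximated with `java.text.BreakIterator`. Assumptions: the hits arrive sorted by start offset, as a postings-based highlighter would read them, and every hit scores 1; the real highlighter's passage scoring is more involved. `PassageRanker` is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.List;
import java.util.Locale;

final class PassageRanker {
    /**
     * Walks the term hits ({start, end}, sorted by start offset) once, groups
     * them into the sentence containing them, and returns the sentence with
     * the most hits, or null when there are no hits.
     */
    static String bestPassage(String text, List<int[]> sortedHits) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        int bestStart = -1, bestEnd = -1, bestScore = 0;
        int curStart = -1, curEnd = -1, curScore = 0;
        for (int[] hit : sortedHits) {
            if (hit[0] >= curEnd) {
                // The hit lies past the current passage: score it, then open the sentence holding the hit.
                if (curScore > bestScore) { bestScore = curScore; bestStart = curStart; bestEnd = curEnd; }
                curEnd = sentences.following(hit[0]);
                curStart = sentences.preceding(curEnd);
                curScore = 0;
            }
            curScore++;
        }
        if (curScore > bestScore) { bestScore = curScore; bestStart = curStart; bestEnd = curEnd; }
        return bestScore == 0 ? null : text.substring(bestStart, bestEnd).trim();
    }
}
```

The single pass over sorted hits is the point: each hit either falls into the current sentence or closes it, so no passage is visited twice and the text is never re-analyzed.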

{quote}
We would need to read the spans from the positional queries in order to 
highlight only the proper terms, otherwise the output is wrong from a user 
perspective.
{quote}

Then the user is wrong and should use another highlighter. This one is about 
good document summarization with respect to the query terms. It's not about 
visualizing exact matches to Lucene queries.

If the user doesn't care about 'search' but about 'matching' at the expense of 
everything else, they already have two other highlighters in Lucene that focus 
on this (making the wrong tradeoffs, in my opinion)!





[jira] [Commented] (LUCENE-4825) PostingsHighlighter support for positional queries

2013-03-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600404#comment-13600404
 ] 

Robert Muir commented on LUCENE-4825:
-

I think it supports positional queries, just in a different way. 

I don't really like the way the standard Highlighter does this myself. I would 
prefer that we avoid the slow stuff those highlighters do in this one (because 
we already have other highlighters that do that). This one instead puts more 
effort into trying to summarize the document with respect to the query terms 
(which is faster and, for some cases, a better use of CPU time).

I think a good improvement would be to let the proximity of terms within 
passages influence the scoring. It's not necessary to actually gather anything 
about the query to do this, it wouldn't be confusing, and it would still support 
all queries that support extractTerms().
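One possible shape for letting proximity influence passage scoring, assuming only the positions of extracted query terms are known (no knowledge of the query structure). The scoring formula, one point per hit plus `1/gap` for each pair of consecutive hits on different terms, is invented for illustration and is not taken from Lucene; `Hit` and `ProximityScorer` are hypothetical names.

```java
import java.util.List;

/** One query-term hit inside a passage: which term, and at which token position. */
final class Hit {
    final String term;
    final int position;

    Hit(String term, int position) {
        this.term = term;
        this.position = position;
    }
}

final class ProximityScorer {
    /**
     * Hypothetical passage score over hits sorted by position: 1 point per hit,
     * plus 1/gap for each pair of consecutive hits on different terms. Only
     * positions of extracted terms are needed, so any query works.
     */
    static double score(List<Hit> hitsByPosition) {
        double score = hitsByPosition.size();
        for (int i = 1; i < hitsByPosition.size(); i++) {
            Hit prev = hitsByPosition.get(i - 1);
            Hit cur = hitsByPosition.get(i);
            int gap = cur.position - prev.position;
            if (!cur.term.equals(prev.term) && gap > 0) {
                score += 1.0 / gap;   // closer distinct terms earn a larger bonus
            }
        }
        return score;
    }
}
```

A passage where `quick` and `brown` are adjacent scores 3.0, while one where they are five positions apart scores 2.2, so tighter passages rank higher for any query, positional or not.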

On the other hand, we can always create variants of this highlighter that do as 
you suggest, which leaves the user with more choices. But I would just prefer 
that we don't try to force PostingsHighlighter to work like the other 
highlighters, for the reasons I mentioned.





[jira] [Commented] (LUCENE-4825) PostingsHighlighter support for positional queries

2013-03-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600459#comment-13600459
 ] 

Robert Muir commented on LUCENE-4825:
-

Also, I think the most efficient way to add this (though it's all in a branch, 
I think?) would be to add an IntervalHighlighter.

This would work with all queries, I think, without the current complexity of 
rewriting things and so on.




[jira] [Commented] (LUCENE-4825) PostingsHighlighter support for positional queries

2013-03-12 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600549#comment-13600549
 ] 

Alan Woodward commented on LUCENE-4825:
---

That's sort of what I've been aiming for on LUCENE-2878, although it's 
half-finished at the moment and I keep getting pulled away from it. The idea 
is that you expose positions directly on the Scorer, and then you can have a 
HighlightingCollector that extracts matching positions in its collect() method.
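A rough sketch of the collector idea described above, with hypothetical types: `IntervalSource` stands in for a Scorer that can report the offsets of its current matches, and `HighlightingCollector` records them per document in `collect()`. Neither type mirrors the actual LUCENE-2878 branch code or Lucene's `Collector` API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Scorer that exposes the match intervals of its current doc. */
interface IntervalSource {
    /** Each interval is {startOffset, endOffset} of one matching span in the current doc. */
    List<int[]> currentIntervals();
}

/** Hypothetical collector that records match intervals as each doc is collected. */
final class HighlightingCollector {
    private final IntervalSource scorer;
    private final Map<Integer, List<int[]>> intervalsByDoc = new LinkedHashMap<>();

    HighlightingCollector(IntervalSource scorer) {
        this.scorer = scorer;
    }

    /** Called once per matching doc while the scorer is positioned on that doc. */
    void collect(int doc) {
        // Copy: the scorer may reuse its buffer when it moves to the next doc.
        intervalsByDoc.put(doc, new ArrayList<>(scorer.currentIntervals()));
    }

    Map<Integer, List<int[]>> intervals() {
        return intervalsByDoc;
    }
}
```

The copy in `collect()` is deliberate: a real scorer would likely reuse its buffers when it advances to the next document, and the highlighter only needs the recorded intervals afterwards.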




[jira] [Commented] (LUCENE-4825) PostingsHighlighter support for positional queries

2013-03-12 Thread Luca Cavanna (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600572#comment-13600572
 ] 

Luca Cavanna commented on LUCENE-4825:
--

Thanks for your input, Robert!

I see your point, even though from a user's perspective I'd rather see only the 
complete phrase highlighted when I make a phrase query, not every single term. 
I think we can currently achieve this only the way the old highlighter does, am 
I right? 
Maybe we can make this pluggable and have different implementations?
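One way to read the "pluggable" suggestion is a small strategy interface that decides which token positions to highlight, with term-level and phrase-level implementations. This is a hypothetical sketch over a plain token list (`MatchSelector` and `MatchSelectors` are invented names), not an existing PostingsHighlighter extension point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical plug-in point: picks which token positions of a document to highlight. */
interface MatchSelector {
    List<Integer> select(List<String> tokens, List<String> queryTerms);
}

final class MatchSelectors {
    /** Term-level: every occurrence of any query term, whether or not it took part in a match. */
    static final MatchSelector ALL_TERMS = (tokens, queryTerms) -> {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (queryTerms.contains(tokens.get(i))) out.add(i);
        }
        return out;
    };

    /** Phrase-level: only tokens inside a run that matches the query terms in order. */
    static final MatchSelector PHRASE = (tokens, queryTerms) -> {
        TreeSet<Integer> out = new TreeSet<>(); // keeps positions sorted and dedupes overlapping matches
        int n = queryTerms.size();
        for (int i = 0; n > 0 && i + n <= tokens.size(); i++) {
            if (tokens.subList(i, i + n).equals(queryTerms)) {
                for (int j = i; j < i + n; j++) out.add(j);
            }
        }
        return new ArrayList<>(out);
    };

    private MatchSelectors() {}
}
```

For the tokens of `the quick fox saw a quick brown fox` and the query terms `quick brown`, `ALL_TERMS` selects positions 1, 5 and 6, while `PHRASE` selects only 5 and 6.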







[jira] [Commented] (LUCENE-4825) PostingsHighlighter support for positional queries

2013-03-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600667#comment-13600667
 ] 

Robert Muir commented on LUCENE-4825:
-

I think making this pluggable for that is more like making an apple 
subclassable to be an orange. That's why I recommend just a different 
highlighter with a design to fit. This one focuses on summarizing the document 
relevant to the individual query terms, and the API reflects that.

We don't need to have a one-size-fits-all solution; we can have choices.
