[jira] [Created] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-26 Thread Jahangir Anwari (JIRA)
FastVectorHighlighter - add position offset to 
FieldPhraseList.WeightedPhraseInfo.Toffs
---

 Key: LUCENE-3243
 URL: https://issues.apache.org/jira/browse/LUCENE-3243
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.2
 Environment: Lucene 3.2
Reporter: Jahangir Anwari
Priority: Minor


We needed to return position offsets along with highlighted snippets when using 
FVH for highlighting. 

Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
patch I was able to get the fragInfo for a particular phrase search. Currently 
the Toffs (term offsets) class stores only the start and end offsets.

To get the position offset, I added position offset information to the Toffs 
and FieldPhraseList classes.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-26 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3243:


Attachment: (was: LUCENE-3243.patch.diff)

> FastVectorHighlighter - add position offset to 
> FieldPhraseList.WeightedPhraseInfo.Toffs
> ---
>
> Key: LUCENE-3243
> URL: https://issues.apache.org/jira/browse/LUCENE-3243
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 3.2
> Environment: Lucene 3.2
>Reporter: Jahangir Anwari
>Priority: Minor
>  Labels: feature, lucene
>
> Needed to return position offsets along with highlighted snippets when using 
> FVH for highlighting. 
> Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
> patch I was able to get the fragInfo for a particular Phrase search. 
> Currently the Toffs(Term offsets) class only stores the start and end offset.
> To get the position offset, I added the position offset information in Toffs 
> and FieldPhraseList class.




[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-26 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3243:


Attachment: LUCENE-3243.patch.diff




[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-26 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3243:


Attachment: LUCENE-3243.patch.diff




[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-27 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3243:


Attachment: CustomSolrHighlighter.java




[jira] [Commented] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-27 Thread Jahangir Anwari (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055610#comment-13055610
 ] 

Jahangir Anwari commented on LUCENE-3243:
-

Hi Koji,

Sorry for not elaborating more on our requirements and our implementation. 
For every search result we needed the position (word offset) information of 
the search hits in the document. On the search results page, this position 
offset information was embedded in the search result links. When the user 
clicked a search result link, JavaScript on the target page used the position 
offset information to highlight the search terms.

To return the position offset information along with the highlighted snippet we 
created a CustomSolrHighlighter (attached). Depending on the type of query, the 
custom highlighter returns the position offset information as follows: 

# Non-phrase query: using FieldTermStack, we return the term position offsets 
for the terms in the query.
# Phrase query: using WeightedFragInfo.fragInfos, we return the term position 
offsets for the terms in the query.

However, the Toffs (term offsets) class currently stores only the start and end 
offsets, so we updated it to store the position information as well.

Answers to your questions:

* *What is the position offset? Isn't it just a position?*
Yes, it is just the position.

* *Why is the position offset String?*
Since for phrase queries (e.g. "divine knowledge") the position gap between 
terms == 1, WeightedPhraseInfo stores only the startOffset (i.e. 12) of the 
first term of the phrase and the endOffset (i.e. 29) of the phrase.
{code}
[startOffset, endOffset]
"divine knowledge": [(12,29)]
{code}
But as we needed the position information (i.e. 5,6) of all the terms, the 
positions of the terms of a phrase query had to be stored as a String. 
{code}
[startOffset, endOffset, positions]
"divine knowledge": [(12,29, [5,6])]
{code}
* *Why do you need setPositionOffset()?*
setPositionOffset() is used to store the positions of the consecutive terms of 
a phrase query. For every term of the phrase query it appends the argument 
position to the current position value (i.e. [5,6]). 
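
The idea behind the answers above can be sketched roughly as follows. This is only an illustration under stated assumptions: PhraseOffsets is a hypothetical stand-in for Toffs, not the patched Lucene class, and addPosition() plays the role described for setPositionOffset(). A list of integers is used here in place of the String used in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Toffs; NOT the actual Lucene class or the attached patch.
class PhraseOffsets {
    private final int startOffset; // character offset where the phrase starts
    private final int endOffset;   // character offset where the phrase ends
    // Word positions of the phrase terms (assumption: a list instead of a String).
    private final List<Integer> positions = new ArrayList<Integer>();

    PhraseOffsets(int startOffset, int endOffset) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    // Appends the word position of one more phrase term.
    void addPosition(int position) {
        positions.add(position);
    }

    int getStartOffset() { return startOffset; }
    int getEndOffset() { return endOffset; }
    List<Integer> getPositions() { return positions; }

    @Override
    public String toString() {
        return "(" + startOffset + "," + endOffset + ", " + positions + ")";
    }
}

class PhraseOffsetsDemo {
    public static void main(String[] args) {
        // The "divine knowledge" example: one offset pair, two word positions.
        PhraseOffsets toffs = new PhraseOffsets(12, 29);
        toffs.addPosition(5);
        toffs.addPosition(6);
        System.out.println("\"divine knowledge\": [" + toffs + "]");
    }
}
```

The character offsets describe the phrase as a whole, while the positions are recorded once per term, which is why a single (startOffset, endOffset) pair is not enough on its own.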

P.S. In order to be able to override the doHighlightingByFastVectorHighlighter() 
method in CustomSolrHighlighter, we had to change the access modifier of 
alternateField() and getSolrFragmentsBuilder() to protected.




[jira] [Issue Comment Edited] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-27 Thread Jahangir Anwari (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055610#comment-13055610
 ] 

Jahangir Anwari edited comment on LUCENE-3243 at 6/27/11 4:01 PM:
--

Hi Koji,

Sorry for not elaborating more on our requirements and our implementation. 
For every search result we needed the position (word offset) information of 
the search hits in the document. On the search results page, this position 
offset information was embedded in the search result links. When the user 
clicked a search result link, JavaScript on the target page used the position 
offset information to highlight the search terms.

To return the position offset information along with the highlighted snippet we 
created a CustomSolrHighlighter (attached). Depending on the type of query, the 
custom highlighter returns the position offset information as follows: 

# Non-phrase query: using FieldTermStack, we return the term position offsets 
for the terms in the query.
# Phrase query: using WeightedFragInfo.fragInfos, we return the term position 
offsets for the terms in the query.

However, the Toffs (term offsets) class currently stores only the start and end 
offsets, so we updated it to store the position information as well.

Answers to your questions:

* *What is the position offset? Isn't it just a position?*
Yes, it is just the position.

* *Why is the position offset String?*
Since for phrase queries (e.g. "divine knowledge") the position gap between 
terms == 1, WeightedPhraseInfo stores only the startOffset (i.e. 12) of the 
first term of the phrase and the endOffset (i.e. 29) of the phrase.
{code}
[startOffset, endOffset]
"divine knowledge": [(12,29)]
{code}
But as we needed the position information (i.e. 5,6) of all the terms, the 
positions of the terms of a phrase query had to be stored as a String. 
{code}
[startOffset, endOffset, positions]
"divine knowledge": [(12,29, [5,6])]
{code}
* *Why do you need setPositionOffset()?*
setPositionOffset() is used to store the positions of the consecutive terms of 
a phrase query. For every term of the phrase query it appends the argument 
position to the current position value (i.e. [5,6]). 

Example output:

{code}

   
   un of divine knowledge and 
understanding, and become the recipients of a grace that is infinite and 
   
   80,81,118,119


{code}


P.S. In order to be able to override the doHighlightingByFastVectorHighlighter() 
method in CustomSolrHighlighter, we had to change the access modifier of 
alternateField() and getSolrFragmentsBuilder() to protected.


[jira] [Commented] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2011-06-28 Thread Jahangir Anwari (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056455#comment-13056455
 ] 

Jahangir Anwari commented on LUCENE-1824:
-

Is there any chance of the patch being applied to the 3.x branch?

> FastVectorHighlighter truncates words at beginning and end of fragments
> ---
>
> Key: LUCENE-1824
> URL: https://issues.apache.org/jira/browse/LUCENE-1824
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when 
> building fragments, so that in most cases the first and last word of a 
> fragment are truncated.  This makes the highlights less legible than they 
> should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
> by expanding the start and end boundaries of the fragment to the first 
> whitespace character on either side of the fragment, or the beginning or end 
> of the source text, whichever comes first.  This significantly improves 
> legibility, at the cost of returning a slightly larger number of characters 
> than specified for the fragment size.




[jira] [Created] (LUCENE-3287) Allow ability to set maxDocCharsToAnalyze in WeightedSpanTermExtractor

2011-07-07 Thread Jahangir Anwari (JIRA)
Allow ability to set maxDocCharsToAnalyze in WeightedSpanTermExtractor
--

 Key: LUCENE-3287
 URL: https://issues.apache.org/jira/browse/LUCENE-3287
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.3
Reporter: Jahangir Anwari
Priority: Trivial


In WeightedSpanTermExtractor the default maxDocCharsToAnalyze value is 0. This 
keeps us from getting the weighted span terms in any custom code (e.g. the 
attached CustomHighlighter.java) that uses WeightedSpanTermExtractor. Currently 
the setMaxDocCharsToAnalyze() method is protected, which prevents us from 
setting maxDocCharsToAnalyze to a value greater than 0. Changing the method to 
public would allow us to set maxDocCharsToAnalyze.
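
To make the access problem concrete, here is a minimal sketch under stated assumptions: ExtractorStandIn is a hypothetical stand-in that only mimics a class with a protected setter and a default of 0, and ConfigurableExtractor and configureMaxDocChars() are illustrative names. None of this is the real WeightedSpanTermExtractor or the attached patch. In Java, a protected method can be called from subclasses and from the same package, so custom code in another package can only reach it through a subclass like the one shown.

```java
// Hypothetical stand-in for a library class with a protected setter.
// It does NOT reproduce the real WeightedSpanTermExtractor.
class ExtractorStandIn {
    private int maxDocCharsToAnalyze = 0; // default value described in the issue

    protected void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze) {
        this.maxDocCharsToAnalyze = maxDocCharsToAnalyze;
    }

    int getMaxDocCharsToAnalyze() {
        return maxDocCharsToAnalyze;
    }
}

// Possible workaround without patching: a subclass re-exposes the setter
// through a public method (hypothetical name).
class ConfigurableExtractor extends ExtractorStandIn {
    public void configureMaxDocChars(int maxDocCharsToAnalyze) {
        setMaxDocCharsToAnalyze(maxDocCharsToAnalyze);
    }
}

class ConfigurableExtractorDemo {
    public static void main(String[] args) {
        ConfigurableExtractor extractor = new ConfigurableExtractor();
        extractor.configureMaxDocChars(51200); // arbitrary illustrative limit
        System.out.println(extractor.getMaxDocCharsToAnalyze());
    }
}
```

Making the setter public, as proposed, would remove the need for such a wrapper subclass.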





[jira] [Updated] (LUCENE-3287) Allow ability to set maxDocCharsToAnalyze in WeightedSpanTermExtractor

2011-07-07 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3287:


Attachment: WeightedSpanTermExtractor.patch
CustomHighlighter.java




[jira] [Created] (LUCENE-3304) Allow WeightedSpanTermExtractor to collect positions for TermQuerys

2011-07-11 Thread Jahangir Anwari (JIRA)
Allow WeightedSpanTermExtractor to collect positions for TermQuerys
---

 Key: LUCENE-3304
 URL: https://issues.apache.org/jira/browse/LUCENE-3304
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.3
Reporter: Jahangir Anwari
Priority: Trivial


Spinoff from this thread:

http://www.gossamer-threads.com/lists/lucene/java-user/129668

Currently WeightedSpanTermExtractor only collects positions for "position 
sensitive" queries. Allowing it to store positions for TermQuery as well would 
let WeightedSpanTermExtractor be used outside the highlighter, in custom 
plugins, to get position information.
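
As a rough illustration of the kind of position information a custom plugin might want for single terms, the sketch below maps each query term to its word positions. It is an assumption-laden toy: it uses plain whitespace tokenization instead of a Lucene analyzer, and TermPositionSketch and collectPositions() are hypothetical names, not part of WeightedSpanTermExtractor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only; does not use or reproduce any Lucene API.
class TermPositionSketch {
    // Returns, for each query term, the zero-based word positions at which
    // the term occurs in the text (lowercased, whitespace-tokenized).
    static Map<String, List<Integer>> collectPositions(String text, List<String> queryTerms) {
        Map<String, List<Integer>> result = new LinkedHashMap<String, List<Integer>>();
        for (String term : queryTerms) {
            result.put(term.toLowerCase(), new ArrayList<Integer>());
        }
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int position = 0; position < tokens.length; position++) {
            List<Integer> hits = result.get(tokens[position]);
            if (hits != null) {
                hits.add(position); // record where this query term occurred
            }
        }
        return result;
    }
}
```

A real implementation would take positions from the analyzed token stream or term vectors rather than from whitespace splitting.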




[jira] [Updated] (LUCENE-3287) Allow ability to set maxDocCharsToAnalyze in WeightedSpanTermExtractor

2011-07-11 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3287:


Description: 
Spinoff from this thread:

http://www.gossamer-threads.com/lists/lucene/java-user/129668

In WeightedSpanTermExtractor the default maxDocCharsToAnalyze value is 0. This 
keeps us from getting the weighted span terms in any custom code (e.g. the 
attached CustomHighlighter.java) that uses WeightedSpanTermExtractor. Currently 
the setMaxDocCharsToAnalyze() method is protected, which prevents us from 
setting maxDocCharsToAnalyze to a value greater than 0. Changing the method to 
public would allow us to set maxDocCharsToAnalyze.


  was:
In WeightedSpanTermExtractor the default maxDocCharsToAnalyze value is 0. This 
keeps us from getting the weighted span terms in any custom code (e.g. the 
attached CustomHighlighter.java) that uses WeightedSpanTermExtractor. Currently 
the setMaxDocCharsToAnalyze() method is protected, which prevents us from 
setting maxDocCharsToAnalyze to a value greater than 0. Changing the method to 
public would allow us to set maxDocCharsToAnalyze.


