[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-26 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3234:
---

Attachment: LUCENE-3234.patch

Oops, wrong patch. This one is correct.

> Provide limit on phrase analysis in FastVectorHighlighter
> -
>
> Key: LUCENE-3234
> URL: https://issues.apache.org/jira/browse/LUCENE-3234
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
> Reporter: Mike Sokolov
> Assignee: Koji Sekiguchi
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, 
> LUCENE-3234.patch, LUCENE-3234.patch
>
>
> With larger documents, FVH can spend a lot of time trying to find the 
> best-scoring snippet as it examines every possible phrase formed from 
> matching terms in the document.  If one is willing to accept
> less-than-perfect scoring by limiting the number of phrases that are 
> examined, substantial speedups are possible.  This is analogous to the 
> Highlighter limit on the number of characters to analyze.
> The patch includes an artificial test case that shows a >1000x speedup.  In a 
> more normal test environment, with English documents and random queries, I am 
> seeing speedups of around 3-10x when setting phraseLimit=1, which has the 
> effect of selecting the first possible snippet in the document.  Most of our 
> sites operate in this way (just show the first snippet), so this would be a 
> big win for us.
> With phraseLimit = -1, you get the existing FVH behavior. At larger values of 
> phraseLimit, you may not get substantial speedup in the normal case, but you 
> do get the benefit of protection against blow-up in pathological cases.
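
The trade-off described above can be illustrated with a minimal sketch. This is not the code from the attached patch: the class, the Candidate type and the best() method are hypothetical names invented for illustration, and the scoring is a plain float rather than anything FVH computes. Treating a negative limit as "no limit" is an assumption that mirrors the phraseLimit = -1 convention in the description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
final class PhraseLimitSketch {

    /** Stand-in for one candidate phrase match in a document. */
    static final class Candidate {
        final int startOffset; // character offset where the phrase begins
        final float score;     // illustrative score; higher is better

        Candidate(int startOffset, float score) {
            this.startOffset = startOffset;
            this.score = score;
        }
    }

    /**
     * Returns the best-scoring candidate among the first phraseLimit
     * candidates in document order, or null if none are examined.
     * A negative phraseLimit is treated as "no limit" (assumption).
     */
    static Candidate best(List<Candidate> inDocumentOrder, int phraseLimit) {
        int limit = phraseLimit < 0 ? Integer.MAX_VALUE : phraseLimit;
        Candidate best = null;
        int examined = 0;
        for (Candidate c : inDocumentOrder) {
            if (examined >= limit) {
                break; // stop early: accept possibly less-than-perfect scoring
            }
            examined++;
            if (best == null || c.score > best.score) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate(0, 1.0f),
                new Candidate(40, 3.0f),
                new Candidate(90, 2.0f));
        // Limit of 1 picks the first candidate in the document.
        System.out.println(best(candidates, 1).startOffset);
        // No limit scans everything and picks the highest score.
        System.out.println(best(candidates, -1).startOffset);
    }
}
```

With a limit of 1 the loop does constant work per document regardless of how many matches follow, which is where the speedup for "just show the first snippet" sites would come from.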

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-26 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3234:
---

Attachment: LUCENE-3234.patch

Updated patch attached. I added CHANGES.txt entries for Lucene and Solr, used 
Integer.MAX_VALUE for the default and added @param for phraseLimit in the new 
constructor javadoc. Will commit soon.

> Provide limit on phrase analysis in FastVectorHighlighter
> -
>
> Key: LUCENE-3234
> URL: https://issues.apache.org/jira/browse/LUCENE-3234
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
> Reporter: Mike Sokolov
> Assignee: Koji Sekiguchi
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, 
> LUCENE-3234.patch




[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-25 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated LUCENE-3234:
-

Attachment: LUCENE-3234.patch

Sure - the test is fragile.  It was only meant to illustrate the use case; it is 
not really a good unit test for regression.  The last patch has it commented out.

> Provide limit on phrase analysis in FastVectorHighlighter
> -
>
> Key: LUCENE-3234
> URL: https://issues.apache.org/jira/browse/LUCENE-3234
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
> Reporter: Mike Sokolov
> Assignee: Koji Sekiguchi
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch




[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-23 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated LUCENE-3234:
-

Attachment: (was: LUCENE-3234.patch)

> Provide limit on phrase analysis in FastVectorHighlighter
> -
>
> Key: LUCENE-3234
> URL: https://issues.apache.org/jira/browse/LUCENE-3234
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
> Reporter: Mike Sokolov
> Assignee: Koji Sekiguchi
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3234.patch, LUCENE-3234.patch




[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-23 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated LUCENE-3234:
-

Attachment: LUCENE-3234.patch

> Provide limit on phrase analysis in FastVectorHighlighter
> -
>
> Key: LUCENE-3234
> URL: https://issues.apache.org/jira/browse/LUCENE-3234
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
> Reporter: Mike Sokolov
> Assignee: Koji Sekiguchi
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3234.patch, LUCENE-3234.patch




[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-23 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated LUCENE-3234:
-

Attachment: LUCENE-3234.patch

Added Solr parameter hl.phraseLimit (default=5000).
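
As a rough sketch of how a client might pass this parameter, the snippet below builds a request query string in plain Java. Only hl.phraseLimit and its default of 5000 come from the comment above; the other highlighting parameters (hl, hl.fl, hl.useFastVectorHighlighter) are the Solr 3.x request parameters as I understand them, and the class name, method name, query and field name "body" are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Solr or SolrJ.
final class HlPhraseLimitParams {

    /** Builds a URL-encoded query string that enables FVH with a phrase limit. */
    static String queryString(String q, String field, int phraseLimit) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", q);
        params.put("hl", "true");
        params.put("hl.fl", field); // field to highlight (assumed to have term vectors)
        params.put("hl.useFastVectorHighlighter", "true");
        params.put("hl.phraseLimit", Integer.toString(phraseLimit));

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the query string to append to a select handler URL.
        System.out.println(queryString("body:lucene", "body", 5000));
    }
}
```

Setting hl.phraseLimit=1 in the same way would correspond to the "first possible snippet" behavior mentioned in the issue description.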

> Provide limit on phrase analysis in FastVectorHighlighter
> -
>
> Key: LUCENE-3234
> URL: https://issues.apache.org/jira/browse/LUCENE-3234
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
> Reporter: Mike Sokolov
> Assignee: Koji Sekiguchi
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3234.patch, LUCENE-3234.patch




[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-23 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3234:
---

Affects Version/s: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
Fix Version/s: 3.4, 4.0

> Provide limit on phrase analysis in FastVectorHighlighter
> -
>
> Key: LUCENE-3234
> URL: https://issues.apache.org/jira/browse/LUCENE-3234
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
> Reporter: Mike Sokolov
> Assignee: Koji Sekiguchi
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3234.patch




[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-23 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated LUCENE-3234:
-

Attachment: LUCENE-3234.patch

> Provide limit on phrase analysis in FastVectorHighlighter
> -
>
> Key: LUCENE-3234
> URL: https://issues.apache.org/jira/browse/LUCENE-3234
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Mike Sokolov
> Attachments: LUCENE-3234.patch
