[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-3234:
-----------------------------------

    Attachment: LUCENE-3234.patch

Oops, wrong patch. This one is correct.

> Provide limit on phrase analysis in FastVectorHighlighter
> ---------------------------------------------------------
>
>                 Key: LUCENE-3234
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3234
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
>            Reporter: Mike Sokolov
>            Assignee: Koji Sekiguchi
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch,
> LUCENE-3234.patch, LUCENE-3234.patch
>
>
> With larger documents, FVH can spend a lot of time trying to find the
> best-scoring snippet, because it examines every possible phrase formed from
> matching terms in the document. If one is willing to accept
> less-than-perfect scoring by limiting the number of phrases that are
> examined, substantial speedups are possible. This is analogous to the
> Highlighter limit on the number of characters to analyze.
> The patch includes an artificial test case that shows a speedup of more than
> 1000x. In a more normal test environment, with English documents and random
> queries, I am seeing speedups of around 3-10x when setting phraseLimit=1,
> which has the effect of selecting the first possible snippet in the document.
> Most of our sites operate in this way (just show the first snippet), so this
> would be a big win for us.
> With phraseLimit = -1, you get the existing FVH behavior. At larger values of
> phraseLimit, you may not get substantial speedup in the normal case, but you
> do get the benefit of protection against blow-up in pathological cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
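The trade-off described in the issue (examine fewer candidate phrases, accept a possibly lower-scoring snippet) can be illustrated with a small standalone sketch. This is not the code from the attached patch and does not use Lucene's API. The function name `best_snippet`, the fixed-size window, and the scoring rule (count of distinct query terms in the window) are all assumptions made for illustration. Only the meaning of the limit follows the issue text: -1 means "no limit", and 1 means "take the first possible snippet".

```python
from typing import List, Optional, Tuple


def best_snippet(tokens: List[str], query_terms: List[str],
                 window: int = 5,
                 phrase_limit: int = -1) -> Optional[Tuple[int, int]]:
    """Return (start, score) of the best window found, or None if nothing matches.

    Hypothetical model: a candidate starts at every token that matches a
    query term; its score is the number of distinct query terms inside
    tokens[start:start + window]. At most phrase_limit candidates are
    examined; phrase_limit = -1 means no limit.
    """
    wanted = set(query_terms)
    best = None
    examined = 0
    for start, token in enumerate(tokens):
        if token not in wanted:
            continue  # candidates begin only at matching terms
        if phrase_limit >= 0 and examined >= phrase_limit:
            break  # cap reached: keep the best candidate seen so far
        examined += 1
        score = len(wanted.intersection(tokens[start:start + window]))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

With an unlimited search every candidate is scored, so the cost grows with the number of matches in the document. With `phrase_limit=1` the loop stops after the first match, which is cheaper but may return a lower-scoring snippet.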
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-3234:
-----------------------------------

    Attachment: LUCENE-3234.patch

Updated patch attached. I added CHANGES.txt entries for Lucene and Solr, used Integer.MAX_VALUE for the default and added @param for phraseLimit in the new constructor javadoc. Will commit soon.
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Sokolov updated LUCENE-3234:
---------------------------------

    Attachment: LUCENE-3234.patch

Sure - the test is fragile. It was just meant to illustrate the use case; it is not really a good unit test for regression. The last patch has it commented out.
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Sokolov updated LUCENE-3234:
---------------------------------

    Attachment:     (was: LUCENE-3234.patch)
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Sokolov updated LUCENE-3234:
---------------------------------

    Attachment: LUCENE-3234.patch
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Sokolov updated LUCENE-3234:
---------------------------------

    Attachment: LUCENE-3234.patch

Added solr parameter hl.phraseLimit (default=5000)
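As a rough illustration of how the hl.phraseLimit parameter mentioned above might be passed on a Solr request, the sketch below builds a query URL. The helper name `highlight_query_url`, the base URL, and the field name `body` are hypothetical. The sketch also assumes FastVectorHighlighter is selected with `hl.useFastVectorHighlighter=true` and that the field is indexed with the term vectors FVH needs.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url: str, query: str, field: str,
                        phrase_limit: int = 5000) -> str:
    """Build a Solr /select URL that asks for FVH highlighting with a phrase limit.

    The default of 5000 mirrors the default stated in the comment above;
    base_url and field are placeholders supplied by the caller.
    """
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.useFastVectorHighlighter": "true",
        "hl.phraseLimit": str(phrase_limit),
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and field; phraseLimit=1 selects the first possible snippet.
url = highlight_query_url("http://localhost:8983/solr", "quick fox", "body", 1)
```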
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-3234:
-----------------------------------

    Affects Version/s: 3.3
                       2.9.4
                       3.0.3
                       3.1
                       3.2
        Fix Version/s: 4.0
                       3.4
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Sokolov updated LUCENE-3234:
---------------------------------

    Attachment: LUCENE-3234.patch