[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
[ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544830#comment-13544830 ]

Commit Tag Bot commented on LUCENE-4656:
----------------------------------------

[branch_4x commit] Uwe Schindler
http://svn.apache.org/viewvc?view=revision&revision=1428675

Merged revision(s) 1428671 from lucene/dev/trunk:
LUCENE-4656: Fix regression in IndexWriter to work with empty TokenStreams that have no TermToBytesRefAttribute (commonly provided by CharTermAttribute), e.g., oal.analysis.miscellaneous.EmptyTokenStream. Remove EmptyTokenizer from test-framework.

Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
--------------------------------------------------------------------------------

Key: LUCENE-4656
URL: https://issues.apache.org/jira/browse/LUCENE-4656
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.0
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Trivial
Fix For: 4.1, 5.0
Attachments: LUCENE-4656_bttc.patch, LUCENE-4656-IW-bug.patch, LUCENE-4656-IW-fix.patch, LUCENE-4656-IW-fix.patch, LUCENE-4656.patch, LUCENE-4656.patch, LUCENE-4656.patch, LUCENE-4656.patch, LUCENE-4656.patch, LUCENE-4656.patch

TestRandomChains can fail because EmptyTokenizer doesn't have a CharTermAttribute and doesn't compute the end offset (if the offset attribute was added by a filter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
[ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544860#comment-13544860 ]

Commit Tag Bot commented on LUCENE-4656:
----------------------------------------

[trunk commit] Uwe Schindler
http://svn.apache.org/viewvc?view=revision&revision=1428671

LUCENE-4656: Fix regression in IndexWriter to work with empty TokenStreams that have no TermToBytesRefAttribute (commonly provided by CharTermAttribute), e.g., oal.analysis.miscellaneous.EmptyTokenStream. Remove EmptyTokenizer from test-framework.
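The commit message above describes a consumer that has to cope with an empty stream carrying no term attribute. As a rough illustration of that failure mode, here is a toy model in plain Java. It is not the actual IndexWriter patch and uses no Lucene classes: `ToyStream`, `TermAttr`, `consumeEagerly`, and `consumeLazily` are hypothetical names, and the sketch assumes that looking up an unregistered attribute throws `IllegalArgumentException`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical stand-ins, not Lucene classes.
class EmptyStreamSketch {

  /** Stand-in for an attribute that exposes the current term. */
  static final class TermAttr {
    String term;
  }

  /** Stand-in for a token stream whose attributes are looked up by class. */
  static final class ToyStream {
    private final Map<Class<?>, Object> attributes = new HashMap<>();
    private final Iterator<String> tokens;

    ToyStream(List<String> tokens, boolean withTermAttr) {
      this.tokens = tokens.iterator();
      if (withTermAttr) {
        attributes.put(TermAttr.class, new TermAttr());
      }
    }

    boolean hasAttribute(Class<?> type) {
      return attributes.containsKey(type);
    }

    /** Assumed behavior: throws if the attribute was never registered. */
    <T> T getAttribute(Class<T> type) {
      Object attr = attributes.get(type);
      if (attr == null) {
        throw new IllegalArgumentException("missing attribute: " + type.getSimpleName());
      }
      return type.cast(attr);
    }

    /** Advances to the next token; returns false once the stream is exhausted. */
    boolean incrementToken() {
      if (!tokens.hasNext()) {
        return false;
      }
      String next = tokens.next();
      if (hasAttribute(TermAttr.class)) {
        getAttribute(TermAttr.class).term = next;
      }
      return true;
    }
  }

  /** Eager consumer: looks up the term attribute before knowing whether any token exists. */
  static List<String> consumeEagerly(ToyStream stream) {
    TermAttr termAttr = stream.getAttribute(TermAttr.class); // fails on attribute-less streams
    List<String> terms = new ArrayList<>();
    while (stream.incrementToken()) {
      terms.add(termAttr.term);
    }
    return terms;
  }

  /** Lazy consumer: requires the term attribute only after a token was actually produced. */
  static List<String> consumeLazily(ToyStream stream) {
    List<String> terms = new ArrayList<>();
    if (!stream.incrementToken()) {
      return terms; // empty stream: nothing to index, no attribute needed
    }
    TermAttr termAttr = stream.getAttribute(TermAttr.class);
    do {
      terms.add(termAttr.term);
    } while (stream.incrementToken());
    return terms;
  }
}
```

In this model the eager consumer rejects an empty, attribute-less stream, while the lazy consumer returns an empty term list for it and still reads every term from a normal stream. Whether the real fix defers the lookup in this way is not stated in the thread.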
[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
[ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543125#comment-13543125 ]

Robert Muir commented on LUCENE-4656:
-------------------------------------

I would also say that we don't need EmptyTokenizer in test-framework. It's only there because 2 places use it, and both in a bogus way (in my opinion):
1. core/TestDocument
2. queryparsers

We should first fix TestDocument; its test does not care if the tokenstream is empty or anything:

{noformat}
Index: src/test/org/apache/lucene/document/TestDocument.java
===================================================================
--- src/test/org/apache/lucene/document/TestDocument.java (revision 1428441)
+++ src/test/org/apache/lucene/document/TestDocument.java (working copy)
@@ -20,7 +20,7 @@
 import java.io.StringReader;
 import java.util.List;
 
-import org.apache.lucene.analysis.EmptyTokenizer;
+import org.apache.lucene.analysis.CannedTokenStream;
 import org.apache.lucene.analysis.MockAnalyzer;
 import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.index.IndexReader;
@@ -318,7 +318,7 @@
   // LUCENE-3616
   public void testInvalidFields() {
     try {
-      new Field("foo", new EmptyTokenizer(new StringReader("")), StringField.TYPE_STORED);
+      new Field("foo", new CannedTokenStream(), StringField.TYPE_STORED);
       fail("did not hit expected exc");
     } catch (IllegalArgumentException iae) {
       // expected
{noformat}

The queryparser test looks outdated, like it's some test about when an Analyzer returns null? Maybe the test can just be removed, but if we apply this patch, we could move EmptyTokenizer from test-framework/src/java to queryparser/src/test at least as an improvement, since it is kinda funky.
[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
[ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543127#comment-13543127 ]

Uwe Schindler commented on LUCENE-4656:
---------------------------------------

I would really like to remove that horrible piece of sh* :-)
[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
[ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543133#comment-13543133 ]

Adrien Grand commented on LUCENE-4656:
--------------------------------------

Uwe, I just ran all Lucene tests with your patch and they passed, so +1.

+1 to removing EmptyTokenizer too.
[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
[ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543134#comment-13543134 ]

Uwe Schindler commented on LUCENE-4656:
---------------------------------------

In trunk it is not even used in core! Only in 4.x's TestDocument!
[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
[ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543144#comment-13543144 ]

Uwe Schindler commented on LUCENE-4656:
---------------------------------------

Sorry was wrong patch.
[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase
[ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543161#comment-13543161 ]

Uwe Schindler commented on LUCENE-4656:
---------------------------------------

I also cleaned up some imports in affected files. I will commit this later and backport.