[jira] [Commented] (LUCENE-4545) Better error reporting StemmerOverrideFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396237#comment-16396237 ]

Shawn Heisey commented on LUCENE-4545:
--------------------------------------

Found this issue because a user was having a problem. I uploaded a new patch against master (8.0).

[~rcmuir], I didn't use LineNumberReader as you suggested. I did find an example of that elsewhere in the code, but using it would have required a more substantial rewrite. I'm willing to do that if you really think that's the way it should be done, but I was able to get line numbers more directly than the first patch did. The code has changed since the first patch was made.

I changed the regex in the split call to match any sequence of one or more whitespace characters, so it should handle just about anything a user is likely to throw at it. I also found a few other places in the code that split on a single tab character; some of them should perhaps be reviewed and switched to the "any whitespace" regex.

> Better error reporting StemmerOverrideFilterFactory
> ---------------------------------------------------
>
> Key: LUCENE-4545
> URL: https://issues.apache.org/jira/browse/LUCENE-4545
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 4.0
> Reporter: Markus Jelsma
> Priority: Trivial
> Fix For: 4.9, 6.0
>
> Attachments: LUCENE-4545-trunk-1.patch, LUCENE-4545.patch
>
>
> If the dictionary contains an error such as a space instead of a tab
> somewhere in the dictionary, it is hard to find the error in a long
> dictionary. This patch includes the file and line number in the exception,
> helping to debug it quickly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
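The approach described in Shawn's comment (counting lines while reading, and splitting each entry on any run of whitespace) can be sketched in plain Java. This is a minimal, hypothetical sketch, not the code in the attached patch: the class `WhitespaceStemDictParser`, its `parse` method, the rule that every entry must have exactly two fields, and the skipping of blank lines are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not Lucene API): parses "word<whitespace>stem" lines and
 * reports the file name and line number of any malformed entry.
 */
public class WhitespaceStemDictParser {

  public static Map<String, String> parse(Reader source, String fileName) throws IOException {
    Map<String, String> dict = new HashMap<>();
    // The caller owns (and closes) the underlying Reader.
    BufferedReader reader = new BufferedReader(source);
    int lineNumber = 0;
    String line;
    while ((line = reader.readLine()) != null) {
      lineNumber++; // count every physical line, including blank ones
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // assumption: blank lines are ignored
      }
      // Any run of one or more whitespace characters separates the fields,
      // so "word stem", "word\tstem" and "word \t stem" are all accepted.
      String[] fields = trimmed.split("\\s+");
      if (fields.length != 2) { // assumption: exactly two fields per entry
        throw new IllegalArgumentException(
            "Invalid entry in " + fileName + " at line " + lineNumber
                + ": expected 2 fields but found " + fields.length + ": \"" + line + "\"");
      }
      dict.put(fields[0], fields[1]);
    }
    return dict;
  }

  public static void main(String[] args) throws IOException {
    String text = "running\trun\n\nflies  fly\n";
    System.out.println(parse(new StringReader(text), "stemdict.txt"));
  }
}
```

Because `\\s+` accepts spaces as well as tabs, the space-instead-of-tab mistake from the original report is no longer an error in this sketch; a line with three or more tokens still is, and the exception message names the file and the offending line.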
[jira] [Commented] (LUCENE-4545) Better error reporting StemmerOverrideFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492366#comment-13492366 ]

Robert Muir commented on LUCENE-4545:
-------------------------------------

I'm for the idea, but not with the logic confined to this specific factory. Instead of tracking our own line numbers, we should use LineNumberReader and so on.

WordListLoader.getStemDict should be changed to take a generic map (not a CharArrayMap), so that this factory can use it. In fact, since nothing currently uses getStemDict at all, we can change it however we want.

Also, the logic should not use split(s, 2); I think it should just use split(s) instead? That way we also detect lines that unexpectedly contain multiple tabs.

Better error reporting StemmerOverrideFilterFactory
---------------------------------------------------

Key: LUCENE-4545
URL: https://issues.apache.org/jira/browse/LUCENE-4545
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 4.0
Reporter: Markus Jelsma
Priority: Trivial
Fix For: 4.1, 5.0
Attachments: LUCENE-4545-trunk-1.patch

If the dictionary contains an error such as a space instead of a tab somewhere in the dictionary, it is hard to find the error in a long dictionary. This patch includes the file and line number in the exception, helping to debug it quickly.
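Robert's suggestion can be sketched as follows. This is a hypothetical example, not Lucene's actual `WordlistLoader` code: `java.io.LineNumberReader` supplies the line number, the entries go into any caller-supplied `java.util.Map<String, String>`, and each line is split on every tab rather than with a limit of 2. The class `StemDictSketch`, the method `readStemDict`, the `source` label used in error messages, and the decision to skip empty lines are assumptions.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical loader illustrating the review suggestion; not Lucene's WordlistLoader. */
public class StemDictSketch {

  /**
   * Reads tab-separated "word<TAB>stem" lines into any caller-supplied Map,
   * using LineNumberReader to report where a malformed line occurs.
   */
  public static <M extends Map<String, String>> M readStemDict(Reader in, String source, M result)
      throws IOException {
    LineNumberReader reader = new LineNumberReader(in);
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.isEmpty()) {
        continue; // assumption: empty lines are skipped
      }
      // Split on every tab (no limit of 2), so a line with extra tabs yields more
      // than two fields and is rejected instead of folding the rest into the stem.
      // A limit of -1 also keeps trailing empty fields, so a trailing tab is caught.
      String[] fields = line.split("\t", -1);
      if (fields.length != 2) {
        // For a newline-terminated line, getLineNumber() is the number of the line just read.
        throw new IllegalArgumentException(source + ", line " + reader.getLineNumber()
            + ": expected \"word<TAB>stem\" but got \"" + line + "\"");
      }
      result.put(fields[0], fields[1]);
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> dict = readStemDict(
        new StringReader("running\trun\nflies\tfly\n"), "stemdict.txt",
        new LinkedHashMap<String, String>());
    System.out.println(dict); // prints {running=run, flies=fly}
  }
}
```

The design point is the split: with a limit of 2, `"a\tb\tc".split("\t", 2)` returns `["a", "b\tc"]` and the extra tab goes unnoticed, whereas splitting on every tab returns three fields, so the line is rejected together with its line number. A line that uses a space instead of a tab yields a single field and is rejected the same way.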