[jira] [Commented] (LUCENE-4545) Better error reporting StemmerOverrideFilterFactory

2018-03-12 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396237#comment-16396237
 ] 

Shawn Heisey commented on LUCENE-4545:
--

Found this issue because of a user having a problem.  Uploaded a new patch 
against master (8.0).

[~rcmuir], I didn't use LineNumberReader as you suggested.  I did find an 
example of that elsewhere in the code, but using that would have required a 
more substantial rewrite.  I'm willing to do that if you really think that's 
the way it should be done, but I was able to get line numbers more directly 
than what the first patch did.  The code has changed since the first patch was 
made.

I changed the regex in the split usage to any sequence of one or more 
whitespace characters, so it should be able to handle just about anything a 
user is likely to throw at it.
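
A minimal sketch of the "any whitespace" split described above. This is not 
the code from the patch; the class and method names are hypothetical, and 
trimming the line before splitting is an assumption made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: illustrates splitting a
// dictionary line on any run of whitespace instead of a single tab.
final class WhitespaceSplitSketch {
  // One or more whitespace characters (tabs, spaces, or a mix of both).
  private static final Pattern WHITESPACE = Pattern.compile("\\s+");

  /** Splits a dictionary line into its whitespace-separated fields. */
  static String[] splitFields(String line) {
    // Trimming first (an assumption) avoids an empty leading field
    // when the line starts with whitespace.
    return WHITESPACE.split(line.trim());
  }
}
```

With this approach, "running<TAB>run" and "running   run" both yield the two 
fields "running" and "run".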

I also found a few other places in the code that call split with a single tab 
character.  Some of them should perhaps be reviewed and changed to the "any 
whitespace" regex.

> Better error reporting StemmerOverrideFilterFactory
> ---
>
> Key: LUCENE-4545
> URL: https://issues.apache.org/jira/browse/LUCENE-4545
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.0
>Reporter: Markus Jelsma
>Priority: Trivial
> Fix For: 4.9, 6.0
>
> Attachments: LUCENE-4545-trunk-1.patch, LUCENE-4545.patch
>
>
> If the dictionary contains an error such as a space instead of a tab 
> somewhere in the dictionary it is hard to find the error in a long 
> dictionary. This patch includes the file and line number in the exception, 
> helping to debug it quickly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4545) Better error reporting StemmerOverrideFilterFactory

2012-11-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492366#comment-13492366
 ] 

Robert Muir commented on LUCENE-4545:
-

I'm for the idea, but not for having the logic confined to this specific factory.

Instead of tracking our own line numbers, we should use LineNumberReader and so 
on.
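
A rough sketch of what the LineNumberReader approach could look like. The 
class name, method signature, exception type, and message wording are all 
hypothetical and not taken from WordListLoader or from either patch; skipping 
blank lines is likewise an assumption.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical loader, not Lucene's WordListLoader: shows how
// LineNumberReader can supply the line number for an error message.
final class StemDictSketch {
  /** Reads "word<TAB>stem" lines into a map, failing on malformed lines. */
  static Map<String, String> load(Reader in, String resourceName) throws IOException {
    Map<String, String> result = new LinkedHashMap<>();
    try (LineNumberReader reader = new LineNumberReader(in)) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.trim().isEmpty()) {
          continue; // assumption: blank lines are ignored
        }
        String[] fields = line.split("\t");
        if (fields.length != 2) {
          // After readLine(), getLineNumber() is the 1-based number
          // of the line that was just read.
          throw new IllegalArgumentException(
              "Malformed entry in " + resourceName
                  + " at line " + reader.getLineNumber()
                  + ": expected 2 tab-separated fields, found " + fields.length);
        }
        result.put(fields[0], fields[1]);
      }
    }
    return result;
  }
}
```

The reader does the counting, so the loader does not need to maintain its own 
line counter.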

WordListLoader.getStemDict should be changed to take a generic map (not a 
CharArrayMap), so that it can be used by this method.
In fact, since nothing at all is using this method, we can do whatever we want 
with it.

Also the logic should not use split(s, 2): I think instead it should just use 
split(s)? This way we detect the situation
where there are multiple tabs in a line unexpectedly, too.
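
To illustrate the difference, here is a small sketch that assumes the two 
calls correspond to java.lang.String.split with and without a limit argument. 
The class and method names are hypothetical.

```java
// Hypothetical demo class: compares String.split with a limit of 2
// against String.split without a limit on a tab-separated line.
final class SplitLimitSketch {
  /** Field count when the line is split on tabs with a limit of 2. */
  static int fieldsWithLimit(String line) {
    // A limit of 2 never yields more than two fields, so any extra
    // tab stays hidden inside the second field.
    return line.split("\t", 2).length;
  }

  /** Field count when the line is split on tabs without a limit. */
  static int fieldsWithoutLimit(String line) {
    // Without a limit, every tab produces a new field, so an
    // unexpected extra tab shows up as a field count above 2.
    return line.split("\t").length;
  }
}
```

For the input "a<TAB>b<TAB>c", the limited split returns 2 fields and the 
unlimited split returns 3. Note that String.split without a limit drops 
trailing empty strings, so a stray tab at the very end of a line would still 
go unnoticed unless a negative limit is used.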

> Better error reporting StemmerOverrideFilterFactory
> ---
>
> Key: LUCENE-4545
> URL: https://issues.apache.org/jira/browse/LUCENE-4545
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 4.0
> Reporter: Markus Jelsma
> Priority: Trivial
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4545-trunk-1.patch
>
>
> If the dictionary contains an error such as a space instead of a tab 
> somewhere in the dictionary it is hard to find the error in a long 
> dictionary. This patch includes the file and line number in the exception, 
> helping to debug it quickly.
