[jira] Issue Comment Edited: (LUCENE-1216) CharDelimiterTokenizer

Otis Gospodnetic (JIRA) Wed, 12 Mar 2008 14:40:21 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578023#action_12578023
 ]


otis edited comment on LUCENE-1216 at 3/12/08 2:37 PM:
-------------------------------------------------------------------

This looks useful.
Could you please write a simple unit test, put the ASL header at the top of all 
.java files, and change the formatting to match the rest of the Lucene source 
code (two spaces, no tabs...)?

Thanks!


      was (Author: otis):
    This looks useful.
Could you please write a simple unit test and put ASL on top of all .java files?

  
> CharDelimiterTokenizer
> ----------------------
>
>                 Key: LUCENE-1216
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1216
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Hiroaki Kawai
>         Attachments: CharDelimiterTokenizer.java
>
>
> WhitespaceTokenizer is very useful for space-separated languages, but my 
> Japanese text is not always separated by spaces. So I created an 
> alternative Tokenizer that lets us specify the delimiter. The submitted file 
> is meant as an improvement on the current WhitespaceTokenizer.
> I tried to extend CharTokenizer, but CharTokenizer has a limitation: 
> a token can't be longer than 255 chars.
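
For readers without the attachment, the idea in the description can be sketched in plain Java. This is only an illustration under assumptions: the class name DelimiterSplitSketch and the split method are hypothetical, it is not the attached CharDelimiterTokenizer.java, and it does not use the Lucene Tokenizer API. It shows splitting on a caller-supplied set of delimiter characters while buffering each token in a growable StringBuilder, so no fixed token length applies.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the attached CharDelimiterTokenizer.java
// and not part of the Lucene API.
public class DelimiterSplitSketch {

  /**
   * Splits the reader's text on any character contained in delimiters.
   * Empty tokens (adjacent delimiters) are skipped.
   */
  public static List<String> split(Reader input, String delimiters) throws IOException {
    List<String> tokens = new ArrayList<>();
    // StringBuilder grows as needed, so there is no fixed cap on token length.
    StringBuilder current = new StringBuilder();
    int c;
    while ((c = input.read()) != -1) {
      if (delimiters.indexOf(c) >= 0) {
        // A delimiter ends the current token, if there is one.
        if (current.length() > 0) {
          tokens.add(current.toString());
          current.setLength(0);
        }
      } else {
        current.append((char) c);
      }
    }
    // Flush the last token when the input does not end with a delimiter.
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  public static void main(String[] args) throws IOException {
    // Delimiters: U+3001 (ideographic comma) and U+3002 (ideographic full stop).
    String delimiters = "\u3001\u3002";
    System.out.println(split(new StringReader("tokyo\u3001osaka\u3002kyoto"), delimiters));
  }
}
```

Passing the delimiters as a String keeps the sketch short; a real Tokenizer would also need to track offsets for each token, which is left out here.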
