[jira] [Updated] (LUCENE-4381) support unicode 6.2
[ https://issues.apache.org/jira/browse/LUCENE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4381:

    Attachment: LUCENE-4381.patch

Here's a cleaned-up patch. I think it's ready.

Our ICU is currently really out of date, and upgrading it allows us to delete a bunch of custom code.

support unicode 6.2
---
                Key: LUCENE-4381
                URL: https://issues.apache.org/jira/browse/LUCENE-4381
            Project: Lucene - Core
         Issue Type: Task
         Components: modules/analysis
           Reporter: Robert Muir
            Fix For: 4.7
        Attachments: LUCENE-4381.patch, LUCENE-4381.patch

ICU will release a new version in about a month. They have a version for testing (http://site.icu-project.org/download/milestone) already out with some interesting features, e.g. dictionary-based CJK segmentation.

This issue is just to test it out, integrate the new stuff, etc. We should try out the automation Steve did as well.

--
This message was sent by Atlassian JIRA (v6.1#6144)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4381) support unicode 6.2
[ https://issues.apache.org/jira/browse/LUCENE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-4381:

    Fix Version/s:     (was: 4.3)
                   4.4

support unicode 6.2
---
                Key: LUCENE-4381
                URL: https://issues.apache.org/jira/browse/LUCENE-4381
            Project: Lucene - Core
         Issue Type: Task
         Components: modules/analysis
           Reporter: Robert Muir
            Fix For: 4.4
        Attachments: LUCENE-4381.patch
[jira] [Updated] (LUCENE-4381) support unicode 6.2
[ https://issues.apache.org/jira/browse/LUCENE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4381:

    Attachment: LUCENE-4381.patch

A hacked-up patch for testing: I think it's nice to offer the CJK dictionary-based stuff as an option? I'm not sure how good results will be on average yet (maybe I can enlist Christian to help investigate).

So as a test I just added a boolean option which, if enabled, keeps all Han/Hiragana/Katakana marked as Chinese/Japanese (it uses the ISO 15924 Japanese code, but I overrode the toString to try to prevent confusion).

Seems to work OK: some trivial snippets from smartcn and kuromoji are analyzed fine, and testRandomStrings is happy :)

support unicode 6.2
---
                Key: LUCENE-4381
                URL: https://issues.apache.org/jira/browse/LUCENE-4381
            Project: Lucene - Core
         Issue Type: Task
         Components: modules/analysis
           Reporter: Robert Muir
            Fix For: 4.1, 5.0
        Attachments: LUCENE-4381.patch
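For readers following the boolean option described in the message above, here is a minimal sketch of the idea. It is not the code from LUCENE-4381.patch. It assumes the JDK's `Character.UnicodeScript` as a stand-in for ICU's script codes, and the names `CjkScriptSketch`, `label`, `runs`, `cjkAsWords` and `CJ_LABEL` are hypothetical, chosen only for illustration. When the flag is on, Han, Hiragana and Katakana share one "Chinese/Japanese" label, so mixed text stays in a single run that a dictionary-based segmenter could then process as a whole.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code from LUCENE-4381.patch.
// Uses the JDK's UnicodeScript instead of ICU's script codes.
class CjkScriptSketch {

    // Shared label, analogous to overriding toString on the Japanese code.
    static final String CJ_LABEL = "Chinese/Japanese";

    // Returns a script label for one code point. When cjkAsWords is true,
    // Han, Hiragana and Katakana all map to CJ_LABEL.
    static String label(int codePoint, boolean cjkAsWords) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        if (cjkAsWords && (script == UnicodeScript.HAN
                || script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA)) {
            return CJ_LABEL;
        }
        return script.name();
    }

    // Splits text into maximal runs of code points that share a label.
    // Each result entry has the form "LABEL:text".
    static List<String> runs(String text, boolean cjkAsWords) {
        List<String> out = new ArrayList<>();
        int start = 0;
        String current = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String l = label(cp, cjkAsWords);
            if (current != null && !l.equals(current)) {
                // Label changed: close the previous run.
                out.add(current + ":" + text.substring(start, i));
                start = i;
            }
            current = l;
            i += Character.charCount(cp);
        }
        if (current != null) {
            out.add(current + ":" + text.substring(start));
        }
        return out;
    }
}
```

With the flag off, a string that mixes Han, Hiragana and Katakana is split at every script change; with it on, the same string comes back as one run. A real tokenizer would also have to handle characters in the Common and Inherited scripts (punctuation, the prolonged sound mark, combining marks), which this sketch treats as separate runs.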