[jira] [Updated] (LUCENE-4381) support unicode 6.2

2013-12-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4381:


Attachment: LUCENE-4381.patch

Here's a cleaned-up patch. I think it's ready.

Our ICU is currently really out of date, and upgrading it allows us to delete a 
bunch of custom code.

 support unicode 6.2
 ---

 Key: LUCENE-4381
 URL: https://issues.apache.org/jira/browse/LUCENE-4381
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Fix For: 4.7

 Attachments: LUCENE-4381.patch, LUCENE-4381.patch


 ICU will release a new version in about a month.
 They have a version for testing 
 (http://site.icu-project.org/download/milestone) already out with some 
 interesting features, e.g. dictionary-based CJK segmentation.
 This issue is just to test it out/integrate the new stuff/etc. We should try 
 out the automation Steve did as well.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4381) support unicode 6.2

2013-05-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4381:
--

Fix Version/s: (was: 4.3)
   4.4

 support unicode 6.2
 ---

 Key: LUCENE-4381
 URL: https://issues.apache.org/jira/browse/LUCENE-4381
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Fix For: 4.4

 Attachments: LUCENE-4381.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENE-4381) support unicode 6.2

2012-09-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4381:


Attachment: LUCENE-4381.patch

A hacked-up patch for testing:

I think it would be nice to offer the dictionary-based CJK segmentation as an 
option. I'm not sure how good the results will be on average yet (maybe I can 
enlist Christian to help investigate).

So as a test I just added a boolean option which, if enabled, keeps all 
Han/Hiragana/Katakana marked as Chinese/Japanese (it uses the ISO 15924 Japanese 
code, but I overrode toString to try to prevent confusion).
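
To illustrate the idea of the boolean option, here is a minimal sketch that is not taken from the patch. It assumes the JDK's `Character.UnicodeScript` in place of ICU's script data, and the names `ScriptLabels`, `label`, `runs`, `mergeCj` and `CJ_LABEL` are hypothetical. It only shows how Han, Hiragana and Katakana can be collapsed into one "Chinese/Japanese" label so that a mixed run stays together; it performs no dictionary-based segmentation.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

class ScriptLabels {
    // Hypothetical merged label, standing in for the overridden toString.
    static final String CJ_LABEL = "Chinese/Japanese";

    // Returns the script label for one code point. When mergeCj is true,
    // Han, Hiragana and Katakana all map to the single merged label.
    static String label(int codePoint, boolean mergeCj) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        boolean cj = script == UnicodeScript.HAN
                || script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA;
        return (mergeCj && cj) ? CJ_LABEL : script.name();
    }

    // Splits text into maximal runs of code points sharing the same label.
    // Each entry has the form "LABEL:substring".
    static List<String> runs(String text, boolean mergeCj) {
        List<String> out = new ArrayList<>();
        int start = 0;
        String current = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String next = label(cp, mergeCj);
            if (current != null && !next.equals(current)) {
                out.add(current + ":" + text.substring(start, i));
                start = i;
            }
            current = next;
            i += Character.charCount(cp);
        }
        if (current != null) {
            out.add(current + ":" + text.substring(start));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Han characters, one Hiragana, three Katakana.
        String text = "\u65E5\u672C\u8A9E\u306E\u30C6\u30B9\u30C8";
        System.out.println(runs(text, false)); // one run per script
        System.out.println(runs(text, true));  // a single merged run
    }
}
```

With the option off, the sample text splits at every script change; with it on, the whole string stays in one run that a dictionary-based segmenter could then process as a unit.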

Seems to work ok: some trivial snippets from smartcn and kuromoji are analyzed 
fine, and testRandomStrings is happy :)

 support unicode 6.2
 ---

 Key: LUCENE-4381
 URL: https://issues.apache.org/jira/browse/LUCENE-4381
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4381.patch


