Tomoko Uchida created LUCENE-8752:
-------------------------------------

             Summary: Apply a patch to kuromoji dictionary to properly handle 
Japanese new era '令和' (REIWA)
                 Key: LUCENE-8752
                 URL: https://issues.apache.org/jira/browse/LUCENE-8752
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Tomoko Uchida


As of May 1st, 2019, the Japanese era name ('元号', Gengo) will change to '令和' 
(Reiwa). See this article for more details:

[https://www.bbc.com/news/world-asia-47769566]

Currently {{JapaneseTokenizer}} splits '令和' into '令' and '和'. It should be 
tokenized as one word so that searches over Japanese text containing era names 
behave as users expect. Because the default Kuromoji dictionary (mecab-ipadic) 
has not been maintained since 2007, a one-line patch to the source CSV file is 
needed for this era change.
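
To illustrate why a single missing dictionary entry causes the split, here is a 
toy sketch. It is NOT Kuromoji's algorithm (Kuromoji runs a cost-based Viterbi 
search over a lattice); it uses a simple greedy longest-match segmenter instead. 
The class name {{ToySegmenter}} and the tiny word sets are made up for this 
example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ToySegmenter {
    // Greedy longest-match segmentation (a stand-in, not Kuromoji's Viterbi search).
    // If no dictionary entry matches at a position, a single character is emitted.
    static List<String> segment(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = pos + 1; // fallback: one character
            // Try the longest candidate first and shrink until an entry matches.
            for (int candidate = text.length(); candidate > pos; candidate--) {
                if (dictionary.contains(text.substring(pos, candidate))) {
                    end = candidate;
                    break;
                }
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical mini dictionaries: without and with the new era entry.
        Set<String> before = Set.of("令", "和", "元年");
        Set<String> after = Set.of("令", "和", "元年", "令和");
        System.out.println(segment("令和元年", before)); // [令, 和, 元年]
        System.out.println(segment("令和元年", after));  // [令和, 元年]
    }
}
```

With the entry present, the era name comes out as one token, which is the 
effect the one-line CSV patch is meant to achieve in the real dictionary.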

Era names are used in many official and formal documents in Japan, so it would 
be desirable for search systems to handle this properly without requiring a 
user dictionary or phrase queries. :)

FYI, the JDK Date-Time API will support the new era in upcoming updates.

[https://blogs.oracle.com/java-platform-group/a-new-japanese-era-for-java]
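
For anyone who wants to check what their own JDK reports, a minimal sketch 
using {{java.time.chrono.JapaneseDate}} follows. The class name {{EraCheck}} and 
the helper {{describe}} are made up for this example. The result for dates on 
or after 2019-05-01 depends on whether the installed JDK already includes the 
era update, so no expected output is given for that line.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;
import java.time.temporal.ChronoField;

public class EraCheck {
    // Returns "<era name> <year of era>" for an ISO calendar date,
    // as reported by the running JDK's Japanese imperial calendar.
    static String describe(int year, int month, int day) {
        JapaneseDate date = JapaneseDate.from(LocalDate.of(year, month, day));
        JapaneseEra era = date.getEra();
        int yearOfEra = date.get(ChronoField.YEAR_OF_ERA);
        return era + " " + yearOfEra;
    }

    public static void main(String[] args) {
        // Last day before the era change: Heisei 31.
        System.out.println(describe(2019, 4, 30));
        // First day of the new era: the result depends on the JDK version.
        System.out.println(describe(2019, 5, 1));
    }
}
```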

The patch is available here:

[https://github.com/apache/lucene-solr/pull/632]

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
