[ https://issues.apache.org/jira/browse/LUCENE-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namgyu Kim updated LUCENE-8553:
-------------------------------
    Attachment: LUCENE-8553.patch

> New KoreanDecomposeFilter for KoreanAnalyzer(Nori)
> --------------------------------------------------
>
>                 Key: LUCENE-8553
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8553
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Namgyu Kim
>            Priority: Major
>         Attachments: LUCENE-8553.patch
>
>
> This is a patch for KoreanDecomposeFilter.
> This filter can be used to decompose Hangul
> (e.g. 한글 -> ㅎㄱ or ㅎㅏㄴㄱㅡㄹ).
> Hangul input is quite distinctive.
> If you want to type apple in English,
> you type it in the order {color:#FF0000}a -> p -> p -> l -> e{color}.
> However, if you want to type "Hangul" in Hangul (한글),
> you have to type it in the order {color:#FF0000}ㅎ -> ㅏ -> ㄴ -> ㄱ -> ㅡ -> ㄹ{color},
> because of the keyboard layout.
> This means that spell checking on whole (composed) Hangul syllables can be less accurate.
>
> A Hangul syllable is built from the elements *"Choseong"* (initial consonant),
> *"Jungseong"* (medial vowel), and *"Jongseong"* (final consonant).
> These three elements are called *"Jamo"*.
> For the Korean word "된장찌개" (which means Soybean Paste Stew),
> the *"Choseong"* are {color:#FF0000}"ㄷ, ㅈ, ㅉ, ㄱ"{color},
> the *"Jungseong"* are {color:#FF0000}"ㅚ, ㅏ, ㅣ, ㅐ"{color},
> and the *"Jongseong"* are {color:#FF0000}"ㄴ, ㅇ"{color}.
> Jamo separation is useful for spell checking, as explained above.
> A "Choseong Filter" is also needed because many Koreans use
> *"Choseong Search"* (especially in mobile environments).
> Searching for "된장찌개" requires typing 10 jamo, which is quite a lot.
> For that reason, I think it would be useful to provide a filter that allows
> searching with "ㄷㅈㅉㄱ".
> Hangul also has *dual chars*, such as
> "ㄲ, ㄸ, ㅆ, ㅃ, ㅉ, ㅚ (ㅗ + ㅣ), ㅢ (ㅡ + ㅣ), ...".
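The Choseong/Jungseong/Jongseong split described above follows directly from the layout of the Unicode Hangul Syllables block (U+AC00 to U+D7A3): each precomposed syllable is encoded as 0xAC00 + (cho * 21 + jung) * 28 + jong, with jong = 0 meaning "no final consonant". The following is a minimal Java sketch of that arithmetic, not the code in LUCENE-8553.patch; the class `HangulJamo` and its methods `choseong` and `jamo` are hypothetical names. It maps each index to a compatibility jamo (U+3131 block), which is the form used in the examples in this issue, and passes non-syllable characters through unchanged.

```java
/**
 * Minimal sketch of Hangul syllable decomposition using Unicode arithmetic
 * on the precomposed block U+AC00..U+D7A3.
 * HangulJamo, choseong() and jamo() are hypothetical names, not the patch's API.
 */
public final class HangulJamo {
    private static final int SYLLABLE_BASE = 0xAC00; // first syllable
    private static final int SYLLABLE_LAST = 0xD7A3; // last syllable
    private static final int JUNGSEONG_COUNT = 21;
    private static final int JONGSEONG_COUNT = 28;   // includes "no final consonant"

    // Compatibility jamo, listed in Unicode index order.
    private static final String CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";
    private static final String JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";
    // Jongseong index 0 means "none", so this table holds indices 1..27.
    private static final String JONGSEONG = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ";

    private HangulJamo() {}

    private static boolean isSyllable(char c) {
        return c >= SYLLABLE_BASE && c <= SYLLABLE_LAST;
    }

    /** Keeps only the initial consonant of each syllable (like SINGLECHOSEONG). */
    public static String choseong(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isSyllable(c)) {
                int index = c - SYLLABLE_BASE;
                out.append(CHOSEONG.charAt(index / (JUNGSEONG_COUNT * JONGSEONG_COUNT)));
            } else {
                out.append(c); // non-syllable characters pass through unchanged
            }
        }
        return out.toString();
    }

    /** Emits choseong, jungseong and optional jongseong per syllable (like SINGLEJAMO). */
    public static String jamo(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!isSyllable(c)) {
                out.append(c);
                continue;
            }
            int index = c - SYLLABLE_BASE;
            int cho = index / (JUNGSEONG_COUNT * JONGSEONG_COUNT);
            int jung = (index % (JUNGSEONG_COUNT * JONGSEONG_COUNT)) / JONGSEONG_COUNT;
            int jong = index % JONGSEONG_COUNT;
            out.append(CHOSEONG.charAt(cho)).append(JUNGSEONG.charAt(jung));
            if (jong > 0) {
                out.append(JONGSEONG.charAt(jong - 1)); // shift past the "none" slot
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"된장", "찌개"}) {
            System.out.println(token + " -> " + choseong(token) + " / " + jamo(token));
        }
    }
}
```

Running `main` prints `된장 -> ㄷㅈ / ㄷㅚㄴㅈㅏㅇ` and `찌개 -> ㅉㄱ / ㅉㅣㄱㅐ`. Because only table lookups and integer division are involved, this kind of per-character mapping is cheap enough to run on every token in an analysis chain.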
> For these reasons,
> KoreanDecompose offers *5 options*.
> e.g. *된장찌개* => [된장], [찌개]
> *1) ORIGIN*
> [된장], [찌개]
> *2) SINGLECHOSEONG*
> [ㄷㅈ], [ㅉㄱ]
> *3) DUALCHOSEONG*
> [ㄷㅈ], [ㅈㅈㄱ]
> *4) SINGLEJAMO*
> [ㄷㅚㄴㅈㅏㅇ], [ㅉㅣㄱㅐ]
> *5) DUALJAMO*
> [ㄷㅗㅣㄴㅈㅏㅇ], [ㅈㅈㅣㄱㅐ]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
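The DUAL options in the list above go one step further than the SINGLE ones by splitting each dual char into the jamo it is typed with, so DUALCHOSEONG and DUALJAMO can be read as the SINGLECHOSEONG and SINGLEJAMO outputs with an extra expansion pass. The following is a minimal Java sketch of such a pass, assuming it is a plain per-character table lookup; the class `DualJamo`, its `expand` method, and the table itself are hypothetical. The table (double consonants, compound vowels, and final consonant clusters as typed on the standard two-set Dubeolsik keyboard) is an assumption, since the issue only lists some examples followed by "...". ㅐ and ㅔ are left intact, which is consistent with [ㅈㅈㅣㄱㅐ] in the DUALJAMO example.

```java
import java.util.Map;

/**
 * Sketch of a "dual char" expansion pass over an already-decomposed jamo string.
 * DualJamo, expand() and the DUAL table are hypothetical, not taken from the patch.
 */
public final class DualJamo {
    // Assumed mapping; ㅐ, ㅔ, ㅒ, ㅖ have their own keys and are not expanded.
    private static final Map<Character, String> DUAL = Map.ofEntries(
        // double consonants
        Map.entry('ㄲ', "ㄱㄱ"), Map.entry('ㄸ', "ㄷㄷ"), Map.entry('ㅃ', "ㅂㅂ"),
        Map.entry('ㅆ', "ㅅㅅ"), Map.entry('ㅉ', "ㅈㅈ"),
        // compound vowels
        Map.entry('ㅘ', "ㅗㅏ"), Map.entry('ㅙ', "ㅗㅐ"), Map.entry('ㅚ', "ㅗㅣ"),
        Map.entry('ㅝ', "ㅜㅓ"), Map.entry('ㅞ', "ㅜㅔ"), Map.entry('ㅟ', "ㅜㅣ"),
        Map.entry('ㅢ', "ㅡㅣ"),
        // final consonant clusters
        Map.entry('ㄳ', "ㄱㅅ"), Map.entry('ㄵ', "ㄴㅈ"), Map.entry('ㄶ', "ㄴㅎ"),
        Map.entry('ㄺ', "ㄹㄱ"), Map.entry('ㄻ', "ㄹㅁ"), Map.entry('ㄼ', "ㄹㅂ"),
        Map.entry('ㄽ', "ㄹㅅ"), Map.entry('ㄾ', "ㄹㅌ"), Map.entry('ㄿ', "ㄹㅍ"),
        Map.entry('ㅀ', "ㄹㅎ"), Map.entry('ㅄ', "ㅂㅅ"));

    private DualJamo() {}

    /** Replaces every dual jamo with its components; other characters are copied as-is. */
    public static String expand(String jamo) {
        StringBuilder out = new StringBuilder();
        for (char c : jamo.toCharArray()) {
            out.append(DUAL.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Inputs are the SINGLECHOSEONG and SINGLEJAMO forms of 찌개 and 된장.
        System.out.println(expand("ㅉㄱ"));        // prints ㅈㅈㄱ
        System.out.println(expand("ㄷㅚㄴㅈㅏㅇ")); // prints ㄷㅗㅣㄴㅈㅏㅇ
    }
}
```

Keeping the expansion as a separate lookup step means the single and dual variants can share one decomposition routine, and the table can be adjusted (for example, to also split ㅐ into ㅏ + ㅣ) without touching the syllable arithmetic.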