[ http://issues.apache.org/jira/browse/NUTCH-36?page=comments#action_12330192 ]
Kerang Lv commented on NUTCH-36:
--------------------------------

Enlightened by your last comment: the bi-gram segmentation could be done with the following rule in NutchAnalysis.jj:

    | <SIGRAM: <CJK><CJK> > { input_stream.backup(1); }

The backup(1) call pushes the second character back onto the input stream, so the next token starts at that character and consecutive tokens overlap by one character.

> Chinese in Nutch
> ----------------
>
>          Key: NUTCH-36
>          URL: http://issues.apache.org/jira/browse/NUTCH-36
>      Project: Nutch
>         Type: Improvement
>   Components: indexer, searcher
>  Environment: all
>     Reporter: Jack Tang
>     Priority: Minor
>  Attachments: 桌
>
> Nutch now supports Chinese in a very simple way: NutchAnalysis segments CJK terms
> word-by-word.
> So, if I search for the Chinese term 'FooBar' (two Chinese words: 'Foo' and 'Bar'),
> the result in the web GUI will highlight 'FooBar' and 'Foo', 'Bar', while we
> expect Nutch to highlight only 'FooBar'.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
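
For readers following the thread, the overlapping bi-gram segmentation suggested in the comment above can be illustrated outside JavaCC with a small plain-Java sketch. This is not the Nutch tokenizer: the class name BigramSketch, the method bigrams, and the use of Character.isIdeographic as a stand-in for the <CJK> character class are all assumptions made for illustration. How a trailing single character or non-CJK text is handled depends on the rest of the real grammar; here a lone CJK character is emitted as-is and non-CJK text is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {

    // Assumption: "CJK" is approximated by Character.isIdeographic.
    // The <CJK> class in NutchAnalysis.jj may cover different ranges.
    static boolean isCjk(int codePoint) {
        return Character.isIdeographic(codePoint);
    }

    // Returns overlapping two-character tokens for every run of CJK characters.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        boolean prevWasCjk = false;
        int i = 0;
        while (i < text.length()) {
            int first = text.codePointAt(i);
            int next = i + Character.charCount(first);
            if (isCjk(first)) {
                if (next < text.length() && isCjk(text.codePointAt(next))) {
                    // Emit the pair, then advance by ONE character only,
                    // which mirrors the effect of input_stream.backup(1).
                    int end = next + Character.charCount(text.codePointAt(next));
                    tokens.add(text.substring(i, end));
                } else if (!prevWasCjk) {
                    // A CJK character with no CJK neighbour is kept alone.
                    tokens.add(text.substring(i, next));
                }
                prevWasCjk = true;
            } else {
                // Non-CJK input is skipped in this sketch.
                prevWasCjk = false;
            }
            i = next;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587\u641c\u7d22" is a four-character Chinese string.
        System.out.println(bigrams("\u4e2d\u6587\u641c\u7d22"));
    }
}
```

With a four-character input the sketch yields three tokens (characters 1-2, 2-3, and 3-4), so a two-character query term matches as one unit instead of as two separate single-character terms.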