[ http://issues.apache.org/jira/browse/NUTCH-36?page=history ]
Jack Tang updated NUTCH-36:
---------------------------
    Attachment: 桌

The attachment includes:
1. a patch to NutchAnalysis.jj
2. a patch to FastCharStream.java
3. CJKTokenizer.java
4. a patch to NutchDocumentTokenizer.java

> Chinese in Nutch
> ----------------
>
>          Key: NUTCH-36
>          URL: http://issues.apache.org/jira/browse/NUTCH-36
>      Project: Nutch
>         Type: Improvement
>   Components: indexer, searcher
>  Environment: all
>     Reporter: Jack Tang
>     Priority: Minor
>  Attachments: 桌
>
> Nutch currently supports Chinese only in a very simple way: NutchAnalysis
> segments CJK terms word by word.
> So if I search for the Chinese term 'FooBar' (two Chinese words: 'Foo' and
> 'Bar'), the results in the web GUI highlight 'FooBar' and also 'Foo' and
> 'Bar' on their own, whereas we expect Nutch to highlight only 'FooBar'.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
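For readers who want to see why the highlighting described in the issue above happens, here is a minimal sketch in plain Java. It is not the attached CJKTokenizer.java, whose contents are not shown in this message. It assumes that "word by word" means one token per CJK character, and it contrasts that with overlapping two-character tokens (bigrams), a common alternative for CJK text. The class name CjkSegmentationSketch and both method names are hypothetical, and the input is assumed to be a single unbroken run of CJK characters.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; hypothetical names, not the attached CJKTokenizer.java.
final class CjkSegmentationSketch {

    // One token per code point: the "word by word" behaviour the issue describes
    // (assuming each "word" is a single CJK character).
    static List<String> unigrams(String cjkRun) {
        List<String> tokens = new ArrayList<>();
        int[] cps = cjkRun.codePoints().toArray();
        for (int i = 0; i < cps.length; i++) {
            tokens.add(new String(cps, i, 1));
        }
        return tokens;
    }

    // Overlapping two-character tokens. A lone character is emitted as is.
    static List<String> bigrams(String cjkRun) {
        List<String> tokens = new ArrayList<>();
        int[] cps = cjkRun.codePoints().toArray();
        if (cps.length == 1) {
            tokens.add(cjkRun);
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two-character stand-in for the 'FooBar' term in the issue.
        String term = "\u4e2d\u6587";
        System.out.println("unigrams: " + unigrams(term));
        System.out.println("bigrams:  " + bigrams(term));
    }
}
```

With per-character tokens, a two-character query becomes two independent tokens, so each character can match and be highlighted on its own. With bigrams, the same query is a single token, so only the full two-character term matches.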