The Nutch language identifier plugin currently doesn't handle
CJKV pages.  Does anybody here have any experience with automatically
detecting the language of such pages?

I know there are specific legacy encodings that give away what language
a page is in, but for Asian-language pages that use Unicode or one of
its encodings (UTF-8, UTF-16), I'm out of luck.
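One heuristic that might work is sketched below. It is not part of Nutch, and the class and method names (CjkvGuesser, guess, guessFromScripts) are hypothetical. It assumes the page has already been decoded to a Java String and that the declared charset name (if any) is available. The approach: trust a legacy charset when it implies one language, and otherwise count Unicode scripts. Hangul suggests Korean, any kana suggests Japanese, Han alone suggests Chinese, and Vietnamese-specific Latin letters suggest Vietnamese. It needs Java 8+ (Character.UnicodeScript, Map.merge).

```java
import java.text.Normalizer;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch, not Nutch's API: guess a CJKV language code from the
 * declared charset, falling back to the Unicode scripts in the decoded text.
 */
public class CjkvGuesser {

    // Legacy charsets that largely imply a single language (upper-cased IANA names).
    private static final Map<String, String> CHARSET_LANG = new HashMap<>();
    static {
        CHARSET_LANG.put("SHIFT_JIS", "ja");
        CHARSET_LANG.put("EUC-JP", "ja");
        CHARSET_LANG.put("ISO-2022-JP", "ja");
        CHARSET_LANG.put("EUC-KR", "ko");
        CHARSET_LANG.put("ISO-2022-KR", "ko");
        CHARSET_LANG.put("GB2312", "zh");
        CHARSET_LANG.put("GBK", "zh");
        CHARSET_LANG.put("GB18030", "zh");
        CHARSET_LANG.put("BIG5", "zh");
        CHARSET_LANG.put("WINDOWS-1258", "vi");
    }

    /** Returns "ja", "ko", "zh", "vi", or "unknown". charset may be null. */
    public static String guess(String charset, String text) {
        if (charset != null) {
            String lang = CHARSET_LANG.get(charset.toUpperCase(Locale.ROOT));
            if (lang != null) {
                return lang; // the encoding itself gives the language away
            }
        }
        return guessFromScripts(text);
    }

    /** Unicode fallback: count code points per script and compare. */
    static String guessFromScripts(String text) {
        // NFC so decomposed Vietnamese letters become single code points.
        String s = Normalizer.normalize(text, Normalizer.Form.NFC);
        Map<Character.UnicodeScript, Integer> counts =
            new EnumMap<>(Character.UnicodeScript.class);
        int vietnamese = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum);
            if (isVietnameseLetter(cp)) {
                vietnamese++;
            }
        }
        int han = counts.getOrDefault(Character.UnicodeScript.HAN, 0);
        int hangul = counts.getOrDefault(Character.UnicodeScript.HANGUL, 0);
        int kana = counts.getOrDefault(Character.UnicodeScript.HIRAGANA, 0)
                 + counts.getOrDefault(Character.UnicodeScript.KATAKANA, 0);

        if (hangul > 0 && hangul >= kana) return "ko"; // Korean may mix in Hanja
        if (kana > 0) return "ja";                      // Japanese mixes kanji and kana
        if (han > 0) return "zh";                       // Han only: assume Chinese
        if (vietnamese > 0) return "vi";
        return "unknown";
    }

    // Letters used almost only in Vietnamese: O/U with horn, plus the
    // Latin Extended Additional range U+1EA0..U+1EF9 (tone-marked vowels).
    private static boolean isVietnameseLetter(int cp) {
        return cp == 0x01A0 || cp == 0x01A1 || cp == 0x01AF || cp == 0x01B0
            || (cp >= 0x1EA0 && cp <= 0x1EF9);
    }
}
```

This is crude. A kanji-only Japanese headline would come out as "zh", and it makes no attempt to separate simplified from traditional Chinese. A real plugin would probably want n-gram statistics on top of the script counts.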

Andy


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
