[ http://issues.apache.org/jira/browse/NUTCH-224?page=comments#action_12455065 ]

Sean Dean commented on NUTCH-224:
---------------------------------

Just a note on my comment above: it seems JIRA can't (or won't) display 
Korean text after I submit the comment.

If you're trying to test this and can't write Korean, my best suggestion is to 
visit http://babelfish.altavista.com/, type in "news" or whatever word you're 
using, and translate it to Korean. If you're using Windows, you might need to 
have East Asian language support installed, which can be found under Regional 
and Language Options in the Control Panel.
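
If a translation site or a Korean input method isn't handy, another option is 
to build the test string from Unicode escapes. The sketch below is only an 
illustration and is not part of Nutch: the class and method names are made up, 
the range is the one quoted in the issue description (U+AC00 ... U+D7AF), and 
the two code points U+B274 U+C2A4 should correspond to the Korean 
transliteration of "news".

```java
public class HangulCheck {
    // Hangul Syllables range as quoted in the issue description.
    static final char HANGUL_FIRST = (char) 0xAC00;
    static final char HANGUL_LAST = (char) 0xD7AF;

    // True if the character falls inside the quoted Hangul Syllables range.
    static boolean isHangulSyllable(char c) {
        return c >= HANGUL_FIRST && c <= HANGUL_LAST;
    }

    // True if the string is non-empty and every character is in the range.
    static boolean allHangulSyllables(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isHangulSyllable(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two Hangul syllables built from code points, no Korean keyboard needed.
        String koreanNews = new String(new char[] {(char) 0xB274, (char) 0xC2A4});
        System.out.println(allHangulSyllables(koreanNews)); // prints "true"
        System.out.println(allHangulSyllables("news"));     // prints "false"
    }
}
```

A string built this way can be pasted into a test document or query to check 
whether the analyzer treats these characters as part of a token.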

> Nutch doesn't handle Korean text at all
> ---------------------------------------
>
>                 Key: NUTCH-224
>                 URL: http://issues.apache.org/jira/browse/NUTCH-224
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 0.7.1
>            Reporter: KuroSaka TeruHiko
>
> I was browsing NutchAnalysis.jj and found that
> Hangul Syllables (U+AC00 ... U+D7AF; U+xxxx means
> a Unicode character of the hex value xxxx) are not
> part of the LETTER or CJK class.  It seems to me that
> Nutch cannot handle Korean documents at all.
> I posted the above message to the nutch-user ML and Cheolgoo Kang
> replied:
> ------------------------------------------------------------------------------------
> There was similar issue with Lucene's StandardTokenizer.jj.
> http://issues.apache.org/jira/browse/LUCENE-444
> and
> http://issues.apache.org/jira/browse/LUCENE-461
> I have almost no experience with Nutch, but you can handle it like
> those issues above.
> ------------------------------------------------------------------------------------
> Both fixes should probably be ported back to NutchAnalysis.jj.


        
