-Original Message-
From: sanjeev [mailto:[EMAIL PROTECTED]
Sent: 2006-11-08 19:28
To: nutch-dev@lucene.apache.org
Subject: Re: implement thai lanaguage analyzer in nutch
I need a Thai Analyzer for Nutch. I want the crawler to be intelligent enough
to split Thai words correctly since Thai don't
the search term is one Unicode character.
-kuro
Sanjeev,
You have implemented Thai language support, right? What other changes have
you made to the original code? Do I need to make the same changes for, say,
Hindi and Punjabi?
If you have a bit of time to explain these things, it will be of great help
to me.
Thank you
./Arun
On 11/8/06, sanjeev
Arun,
I tried implementing Thai search for Nutch.
I followed the steps outlined in this tutorial for Chinese:
http://issues.apache.org/jira/browse/NUTCH-36?page=comments#action_62153
So sorry - I am not able to help much. How urgent is your requirement?
Mine is very urgent as I have to get
Sanjay,
I don't think you should follow the Chinese example and extend the CJK
range.
This was needed because Chinese and Japanese don't use spaces to separate
words. I believe Thai uses spaces, right? If so, you should extend the
LETTER range to include Thai characters rather than the CJK range.
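
The idea of extending the LETTER range can be sketched in plain Java. This is only an illustration, not the actual NutchAnalysis.jj grammar: the class ThaiLetterRange and its methods are hypothetical names, and the only outside fact assumed is that the Thai Unicode block spans U+0E00 to U+0E7F.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Nutch or Lucene.
class ThaiLetterRange {

    // Treats ASCII letters and the Thai Unicode block (U+0E00 to U+0E7F)
    // as LETTER characters.
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '\u0e00' && c <= '\u0e7f');
    }

    // Splits text into runs of LETTER characters; every other character
    // (spaces, punctuation, digits) acts as a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The middle word is written with Unicode escapes for Thai characters.
        String text = "nutch \u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22 search";
        System.out.println(tokenize(text));
    }
}
```

A range-based rule like this only yields word tokens when words are separated by spaces or punctuation. If Thai text is not space-delimited, as the original request in this thread suggests, the rule would emit a whole phrase as one token and a dictionary-based segmenter would be needed instead.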
Another place
ThaiWordFilter.java
Otis
-Original Message-
From: Teruhiko Kurosaka [EMAIL PROTECTED]
To: sanjeev [EMAIL PROTECTED]; nutch-dev@lucene.apache.org
Sent: Wednesday, November 8, 2006 2:16:38 PM
Subject: RE: implement thai lanaguage analyzer in nutch
Sanjay,
I don't think you should follow the Chinese
Arun,
No, I haven't come anywhere near the solution. I am a little confused myself.
From what I've learnt, one approach is to use NutchAnalysis.jj and compile it
using javacc.
Another is to download the dev version of Nutch and try to use the patches
for the language analyzer and identifier.
I failed
I think you should learn JavaCC, then understand NutchAnalysis.jj;
then the Thai problem will be resolved soon.
Just try it.
On 11/7/06, sanjeev [EMAIL PROTECTED] wrote:
Hello,
After playing around with Nutch for a few months, I was trying to implement
a Thai language analyzer for Nutch.
Oh btw - I followed the Chinese tutorial and was able to compile, and
everything was fine.
Lemme just test if it is working properly - however, I didn't make any
changes to NutchAnalysis.jj.
I need more information please.
Thanks a bunch.
Hi sanjeev and Kauu
I want to support Hindi, a language widely spoken in India.
Can you guide me on what else I need to modify? I think there is no support
for searching and indexing Hindi.
I want to work on this, but I need some information on what
to modify and where exactly