RE: implement thai lanaguage analyzer in nutch

2006-11-14 Thread sanjeev
-Original Message- From: sanjeev [mailto:[EMAIL PROTECTED] Sent: 2006-11-08 19:28 To: nutch-dev@lucene.apache.org Subject: Re: implement thai lanaguage analyzer in nutch I need a Thai Analyzer for Nutch. I want the crawler to be intelligent enough to split Thai words correctly since thai don't

RE: implement thai lanaguage analyzer in nutch

2006-11-10 Thread Teruhiko Kurosaka
the search term is one Unicode character. -kuro -Original Message- From: sanjeev [mailto:[EMAIL PROTECTED] Sent: 2006-11-08 19:28 To: nutch-dev@lucene.apache.org Subject: Re: implement thai lanaguage analyzer in nutch I need a Thai Analyzer for Nutch. I want the crawler

Re: implement thai lanaguage analyzer in nutch

2006-11-08 Thread Arun Kaundal
Sanjeev, You have implemented Thai language support, right? What other changes have you made to the original code? Do I need to make the same changes for, say, Hindi and Punjabi? If you have a bit of time to explain things, it will be of great help to me. Thank you ./Arun On 11/8/06, sanjeev

Re: implement thai lanaguage analyzer in nutch

2006-11-08 Thread sanjeev
Arun, I tried implementing Thai search for Nutch. I followed the steps outlined in this tutorial for Chinese: http://issues.apache.org/jira/browse/NUTCH-36?page=comments#action_62153 So sorry - I am not able to help much. How urgent is your requirement? Mine is very urgent as I have to get

RE: implement thai lanaguage analyzer in nutch

2006-11-08 Thread Teruhiko Kurosaka
Sanjay, I don't think you should follow the Chinese example and extend the CJK range. This was needed because Chinese and Japanese don't use spaces to separate words. I believe Thai uses spaces, right? If so, you should extend the LETTER range to include Thai characters rather than CJK. Another place
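For readers following the suggestion to extend the LETTER range: the Thai block in Unicode is U+0E00 to U+0E7F. The sketch below is only an illustration of that range check in plain Java; the class and method names (`ThaiLetterCheck`, `isThai`, `isAllThai`) are hypothetical and are not part of Nutch or NutchAnalysis.jj, whose actual token definitions are not reproduced here.

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical helper, not Nutch code: checks whether text falls in the
// Unicode Thai block (U+0E00..U+0E7F), the range a LETTER token would
// need to cover for Thai input.
final class ThaiLetterCheck {

    // True when the code point lies inside the Unicode Thai block.
    static boolean isThai(int codePoint) {
        return UnicodeBlock.of(codePoint) == UnicodeBlock.THAI;
    }

    // True when the string is non-empty and every code point is in the Thai block.
    static boolean isAllThai(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(ThaiLetterCheck::isThai);
    }

    public static void main(String[] args) {
        System.out.println(isAllThai("ไทย"));   // prints true
        System.out.println(isAllThai("nutch")); // prints false
    }
}
```

A range check like this only decides which characters count as letters. It does not find word boundaries inside a run of Thai text.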

Re: implement thai lanaguage analyzer in nutch

2006-11-08 Thread ogjunk-nutch
ThaiWordFilter.java Otis - Original Message From: Teruhiko Kurosaka [EMAIL PROTECTED] To: sanjeev [EMAIL PROTECTED]; nutch-dev@lucene.apache.org Sent: Wednesday, November 8, 2006 2:16:38 PM Subject: RE: implement thai lanaguage analyzer in nutch Sanjay, I don't think you should follow the Chinese
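As background on the ThaiWordFilter.java pointer: to the best of my knowledge, Lucene's ThaiWordFilter relies on the JDK's `java.text.BreakIterator` with a Thai locale to find word boundaries. The sketch below shows that general technique on its own, outside Lucene; `ThaiSegmenter` and `segment` are hypothetical names, and the quality of the split depends on the Thai break data shipped with the JDK in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not Lucene or Nutch code: splits text at the word
// boundaries reported by the JDK's BreakIterator for the Thai locale.
final class ThaiSegmenter {

    static List<String> segment(String text) {
        BreakIterator breaker = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        breaker.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = breaker.first();
        // Walk consecutive boundaries and cut the text between each pair.
        for (int end = breaker.next(); end != BreakIterator.DONE; start = end, end = breaker.next()) {
            tokens.add(text.substring(start, end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Output depends on the JDK's Thai break data, so none is asserted here.
        System.out.println(segment("ภาษาไทย"));
    }
}
```

The returned pieces always concatenate back to the input, so whitespace and punctuation come through as tokens too; a real filter would drop those.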

Re: implement thai lanaguage analyzer in nutch

2006-11-08 Thread sanjeev
: Wednesday, November 8, 2006 2:16:38 PM Subject: RE: implement thai lanaguage analyzer in nutch Sanjay, I don't think you should follow the Chinese example and extend the CJK range. This was needed because Chinese and Japanese don't use space to separate words. I believe Thai uses spaces

Re: implement thai lanaguage analyzer in nutch

2006-11-08 Thread sanjeev
PM Subject: RE: implement thai lanaguage analyzer in nutch Sanjay, I don't think you should follow the Chinese example and extend the CJK range. This was needed because Chinese and Japanese don't use space to separate words. I believe Thai uses spaces, right? If so, you should extend

Re: implement thai lanaguage analyzer in nutch

2006-11-08 Thread sanjeev
Arun, No I haven't come anywhere near the solution. I am myself confused a little. From what I've learnt - one approach is to use NutchAnalysis.jj and compile using javacc. Another is to download dev version of nutch and try to use the patches for the language analyzer and identifier. I failed

Re: implement thai lanaguage analyzer in nutch

2006-11-07 Thread kauu
I think you should learn JavaCC first, then understand NutchAnalysis.jj; after that the Thai problem should be resolved soon. Just try it. On 11/7/06, sanjeev [EMAIL PROTECTED] wrote: Hello, After playing around with Nutch for a few months I was trying to implement the Thai language analyzer for Nutch.

Re: implement thai lanaguage analyzer in nutch

2006-11-07 Thread sanjeev
Oh btw - I followed the Chinese tutorial and was able to compile, and everything was fine. Let me just test whether it is working properly. However, I didn't make any changes to NutchAnalysis.jj. I need more information, please. Thanks a bunch.

Re: implement thai lanaguage analyzer in nutch

2006-11-07 Thread Arun Kaundal
Hi sanjeev and Kauu, I want to support Hindi, a language widely spoken in India. Can you guide me on what else I need to modify? I think there is no support for searching and indexing Hindi. I want to work on this, but I need some information on what to modify and where exactly