Thanks Vishal. 

I got it working last night by following exactly the steps you suggested.

I was using CentOS, Tomcat, and Nutch 0.9. After I got the basic tutorial
working I tried crawling a Tamil site. The crawl worked fine, but I
couldn't search. That's where I needed help.

Here are the small roadblocks I had to get over:
1. Tamil fonts were not installed on my CentOS machine, so I couldn't search
from the command line or the browser.
     "yum install tamil-fonts" fixed that problem.

2. After installing the fonts, I was able to search from the command line.

3. To get Tamil search working from the Nutch WAR application, I had to set
the URI encoding in the Tomcat Connector:
   http://wiki.apache.org/nutch/GettingNutchRunningWithUtf8
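
To illustrate why step 3 matters, here is a minimal Python sketch (not Nutch or Tomcat code). It assumes the browser percent-encodes the query as UTF-8 and that the Tomcat of that era decoded request URIs as ISO-8859-1 unless the Connector in server.xml carried URIEncoding="UTF-8". The helper names browser_encode and container_decode are hypothetical and only mimic the two sides of the request.

```python
from urllib.parse import quote, unquote


def browser_encode(query: str) -> str:
    """Percent-encode a query the way a UTF-8 browser would (assumption)."""
    return quote(query, encoding="utf-8")


def container_decode(raw: str, uri_encoding: str) -> str:
    """Decode a percent-encoded query with the given charset.

    Stands in for the servlet container; uri_encoding plays the role of
    the Connector's URIEncoding setting.
    """
    return unquote(raw, encoding=uri_encoding)


query = "தமிழ்"  # the word "Tamil" written in Tamil script
raw = browser_encode(query)

# Without the setting: UTF-8 bytes read as ISO-8859-1 turn into mojibake,
# so the search term no longer matches anything in the index.
without_setting = container_decode(raw, "iso-8859-1")

# With URIEncoding="UTF-8": the original query survives the round trip.
with_setting = container_decode(raw, "utf-8")

print("sent on the wire:", raw)
print("decoded as ISO-8859-1 matches query:", without_setting == query)
print("decoded as UTF-8 matches query:", with_setting == query)
```

This would also explain why the command-line search in step 2 worked while the WAR application did not: the command line never passes the query through the container's URI decoding.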

Now the basics are working.

Thanks Vishal and Saran for the help. 

-Surya


vishal vachhani wrote:
> 
> If your documents are in UTF-8, you can crawl and index them directly with
> Nutch (a tutorial is on the Nutch wiki). Otherwise, you need to convert the
> documents to UTF-8 first and then index them. Once indexing is done, first
> try searching with Nutch's command-line search API, and then modify the GUI
> (the Nutch JSP page) so that it can search as well.
> To verify your index, you can also use Luke, the Lucene index tool.
> 
> 
> 
> On Mon, Jan 26, 2009 at 4:01 AM, suryas <[email protected]> wrote:
> 
>>
>> Hi,
>> I want to index & search Tamil (an Indian language) pages using Nutch. I
>> have some knowledge of Lucene and just got the "Nutch Basic Tutorial"
>> working.
>>
>> Where do I look for indexing Tamil or any other Indian language pages?
>>
>> I'm looking for:
>> * "step-by-step" documentation for indexing and searching foreign-language
>>   pages, particularly Indian languages
>> * some examples, samples, tutorials would be nice
>>
>> Or if you could just point me in the right direction, that'll be fine
>> too.
>>
>> I saw some postings from "saran" & "saravana kumar" talking about this same
>> thing. Guys, did you figure this out? Could you please help?
>>
>> Could someone help?
>>
>> thanks,
>> Surya
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-index-and-search-Indian-languages-tp21657719p21657719.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-index-and-search-Indian-languages-tp21657719p21684449.html
Sent from the Nutch - User mailing list archive at Nabble.com.
