If your documents are in UTF-8, you can crawl and index them directly with
Nutch (a tutorial is given on the Nutch wiki); otherwise you first need to
convert the documents to UTF-8 and then index them. After indexing is over,
first try searching with Nutch's command-line search tools, and then modify
the GUI (Nutch's JSP search page) so that searching also works from the GUI.
To verify your index, you can also use Luke, the Lucene index toolbox.
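For the conversion step, a minimal Java sketch might look like the one below. It
first checks whether a file already decodes as strict UTF-8 and, if not,
re-encodes it from a source charset that you supply. The class name `ToUtf8` and
its methods are hypothetical, not part of Nutch. The sketch also assumes that
your pages' original encoding is a charset your JVM supports (check with
`Charset.isSupported`). Pages written in a legacy Tamil font encoding may need a
dedicated converter instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: converts a document to UTF-8 before crawling/indexing.
public class ToUtf8 {

    // True if the bytes are well-formed UTF-8 (strict decode, no silent replacement).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns UTF-8 bytes: input that is already valid UTF-8 is returned as-is,
    // otherwise it is decoded with the assumed source charset and re-encoded.
    // Note: new String(...) replaces undecodable bytes with U+FFFD.
    static byte[] toUtf8(byte[] bytes, Charset sourceCharset) {
        if (isValidUtf8(bytes)) {
            return bytes;
        }
        String text = new String(bytes, sourceCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: ToUtf8 <in-file> <out-file> <source-charset>");
            System.exit(1);
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        Charset source = Charset.forName(args[2]); // must be supported by the JVM
        Files.write(out, toUtf8(Files.readAllBytes(in), source));
    }
}
```

Usage would be along the lines of `java ToUtf8 page.html page-utf8.html <charset>`,
run over each file before crawling. Checking for valid UTF-8 first keeps
already-converted files from being double-encoded. Pure ASCII files count as
valid UTF-8 and are left untouched.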



On Mon, Jan 26, 2009 at 4:01 AM, suryas <[email protected]> wrote:

>
> Hi,
> I want to index & search Tamil (an Indian language) pages using Nutch. I
> have some knowledge of Lucene and just got the "Nutch Basic Tutorial"
> working.
>
> Where do I look for indexing Tamil or any other Indian language pages?
>
> I'm looking for:
> *"step-by-step" documentation for indexing and searching foreign language
> pages, particularly Indian languages
> *some examples, samples, tutorials would be nice
>
> Or if you could just point me in the right direction, that'll be fine too.
>
> I saw some postings from "saran" & "saravana kumar" talking about this same
> thing. Guys, did you figure this out? Could you please help?
>
> Could someone help?
>
> thanks,
> Surya
> --
> View this message in context:
> http://www.nabble.com/How-to-index-and-search-Indian-languages-tp21657719p21657719.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
