Re: Customize Crawling..

2008-01-16 Thread Manoj Bist
I came across a languageidentifier plugin at PluginCentral while trying to figure out something else. Maybe this could be a starting point for you: http://wiki.apache.org/nutch/PluginCentral

RE: Customize Crawling..

2008-01-16 Thread Volkan Ebil
The URL filter will solve the URL limitation problem, thanks. Does anyone know how I can add a conditional check to the crawl process that allows only sites containing special characters like "ç", "ü" and "ğ"? Should I study the parsing algorithm?
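One possible shape for the check described above is to score the extracted text of each fetched page by the share of its letters that are language-specific. This is a minimal sketch, assuming the target language is Turkish (suggested by "ğ", but not stated in the thread) and that you already have the page's plain text, for example inside a custom parse or indexing plugin (the Nutch plugin interfaces are not shown here). The class `TurkishCharCheck`, its methods and the 0.05 threshold are hypothetical, and the letter set adds ı, ö and ş to the three characters from the message.

```java
/**
 * Hypothetical helper, not part of Nutch: scores page text by the share of
 * letters that belong to the Turkish-specific set (both cases).
 */
public class TurkishCharCheck {

    // Unicode escapes keep the source ASCII-only, so it compiles under any file encoding.
    // Lowercase: c-cedilla, g-breve, dotless i, o-umlaut, s-cedilla, u-umlaut;
    // then the same six letters in uppercase (capital I with dot for dotless i).
    private static final String SPECIAL =
            "\u00e7\u011f\u0131\u00f6\u015f\u00fc\u00c7\u011e\u0130\u00d6\u015e\u00dc";

    /** Fraction of letters in text that are Turkish-specific; 0.0 if there are no letters. */
    public static double specialRatio(String text) {
        int letters = 0;
        int special = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isLetter(c)) {
                continue; // ignore digits, spaces and punctuation
            }
            letters++;
            if (SPECIAL.indexOf(c) >= 0) {
                special++;
            }
        }
        return letters == 0 ? 0.0 : (double) special / letters;
    }

    /** True if at least minRatio of the letters are Turkish-specific. */
    public static boolean looksTurkish(String text, double minRatio) {
        return specialRatio(text) >= minRatio;
    }

    public static void main(String[] args) {
        // Turkish sample text written with escapes for the special letters.
        String turkish = "T\u00fcrk\u00e7e sayfa i\u00e7eri\u011fi";
        String english = "Plain English page content";
        System.out.println(looksTurkish(turkish, 0.05)); // true
        System.out.println(looksTurkish(english, 0.05)); // false
    }
}
```

A ratio is used rather than a single match because ç and ü also occur in French and German pages. Note that any check on page content can only run after the page has been fetched, so it can influence what happens to the page afterwards (for instance whether it is kept), not whether it is downloaded.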

RE: Customize Crawling..

2008-01-15 Thread kishore.krishna2
Hi, I don't know about the special-character part, but you can limit the URLs using conf/urfilter.txt. Thanks, Kishore
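The URL-limiting side works through an ordered list of rules like those in Nutch's regex URL filter file. The hypothetical class below is a standalone sketch of that rule format, not Nutch's implementation, assuming these semantics: each rule is `+` (accept) or `-` (reject) followed by a Java regular expression, a pattern only needs to match part of the URL, the first matching rule decides, and a URL that matches no rule is rejected. The example rules, which keep only hosts under `.tr`, are illustrative and not taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Nutch's code) of an ordered "+/-" regex URL rule list.
 */
public class FirstMatchUrlFilter {

    /** One parsed rule: accept or reject URLs in which the pattern is found. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines; blank lines and lines starting with '#' are skipped. */
    public FirstMatchUrlFilter(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                // This sketch treats a missing sign as an error.
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns true if the URL should be crawled. */
    public boolean accept(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) { // a partial match is enough
                return rule.accept;                  // the first matching rule decides
            }
        }
        return false; // no rule matched: rejected
    }

    public static void main(String[] args) {
        FirstMatchUrlFilter filter = new FirstMatchUrlFilter(List.of(
                "# reject some binary file types first",
                "-\\.(gif|jpg|png|zip|exe)$",
                "# then accept only hosts under the .tr top-level domain",
                "+^https?://([a-z0-9-]+\\.)+tr(:[0-9]+)?/"));
        System.out.println(filter.accept("http://www.example.com.tr/index.html")); // true
        System.out.println(filter.accept("http://www.example.com.tr/logo.png"));   // false
        System.out.println(filter.accept("http://www.example.com/index.html"));    // false
    }
}
```

Because the first match wins, rejection rules for things you never want (such as image files) belong before the broad accept rule.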

Customize Crawling..

2008-01-15 Thread Volkan Ebil
Hi, I am a new Nutch user. I want to customize the crawl process: my aim is to detect and crawl websites written in my language. I want to crawl only the sites that contain special characters like "ğ" or "ç", and I also want to limit the crawl to URLs that end only with specific extensions like "com