Hi,

 

I am a new Nutch user, and I need to customize the crawl process. My aim
is to detect and crawl web sites written in my language. I want to crawl only
the sites that contain special characters like "ğ" or "ç".

I also want to limit the crawl to URLs that end with specific extensions like
"com.uk" and skip all others. How can I apply these restrictions? Where should I
make changes in the inject, generate, fetch, and parse steps?

 

Thanks.
