Hi,
I am a new Nutch user, and I want to customize the crawl process. My aim is to detect and crawl only web sites written in my language. I want to crawl only the sites that contain special characters such as "ğ" or "ç". I also want to restrict the crawl to URLs that end with specific extensions such as "com.uk" and skip all others. How can I apply these restrictions? What should I change in the inject, generate, fetch, and parse steps? Thanks.
