Hello everyone,
We have a set of URLs to crawl, but we only want to crawl pages whose language is on our whitelist (e.g. English, Italian, French) and reject all the others.

I don't know whether Nutch has built-in support for this; language-detector seems to be meant for a different task.

What is the best way to achieve this with Nutch? Can it be done through configuration options, or do we need to write a new plug-in (one that, for example, downloads the page, detects the content language, and proceeds if the language is accepted, otherwise skips the page)?

Thanks,
Alessio

