Can you choose a custom regex-urlfilter.txt to save editing it each time you wish to index a different site?

I am surprised you can't enter a URL when generating a fetch list, i.e.:

/bin/nutch generate --only  someurl.com --job 192833-292837

Then you fetch job 192833-292837, parse job 192833-292837, and finally update the database with job 192833-292837.

Now that would be great.

Thanks, I will be doing it your way for now. :)

Shane.


On 03/04/14 13:24, remi tassing wrote:
Hi Shane,

You could use the same scripts as before but just modify the
regex-urlfilter.txt to restrict the crawling scope.
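As a rough sketch of what that restriction could look like (assuming "someurl.com", the placeholder site from the example command above, stands in for the new site), the rules in regex-urlfilter.txt might be changed so that only that site is accepted. Rules are checked top to bottom and the first matching pattern decides whether a URL is kept, so the default catch-all "+." line at the end of the file has to be removed or replaced:

```
# Hypothetical example: restrict the crawl to one site (replace someurl.com).
# Rules are tried top to bottom; the first matching pattern decides:
# '+' accepts the URL, '-' rejects it.

# skip file:, ftp: and mailto: URLs (as in the default file)
-^(file|ftp|mailto):

# accept URLs on someurl.com and its subdomains
+^https?://([a-z0-9-]+\.)*someurl\.com/

# reject everything else (replaces the default final "+." line)
-.
```

To avoid editing the file for every site, the urlfilter.regex.file property (defined in nutch-default.xml with the default value regex-urlfilter.txt) should be able to point at a different filter file on the classpath, for example one file per site, by overriding it in nutch-site.xml. This is an untested suggestion, not something confirmed in this thread.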

BR, Remi


On Thu, Apr 3, 2014 at 10:52 AM, Shane Wood<[email protected]>  wrote:

I have indexed several sites successfully.
Now I wish to index a new site without updating any of the sites already
indexed.

I use Nutch 2.21, MySQL 5.3 and Solr 4.7.0. How would you recommend I go
about indexing only the new site?
If someone can give examples of command lines, that would be amazingly
helpful.

Cheers
Shane.

