Can you choose a custom regex-urlfilter.txt, to save editing it each
time you wish to index a different site?
I am surprised you can't enter a URL when generating a fetch list, i.e.
/bin/nutch generate --only someurl.com --job 192833-292837
Then you fetch job 192833-292837, parse job 192833-292837, and finally
update the db for job 192833-292837.
Now that would be great.
Thanks, will be doing it your way for now. :)
Shane.
On 03/04/14 13:24, remi tassing wrote:
Hi Shane,
You could use the same scripts as before but just modify the
regex-urlfilter.txt to restrict the crawling scope.
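For example, here is a minimal sketch of such a restriction, assuming the
new site is example.com (a placeholder). SITE_REGEX and OUT are illustrative
shell variable names, not Nutch settings, and the output file name is
arbitrary. Rules in regex-urlfilter.txt are checked top to bottom: a leading
"+" accepts, a leading "-" rejects, and the first matching rule wins.

```shell
#!/bin/sh
# Sketch: write a URL filter file that accepts a single site only.
# SITE_REGEX and OUT are hypothetical names used for illustration.
SITE_REGEX='^https?://([a-z0-9-]+\.)*example\.com/'
OUT="${OUT:-./regex-urlfilter.example.txt}"

# The variable is expanded as-is inside the here-document, so the
# backslashes in the pattern reach the file unchanged.
cat > "$OUT" <<EOF
# Accept only URLs on example.com and its subdomains
+${SITE_REGEX}
# Reject everything else
-.
EOF

echo "wrote $OUT"
```

The generated file would then replace (or be merged into) the
regex-urlfilter.txt in the conf directory your Nutch install reads, before
running the usual scripts. Keeping one such file per site and copying the
right one into place is one way to avoid hand-editing it each time.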
BR, Remi
On Thu, Apr 3, 2014 at 10:52 AM, Shane Wood<[email protected]> wrote:
I have indexed several sites successfully.
Now I wish to index a new site without updating any of the sites already
indexed.
I use Nutch 2.2.1, MySQL 5.3 and Solr 4.7.0. How would you recommend I go
about indexing a new site only?
If someone can give examples of command lines, that would be amazingly
helpful.
Cheers
Shane.