Hi,
It looks like the segment path is not addressed properly:
hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate
Segments are named by a time-stamp, e.g.
.../TestCrawl/segments/20140502231126/
crawl_generate is a subdirectory of a segment, not a segment itself.
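
A quick way to sanity-check a path before passing it on, as a rough sketch only. It assumes segment names are always a 14-digit time-stamp like the one above; the function name is_segment_dir is made up for illustration and is not part of Nutch:

```shell
# Hypothetical helper (not part of Nutch): succeeds if the last path
# component looks like a segment name, i.e. a 14-digit time-stamp.
is_segment_dir() {
  name=$(basename "$1")   # basename ignores a trailing slash
  printf '%s\n' "$name" | grep -Eq '^[0-9]{14}$'
}

# Example: the first path is a segment, the second is only a subdir name.
for p in \
  "hdfs://localhost:54310/user/hduser/TestCrawl/segments/20140502231126/" \
  "hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate"
do
  if is_segment_dir "$p"; then
    echo "segment:       $p"
  else
    echo "not a segment: $p"
  fi
done
```

The check only looks at the name, so it does not confirm that the directory actually exists in HDFS.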
Can you specify the exact commands to run the crawler?
Same as for Nutch 2.2.1 in pseudo-distributed mode:
bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 10
from within the deploy dir.
However, I remember reading somewhere that the deploy execution for the 1.x
series is different from that of the 2.x series, and that some more files, besides the
seed.txt, had to be