Hi Everyone,
I have a question about the help text for the Nutch crawl script. If I run
the crawl script without the -i parameter, will the crawl run to completion
without updating Solr? I need to crawl pages without updating Solr, and
then use solrindex to push the crawled content into Solr later, when I'm
ready.

Usage: crawl [-i|--index] [-D "key=value"] [-s <Seed Dir>] <Crawl Dir> <Num Rounds>
    -i|--index      Indexes crawl results into a configured indexer
    -D...           A Java property to pass to Nutch calls
    -s <Seed Dir>   Directory in which to look for a seeds file
    <Crawl Dir>     Directory where the crawl/link/segments dirs are saved
    <Num Rounds>    The number of rounds to run this crawl for
Example: bin/crawl -i -s urls/ TestCrawl/ 2
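
To make the plan concrete, here is roughly the two-step sequence I have in mind, written as a dry run that only prints the commands. The first command is the example above with -i dropped. The second is my assumption about the later indexing step: I'm guessing at `bin/nutch index` with the crawldb, linkdb and segments directories under the crawl dir, and the exact command and arguments may differ by Nutch version (older releases use solrindex with a Solr URL). The helper names crawl_cmd and index_cmd are just for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the two commands instead of executing them.
# Directory names are taken from the usage example above.
CRAWL_DIR="TestCrawl"
SEED_DIR="urls"
ROUNDS=2

# Step 1: crawl only. No -i, so no indexing is requested.
crawl_cmd() {
  echo "bin/crawl -s $SEED_DIR/ $CRAWL_DIR/ $ROUNDS"
}

# Step 2 (ASSUMPTION): index the finished crawl later.
# The subcommand and its arguments are a guess; check the usage
# output of bin/nutch for the installed version before relying on it.
index_cmd() {
  echo "bin/nutch index $CRAWL_DIR/crawldb -linkdb $CRAWL_DIR/linkdb -dir $CRAWL_DIR/segments"
}

crawl_cmd
index_cmd
```

If that is right, nothing should reach the indexer until the second command is run by hand.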