Hi Musshorn,

To keep the crawl running until it has discovered all URLs, pass -1 as the last argument of the crawl command.

-deleteGone: the way I use this is to crawl first, then run indexing on the crawled data as a separate step:

nutch solrindex SOLR_URL CRAWLDB_PATH CRAWLDB_DIR/segments/* -filter -normalize -deleteGone
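In case it helps, here is a rough sketch of how those two steps could be wired into a bash script. Everything named here is a placeholder and not something from your setup: the NUTCH_HOME, SEED_DIR, CRAWL_DIR and SOLR_URL values, the DRY_RUN switch, and the helper function names. It also assumes a bin/crawl that takes the seed directory, the crawl directory and the number of rounds in that order, and that the crawldb lives under the crawl directory. Older releases order the arguments differently, so check the usage message of your bin/crawl first. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch only: all paths, the Solr URL and the helper names are placeholders.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"                   # assumed install location
SEED_DIR="${SEED_DIR:-urls}"                             # assumed seed list directory
CRAWL_DIR="${CRAWL_DIR:-crawl}"                          # assumed crawl output directory
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/nutch}" # assumed Solr core URL
DRY_RUN="${DRY_RUN:-1}"                                  # 1 = print commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: crawl with -1 as the last argument, so the crawl keeps going
# until no more URLs are discovered. No indexing happens in this step.
crawl_all() {
  run "$NUTCH_HOME/bin/crawl" "$SEED_DIR" "$CRAWL_DIR" -1
}

# Step 2: index the crawled segments, with -deleteGone on the indexing command.
index_with_delete_gone() {
  run "$NUTCH_HOME/bin/nutch" solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" \
      "$CRAWL_DIR"/segments/* -filter -normalize -deleteGone
}

# Index only if the crawl step succeeded.
crawl_all && index_with_delete_gone
```

Keeping the indexing as its own step is what lets you add -deleteGone even though the crawl itself is started from the crawl script.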
Mark

On Tue, Aug 9, 2016 at 10:52 AM, Musshorn, Kris T CTR USARMY RDECOM ARL (US) <[email protected]> wrote:

> CLASSIFICATION: UNCLASSIFIED
>
> Marcus.
>
> 1. how do I keep nutch running all the time?
>
> 2. If I am invoking crawl with nutch/crawl from a bash script file then
> how to I specify -deleteGone when indexing?
>
> Thanks,
> Kris
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
> Kris T. Musshorn
> FileMaker Developer - Contractor - Catapult Technology Inc.
> US Army Research Lab
> Aberdeen Proving Ground
> Application Management & Development Branch
> 410-278-7251
> [email protected]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Wednesday, August 03, 2016 7:04 PM
> To: [email protected]
> Subject: [Non-DoD Source] RE: functional question... (UNCLASSIFIED)
>
> Yes, just keep Nutch running all the time with a refetch interval you
> choose, defaults to 30 days. With -deleteGone switches when indexing you
> will be fine.
>
> M.
>
> > -----Original message-----
> > From: Musshorn, Kris T CTR USARMY RDECOM ARL (US) <[email protected]>
> > Sent: Wednesday 3rd August 2016 19:11
> > To: [email protected]
> > Subject: functional question... (UNCLASSIFIED)
> >
> > CLASSIFICATION: UNCLASSIFIED
> >
> > We are currently using ultraseek and looking to deprecate it in favor of
> > solr/nutch.
> > Ultraseek runs all the time and auto detects when pages have changed and
> > automatically reindexes them.
> > Is this possible with SOLR/nutch?
> >
> > Thanks,
> > Kris

