Hi Musshorn,

To run the crawl until it has discovered all URLs, use -1 at the end of the
crawl command.
-deleteGone: the way I use this is to crawl first, then run indexing on the
crawled data (nutch solrindex SOLR_URL CRAWLDB_PATH CRAWLDB_DIR/segments/*
-filter -normalize -deleteGone).
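
As a rough sketch, the two steps could be wrapped in a bash script like the one below. It only prints the commands (a dry run) so they can be checked before anything is run. The seed directory, crawl directory and Solr URL are placeholders I made up, and the argument order for bin/crawl (seed dir, crawl dir, rounds) is an assumption that differs between Nutch versions, so compare it with the usage message of your own bin/crawl first.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the two commands instead of running them.
# All three values below are placeholders (assumptions), not taken from this thread.
SEED_DIR="urls"                               # hypothetical seed directory
CRAWL_DIR="crawl"                             # hypothetical crawl directory
SOLR_URL="http://localhost:8983/solr/nutch"   # hypothetical Solr core URL

# Step 1: crawl with -1 as the last argument.
# Assumption: this bin/crawl takes <seed dir> <crawl dir> <rounds>.
crawl_cmd() {
  echo "bin/crawl $SEED_DIR $CRAWL_DIR -1"
}

# Step 2: index the crawled data with -deleteGone.
# The segments glob stays unexpanded here; the shell expands it when the
# printed command is actually run.
index_cmd() {
  echo "bin/nutch solrindex $SOLR_URL $CRAWL_DIR/crawldb $CRAWL_DIR/segments/* -filter -normalize -deleteGone"
}

crawl_cmd
index_cmd
```

Once the printed commands look right for your install, run them in that order from the Nutch runtime directory.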

Mark

On Tue, Aug 9, 2016 at 10:52 AM, Musshorn, Kris T CTR USARMY RDECOM ARL
(US) <[email protected]> wrote:

> CLASSIFICATION: UNCLASSIFIED
>
> Markus,
>
> 1. How do I keep Nutch running all the time?
>
> 2. If I am invoking the crawl with nutch/crawl from a bash script file,
> how do I specify -deleteGone when indexing?
>
>
> Thanks,
> Kris
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
> Kris T. Musshorn
> FileMaker Developer - Contractor – Catapult Technology Inc.
> US Army Research Lab
> Aberdeen Proving Ground
> Application Management & Development Branch
> 410-278-7251
> [email protected]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Wednesday, August 03, 2016 7:04 PM
> To: [email protected]
> Subject: [Non-DoD Source] RE: functional question... (UNCLASSIFIED)
>
> Yes, just keep Nutch running all the time with a refetch interval of your
> choosing (the default is 30 days). With the -deleteGone switch when
> indexing, you will be fine.
>
> M.
>
>
>
> -----Original message-----
> > From: Musshorn, Kris T CTR USARMY RDECOM ARL (US) <[email protected]>
> > Sent: Wednesday 3rd August 2016 19:11
> > To: [email protected]
> > Subject: functional question... (UNCLASSIFIED)
> >
> > CLASSIFICATION: UNCLASSIFIED
> >
> > We are currently using Ultraseek and looking to deprecate it in favor of
> > Solr/Nutch.
> > Ultraseek runs all the time, auto-detects when pages have changed, and
> > automatically reindexes them.
> > Is this possible with Solr/Nutch?
> >
> > Thanks,
> > Kris
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Kris T. Musshorn
> > FileMaker Developer - Contractor - Catapult Technology Inc.
> > US Army Research Lab
> > Aberdeen Proving Ground
> > Application Management & Development Branch
> > 410-278-7251
> > [email protected]
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >
> > CLASSIFICATION: UNCLASSIFIED
>
>
> CLASSIFICATION: UNCLASSIFIED
>
