Hello - this sounds fine indeed. But I don't know what happens with the 
calculation of the next fetch time when -adddays is used; I've never tried it. 
You may want to confirm that the fetch time is not affected by this.
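
As a rough illustration of why -adddays makes entries due, here is a minimal 
sketch of the due-for-fetch check. It assumes the generator simply shifts its 
reference time forward by N days and compares it with the stored fetch time. 
The function name is_due and the epoch values are hypothetical, and the model 
ignores fetch schedules, scoring and the -expr filter.

```shell
#!/bin/sh
# Simplified model (assumption): an entry is due when its stored fetch time
# is not later than "now" shifted forward by the -adddays value.
# Arguments: fetch time (epoch seconds), current time (epoch seconds), days.
is_due() {
  fetch_time=$1
  now=$2
  add_days=$3
  shifted=$(( now + add_days * 86400 ))   # 86400 seconds per day
  if [ "$fetch_time" -le "$shifted" ]; then
    echo "due"
  else
    echo "not due"
  fi
}

# Hypothetical entry whose next fetch is 20 days away (20 * 86400 = 1728000).
is_due 2728000 1000000 0     # without -adddays
is_due 2728000 1000000 30    # with -adddays 30
```

This only models the selection step. To see what the crawldb actually stores 
for one URL before and after a crawl cycle, bin/nutch readdb 
examplesite/crawldb -url <url> prints that entry, including its fetch time.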

Markus


-----Original message-----
> From:Sujan Suppala <[email protected]>
> Sent: Friday 7th October 2016 14:11
> To: [email protected]
> Subject: RE: nutch 1.12 How can I force a URL to get re-indexed
> 
> Thanks Markus.
> 
> I cannot use freegen as this tool is not available via the REST API.
> 
> With the combination of -adddays and -expr options of generator I achieved my 
> requirement. 
> Here is what I did:
> 1. Inject the URLs with some metadata, say pageId=<unique value>.
>       The seed file contains the entry below:
>       http://localhost:9090/nutchsite/html/page1.html pageId=<unique value>
> 
> 2. Now issue the generate command with the -adddays option (to make all the 
> URLs due for fetch) and the -expr option (to filter the URLs), so that only 
> the URLs to be fetched again are selected:
>       $ bin/nutch generate examplesite/crawldb examplesite/segments -expr "(pageId == '<unique value>')" -adddays 30
>       
> Please comment if you see any issues with this approach.
> 
> Thanks
> Sujan
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]] 
> Sent: Thursday, October 06, 2016 7:32 PM
> To: [email protected]
> Subject: RE: nutch 1.12 How can I force a URL to get re-indexed
> 
> Hi
> 
> You can use -adddays N in the generator job to fool it, or just use a lower 
> fetch interval. Or, use the freegen tool to immediately crawl a set of URLs.
> 
> Markus
> 
>  
>  
> -----Original message-----
> > From:Sujan Suppala <[email protected]>
> > Sent: Thursday 6th October 2016 15:56
> > To: [email protected]
> > Subject: nutch 1.12 How can I force a URL to get re-indexed
> > 
> > Hi,
> > 
> > By default, Nutch fetches a URL based on the already set next fetch 
> > interval (30 days). If the page is updated before this interval 
> > (30 days) has passed, how can I force it to be re-indexed?
> > 
> > How can I just 're-inject' the URLs to set the next fetch date to 
> > 'immediately'?
> > 
> > FYI, I am using the Nutch REST API client to index the URLs.
> > 
> > Thanks
> > Sujan
> > 
> 
