What about the configuration in crawl-urlfilter.txt?
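
For example, here is a minimal sketch of the part I would check, assuming the stock conf/crawl-urlfilter.txt that ships with Nutch 1.0 (example.com is a placeholder, not your real domain). If the accept rule does not match your seed URLs, the final rule rejects them and nothing gets fetched, whatever the fetch interval is:

```
# accept hosts in the crawl domain (placeholder domain, replace with yours)
+^http://([a-z0-9]*\.)*example.com/

# skip everything else
-.
```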

On Thu, Dec 10, 2009 at 2:29 AM, Peters, Vijaya <[email protected]> wrote:
> I tried that too.
> In nutch-site.xml, I added the properties below, but they had no effect.
>
> <property>
>  <name>db.default.fetch.interval</name>
>  <value>0</value>
>  <description>(DEPRECATED) The default number of days between re-fetches of a 
> page.  value was 30
>  </description>
> </property>
>
> <property>
>  <name>db.fetch.interval.default</name>
>  <value>3600</value>
>  <description>The default number of seconds between re-fetches of a page (30 
> days). value was 2592000 (30 days)
>  </description>
> </property>
>
> <property>
>  <name>db.fetch.interval.max</name>
>  <value>3600</value>
>  <description>The maximum number of seconds between re-fetches of a page
>  (90 days). After this period every page in the db will be re-tried, no
>  matter what is its status.  value was 7776000
>  </description>
> </property>
>
> Vijaya Peters
> SRA International, Inc.
> 4350 Fair Lakes Court North
> Room 4004
> Fairfax, VA  22033
> Tel:  703-502-1184
>
> www.sra.com
> Named to FORTUNE's "100 Best Companies to Work For" list for 10 consecutive 
> years
> Please consider the environment before printing this e-mail
> This electronic message transmission contains information from SRA 
> International, Inc. which may be confidential, privileged or proprietary.  
> The information is intended for the use of the individual or entity named 
> above.  If you are not the intended recipient, be aware that any disclosure, 
> copying, distribution, or use of the contents of this information is strictly 
> prohibited.  If you have received this electronic information in error, 
> please notify us immediately by telephone at 866-584-2143.
>
> -----Original Message-----
> From: MilleBii [mailto:[email protected]]
> Sent: Wednesday, December 09, 2009 1:27 PM
> To: [email protected]
> Subject: Re: how to force nutch to do a recrawl
>
> Nutch only recrawls every 30 days by default, so set numberDays
> appropriately and it will recrawl. Read nutch-default.xml for the
> details.
>
> 2009/12/9, xiao yang <[email protected]>:
>> What do you mean by "recrawl"?
>> Does the following command do what you need?
>> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
>> Change the destination directory to one different from the last crawl's.
>>
>> On Thu, Dec 10, 2009 at 1:44 AM, Peters, Vijaya <[email protected]>
>> wrote:
>>> I'm running Nutch 1.0 on Windows.  How do I force Nutch to do a complete
>>> recrawl?
>>>
>>>
>>>
>>> thanks,
>>>
>>> - Vijaya
>>>
>>
>
>
> --
> -MilleBii-
>
