Thanks Chaman,
How do I go about changing the 30-day refetch interval to 1 day, since I
intend to recrawl on a daily basis?
I am also using index-more to do the indexing. Could someone tell me
how to construct a query using the "lastModified" field? Since I am
recrawling daily, I am hoping to use the current day and the previous
day to retrieve results by date.
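A minimal sketch of building such a query for "yesterday through today". It assumes, as we understand the index-more/query-more plugins, that index-more indexes a `date` field in yyyyMMdd form (derived from lastModified when available) and that query-more, if enabled, accepts a range clause of the form `date:YYYYMMDD-YYYYMMDD`. Check this against your plugin version. The helper name `daily_date_query` is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def daily_date_query(terms: str, today: Optional[date] = None) -> str:
    """Hypothetical helper: append a yesterday..today date range to a query.

    Assumes the query-more plugin's range syntax date:YYYYMMDD-YYYYMMDD.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)  # timedelta handles month/year rollover
    span = f"date:{yesterday:%Y%m%d}-{today:%Y%m%d}"
    return f"{terms} {span}".strip()


print(daily_date_query("nutch", date(2005, 5, 31)))
# nutch date:20050530-20050531
```

Computing the range with `timedelta` rather than subtracting 1 from the day number keeps it correct across month and year boundaries (for example, 1 June yields 20050531-20050601).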
Thanks for the help
Suhail
On May 30, 2005, at 8:44 PM, Chirag Chaman wrote:
Suhail,
The default Nutch crawl process already does this: it refetches pages
every 30 days.
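A sketch of one way to shorten that interval to a day. It assumes a Nutch 0.7-era install where the refetch interval is the `db.default.fetch.interval` property (measured in days) and site overrides go in conf/nutch-site.xml under a `<nutch-conf>` root element. Both names are assumptions to verify against your own conf/nutch-default.xml. The Python below only generates the override snippet. It does not edit any files.

```python
import xml.etree.ElementTree as ET


def fetch_interval_override(days: int, root_tag: str = "nutch-conf") -> str:
    """Build a nutch-site.xml body overriding the refetch interval.

    Assumption: the property is db.default.fetch.interval, in days, and the
    root element is <nutch-conf> (later Nutch versions use <configuration>).
    """
    conf = ET.Element(root_tag)
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "db.default.fetch.interval"
    ET.SubElement(prop, "value").text = str(days)
    return ET.tostring(conf, encoding="unicode")


print(fetch_interval_override(1))
```

Putting the override in nutch-site.xml rather than editing nutch-default.xml keeps the change separate from the shipped defaults.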
Look at the Nutch wiki and documentation. To recrawl the links,
specify the link depth.
CC-
--------------------------------------------
Filangy, Inc.
-----Original Message-----
From: Suhail Ahmed [mailto:[EMAIL PROTECTED]
Sent: Monday, May 30, 2005 12:44 PM
To: [email protected]
Subject: recrawling sites
Hi,
How do I go about recrawling websites? Essentially I want to perform
the following tasks repeatedly:
[one-off task] inject the database with a URL list
1. create a segment with the initial list
2. fetch the segment
3. update the database
4. create a new segment with the outlinks from [2]
5. fetch the segment created in [4]
I basically want to repeat steps 2 through 5. How would I do this?
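A minimal sketch of driving that generate/fetch/update cycle in a loop, where each round corresponds to one level of link depth. It assumes the Nutch 0.7-style whole-web commands `bin/nutch generate <db> <segments>`, `bin/nutch fetch <segment>` and `bin/nutch updatedb <db> <segment>`, run from the Nutch install directory. It also assumes that `generate` creates a timestamp-named directory, so the lexically last one is the newest. The helper names are hypothetical, and the `run` parameter lets you substitute a dry-run function for `subprocess.run`.

```python
import subprocess
from pathlib import Path

NUTCH = "bin/nutch"  # assumption: script runs from the Nutch install directory


def latest_segment(segments_dir: Path) -> Path:
    """Return the newest segment directory.

    Assumption: generate names segments by timestamp, so sorting by name
    also sorts by creation time.
    """
    segs = sorted(p for p in segments_dir.iterdir() if p.is_dir())
    if not segs:
        raise FileNotFoundError(f"no segments in {segments_dir}")
    return segs[-1]


def recrawl(db: str = "db", segments: str = "segments", depth: int = 2,
            run=subprocess.run) -> None:
    """Hypothetical driver: repeat generate -> fetch -> updatedb `depth` times."""
    for _ in range(depth):
        run([NUTCH, "generate", db, segments], check=True)  # new segment from due URLs
        seg = str(latest_segment(Path(segments)))
        run([NUTCH, "fetch", seg], check=True)  # fetch that segment
        run([NUTCH, "updatedb", db, seg], check=True)  # add outlinks, reschedule pages
```

Because `check=True` raises on a non-zero exit status, a failed fetch stops the loop. Without it, the next round would update the database from an incomplete segment. Run daily (for example from cron), this covers the repeated steps 2 through 5.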
Thanks for the help
Suhail