1. create a segment with the initial list
2. fetch the segment
3. update the database
4. create a new segment with the outlinks from [2]
5. fetch the segment created in [4].

I basically want to repeat steps 2 through 5. How would I do this?

Here's what I have in my script:

bin/nutch generate crawl.test/db crawl.test/segments -topN 20 # Create new segment
s1=`ls -d crawl.test/segments/2* | tail -1`
bin/nutch fetch $s1 # Fetch it
bin/nutch updatedb crawl.test/db $s1 # Updatedb with new links
bin/nutch analyze crawl.test/db 5
bin/nutch index $s1

Change the db and segments directories and the topN value to suit your
setup. The script starts at a different point than your step 2 (generate
is your step 4), but each run performs one generate/fetch/updatedb round,
so running it again repeats the cycle. See the Nutch tutorial for more info.
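
To repeat the cycle without rerunning the script by hand, one option is to wrap the same commands in a loop. This is only a sketch: the function name crawl_rounds and the variables NUTCH, DB, SEGMENTS and TOPN are made up for illustration, and it assumes each generate call leaves a new timestamped directory under the segments directory, which is what the `ls ... | tail -1` line in the script relies on.

```shell
#!/bin/sh
# Illustrative defaults; override them in the environment as needed.
NUTCH=${NUTCH:-bin/nutch}
DB=${DB:-crawl.test/db}
SEGMENTS=${SEGMENTS:-crawl.test/segments}
TOPN=${TOPN:-20}

# crawl_rounds N: run N generate/fetch/updatedb rounds (hypothetical helper).
crawl_rounds() {
  rounds=$1
  i=1
  while [ "$i" -le "$rounds" ]; do
    # Create a new segment from the current db
    "$NUTCH" generate "$DB" "$SEGMENTS" -topN "$TOPN" || return 1
    # Pick the newest segment (names sort by timestamp)
    s=`ls -d "$SEGMENTS"/2* | tail -1`
    # Fetch it, then feed the new links back into the db
    "$NUTCH" fetch "$s" || return 1
    "$NUTCH" updatedb "$DB" "$s" || return 1
    i=$((i + 1))
  done
}
```

Calling `crawl_rounds 3` after these definitions would run three rounds. The analyze and index commands are left out of the loop here; they would still need to be run afterward on the db and the resulting segments.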




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
