I tried that and it worked a few times, but now I get 0 records selected for fetching.
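If crawl9a did not exist before this run, the Generator starts from a freshly injected crawldb in which every URL is unfetched and due. In that case "0 records selected for fetching" means nothing usable was injected: either the seed files are empty or every seed URL is rejected by the URL filter. That is also what the last line of the crawl log suggests checking. The POSIX shell sketch below approximates Nutch's regex URL filter so the seeds can be checked before rerunning. It assumes the seeds are in `urls` and the crawl command's rules are in `conf/crawl-urlfilter.txt` (the Nutch 1.0 layout). `url_verdict` and `check_seeds` are hypothetical helper names. `grep -E` only approximates the Java regexes Nutch uses, and URL normalizers are ignored.

```shell
#!/bin/sh
# Pre-flight check for "0 records selected for fetching" (a sketch).
# Assumptions: seeds in ./urls, rules in conf/crawl-urlfilter.txt.
# grep -E only approximates the Java regexes Nutch uses, and URL
# normalizers are not applied here.

# Print "+" or "-" for one URL: the first rule whose pattern matches
# anywhere in the URL decides, and a URL matching no rule is rejected.
url_verdict() {
  uv_url=$1
  uv_filter=$2
  uv_cr=$(printf '\r')
  while IFS= read -r uv_line || [ -n "$uv_line" ]; do
    uv_line=${uv_line%"$uv_cr"}        # tolerate CRLF files from Windows editors
    case $uv_line in
      +*) uv_sign=+ ;;
      -*) uv_sign=- ;;
      *) continue ;;                   # comments and blank lines
    esac
    uv_pattern=${uv_line#?}            # drop the leading + or -
    if printf '%s\n' "$uv_url" | grep -Eq -- "$uv_pattern"; then
      printf '%s\n' "$uv_sign"
      return 0
    fi
  done < "$uv_filter"
  printf '%s\n' "-"
}

# Print every seed URL with its verdict; fail if no seeds are found.
check_seeds() {
  cs_dir=${1:-urls}
  cs_filter=${2:-conf/crawl-urlfilter.txt}
  cs_count=0
  cs_cr=$(printf '\r')
  if [ ! -f "$cs_filter" ]; then
    echo "missing filter file: $cs_filter" >&2
    return 1
  fi
  for cs_file in "$cs_dir"/*; do
    [ -f "$cs_file" ] || continue      # also skips an unmatched glob
    while IFS= read -r cs_url || [ -n "$cs_url" ]; do
      cs_url=${cs_url%"$cs_cr"}
      [ -n "$cs_url" ] || continue     # skip blank lines
      cs_count=$((cs_count + 1))
      echo "$(url_verdict "$cs_url" "$cs_filter") $cs_url"
    done < "$cs_file"
  done
  if [ "$cs_count" -eq 0 ]; then
    echo "no seed URLs found in $cs_dir" >&2
    return 1
  fi
}
```

Usage: source the file from the Nutch directory, then run `check_seeds urls conf/crawl-urlfilter.txt`. Each seed is printed with `+` (accepted) or `-` (rejected). If the filter file still contains the stock `MY.DOMAIN.NAME` placeholder line, every seed will show `-`.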
$ bin/nutch crawl urls -dir crawl9a -depth 15 -topN 50
crawl started in: crawl9a
rootUrlDir = urls
threads = 10
depth = 15
topN = 50
Injector: starting
Injector: crawlDb: crawl9a/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl9a/segments/20091209124308
Generator: filtering: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl9a

Vijaya Peters
SRA International, Inc.

-----Original Message-----
From: xiao yang [mailto:[email protected]]
Sent: Wednesday, December 09, 2009 1:19 PM
To: [email protected]
Subject: Re: how to force nutch to do a recrawl

What do you mean by "recrawl"? Does the following command do what you need?

bin/nutch crawl urls -dir crawl -depth 3 -topN 50

Use a destination directory different from the one used for the last crawl.
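xiao's suggestion amounts to never reusing a crawl directory: a new `-dir` means a new, empty crawldb, so pages fetched in an earlier run cannot be skipped as not yet due for refetching. Below is a minimal POSIX shell sketch, assuming it is run from the Nutch install directory with seeds in `urls`. The `crawl-<timestamp>` naming and the helper names `new_crawl_dir` and `run_fresh_crawl` are hypothetical. When `bin/nutch` is not found, it only prints the command it would run.

```shell
#!/bin/sh
# Sketch: one fresh crawl directory per run (the "different -dir" advice).
# Assumptions: run from the Nutch install directory, seeds in ./urls;
# the crawl-<timestamp> naming is an arbitrary, hypothetical choice.

# Name such as crawl-20091209131500; names sort chronologically.
new_crawl_dir() {
  printf 'crawl-%s\n' "$(date +%Y%m%d%H%M%S)"
}

run_fresh_crawl() {
  rf_dir=$(new_crawl_dir)
  if [ -e "$rf_dir" ]; then
    echo "refusing to reuse existing directory: $rf_dir" >&2
    return 1
  fi
  # Same options as the suggested command; only -dir changes per run.
  set -- bin/nutch crawl urls -dir "$rf_dir" -depth 3 -topN 50
  if [ -x bin/nutch ]; then
    "$@"
  else
    echo "$@"                          # dry run outside a Nutch install
  fi
}
```

A fresh directory only helps when the old crawldb is the problem. If the seed list is empty or every seed is filtered out, the Generator reports 0 records even on an empty crawldb, as in the crawl9a run above.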
On Thu, Dec 10, 2009 at 1:44 AM, Peters, Vijaya <[email protected]> wrote:
> I'm running Nutch 1.0 on Windows. How do I force Nutch to do a complete
> recrawl?
>
> thanks,
>
> - Vijaya
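To recrawl into an existing directory instead, the pages have to become due again. The Generator only selects URLs whose next fetch time has passed (by default 30 days after the last fetch, set by `db.fetch.interval.default`), and its `-adddays` option makes it behave as if that many extra days had elapsed. The sketch below runs one generate/fetch/updatedb round that way. It assumes the crawl directory is `crawl`, uses 31 days only because that exceeds the default interval, and finds the new segment with `ls -d .../segments/2* | tail -1` as the Nutch tutorial does. `newest_segment` and `refetch_round` are hypothetical names, and indexing the new segment is left out.

```shell
#!/bin/sh
# Sketch: force one re-fetch round on an existing crawl directory.
# Assumptions: run from the Nutch install directory; crawl dir "crawl";
# 31 days is simply larger than the default 30-day refetch interval.

# Segment names are timestamps (e.g. 20091209124308), so the last entry
# in sorted order is the newest segment.
newest_segment() {
  ls -d "$1"/segments/2* 2>/dev/null | tail -n 1
}

refetch_round() {
  rr_crawl=${1:-crawl}
  rr_days=${2:-31}
  if [ ! -x bin/nutch ]; then
    echo "run this from the Nutch install directory" >&2
    return 1
  fi
  # URL counts by status; a total of 0 points back at the seeds or filters.
  bin/nutch readdb "$rr_crawl/crawldb" -stats
  rr_before=$(newest_segment "$rr_crawl")
  # Treat "now" as rr_days days later, so recently fetched pages are due again.
  bin/nutch generate "$rr_crawl/crawldb" "$rr_crawl/segments" -topN 50 -adddays "$rr_days" || return 1
  rr_seg=$(newest_segment "$rr_crawl")
  if [ -z "$rr_seg" ] || [ "$rr_seg" = "$rr_before" ]; then
    echo "generate created no new segment (0 records selected?)" >&2
    return 1
  fi
  bin/nutch fetch "$rr_seg" && bin/nutch updatedb "$rr_crawl/crawldb" "$rr_seg"
}
```

Comparing the newest segment before and after `generate` keeps the script from re-fetching an old segment when the Generator selects nothing.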
