I am trying to set up Nutch 0.9 to crawl www.yahoo.com, using this command: "bin/nutch crawl urls -dir crawl -depth 3".
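My urls directory holds a single seed list (the file name urls/seed.txt is my own choice; as far as I know the injector simply reads every plain-text file in that directory, one URL per line):

$ cat urls/seed.txt
http://www.yahoo.com/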
But after the command finishes, no links have been fetched beyond the start page. Is there something I need to set up before www.yahoo.com can be crawled? I have struggled with this problem for days, and I see the same behavior with Nutch 0.8.1. With the same setup I can crawl www.cnn.com without trouble. Thanks for any help.

Here is the output:

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20070416230326
Generator: filtering: false
Generator: topN: 2147483647
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawl/segments/20070416230326
Fetcher: threads: 10
fetching http://www.yahoo.com/
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20070416230326]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20070416230338
Generator: filtering: false
Generator: topN: 2147483647
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=1 - no more URLs to fetch.
LinkDb: starting
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: crawl/segments/20070416230326
LinkDb: done
Indexer: starting
Indexer: linkdb: crawl/linkdb
Indexer: adding segment: crawl/segments/20070416230326
Indexing [http://www.yahoo.com/] with analyzer [EMAIL PROTECTED] (null)
Optimizing index.
merging segments _ram_0 (1 docs) into _0 (1 docs)
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawl/indexes
Dedup: done
merging indexes to: crawl/index
Adding crawl/indexes/part-00000
done merging
crawl finished: crawl
CrawlDb topN: starting (topN=25, min=0.0)
CrawlDb db: crawl/crawldb
CrawlDb topN: collecting topN scores.
CrawlDb topN: done

Match
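P.S. One thing I am starting to suspect is conf/crawl-urlfilter.txt, which (as I understand it) is the filter file the one-shot crawl command applies, rather than regex-urlfilter.txt. Quoting from memory, my only change from the stock 0.9 file was to replace the MY.DOMAIN.NAME accept line with a catch-all so any host passes:

# accept everything else
+.

But the stock file also keeps this rule ahead of the accept line, and since the first matching pattern wins, it would drop every outlink containing a query string; most links on the Yahoo front page seem to carry ? and = characters, while cnn.com article links are mostly plain paths:

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

Can anyone confirm whether commenting that line out (or moving the accept line above it) is the right fix, or whether it just opens the door to crawler traps?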
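To narrow it down, I am also planning to dump the crawl db and the fetched segment to see whether any outlinks were extracted at all. If I have the 0.9 tool syntax right, these should work:

# overall crawl db statistics: how many URLs are known, by status
bin/nutch readdb crawl/crawldb -stats

# dump the segment, including the ParseData with extracted outlinks
bin/nutch readseg -dump crawl/segments/20070416230326 segdump

My reading of the results would be: if the segment dump shows plenty of outlinks but the crawl db still contains only one URL, the URL filters are throwing them away; if the dump shows no outlinks, the parse itself failed (http.content.limit, which defaults to 65536 bytes, could also be truncating a large page before all the links are seen).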