So the fetch was successful and the document has been indexed. The crawl stopped at depth 1 because there were no more URLs to fetch. This is not a failure of any sort; it just means that the job finished before reaching the requested depth. If there had been more URLs to fetch, the crawl would have continued with depth 1.
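To make the stopping rule concrete, here is a toy simulation of the fetch/generate loop. It is not Nutch code and does not use any Nutch API: the function name crawl, the links dictionary (standing in for the outlinks found in each fetched page), and the example.com URL are all made up for illustration. It only assumes that each round fetches the pending URLs and then generates the unfetched outlinks for the next round.

```python
def crawl(seeds, links, depth):
    """Toy model of a depth-limited crawl (hypothetical, not the Nutch API).

    seeds: list of start URLs (the seed file).
    links: dict mapping a URL to the outlinks found in that page.
    depth: maximum number of fetch rounds (like the -depth argument).
    Returns (rounds_run, fetched_urls).
    """
    fetched = set()
    to_fetch = list(dict.fromkeys(seeds))  # depth 0: seed URLs, de-duplicated
    rounds = 0
    for level in range(depth):
        if not to_fetch:
            # Nothing was selected for this round, so the crawl ends early.
            print(f"Stopping at depth={level} - no more URLs to fetch.")
            break
        # Fetch phase: retrieve every URL selected for this round.
        for url in to_fetch:
            fetched.add(url)
        rounds += 1
        # Generate phase: collect outlinks that have not been fetched yet.
        candidates = []
        for url in to_fetch:
            for out in links.get(url, []):
                if out not in fetched and out not in candidates:
                    candidates.append(out)
        to_fetch = candidates
    return rounds, fetched


# A single seed document with no outlinks: one round runs, then the crawl stops.
rounds, fetched = crawl(["http://example.com/doc"], {}, depth=3)
print(rounds, sorted(fetched))
```

With a single seed that yields no new links, the loop runs one fetch round and then stops at depth 1, which matches the situation in your log: the seed was fetched at depth 0, and the generator found nothing left to select.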
Crawling begins at depth 0, when the fetcher fetches all the URLs listed in the seed files. Then, in the generate phase, new URLs are generated from the links found in the pages fetched at depth 0. At depth 1, the fetcher fetches those generated URLs that have not been fetched yet. This process continues until the fetcher has run as many times as specified by the -depth argument, or until there are no more URLs to fetch, whichever happens first.

I would strongly recommend that you go through the Nutch tutorial once. It is available at http://lucene.apache.org/nutch/tutorial8.html and should help you understand Nutch better.

Regards,
Susam Pal

On Nov 16, 2007 4:59 PM, crazy <[EMAIL PROTECTED]> wrote:
>
> i change my seed urls file to this
> http://www.frlii.org/IMG/doc/catalogue_a_portail_27-09-2004.doc
>
> and i have this like result:
> fetching http://www.frlii.org/IMG/doc/cactalogue_a_portail_27-09-2004.doc
> 16 nov. 2007 11:18:55 org.apache.tika.mime.MimeUtils load
> INFO: Loading [tika-mimetypes.xml]
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: crawl/crawldb
> CrawlDb update: segments: [crawl/segments/20071116111851]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: true
> CrawlDb update: URL filtering: true
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20071116111859
> Generator: filtering: false
> Generator: topN: 2147483647
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=1 - no more URLs to fetch.
>
> what i can do now i feel that we are near the aim
>
> tksss
>
