So, the fetch was successful and the document has been indexed. The
crawl stopped at depth 1 because there were no more URLs to fetch.
This is not a failure of any sort. It just means that the job finished
before reaching the requested depth. If there had been more URLs to
fetch, the crawl would have continued at depth 1.

Crawling begins at depth 0, when the fetcher fetches all the URLs
listed in the seed files. Then, in the generate phase, a new fetch
list is built from the URLs found in the pages fetched at depth 0. At
depth 1, the fetcher fetches those generated URLs that have not been
fetched yet. This process continues until the fetcher has run as many
times as specified by the -depth argument, or there are no more URLs
to fetch, whichever happens first.
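
To make the loop above concrete, here is a toy model of it in Python.
This is only a sketch of the idea, not Nutch's actual code: the
function name crawl, the links dictionary standing in for the outlinks
found on each page, and the example.org URL are all made up for
illustration.

```python
def crawl(seeds, links, depth):
    """Toy model of the generate/fetch/update loop (not Nutch's real code).

    seeds: list of seed URLs
    links: hypothetical dict mapping a URL to the outlinks found on that page
    depth: maximum number of fetch rounds, like the -depth argument
    """
    crawldb = {url: False for url in seeds}  # URL -> has it been fetched?
    log = []
    for d in range(depth):
        # Generate: select every known URL that has not been fetched yet.
        fetch_list = [url for url, fetched in crawldb.items() if not fetched]
        if not fetch_list:
            log.append(f"Stopping at depth={d} - no more URLs to fetch.")
            break
        for url in fetch_list:
            # Fetch: mark the URL as fetched.
            crawldb[url] = True
            log.append(f"depth {d}: fetching {url}")
            # Update: add newly discovered outlinks to the crawldb.
            for out in links.get(url, []):
                crawldb.setdefault(out, False)
    return log


if __name__ == "__main__":
    # A single seed document with no outlinks, as in your case.
    for line in crawl(["http://example.org/a.doc"], {}, 3):
        print(line)
```

With one seed that yields no outlinks, the model fetches it at depth 0
and then stops at depth 1, even though a depth of 3 was requested.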

I would strongly recommend that you go through the Nutch tutorial
once. It is available at
http://lucene.apache.org/nutch/tutorial8.html and it should help you
understand Nutch better.

Regards,
Susam Pal

On Nov 16, 2007 4:59 PM, crazy <[EMAIL PROTECTED]> wrote:
>
> I changed my seed URLs file to this:
> http://www.frlii.org/IMG/doc/catalogue_a_portail_27-09-2004.doc
>
> and I get this as the result:
> fetching http://www.frlii.org/IMG/doc/cactalogue_a_portail_27-09-2004.doc
> 16 nov. 2007 11:18:55 org.apache.tika.mime.MimeUtils load
> INFO: Loading [tika-mimetypes.xml]
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: crawl/crawldb
> CrawlDb update: segments: [crawl/segments/20071116111851]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: true
> CrawlDb update: URL filtering: true
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20071116111859
> Generator: filtering: false
> Generator: topN: 2147483647
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=1 - no more URLs to fetch.
>
> What can I do now? I feel that we are near the goal.
>
> Thanks
>
