Aha, thank you. So the progress percentage is set early on. Is that a good thing, both for Nutch and for Hadoop in general? If progress is set that early in the process, what happens when you have a task that takes a *really* long time? I suppose it's just a minor annoyance, since one can always look at the completed/not completed bit to see the real task status.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, April 10, 2008 5:55:37 PM
Subject: Re: Fetch task 100% done, but still fetching

Dennis Kubes wrote:
> I believe the percentage complete is set in hadoop, in the
> TaskInProgress.recomputeProgress() method and then lines 570-595 in
> JobInProgress.updateTaskStatus.

Correct - this actually comes all the way down from reading the current FileSplit, i.e. this task's part of the input fetchlist. When that reading is completed, the percentage is set to 100%, even though many URLs may still be queued and waiting to be fetched.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
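To make the mismatch concrete, here is a minimal sketch, assuming a single task whose input reader queues URLs faster than the fetcher threads drain them. It is a self-contained simulation, not Hadoop or Nutch code: the class FetchProgressSketch and its methods readNext, fetchNext, inputProgress and fetchProgress are hypothetical names chosen for illustration. inputProgress mirrors the behaviour described above (fraction of the split read); fetchProgress is a hypothetical alternative that counts URLs actually fetched.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical simulation (NOT Hadoop or Nutch code) of why a fetch task can
 * report 100% while still fetching: progress tracks how much of the input
 * split has been read, not how many queued URLs have been fetched.
 */
public class FetchProgressSketch {

    private final int totalUrls;      // URLs in this task's part of the fetchlist
    private int urlsRead = 0;         // URLs read from the split so far
    private int urlsFetched = 0;      // URLs actually fetched so far
    private final Deque<String> queue = new ArrayDeque<>();

    public FetchProgressSketch(int totalUrls) {
        this.totalUrls = totalUrls;
    }

    /** Reader side: take the next URL from the split and queue it. */
    public void readNext() {
        if (urlsRead < totalUrls) {
            queue.addLast("url-" + urlsRead);
            urlsRead++;
        }
    }

    /** Fetcher side: "fetch" one queued URL, if any is waiting. */
    public void fetchNext() {
        if (queue.pollFirst() != null) {
            urlsFetched++;
        }
    }

    /** Progress as described in the thread: fraction of the input split read. */
    public float inputProgress() {
        return totalUrls == 0 ? 1.0f : (float) urlsRead / totalUrls;
    }

    /** Hypothetical alternative: fraction of URLs actually fetched. */
    public float fetchProgress() {
        return totalUrls == 0 ? 1.0f : (float) urlsFetched / totalUrls;
    }

    /** Number of URLs read from the split but not yet fetched. */
    public int queued() {
        return queue.size();
    }

    public static void main(String[] args) {
        FetchProgressSketch task = new FetchProgressSketch(1000);
        // The reader outruns the fetchers: the whole split is read first...
        for (int i = 0; i < 1000; i++) {
            task.readNext();
        }
        // ...while only 100 URLs have actually been fetched so far.
        for (int i = 0; i < 100; i++) {
            task.fetchNext();
        }
        System.out.printf("input progress: %.0f%%%n", task.inputProgress() * 100);
        System.out.printf("fetch progress: %.0f%%%n", task.fetchProgress() * 100);
        System.out.println("still queued:   " + task.queued());
    }
}
```

Running main reports 100% input progress against 10% fetch progress, with 900 URLs still queued, which is the "100% done, but still fetching" situation from the subject line. Basing the reported figure on fetched URLs, as the hypothetical fetchProgress does, would track the real work more closely, at the cost of the task knowing its total URL count up front.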
