On Tuesday 29 June 2010 20:50:01 Julien Nioche wrote:
> Markus,
> 
> 
> The depth of the queue is simply 50 * number of threads, so I gather that
> you are using 10 threads. There is a JIRA where we discussed making this
> value configurable.
> 
> the threads just wiggle between 490 and 500.
> 
> 
> you probably mean 'the total size wiggles...'?
Yes.
> 
> The number of remaining URLs to fetch is not known to the Fetcher as it
> reads the fetchlist as it goes. The fact that you can see the real number
> of remaining URLs when > 500 is simply due to the fact that all the input
> URLs have been read and all the remaining ones are in the queue.
> 
> 
> The 50x value could be made a parameter; however, this is not the issue
> here. The point is that the queue is what's stored in memory whereas the
> total number of URLs is the queue + what's left to be read from HDFS.
> 
> I'd suggest using the mapreduce webapp to monitor the progress and not
> simply looking at the logs (i.e. you need to run it in distributed mode).
> There are now details about how many URLs have been fetched successfully or
> not + of course the progress of the map operations which indicates how much
> has been read from HDFS. Since you know how many URLs you put in the
> fetchlist in the first place, it would be trivial to work out what's left.
Thanks, I understand and will try the monitor.
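
For reference, the arithmetic described above can be sketched roughly as
follows. This is only an illustration, not Nutch code: the function names
are hypothetical, the factor of 50 is the one quoted above, and the counts
in the usage lines are made-up example values standing in for what the
mapreduce webapp would report.

```python
# Rough sketch of the arithmetic from the thread; not part of Nutch.
# The factor 50 is the value quoted above; all names here are hypothetical.
QUEUE_DEPTH_PER_THREAD = 50


def queue_capacity(num_threads: int) -> int:
    """Upper bound on URLs held in the in-memory fetch queue."""
    return QUEUE_DEPTH_PER_THREAD * num_threads


def remaining_urls(fetchlist_size: int, fetched_ok: int, fetched_failed: int) -> int:
    """URLs still to process: the known fetchlist size minus everything
    already attempted (successful or not), never below zero."""
    return max(fetchlist_size - fetched_ok - fetched_failed, 0)


# Illustrative values only.
print(queue_capacity(10))
print(remaining_urls(10000, 6000, 500))
```

With 10 threads the queue tops out at 500, which matches the 490-500 range
seen in the logs.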
> 
> HTH
> 
> Julien
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
