Reduce task going away for 10 seconds at a time

2009-03-13 Thread Doug Cook

Hi folks,

I've been debugging a severe performance problem with a Hadoop-based
application (a highly modified version of Nutch). I've recently upgraded to
Hadoop 0.19.1 from a much, much older version, and a reduce that used to
work just fine is now running orders of magnitude more slowly. 

From the logs I can see that the progress of my reduce stops for periods that
average almost exactly 10 seconds (with a very narrow distribution around that
value). It does so in various places in my code, roughly in proportion to how
much time I'd expect the task to spend in each place, i.e. it behaves as if my
code were being randomly interrupted for 10 seconds at a time.
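In case it helps anyone reproduce this, here's a minimal sketch (not my actual
Nutch code) of one way I could confirm the stalls from inside the job itself,
using the old org.apache.hadoop.mapred API that 0.19 still uses. The
Text/ObjectWritable types, the StallLoggingReducer name, and the 5-second
threshold are just placeholders for whatever the real job uses:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class StallLoggingReducer extends MapReduceBase
    implements Reducer<Text, ObjectWritable, Text, ObjectWritable> {

  private long lastCall = 0;

  public void reduce(Text key, Iterator<ObjectWritable> values,
                     OutputCollector<Text, ObjectWritable> output,
                     Reporter reporter) throws IOException {
    long now = System.currentTimeMillis();
    if (lastCall != 0 && now - lastCall > 5000) {
      // A ~10s gap between consecutive reduce() calls would show up here,
      // timestamped, in the task's stderr log.
      System.err.println("stall of " + (now - lastCall) + " ms before key " + key);
    }
    lastCall = now;

    // Pass values through unchanged; the real job does its work here.
    while (values.hasNext()) {
      output.collect(key, values.next());
    }
  }
}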

I'm planning to keep digging, but thought that these symptoms might sound
familiar to someone on this list. Ring any bells? Your help much
appreciated. 

Thanks!

Doug Cook



Reducer goes past 100% complete?

2009-03-09 Thread Doug Cook

Hi folks,

I've recently upgraded to Hadoop 0.19.1 from a much, much older version of
Hadoop. 

Most things in my application (a highly modified version of Nutch) are
working just fine, but one of them is bombing out with odd symptoms. The map
completes fine, but the reduce phase (a) runs extremely slowly and (b) the
"percentage complete" reported for each reduce task doesn't stop at 100%; it
just keeps climbing past it.

I figure I'll start by understanding the percentage-complete reporting
issue, since it's pretty concrete and may have some bearing on the
performance issue. It seems likely that my application is misconfiguring the
job or otherwise not using the Hadoop API correctly. I don't think I'm doing
anything out of the ordinary: my reducer simply creates an object, wraps it in
an ObjectWritable, and calls output.collect(), and I have a local class
implementing OutputFormat that takes each object and puts it into a Lucene
index. It does produce correct output, at least for small indices; on large
indices, the performance problems are killing me.
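For concreteness, the shape of it looks roughly like the sketch below (again,
not my actual code; MyDoc, buildDoc, and LuceneIndexOutputFormat are made-up
names, and the Lucene IndexWriter calls are left as comments):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

// Placeholder for the application's document type.
class MyDoc {
}

// Reducer: build the document, wrap it in an ObjectWritable, collect it.
public class IndexingReducer extends MapReduceBase
    implements Reducer<Text, ObjectWritable, Text, ObjectWritable> {

  public void reduce(Text key, Iterator<ObjectWritable> values,
                     OutputCollector<Text, ObjectWritable> output,
                     Reporter reporter) throws IOException {
    MyDoc doc = buildDoc(key, values);   // application-specific merging
    output.collect(key, new ObjectWritable(doc));
  }

  private MyDoc buildDoc(Text key, Iterator<ObjectWritable> values) {
    // ... merge the values into one document ...
    return new MyDoc();
  }
}

// OutputFormat: hand each collected object to a Lucene index.
class LuceneIndexOutputFormat implements OutputFormat<Text, ObjectWritable> {

  public RecordWriter<Text, ObjectWritable> getRecordWriter(
      FileSystem ignored, JobConf job, String name, Progressable progress)
      throws IOException {
    // Open a Lucene IndexWriter for this task's output here (details elided).
    return new RecordWriter<Text, ObjectWritable>() {
      public void write(Text key, ObjectWritable value) throws IOException {
        MyDoc doc = (MyDoc) value.get();
        // Convert doc to a Lucene Document and indexWriter.addDocument() it.
      }
      public void close(Reporter reporter) throws IOException {
        // indexWriter.close() here.
      }
    };
  }

  public void checkOutputSpecs(FileSystem ignored, JobConf job)
      throws IOException {
    // Nothing to validate in this sketch.
  }
}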
 
I can and will start rummaging around in the Hadoop code to figure out how
it calculates percentage complete, and see what I'm not doing correctly, but
thought I'd ask here, too, to see if someone has good suggestions off the
top of their head.

Many thanks-

Doug Cook