Hi Kaveh,

We have recently been informed about parsing taking an extremely long time in the reduce phase. This is currently being investigated. FYI, the thread can be found here:

http://www.mail-archive.com/user%40nutch.apache.org/msg06560.html

I wonder if you have looked into this, and whether there is a more general link between such issues?

Lewis

On Wed, Jun 13, 2012 at 1:31 AM, kaveh minooie <[email protected]> wrote:
> Hi everybody
>
> I have an unusual issue. When I run Nutch on top of Hadoop, after the map
> tasks finish, the reduce tasks start to finish very fast. Almost all of them
> finish in less than 2 hours, but there are always one or two that take a lot
> longer. This is a link to the list of completed reduce tasks (that is, all
> of them for that fetch job), and you can see on the list that the last one
> took more than 18 hours to finish and there is another one that took more
> than 6 hours. Does anybody have any idea why this is happening?
>
> http://plutooz.com/hadoop.html
>
> P.S. This fetch job had about 1.5 million pages in it.
>
> Thanks,

--
Lewis

