Hi All,

I am running into a situation where the reduce phase of the fetch job, with parsing enabled at fetch time, is taking an excessively long time. I have seen recommendations to filter URLs by length to avoid normalization-related delays. I am not currently filtering any URLs by length; could that be the issue?
Has anyone faced this issue, and if so, what was the resolution? I am running Nutch 1.7 on Hadoop YARN. The issue was discussed previously, without a conclusion, here: http://markmail.org/message/p6dzvvycpfzbaugr#query:+page:1+mid:p6dzvvycpfzbaugr+state:results

Thanks.