Hi,

Recently a reducer got killed because of this. Increasing the heap did work, but the next job some days later also failed. I looked at the code and I cannot see why it would take more than 400 MB of RAM to process the outlinks of a single record. We do limit outlinks, so the HashSets pages and domains are used. We also limit the number of outlinks per record in the parser to the default of 100, so I would not expect the List and the two Sets in the reducer to use that much. Also, URLs longer than about 400 characters are discarded anyway.
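For what it's worth, here is a rough back-of-envelope estimate of the worst case under those limits. All figures are assumptions for illustration (the per-String overhead in particular is a rough guess, not measured), but even a loose upper bound lands nowhere near 400 MB:

```java
// Back-of-envelope: worst-case memory for one record's outlink collections.
// Assumptions (illustrative, not measured): 100 outlinks per record (the
// parser default), URLs up to ~400 chars, and roughly 2 bytes per char plus
// ~40 bytes of object/header overhead per Java String.
public class OutlinkMemoryEstimate {
    public static void main(String[] args) {
        int outlinksPerRecord = 100;  // parser default
        int maxUrlChars = 400;        // longer URLs are discarded
        long bytesPerUrl = 2L * maxUrlChars + 40;  // chars + rough overhead

        // The List plus the two HashSets each hold at most one entry per
        // outlink, so multiply by 3 for a loose upper bound.
        long worstCaseBytes = 3L * outlinksPerRecord * bytesPerUrl;

        System.out.println("Worst-case bytes per record: " + worstCaseBytes);
        System.out.println("That is well under 1 MB, vs. a 400 MB heap.");
    }
}
```

So unless a single record somehow carries orders of magnitude more outlinks than the configured limit, the per-record collections alone should not explain the OOM.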

Any thoughts to share?

Thanks,
Markus