Hi everyone, I am using MapReduce and DFS for a crawl + index operation. When parsing relatively small segments (about 50,000 - 60,000 URLs), everything works fine. But when I try to parse a larger segment (600,000 - 700,000 URLs), the job is killed by an OutOfMemoryError on the tasktrackers during the map phase:
"java.lang.OutOfMemoryError: Java heap space" Is this an expected situation as the segments grow larger or is this a bug waiting to be examined? I have been trying to solve the problem, but I could not achieve it. Could somebody help me? Thanks
