Hi Amit,

I run whole-internet crawls in Nutch 2.x, and the parse phase is always a problem. I found that base64 image data was embedded in some URLs, and that caused OOM exceptions. Maybe you have the same issue. Can you share the log of the parse step? Maybe we can think it through together.
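
To illustrate the kind of URL meant here, a minimal sketch in Python (not Nutch code) that flags URLs carrying an inline base64 payload or an unusually large length. The names `is_suspicious_url` and `split_urls` and the 2048-character threshold are assumptions made for this example only; they do not come from Nutch.

```python
import re

# Hypothetical threshold: URLs longer than this are treated as suspicious.
MAX_URL_LENGTH = 2048

# Matches inline "data:" URIs that carry a base64 payload,
# e.g. data:image/png;base64,iVBORw0KGgo...
DATA_URI = re.compile(r"^data:[^,]*;base64,", re.IGNORECASE)


def is_suspicious_url(url, max_length=MAX_URL_LENGTH):
    """Return True for base64 data URIs and for over-long URLs."""
    url = url.strip()
    return bool(DATA_URI.match(url)) or len(url) > max_length


def split_urls(urls):
    """Split an iterable of URLs into (kept, dropped) lists."""
    kept, dropped = [], []
    for url in urls:
        (dropped if is_suspicious_url(url) else kept).append(url)
    return kept, dropped
```

Running `split_urls` over a dump of the URLs seen by the parser would show whether such entries are present at all before looking for other causes of the OOM.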

Talat

On 01-12-2013 22:47, Amit Sela wrote:
I'm using a long running production cluster so I don't think the machine
configuration is the issue, and if so, I'd expect it in the fetch phase,
wouldn't you?
On Dec 1, 2013 9:41 PM, "S.L" <simpleliving...@gmail.com> wrote:

I was able to execute a crawl of a couple of hundred thousand URLs in local
mode, and I did not get any OOM exceptions. What machine configuration do
you use?


On Sat, Nov 30, 2013 at 4:43 PM, Amit Sela <am...@infolinks.com> wrote:

I get an OOM exception in the parse phase.
I think it's related to https://issues.apache.org/jira/browse/NUTCH-1640
Did anyone succeed in fetching and parsing hundreds of thousands or even
millions of pages with Nutch 1.7?



