I tried this once, but before I knew it my log file was approaching a gigabyte
within an hour or so!
I'd suggest turning on debug logging for Hadoop before you do the
next crawl. You can do this by editing log4j.properties
and changing the rootLogger level from INFO to DEBUG.
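For reference, here's a minimal sketch of that edit in Python. It assumes a log4j 1.x style file where the root logger is declared as something like `log4j.rootLogger=INFO,DRFA` (the appender name `DRFA` is only an example; yours may differ). The helper names `set_root_level` and `patch_file` are hypothetical. The sketch only swaps the level token and leaves the appender list alone. Some Hadoop configs set the root logger indirectly through a `hadoop.root.logger` property, so the pattern also covers that key. That's an assumption about your conf, so check your own file. Given how fast the log grows, keep a backup and switch back to INFO after the crawl.

```python
import re
from pathlib import Path

# Matches the level token of the root logger. Covers the plain log4j 1.x key
# and (assumption) the hadoop.root.logger indirection some Hadoop configs use.
# A value like ${hadoop.root.logger} does not start with a letter, so it is
# deliberately left unchanged.
_ROOT_LEVEL = re.compile(
    r"^(\s*(?:log4j\.rootLogger|hadoop\.root\.logger)\s*=\s*)([A-Za-z]+)",
    re.MULTILINE,
)


def set_root_level(text: str, level: str = "DEBUG") -> str:
    """Return `text` with the root logger's level replaced; appenders are kept."""
    return _ROOT_LEVEL.sub(lambda m: m.group(1) + level, text)


def patch_file(path: str, level: str = "DEBUG") -> None:
    """Rewrite a log4j.properties file in place (make a backup copy first)."""
    p = Path(path)
    p.write_text(set_root_level(p.read_text(), level))


if __name__ == "__main__":
    sample = "log4j.rootLogger=INFO,DRFA\nlog4j.logger.org.apache.nutch=INFO\n"
    print(set_root_level(sample))
```

Only the root logger line changes. Per-package entries such as `log4j.logger.org.apache.nutch=INFO` keep their own level, which is one way to limit how much the log grows.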
On Thu, Nov 5, 2009 at
hi there,
we tried a few things around this; one suggestion was to run it on a
local machine, so I pulled one of our decent servers and got to work...
but surprisingly we got the same error on the local machine!
So it seems the hardware (VPS/local) wasn't the culprit... probably the
data, or the