I tried this once, but before I knew it my log file was approaching a gig
within an hour or so!
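
Rather than flipping the rootLogger to DEBUG (as suggested below), maybe I
should scope DEBUG to just the classes that show up in the merge stack trace?
A rough sketch of the lines I'd add to conf/log4j.properties -- the logger
names are just the packages from the trace further down, so treat them as a
guess:

    # leave the existing log4j.rootLogger line at INFO, then add:
    log4j.logger.org.apache.hadoop.mapred=DEBUG
    log4j.logger.org.apache.nutch.segment.SegmentMerger=DEBUG

In theory that should keep hadoop.log from ballooning while still giving
DEBUG detail for the merge job itself.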


> I suggest maybe turning on the debug logs for Hadoop before you do the
> next crawl... you can do this by editing log4j.properties
> and changing the rootLogger from INFO to DEBUG.
>
> On Thu, Nov 5, 2009 at 12:37 AM, Andrzej Bialecki <a...@getopt.org> wrote:
>> fa...@butterflycluster.net wrote:
>>>
>>> Hi there,
>>>
>>> Seems I have some serious problems with Hadoop during the map-reduce for
>>> MergeSegments.
>>>
>>> I am out of ideas on this. Any suggestions would be quite welcome.
>>>
>>> Here is my set up:
>>>
>>> RAM: 4G
>>> JVM HEAP: 2G
>>> mapred.child.java.opts = 1024M
>>> hadoop-0.19.1-core.jar
>>> nutch-1.0
>>> Xen VPS.
>>>
>>> After running a recrawl a few times, I end up with one segment that is
>>> relatively large compared to the new ones last generated. Here is my
>>> segment structure when things blow up after the fifth recrawl:
>>>
>>> segment1 = 674Megs (after several recrawls)
>>> segment2 = 580k (last recrawl)
>>> segment3 = 568k (last recrawl)
>>> segment4 = 584k (last recrawl)
>>> ..
>>> segment8 = 560k (last recrawl)
>>>
>>> When I run mergeSegments everything goes well until we get up to 90% of
>>> the map-reduce and then hit a ThreadDeath; here is the stack trace:
>>>
>>> 2009-11-05 10:54:16,874 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
>>> 2009-11-05 10:54:29,794 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
>>> 2009-11-05 10:54:55,194 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
>>> 2009-11-05 10:57:25,844 WARN  [org.apache.hadoop.mapred.LocalJobRunner] job_local_0001
>>> java.lang.ThreadDeath
>>>        at java.lang.Thread.stop(Thread.java:715)
>>>        at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
>>>        at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
>>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
>>>        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
>>>        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)
>>>
>>> Any suggestions, please!
>>
>> This is a high-level exception that doesn't indicate the nature of the
>> original problem. Is there any other information in hadoop.log or in the
>> task logs (logs/userlogs)?
>>
>> In my experience this sort of thing happens rarely for a relatively small
>> dataset like yours, so you are lucky ;) It could be related to a number of
>> issues: running under Xen, which imposes some limits and slowdowns; a low
>> file descriptor limit (ulimit -n); faulty RAM; or an overheated CPU ...
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>
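
Following up on Andrzej's suggestions, this is roughly how I plan to dig for
the real error before the next run (a sketch; the paths assume the default
Nutch 1.0 local layout, so adjust as needed):

    # check the open-file limit for the user running the crawl
    ulimit -n

    # look for the first real exception in hadoop.log and in the per-task logs
    grep -in "exception\|error" logs/hadoop.log | head
    grep -rin "exception" logs/userlogs | head

If anything useful turns up there I will post it here.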

