Thanks Ferdy

On Thu, May 31, 2012 at 9:58 PM, George Smith <[email protected]> wrote:
> Hi Ferdy.
>
> That patch is fantastic. I applied the change yesterday morning and
> everything is parsing smoothly. Thanks for your help.
>
>
>
> On Wed, May 30, 2012 at 5:30 AM, Ferdy Galema <[email protected]> wrote:
>
>> I already stumbled upon this some time ago. I just created the following
>> issue with a work-around for it. (Already committed it so you can update
>> your workspace as an alternative to applying the patch).
>>
>> https://issues.apache.org/jira/browse/NUTCH-1379
>>
>>
>> On Tue, May 29, 2012 at 3:57 PM, Lewis John Mcgibbney <
>> [email protected]> wrote:
>>
>> > Hi George,
>> >
>> > How were you executing parsing on this page?
>> >
>> > The toArgMap method in ToolUtil can throw a runtime exception;
>> > however, this doesn't look like the one you're getting.
>> >
>> > What other logging do you have around here, specifically around the
>> > point where the parse method kicks in? This might give us a better
>> > idea of where exactly this is happening.
>> >
>> > Thanks
>> >
>> > Lewis
>> >
>> > On Fri, May 25, 2012 at 9:00 PM, George Smith <[email protected]>
>> > wrote:
>> > > I've been using the nutchgora branch for a few months, so I'm very new
>> > > to it. I've been able to find information on the user or dev list,
>> > > JIRA, or regular web searches for most of the issues I've encountered
>> > > except one.
>> > >
>> > >
>> > >
>> > > This error occurs frequently when parsing, but not always, and there
>> > > doesn't seem to be a common element among the pages it is erroring
>> > > out on.
>> > >
>> > >
>> > >
>> > > I have tried a number of revisions of the nutchgora branch that built
>> > > fine with ant/ivy and also with Eclipse. I also tried the prebuilt
>> > > copies from Jenkins. I've run them on Debian 6.0.5, Ubuntu 10, and
>> > > Ubuntu 12 with OpenJDK and the Sun JDK and always receive the same
>> > > error.
>> > >
>> > >
>> > >
>> > > Can someone shed some light on this or point me in the right direction?
>> > > Thanks.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > The error output to stdout is:
>> > >
>> > > Parsing http://www.site.com/dir/page.html
>> > > Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0001
>> > > at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:47)
>> > > at org.apache.nutch.parse.ParserJob.run(ParserJob.java:242)
>> > > at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:257)
>> > > at org.apache.nutch.parse.ParserJob.run(ParserJob.java:300)
>> > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> > > at org.apache.nutch.parse.ParserJob.main(ParserJob.java:304)
>> > >
>> > >
>> > >
>> > > The error output in the hadoop.log is:
>> > >
>> > > 2012-05-25 13:50:29,635 INFO  parse.ParserJob - Parsing http://www.site.com/dir/page.html
>> > > 2012-05-25 13:50:29,638 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
>> > > 2012-05-25 13:50:29,639 WARN  mapred.LocalJobRunner - job_local_0001
>> > > java.lang.NullPointerException
>> > > at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
>> > > at org.apache.nutch.parse.ParseUtil.process(ParseUtil.java:212)
>> > > at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:123)
>> > > at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:76)
>> > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>> > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> >
>> >
>> >
>> > --
>> > Lewis
>> >
>>



-- 
Lewis
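
The NullPointerException in the hadoop.log trace quoted in this thread is thrown from the Utf8 constructor called at ParseUtil.java:212, which is consistent with a null String being passed in. Below is a minimal sketch of a null guard under that assumption. It is not the patch attached to NUTCH-1379. The Text class is a hypothetical stand-in for org.apache.avro.util.Utf8 so the example runs without Avro on the classpath, and toTextOrNull is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

public class NullSafeUtf8Sketch {

    // Hypothetical stand-in for org.apache.avro.util.Utf8 (assumption:
    // like the class in the trace, it fails when given a null String).
    static final class Text {
        private final byte[] bytes;

        Text(String s) {
            // Throws NullPointerException when s is null.
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical helper: wrap the value only when it is present,
    // so a missing field stays null instead of crashing the mapper.
    static Text toTextOrNull(String value) {
        return value == null ? null : new Text(value);
    }

    public static void main(String[] args) {
        System.out.println(toTextOrNull("Example title"));
        System.out.println(toTextOrNull(null));
    }
}
```

Whether a missing value should be stored as null, stored as an empty string, or skipped entirely depends on how the field is read later, so treat the guard as one option rather than the fix.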
