[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860493#action_12860493 ]

Soila Pertet commented on NUTCH-650:
------------------------------------

I encountered the following NullPointerException while running nutchbase:

2010-04-24 01:58:47,012 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.ImmutableBytesWritable.<init>(ImmutableBytesWritable.java:59)
    at org.apache.nutch.fetcher.Fetcher$FetcherMapper.map(Fetcher.java:81)
    at org.apache.nutch.fetcher.Fetcher$FetcherMapper.map(Fetcher.java:77)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

I checked out nutchbase with
svn co http://svn.apache.org/repos/asf/lucene/nutch/branches/nutchbase
and applied Xiao's patch. I am running hadoop-0.20.3, hbase-0.20.3 and zookeeper-3.2.2.

In my application, the error occurs after the first iteration of the
fetch/generate cycle and is limited to the base URL, whose generator mark is
_csh_, e.g.:
keyvalues={host:http:8080/wikipedia/de/de/index.html/mtdt:_csh_/1272088691273/Put/vlen=4}

It works fine, however, for rows with generator mark __genmrk__, e.g.:
keyvalues={host:http:8080/wikipedia/de/de/images/wikimedia-button.png/mtdt:__genmrk__/1272088714395/Put/vlen=4,
host:http:8080/wikipedia/de/de/images/wikimedia-button.png/mtdt:_csh_/1272088691109/Put/vlen=4}

As a workaround, I modified org.apache.nutch.fetcher.Fetcher$FetcherMapper.map
to check outKeyRaw for null before using it. This masks the error, but I am
not sure it is the right fix. Please let me know.
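The top frame of the trace suggests that a null byte[] reached the ImmutableBytesWritable constructor; its single-argument form delegates using bytes.length, so a null array fails during construction. The sketch below is a minimal, self-contained illustration of that failure mode and of a null guard like the one described above. It is not the actual Fetcher code: BytesKey is a hypothetical stand-in for ImmutableBytesWritable, and wrapOrSkip and the skipped-row list are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NullKeyGuardSketch {

    // Hypothetical stand-in for a bytes wrapper whose single-argument
    // constructor reads bytes.length, so passing null throws a
    // NullPointerException during construction.
    static final class BytesKey {
        private final byte[] bytes;
        private final int offset;
        private final int length;

        BytesKey(byte[] bytes) {
            this(bytes, 0, bytes.length); // NPE here when bytes == null
        }

        BytesKey(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        int length() {
            return length;
        }
    }

    // Hypothetical guard: wrap the raw key only if it is present; otherwise
    // record which row was skipped and return null so the caller can drop it.
    static BytesKey wrapOrSkip(byte[] outKeyRaw, List<String> skipped, String rowId) {
        if (outKeyRaw == null) {
            skipped.add(rowId);
            return null;
        }
        return new BytesKey(outKeyRaw);
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        BytesKey ok = wrapOrSkip(new byte[] {1, 2, 3}, skipped, "row-with-key");
        BytesKey missing = wrapOrSkip(null, skipped, "row-without-key");
        System.out.println(ok.length());     // 3
        System.out.println(missing == null); // true
        System.out.println(skipped);         // [row-without-key]
    }
}
```

Recording the skipped rows, rather than silently dropping them, keeps the affected URLs visible; the guard alone does not explain why outKeyRaw is null for the _csh_ row in the first place.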

Thanks.

> Hbase Integration
> -----------------
>
>                 Key: NUTCH-650
>                 URL: https://issues.apache.org/jira/browse/NUTCH-650
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Doğacan Güney
>            Assignee: Doğacan Güney
>             Fix For: 2.0
>
>         Attachments: hbase-integration_v1.patch, hbase_v2.patch, 
> malformedurl.patch, meta.patch, meta2.patch, nofollow-hbase.patch, 
> NUTCH-650.patch, nutch-habase.patch, searching.diff, slash.patch
>
>
> This issue will track nutch/hbase integration
