Hi everyone,
Does anybody have any idea why I am getting the error below when I run generate
right after injecting into a new crawlId in local mode? (That is not to say
this doesn't happen in deploy mode or on a pre-existing crawlId, I just
haven't tested those.)
2013-03-28 11:06:21,911 INFO crawl.AbstractFetchSchedule - maxInterval=5184000
2013-03-28 11:06:21,963 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2013-03-28 11:06:25,158 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 't1_webpage' , assuming they are the same.
2013-03-28 11:06:25,166 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2013-03-28 11:06:25,286 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2013-03-28 11:06:25,287 WARN mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
    at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:235)
    at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:588)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.nutch.crawl.GeneratorReducer.reduce(GeneratorReducer.java:79)
    at org.apache.nutch.crawl.GeneratorReducer.reduce(GeneratorReducer.java:40)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-03-28 11:06:26,255 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[t1]generate: 1364493979-1392803250, jobid=job_local_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:193)
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:219)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:264)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:272)
The error seems to be coming from here
(https://github.com/apache/gora/blob/trunk/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java):
if (o instanceof StatefulMap) {
  StatefulHashMap<Utf8, ?> map = (StatefulHashMap<Utf8, ?>) o;
  for (Entry<Utf8, State> e : map.states().entrySet()) {
    Utf8 mapKey = e.getKey();
    switch (e.getValue()) {
      case DIRTY:
--->>>>>byte[] qual = Bytes.toBytes(mapKey.toString());
        byte[] val = toBytes(map.get(mapKey), field.schema().getValueType());
        put.add(hcol.getFamily(), qual, val);
        hasPuts = true;
        break;
      case DELETED:
        qual = Bytes.toBytes(mapKey.toString());
        hasDeletes = true;
        delete.deleteColumn(hcol.getFamily(), qual);
        break;
    }
  }
}
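Just to check that I'm reading the trace right, here is a tiny self-contained sketch (plain Java stand-ins, not the real Gora/HBase classes, and the null entry is purely hypothetical) of how that marked line could throw exactly this kind of NullPointerException if the states map ever carried a null key, or, analogously, a DIRTY key whose value is null:

// Hypothetical stand-in for the loop in HBaseStore.put() over map.states():
// if a key (or the value behind a DIRTY key) is null, the toString()/toBytes()
// call on the marked line fails with a NullPointerException.
import java.util.HashMap;
import java.util.Map;

public class NpeSketch {
  enum State { DIRTY, DELETED }

  public static void main(String[] args) {
    Map<String, State> states = new HashMap<>();
    states.put("good-key", State.DIRTY);
    states.put(null, State.DIRTY); // hypothetical bad entry, for illustration only

    for (Map.Entry<String, State> e : states.entrySet()) {
      String mapKey = e.getKey();
      // mirrors: byte[] qual = Bytes.toBytes(mapKey.toString());
      byte[] qual = mapKey.toString().getBytes(); // NPE here when mapKey is null
      System.out.println("qualifier has " + qual.length + " bytes");
    }
  }
}

I'm not claiming that's what actually happens in my crawl, it's just the only way I can see that line producing an NPE on its own; please correct me if the problem is elsewhere (the mapping-file table name mismatch warning above, for instance).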
thanks,