Hi everyone

Does anybody have any idea why I am getting this error when I run generate right after injecting into a new crawlId in local mode? (That is not to say it doesn't happen in deploy mode or on a preexisting crawlId; I just haven't tested those.)

2013-03-28 11:06:21,911 INFO crawl.AbstractFetchSchedule - maxInterval=5184000
2013-03-28 11:06:21,963 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2013-03-28 11:06:25,158 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 't1_webpage' , assuming they are the same.
2013-03-28 11:06:25,166 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2013-03-28 11:06:25,286 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2013-03-28 11:06:25,287 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
        at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:235)
        at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:588)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.nutch.crawl.GeneratorReducer.reduce(GeneratorReducer.java:79)
        at org.apache.nutch.crawl.GeneratorReducer.reduce(GeneratorReducer.java:40)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-03-28 11:06:26,255 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[t1]generate: 1364493979-1392803250, jobid=job_local_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:193)
        at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:219)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:264)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:272)



The error seems to be coming from here (the line marked with the arrow):
( https://github.com/apache/gora/blob/trunk/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java )


            if (o instanceof StatefulMap) {
              StatefulHashMap<Utf8, ?> map = (StatefulHashMap<Utf8, ?>) o;
              for (Entry<Utf8, State> e : map.states().entrySet()) {
                Utf8 mapKey = e.getKey();
                switch (e.getValue()) {
                  case DIRTY:
--->>>>>            byte[] qual = Bytes.toBytes(mapKey.toString());
                    byte[] val = toBytes(map.get(mapKey), field.schema().getValueType());
                    put.add(hcol.getFamily(), qual, val);
                    hasPuts = true;
                    break;
                  case DELETED:
                    qual = Bytes.toBytes(mapKey.toString());
                    hasDeletes = true;
                    delete.deleteColumn(hcol.getFamily(), qual);
                    break;
                }
              }
            }
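
One way the marked line could throw, sketched in isolation: if the map's state table ever holds a null key, calling toString() on it raises a NullPointerException. This is only an assumption about the cause and has not been checked against Gora itself. The class NullKeyDemo, its State enum, and the method throwsOnNullKey are hypothetical stand-ins, and a plain HashMap is used in place of StatefulHashMap.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    // Hypothetical stand-in for Gora's State enum (assumption, not the real type).
    enum State { DIRTY, DELETED }

    // Mirrors the shape of the loop quoted above.
    // Returns true if iterating the entries raises a NullPointerException.
    static boolean throwsOnNullKey(Map<CharSequence, State> states) {
        try {
            for (Map.Entry<CharSequence, State> e : states.entrySet()) {
                CharSequence mapKey = e.getKey();
                if (e.getValue() == State.DIRTY) {
                    // Same call pattern as the marked line: toString() on the key.
                    byte[] qual = mapKey.toString().getBytes(StandardCharsets.UTF_8);
                }
            }
            return false;
        } catch (NullPointerException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<CharSequence, State> states = new HashMap<>();
        states.put("ok", State.DIRTY);
        System.out.println(throwsOnNullKey(states)); // false

        // HashMap permits a single null key; this is the assumed failure case.
        states.put(null, State.DIRTY);
        System.out.println(throwsOnNullKey(states)); // true
    }
}
```

If this is what is happening, a null check on mapKey before the switch would confirm it, though that would only hide whatever put the null key there in the first place.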

thanks,
