Hello, everybody.
I am seeing rather strange behavior with Nutch 2.3: even the initial Inject job
fails with the exception shown in the log below.
The whole Hadoop infrastructure is up and running:
root@5e7ca0b0c19d:~# jps
2810 NutchServer
1071 SecondaryNameNode
99 QuorumPeerMain
1694 ResourceManager
4598 Jps
795 NameNode
2243 HMaster
2376 HRegionServer
2669 ThriftServer
1789 NodeManager
913 DataNode
Nutch itself appears to be configured correctly, because with the same
configuration I was able to crawl some pages and see the data in Solr.
If I understand correctly, one of the goals of InjectorJob is to create the
'webpage' table in HBase. However, the HBase shell shows 0 tables.
Do you have any ideas about what is wrong here and what should be done to fix it?
2015-04-29 13:23:58,978 INFO crawl.InjectorJob - InjectorJob: starting at
2015-04-29 13:23:58
2015-04-29 13:23:58,979 INFO crawl.InjectorJob - InjectorJob: Injecting
urlDir: ram.txt
2015-04-29 13:24:01,434 ERROR store.HBaseStore -
org.apache.hadoop.hbase.TableExistsException: webpage
2015-04-29 13:24:01,434 ERROR store.HBaseStore -
[Ljava.lang.StackTraceElement;@6a19905e
2015-04-29 13:24:01,454 INFO crawl.InjectorJob - InjectorJob: Using class
org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
2015-04-29 13:24:01,520 WARN util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2015-04-29 13:24:01,607 WARN snappy.LoadSnappy - Snappy native library not
loaded
2015-04-29 13:24:02,501 ERROR store.HBaseStore -
org.apache.hadoop.hbase.TableExistsException: webpage
2015-04-29 13:24:02,501 ERROR store.HBaseStore -
[Ljava.lang.StackTraceElement;@523b3317
2015-04-29 13:24:02,813 INFO regex.RegexURLNormalizer - can't find rules for
scope 'inject', using default
2015-04-29 13:24:02,986 WARN
client.HConnectionManager$HConnectionImplementation - Encountered problems when
prefetch META table:
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for
table: webpage, row=webpage,,99999999999999
at
org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:151)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1059)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1121)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:155)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:129)
at
org.apache.gora.hbase.store.HBaseTableConnection$1.<init>(HBaseTableConnection.java:87)
at
org.apache.gora.hbase.store.HBaseTableConnection.getTable(HBaseTableConnection.java:87)
at
org.apache.gora.hbase.store.HBaseTableConnection.put(HBaseTableConnection.java:186)
at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:260)
at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:79)
at
org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at
org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:188)
at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:82)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-04-29 13:24:02,996 ERROR store.HBaseStore - webpage
2015-04-29 13:24:02,996 ERROR store.HBaseStore -
[Ljava.lang.StackTraceElement;@f757c05
2015-04-29 13:24:03,009 WARN mapred.FileOutputCommitter - Output path is null
in cleanup
2015-04-29 13:24:03,073 INFO crawl.InjectorJob - InjectorJob: total number of
urls rejected by filters: 0
2015-04-29 13:24:03,073 INFO crawl.InjectorJob - InjectorJob: total number of
urls injected after normalization and filtering: 1
2015-04-29 13:24:03,075 INFO crawl.InjectorJob - Injector: finished at
2015-04-29 13:24:03, elapsed: 00:00:04
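
The log above reports both a TableExistsException and a TableNotFoundException for the same 'webpage' table. To make that kind of contradiction easier to spot in a longer log, here is a minimal sketch that counts the exception class names in a piece of log text. It is not part of Nutch, Gora, or HBase; the names `EXCEPTION_RE` and `count_exceptions` are hypothetical, and the sample text is just a few lines copied from the log above.

```python
import re
from collections import Counter

# Matches fully qualified Java exception class names, for example
# org.apache.hadoop.hbase.TableExistsException (hypothetical helper regex).
EXCEPTION_RE = re.compile(r"\b((?:[a-z_][a-z0-9_]*\.)+[A-Z]\w*Exception)\b")


def count_exceptions(log_text):
    """Return a Counter mapping exception class name to number of occurrences."""
    return Counter(EXCEPTION_RE.findall(log_text))


# A few lines copied from the log in this message.
sample = """\
2015-04-29 13:24:01,434 ERROR store.HBaseStore -
org.apache.hadoop.hbase.TableExistsException: webpage
2015-04-29 13:24:02,501 ERROR store.HBaseStore -
org.apache.hadoop.hbase.TableExistsException: webpage
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for
table: webpage, row=webpage,,99999999999999
"""

counts = count_exceptions(sample)
for name, n in counts.most_common():
    print(n, name)
```

Run against the full log file instead of `sample`, this only summarizes which exceptions appear and how often. It does not explain why HBase reports the table as both existing and missing.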
Alexander Baranov