What happens if you run bin/nutch inject urls/ on its own, without the -crawlId argument?
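My reading of the error quoted below, as a guess rather than a confirmed diagnosis: the value passed to -crawlId seems to be used as a prefix for the HBase table name, with "_webpage" appended, so "./crawl/" turns into "./crawl/_webpage". Character <46> is the ASCII code for ".", which is not in the [a-zA-Z_0-9] set the message lists. The sketch below only mimics that naming and the first-character rule in plain shell. Both function names are hypothetical helpers and not part of Nutch, Gora, or HBase, and the "<crawlId>_webpage" pattern is an assumption taken from the error text.

```shell
#!/bin/sh
# Hypothetical helper (not a Nutch command): builds the table name the way
# the error message suggests, i.e. "<crawlId>_webpage". This is an assumption.
table_name_for() {
  printf '%s_webpage' "$1"
}

# Hypothetical helper: applies only the first-character rule quoted in the
# error message ([a-zA-Z_0-9]). The real check may reject other characters too.
starts_with_word_char() {
  case "$1" in
    [a-zA-Z0-9_]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the crawl id from the failing command with a plain identifier.
for id in "./crawl/" "crawl"; do
  name=$(table_name_for "$id")
  if starts_with_word_char "$name"; then
    echo "$name: first character accepted"
  else
    echo "$name: first character rejected"
  fi
done
```

If that reading is right, a crawl id without path characters, for example ./bin/nutch inject urls/ -crawlId crawl, should at least get past this particular check.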
-----Original Message-----
From: Christopher Gross <[email protected]>
To: user <[email protected]>
Sent: Fri, May 17, 2013 11:26 am
Subject: error crawling

I'm having trouble getting my nutch working. I had it on another server and it was working fine. I migrated it to a new server, and I've been getting nothing but problems. My old script wasn't working right (getting a lot of "skipping" on the parser saying that the crawl id was null [a separate point of frustration]), so now I'm trying the 'newer' crawl script.

This one is worse, since I can't even get the inject to work.

urls contains a "seed.txt" file that worked previously and contains a bunch of urls. crawl is empty.

From my $NUTCH_HOME directory:

$ ./bin/nutch inject urls/ -crawlId ./crawl/
InjectorJob: starting
InjectorJob: urlDir: urls
InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:125)
        at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
        ... 7 more
Caused by: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
        at org.apache.hadoop.hbase.HTableDescriptor.isLegalTableName(HTableDescriptor.java:280)
        at org.apache.hadoop.hbase.HTableDescriptor.<init>(HTableDescriptor.java:172)
        at org.apache.hadoop.hbase.HTableDescriptor.<init>(HTableDescriptor.java:158)
        at org.apache.gora.hbase.store.HBaseMapping$HBaseMappingBuilder.build(HBaseMapping.java:171)
        at org.apache.gora.hbase.store.HBaseStore.readMapping(HBaseStore.java:592)
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:111)
        ... 9 more

Where is the "_webpage" coming from? Am I just missing something?

Any help/ideas/references would be appreciated.

Thanks!

-- Chris

