I'm having trouble getting Nutch working.  It ran fine on another server,
but since I migrated it to a new server I've had nothing but problems.  My
old script stopped working properly (the parser logged a lot of "skipping"
messages saying that the crawl id was null, which is a separate point of
frustration), so now I'm trying the 'newer' crawl script.  This one is
worse, since I can't even get the inject step to work.

The urls directory contains a "seed.txt" file with a bunch of URLs; the
same file worked previously.  The crawl directory is empty.

From my $NUTCH_HOME directory:

$ ./bin/nutch inject urls/ -crawlId ./crawl/
InjectorJob: starting
InjectorJob: urlDir: urls
InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:125)
        at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
        ... 7 more
Caused by: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
        at org.apache.hadoop.hbase.HTableDescriptor.isLegalTableName(HTableDescriptor.java:280)
        at org.apache.hadoop.hbase.HTableDescriptor.<init>(HTableDescriptor.java:172)
        at org.apache.hadoop.hbase.HTableDescriptor.<init>(HTableDescriptor.java:158)
        at org.apache.gora.hbase.store.HBaseMapping$HBaseMappingBuilder.build(HBaseMapping.java:171)
        at org.apache.gora.hbase.store.HBaseStore.readMapping(HBaseStore.java:592)
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:111)
        ... 9 more

Where is the "_webpage" coming from?  Am I just missing something?
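
Going only by the message text, the rejected name looks like the -crawlId
value with "_webpage" appended, and character <46> is "." in ASCII.  That
reading is an assumption drawn from the trace, not something confirmed
against the Nutch or Gora sources.  Below is a minimal shell sketch of the
first-character rule the message quotes; the function name
starts_with_word_char is made up for illustration, and it checks only the
first character, not whatever other rules HBase may apply to the rest of
the name.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch, Gora, or HBase): succeeds when
# the first character of the argument is in [a-zA-Z_0-9], the set quoted
# in the error message.  An empty argument is rejected.
starts_with_word_char() {
  case "$1" in
    [a-zA-Z_0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the value passed on the command line above with a plain name.
for id in "./crawl/" "crawl"; do
  if starts_with_word_char "$id"; then
    echo "$id: first character is allowed"
  else
    echo "$id: first character is rejected"
  fi
done
```

If that reading is right, a -crawlId value that is a plain name rather
than a path would at least get past this particular check.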

Any help/ideas/references would be appreciated.

Thanks!

-- Chris
