OK, so the crawlId isn't a directory path like the ones used in the 1.x versions of Nutch.
Well, changing that line makes that part work. I still get the
"Skipping <url>; different batch id (null)" error.

I'm not sure if this line from the hadoop.log file relates:

INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'crawl_webpage' , assuming they are the same.

Any ideas for that one?

-- Chris

On Fri, May 17, 2013 at 4:32 PM, Tejas Patil <[email protected]> wrote:

> The exception speaks about the problem:
>
> java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0.
> User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
>
> The crawlId passed must follow the regex [a-zA-Z_0-9]. The one you passed
> has dot and slash.
> $ ./bin/nutch inject urls/ -crawlId ./crawl/
>
> Try this:
> $ ./bin/nutch inject urls/ -crawlId crawl
>
> On Fri, May 17, 2013 at 12:47 PM, <[email protected]> wrote:
>
> > What if you do bin/nutch inject urls/ ?
> >
> > -----Original Message-----
> > From: Christopher Gross <[email protected]>
> > To: user <[email protected]>
> > Sent: Fri, May 17, 2013 11:26 am
> > Subject: error crawling
> >
> > I'm having trouble getting my nutch working. I had it on another server
> > and it was working fine. I migrated it to a new server, and I've been
> > getting nothing but problems. My old script wasn't working right (getting
> > a lot of "skipping" on the parser saying that the crawl id was null [a
> > separate point of frustration]), so now I'm trying the 'newer' crawl
> > script. This one is worse, since I can't even get the inject to work.
> >
> > urls contains a "seed.txt" file that worked previously and contains a bunch
> > of urls. crawl is empty.
> >
> > from my $NUTCH_HOME directory:
> >
> > $ ./bin/nutch inject urls/ -crawlId ./crawl/
> > InjectorJob: starting
> > InjectorJob: urlDir: urls
> > InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
> >         at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
> >         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
> >         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
> >         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
> > Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >         at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:125)
> >         at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
> >         ... 7 more
> > Caused by: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >         at org.apache.hadoop.hbase.HTableDescriptor.isLegalTableName(HTableDescriptor.java:280)
> >         at org.apache.hadoop.hbase.HTableDescriptor.<init>(HTableDescriptor.java:172)
> >         at org.apache.hadoop.hbase.HTableDescriptor.<init>(HTableDescriptor.java:158)
> >         at org.apache.gora.hbase.store.HBaseMapping$HBaseMappingBuilder.build(HBaseMapping.java:171)
> >         at org.apache.gora.hbase.store.HBaseStore.readMapping(HBaseStore.java:592)
> >         at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:111)
> >         ... 9 more
> >
> > Where is the "_webpage" coming from? Am I just missing something?
> >
> > Any help/ideas/references would be appreciated.
> >
> > Thanks!
> >
> > -- Chris
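
For anyone hitting the same exception: judging by the two table names that appear in this thread ('./crawl/_webpage' and 'crawl_webpage'), the table name looks like the crawlId followed by "_webpage". Below is a rough sketch of a pre-check to run before calling inject. The function names are made up for illustration and are not part of Nutch. The check is also stricter than the HBase message strictly requires, since it follows the advice above and restricts every character (not only the first) to [a-zA-Z_0-9].

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nutch or HBase.

# Succeeds only when the crawl id is non-empty and made up entirely of
# the characters named in the exception message: [a-zA-Z_0-9].
# (The character ranges assume a C/POSIX-like locale.)
is_valid_crawl_id() {
  case "$1" in
    '' | *[!a-zA-Z0-9_]*) return 1 ;;
  esac
  return 0
}

# Assumption drawn from the log output in this thread: the table name
# is the crawl id with "_webpage" appended.
table_name_for() {
  printf '%s_webpage\n' "$1"
}
```

Usage would be something like `is_valid_crawl_id "$id" && ./bin/nutch inject urls/ -crawlId "$id"`, so that an id such as ./crawl/ is rejected before it ever reaches HBase.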

