Please search the mailing list for the HBase logging. There was a conversation on this reasonably recently.
Please see my other response for the rest.
hth
Lewis

On Monday, May 20, 2013, Christopher Gross <[email protected]> wrote:
> Ok, so the crawlId isn't like the directories used in the 1.x versions of
> nutch.
>
> Well, changing that line makes that part work. I still get the "Skipping
> <url>; different batch id (null)" error.
>
> I'm not sure if this line from the hadoop.log file relates:
> INFO store.HBaseStore - Keyclass and nameclass match but mismatching table
> names mappingfile schema is 'webpage' vs actual schema 'crawl_webpage' ,
> assuming they are the same.
>
> Any ideas for that one?
>
> -- Chris
>
>
> On Fri, May 17, 2013 at 4:32 PM, Tejas Patil <[email protected]> wrote:
>
>> The exception speaks about the problem:
>>
>> java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal
>> first character <46> at 0.
>> User-space table names can only start with 'word characters': i.e.
>> [a-zA-Z_0-9]: ./crawl/_webpage
>>
>> The crawlId passed must follow the regex [a-zA-Z_0-9]. The one you passed
>> has dot and slash.
>> $ ./bin/nutch inject urls/ -crawlId ./crawl/
>>
>> Try this:
>> $ ./bin/nutch inject urls/ -crawlId crawl
>>
>>
>> On Fri, May 17, 2013 at 12:47 PM, <[email protected]> wrote:
>>
>> > What if you do bin/nutch inject urls/ ?
>> >
>> > -----Original Message-----
>> > From: Christopher Gross <[email protected]>
>> > To: user <[email protected]>
>> > Sent: Fri, May 17, 2013 11:26 am
>> > Subject: error crawling
>> >
>> >
>> > I'm having trouble getting my nutch working. I had it on another server
>> > and it was working fine. I migrated it to a new server, and I've been
>> > getting nothing but problems. My old script wasn't working right (getting
>> > a lot of "skipping" on the parser saying that the crawl id was null [a
>> > separate point of frustration]), so now I'm trying the 'newer' crawl
>> > script. This one is worse, since I can't even get the inject to work.
>> >
>> > urls contains a "seed.txt" file that worked previously and contains a bunch
>> > of urls. crawl is empty.
>> >
>> > from my $NUTCH_HOME directory:
>> >
>> > $ ./bin/nutch inject urls/ -crawlId ./crawl/
>> > InjectorJob: starting
>> > InjectorJob: urlDir: urls
>> > InjectorJob: org.apache.gora.util.GoraException:
>> > java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal
>> > first character <46> at 0. User-space table names can only start with 'word
>> > characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
>> >     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
>> >     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
>> >     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
>> >     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
>> >     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
>> >     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
>> > Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException:
>> > Illegal first character <46> at 0. User-space table names can only start
>> > with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
>> >     at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:125)
>> >     at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
>> >     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
>> >     ... 7 more
>> > Caused by: java.lang.IllegalArgumentException: Illegal first character <46>
>> > at 0. User-space table names can only start with 'word characters': i.e.
>> > [a-zA-Z_0-9]: ./crawl/_webpage
>> >     at org.apache.hadoop.hbase.HTableDescriptor.

--
*Lewis*
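
For anyone who hits the same "Illegal first character" error: the crawlId rule Tejas describes in the quoted thread can be checked before calling inject. This is only a rough sketch. The `is_valid_crawl_id` helper is hypothetical (it is not part of Nutch), and it assumes the stricter reading from the thread, that every character of the crawlId must be in [a-zA-Z_0-9]; the HBase message itself only mentions the first character.

```shell
#!/usr/bin/env bash

# Hypothetical helper, not part of Nutch: succeeds only when the crawl id
# is non-empty and consists solely of characters in [a-zA-Z_0-9].
is_valid_crawl_id() {
  [[ "$1" =~ ^[a-zA-Z0-9_]+$ ]]
}

# The two ids from the thread: the first is rejected, the second accepted.
for crawl_id in "./crawl/" "crawl"; do
  if is_valid_crawl_id "$crawl_id"; then
    echo "ok: '$crawl_id' (table name would be ${crawl_id}_webpage)"
  else
    echo "rejected: '$crawl_id' has characters outside [a-zA-Z_0-9]" >&2
  fi
done
```

Once an id passes, it can be handed to the command from the thread, e.g. `./bin/nutch inject urls/ -crawlId crawl`.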

