Lewis --

Is the DEBUG something set in the conf/log4j.properties file? I have the rootLogger set to INFO,DRFA and the threshold is ALL. Everything else is INFO or WARN (no DEBUGs to be found).

Is there something I should set elsewhere that would be causing this?

I'm still a bit lost on what I need to do for the gora-hbase portion. My gora-hbase-mapping.xml is unchanged. Also, from the nutch-default.xml file:

<property>
  <name>storage.schema.webpage</name>
  <value>webpage</value>
  <description>This value holds the schema name used for Nutch web db.
  Note that Nutch ignores the value in the gora mapping files, and uses
  this as the webpage schema name.
  </description>
</property>

So that would lead me to believe that the gora file is just ignored. If I have the "crawlId" set to "crawlId" -- where do I need to tell nutch to look in the hbase for the "crawlId_webpage"?

-- Chris

On Mon, May 20, 2013 at 11:56 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Please search the mailing list for the HBase logging. There was a
> conversation on this reasonably recently.
>
> Please see my other response for the rest.
> hth
> Lewis
>
> On Monday, May 20, 2013, Christopher Gross <[email protected]> wrote:
> > Ok, so the crawlId isn't like the directories used in the 1.x versions of
> > nutch.
> >
> > Well, changing that line makes that part work. I still get the "Skipping
> > <url>; different batch id (null)" error.
> >
> > I'm not sure if this line from the hadoop.log file relates:
> > INFO store.HBaseStore - Keyclass and nameclass match but mismatching table
> > names mappingfile schema is 'webpage' vs actual schema 'crawl_webpage' ,
> > assuming they are the same.
> >
> > Any ideas for that one?
> >
> > -- Chris
> >
> >
> > On Fri, May 17, 2013 at 4:32 PM, Tejas Patil <[email protected]> wrote:
> >
> >> The exception speaks about the problem:
> >>
> >> java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal
> >> first character <46> at 0.
> >> User-space table names can only start with 'word characters': i.e.
> >> [a-zA-Z_0-9]: ./crawl/_webpage
> >>
> >> The crawlId passed must follow the regex [a-zA-Z_0-9]. The one you passed
> >> has dot and slash.
> >> $ ./bin/nutch inject urls/ -crawlId ./crawl/
> >>
> >> Try this:
> >> $ ./bin/nutch inject urls/ -crawlId crawl
> >>
> >>
> >> On Fri, May 17, 2013 at 12:47 PM, <[email protected]> wrote:
> >>
> >> > What if you do bin/nutch inject urls/ ?
> >> >
> >> > -----Original Message-----
> >> > From: Christopher Gross <[email protected]>
> >> > To: user <[email protected]>
> >> > Sent: Fri, May 17, 2013 11:26 am
> >> > Subject: error crawling
> >> >
> >> > I'm having trouble getting my nutch working. I had it on another server
> >> > and it was working fine. I migrated it to a new server, and I've been
> >> > getting nothing but problems. My old script wasn't working right (getting
> >> > a lot of "skipping" on the parser saying that the crawl id was null [a
> >> > separate point of frustration]), so now I'm trying the 'newer' crawl
> >> > script. This one is worse, since I can't even get the inject to work.
> >> >
> >> > urls contains a "seed.txt" file that worked previously and contains a bunch
> >> > of urls. crawl is empty.
> >> >
> >> > from my $NUTCH_HOME directory:
> >> >
> >> > $ ./bin/nutch inject urls/ -crawlId ./crawl/
> >> > InjectorJob: starting
> >> > InjectorJob: urlDir: urls
> >> > InjectorJob: org.apache.gora.util.GoraException:
> >> > java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal
> >> > first character <46> at 0. User-space table names can only start with 'word
> >> > characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >> >     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
> >> >     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
> >> >     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
> >> >     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
> >> >     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
> >> >     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
> >> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> >     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
> >> > Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException:
> >> > Illegal first character <46> at 0. User-space table names can only start
> >> > with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >> >     at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:125)
> >> >     at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
> >> >     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
> >> >     ... 7 more
> >> > Caused by: java.lang.IllegalArgumentException: Illegal first character <46>
> >> > at 0. User-space table names can only start with 'word characters': i.e.
> >> > [a-zA-Z_0-9]: ./crawl/_webpage
> >> >     at org.apache.hadoop.hbase.HTableDescriptor.
>
> --
> *Lewis*
>
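
For reference, the naming rule that comes up in this thread can be sketched in a few lines of Python. This is only an illustration, not Nutch or HBase code: the helper name `webpage_table_name` is hypothetical, the pattern is the conservative `[a-zA-Z_0-9]` set from the advice quoted in the thread (HBase itself may accept more characters after the first one), and the `<crawlId>_<schema>` join is inferred from the names that appear in the messages here (`./crawl/_webpage`, `crawl_webpage`).

```python
import re

# Conservative pattern taken from the advice in the thread: word characters only.
# Assumption: real HBase may allow additional characters after the first one.
_CRAWL_ID_RE = re.compile(r"[a-zA-Z_0-9]+")


def webpage_table_name(crawl_id, schema="webpage"):
    """Hypothetical helper: build the table name a crawlId would map to.

    `schema` mirrors the storage.schema.webpage value from nutch-default.xml.
    The "<crawlId>_<schema>" form is inferred from the log lines in the
    thread, not taken from the Nutch source.
    """
    if not _CRAWL_ID_RE.fullmatch(crawl_id):
        raise ValueError(
            "crawlId %r contains characters outside [a-zA-Z_0-9]" % crawl_id
        )
    return "%s_%s" % (crawl_id, schema)


if __name__ == "__main__":
    # The corrected invocation from the thread uses "-crawlId crawl".
    print(webpage_table_name("crawl"))
    # The original invocation used "-crawlId ./crawl/", which this check rejects.
    try:
        webpage_table_name("./crawl/")
    except ValueError as err:
        print("rejected:", err)
```

Under these assumptions, a crawlId of `crawl` corresponds to `crawl_webpage`, which matches the "actual schema" reported in the HBaseStore log line, while a path-like value such as `./crawl/` is rejected before it ever reaches HBase.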

