Lewis --
Is the DEBUG level something that gets set in the conf/log4j.properties file?
I have the rootLogger set to INFO,DRFA and the threshold set to ALL.
Everything else is INFO or WARN (there are no DEBUGs to be found).

Is there something I should set elsewhere that would be causing this?

I'm still a bit lost on what I need to do for the gora-hbase portion.  My
gora-hbase-mapping.xml is unchanged.  Also, from the nutch-default.xml file:
<property>
  <name>storage.schema.webpage</name>
  <value>webpage</value>
  <description>This value holds the schema name used for Nutch web db.
  Note that Nutch ignores the value in the gora mapping files, and uses
  this as the webpage schema name.
  </description>
</property>

So that would lead me to believe that the gora file is just ignored.
If I have the "crawlId" set to "crawlId", where do I need to tell nutch
to look in HBase for the "crawlId_webpage" table?
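
For reference, here is how I understand the table name gets built, going by
the error and log lines quoted below ("./crawl/_webpage" and
"crawl_webpage").  This is only a sketch: the function names table_name and
valid_first_char are made up for illustration and are not part of nutch, and
falling back to the plain schema name when no crawlId is given is my
assumption.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nutch or Gora.

# Compose "<crawlId>_<schema>", as the quoted messages suggest.
# Assumption: with an empty crawlId the plain schema name is used.
table_name() {
  crawl_id=$1
  schema=${2:-webpage}
  if [ -n "$crawl_id" ]; then
    printf '%s_%s\n' "$crawl_id" "$schema"
  else
    printf '%s\n' "$schema"
  fi
}

# Succeed only if the name starts with a "word character" [a-zA-Z_0-9],
# which is the rule quoted in the HBase error message.
valid_first_char() {
  case $1 in
    [a-zA-Z_0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Try the crawlId values that appear in this thread.
for id in crawl crawlId ./crawl/; do
  name=$(table_name "$id")
  if valid_first_char "$name"; then
    echo "ok:       $name"
  else
    echo "rejected: $name"
  fi
done
```

If that is right, a crawlId of "crawlId" should map to the table
"crawlId_webpage" without any extra setting, and only "./crawl/" gets
rejected.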


-- Chris


On Mon, May 20, 2013 at 11:56 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Please search the mailing list for the HBase logging. There was a
> conversation on this reasonably recently.
>
> Please see my other response for the rest.
> hth
> Lewis
>
> On Monday, May 20, 2013, Christopher Gross <[email protected]> wrote:
> > Ok, so the crawlId isn't like the directories used in the 1.x versions of
> > nutch.
> >
> > Well, changing that line makes that part work.  I still get the "Skipping
> > <url>; different batch id (null)" error.
> >
> > I'm not sure if this line from the hadoop.log file relates:
> > INFO  store.HBaseStore - Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'crawl_webpage' , assuming they are the same.
> >
> > Any ideas for that one?
> >
> > -- Chris
> >
> >
> > On Fri, May 17, 2013 at 4:32 PM, Tejas Patil <[email protected]> wrote:
> >
> >> The exception speaks about the problem:
> >>
> >> java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0.
> >> User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >>
> >> The crawlId is used as the prefix of the table name, so it must start
> >> with a character matching [a-zA-Z_0-9]. The one you passed starts with
> >> a dot and also contains slashes.
> >> $ ./bin/nutch inject urls/ -crawlId ./crawl/
> >>
> >> Try this:
> >> $ ./bin/nutch inject urls/ -crawlId crawl
> >>
> >>
> >>
> >> On Fri, May 17, 2013 at 12:47 PM, <[email protected]> wrote:
> >>
> >> > What if you do bin/nutch inject urls/ ?
> >> >
> >> > -----Original Message-----
> >> > From: Christopher Gross <[email protected]>
> >> > To: user <[email protected]>
> >> > Sent: Fri, May 17, 2013 11:26 am
> >> > Subject: error crawling
> >> >
> >> >
> >> > I'm having trouble getting my nutch working.  I had it on another server
> >> > and it was working fine.  I migrated it to a new server, and I've been
> >> > getting nothing but problems.  My old script wasn't working right (getting
> >> > a lot of "skipping" on the parser saying that the crawl id was null [a
> >> > separate point of frustration]), so now I'm trying the 'newer' crawl
> >> > script.  This one is worse, since I can't even get the inject to work.
> >> >
> >> > urls contains a "seed.txt" file that worked previously and contains a bunch
> >> > of urls.  crawl is empty.
> >> >
> >> > from my $NUTCH_HOME directory:
> >> >
> >> > $ ./bin/nutch inject urls/ -crawlId ./crawl/
> >> > InjectorJob: starting
> >> > InjectorJob: urlDir: urls
> >> > InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
> >> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
> >> >         at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
> >> >         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
> >> >         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
> >> >         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
> >> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> >         at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
> >> > Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >> >         at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:125)
> >> >         at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
> >> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
> >> >         ... 7 more
> >> > Caused by: java.lang.IllegalArgumentException: Illegal first character <46> at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >> >         at org.apache.hadoop.hbase.HTableDescriptor.
>
> --
> *Lewis*
>
