Ok, so the crawlId isn't like the directories used in the 1.x versions of
nutch.
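
As far as I can tell from the exception and the hadoop.log line below, the
crawlId ends up as a prefix of the HBase table name (crawlId + "_webpage"), so
it has to be a plain identifier rather than a path.  A rough sketch of that
check follows.  The function name is made up for illustration and is not part
of Nutch, and the "_webpage" suffix is an assumption taken from the messages in
this thread.  The check is also stricter than HBase's own rule, which only
complains about the first character here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): validate a crawlId and print the
# table name it is assumed to map to, i.e. "<crawlId>_webpage".
table_name_for() {
    crawl_id="$1"
    case "$crawl_id" in
        # Reject an empty id or one with any character outside [a-zA-Z0-9_].
        ''|*[!a-zA-Z0-9_]*)
            echo "invalid crawlId: '$crawl_id'" >&2
            return 1
            ;;
    esac
    printf '%s_webpage\n' "$crawl_id"
}
```

With this, `table_name_for crawl` prints crawl_webpage, while
`table_name_for ./crawl/` fails because of the dot and the slashes.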

Well, changing that line makes that part work.  I still get the "Skipping
<url>; different batch id (null)" error.

I'm not sure if this line from the hadoop.log file relates:
INFO  store.HBaseStore - Keyclass and nameclass match but mismatching table
names  mappingfile schema is 'webpage' vs actual schema 'crawl_webpage' ,
assuming they are the same.

Any ideas for that one?

-- Chris


On Fri, May 17, 2013 at 4:32 PM, Tejas Patil <[email protected]> wrote:

> The exception speaks about the problem:
>
> java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal
> first character <46> at 0.
> User-space table names can only start with 'word characters': i.e.
> [a-zA-Z_0-9]: ./crawl/_webpage
>
> The crawlId passed must match the regex [a-zA-Z_0-9]. The one you passed
> starts with a dot and contains slashes:
> $ ./bin/nutch inject urls/ -crawlId ./crawl/
>
> Try this:
> $ ./bin/nutch inject urls/ -crawlId crawl
>
>
>
> On Fri, May 17, 2013 at 12:47 PM, <[email protected]> wrote:
>
> > What if you do bin/nutch inject urls/ ?
> >
> > -----Original Message-----
> > From: Christopher Gross <[email protected]>
> > To: user <[email protected]>
> > Sent: Fri, May 17, 2013 11:26 am
> > Subject: error crawling
> >
> >
> > I'm having trouble getting my nutch working.  I had it on another server
> > and it was working fine.  I migrated it to a new server, and I've been
> > getting nothing but problems.  My old script wasn't working right (getting
> > a lot of "skipping" on the parser saying that the crawl id was null [a
> > separate point of frustration]), so now I'm trying the 'newer' crawl
> > script.  This one is worse, since I can't even get the inject to work.
> >
> > urls contains a "seed.txt" file that worked previously and contains a bunch
> > of urls.  crawl is empty.
> >
> > from my $NUTCH_HOME directory:
> >
> > $ ./bin/nutch inject urls/ -crawlId ./crawl/
> > InjectorJob: starting
> > InjectorJob: urlDir: urls
> > InjectorJob: org.apache.gora.util.GoraException:
> > java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal
> > first character <46> at 0. User-space table names can only start with 'word
> > characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
> >         at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
> >         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
> >         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
> >         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
> > Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException:
> > Illegal first character <46> at 0. User-space table names can only start
> > with 'word characters': i.e. [a-zA-Z_0-9]: ./crawl/_webpage
> >         at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:125)
> >         at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
> >         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
> >         ... 7 more
> > Caused by: java.lang.IllegalArgumentException: Illegal first character <46>
> > at 0. User-space table names can only start with 'word characters': i.e.
> > [a-zA-Z_0-9]: ./crawl/_webpage
> >         at org.apache.hadoop.hbase.HTableDescriptor.isLegalTableName(HTableDescriptor.java:280)
> >         at org.apache.hadoop.hbase.HTableDescriptor.<init>(HTableDescriptor.java:172)
> >         at org.apache.hadoop.hbase.HTableDescriptor.<init>(HTableDescriptor.java:158)
> >         at org.apache.gora.hbase.store.HBaseMapping$HBaseMappingBuilder.build(HBaseMapping.java:171)
> >         at org.apache.gora.hbase.store.HBaseStore.readMapping(HBaseStore.java:592)
> >         at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:111)
> >         ... 9 more
> >
> > Where is the "_webpage" coming from?  Am I just missing something?
> >
> > Any help/ideas/references would be appreciated.
> >
> > Thanks!
> >
> > -- Chris
> >
> >
> >
>
