Hi Drulea,

On Sun, Sep 27, 2015 at 7:36 AM, <[email protected]> wrote:
> I’m using nutch 2.3 on OS X 10.9.5 with homebrew.

From the start, I would like to point you to the current release candidate for Nutch 2.3.1. The VOTE is currently open and the release candidate is being tested by the community. There are a number of bugs fixed down in Gora (particularly within the gora-mongodb module) which Nutch 2.3.1 will benefit from. It can be obtained from here:
http://www.mail-archive.com/dev%40nutch.apache.org/msg19271.html

Another thing here is that, AFAIK, we are not publishing Homebrew recipes! Wherever you got your recipe from, I can guarantee you that it is not an official Nutch one! I do, however, see two closed pull requests:

lmcgibbn@LMC-032857 /usr/local(joshua) $ brew search nutch
No formula found for "nutch".
==> Searching pull requests...
Closed pull requests:
Added formula for Apache Nutch (https://github.com/Homebrew/homebrew/pull/26587)
Added Apache Nutch 2.2.1 (https://github.com/Homebrew/homebrew/pull/22004)

Neither of these is from the release managers at Nutch... maybe this is something we should look into.

> I’ve been unable to use the crawl command with MySQL, Mongo, or Cassandra.
> The inject step fails in each configuration with the following arcane
> errors:
>
> 1.) MySQL (after downgrading to gora-core 0.2.1 in ivy.xml as per comments)

The MySQL backend for Gora is broken by now. Things have changed and moved on, with the SQL module being left in the dust. Avro has also moved on significantly, and we now use a MUCH newer version of Avro, so your NoSuchMethodError below is entirely understandable.

> InjectorJob: Injecting urlDir: urls
> [...snip]
>
> 2.) Mongo with default 0.5 gora
>
> InjectorJob: Injecting urlDir: urls
> InjectorJob: org.apache.gora.util.GoraException:
> java.lang.NullPointerException
>
> [...snip]

This is gone in the Nutch 2.3.1 release candidate.

> 3.) Mongo (upgrading to gora 0.6.1 to resolve previous issue above)
>
> InjectorJob: Injecting urlDir: urls
> InjectorJob: java.lang.UnsupportedOperationException: Not implemented by
> the DistributedFileSystem FileSystem implementation
>
> [...snip]

Can you please try with the 2.3.1 release candidate and provide the same feedback?

> 4.) Cassandra using default gora 0.5
>
> InjectorJob: Injecting urlDir: urls
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.avro.Schema.access$1400()Ljava/lang/ThreadLocal;
>
> [...snip]

I've never seen this before. On another note, Renato and I are currently overhauling the gora-cassandra driver, moving it from Hector to the Datastax Java Driver. Work is ongoing here:
https://github.com/renato2099/gora/tree/gora-datastax-cassandra

> Does the “crawl” script inject task work with any backend storage reliably
> on OS X?

Well, we can better answer that question if and when you and more people try out the 2.3.1 release candidate.

> Which backend is the most reliable to use with nutch 2.3?

HBase 0.94.14

> It’s frustrating that 3 common (and supposedly supported) backends don’t
> work with nutch due to arcane errors.

I agree. But let's not throw the baby out with the bathwater here. How about you try out the above and respond, and we can take it from there? It would be great to have more developers submitting patches for the 2.X branch. If you are keen, then it would be great to have you on board.

Thanks
Lewis

