Hi,

I’m using nutch 2.3 on OS X 10.9.5 with homebrew.

I’ve been unable to use the crawl command with MySQL, Mongo, or Cassandra. The 
inject step fails in each configuration with the following arcane errors:

1.) MySQL (after downgrading to gora-core 0.2.1 in ivy.xml as per comments)
InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema.access$1400()Ljava/lang/ThreadLocal;
    at org.apache.avro.Schema$Parser.parse(Schema.java:950)
    at org.apache.avro.Schema$Parser.parse(Schema.java:943)
    at org.apache.nutch.storage.WebPage.<clinit>(WebPage.java:30)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
    at org.apache.gora.persistency.impl.BeanFactoryImpl.<init>(BeanFactoryImpl.java:53)
    at org.apache.gora.store.impl.DataStoreBase.initialize(DataStoreBase.java:80)
    at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:146)

2.) Mongo with default 0.5 gora

InjectorJob: Injecting urlDir: urls
InjectorJob: org.apache.gora.util.GoraException: java.lang.NullPointerException
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:169)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:137)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
    at java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:964)
    at org.apache.gora.mongodb.store.MongoStore.getDB(MongoStore.java:243)
    at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:171)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:104)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:163)
    ... 7 more

3.) Mongo (after upgrading to gora 0.6.1 to resolve the previous issue)

InjectorJob: Injecting urlDir: urls
InjectorJob: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
    at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2559)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2569)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:352)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:212)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)

4.) Cassandra using default gora 0.5

InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema.access$1400()Ljava/lang/ThreadLocal;
    at org.apache.avro.Schema$Parser.parse(Schema.java:950)
    at org.apache.avro.Schema$Parser.parse(Schema.java:943)
    at org.apache.nutch.storage.WebPage.<clinit>(WebPage.java:30)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
    at org.apache.gora.persistency.impl.BeanFactoryImpl.<init>(BeanFactoryImpl.java:53)
    at org.apache.gora.store.impl.DataStoreBase.initialize(DataStoreBase.java:80)
    at org.apache.gora.cassandra.store.CassandraStore.initialize(CassandraStore.java:143)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)

Does the inject step of the “crawl” script work reliably with any backend
storage on OS X?

Which backend is the most reliable to use with nutch 2.3?

It’s frustrating that 3 common (and supposedly supported) backends don’t work 
with nutch due to arcane errors.
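
A note on cases 1 and 4: a NoSuchMethodError on a compiler-generated accessor
such as Schema.access$1400 usually suggests that Schema and Schema$Parser are
being loaded from two different Avro builds on the classpath. That is only a
guess on my part, not something I have confirmed. A minimal sketch of a check
follows; the class name WhichJar and its locate helper are hypothetical, and it
assumes the program is run with the same classpath as the inject job. It prints
where each class was loaded from, without running static initializers:

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the location a class was loaded from, or a short note if unknown. */
    static String locate(String className) {
        try {
            // initialize=false: load the class but do not run its static initializers.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (the JDK itself) have no CodeSource.
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults are the two classes named in the stack traces for cases 1 and 4.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.avro.Schema", "org.apache.avro.Schema$Parser"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two lines point at different jars, or more than one Avro jar turns up on
the classpath, that would be consistent with a version conflict.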

Cheers,
Sherban
--
Sherban Drulea, RAND Corporation
Senior Research Software Engineer, Information Services
m5129   x7384