Hi,
I’m using Nutch 2.3 on OS X 10.9.5, installed with Homebrew.
I’ve been unable to use the crawl command with MySQL, Mongo, or Cassandra. The
inject step fails in each configuration with the following arcane errors:
1.) MySQL (after downgrading to gora-core 0.2.1 in ivy.xml as per the comments)
InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema.access$1400()Ljava/lang/ThreadLocal;
at org.apache.avro.Schema$Parser.parse(Schema.java:950)
at org.apache.avro.Schema$Parser.parse(Schema.java:943)
at org.apache.nutch.storage.WebPage.<clinit>(WebPage.java:30)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
at org.apache.gora.persistency.impl.BeanFactoryImpl.<init>(BeanFactoryImpl.java:53)
at org.apache.gora.store.impl.DataStoreBase.initialize(DataStoreBase.java:80)
at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:146)
2.) Mongo with the default Gora 0.5
InjectorJob: Injecting urlDir: urls
InjectorJob: org.apache.gora.util.GoraException: java.lang.NullPointerException
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:169)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:137)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
at java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:964)
at org.apache.gora.mongodb.store.MongoStore.getDB(MongoStore.java:243)
at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:171)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:104)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:163)
... 7 more
3.) Mongo (after upgrading to Gora 0.6.1 to resolve the previous issue)
InjectorJob: Injecting urlDir: urls
InjectorJob: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2559)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2569)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:352)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:212)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
4.) Cassandra using the default Gora 0.5
InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema.access$1400()Ljava/lang/ThreadLocal;
at org.apache.avro.Schema$Parser.parse(Schema.java:950)
at org.apache.avro.Schema$Parser.parse(Schema.java:943)
at org.apache.nutch.storage.WebPage.<clinit>(WebPage.java:30)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
at org.apache.gora.persistency.impl.BeanFactoryImpl.<init>(BeanFactoryImpl.java:53)
at org.apache.gora.store.impl.DataStoreBase.initialize(DataStoreBase.java:80)
at org.apache.gora.cassandra.store.CassandraStore.initialize(CassandraStore.java:143)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Does the "crawl" script's inject task work reliably with any backend storage
on OS X?
Which backend is the most reliable to use with Nutch 2.3?
It’s frustrating that three common (and supposedly supported) backends don’t
work with Nutch due to arcane errors.
Cheers,
Sherban
--
Sherban Drulea, RAND Corporation
Senior Research Software Engineer, Information Services
m5129 x7384