Hi, can you please point me to a well-documented blog post that explains setting up Apache Nutch 2.2 end to end: crawling, moving the data into a database, and finally searching in Solr? [Configuration is a pain.]
This link [1], written by Thejas, is good and well explained. [Thank you.] However, even after following its steps bit by bit, I get an error when running the first job, the Nutch InjectorJob. The error is shown below. I have seen some discussion of this error on the mailing list, but none of it explains the fix.

I am simply trying to get the default setup working, with no specific database. [So HBase with Gora is fine.] Do I need any configuration outside Eclipse beyond what the wiki page describes? I don't think I have missed any steps, but please correct me if I have. I am also new to all of the technologies involved, so if I need to configure anything, please point me to it.

I was also looking for a blog post (or a pointer to one) that explains how to set up the default database, perhaps HBase via Gora, and what changes need to be made to Solr so that the index job does not fail.

Thanks,
David

[1] https://wiki.apache.org/nutch/RunNutchInEclipse

2014-04-22 15:29:39,797 INFO crawl.InjectorJob (InjectorJob.java:inject(249)) - InjectorJob: starting at 2014-04-22 15:29:39
2014-04-22 15:29:39,799 INFO crawl.InjectorJob (InjectorJob.java:inject(250)) - InjectorJob: Injecting urlDir: /home/David/ApacheNutch/apache-nutch-1.8/URLS
2014-04-22 15:29:40,162 ERROR crawl.InjectorJob (InjectorJob.java:run(276)) - InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:190)
	at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:90)
	at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:74)
	at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
	at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
	at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
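The trace shows Nutch's StorageUtils.getDataStoreClass calling Class.forName on the configured store class, org.apache.gora.sql.store.SqlStore, and the JVM not finding it. A quick way to narrow this down is to check, with the same classpath the job uses, which Gora store classes are actually loadable. The sketch below is a hypothetical standalone helper (StoreClassCheck is not part of Nutch or Gora); the HBaseStore class name is included as an assumed alternative candidate, so verify it against the Gora module you actually build with.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical diagnostic (not part of Nutch): reports whether each candidate
 * data store class can be loaded by name, the same kind of Class.forName
 * lookup that fails in the stack trace.
 */
public class StoreClassCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: resolve and load the class without running static initializers
            Class.forName(className, false, StoreClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is missing.
            // LinkageError (e.g. NoClassDefFoundError): the class exists but a dependency is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default candidates; pass other fully qualified class names as arguments to override.
        List<String> candidates = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList(
                        "org.apache.gora.sql.store.SqlStore",
                        "org.apache.gora.hbase.store.HBaseStore"); // assumed Gora HBase store class
        for (String name : candidates) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT on classpath"));
        }
    }
}
```

Run it with the same jars on the classpath that the Nutch job sees (in Eclipse, the project's build path). If SqlStore reports "NOT on classpath", either the Gora SQL module is missing from the build, or the configured store class should point at a store whose Gora module is present.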

