Hi David,

Sorry to take so long to get back to you. Have you tried this [1] by any chance? It may give you a better idea of how the pieces fit together, and then you can move on to putting all of this into Eclipse.
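To check the suspicion raised earlier in this thread, that Eclipse may be picking up a different gora.properties than the one you edited, a small sketch like the following may help. It assumes Gora reads gora.properties from the classpath; the class name GoraPropsCheck and the helper defaultStore are made up for illustration and are not part of Nutch or Gora.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Properties;

// Hypothetical helper, not part of Nutch or Gora: reports which
// gora.properties the classloader resolves and what store it names.
class GoraPropsCheck {

    /** Returns the gora.datastore.default value from the stream, or null if it is unset. */
    static String defaultStore(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return props.getProperty("gora.datastore.default");
    }

    public static void main(String[] args) throws IOException {
        // Ask the classloader which copy of gora.properties it would load.
        URL url = GoraPropsCheck.class.getClassLoader().getResource("gora.properties");
        if (url == null) {
            System.out.println("gora.properties is not on the classpath");
            return;
        }
        System.out.println("Loaded from: " + url);
        try (InputStream in = url.openStream()) {
            System.out.println("gora.datastore.default = " + defaultStore(in));
        }
    }
}
```

Run it from the same Eclipse project and run configuration as InjectorJob. If the printed location is not the conf/gora.properties you edited, or the property is missing, the conf directory is probably not on the project's classpath.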
Renato M.

[1] http://wiki.apache.org/nutch/Nutch2Tutorial

2014-04-23 14:35 GMT+02:00 David Philip <[email protected]>:

> Hi,
>
> I did a good deal of web searching but hardly found any relevant suggestion to
> resolve this issue and get started. I am stuck setting up Nutch 2.2 with
> any database and integrating it with Apache Gora.
>
> Line failed:
> DataStore<String, WebPage> store =
>     StorageUtils.createWebStore(currentJob.getConfiguration(),
>         String.class, WebPage.class);
>
> Error:
> InjectorJob: java.lang.ClassNotFoundException:
> org.apache.gora.sql.store.SqlStore
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>
> Config done: property file in Eclipse:
> gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
> The necessary changes in ivy are also done.
>
> Where else do changes need to be made or considered, given that it is still
> picking up the SQL store?
>
> I have used Apache Nutch 1.5 and Solr 4. That was pretty straightforward
> for me. I did an svn checkout of the source in Eclipse, created a Java
> project, made the necessary settings in nutch-site xml and configured Solr
> with Tomcat. Finally I ran it and it was successful.
>
> However, with Nutch 2.2 I am unable to move forward. I am trying to
> set up and run the source in Eclipse.
>
> I did the svn checkout of the source and the required configuration of the
> Apache Gora properties file and nutch-site. I think I am missing something
> in the configuration, so it is failing.
> One thing: should I install HBase by any chance? Do I need to
> have Hadoop running for this? [Can you please point me to a link on how
> this should be done with Apache Nutch's Hadoop and HBase built on it? I
> am not clear on this.]
> Should I download Apache Gora separately and follow any specific
> installation steps other than the property configuration that is
> mentioned?
>
> Thanks - David
>
> On Tue, Apr 22, 2014 at 6:51 PM, David Philip <[email protected]> wrote:
>
> > Hi Renato,
> >
> > Yes, running from Eclipse. This is the path of the file and the workspace of
> > Eclipse:
> > home/David/Nutch2.2_WorkSpace/Nutch/conf/gora.properties
> >
> > Here is the line I added to gora.properties:
> > gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
> >
> > Thank you.
> >
> > David.
> >
> > On Tue, Apr 22, 2014 at 5:50 PM, Renato Marroquín Mogrovejo <
> > [email protected]> wrote:
> >
> >> Hi David,
> >>
> >> So where are you running this from? Command line or Eclipse? I think your
> >> classpath is missing the necessary files.
> >> Are you still getting the same exception as before, as if the changes you
> >> made had no effect? This is probably because the gora.properties file being
> >> picked up inside Eclipse is not the same one you have modified.
> >>
> >> Renato M.
> >>
> >> 2014-04-22 14:16 GMT+02:00 David Philip <[email protected]>:
> >>
> >> > Hi Alparslan,
> >> >
> >> > Thank you for the links. I am browsing through them to see what
> >> > configuration I missed that is leading to this exception.
> >> >
> >> > As for the likely causes of the exception that you mentioned, I have
> >> > already done everything, i.e.,
> >> > 1. You should uncomment the suitable Gora artifact lines at the end of
> >> >    [NUTCH_HOME]/conf/ivy.xml file.
> >> > 2. Update the "gora.datastore.default" property in
> >> >    [NUTCH_HOME]/conf/gora.properties
> >> >
> >> > Since these steps are clearly mentioned in the wiki page I was referring
> >> > to [1], they were done.
> >> > So, as I said, I have followed every configuration step mentioned in this
> >> > link bit by bit, and it is after that that I get the error.
> >> >
> >> > Thanks - David
> >> > [1] https://wiki.apache.org/nutch/RunNutchInEclipse
> >> >
> >> > On Tue, Apr 22, 2014 at 4:21 PM, Alparslan Avcı
> >> > <[email protected]> wrote:
> >> >
> >> > > Hi David,
> >> > >
> >> > > Welcome to Apache Nutch Community :)
> >> > >
> >> > > You can use other wiki pages [0] for detailed information on Nutch 2.x
> >> > > crawling. For sample configuration files, you can use this
> >> > > link [1].
> >> > >
> >> > > As for the exception, it probably arises from the Gora
> >> > > configuration. You should uncomment the suitable Gora artifact lines at the
> >> > > end of the [NUTCH_HOME]/conf/ivy.xml file. For example, if you want to use
> >> > > HBase as your database, you should uncomment the lines below:
> >> > >
> >> > > <dependency org="org.apache.gora" name="gora-core" rev="0.3"
> >> > > conf="*->default"/>
> >> > > <dependency org="org.apache.gora" name="gora-hbase" rev="0.3"
> >> > > conf="*->default"/>
> >> > >
> >> > > Moreover, you should also update the "gora.datastore.default" property in the
> >> > > [NUTCH_HOME]/conf/gora.properties file according to your database. For
> >> > > instance, if you use HBase, then you should add this line:
> >> > >
> >> > > gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
> >> > >
> >> > > Please feel free to ask about your future problems on this mailing list.
> >> > > We will be glad if we can help.
> >> > >
> >> > > Thanks,
> >> > > Alparslan
> >> > >
> >> > > [0] https://wiki.apache.org/nutch/Nutch2Crawling
> >> > > [1] https://wiki.apache.org/nutch/NutchConfigurationFiles-2.x
> >> > >
> >> > > On 22-04-2014 13:09, David Philip wrote:
> >> > >
> >> > >> Hi,
> >> > >>
> >> > >> Can you please link me to a well-documented blog that explains
> >> > >> setting up Apache Nutch 2.2 end to end?
> >> > >> Crawling - moving data to any
> >> > >> database and finally to searching in Solr. [Configuration is a pain.]
> >> > >>
> >> > >> This link [1] documented by Thejas is good and well explained. [Thank
> >> > >> you.]
> >> > >> However, even after following the steps mentioned in it bit by bit, there
> >> > >> is an error while running the first "nutch injector job". The error is shown
> >> > >> below. I see some discussion about this error on the mailing list, but none
> >> > >> explains the fix. I am simply trying to have the default setup. No
> >> > >> specific database. [So HBase and Gora are OK.] But should I do any specific
> >> > >> configuration for it outside Eclipse other than what is mentioned on the
> >> > >> link? I don't see that I have missed any steps. Please correct me. Also, I
> >> > >> am new to all the technologies here, so if I have to configure anything,
> >> > >> point me to that.
> >> > >>
> >> > >> I was looking for any blog that may explain [or otherwise redirect me to]
> >> > >> setting up a default database, maybe HBase with Gora, and the changes that
> >> > >> need to be made to Solr so that the index job does not fail.
> >> > >>
> >> > >> Thanks - David
> >> > >>
> >> > >> [1] https://wiki.apache.org/nutch/RunNutchInEclipse
> >> > >>
> >> > >> 2014-04-22 15:29:39,797 INFO crawl.InjectorJob (InjectorJob.java:inject(249)) - InjectorJob: starting at 2014-04-22 15:29:39
> >> > >> 2014-04-22 15:29:39,799 INFO crawl.InjectorJob (InjectorJob.java:inject(250)) - InjectorJob: Injecting urlDir: /home/David/ApacheNutch/apache-nutch-1.8/URLS
> >> > >> 2014-04-22 15:29:40,162 ERROR crawl.InjectorJob (InjectorJob.java:run(276)) - InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
> >> > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
> >> > >>     at java.security.AccessController.doPrivileged(Native Method)
> >> > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
> >> > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
> >> > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
> >> > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
> >> > >>     at java.lang.Class.forName0(Native Method)
> >> > >>     at java.lang.Class.forName(Class.java:190)
> >> > >>     at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:90)
> >> > >>     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:74)
> >> > >>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
> >> > >>     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
> >> > >>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
> >> > >>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> > >>     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)

