Re: Nutch 2.x- Hbase - Solr Configuration

Alparslan Avcı Tue, 22 Apr 2014 03:52:24 -0700

Hi David,

Welcome to the Apache Nutch community :)

You can use the other wiki pages [0] for detailed information on Nutch 2.x crawling. For sample configuration files, see this link [1].

The exception is probably caused by the Gora configuration. You should uncomment the appropriate Gora artifact lines at the end of the [NUTCH_HOME]/conf/ivy.xml file. For example, if you want to use HBase as your database, you should uncomment the lines below:

<dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>
<dependency org="org.apache.gora" name="gora-hbase" rev="0.3" conf="*->default"/>

Moreover, you should also update the "gora.datastore.default" property in the [NUTCH_HOME]/conf/gora.properties file according to your database. For instance, if you use HBase, then you should add this line:


gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
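
After rebuilding, one way to check whether a store class is actually on the classpath is a small standalone test like the sketch below. It is not part of Nutch: the class name StoreClassCheck and its isLoadable method are made up for illustration, and it only mirrors the Class.forName lookup that appears in the stack trace. Treat it as a rough diagnostic under those assumptions.

```java
/**
 * Hypothetical diagnostic (not part of Nutch or Gora): reports whether
 * the given data store classes can be loaded from the current classpath.
 */
public class StoreClassCheck {

    /** Returns true if the named class can be loaded, false otherwise. */
    public static boolean isLoadable(String className) {
        try {
            // Same kind of lookup that fails in the reported stack trace.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the two store classes mentioned in this thread.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.gora.hbase.store.HBaseStore",
            "org.apache.gora.sql.store.SqlStore"
        };
        for (String name : names) {
            String status = isLoadable(name) ? "found" : "NOT on classpath";
            System.out.println(name + " -> " + status);
        }
    }
}
```

Run it with the same classpath you use to launch InjectorJob. If HBaseStore is reported as missing, the gora-hbase dependency was most likely not picked up by the build.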

Please feel free to ask this mailing list about any future problems. We will be glad to help.


Thanks,
Alparslan



[0] https://wiki.apache.org/nutch/Nutch2Crawling
[1] https://wiki.apache.org/nutch/NutchConfigurationFiles-2.x


On 22-04-2014 13:09, David Philip wrote:

Hi,

   Can you please point me to a well-documented blog that explains how to
set up Apache Nutch 2.2 end to end: crawling, moving data to any
database, and finally searching in Solr? [Configuration is a pain.]

This link [1], documented by Thejas, is good and well explained. [Thank you.]
However, even after following the steps mentioned there one by one, there
is an error when running the first "nutch injector job". The error is shown
below. I see some discussion of this error on the mailing list, but none
explains the fix. I am simply trying to get the default setup working, with
no specific database. [So HBase and Gora are OK.] But should I do any specific
configuration for it outside Eclipse other than what is mentioned at the
link? I don't see that I have missed any steps. Please correct me. Also, I
am new to all the technologies here, so if I have to configure anything,
please point me to it.


I was looking for any blog that explains [or otherwise points to] how to
set up the default database, maybe HBase with Gora, and the changes that
need to be made to Solr so that the index job does not fail.


Thanks - David

[1] https://wiki.apache.org/nutch/RunNutchInEclipse


2014-04-22 15:29:39,797 INFO  crawl.InjectorJob (InjectorJob.java:inject(249)) - InjectorJob: starting at 2014-04-22 15:29:39
2014-04-22 15:29:39,799 INFO  crawl.InjectorJob (InjectorJob.java:inject(250)) - InjectorJob: Injecting urlDir: /home/David/ApacheNutch/apache-nutch-1.8/URLS
2014-04-22 15:29:40,162 ERROR crawl.InjectorJob (InjectorJob.java:run(276)) - InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:190)
     at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:90)
     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:74)
     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
