Hi,

  I searched the web extensively but found hardly any relevant suggestions to
resolve this issue and get started. I am stuck setting up Nutch 2.2 with
any database and integrating it with Apache Gora.

Line failed:
 DataStore<String, WebPage> store =
     StorageUtils.createWebStore(currentJob.getConfiguration(),
         String.class, WebPage.class);

Error:
InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

Configuration done: in the properties file in Eclipse I set
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
The necessary changes in ivy.xml are also done.

Where else do changes need to be made or considered, given that it is still
picking up the SQL store?
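
Following Renato's suggestion (quoted below) that Eclipse may be picking up a
different gora.properties from the one I edited, here is a minimal sketch of a
check that could be run from the same Eclipse project. It uses only the JDK
(Java 7 or later) and prints which copy of gora.properties the class loader
resolves and the store it names. The class name GoraPropsCheck and the method
describe are hypothetical names for illustration, not part of Nutch or Gora.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Properties;

public class GoraPropsCheck {

    // Reports where `resource` resolves on the classpath and which
    // default data store that copy of the file names.
    static String describe(ClassLoader loader, String resource) {
        URL url = loader.getResource(resource);
        if (url == null) {
            return resource + ": not found on the classpath";
        }
        Properties props = new Properties();
        try (InputStream in = url.openStream()) {
            props.load(in);
        } catch (IOException e) {
            return url + ": could not be read (" + e.getMessage() + ")";
        }
        return url + ": gora.datastore.default="
                + props.getProperty("gora.datastore.default", "(not set)");
    }

    public static void main(String[] args) {
        // Run with the same classpath / run configuration as InjectorJob.
        System.out.println(
                describe(GoraPropsCheck.class.getClassLoader(), "gora.properties"));
    }
}
```

If the printed location is not the conf/gora.properties that was edited, or the
value is not the HBaseStore line, then the edited file is not the one being
read at run time.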



I have used Apache Nutch 1.5 and Solr 4 before, and that was pretty
straightforward. I did an svn checkout of the source in Eclipse, created a Java
project, made the necessary settings in nutch-site.xml and configured Solr
with Tomcat. Finally I ran it and it was successful.


However, with Nutch 2.2 I am unable to move forward. I am trying to set up
and run the source in Eclipse.

I did the svn checkout of the source and the required configuration in the
Apache Gora properties file and nutch-site.xml. I think I am missing something
in the configuration, which is why it is failing.
Do I need to install HBase? Do I need to have Hadoop running for this? [Can
you please point me to a link on how this should be done with the Hadoop and
HBase that Apache Nutch builds on? I am not clear on this.]
Should I download Apache Gora separately and follow any specific installation
steps beyond setting the properties that are mentioned?

Thanks - David







On Tue, Apr 22, 2014 at 6:51 PM, David Philip
<[email protected]> wrote:

> Hi Renato,
>
>   Yes, I am running from Eclipse. This is the path of the file in the
> Eclipse workspace:
> home/David/Nutch2.2_WorkSpace/Nutch/conf/gora.properties
>
> Here is the line I added to gora.properties:
> gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
>
> Thank you.
>
> David.
>
>
>
>
>
>
>
>
>
>
>
> On Tue, Apr 22, 2014 at 5:50 PM, Renato Marroquín Mogrovejo <
> [email protected]> wrote:
>
>> Hi David,
>>
>> So where are you running this from: the command line or Eclipse? I think
>> your classpath is missing the necessary files.
>> Are you still getting the same exception as before, as if the changes you
>> made had no effect? This is probably because the gora.properties file being
>> picked up inside Eclipse is not the same one you modified.
>>
>>
>> Renato M.
>>
>>
>>
>> 2014-04-22 14:16 GMT+02:00 David Philip <[email protected]>:
>>
>> > Hi Alparslan,
>> >
>> >   Thank you for the links. I am browsing through them to see what
>> > configuration I missed that is leading to this exception.
>> >
>> >
>> > As for what you suggested as the likely reason for the exception, I
>> > have already done everything, i.e.,
>> > 1. You should uncomment the suitable Gora artifact lines at the end of
>> > [NUTCH_HOME]/conf/ivy.xml file.
>> > 2. Update the "gora.datastore.default" property in
>> > [NUTCH_HOME]/conf/gora.properties
>> >
>> >
>> > Since these steps are clearly mentioned in the wiki page I was referring
>> > to [1], they were done.
>> > So as I said, I have followed, bit by bit, every configuration step
>> > mentioned in this link, and the error occurs after that.
>> >
>> > Thanks - David
>> > [1] https://wiki.apache.org/nutch/RunNutchInEclipse
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Apr 22, 2014 at 4:21 PM, Alparslan Avcı
>> > <[email protected]> wrote:
>> >
>> > > Hi David,
>> > >
>> > > Welcome to Apache Nutch Community :)
>> > >
>> > >
>> > > You can use other wiki pages [0] for detailed information on Nutch 2.x
>> > > crawling. For sample configuration files, you can use this link [1].
>> > >
>> > > The exception is probably caused by the Gora configuration. You should
>> > > uncomment the suitable Gora artifact lines at the end of the
>> > > [NUTCH_HOME]/conf/ivy.xml file. For example, if you want to use
>> > > HBase as your database, you should uncomment the lines below:
>> > >
>> > > <dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>
>> > > <dependency org="org.apache.gora" name="gora-hbase" rev="0.3" conf="*->default"/>
>> > >
>> > >
>> > > Moreover, you should also update the "gora.datastore.default" property
>> > > in the [NUTCH_HOME]/conf/gora.properties file according to your database.
>> > > For instance, if you use HBase, then you should add this line:
>> > >
>> > > gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
>> > >
>> > >
>> > > Please feel free to ask this mailing list about any future problems.
>> > > We will be glad to help if we can.
>> > >
>> > > Thanks,
>> > > Alparslan
>> > >
>> > >
>> > >
>> > > [0] https://wiki.apache.org/nutch/Nutch2Crawling
>> > > [1] https://wiki.apache.org/nutch/NutchConfigurationFiles-2.x
>> > >
>> > >
>> > >
>> > > On 22-04-2014 13:09, David Philip wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >>    Can you please point me to a well-documented blog that explains
>> > >> setting up Apache Nutch 2.2 end to end: crawling, moving data to any
>> > >> database, and finally searching in Solr? [Configuration is a pain.]
>> > >>
>> > >> This link [1], documented by Thejas, is good and well explained. [Thank
>> > >> you.]
>> > >> However, even after following the steps mentioned in it bit by bit,
>> > >> there is an error while running the first "nutch injector job". The
>> > >> error is given below. I see some discussion about this error on the
>> > >> mailing list, but none explains the fix. I am simply trying to have the
>> > >> default setup, with no specific database. [So HBase and Gora are OK.]
>> > >> But should I do any specific configuration for it outside Eclipse other
>> > >> than what is mentioned on the link? I don't see that I have missed any
>> > >> steps. Please correct me. Also, I am new to all the technologies here,
>> > >> so if I have to configure anything, please point me to that.
>> > >>
>> > >>
>> > >> I was looking for any blog that may explain [or otherwise redirect to]
>> > >> setting up a default database, maybe HBase with Gora, and the changes
>> > >> that need to be made to Solr so that the index job does not fail.
>> > >>
>> > >>
>> > >> Thanks - David
>> > >>
>> > >> [1] https://wiki.apache.org/nutch/RunNutchInEclipse
>> > >>
>> > >>
>> > >> 2014-04-22 15:29:39,797 INFO  crawl.InjectorJob (InjectorJob.java:inject(249)) - InjectorJob: starting at 2014-04-22 15:29:39
>> > >> 2014-04-22 15:29:39,799 INFO  crawl.InjectorJob (InjectorJob.java:inject(250)) - InjectorJob: Injecting urlDir: /home/David/ApacheNutch/apache-nutch-1.8/URLS
>> > >> 2014-04-22 15:29:40,162 ERROR crawl.InjectorJob (InjectorJob.java:run(276)) - InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
>> > >>      at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>> > >>      at java.security.AccessController.doPrivileged(Native Method)
>> > >>      at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>> > >>      at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>> > >>      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>> > >>      at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>> > >>      at java.lang.Class.forName0(Native Method)
>> > >>      at java.lang.Class.forName(Class.java:190)
>> > >>      at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:90)
>> > >>      at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:74)
>> > >>      at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
>> > >>      at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
>> > >>      at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
>> > >>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> > >>      at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
>> > >>
>> > >>
>> > >
>> >
>>
>
>
