Hi Talat,
Here are my installation steps. Let me know if anything is unclear!
Best,
Adam
cd ~/Downloads
wget http://mirror.softaculous.com/apache/nutch/2.2.1/apache-nutch-2.2.1-src.tar.gz
tar -zxvf apache-nutch-2.2.1-src.tar.gz
cd ~/Downloads
wget http://archive.apache.org/dist/hbase/hbase-0.90.4/hbase-0.90.4.tar.gz
tar -zxvf hbase-0.90.4.tar.gz
cd ~/Downloads
wget http://archive.apache.org/dist/lucene/solr/4.7.1/solr-4.7.1.zip
unzip solr-4.7.1.zip -d ~/Downloads
mkdir ~/Downloads/hbase_rootdir
mkdir ~/Downloads/hbase_zookeeper
gedit ~/Downloads/hbase-0.90.4/conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>~/Downloads/hbase_rootdir</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>~/Downloads/hbase_zookeeper</value>
</property>
</configuration>
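A side note on the two <value> entries above: as far as I know, "~" is expanded by the shell and not by whatever reads hbase-site.xml, so absolute paths may be safer there. A minimal sketch that prints candidate values (the helper name expand_tilde is made up for illustration, and the file:// prefix for hbase.rootdir is my assumption about the usual local-filesystem form):

```shell
# Replace a leading "~" with $HOME; leave every other path unchanged.
expand_tilde() {
  case "$1" in
    "~")   printf '%s\n' "$HOME" ;;
    "~"/*) printf '%s/%s\n' "$HOME" "${1#??}" ;;  # ${1#??} drops the leading "~/"
    *)     printf '%s\n' "$1" ;;
  esac
}

# Candidate values to paste into hbase-site.xml
echo "hbase.rootdir: file://$(expand_tilde '~/Downloads/hbase_rootdir')"
echo "hbase.zookeeper.property.dataDir: $(expand_tilde '~/Downloads/hbase_zookeeper')"
```

The printed paths depend on $HOME, so they will differ per machine.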
gedit ~/Downloads/apache-nutch-2.2.1/conf/nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
<description>Default class for storing data</description>
</property>
</configuration>
gedit ~/Downloads/apache-nutch-2.2.1/ivy/ivy.xml
<!-- Uncomment this to use HBase as Gora backend -->
<dependency org="org.apache.gora" name="gora-hbase" rev="0.3"
conf="*->default" />
gedit ~/Downloads/apache-nutch-2.2.1/conf/gora.properties
# Add this to use HBase as Gora backend
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
cd ~/Downloads/apache-nutch-2.2.1/
ant runtime
cd ~/Downloads/hbase-0.90.4/
export JAVA_HOME=/usr/lib/jvm/java-7-oracle/
./bin/hbase shell
exit
cd ~/Downloads/apache-nutch-2.2.1/runtime/local
bin/nutch
gedit ~/Downloads/apache-nutch-2.2.1/runtime/local/conf/nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
<description>Default class for storing data</description>
</property>
<property>
<name>http.agent.name</name>
<value>My Nutch Spider</value>
</property>
</configuration>
cd ~/Downloads/apache-nutch-2.2.1/runtime/local
mkdir -p urls
cd urls
gedit seed.txt
http://nutch.apache.org/
gedit ~/Downloads/apache-nutch-2.2.1/conf/regex-urlfilter.txt
# accept anything else
+^http://([a-z0-9]*\.)*nutch.apache.org/
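To double-check that the seed URL gets through the accept rule above, the pattern can be tried from the shell. This is only a rough sketch: grep -E uses POSIX extended regexes while Nutch uses Java regexes, but for a pattern this simple I would expect both to agree. The function name url_accepted is made up for illustration, and the leading "+" (the Nutch "accept" marker) is not part of the regex.

```shell
# Succeeds if the URL matches the accept pattern from regex-urlfilter.txt.
url_accepted() {
  printf '%s\n' "$1" | grep -Eq '^http://([a-z0-9]*\.)*nutch.apache.org/'
}

url_accepted 'http://nutch.apache.org/' && echo "seed URL matches the accept rule"
url_accepted 'http://example.com/' || echo "other hosts do not match"
```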
#Set SOLR home
export SOLR_HOME=~/Downloads/solr-4.7.1/solr/example/solr
cd ~/Downloads/solr-4.7.1/example
java -jar start.jar
http://localhost:8983/solr/admin/
CTRL + C
mv ~/Downloads/solr-4.7.1/solr/example/solr/collection1/conf/schema.xml ~/Downloads/solr-4.7.1/solr/example/solr/collection1/conf/schema.xml.bnk
cp ~/Downloads/apache-nutch-2.2.1/conf/schema.xml ~/Downloads/solr-4.7.1/solr/example/solr/collection1/conf/schema.xml
cd ~/Downloads/solr-4.7.1/example
java -jar start.jar
http://localhost:8983/solr/admin/
CTRL + SHIFT + T
cd ~/Downloads/hbase-0.90.4/
export JAVA_HOME=/usr/lib/jvm/java-7-oracle/
./bin/start-hbase.sh
CTRL + SHIFT + T
cd ~/Downloads/apache-nutch-2.2.1/runtime/local
export JAVA_HOME=/usr/lib/jvm/java-7-oracle/
./bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2
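Before running the crawl command above, it may be worth confirming that the HBase master actually came up. A small sketch, assuming the JDK tool jps is on the PATH and that standalone HBase shows up there as HMaster (the helper name hmaster_running is made up for illustration):

```shell
# Reads `jps` output on stdin ("<pid> <main class>" per line) and
# succeeds if an HMaster process is listed.
hmaster_running() {
  grep -q '^[0-9][0-9]* HMaster$'
}

if command -v jps >/dev/null 2>&1 && jps | hmaster_running; then
  echo "HBase master is running"
else
  echo "HBase master not found (or jps not available)"
fi
```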
On 04/03/2014 08:18 AM, Talat Uyarer wrote:
Hi Adamantios,
I don't know the steps in the book. Can you share what you did? Two
different situations can cause this error: either the HBase client used
by Gora (Gora uses the 0.90.4 HBase client) is a different version from
your HBase server, or your ZooKeeper is misconfigured.
I'll wait for your installation steps :)
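For the first case, a quick way to compare the two versions is to look at the HBase jar names on both sides. A rough sketch; the two directories are assumptions and need to be adjusted to your install, and the helper names are made up for illustration:

```shell
# Extract the version from an HBase jar file name, e.g. hbase-0.90.4.jar -> 0.90.4
hbase_jar_version() {
  basename "$1" .jar | sed 's/^hbase-//'
}

# Print the version of every hbase-*.jar in a directory (test jars skipped).
list_hbase_versions() {
  for jar in "$1"/hbase-*.jar; do
    [ -e "$jar" ] || continue                      # glob matched nothing
    case "$jar" in *-tests.jar) continue ;; esac   # ignore test jars
    hbase_jar_version "$jar"
  done
}

# Assumed locations: Nutch's bundled libs and the HBase install directory
NUTCH_LIB="$HOME/Downloads/apache-nutch-2.2.1/runtime/local/lib"
HBASE_DIR="$HOME/Downloads/hbase-0.90.4"
echo "client (Nutch lib): $(list_hbase_versions "$NUTCH_LIB")"
echo "server (HBase dir): $(list_hbase_versions "$HBASE_DIR")"
```

If the two lines show different versions, the first case is the likely one.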
Talat
2014-04-03 1:06 GMT+03:00 Adamantios Corais:
Hi all,
I have followed all the steps to set up Nutch (2.2.1) with HBase (0.90.4) and
Solr (4.7.1) as described in the book "Web Crawling and Data Mining with
Apache Nutch". However, I am getting the following error:
InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Not a host:port pair: � 27204@eualin-T430eualin-T430,37745,1396453102781
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Not a host:port pair: � 27204@eualin-T430eualin-T430,37745,1396453102781
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:127)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 7 more
Caused by: java.lang.IllegalArgumentException: Not a host:port pair: � 27204@eualin-T430eualin-T430,37745,1396453102781
at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:60)
at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:354)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:109)
... 9 more
I have searched a lot but could not find a solution. Any ideas?
Best,
Adam.