Alright Lewis,

Leaps and bounds over yesterday's progress. I've abandoned the idea of using
HBase for now and set up a MySQL database instead. The crawl launches
successfully, but now fails at IndexerJob.createIndexJob().

*gora.properties*
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://10.100.220.220:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=root
gora.sqlstore.jdbc.password=pw
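
(In case it helps to rule out the connection itself: a bare-bones JDBC check
against the same settings, assuming the mysql-connector-java jar is on the
classpath, would look something like this.)

import java.sql.Connection;
import java.sql.DriverManager;

public class MySqlPing {
    public static void main(String[] args) throws Exception {
        // Same driver, URL, user and password as in gora.properties above.
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://10.100.220.220:3306/nutch?createDatabaseIfNotExist=true",
                "root", "pw");
        System.out.println("connected, closed=" + conn.isClosed());
        conn.close();
    }
}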

*nutch-site.xml*
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>The Nutchess</value>
  </property>
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
    <description>The character encoding to fall back to when no other
    information is available</description>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.sql.store.SqlStore</value>
    <description>The Gora DataStore class for storing and retrieving data.
    Currently the following stores are available: ..
    </description>
  </property>
</configuration>

[root@hdpjt01 build]# sudo -u mapred hadoop jar apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls -solr http://10.100.220.220:8983/solr/ -depth 3 -topN 5
12/10/23 10:31:17 INFO input.FileInputFormat: Total input paths to process : 1
12/10/23 10:31:17 WARN snappy.LoadSnappy: Snappy native library is available
12/10/23 10:31:17 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 10:31:17 INFO snappy.LoadSnappy: Snappy native library loaded
12/10/23 10:31:18 INFO mapred.JobClient: Running job: job_201210221719_0006
12/10/23 10:31:19 INFO mapred.JobClient:  map 0% reduce 0%
12/10/23 10:31:28 INFO mapred.JobClient:  map 100% reduce 0%
12/10/23 10:31:29 INFO mapred.JobClient: Job complete: job_201210221719_0006
*... Several more jobs ...*
12/10/23 10:36:02 INFO mapred.JobClient: Job complete: job_201210221719_0018
12/10/23 10:36:02 INFO mapred.JobClient: Counters: 24
12/10/23 10:36:02 INFO mapred.JobClient:   Job Counters
12/10/23 10:36:02 INFO mapred.JobClient:     Launched reduce tasks=1
12/10/23 10:36:02 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7562
12/10/23 10:36:02 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/10/23 10:36:02 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/10/23 10:36:02 INFO mapred.JobClient:     Launched map tasks=1
12/10/23 10:36:02 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9683
12/10/23 10:36:02 INFO mapred.JobClient:   FileSystemCounters
12/10/23 10:36:02 INFO mapred.JobClient:     FILE_BYTES_READ=280978
12/10/23 10:36:02 INFO mapred.JobClient:     HDFS_BYTES_READ=1078
12/10/23 10:36:02 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=738512
12/10/23 10:36:02 INFO mapred.JobClient:   Map-Reduce Framework
12/10/23 10:36:02 INFO mapred.JobClient:     Map input records=688
12/10/23 10:36:02 INFO mapred.JobClient:     Reduce shuffle bytes=280978
12/10/23 10:36:02 INFO mapred.JobClient:     Spilled Records=3078
12/10/23 10:36:02 INFO mapred.JobClient:     Map output bytes=277753
12/10/23 10:36:02 INFO mapred.JobClient:     CPU time spent (ms)=9830
12/10/23 10:36:02 INFO mapred.JobClient:     Total committed heap usage (bytes)=891486208
12/10/23 10:36:02 INFO mapred.JobClient:     Combine input records=0
12/10/23 10:36:02 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1078
12/10/23 10:36:02 INFO mapred.JobClient:     Reduce input records=1539
12/10/23 10:36:02 INFO mapred.JobClient:     Reduce input groups=820
12/10/23 10:36:02 INFO mapred.JobClient:     Combine output records=0
12/10/23 10:36:02 INFO mapred.JobClient:     Physical memory (bytes) snapshot=949526528
12/10/23 10:36:02 INFO mapred.JobClient:     Reduce output records=820
12/10/23 10:36:02 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3199545344
12/10/23 10:36:02 INFO mapred.JobClient:     Map output records=1539
Exception in thread "main" java.lang.NullPointerException
        at java.util.Hashtable.put(Hashtable.java:394)
        at java.util.Properties.setProperty(Properties.java:143)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:460)
        at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
        at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
        at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
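
(Reading the trace: the NPE comes out of Hashtable.put() underneath
Properties.setProperty() and Configuration.set(), which as far as I can tell
only happens when the key or value handed to set() is null. A minimal sketch
of that failure mode, using a placeholder key rather than whatever
IndexerJob.java:128 actually sets:)

import org.apache.hadoop.conf.Configuration;

public class NullConfValueDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Stand-in for whatever value ends up null at IndexerJob.java:128.
        String value = null;
        // Properties is backed by Hashtable, which rejects null values, so this
        // throws java.lang.NullPointerException from Hashtable.put(), as above.
        conf.set("placeholder.key", value);
    }
}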



