Hello,

I installed the following with no issues:

apache-nutch-2.2.1-src.tar

apache-solr-3.6.2.tar

hbase-0.96.1.1-hadoop2-bin.tar



However, when I try to crawl, this error is shown: java[12580:1003] Unable to
load realm info from SCDynamicStore



Please see below for more details. I am new to web crawling, so any help
is appreciated.

Martha



$ bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/2 -depth 3 -topN 5

InjectorJob: starting at 2014-03-12 21:12:25

InjectorJob: Injecting urlDir: urls/seed.txt

2014-03-12 21:12:25.824 java[12580:1003] Unable to load realm info from
SCDynamicStore

InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora
storage class.

InjectorJob: total number of urls rejected by filters: 0

InjectorJob: total number of urls injected after normalization and
filtering: 1

Injector: finished at 2014-03-12 21:12:28, elapsed: 00:00:02



$ bin/nutch crawl urls -solr http://localhost:8983/ -depth 4 -topN 5 -threads 4

2014-03-12 21:12:56.972 java[12587:1003] Unable to load realm info from
SCDynamicStore

InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora
storage class.

InjectorJob: total number of urls rejected by filters: 0

InjectorJob: total number of urls injected after normalization and
filtering: 1

Exception in thread "main" java.lang.RuntimeException: job failed:
name=generate: null, jobid=job_local338944173_0002

at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)

at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)

at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)



$ bin/nutch inject urls/seed.txt

InjectorJob: starting at 2014-03-12 21:13:29

InjectorJob: Injecting urlDir: urls/seed.txt

2014-03-12 21:13:29.658 java[12599:1003] Unable to load realm info from
SCDynamicStore

InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora
storage class.

InjectorJob: total number of urls rejected by filters: 0

InjectorJob: total number of urls injected after normalization and
filtering: 1

Injector: finished at 2014-03-12 21:13:31, elapsed: 00:00:01



$ bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

SolrIndexerJob: starting

2014-03-12 21:14:12.474 java[12606:1003] Unable to load realm info from
SCDynamicStore

SolrIndexerJob: done.
