Hi all -

I'm stuck.  I'm a Nutch and Solr newbie.

I'm trying to crawl "http://nutch.apache.org" and store the crawl results
in Solr.  I'm not sure the crawl worked, because I don't see any crawl
results in Solr.  I figured crawling the apache.org site would be a safe
test.
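
One way to confirm whether anything reached Solr is to ask it for its total document count. The following is a minimal sketch, not a definitive check: it assumes the select handler is reachable directly under the same base URL that is passed to bin/crawl (on a setup with a named core, the path would include the core name), and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_query_url(solr_base):
    """Build a select URL that matches every document but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    # Assumption: the select handler lives directly under the base URL.
    return solr_base.rstrip("/") + "/select?" + params


def num_found(response_text):
    """Extract the total hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def indexed_doc_count(solr_base):
    """Ask a running Solr instance how many documents it currently holds."""
    with urlopen(count_query_url(solr_base)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling indexed_doc_count("http://localhost:8983/solr/") after the crawl should return a number greater than zero if the indexing step actually sent documents; a result of 0 would match what I'm seeing.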

-------------  Here's my console ---------------

MARKs-Mac-Pro:local mark$ bin/crawl urls/ TestCrawl/ http://localhost:8983/solr/ 2

InjectorJob: starting at 2014-12-30 09:01:57

InjectorJob: Injecting urlDir: urls

InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.

InjectorJob: total number of urls rejected by filters: 0

InjectorJob: total number of urls injected after normalization and filtering: 1

Injector: finished at 2014-12-30 09:01:58, elapsed: 00:00:01

Tue Dec 30 09:01:58 PST 2014 : Iteration 1 of 2

Generating batchId

Generating a new fetchlist

GeneratorJob: starting at 2014-12-30 09:01:59

GeneratorJob: Selecting best-scoring urls due for fetch.

GeneratorJob: starting

GeneratorJob: filtering: false

GeneratorJob: normalizing: false

GeneratorJob: topN: 50000

GeneratorJob: finished at 2014-12-30 09:02:00, time elapsed: 00:00:01

GeneratorJob: generated batch id: 1419958918-5854

Fetching :

FetcherJob: starting

FetcherJob: batchId: 1419958918-5854

Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.

FetcherJob: threads: 50

FetcherJob: parsing: false

FetcherJob: resuming: false

FetcherJob : timelimit set for : 1419969721241

Using queue mode : byHost

Fetcher: threads: 50

QueueFeeder finished: total 0 records. Hit by time limit :0

.... (I removed the "-finishing thread FetcherThread0, activeThreads=0"
messages for brevity.)

Fetcher: throughput threshold: -1

-finishing thread FetcherThread49, activeThreads=0

Fetcher: throughput threshold sequence: 5

0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues

-activeThreads=0

FetcherJob: done

Parsing :

ParserJob: starting

ParserJob: resuming: false

ParserJob: forced reparse: false

ParserJob: batchId: 1419958918-5854

ParserJob: success

CrawlDB update for TestCrawl/

DbUpdaterJob: starting

DbUpdaterJob: done

Indexing TestCrawl/ on SOLR index -> http://localhost:8983/solr/

SolrIndexerJob: starting

SolrIndexerJob: done.

SOLR dedup -> http://localhost:8983/solr/


Mark
