Hi all - I'm stuck. I'm a Nutch and Solr newbie.
I'm trying to crawl "http://nutch.apache.org" and store the crawl results in Solr. I'm not sure whether the crawl worked, because I don't see any crawl results in Solr. I figured crawling the apache.org site would be a safe test.

------------- Here's my console ---------------
MARKs-Mac-Pro:local mark$ bin/crawl urls/ TestCrawl/ http://localhost:8983/solr/ 2
InjectorJob: starting at 2014-12-30 09:01:57
InjectorJob: Injecting urlDir: urls
InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2014-12-30 09:01:58, elapsed: 00:00:01
Tue Dec 30 09:01:58 PST 2014 : Iteration 1 of 2
Generating batchId
Generating a new fetchlist
GeneratorJob: starting at 2014-12-30 09:01:59
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: false
GeneratorJob: normalizing: false
GeneratorJob: topN: 50000
GeneratorJob: finished at 2014-12-30 09:02:00, time elapsed: 00:00:01
GeneratorJob: generated batch id: 1419958918-5854
Fetching :
FetcherJob: starting
FetcherJob: batchId: 1419958918-5854
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
FetcherJob: threads: 50
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : 1419969721241
Using queue mode : byHost
Fetcher: threads: 50
QueueFeeder finished: total 0 records. Hit by time limit :0
....
(I removed the "-finishing thread FetcherThread0, activeThreads=0" messages for brevity.)
Fetcher: throughput threshold: -1
-finishing thread FetcherThread49, activeThreads=0
Fetcher: throughput threshold sequence: 5
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: done
Parsing :
ParserJob: starting
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: batchId: 1419958918-5854
ParserJob: success
CrawlDB update for TestCrawl/
DbUpdaterJob: starting
DbUpdaterJob: done
Indexing TestCrawl/ on SOLR index -> http://localhost:8983/solr/
SolrIndexerJob: starting
SolrIndexerJob: done.
SOLR dedup -> http://localhost:8983/solr/

Mark
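In a log this long, the counters that matter are easy to miss. A rough diagnostic sketch in Python follows; summarize_crawl_log and solr_document_count are hypothetical helpers, not part of Nutch or Solr. The first pulls the injected, queued and fetched counts out of the console output; the second asks Solr how many documents it holds, using a match-all `q=*:*` query against the `select` handler with a JSON response. The regexes assume the exact message wording in this Nutch 2.x log, and the Solr helper assumes that `http://localhost:8983/solr/` routes `/select` to the core Nutch indexed into.

```python
import json
import re
import urllib.request

# Counters that appear in the Nutch 2.x console output above.
# Assumption: message wording matches this log; other Nutch versions may differ.
_PATTERNS = {
    "injected": r"urls injected after normalization and filtering: (\d+)",
    "queued_for_fetch": r"QueueFeeder finished: total (\d+) records",
    "pages_fetched": r"spinwaiting/active, (\d+) pages",
    "fetch_errors": r"spinwaiting/active, \d+ pages, (\d+) errors",
}


def summarize_crawl_log(log_text: str) -> dict:
    """Return the last value of each counter found in the log (None if absent)."""
    summary = {}
    for name, pattern in _PATTERNS.items():
        matches = re.findall(pattern, log_text)
        summary[name] = int(matches[-1]) if matches else None
    return summary


def solr_document_count(solr_base_url: str) -> int:
    """Return numFound for a match-all query (hypothetical helper).

    Assumption: the base URL routes /select to the core Nutch indexed into.
    """
    url = solr_base_url.rstrip("/") + "/select?q=*:*&rows=0&wt=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["response"]["numFound"]


if __name__ == "__main__":
    # A few lines copied from the console output above.
    sample = (
        "InjectorJob: total number of urls injected after normalization and filtering: 1\n"
        "QueueFeeder finished: total 0 records. Hit by time limit :0\n"
        "0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues\n"
    )
    print(summarize_crawl_log(sample))
    # -> {'injected': 1, 'queued_for_fetch': 0, 'pages_fetched': 0, 'fetch_errors': 0}
```

Run against the log above, the summary shows 1 URL injected but 0 records handed to the fetcher and 0 pages fetched. That would be consistent with Solr staying empty even though every job reports "done". Calling solr_document_count("http://localhost:8983/solr/") is a quick way to confirm what the index actually holds.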

