Nothing looks peculiar, and this is Nutch 1.4, right? You didn't mention
which domain you can't crawl, though. libraries.mit.edu seems to be fetched
fine, although the indexer doesn't appear to send a document to Solr and the
parser never mentions parsing that file.

Either the file throws a parse error, or it is truncated, or ....

> I'm trying to crawl pages from a number of domains, and one of these
> domains has been giving me trouble. The really irritating thing is that it
> did work at least once, which led me to believe that I'd solved the
> problem. I can't think of anything at this point but to paste my log of a
> failed crawl and solrindex and hope that someone can think of anything
> I've overlooked. Does anything look strange here?
> 
> Thanks,
> Chip
> 
> 2011-12-19 16:31:01,010 WARN  crawl.Crawl - solrUrl is not set, indexing
> will be skipped... 2011-12-19 16:31:01,404 INFO  crawl.Crawl - crawl
> started in: mit-c-crawl 2011-12-19 16:31:01,420 INFO  crawl.Crawl -
> rootUrlDir = mit-c-urls 2011-12-19 16:31:01,420 INFO  crawl.Crawl -
> threads = 10
> 2011-12-19 16:31:01,420 INFO  crawl.Crawl - depth = 1
> 2011-12-19 16:31:01,420 INFO  crawl.Crawl - solrUrl=null
> 2011-12-19 16:31:01,420 INFO  crawl.Crawl - topN = 500000
> 2011-12-19 16:31:01,420 INFO  crawl.Injector - Injector: starting at
> 2011-12-19 16:31:01 2011-12-19 16:31:01,420 INFO  crawl.Injector -
> Injector: crawlDb: mit-c-crawl/crawldb 2011-12-19 16:31:01,420 INFO 
> crawl.Injector - Injector: urlDir: mit-c-urls 2011-12-19 16:31:01,436 INFO
>  crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2011-12-19 16:31:02,854 INFO  plugin.PluginRepository - Plugins: looking
> in: C:\Apache\apache-nutch-1.4\runtime\local\plugins 2011-12-19
> 16:31:02,917 INFO  plugin.PluginRepository - Plugin Auto-activation mode:
> [true] 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository - Registered
> Plugins: 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository -          
>      the nutch core extension points (nutch-extensionpoints) 2011-12-19
> 16:31:02,917 INFO  plugin.PluginRepository -                Basic URL
> Normalizer (urlnormalizer-basic) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                Html Parse Plug-in (parse-html)
> 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository -               
> Basic Indexing Filter (index-basic) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                Http / Https Protocol Plug-in
> (protocol-httpclient) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                HTTP Framework (lib-http)
> 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository -               
> Regex URL Filter (urlfilter-regex) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                Pass-through URL Normalizer
> (urlnormalizer-pass) 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository
> -                Http Protocol Plug-in (protocol-http) 2011-12-19
> 16:31:02,917 INFO  plugin.PluginRepository -                Regex URL
> Normalizer (urlnormalizer-regex) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                Tika Parser Plug-in (parse-tika)
> 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository -               
> OPIC Scoring Plug-in (scoring-opic) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                CyberNeko HTML Parser
> (lib-nekohtml) 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository -    
>            Anchor Indexing Filter (index-anchor) 2011-12-19 16:31:02,917
> INFO  plugin.PluginRepository -                URL Meta Indexing Filter
> (urlmeta) 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository -         
>       Regex URL Filter Framework (lib-regex-filter) 2011-12-19
> 16:31:02,917 INFO  plugin.PluginRepository - Registered Extension-Points:
> 2011-12-19 16:31:02,917 INFO  plugin.PluginRepository -               
> Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2011-12-19
> 16:31:02,917 INFO  plugin.PluginRepository -                Nutch Protocol
> (org.apache.nutch.protocol.Protocol) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                Nutch Segment Merge Filter
> (org.apache.nutch.segment.SegmentMergeFilter) 2011-12-19 16:31:02,917 INFO
>  plugin.PluginRepository -                Nutch URL Filter
> (org.apache.nutch.net.URLFilter) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                Nutch Indexing Filter
> (org.apache.nutch.indexer.IndexingFilter) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                HTML Parse Filter
> (org.apache.nutch.parse.HtmlParseFilter) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                Nutch Content Parser
> (org.apache.nutch.parse.Parser) 2011-12-19 16:31:02,917 INFO 
> plugin.PluginRepository -                Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter) 2011-12-19 16:31:02,964 INFO 
> regex.RegexURLNormalizer - can't find rules for scope 'inject', using
> default 2011-12-19 16:31:05,722 INFO  crawl.Injector - Injector: Merging
> injected urls into crawl db. 2011-12-19 16:31:07,014 WARN 
> util.NativeCodeLoader - Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable 2011-12-19
> 16:31:07,897 INFO  crawl.Injector - Injector: finished at 2011-12-19
> 16:31:07, elapsed: 00:00:06 2011-12-19 16:31:07,913 INFO  crawl.Generator
> - Generator: starting at 2011-12-19 16:31:07 2011-12-19 16:31:07,913 INFO 
> crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
> 2011-12-19 16:31:07,913 INFO  crawl.Generator - Generator: filtering: true
> 2011-12-19 16:31:07,913 INFO  crawl.Generator - Generator: normalizing:
> true 2011-12-19 16:31:07,913 INFO  crawl.Generator - Generator: topN:
> 500000 2011-12-19 16:31:07,913 INFO  crawl.Generator - Generator:
> jobtracker is 'local', generating exactly one partition. 2011-12-19
> 16:31:09,157 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl:
> org.apache.nutch.crawl.DefaultFetchSchedule 2011-12-19 16:31:09,157 INFO 
> crawl.AbstractFetchSchedule - defaultInterval=2592000 2011-12-19
> 16:31:09,157 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
> 2011-12-19 16:31:09,157 INFO  regex.RegexURLNormalizer - can't find rules
> for scope 'partition', using default 2011-12-19 16:31:09,189 INFO 
> crawl.FetchScheduleFactory - Using FetchSchedule impl:
> org.apache.nutch.crawl.DefaultFetchSchedule 2011-12-19 16:31:09,189 INFO 
> crawl.AbstractFetchSchedule - defaultInterval=2592000 2011-12-19
> 16:31:09,189 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
> 2011-12-19 16:31:09,189 INFO  regex.RegexURLNormalizer - can't find rules
> for scope 'generate_host_count', using default 2011-12-19 16:31:10,071
> INFO  crawl.Generator - Generator: Partitioning selected urls for
> politeness. 2011-12-19 16:31:11,080 INFO  crawl.Generator - Generator:
> segment: mit-c-crawl/segments/20111219163111 2011-12-19 16:31:12,309 INFO 
> regex.RegexURLNormalizer - can't find rules for scope 'partition', using
> default 2011-12-19 16:31:13,223 INFO  crawl.Generator - Generator:
> finished at 2011-12-19 16:31:13, elapsed: 00:00:05 2011-12-19 16:31:13,239
> INFO  fetcher.Fetcher - Fetcher: starting at 2011-12-19 16:31:13
> 2011-12-19 16:31:13,239 INFO  fetcher.Fetcher - Fetcher: segment:
> mit-c-crawl/segments/20111219163111 2011-12-19 16:31:14,515 INFO 
> fetcher.Fetcher - Using queue mode : byHost 2011-12-19 16:31:14,515 INFO 
> fetcher.Fetcher - Fetcher: threads: 10 2011-12-19 16:31:14,515 INFO 
> fetcher.Fetcher - Fetcher: time-out divisor: 2 2011-12-19 16:31:14,515
> INFO  fetcher.Fetcher - QueueFeeder finished: total 1 records + hit by
> time limit :0 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - Using queue
> mode : byHost 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - Using queue
> mode : byHost 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - fetching
> http://libraries.mit.edu/archives/research/collections/collections-mc/mc1.
> html 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - Using queue mode :
> byHost 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - -finishing thread
> FetcherThread, activeThreads=1 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - Using queue mode : byHost 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
> 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - -finishing thread
> FetcherThread, activeThreads=1 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - Using queue mode : byHost 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - Using queue mode : byHost 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
> 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - -finishing thread
> FetcherThread, activeThreads=1 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - Using queue mode : byHost 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - Using queue mode : byHost 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
> 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - -finishing thread
> FetcherThread, activeThreads=1 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - Using queue mode : byHost 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - Using queue mode : byHost 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
> 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - Fetcher: throughput
> threshold: -1 2011-12-19 16:31:14,531 INFO  fetcher.Fetcher - -finishing
> thread FetcherThread, activeThreads=1 2011-12-19 16:31:14,531 INFO 
> fetcher.Fetcher - Fetcher: throughput threshold retries: 5 2011-12-19
> 16:31:14,562 INFO  httpclient.Http - http.proxy.host = null 2011-12-19
> 16:31:14,562 INFO  httpclient.Http - http.proxy.port = 8080 2011-12-19
> 16:31:14,562 INFO  httpclient.Http - http.timeout = 10000 2011-12-19
> 16:31:14,562 INFO  httpclient.Http - http.content.limit = -1 2011-12-19
> 16:31:14,562 INFO  httpclient.Http - http.agent = PHFAWS/Nutch-1.3
> (American Institute of Physics: Physics History Finding Aids Web Site;
> http://www.aip.org/history/nbl/findingaids.html; [email protected])
> 2011-12-19 16:31:14,562 INFO  httpclient.Http - http.accept.language =
> en-us,en-gb,en;q=0.7,*;q=0.3 2011-12-19 16:31:14,799 INFO  fetcher.Fetcher
> - -finishing thread FetcherThread, activeThreads=0 2011-12-19 16:31:15,539
> INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0,
> fetchQueues.totalSize=0 2011-12-19 16:31:15,539 INFO  fetcher.Fetcher -
> -activeThreads=0
> 2011-12-19 16:31:16,390 INFO  fetcher.Fetcher - Fetcher: finished at
> 2011-12-19 16:31:16, elapsed: 00:00:03 2011-12-19 16:31:16,390 INFO 
> parse.ParseSegment - ParseSegment: starting at 2011-12-19 16:31:16
> 2011-12-19 16:31:16,390 INFO  parse.ParseSegment - ParseSegment: segment:
> mit-c-crawl/segments/20111219163111 2011-12-19 16:31:18,533 INFO 
> parse.ParseSegment - ParseSegment: finished at 2011-12-19 16:31:18,
> elapsed: 00:00:02 2011-12-19 16:31:18,549 INFO  crawl.CrawlDb - CrawlDb
> update: starting at 2011-12-19 16:31:18 2011-12-19 16:31:18,549 INFO 
> crawl.CrawlDb - CrawlDb update: db: mit-c-crawl/crawldb 2011-12-19
> 16:31:18,549 INFO  crawl.CrawlDb - CrawlDb update: segments:
> [mit-c-crawl/segments/20111219163111] 2011-12-19 16:31:18,549 INFO 
> crawl.CrawlDb - CrawlDb update: additions allowed: true 2011-12-19
> 16:31:18,549 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: true
> 2011-12-19 16:31:18,549 INFO  crawl.CrawlDb - CrawlDb update: URL
> filtering: true 2011-12-19 16:31:18,549 INFO  crawl.CrawlDb - CrawlDb
> update: 404 purging: false 2011-12-19 16:31:18,549 INFO  crawl.CrawlDb -
> CrawlDb update: Merging segment data into db. 2011-12-19 16:31:19,873 INFO
>  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using
> default 2011-12-19 16:31:20,046 INFO  regex.RegexURLNormalizer - can't
> find rules for scope 'crawldb', using default 2011-12-19 16:31:20,204 INFO
>  crawl.FetchScheduleFactory - Using FetchSchedule impl:
> org.apache.nutch.crawl.DefaultFetchSchedule 2011-12-19 16:31:20,204 INFO 
> crawl.AbstractFetchSchedule - defaultInterval=2592000 2011-12-19
> 16:31:20,204 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
> 2011-12-19 16:31:20,771 INFO  crawl.CrawlDb - CrawlDb update: finished at
> 2011-12-19 16:31:20, elapsed: 00:00:02 2011-12-19 16:31:20,787 INFO 
> crawl.LinkDb - LinkDb: starting at 2011-12-19 16:31:20 2011-12-19
> 16:31:20,787 INFO  crawl.LinkDb - LinkDb: linkdb: mit-c-crawl/linkdb
> 2011-12-19 16:31:20,787 INFO  crawl.LinkDb - LinkDb: URL normalize: true
> 2011-12-19 16:31:20,787 INFO  crawl.LinkDb - LinkDb: URL filter: true
> 2011-12-19 16:31:20,787 INFO  crawl.LinkDb - LinkDb: adding segment:
> file:/C:/apache/apache-nutch-1.4/runtime/local/mit-c-crawl/segments/201112
> 19163111 2011-12-19 16:31:22,898 INFO  crawl.LinkDb - LinkDb: finished at
> 2011-12-19 16:31:22, elapsed: 00:00:02 2011-12-19 16:31:22,898 INFO 
> crawl.Crawl - crawl finished: mit-c-crawl 2011-12-19 16:32:08,061 INFO 
> solr.SolrIndexer - SolrIndexer: starting at 2011-12-19 16:32:08 2011-12-19
> 16:32:08,093 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb:
> mit-c-crawl/crawldb 2011-12-19 16:32:08,093 INFO  indexer.IndexerMapReduce
> - IndexerMapReduce: linkdb: mit-c-crawl/linkdb 2011-12-19 16:32:08,093
> INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment:
> mit-c-crawl/segments/20111219163111 2011-12-19 16:32:09,984 WARN 
> util.NativeCodeLoader - Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable 2011-12-19
> 16:32:10,141 INFO  plugin.PluginRepository - Plugins: looking in:
> C:\Apache\apache-nutch-1.4\runtime\local\plugins 2011-12-19 16:32:10,220
> INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository - Registered
> Plugins: 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository -          
>      the nutch core extension points (nutch-extensionpoints) 2011-12-19
> 16:32:10,220 INFO  plugin.PluginRepository -                Basic URL
> Normalizer (urlnormalizer-basic) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                Html Parse Plug-in (parse-html)
> 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository -               
> Basic Indexing Filter (index-basic) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                Http / Https Protocol Plug-in
> (protocol-httpclient) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                HTTP Framework (lib-http)
> 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository -               
> Regex URL Filter (urlfilter-regex) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                Pass-through URL Normalizer
> (urlnormalizer-pass) 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository
> -                Http Protocol Plug-in (protocol-http) 2011-12-19
> 16:32:10,220 INFO  plugin.PluginRepository -                Regex URL
> Normalizer (urlnormalizer-regex) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                Tika Parser Plug-in (parse-tika)
> 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository -               
> OPIC Scoring Plug-in (scoring-opic) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                CyberNeko HTML Parser
> (lib-nekohtml) 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository -    
>            Anchor Indexing Filter (index-anchor) 2011-12-19 16:32:10,220
> INFO  plugin.PluginRepository -                URL Meta Indexing Filter
> (urlmeta) 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository -         
>       Regex URL Filter Framework (lib-regex-filter) 2011-12-19
> 16:32:10,220 INFO  plugin.PluginRepository - Registered Extension-Points:
> 2011-12-19 16:32:10,220 INFO  plugin.PluginRepository -               
> Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2011-12-19
> 16:32:10,220 INFO  plugin.PluginRepository -                Nutch Protocol
> (org.apache.nutch.protocol.Protocol) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                Nutch Segment Merge Filter
> (org.apache.nutch.segment.SegmentMergeFilter) 2011-12-19 16:32:10,220 INFO
>  plugin.PluginRepository -                Nutch URL Filter
> (org.apache.nutch.net.URLFilter) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                Nutch Indexing Filter
> (org.apache.nutch.indexer.IndexingFilter) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                HTML Parse Filter
> (org.apache.nutch.parse.HtmlParseFilter) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                Nutch Content Parser
> (org.apache.nutch.parse.Parser) 2011-12-19 16:32:10,220 INFO 
> plugin.PluginRepository -                Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter) 2011-12-19 16:32:10,252 INFO 
> indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter 2011-12-19 16:32:10,283
> INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2011-12-19 16:32:10,283 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter 2011-12-19
> 16:32:10,283 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter 2011-12-19
> 16:32:11,276 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter 2011-12-19 16:32:11,276
> INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2011-12-19 16:32:11,276 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter 2011-12-19
> 16:32:11,276 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter 2011-12-19
> 16:32:11,402 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter 2011-12-19 16:32:11,402
> INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2011-12-19 16:32:11,402 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter 2011-12-19
> 16:32:11,402 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter 2011-12-19
> 16:32:11,544 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter 2011-12-19 16:32:11,544
> INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2011-12-19 16:32:11,544 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter 2011-12-19
> 16:32:11,544 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter 2011-12-19
> 16:32:11,686 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter 2011-12-19 16:32:11,686
> INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2011-12-19 16:32:11,686 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter 2011-12-19
> 16:32:11,686 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter 2011-12-19
> 16:32:11,906 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter 2011-12-19 16:32:11,906
> INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2011-12-19 16:32:11,906 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter 2011-12-19
> 16:32:11,906 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter 2011-12-19
> 16:32:11,985 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter 2011-12-19 16:32:11,985
> INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2011-12-19 16:32:11,985 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter 2011-12-19
> 16:32:11,985 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter 2011-12-19
> 16:32:12,111 INFO  solr.SolrMappingReader - source: content dest: content
> 2011-12-19 16:32:12,111 INFO  solr.SolrMappingReader - source: site dest:
> site 2011-12-19 16:32:12,111 INFO  solr.SolrMappingReader - source: title
> dest: title 2011-12-19 16:32:12,111 INFO  solr.SolrMappingReader - source:
> host dest: host 2011-12-19 16:32:12,111 INFO  solr.SolrMappingReader -
> source: segment dest: segment 2011-12-19 16:32:12,111 INFO 
> solr.SolrMappingReader - source: boost dest: boost 2011-12-19 16:32:12,111
> INFO  solr.SolrMappingReader - source: digest dest: digest 2011-12-19
> 16:32:12,111 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-12-19 16:32:12,111 INFO  solr.SolrMappingReader - source: url dest:
> id 2011-12-19 16:32:12,111 INFO  solr.SolrMappingReader - source: url
> dest: url 2011-12-19 16:32:13,309 INFO  solr.SolrIndexer - SolrIndexer:
> finished at 2011-12-19 16:32:13, elapsed: 00:00:05
