Hi,

After a second look: the Solr error only affects the cleaning job.
After checking the logs carefully:

- only one page is fetched
  2018-07-20 09:45:58,918 INFO  fetcher.FetcherThread - fetching https://dev-abc.com/letters (queue crawl delay=30000ms)

- and one page is sent as deletion (probably a 404) to the indexer
  2018-07-20 09:46:23,769 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 1/1 documents

But from the logs alone I don't see a way to find out why the
page failed to fetch. The CrawlDb contains the fetch status and
usually also a status message which explains the failure.
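
For example (a command sketch; I'm assuming the crawl directory
"TestCra7sl" and the URL from your log), you can look up what the
CrawlDb has recorded for that URL:

  # dump the CrawlDatum for a single URL; the metadata should contain
  # the protocol status (_pst_) with the reason for the failure
  bin/nutch readdb TestCra7sl/crawldb -url https://dev-abc.com/letters

  # overview of the status counts of all URLs in the CrawlDb
  bin/nutch readdb TestCra7sl/crawldb -stats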

Best,
Sebastian

On 07/23/2018 04:15 PM, Rushi wrote:
> Hi Sebastian,
> I am using Solr 6.4.2. But I am surprised: with the same configuration
> (Nutch 1.13 and Solr 6.4.2), crawling/indexing with the Prod urls seems
> to be working fine without any issues.
> 
> On Mon, Jul 23, 2018 at 7:37 AM Sebastian Nagel
> <wastl.na...@googlemail.com.invalid> wrote:
> 
>> Hi,
>>
>> There is an exception "Connection pool shut down".
>> Which version of Solr are you running? It should be
>> Solr 5.5.0 for Nutch 1.13.
>>
>> Sebastian
>>
>> On 07/20/2018 03:58 PM, Rushi wrote:
>>> Thanks for the response Sebastian,
>>> Yeah, I changed my seeds and I am using Nutch 1.13.
>>>
>>> Here is the log:
>>> 2018-07-20 09:45:49,769 INFO  crawl.Injector - Injector: starting at 2018-07-20 09:45:49
>>> 2018-07-20 09:45:49,770 INFO  crawl.Injector - Injector: crawlDb: TestCra7sl/crawldb
>>> 2018-07-20 09:45:49,770 INFO  crawl.Injector - Injector: urlDir: urls
>>> 2018-07-20 09:45:49,770 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
>>> 2018-07-20 09:45:49,894 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:45:51,672 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:51,688 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:51,759 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:51,839 INFO  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
>>> 2018-07-20 09:45:51,985 INFO  crawl.Injector - Injector: overwrite: false
>>> 2018-07-20 09:45:51,985 INFO  crawl.Injector - Injector: update: false
>>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: Total urls rejected by filters: 0
>>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: Total urls injected after normalization and filtering: 1
>>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: Total urls injected but already in CrawlDb: 0
>>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: Total new urls injected: 1
>>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: finished at 2018-07-20 09:45:52, elapsed: 00:00:02
>>> 2018-07-20 09:45:53,235 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:45:53,374 INFO  crawl.Generator - Generator: starting at 2018-07-20 09:45:53
>>> 2018-07-20 09:45:53,374 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
>>> 2018-07-20 09:45:53,374 INFO  crawl.Generator - Generator: filtering: false
>>> 2018-07-20 09:45:53,375 INFO  crawl.Generator - Generator: normalizing: true
>>> 2018-07-20 09:45:53,375 INFO  crawl.Generator - Generator: topN: 50000
>>> 2018-07-20 09:45:54,084 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:54,088 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:54,109 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:54,146 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2018-07-20 09:45:54,147 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
>>> 2018-07-20 09:45:54,147 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
>>> 2018-07-20 09:45:54,154 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
>>> 2018-07-20 09:45:54,233 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2018-07-20 09:45:54,233 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
>>> 2018-07-20 09:45:54,233 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
>>> 2018-07-20 09:45:54,243 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2018-07-20 09:45:54,243 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
>>> 2018-07-20 09:45:54,243 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
>>> 2018-07-20 09:45:54,244 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
>>> 2018-07-20 09:45:54,915 INFO  crawl.Generator - Generator: Partitioning selected urls for politeness.
>>> 2018-07-20 09:45:55,916 INFO  crawl.Generator - Generator: segment: TestCra7sl/segments/20180720094555
>>> 2018-07-20 09:45:57,096 INFO  crawl.Generator - Generator: finished at 2018-07-20 09:45:57, elapsed: 00:00:03
>>> 2018-07-20 09:45:57,928 INFO  fetcher.Fetcher - Fetcher: starting at 2018-07-20 09:45:57
>>> 2018-07-20 09:45:57,929 INFO  fetcher.Fetcher - Fetcher: segment: TestCra7sl/segments/20180720094555
>>> 2018-07-20 09:45:57,929 INFO  fetcher.Fetcher - Fetcher Timelimit set for : 1532105157929
>>> 2018-07-20 09:45:58,073 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:45:58,800 INFO  fetcher.FetchItemQueues - Using queue mode : byHost
>>> 2018-07-20 09:45:58,800 INFO  fetcher.Fetcher - Fetcher: threads: 50
>>> 2018-07-20 09:45:58,800 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
>>> 2018-07-20 09:45:58,804 INFO  fetcher.QueueFeeder - QueueFeeder finished: total 1 records + hit by time limit :0
>>> 2018-07-20 09:45:58,852 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:58,855 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:58,875 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
>>> 2018-07-20 09:45:58,901 INFO  net.URLExemptionFilters - Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
>>> 2018-07-20 09:45:58,917 INFO  fetcher.FetcherThread - Using queue mode : byHost
>>> 2018-07-20 09:45:58,918 INFO  fetcher.FetcherThread - fetching https://dev-abc.com/letters (queue crawl delay=30000ms)
>>> 2018-07-20 09:45:58,918 INFO  fetcher.FetcherThread - Thread FetcherThread has no more work available
>>> 2018-07-20 09:45:58,918 INFO  fetcher.FetcherThread - -finishing thread FetcherThread, activeThreads=1
>>> [... the "Using queue mode : byHost", "Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'", "Thread FetcherThread has no more work available" and "-finishing thread FetcherThread, activeThreads=1" messages repeat for the remaining idle fetcher threads ...]
>>> 2018-07-20 09:45:58,929 INFO  protocol.RobotRulesParser - robots.txt whitelist not configured.
>>> 2018-07-20 09:45:58,929 INFO  http.Http - http.proxy.host = null
>>> 2018-07-20 09:45:58,929 INFO  http.Http - http.proxy.port = 8080
>>> 2018-07-20 09:45:58,929 INFO  http.Http - http.proxy.exception.list = false
>>> 2018-07-20 09:45:58,929 INFO  http.Http - http.timeout = 10000
>>> 2018-07-20 09:45:58,930 INFO  http.Http - http.content.limit = -1
>>> 2018-07-20 09:45:58,930 INFO  http.Http - http.agent = nutch-solr-integration/Nutch-1.13-SNAPSHOT
>>> 2018-07-20 09:45:58,930 INFO  http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
>>> 2018-07-20 09:45:58,936 INFO  http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>>> 2018-07-20 09:45:58,936 INFO  http.Http - http.enable.cookie.header = true
>>> 2018-07-20 09:45:58,941 INFO  fetcher.Fetcher - Fetcher: throughput threshold: -1
>>> 2018-07-20 09:45:58,941 INFO  fetcher.Fetcher - Fetcher: throughput threshold retries: 5
>>> 2018-07-20 09:45:58,941 INFO  fetcher.Fetcher - fetcher.maxNum.threads can't be < than 50 : using 50 instead
>>> 2018-07-20 09:45:59,945 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>>> 2018-07-20 09:46:00,951 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>>> 2018-07-20 09:46:01,955 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>>> 2018-07-20 09:46:02,957 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>>> 2018-07-20 09:46:03,959 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>>> 2018-07-20 09:46:04,756 INFO  fetcher.FetcherThread - Thread FetcherThread has no more work available
>>> 2018-07-20 09:46:04,756 INFO  fetcher.FetcherThread - -finishing thread FetcherThread, activeThreads=0
>>> 2018-07-20 09:46:04,964 INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
>>> 2018-07-20 09:46:04,964 INFO  fetcher.Fetcher - -activeThreads=0
>>> 2018-07-20 09:46:05,709 INFO  fetcher.Fetcher - Fetcher: finished at 2018-07-20 09:46:05, elapsed: 00:00:07
>>> 2018-07-20 09:46:06,597 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:46:06,712 INFO  parse.ParseSegment - ParseSegment: starting at 2018-07-20 09:46:06
>>> 2018-07-20 09:46:06,713 INFO  parse.ParseSegment - ParseSegment: segment: TestCra7sl/segments/20180720094555
>>> 2018-07-20 09:46:07,478 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:07,482 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:07,502 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:07,690 INFO  net.URLExemptionFilters - Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
>>> 2018-07-20 09:46:07,775 INFO  net.URLExemptionFilters - Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
>>> 2018-07-20 09:46:08,294 INFO  parse.ParseSegment - ParseSegment: finished at 2018-07-20 09:46:08, elapsed: 00:00:01
>>> 2018-07-20 09:46:09,234 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: starting at 2018-07-20 09:46:09
>>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: db: TestCra7sl/crawldb
>>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: segments: [TestCra7sl/segments/20180720094555]
>>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: false
>>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: false
>>> 2018-07-20 09:46:09,392 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: false
>>> 2018-07-20 09:46:09,392 INFO  crawl.CrawlDb - CrawlDb update: 404 purging: false
>>> 2018-07-20 09:46:09,393 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
>>> 2018-07-20 09:46:10,518 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:10,522 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:10,541 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:10,567 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2018-07-20 09:46:10,567 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
>>> 2018-07-20 09:46:10,567 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
>>> 2018-07-20 09:46:10,616 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2018-07-20 09:46:10,616 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
>>> 2018-07-20 09:46:10,616 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
>>> 2018-07-20 09:46:11,005 INFO  crawl.CrawlDb - CrawlDb update: finished at 2018-07-20 09:46:11, elapsed: 00:00:01
>>> 2018-07-20 09:46:11,980 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: starting at 2018-07-20 09:46:12
>>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: linkdb: TestCra7sl/linkdb
>>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: URL normalize: true
>>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: URL filter: true
>>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: internal links will be ignored.
>>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: adding segment: TestCra7sl/segments/20180720094555
>>> 2018-07-20 09:46:12,922 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:12,926 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:12,946 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:13,726 INFO  crawl.LinkDb - LinkDb: finished at 2018-07-20 09:46:13, elapsed: 00:00:01
>>> 2018-07-20 09:46:14,596 INFO  crawl.DeduplicationJob - DeduplicationJob: starting at 2018-07-20 09:46:14
>>> 2018-07-20 09:46:14,785 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:46:16,448 INFO  crawl.DeduplicationJob - Deduplication: 0 documents marked as duplicates
>>> 2018-07-20 09:46:16,449 INFO  crawl.DeduplicationJob - Deduplication: Updating status of duplicate urls into crawl db.
>>> 2018-07-20 09:46:17,665 INFO  crawl.DeduplicationJob - Deduplication finished at 2018-07-20 09:46:17, elapsed: 00:00:03
>>> 2018-07-20 09:46:18,637 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:46:18,776 INFO  segment.SegmentChecker - Segment dir is complete: TestCra7sl/segments/20180720094555.
>>> 2018-07-20 09:46:18,777 INFO  indexer.IndexingJob - Indexer: starting at 2018-07-20 09:46:18
>>> 2018-07-20 09:46:18,780 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
>>> 2018-07-20 09:46:18,780 INFO  indexer.IndexingJob - Indexer: URL filtering: false
>>> 2018-07-20 09:46:18,780 INFO  indexer.IndexingJob - Indexer: URL normalizing: false
>>> 2018-07-20 09:46:18,848 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:18,853 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:18,877 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:18,947 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
>>> 2018-07-20 09:46:18,947 INFO  indexer.IndexingJob - Active IndexWriters :
>>> SOLRIndexWriter
>>> solr.server.url : URL of the SOLR instance
>>> solr.zookeeper.hosts : URL of the Zookeeper quorum
>>> solr.commit.size : buffer size when sending to SOLR (default 1000)
>>> solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>>> solr.auth : use authentication (default false)
>>> solr.auth.username : username for authentication
>>> solr.auth.password : password for authentication
>>>
>>>
>>> 2018-07-20 09:46:18,949 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: TestCra7sl/crawldb
>>> 2018-07-20 09:46:18,949 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: TestCra7sl/linkdb
>>> 2018-07-20 09:46:18,949 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: TestCra7sl/segments/20180720094555
>>> 2018-07-20 09:46:19,781 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
>>> 2018-07-20 09:46:20,716 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: content dest: content
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: title dest: title
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: metatag.description dest: description
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: metatag.section dest: section
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: metatag.gldocname dest: gldocname
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: host dest: host
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: segment dest: segment
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: boost dest: boost
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: digest dest: digest
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
>>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: mainContent dest: mainContent
>>> [... the same solr.SolrMappingReader mapping lines are logged again at 2018-07-20 09:46:21,636 ...]
>>> 2018-07-20 09:46:21,646 INFO  indexer.IndexingJob - Indexer: number of documents indexed, deleted, or skipped:
>>> 2018-07-20 09:46:21,652 INFO  indexer.IndexingJob - Indexer: finished at 2018-07-20 09:46:21, elapsed: 00:00:02
>>> 2018-07-20 09:46:22,551 INFO  indexer.CleaningJob - CleaningJob: starting at 2018-07-20 09:46:22
>>> 2018-07-20 09:46:22,717 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2018-07-20 09:46:23,395 WARN  output.FileOutputCommitter - Output Path is null in setupJob()
>>> 2018-07-20 09:46:23,644 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:23,649 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:23,669 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
>>> 2018-07-20 09:46:23,702 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
>>> [... the same solr.SolrMappingReader mapping lines are logged again at 2018-07-20 09:46:23,766 ...]
>>> 2018-07-20 09:46:23,769 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 1/1 documents
>>> 2018-07-20 09:46:23,909 WARN  output.FileOutputCommitter - Output Path is null in cleanupJob()
>>> 2018-07-20 09:46:23,910 WARN  mapred.LocalJobRunner - job_local1584437722_0001
>>> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut down
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
>>> Caused by: java.lang.IllegalStateException: Connection pool shut down
>>>     at org.apache.http.util.Asserts.check(Asserts.java:34)
>>>     at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
>>>     at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
>>>     at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
>>>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
>>>     at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>>>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
>>>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
>>>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
>>>     at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
>>>     at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:482)
>>>     at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:463)
>>>     at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:191)
>>>     at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:179)
>>>     at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:117)
>>>     at org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:122)
>>>     at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>> 2018-07-20 09:46:24,406 ERROR indexer.CleaningJob - CleaningJob: java.io.IOException: Job failed!
>>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
>>>     at org.apache.nutch.indexer.CleaningJob.delete(CleaningJob.java:174)
>>>     at org.apache.nutch.indexer.CleaningJob.run(CleaningJob.java:197)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>     at org.apache.nutch.indexer.CleaningJob.main(CleaningJob.java:208)
>>>
>>> On Fri, Jul 20, 2018 at 2:06 AM Sebastian Nagel
>>> <wastl.na...@googlemail.com.invalid> wrote:
>>>
>>>> Hi,
>>>>
>>>>>   * Changed my regex-filter to use development domain address.
>>>>
>>>> Did you also change your seeds?
>>>>
>>>> The fact that deletions are sent but not additions/updates
>>>> suggests that no pages have been successfully crawled.
>>>>
>>>> Could you specify the Nutch version used and also attach some
>>>> log snippets to make it possible to analyze the issue?
>>>>
>>>> Thanks,
>>>> Sebastian
>>>>
>>>> On 07/19/2018 10:30 PM, Rushi wrote:
>>>>> Hi all,
>>>>> I have been using Nutch for the last 6 months and it works with the
>>>>> Production urls without any issue. For testing purposes I want to make
>>>>> this work on Dev/staging. I followed these steps
>>>>>
>>>>>
>>>>> And ran this command:
>>>>>
>>>>> ./bin/crawl -i -D solr.server.url=http://dev.abc.com:8983/solr/automateindex/ urls/ Testcrawl 2
>>>>>
>>>>> I don't see that it is indexed, but it shows that it is deleted.
>>>>>
>>>>>
>>>>> [screenshot attachment: nutch_—_-bash_—_206×67.png]
>>>>>
>>>>> Note: I tried checking the Dev url with bin/nutch indexchecker
>>>>> https://dev-abc.com/letters and it shows me the content.
>>>>>
>>>>> I would really appreciate it if you could suggest a solution.
>>>>
>>>>
>>>
>>
>>
>>
> 
