FYI,
I finally figured out why it was not indexing.
Here is the solution: since the website is internal, we are not allowing
crawlers. I went to the nutch-site.xml file and added the following
property:

  <property>
    <name>http.robot.rules.whitelist</name>
    <value>dev-abc.com</value>
  </property>

Then I ran the indexer again and it works now.
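The blocking behaviour can be reproduced outside Nutch with Python's stdlib robots parser (a minimal sketch; the robots.txt content below is assumed, since the internal site's actual file isn't shown in the thread):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the internal site: a blanket Disallow is
# what makes a robots-respecting crawler like Nutch skip every page
# unless the host is whitelisted via http.robot.rules.whitelist.
robots_txt = [
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(robots_txt)

# The fetcher's user agent is denied for the seed URL:
allowed = rp.can_fetch("nutch-solr-integration", "https://dev-abc.com/letters")
# allowed is False here, so the page is never fetched or indexed
```

The whitelist simply tells Nutch to skip this robots.txt check for the listed hosts, which is why indexing started working.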



On Mon, Jul 23, 2018 at 10:46 AM Sebastian Nagel
<wastl.na...@googlemail.com.invalid> wrote:

> Hi,
>
> after a second look: the Solr error only affects the cleaning job.
> After checking the logs carefully:
>
> - only one page is fetched
>   2018-07-20 09:45:58,918 INFO  fetcher.FetcherThread - fetching https://dev-abc.com/letters (queue crawl delay=30000ms)
>
> - and one page is sent as deletion (probably a 404) to the indexer
>   2018-07-20 09:46:23,769 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 1/1 documents
>
> But given only the logs I don't see a way to find out why the
> page failed to fetch. The CrawlDb contains the fetch status and
> usually also a status message which explains the failure.
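> That status can be dumped for the single URL with the CrawlDb reader
> (a sketch; the crawldb path is taken from the log below, and the exact
> output format depends on the Nutch version):
>
> ```shell
> bin/nutch readdb TestCra7sl/crawldb -url https://dev-abc.com/letters
> ```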
>
> Best,
> Sebastian
>
> On 07/23/2018 04:15 PM, Rushi wrote:
> > Hi Sebastian,
> > I am using Solr 6.4.2. But I am surprised: with the same configuration,
> > Nutch 1.13 and Solr 6.4.2 crawling/indexing with the Prod urls seems to
> > be working fine without any issues.
> >
> > On Mon, Jul 23, 2018 at 7:37 AM Sebastian Nagel
> > <wastl.na...@googlemail.com.invalid> wrote:
> >
> >> Hi,
> >>
> >> there is an exception "Connection pool shut down".
> >> Which version of Solr are you running? It should be
> >> Solr 5.5.0 for Nutch 1.13.
> >>
> >> Sebastian
> >>
> >> On 07/20/2018 03:58 PM, Rushi wrote:
> >>> Thanks for the response, Sebastian.
> >>> Yeah, I changed my seeds, and I am using Nutch 1.13.
> >>>
> >>> Here is the log
> >>> 2018-07-20 09:45:49,769 INFO  crawl.Injector - Injector: starting at 2018-07-20 09:45:49
> >>> 2018-07-20 09:45:49,770 INFO  crawl.Injector - Injector: crawlDb: TestCra7sl/crawldb
> >>> 2018-07-20 09:45:49,770 INFO  crawl.Injector - Injector: urlDir: urls
> >>> 2018-07-20 09:45:49,770 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
> >>> 2018-07-20 09:45:49,894 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> 2018-07-20 09:45:51,672 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/plugin/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/plugin/plugin.xml (No such file or directory)
> >>> 2018-07-20 09:45:51,688 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/publish-rabitmq/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/publish-rabitmq/plugin.xml (No such file or directory)
> >>> 2018-07-20 09:45:51,759 WARN  plugin.PluginRepository - Error while loading plugin `/nutch/plugins/parse-replace/plugin.xml` java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml (No such file or directory)
> >>> 2018-07-20 09:45:51,839 INFO  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
> >>> 2018-07-20 09:45:51,985 INFO  crawl.Injector - Injector: overwrite: false
> >>> 2018-07-20 09:45:51,985 INFO  crawl.Injector - Injector: update: false
> >>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: Total urls rejected by filters: 0
> >>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: Total urls injected after normalization and filtering: 1
> >>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: Total urls injected but already in CrawlDb: 0
> >>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: Total new urls injected: 1
> >>> 2018-07-20 09:45:52,330 INFO  crawl.Injector - Injector: finished at 2018-07-20 09:45:52, elapsed: 00:00:02
> >>> 2018-07-20 09:45:53,235 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> 2018-07-20 09:45:53,374 INFO  crawl.Generator - Generator: starting at 2018-07-20 09:45:53
> >>> 2018-07-20 09:45:53,374 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
> >>> 2018-07-20 09:45:53,374 INFO  crawl.Generator - Generator: filtering: false
> >>> 2018-07-20 09:45:53,375 INFO  crawl.Generator - Generator: normalizing: true
> >>> 2018-07-20 09:45:53,375 INFO  crawl.Generator - Generator: topN: 50000
> >>> [the three plugin.PluginRepository "Error while loading plugin" warnings repeat here]
> >>> 2018-07-20 09:45:54,146 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> >>> 2018-07-20 09:45:54,147 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
> >>> 2018-07-20 09:45:54,147 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
> >>> 2018-07-20 09:45:54,154 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
> >>> [the FetchScheduleFactory/AbstractFetchSchedule lines repeat twice more]
> >>> 2018-07-20 09:45:54,244 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
> >>> 2018-07-20 09:45:54,915 INFO  crawl.Generator - Generator: Partitioning selected urls for politeness.
> >>> 2018-07-20 09:45:55,916 INFO  crawl.Generator - Generator: segment: TestCra7sl/segments/20180720094555
> >>> 2018-07-20 09:45:57,096 INFO  crawl.Generator - Generator: finished at 2018-07-20 09:45:57, elapsed: 00:00:03
> >>> 2018-07-20 09:45:57,928 INFO  fetcher.Fetcher - Fetcher: starting at 2018-07-20 09:45:57
> >>> 2018-07-20 09:45:57,929 INFO  fetcher.Fetcher - Fetcher: segment: TestCra7sl/segments/20180720094555
> >>> 2018-07-20 09:45:57,929 INFO  fetcher.Fetcher - Fetcher Timelimit set for : 1532105157929
> >>> 2018-07-20 09:45:58,073 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> 2018-07-20 09:45:58,800 INFO  fetcher.FetchItemQueues - Using queue mode : byHost
> >>> 2018-07-20 09:45:58,800 INFO  fetcher.Fetcher - Fetcher: threads: 50
> >>> 2018-07-20 09:45:58,800 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
> >>> 2018-07-20 09:45:58,804 INFO  fetcher.QueueFeeder - QueueFeeder finished: total 1 records + hit by time limit :0
> >>> [the three plugin.PluginRepository "Error while loading plugin" warnings repeat here]
> >>> 2018-07-20 09:45:58,901 INFO  net.URLExemptionFilters - Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
> >>> 2018-07-20 09:45:58,917 INFO  fetcher.FetcherThread - Using queue mode : byHost
> >>> 2018-07-20 09:45:58,917 INFO  net.URLExemptionFilters - Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
> >>> 2018-07-20 09:45:58,918 INFO  fetcher.FetcherThread - fetching https://dev-abc.com/letters (queue crawl delay=30000ms)
> >>> 2018-07-20 09:45:58,918 INFO  fetcher.FetcherThread - Thread FetcherThread has no more work available
> >>> 2018-07-20 09:45:58,918 INFO  fetcher.FetcherThread - -finishing thread FetcherThread, activeThreads=1
> >>> [the "Using queue mode", "Found 0 extensions", "no more work available" and "-finishing thread" lines repeat for each of the remaining fetcher threads; interleaved among them:]
> >>> 2018-07-20 09:45:58,929 INFO  protocol.RobotRulesParser - robots.txt whitelist not configured.
> >>> 2018-07-20 09:45:58,929 INFO  http.Http - http.proxy.host = null
> >>> 2018-07-20 09:45:58,929 INFO  http.Http - http.proxy.port = 8080
> >>> 2018-07-20 09:45:58,929 INFO  http.Http - http.proxy.exception.list = false
> >>> 2018-07-20 09:45:58,929 INFO  http.Http - http.timeout = 10000
> >>> 2018-07-20 09:45:58,930 INFO  http.Http - http.content.limit = -1
> >>> 2018-07-20 09:45:58,930 INFO  http.Http - http.agent = nutch-solr-integration/Nutch-1.13-SNAPSHOT
> >>> 2018-07-20 09:45:58,930 INFO  http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> >>> 2018-07-20 09:45:58,936 INFO  http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> >>> 2018-07-20 09:45:58,936 INFO  http.Http - http.enable.cookie.header = true
> >>> 2018-07-20 09:45:58,941 INFO  fetcher.Fetcher - Fetcher: throughput
> >>> threshold: -1
> >>> 2018-07-20 09:45:58,941 INFO  fetcher.FetcherThread - -finishing thread
> >>> FetcherThread, activeThreads=1
> >>> 2018-07-20 09:45:58,941 INFO  fetcher.Fetcher - Fetcher: throughput
> >>> threshold retries: 5
> >>> 2018-07-20 09:45:58,941 INFO  fetcher.Fetcher - fetcher.maxNum.threads
> >>> can't be < than 50 : using 50 instead
> >>> 2018-07-20 09:45:59,945 INFO  fetcher.Fetcher - -activeThreads=1,
> >>> spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> >>> 2018-07-20 09:46:00,951 INFO  fetcher.Fetcher - -activeThreads=1,
> >>> spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> >>> 2018-07-20 09:46:01,955 INFO  fetcher.Fetcher - -activeThreads=1,
> >>> spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> >>> 2018-07-20 09:46:02,957 INFO  fetcher.Fetcher - -activeThreads=1,
> >>> spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> >>> 2018-07-20 09:46:03,959 INFO  fetcher.Fetcher - -activeThreads=1,
> >>> spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> >>> 2018-07-20 09:46:04,756 INFO  fetcher.FetcherThread - Thread
> >> FetcherThread
> >>> has no more work available
> >>> 2018-07-20 09:46:04,756 INFO  fetcher.FetcherThread - -finishing thread
> >>> FetcherThread, activeThreads=0
> >>> 2018-07-20 09:46:04,964 INFO  fetcher.Fetcher - -activeThreads=0,
> >>> spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
> >>> 2018-07-20 09:46:04,964 INFO  fetcher.Fetcher - -activeThreads=0
> >>> 2018-07-20 09:46:05,709 INFO  fetcher.Fetcher - Fetcher: finished at
> >>> 2018-07-20 09:46:05, elapsed: 00:00:07
> >>> 2018-07-20 09:46:06,597 WARN  util.NativeCodeLoader - Unable to load
> >>> native-hadoop library for your platform... using builtin-java classes
> >> where
> >>> applicable
> >>> 2018-07-20 09:46:06,712 INFO  parse.ParseSegment - ParseSegment:
> starting
> >>> at 2018-07-20 09:46:06
> >>> 2018-07-20 09:46:06,713 INFO  parse.ParseSegment - ParseSegment:
> segment:
> >>> TestCra7sl/segments/20180720094555
> >>> 2018-07-20 09:46:07,478 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/plugin/plugin.xml`
> java.io.FileNotFoundException:
> >>> /nutch/plugins/plugin/plugin.xml (No such file or directory)
> >>> 2018-07-20 09:46:07,482 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/publish-rabitmq/plugin.xml`
> >>> java.io.FileNotFoundException:
> /nutch/plugins/publish-rabitmq/plugin.xml
> >>> (No such file or directory)
> >>> 2018-07-20 09:46:07,502 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/parse-replace/plugin.xml`
> >>> java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml
> >> (No
> >>> such file or directory)
> >>> 2018-07-20 09:46:07,690 INFO  net.URLExemptionFilters - Found 0
> >> extensions
> >>> at point:'org.apache.nutch.net.URLExemptionFilter'
> >>> 2018-07-20 09:46:07,775 INFO  net.URLExemptionFilters - Found 0
> >> extensions
> >>> at point:'org.apache.nutch.net.URLExemptionFilter'
> >>> 2018-07-20 09:46:08,294 INFO  parse.ParseSegment - ParseSegment:
> finished
> >>> at 2018-07-20 09:46:08, elapsed: 00:00:01
> >>> 2018-07-20 09:46:09,234 WARN  util.NativeCodeLoader - Unable to load
> >>> native-hadoop library for your platform... using builtin-java classes
> >> where
> >>> applicable
> >>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: starting
> at
> >>> 2018-07-20 09:46:09
> >>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: db:
> >>> TestCra7sl/crawldb
> >>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: segments:
> >>> [TestCra7sl/segments/20180720094555]
> >>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: additions
> >>> allowed: false
> >>> 2018-07-20 09:46:09,391 INFO  crawl.CrawlDb - CrawlDb update: URL
> >>> normalizing: false
> >>> 2018-07-20 09:46:09,392 INFO  crawl.CrawlDb - CrawlDb update: URL
> >>> filtering: false
> >>> 2018-07-20 09:46:09,392 INFO  crawl.CrawlDb - CrawlDb update: 404
> >> purging:
> >>> false
> >>> 2018-07-20 09:46:09,393 INFO  crawl.CrawlDb - CrawlDb update: Merging
> >>> segment data into db.
> >>> 2018-07-20 09:46:10,518 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/plugin/plugin.xml`
> java.io.FileNotFoundException:
> >>> /nutch/plugins/plugin/plugin.xml (No such file or directory)
> >>> 2018-07-20 09:46:10,522 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/publish-rabitmq/plugin.xml`
> >>> java.io.FileNotFoundException:
> /nutch/plugins/publish-rabitmq/plugin.xml
> >>> (No such file or directory)
> >>> 2018-07-20 09:46:10,541 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/parse-replace/plugin.xml`
> >>> java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml
> >> (No
> >>> such file or directory)
> >>> 2018-07-20 09:46:10,567 INFO  crawl.FetchScheduleFactory - Using
> >>> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> >>> 2018-07-20 09:46:10,567 INFO  crawl.AbstractFetchSchedule -
> >>> defaultInterval=2592000
> >>> 2018-07-20 09:46:10,567 INFO  crawl.AbstractFetchSchedule -
> >>> maxInterval=7776000
> >>> 2018-07-20 09:46:10,616 INFO  crawl.FetchScheduleFactory - Using
> >>> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> >>> 2018-07-20 09:46:10,616 INFO  crawl.AbstractFetchSchedule -
> >>> defaultInterval=2592000
> >>> 2018-07-20 09:46:10,616 INFO  crawl.AbstractFetchSchedule -
> >>> maxInterval=7776000
> >>> 2018-07-20 09:46:11,005 INFO  crawl.CrawlDb - CrawlDb update: finished
> at
> >>> 2018-07-20 09:46:11, elapsed: 00:00:01
> >>> 2018-07-20 09:46:11,980 WARN  util.NativeCodeLoader - Unable to load
> >>> native-hadoop library for your platform... using builtin-java classes
> >> where
> >>> applicable
> >>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: starting at
> >> 2018-07-20
> >>> 09:46:12
> >>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: linkdb:
> >>> TestCra7sl/linkdb
> >>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: URL normalize:
> true
> >>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: URL filter: true
> >>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: internal links
> will
> >> be
> >>> ignored.
> >>> 2018-07-20 09:46:12,132 INFO  crawl.LinkDb - LinkDb: adding segment:
> >>> TestCra7sl/segments/20180720094555
> >>> 2018-07-20 09:46:12,922 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/plugin/plugin.xml`
> java.io.FileNotFoundException:
> >>> /nutch/plugins/plugin/plugin.xml (No such file or directory)
> >>> 2018-07-20 09:46:12,926 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/publish-rabitmq/plugin.xml`
> >>> java.io.FileNotFoundException:
> /nutch/plugins/publish-rabitmq/plugin.xml
> >>> (No such file or directory)
> >>> 2018-07-20 09:46:12,946 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/parse-replace/plugin.xml`
> >>> java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml
> >> (No
> >>> such file or directory)
> >>> 2018-07-20 09:46:13,726 INFO  crawl.LinkDb - LinkDb: finished at
> >> 2018-07-20
> >>> 09:46:13, elapsed: 00:00:01
> >>> 2018-07-20 09:46:14,596 INFO  crawl.DeduplicationJob -
> DeduplicationJob:
> >>> starting at 2018-07-20 09:46:14
> >>> 2018-07-20 09:46:14,785 WARN  util.NativeCodeLoader - Unable to load
> >>> native-hadoop library for your platform... using builtin-java classes
> >> where
> >>> applicable
> >>> 2018-07-20 09:46:16,448 INFO  crawl.DeduplicationJob - Deduplication: 0
> >>> documents marked as duplicates
> >>> 2018-07-20 09:46:16,449 INFO  crawl.DeduplicationJob - Deduplication:
> >>> Updating status of duplicate urls into crawl db.
> >>> 2018-07-20 09:46:17,665 INFO  crawl.DeduplicationJob - Deduplication
> >>> finished at 2018-07-20 09:46:17, elapsed: 00:00:03
> >>> 2018-07-20 09:46:18,637 WARN  util.NativeCodeLoader - Unable to load
> >>> native-hadoop library for your platform... using builtin-java classes
> >> where
> >>> applicable
> >>> 2018-07-20 09:46:18,776 INFO  segment.SegmentChecker - Segment dir is
> >>> complete: TestCra7sl/segments/20180720094555.
> >>> 2018-07-20 09:46:18,777 INFO  indexer.IndexingJob - Indexer: starting
> at
> >>> 2018-07-20 09:46:18
> >>> 2018-07-20 09:46:18,780 INFO  indexer.IndexingJob - Indexer: deleting
> >> gone
> >>> documents: false
> >>> 2018-07-20 09:46:18,780 INFO  indexer.IndexingJob - Indexer: URL
> >> filtering:
> >>> false
> >>> 2018-07-20 09:46:18,780 INFO  indexer.IndexingJob - Indexer: URL
> >>> normalizing: false
> >>> 2018-07-20 09:46:18,848 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/plugin/plugin.xml`
> java.io.FileNotFoundException:
> >>> /nutch/plugins/plugin/plugin.xml (No such file or directory)
> >>> 2018-07-20 09:46:18,853 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/publish-rabitmq/plugin.xml`
> >>> java.io.FileNotFoundException:
> /nutch/plugins/publish-rabitmq/plugin.xml
> >>> (No such file or directory)
> >>> 2018-07-20 09:46:18,877 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/parse-replace/plugin.xml`
> >>> java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml
> >> (No
> >>> such file or directory)
> >>> 2018-07-20 09:46:18,947 INFO  indexer.IndexWriters - Adding
> >>> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> >>> 2018-07-20 09:46:18,947 INFO  indexer.IndexingJob - Active
> IndexWriters :
> >>> SOLRIndexWriter
> >>> solr.server.url : URL of the SOLR instance
> >>> solr.zookeeper.hosts : URL of the Zookeeper quorum
> >>> solr.commit.size : buffer size when sending to SOLR (default 1000)
> >>> solr.mapping.file : name of the mapping file for fields (default
> >>> solrindex-mapping.xml)
> >>> solr.auth : use authentication (default false)
> >>> solr.auth.username : username for authentication
> >>> solr.auth.password : password for authentication
> >>>
> >>>
> >>> 2018-07-20 09:46:18,949 INFO  indexer.IndexerMapReduce -
> >> IndexerMapReduce:
> >>> crawldb: TestCra7sl/crawldb
> >>> 2018-07-20 09:46:18,949 INFO  indexer.IndexerMapReduce -
> >> IndexerMapReduce:
> >>> linkdb: TestCra7sl/linkdb
> >>> 2018-07-20 09:46:18,949 INFO  indexer.IndexerMapReduce -
> >> IndexerMapReduces:
> >>> adding segment: TestCra7sl/segments/20180720094555
> >>> 2018-07-20 09:46:19,781 INFO  anchor.AnchorIndexingFilter - Anchor
> >>> deduplication is: off
> >>> 2018-07-20 09:46:20,716 INFO  indexer.IndexWriters - Adding
> >>> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: content
> >>> dest: content
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: title
> >> dest:
> >>> title
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source:
> >>> metatag.description dest: description
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source:
> >>> metatag.section dest: section
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source:
> >>> metatag.gldocname dest: gldocname
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: host
> dest:
> >>> host
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: segment
> >>> dest: segment
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: boost
> >> dest:
> >>> boost
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: digest
> >> dest:
> >>> digest
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source: tstamp
> >> dest:
> >>> tstamp
> >>> 2018-07-20 09:46:20,809 INFO  solr.SolrMappingReader - source:
> >> mainContent
> >>> dest: mainContent
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source: content
> >>> dest: content
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source: title
> >> dest:
> >>> title
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source:
> >>> metatag.description dest: description
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source:
> >>> metatag.section dest: section
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source:
> >>> metatag.gldocname dest: gldocname
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source: host
> dest:
> >>> host
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source: segment
> >>> dest: segment
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source: boost
> >> dest:
> >>> boost
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source: digest
> >> dest:
> >>> digest
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source: tstamp
> >> dest:
> >>> tstamp
> >>> 2018-07-20 09:46:21,636 INFO  solr.SolrMappingReader - source:
> >> mainContent
> >>> dest: mainContent
> >>> 2018-07-20 09:46:21,646 INFO  indexer.IndexingJob - Indexer: number of
> >>> documents indexed, deleted, or skipped:
> >>> 2018-07-20 09:46:21,652 INFO  indexer.IndexingJob - Indexer: finished
> at
> >>> 2018-07-20 09:46:21, elapsed: 00:00:02
> >>> 2018-07-20 09:46:22,551 INFO  indexer.CleaningJob - CleaningJob:
> starting
> >>> at 2018-07-20 09:46:22
> >>> 2018-07-20 09:46:22,717 WARN  util.NativeCodeLoader - Unable to load
> >>> native-hadoop library for your platform... using builtin-java classes
> >> where
> >>> applicable
> >>> 2018-07-20 09:46:23,395 WARN  output.FileOutputCommitter - Output Path
> is
> >>> null in setupJob()
> >>> 2018-07-20 09:46:23,644 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/plugin/plugin.xml`
> java.io.FileNotFoundException:
> >>> /nutch/plugins/plugin/plugin.xml (No such file or directory)
> >>> 2018-07-20 09:46:23,649 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/publish-rabitmq/plugin.xml`
> >>> java.io.FileNotFoundException:
> /nutch/plugins/publish-rabitmq/plugin.xml
> >>> (No such file or directory)
> >>> 2018-07-20 09:46:23,669 WARN  plugin.PluginRepository - Error while
> >> loading
> >>> plugin `/nutch/plugins/parse-replace/plugin.xml`
> >>> java.io.FileNotFoundException: /nutch/plugins/parse-replace/plugin.xml
> >> (No
> >>> such file or directory)
> >>> 2018-07-20 09:46:23,702 INFO  indexer.IndexWriters - Adding
> >>> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source: content
> >>> dest: content
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source: title
> >> dest:
> >>> title
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source:
> >>> metatag.description dest: description
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source:
> >>> metatag.section dest: section
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source:
> >>> metatag.gldocname dest: gldocname
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source: host
> dest:
> >>> host
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source: segment
> >>> dest: segment
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source: boost
> >> dest:
> >>> boost
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source: digest
> >> dest:
> >>> digest
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source: tstamp
> >> dest:
> >>> tstamp
> >>> 2018-07-20 09:46:23,766 INFO  solr.SolrMappingReader - source:
> >> mainContent
> >>> dest: mainContent
> >>> 2018-07-20 09:46:23,769 INFO  solr.SolrIndexWriter - SolrIndexer:
> >> deleting
> >>> 1/1 documents
> >>> 2018-07-20 09:46:23,909 WARN  output.FileOutputCommitter - Output Path
> is
> >>> null in cleanupJob()
> >>> 2018-07-20 09:46:23,910 WARN  mapred.LocalJobRunner -
> >>> job_local1584437722_0001
> >>> java.lang.Exception: java.lang.IllegalStateException: Connection pool
> >> shut
> >>> down
> >>> at
> >>>
> >>
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> >>> at
> >> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> >>> Caused by: java.lang.IllegalStateException: Connection pool shut down
> >>> at org.apache.http.util.Asserts.check(Asserts.java:34)
> >>> at
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
> >>> at
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
> >>> at
> >>>
> >>
> org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
> >>> at
> >>>
> >>
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
> >>> at
> >>>
> >>
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
> >>> at
> >>>
> >>
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> >>> at
> >>>
> >>
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
> >>> at
> >>>
> >>
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
> >>> at
> >>>
> >>
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
> >>> at
> >>>
> >>
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
> >>> at
> >>>
> >>
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
> >>> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
> >>> at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:482)
> >>> at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:463)
> >>> at
> >>>
> >>
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:191)
> >>> at
> >>>
> >>
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:179)
> >>> at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:117)
> >>> at
> >>>
> >>
> org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:122)
> >>> at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
> >>> at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
> >>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> >>> at
> >>>
> >>
> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
> >>> at
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >>> at
> >>>
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>> at
> >>>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>> at java.lang.Thread.run(Thread.java:748)
> >>> 2018-07-20 09:46:24,406 ERROR indexer.CleaningJob - CleaningJob:
> >>> java.io.IOException: Job failed!
> >>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
> >>> at org.apache.nutch.indexer.CleaningJob.delete(CleaningJob.java:174)
> >>> at org.apache.nutch.indexer.CleaningJob.run(CleaningJob.java:197)
> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>> at org.apache.nutch.indexer.CleaningJob.main(CleaningJob.java:208)
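The CleaningJob failure above is only a symptom: the index is empty because the single fetch failed. The CrawlDb records the fetch status and message for each URL, so it can show *why* the page was not fetched. A minimal diagnostic sketch using the standard Nutch readdb tool (the crawl directory name `TestCra7sl` and seed URL are taken from the logs above; adjust to your setup):

```shell
# Summarize fetch statuses (db_fetched, db_gone, etc.) across the CrawlDb
bin/nutch readdb TestCra7sl/crawldb -stats

# Dump the stored CrawlDatum (status, retry count, protocol status message)
# for the one seed URL, to see the recorded reason for the fetch failure
bin/nutch readdb TestCra7sl/crawldb -url https://dev-abc.com/letters
```

A status of `db_gone` with a robots-related protocol message would point to the crawler being blocked by robots.txt on the internal host.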
> >>>
> >>> On Fri, Jul 20, 2018 at 2:06 AM Sebastian Nagel
> >>> <wastl.na...@googlemail.com.invalid> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>>>   * Changed my regex-filter to use development domain address.
> >>>>
> >>>> Did you also change your seeds?
> >>>>
> >>>> The fact that deletions are sent but not additions/updates
> >>>> suggests that no pages have been successfully crawled.
> >>>>
> >>>> Could you specify the Nutch version used and also attach some
> >>>> log snippets to make it possible to analyze the issue.
> >>>>
> >>>> Thanks,
> >>>> Sebastian
> >>>>
> >>>> On 07/19/2018 10:30 PM, Rushi wrote:
> >>>>> Hi all,
> >>>>> I have been using Nutch for the last 6 months, and it works with the
> >>>>> production URLs without any issue. For testing purposes I want to make
> >>>>> this work on Dev/staging. I followed these steps:
> >>>>>
> >>>>>
> >>>>> And ran this command
> >>>>>
> >>>>> ./bin/crawl -i -D solr.server.url=http://dev.abc.com:8983/solr/automateindex/ urls/ Testcrawl 2
> >>>>>
> >>>>> I don't see that it is indexed, but it shows that it is deleted.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Note: I tried checking the Dev URL with bin/nutch indexchecker
> >>>>> https://dev-abc.com/letters and it shows me the content.
> >>>>>
> >>>>> I would really appreciate it if you could suggest a solution.
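If the internal Dev host is denying all crawlers via robots.txt (as turned out to be the case here), Nutch can be told to skip robot rules for specific hosts. A hedged sketch of the relevant property for conf/nutch-site.xml (hostname taken from this thread; verify the property name against the nutch-default.xml shipped with your Nutch version, and use it only for hosts you control):

```xml
<!-- nutch-site.xml: skip robots.txt parsing for listed internal hosts -->
<property>
  <name>http.robot.rules.whitelist</name>
  <value>dev-abc.com</value>
  <description>Comma-separated list of hostnames or IP addresses for
  which robot rules parsing is ignored.</description>
</property>
```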
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >
>
>

-- 
Regards
Rushikesh M
.Net Developer
