Hi Puneet,

Responses inline.

On Wed, Aug 15, 2018 at 7:20 AM <user-digest-h...@nutch.apache.org> wrote:
>
> From: Puneet Dhanda <ppu...@gmail.com>
> To: user@nutch.apache.org
> Cc:
> Bcc:
> Date: Wed, 15 Aug 2018 10:02:12 -0400
> Subject: Nutch 2.3.1 with Mongo datastore - No Document is getting indexed.
>
> Hi,
>
> I am using the Nutch- 2.3.1 with MongoDB as the datastore.

Are you using it from SCM or from the release? If I were you I would use the SCM version; we fixed a few bugs in there.

> While crawling
> the sites, getting the following error. Please assist what could be wrong
> here.
>
> Hadoop.log exception
> 2018-08-15 09:56:42,139 INFO httpclient.HttpMethodDirector - Retrying request
> 2018-08-15 09:56:42,139 INFO httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
> 2018-08-15 09:56:42,139 INFO httpclient.HttpMethodDirector - Retrying request
> 2018-08-15 09:56:42,242 ERROR httpclient.Http - Failed with the following error:
> java.net.ConnectException: Connection refused (Connection refused)
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> 2018-08-15 09:56:46,409 INFO fetcher.FetcherJob - 0/0 spinwaiting/active, 2 pages, 2 errors, 0.4 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
>

You may wish to use the parser checker tool to confirm that you can reach the 2 failed URLs without executing a full crawl:
https://wiki.apache.org/nutch/bin/nutch%20parsechecker

You can also try setting DEBUG or TRACE logging for this tool, see
https://github.com/apache/nutch/blob/2.x/conf/log4j.properties#L40

Lewis
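
As a complement to the parsechecker suggestion above, a "Connection refused" error can also be reproduced outside Nutch with a plain TCP connection attempt. The following is only a minimal sketch, not part of Nutch: the function name `can_connect` and the default ports are assumptions made for illustration, and it checks TCP reachability only, not HTTP status or parsing.

```python
import socket
import sys
from urllib.parse import urlsplit


def can_connect(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Hypothetical usage: pass the failing seed URLs on the command line.
    for seed in sys.argv[1:]:
        print(seed, "reachable" if can_connect(seed) else "NOT reachable")
```

If this reports a seed URL as not reachable from the crawl machine, the problem is more likely network-side (host down, wrong port, firewall) than in the Nutch or MongoDB configuration.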