Hi everyone,

I am developing an app that runs dockerized Nutch 1.x instances, receives
crawl requests from Celery, and indexes the results into Solr 6.6.0. The app
indexes images (using the protocol-selenium plugin to fetch dynamic content).
However, I noticed that while small crawl tasks are indexed properly, I had
no success with a slightly bigger one: when I asked my dockerized app to
crawl a website that (after 3 iterations of the crawl script) needs to fetch
~5000 links, Nutch in the Docker container simply stops working. The last
entries I see in hadoop.log are from the fetcher, and there are no
exceptions, save for one that does not occur when I run the very same crawl
task (successfully) on the host machine.

The exception (see pastebin for the full stack trace):
org.apache.commons.httpclient.NoHttpResponseException: The server
some.site.web failed to respond

I doubt that failing to fetch a couple of links would put Nutch into this
crashed-but-not-really state. I say "not really" because Celery still sees
the task as active, but looking at htop or docker stats it's quite obvious
that Nutch has ceased doing anything productive. Let me restate that this
does not occur when I run the task outside of Docker.
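For anyone who wants to check the same symptoms, this is roughly how the stall shows up (the container name "nutch" and the Nutch log path are placeholders; adjust for your own setup):

```shell
# Tail the Nutch log inside the container; the last lines are fetcher output,
# with no further activity after the stall. Name and path are placeholders.
docker exec nutch tail -n 50 /opt/nutch/logs/hadoop.log

# One-shot snapshot of container resource usage; CPU sits near 0% once
# Nutch stops doing anything, even though Celery still reports the task active.
docker stats --no-stream nutch
```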

Has anyone here stumbled upon anything similar, or has any experience with
running bigger crawls on dockerized Nutch?

Thanks in advance,
