Just as a follow-up: my problem is akin to this one, where it is perhaps described more precisely:
https://stackoverflow.com/questions/43786034/nutch-selenium-firefox-issue-unable-to-connect-to-host-127-0-0-1-on-port-7055-a

2017-07-14 14:13 GMT+02:00 Filip Stysiak <[email protected]>:
> Dear Nutch users,
>
> [Nutch 1.13]
>
> I am developing an app that needs to crawl and index images, and in order
> to fetch dynamic content (such as images in galleries) I started using the
> protocol-selenium plugin. However, after initial success with a single URL
> in seed.txt (though I needed to install a very outdated version of
> Firefox, 31.x), the crawler crashed when I tried to crawl multiple sites,
> which is the standard scenario in the app.
>
> This, of course, was the result of Nutch starting a queue for every
> different host and being unable to open several Firefox instances with
> Selenium in local mode.
>
> I tried to switch to Selenium Grid, per:
> https://github.com/apache/nutch/tree/master/src/plugin/protocol-selenium
>
> I used selenium-server-standalone 3.4.0. However, when I started the hub
> and started crawling, the hub didn't register any attempts at connecting
> to it. I think nutch-site.xml was properly configured, though I didn't
> set the grid.binary.location. I also tried upgrading lib-selenium and
> the server, with little luck. I dis
>
> Does anyone know what the issue is here? Has anyone succeeded in
> configuring protocol-selenium with the grid and made it work with multiple
> URLs from different hosts in seed.txt?
>
> Thanks in advance,
> Filip

