[ https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740472#comment-16740472 ]
Stas Batururimi commented on NUTCH-2676:
----------------------------------------

Hi, [~wastl-nagel]

Could you point me in the right direction? I want to follow the redirects of the URLs in the initial URL list, but not the (internal/external) links present in many pages.

I played with
{code:java}
db.ignore.also.redirects
db.ignore.external.links
db.ignore.internal.links
{code}
and took a look at https://issues.apache.org/jira/browse/NUTCH-2216, but without success.

I always end up with one of the following:
 - redirects plus a lot of other links (not specified in the initial URL list)
 - no redirects, only saved db_redir_temp and db_redir_perm entries (for later use, as specified somewhere)

How can I combine the two: follow the links from db_redir_temp/db_redir_perm, but not the internal/external links present in web pages?

> Update to the latest selenium and add code to use chrome and firefox headless
> mode with the remote web driver
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2676
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2676
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Stas Batururimi
>            Priority: Major
>             Fix For: 1.16
>
>         Attachments: Screenshot 2018-11-16 at 18.15.52.png
>
>
> * Selenium needs to be updated
> * missing remote web driver for chrome
> * necessity to add headless mode for both remote WebDriverBase Firefox &
> Chrome
> * use case with Selenium grid using docker (1 hub docker container, several
> nodes in different docker containers, Nutch in another docker container,
> streaming to Apache Solr in docker container, that is at least 4 different
> docker containers)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)