[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2019-02-15 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769283#comment-16769283
 ] 

Stas Batururimi commented on NUTCH-2676:


[~wastl-nagel] Hi. Any news about the patch and planned release?

> Update to the latest selenium and add code to use chrome and firefox headless 
> mode with the remote web driver
> -
>
> Key: NUTCH-2676
> URL: https://issues.apache.org/jira/browse/NUTCH-2676
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.15
> Reporter: Stas Batururimi
> Priority: Major
> Fix For: 1.16
>
> Attachments: Screenshot 2018-11-16 at 18.15.52.png
>
>
> * Selenium needs to be updated
> * missing remote web driver for Chrome
> * necessity to add headless mode for both remote WebDriverBase Firefox & 
> Chrome
> * use case with Selenium Grid using Docker (1 hub Docker container, several 
> nodes in different Docker containers, Nutch in another Docker container, 
> streaming to Apache Solr in a Docker container, that is at least 4 different 
> Docker containers)
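
For illustration only: headless mode for a remote browser comes down to the capabilities sent when the session is created. The sketch below is not the patch and does not use the Selenium client library. It only builds, with the Java standard library, the JSON body that a W3C WebDriver client would send in a new-session request to a hub. The class name HeadlessCapabilities is hypothetical. The capability keys goog:chromeOptions and moz:firefoxOptions and the --headless / -headless arguments are the vendor names and browser flags as I understand them, so treat them as assumptions to check against the browser versions in use.

```java
/** Hypothetical helper: builds new-session payloads that request headless browsers. */
final class HeadlessCapabilities {

    /** Payload asking the remote end for Chrome started with --headless (assumed flag). */
    static String chrome() {
        return "{\"capabilities\":{\"alwaysMatch\":{"
            + "\"browserName\":\"chrome\","
            + "\"goog:chromeOptions\":{\"args\":[\"--headless\"]}}}}";
    }

    /** Payload asking the remote end for Firefox started with -headless (assumed flag). */
    static String firefox() {
        return "{\"capabilities\":{\"alwaysMatch\":{"
            + "\"browserName\":\"firefox\","
            + "\"moz:firefoxOptions\":{\"args\":[\"-headless\"]}}}}";
    }

    public static void main(String[] args) {
        // Print both payloads; sending them to a hub is deliberately left out.
        System.out.println(chrome());
        System.out.println(firefox());
    }
}
```

With the Selenium client library, the same request would normally be expressed through browser options objects passed to the remote web driver instead of hand-written JSON.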



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2019-02-12 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766021#comment-16766021
 ] 

Stas Batururimi commented on NUTCH-2676:


Hi [~wastl-nagel], I have posted updates here:
https://github.com/apache/nutch/pull/430/#issuecomment-462759562



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2019-01-16 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744215#comment-16744215
 ] 

Stas Batururimi commented on NUTCH-2676:


https://github.com/apache/nutch/pull/430

I will probably add more updates during the month.
I have also added an option to ignore the robots.txt file. Let me know if I 
should push it.



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2019-01-16 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744203#comment-16744203
 ] 

Stas Batururimi commented on NUTCH-2676:


[~wastl-nagel] I'm ready to make a pull request but don't see any NUTCH-2676 
branch in the https://github.com/apache/nutch repository. Should I open the 
pull request against master?



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2019-01-11 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740472#comment-16740472
 ] 

Stas Batururimi commented on NUTCH-2676:


Hi, [~wastl-nagel]
Could you point me in the right direction? I want to follow the redirects of 
the URLs in the initial list, but not the (internal/external) links present on 
those pages.
I experimented with

{code:java}
db.ignore.also.redirects
db.ignore.external.links
db.ignore.internal.links
{code}

and took a look at 
https://issues.apache.org/jira/browse/NUTCH-2216
but could not get this to work.

I always end up with one of the following:
- redirects are followed, along with a lot of other links (not specified in the 
initial URL list)
- no redirects are followed, but the URLs are saved as db_redir_temp and 
db_redir_perm (for later use, as mentioned somewhere)
How can I combine the two, that is, follow the links from 
db_redir_temp/db_redir_perm but not the internal/external links present in the 
web pages?
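
To make the wanted behaviour concrete, here is a minimal sketch of the selection rule as a plain filter. It is not Nutch code and uses no Nutch API: the class RedirectOnlyPolicy, the Origin enum and the example URLs are all hypothetical. In Nutch itself the decision would be driven by configuration such as the db.ignore.* properties named above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the wanted crawl policy; this is not Nutch code. */
final class RedirectOnlyPolicy {

    /** How a URL was discovered. */
    enum Origin { SEED, REDIRECT, OUTLINK }

    /** Seeds and redirect targets are fetched; links found in page content are not. */
    static boolean shouldFetch(Origin origin) {
        return origin == Origin.SEED || origin == Origin.REDIRECT;
    }

    /** Keeps the URLs whose origin passes shouldFetch; urls and origins are parallel lists. */
    static List<String> select(List<String> urls, List<Origin> origins) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            if (shouldFetch(origins.get(i))) {
                kept.add(urls.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example data only: one seed, the target it redirects to, and a link found on the page.
        List<String> urls = Arrays.asList(
            "http://example.com/", "http://www.example.com/", "http://example.com/about");
        List<Origin> origins = Arrays.asList(Origin.SEED, Origin.REDIRECT, Origin.OUTLINK);
        System.out.println(select(urls, origins));
    }
}
```

The point of the sketch is only that the two cases listed above differ in how a URL was discovered, not in whether it is internal or external.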



[jira] [Comment Edited] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2019-01-07 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735656#comment-16735656
 ] 

Stas Batururimi edited comment on NUTCH-2676 at 1/7/19 10:48 AM:
-

[~wastl-nagel] Hi. Yes, I will provide it soon, sometime between Jan 9 and 
Jan 13.


was (Author: virt):
[~wastl-nagel] Hi. Yes. I will provide it soo, somewhere between Jan 9 - Jan 13.



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2019-01-07 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735656#comment-16735656
 ] 

Stas Batururimi commented on NUTCH-2676:


[~wastl-nagel] Hi. Yes, I will provide it soon, sometime between Jan 9 and Jan 13.



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-12-10 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714531#comment-16714531
 ] 

Stas Batururimi commented on NUTCH-2676:


I have already made a patch for this in the source code of FetcherThread.java 
for our own needs, so I could push it later in a separate issue if necessary. 
Let me know what you think.



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-12-09 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714396#comment-16714396
 ] 

Stas Batururimi commented on NUTCH-2676:


[~wastl-nagel] Some updates: the work is still in progress. I have updated some 
parts and am working on others while testing different configurations. The 
patch hasn't been abandoned.

By the way, can I add an option to ignore robots.txt, or is it better to keep 
it private and not push it into the main repository?



[jira] [Comment Edited] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-12-09 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714396#comment-16714396
 ] 

Stas Batururimi edited comment on NUTCH-2676 at 12/10/18 7:56 AM:
--

[~wastl-nagel] Some updates: the work is still in progress. I have updated some 
parts and am working on others while testing different configurations. The 
patch hasn't been abandoned!

By the way, can I add an option to ignore robots.txt, or is it better to keep 
it private and not push it into the main repository?


was (Author: virt):
[~wastl-nagel]Some updates: the work is still in progress, I have updated some 
parts and working on some other while testing different configuration. The 
patch hasn't been abandoned.

By the way, can I add an option to not consider robots.txt or it's better to 
keep it private and not to be pushed into the main repository?



[jira] [Issue Comment Deleted] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-20 Thread Stas Batururimi (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stas Batururimi updated NUTCH-2676:
---
Comment: was deleted

(was: Running twice didn't help. Looks like the property is still unresolved 
during the build time
```
resolve-default:
[ivy:resolve] :: loading settings :: file = /root/nutch_source/ivy/ivysettings.xml
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve]  WARNINGS
[ivy:resolve] [FAILED ] javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}: (0ms)
[ivy:resolve]  local: tried
[ivy:resolve] /root/.ivy2/local/javax.ws.rs/javax.ws.rs-api/2.1/${packaging.type}s/javax.ws.rs-api.${packaging.type}
[ivy:resolve]  maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve]  apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve]  sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve] ::
[ivy:resolve] :: FAILED DOWNLOADS ::
[ivy:resolve] :: ^ see resolution messages for details ^ ::
[ivy:resolve] ::
[ivy:resolve] :: javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}
[ivy:resolve] ::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
```)



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-19 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692799#comment-16692799
 ] 

Stas Batururimi commented on NUTCH-2676:


Running the build twice didn't help. It looks like the property is still 
unresolved at build time:
```
resolve-default:
[ivy:resolve] :: loading settings :: file = /root/nutch_source/ivy/ivysettings.xml
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve]  WARNINGS
[ivy:resolve] [FAILED ] javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}: (0ms)
[ivy:resolve]  local: tried
[ivy:resolve] /root/.ivy2/local/javax.ws.rs/javax.ws.rs-api/2.1/${packaging.type}s/javax.ws.rs-api.${packaging.type}
[ivy:resolve]  maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve]  apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve]  sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve] ::
[ivy:resolve] :: FAILED DOWNLOADS ::
[ivy:resolve] :: ^ see resolution messages for details ^ ::
[ivy:resolve] ::
[ivy:resolve] :: javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}
[ivy:resolve] ::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
```
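
The telltale sign in the output above is the literal ${packaging.type} left inside the artifact name and URLs. As a rough sketch (standard-library Java only, nothing from Ivy or Ant; the class name IvyPlaceholderScan is hypothetical), unexpanded placeholders of this kind can be pulled out of a captured build log like this:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists ${...} placeholders left unexpanded in build output. */
final class IvyPlaceholderScan {

    // Matches a literal ${name} that survived property substitution.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Returns the distinct placeholder names in order of first appearance. */
    static Set<String> unresolved(String buildOutput) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = PLACEHOLDER.matcher(buildOutput);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // One line taken from the log above.
        String line = "javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}";
        System.out.println(unresolved(line)); // prints [packaging.type]
    }
}
```

This only detects the symptom; it does not say where the property should have been defined.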



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-19 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691570#comment-16691570
 ] 

Stas Batururimi commented on NUTCH-2676:


Quite strange. It works with Tika 1.18, but not with Tika 1.19+: the 
packaging.type property seems to be missing. I see the following change:
https://github.com/apache/nutch/blob/65c4fedfacdb873a050e97a50602ed366c7b5a98/ivy/ivysettings.xml
but it does not help.



[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-19 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691525#comment-16691525
 ] 

Stas Batururimi commented on NUTCH-2676:


The problem lies in the dependencies section here:
https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/ivy.xml



[jira] [Comment Edited] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-18 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691338#comment-16691338
 ] 

Stas Batururimi edited comment on NUTCH-2676 at 11/19/18 7:33 AM:
--

 
Still getting the same error.
The Ivy version is 2.4.0.
I'm building the project by replicating the environment in a Docker 
container. That is:

{code:java}
docker run -dit -v $PWD:/root/nutch -w /root --name nutch_container ubuntu:16.04 bash
{code}

then installing all dependencies

{code:java}
apt update
apt upgrade
apt install -y ant openssh-server vim telnet git rsync curl openjdk-8-jdk-headless
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> $HOME/.bashrc
source ~/.bashrc
{code}

and compiling the project with

{code:java}
ant runtime
{code}

or, as specified above:

{code:java}
ant -d clean runtime
{code}

The result I always get is:

{code:java}
[ivy:resolve] report for org.apache.nutch#parse-tika;working@8ea54697ddb4 default produced in /root/.ivy2/cache/org.apache.nutch-parse-tika-default.xml
[ivy:resolve] resolve done (47605ms resolve - 16768ms download)
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve]  WARNINGS
[ivy:resolve] [FAILED ] javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}: (0ms)
[ivy:resolve]  local: tried
[ivy:resolve] /root/.ivy2/local/javax.ws.rs/javax.ws.rs-api/2.1/${packaging.type}s/javax.ws.rs-api.${packaging.type}
[ivy:resolve]  maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve]  apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve]  sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve] ::
[ivy:resolve] :: FAILED DOWNLOADS ::
[ivy:resolve] :: ^ see resolution messages for details ^ ::
[ivy:resolve] ::
[ivy:resolve] :: javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}
[ivy:resolve] ::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[ant] Exiting /root/nutch/src/plugin/parse-tika/build.xml.
[ant] Exiting /root/nutch/src/plugin/build.xml.
BUILD FAILED
/root/nutch/build.xml:116: The following error occurred while executing this line:
/root/nutch/src/plugin/build.xml:68: The following error occurred while executing this line:
/root/nutch/src/plugin/build-plugin.xml:229: impossible to resolve dependencies:
 resolve failed - see output for details
 at org.apache.ivy.ant.IvyResolve.doExecute(IvyResolve.java:337)
 at org.apache.ivy.ant.IvyTask.execute(IvyTask.java:271)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
 at org.apache.tools.ant.Task.perform(Task.java:348)
 at org.apache.tools.ant.Target.execute(Target.java:435)
 at org.apache.tools.ant.Target.performTasks(Target.java:456)
 at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1405)
 at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
 at org.apache.tools.ant.Project.executeTargets(Project.java:1260)
 at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
 at org.apache.tools.ant.Task.perform(Task.java:348)
 at org.apache.tools.ant.Target.execute(Target.java:435)
 at org.apache.tools.ant.Target.performTasks(Target.java:456)
 at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1405)
 at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
 at org.apache.tools.ant.Project.executeTargets(Project.java:1260)
 at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at 

[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-18 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691338#comment-16691338
 ] 

Stas Batururimi commented on NUTCH-2676:


 
Still getting the same error.
The Ivy version is 2.4.0.
I'm building the project by replicating the environment in a Docker 
container. That is:

{code:java}
docker run -dit -v $PWD:/root/nutch -w /root --name nutch_container ubuntu:16.04 bash
{code}

then installing all dependencies

{code:java}
apt update
apt upgrade
apt install -y ant openssh-server vim telnet git rsync curl openjdk-8-jdk-headless
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> $HOME/.bashrc
source ~/.bashrc
{code}

and compiling the project with

{code:java}
ant runtime
{code}

or, as specified above:

{code:java}
ant -d clean runtime
{code}

The result I always get is:

{code:java}
[ivy:resolve] report for org.apache.nutch#parse-tika;working@8ea54697ddb4 default produced in /root/.ivy2/cache/org.apache.nutch-parse-tika-default.xml
[ivy:resolve] resolve done (47605ms resolve - 16768ms download)
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve]  WARNINGS
[ivy:resolve] [FAILED ] javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}: (0ms)
[ivy:resolve]  local: tried
[ivy:resolve] /root/.ivy2/local/javax.ws.rs/javax.ws.rs-api/2.1/${packaging.type}s/javax.ws.rs-api.${packaging.type}
[ivy:resolve]  maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve]  apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve]  sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
[ivy:resolve] ::
[ivy:resolve] :: FAILED DOWNLOADS ::
[ivy:resolve] :: ^ see resolution messages for details ^ ::
[ivy:resolve] ::
[ivy:resolve] :: javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}
[ivy:resolve] ::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[ant] Exiting /root/nutch/src/plugin/parse-tika/build.xml.
[ant] Exiting /root/nutch/src/plugin/build.xml.
BUILD FAILED
/root/nutch/build.xml:116: The following error occurred while executing this line:
/root/nutch/src/plugin/build.xml:68: The following error occurred while executing this line:
/root/nutch/src/plugin/build-plugin.xml:229: impossible to resolve dependencies:
 resolve failed - see output for details
 at org.apache.ivy.ant.IvyResolve.doExecute(IvyResolve.java:337)
 at org.apache.ivy.ant.IvyTask.execute(IvyTask.java:271)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
 at org.apache.tools.ant.Task.perform(Task.java:348)
 at org.apache.tools.ant.Target.execute(Target.java:435)
 at org.apache.tools.ant.Target.performTasks(Target.java:456)
 at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1405)
 at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
 at org.apache.tools.ant.Project.executeTargets(Project.java:1260)
 at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
 at org.apache.tools.ant.Task.perform(Task.java:348)
 at org.apache.tools.ant.Target.execute(Target.java:435)
 at org.apache.tools.ant.Target.performTasks(Target.java:456)
 at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1405)
 at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
 at org.apache.tools.ant.Project.executeTargets(Project.java:1260)
 at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:441)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 

[jira] [Updated] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-16 Thread Stas Batururimi (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stas Batururimi updated NUTCH-2676:
---
Attachment: Screenshot 2018-11-16 at 18.15.52.png

> Update to the latest selenium and add code to use chrome and firefox headless 
> mode with the remote web driver
> -
>
> Key: NUTCH-2676
> URL: https://issues.apache.org/jira/browse/NUTCH-2676
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.15
> Reporter: Stas Batururimi
> Priority: Major
> Fix For: 1.16
>
> Attachments: Screenshot 2018-11-16 at 18.15.52.png
>
>
> * Selenium needs to be updated
> * missing remote web driver for chrome
> * necessity to add headless mode for both remote WebDriverBase Firefox & Chrome
> * use case with Selenium grid using docker (1 hub docker container, several 
> nodes in different docker containers, Nutch in another docker container, 
> streaming to Apache Solr in docker container, that is at least 4 different 
> docker containers)





[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-16 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689545#comment-16689545
 ] 

Stas Batururimi commented on NUTCH-2676:


[~wastl-nagel] Can you point me in the right direction with Apache Tika?

I got an error similar to this one when building from source:

https://issues.apache.org/jira/browse/NUTCH-2584

  !Screenshot 2018-11-16 at 18.15.52.png!

 

Removing the parse-tika plugin resolves the compilation problem, but I 
assume this is not an option...
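
The resolve output for this kind of failure lists an artifact named javax.ws.rs-api.${packaging.type}, which suggests that a property in the dependency metadata was left unexpanded before Ivy built the download URLs. Below is a minimal Python sketch, not part of Nutch or Ivy, for spotting such placeholders in a saved build log. The function name unresolved_properties is hypothetical, and the sample lines are copied from the failing output.

```python
import re

# Matches Maven-style placeholders such as ${packaging.type} that were left unexpanded.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def unresolved_properties(log_text):
    """Return the sorted, de-duplicated property names left unexpanded in a build log."""
    return sorted(set(PLACEHOLDER.findall(log_text)))


# Two lines copied from the failing resolve output.
sample = """\
javax.ws.rs#javax.ws.rs-api;2.1!javax.ws.rs-api.${packaging.type}
http://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}
"""

print(unresolved_properties(sample))
```

Any name the sketch reports is a property that the resolver never received a value for, which narrows the search to the dependency that declares it.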






[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-16 Thread Stas Batururimi (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689129#comment-16689129
 ] 

Stas Batururimi commented on NUTCH-2676:


[Sebastian Nagel|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wastl-nagel] 
This is mostly an extension, since both the remote web driver code for Chrome and 
an example of use with multiple Docker containers and Selenium Grid are missing.






[jira] [Created] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

2018-11-14 Thread Stas Batururimi (JIRA)
Stas Batururimi created NUTCH-2676:
--

 Summary: Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver
 Key: NUTCH-2676
 URL: https://issues.apache.org/jira/browse/NUTCH-2676
 Project: Nutch
 Issue Type: New Feature
 Reporter: Stas Batururimi


* Selenium needs to be updated
* the remote web driver for Chrome is missing
* headless mode needs to be added for both remote WebDriverBase Firefox & Chrome
* a use case with Selenium Grid on Docker is needed (1 hub Docker container, several 
nodes in different Docker containers, Nutch in another Docker container, 
streaming to Apache Solr in a Docker container, that is at least 4 different 
Docker containers)
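
As an illustration of what headless mode through a remote web driver amounts to on the wire, here is a minimal Python sketch that builds the JSON body of a W3C WebDriver "New Session" request for headless Chrome and Firefox. It is not the Nutch implementation and does not use the Selenium client library. The hub address and the function name new_session_payload are assumptions made for the example, and nothing is actually sent to a grid.

```python
import json

# Vendor-prefixed option keys understood by chromedriver and geckodriver.
OPTION_KEYS = {"chrome": "goog:chromeOptions", "firefox": "moz:firefoxOptions"}
# Command-line switches that start each browser without a visible window.
HEADLESS_ARGS = {"chrome": ["--headless"], "firefox": ["-headless"]}


def new_session_payload(browser):
    """Build the JSON body of a WebDriver 'New Session' request for a headless browser."""
    if browser not in OPTION_KEYS:
        raise ValueError("unsupported browser: " + browser)
    always_match = {
        "browserName": browser,
        OPTION_KEYS[browser]: {"args": HEADLESS_ARGS[browser]},
    }
    return {"capabilities": {"alwaysMatch": always_match}}


# Hypothetical hub address (assumption); a real grid would receive the body
# through an HTTP POST to <hub>/session.
HUB_URL = "http://selenium-hub:4444/wd/hub"

for name in ("chrome", "firefox"):
    print(HUB_URL + "/session", json.dumps(new_session_payload(name), sort_keys=True))
```

In a Docker setup like the one described, the hub hostname would be the hub container's name on the shared network, and each node container would supply the matching browser.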


