[jira] [Commented] (NUTCH-2106) Runtime to contain Selenium and dependencies only once

2015-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901528#comment-14901528
 ] 

Hudson commented on NUTCH-2106:
---

SUCCESS: Integrated in Nutch-trunk #3275 (See 
[https://builds.apache.org/job/Nutch-trunk/3275/])
NUTCH-2106 Runtime to contain Selenium and dependencies only once (snagel: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1704425)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/lib-selenium/build-ivy.xml
* /nutch/trunk/src/plugin/lib-selenium/howto_upgrade_selenium.txt
* /nutch/trunk/src/plugin/lib-selenium/ivy.xml
* /nutch/trunk/src/plugin/lib-selenium/plugin.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/build-ivy.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/ivy.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/plugin.xml
* /nutch/trunk/src/plugin/protocol-selenium/build-ivy.xml
* /nutch/trunk/src/plugin/protocol-selenium/ivy.xml
* /nutch/trunk/src/plugin/protocol-selenium/plugin.xml


> Runtime to contain Selenium and dependencies only once
> --
>
> Key: NUTCH-2106
> URL: https://issues.apache.org/jira/browse/NUTCH-2106
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
> Fix For: 1.11
>
> Attachments: NUTCH-2106.patch
>
>
> All Selenium-based plugins contain the same dependent jars, which 
> significantly increases the size of the runtime and the binary package:
> {noformat}
> % du -hs runtime/local/plugins/*selenium/ runtime/deploy/*.job
> 25M runtime/local/plugins/lib-selenium/
> 25M runtime/local/plugins/protocol-interactiveselenium/
> 25M runtime/local/plugins/protocol-selenium/
> 182M runtime/deploy/apache-nutch-1.11-SNAPSHOT.job
> {noformat}
> Since all plugins depend on the same Selenium version, we could bundle the 
> dependencies in lib-selenium and let the other plugins load them from there:
> - let lib-selenium export all dependent libs, e.g.:
> {code:xml|title=lib-selenium/plugin.xml}
> <runtime>
>   ...
>   <library name="...">
>     <export name="*"/>
>   </library>
> </runtime>
> {code}
> - both protocol plugins already import lib-selenium, so the dependencies in 
> their ivy.xml can be removed
> As expected, these changes make the runtime smaller:
> {noformat}
> 25M runtime/local/plugins/lib-selenium/
> 20K runtime/local/plugins/protocol-interactiveselenium/
> 16K runtime/local/plugins/protocol-selenium/
> 138M runtime/deploy/apache-nutch-1.11-SNAPSHOT.job
> {noformat}
> Open points:
> - I've tested only protocol-selenium, using chromedriver. Should 
> protocol-interactiveselenium also be tested?
> - What about phantomjsdriver-1.2.1.jar? It was contained in lib-selenium and 
> protocol-selenium but not protocol-interactiveselenium. Is there a reason for 
> this?
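To reproduce the duplication measured with `du` above, one can list every jar file name that appears in more than one of the Selenium plugin directories. This is a minimal sketch, assuming a POSIX shell and the `runtime/local/plugins` layout from the `du` output; `dup_jars` is a hypothetical helper name, not part of Nutch:

```shell
# Hypothetical helper (not part of Nutch): print the names of jar files
# that occur in more than one of the given plugin directories.
dup_jars() {
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      # An unmatched glob stays literal, so skip it when a dir has no jars.
      if [ -e "$jar" ]; then basename "$jar"; fi
    done
  done | sort | uniq -d
}

# Example (from the Nutch source root, after building the runtime):
# dup_jars runtime/local/plugins/*selenium
```

`uniq -d` prints each repeated name once. If only lib-selenium ships the shared jars, as proposed, the output should be empty.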



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2106) Runtime to contain Selenium and dependencies only once

2015-09-19 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877251#comment-14877251
 ] 

Lewis John McGibbney commented on NUTCH-2106:
-

+1 for commit [~wastl-nagel]



[jira] [Commented] (NUTCH-2106) Runtime to contain Selenium and dependencies only once

2015-09-18 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14847281#comment-14847281
 ] 

Sebastian Nagel commented on NUTCH-2106:


Avoiding conflicting dependencies is the reason for the Nutch plugin system 
[[1|https://wiki.apache.org/nutch/WhatsTheProblemWithPluginsAndClass-loading]]. 
However, if a plugin depends on another plugin and both depend on a library, 
there is no way around it: both plugins must rely on the same version (or on 
two versions with a compatible API).
- protocol-selenium depends on lib-selenium
- both depend on selenium-java (currently the same version)
- when the plugin protocol-selenium is loaded, lib-selenium.jar is simply 
added to the classpath of protocol-selenium's own class loader. The classes 
from lib-selenium.jar do not live in a class loader of their own! They are 
used directly by classes in protocol-selenium (and not via the lib-selenium 
plugin instance).
- the same applies to protocol-interactiveselenium

As a consequence, the Selenium version used by lib-selenium dictates the 
version to be used by the two protocol plugins. So, why not bundle Selenium 
jars and dependencies in lib-selenium?



[jira] [Commented] (NUTCH-2106) Runtime to contain Selenium and dependencies only once

2015-09-18 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805366#comment-14805366
 ] 

Lewis John McGibbney commented on NUTCH-2106:
-

[~kwhitehall] let's touch base on this and try to include  within the 
selenium definition. This is Maven magic, so maybe we can print out 
{code}
ant report
{code}
... that way we can see how many transitive dependencies come from Selenium.
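As a rough cross-check on an already built runtime, one can simply count the jars each plugin directory bundles, which approximates how many direct and transitive dependencies were pulled in. A minimal sketch, assuming a POSIX shell; `jar_counts` is a hypothetical helper and does not parse the output of `ant report`:

```shell
# Hypothetical helper (not part of Nutch): print "<jar count> <directory>"
# for each given plugin directory. It only counts files; it cannot tell
# direct from transitive dependencies. Note: n is a global variable.
jar_counts() {
  for dir in "$@"; do
    n=0
    for jar in "$dir"/*.jar; do
      # Skip the literal pattern left by an unmatched glob.
      if [ -e "$jar" ]; then n=$((n + 1)); fi
    done
    echo "$n $dir"
  done
}
```

For example, `jar_counts runtime/local/plugins/*selenium` prints one line per plugin directory.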

[~wastl-nagel], tbh this was (and still is) an underlying concern for plugin 
dependencies, e.g. we recently introduced Apache Mahout. These libraries are 
by no means trivial. We have the same issue there.

I would encourage all additions to be evaluated for compatibility with what already 
exists and for where the new functionality fits in. As old folks, we do not want to break new features. :)

