[jira] [Commented] (NUTCH-2086) Nutch 1.X Webui

2015-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900316#comment-14900316
 ] 

ASF GitHub Bot commented on NUTCH-2086:
---

GitHub user sujen1412 opened a pull request:

https://github.com/apache/nutch/pull/61

Fix for NUTCH-2086 Contributed by Sujen Shah



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujen1412/nutch NUTCH-2086

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/61.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #61


commit f60fe99389b194f960c7562224e09380b3e81e31
Author: Sujen Shah 
Date:   2015-09-21T07:22:34Z

Nutch 1x webui ported from 2x




> Nutch 1.X Webui 
> 
>
> Key: NUTCH-2086
> URL: https://issues.apache.org/jira/browse/NUTCH-2086
> Project: Nutch
>  Issue Type: New Feature
>  Components: REST_api, web gui
>Reporter: Sujen Shah
>Assignee: Chris A. Mattmann
>  Labels: memex
> Fix For: 1.11
>
>
> Port the Apache Wicket-based web UI from Nutch 2.X to 1.X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] nutch pull request: Fix for NUTCH-2086 Contributed by Sujen Shah

2015-09-21 Thread sujen1412
GitHub user sujen1412 opened a pull request:

https://github.com/apache/nutch/pull/61

Fix for NUTCH-2086 Contributed by Sujen Shah



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujen1412/nutch NUTCH-2086

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/61.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #61


commit f60fe99389b194f960c7562224e09380b3e81e31
Author: Sujen Shah 
Date:   2015-09-21T07:22:34Z

Nutch 1x webui ported from 2x




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-21 Thread Asitang Mishra (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901798#comment-14901798
 ] 

Asitang Mishra commented on NUTCH-2110:
---

Also updated the description to tackle some basic problems with this idea first.

> Create the capability to provide seeds in the form of "url+xpath(including 
> option to enter seach terms).selenium" 
> --
>
> Key: NUTCH-2110
> URL: https://issues.apache.org/jira/browse/NUTCH-2110
> Project: Nutch
>  Issue Type: Sub-task
>  Components: fetcher
>Affects Versions: 1.10
>Reporter: Asitang Mishra
>  Labels: memex
>
> Create the capability to provide seeds in the form of "url+xpath(including 
> option to enter search terms).selenium", to be used by Selenium 
> protocols/plugins as URLs/flows to reach a specific AJAX-based page, or to 
> save the state of a Selenium operation for the next fetching round.
> At least, this should make Nutch capable of deciding whether a URL should 
> be opened using the basic http, httpclient or selenium protocol, and provide 
> the selenium protocol with basic authentication capabilities based on the 
> above ideas.
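One way to carry the interaction steps without inventing a new URL syntax is the tab-separated key=value metadata that the Nutch Injector already accepts after a seed URL (e.g. nutch.score=10); Asitang's reply to Sebastian on this issue also agrees that the CrawlDatum is the place to keep this state. The Java sketch below is a hypothetical illustration, not Nutch code: it assumes a made-up selenium.xpaths key whose value holds ";;"-separated XPath expressions, and a made-up routing rule that sends seeds carrying that key to protocol-selenium and everything else to protocol-http.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical parser for a seed line "url<TAB>key=value<TAB>key=value...".
 * The selenium.xpaths key, the ";;" separator and the routing rule are
 * illustrative assumptions, not Nutch behaviour.
 */
final class SeedLine {
  static final String XPATHS_KEY = "selenium.xpaths"; // hypothetical key

  final String url;
  final Map<String, String> metadata;

  private SeedLine(String url, Map<String, String> metadata) {
    this.url = url;
    this.metadata = metadata;
  }

  static SeedLine parse(String line) {
    String[] fields = line.trim().split("\t");
    Map<String, String> meta = new LinkedHashMap<>();
    for (int i = 1; i < fields.length; i++) {
      // Split on the first '=' only, so values such as XPaths may contain '='.
      int eq = fields[i].indexOf('=');
      if (eq > 0) {
        meta.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
      }
    }
    return new SeedLine(fields[0], Collections.unmodifiableMap(meta));
  }

  /** XPath expressions to click, in order; empty for an ordinary seed. */
  List<String> interactionSteps() {
    List<String> steps = new ArrayList<>();
    String raw = metadata.get(XPATHS_KEY);
    if (raw != null) {
      for (String step : raw.split(";;")) {
        if (!step.isEmpty()) {
          steps.add(step);
        }
      }
    }
    return steps;
  }

  /** Hypothetical routing rule: seeds with interaction steps need a browser. */
  String protocolPlugin() {
    return interactionSteps().isEmpty() ? "protocol-http" : "protocol-selenium";
  }
}
```

Keeping the steps in metadata leaves the URL itself untouched, so URL filters and normalizers still see an ordinary URL. The cost is that two seeds with the same URL but different steps would collide in a CrawlDb keyed by URL, which is the identity problem this issue is about.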





[jira] [Updated] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-21 Thread Asitang Mishra (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asitang Mishra updated NUTCH-2110:
--
Description: 
Create the capability to provide seeds in the form of "url+xpath(including 
option to enter search terms).selenium", to be used by Selenium 
protocols/plugins as URLs/flows to reach a specific AJAX-based page, or to save 
the state of a Selenium operation for the next fetching round.
At least, this should make Nutch capable of deciding whether a URL should be 
opened using the basic http, httpclient or selenium protocol, and provide the 
selenium protocol with basic authentication capabilities based on the above 
ideas.


  was:Create the capability to provide seeds in the form of 
"url+xpath(including option to enter search terms).selenium", to be used by 
Selenium protocols/plugins as URLs/flows to reach a specific AJAX-based page, 
or to save the state of a Selenium operation for the next fetching round.




[jira] [Commented] (NUTCH-2106) Runtime to contain Selenium and dependencies only once

2015-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901528#comment-14901528
 ] 

Hudson commented on NUTCH-2106:
---

SUCCESS: Integrated in Nutch-trunk #3275 (See 
[https://builds.apache.org/job/Nutch-trunk/3275/])
NUTCH-2106 Runtime to contain Selenium and dependencies only once (snagel: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev=1704425)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/lib-selenium/build-ivy.xml
* /nutch/trunk/src/plugin/lib-selenium/howto_upgrade_selenium.txt
* /nutch/trunk/src/plugin/lib-selenium/ivy.xml
* /nutch/trunk/src/plugin/lib-selenium/plugin.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/build-ivy.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/ivy.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/plugin.xml
* /nutch/trunk/src/plugin/protocol-selenium/build-ivy.xml
* /nutch/trunk/src/plugin/protocol-selenium/ivy.xml
* /nutch/trunk/src/plugin/protocol-selenium/plugin.xml


> Runtime to contain Selenium and dependencies only once
> --
>
> Key: NUTCH-2106
> URL: https://issues.apache.org/jira/browse/NUTCH-2106
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
> Fix For: 1.11
>
> Attachments: NUTCH-2106.patch
>
>
> All Selenium-based plugins contain the same dependent jars, which 
> significantly increases the size of the runtime and the bin package:
> {noformat}
> % du -hs runtime/local/plugins/*selenium/ runtime/deploy/*.job
> 25M runtime/local/plugins/lib-selenium/
> 25M runtime/local/plugins/protocol-interactiveselenium/
> 25M runtime/local/plugins/protocol-selenium/
> 182M runtime/deploy/apache-nutch-1.11-SNAPSHOT.job
> {noformat}
> Since all plugins depend on the same Selenium version, we could bundle the 
> dependencies in lib-selenium and let the other plugins load them from there:
> - let lib-selenium export all dependent libs, e.g.:
> {code:xml|title=lib-selenium/plugin.xml}
> 
>   ...
>   
> 
>   
> {code}
> - both protocol plugins already import lib-selenium: the dependencies in 
> ivy.xml can be removed
> As expected, these changes make the runtime smaller:
> {noformat}
> 25M runtime/local/plugins/lib-selenium/
> 20K runtime/local/plugins/protocol-interactiveselenium/
> 16K runtime/local/plugins/protocol-selenium/
> 138M runtime/deploy/apache-nutch-1.11-SNAPSHOT.job
> {noformat}
> Open points:
> - I've tested only protocol-selenium using chromedriver. Should 
> protocol-interactiveselenium also be tested?
> - What about phantomjsdriver-1.2.1.jar? It was contained in lib-selenium and 
> protocol-selenium but not protocol-interactiveselenium. Is there a reason for 
> this?
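Before and after a change like this, it can help to list which jars are actually shared between plugins. The Java sketch below walks a plugins directory (for example runtime/local/plugins) and reports every jar file name that appears in more than one plugin folder. It is a hypothetical helper, not part of Nutch; it assumes the jars sit directly in each plugin's top-level folder and compares by file name only, not by content.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: lists jar file names bundled by more than one plugin folder. */
final class DuplicateJarFinder {
  static Map<String, List<String>> findDuplicates(Path pluginsDir) throws IOException {
    // jar file name -> names of the plugin folders that contain it
    Map<String, List<String>> owners = new TreeMap<>();
    try (DirectoryStream<Path> plugins = Files.newDirectoryStream(pluginsDir)) {
      for (Path plugin : plugins) {
        if (!Files.isDirectory(plugin)) {
          continue;
        }
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(plugin, "*.jar")) {
          for (Path jar : jars) {
            owners.computeIfAbsent(jar.getFileName().toString(), k -> new ArrayList<>())
                .add(plugin.getFileName().toString());
          }
        }
      }
    }
    // Keep only jars shared by two or more plugins; sort owners for stable output.
    owners.values().removeIf(list -> list.size() < 2);
    for (List<String> list : owners.values()) {
      Collections.sort(list);
    }
    return owners;
  }
}
```

Run against runtime/local/plugins before the change, this should list the shared Selenium jars under all three Selenium plugin folders; afterwards they should only be present in lib-selenium and so drop out of the report.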





[jira] [Commented] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-21 Thread Asitang Mishra (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901059#comment-14901059
 ] 

Asitang Mishra commented on NUTCH-2110:
---

Hi Sebastian,

Yes, using the CrawlDatum is the perfect idea.

This idea came to me when we had a use case where the whole site was 
AJAX-based. The pagination was also AJAX (the URL did not change when clicking 
through the pages), so we needed to fetch the whole site in one go. We thought 
there must be a way to identify an AJAX-based resource/page, because the URL 
alone was insufficient. That is when I thought that the URL plus a series of 
Selenium interactions could be used as a unique identifier in such scenarios.
This is mostly theoretical right now, because some things still need to be 
discussed, such as how the outlinks can be identified for the next fetch (I 
have some ideas, though).

To answer your last questions, imagine this scenario: we have a starting page 
called page1 with a number of AJAX clicks on it. We click all of them, the page 
changes, and we save all the information as the data of that page. Then we 
need to go to the next page, which is not a different URL but another page 
interaction. So we 'somehow' save this for the next round. How do we do that? 
In the next round we come back to page1 (because there is no way to reach page2 
other than through page1, since page2 has no unique URL). This time we do not 
go through all the interactions on page1 and save no data for it; we only click 
the pagination link for page2, go to page2, click around again, and save the 
data for page2.
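The two-round scenario above can be modelled as a crawl state that is simply the entry URL plus the list of clicks leading from it to the current view. The Java sketch below is a hypothetical model, not Nutch code: each round replays the stored clicks without keeping data for the intermediate views, returns the content of the final view, and produces the next round's state by appending one pagination click. Browser is a made-up stand-in for a Selenium WebDriver so that the example runs without Selenium.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a browser session; a real one would wrap a Selenium WebDriver. */
interface Browser {
  void open(String url);
  void click(String xpath);
  String pageContent();
}

/** Crawl state: entry URL plus the clicks that lead from it to the page of interest. */
final class InteractionState {
  final String entryUrl;
  final List<String> path;

  InteractionState(String entryUrl, List<String> path) {
    this.entryUrl = entryUrl;
    this.path = Collections.unmodifiableList(new ArrayList<>(path));
  }

  /** State for the next round: same entry URL, one more pagination click. */
  InteractionState next(String paginationXpath) {
    List<String> extended = new ArrayList<>(path);
    extended.add(paginationXpath);
    return new InteractionState(entryUrl, extended);
  }
}

final class AjaxRound {
  /**
   * Re-enters through the entry URL and replays the stored clicks, keeping nothing
   * for intermediate pages; only the final view's content is returned.
   */
  static String fetch(Browser browser, InteractionState state) {
    browser.open(state.entryUrl);
    for (String xpath : state.path) {
      browser.click(xpath);
    }
    return browser.pageContent();
  }
}
```

Because every round re-enters through the entry URL, reaching page n costs n - 1 replayed clicks; that trade-off follows directly from the pages having no URL of their own.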


> Create the capability to provide seeds in the form of "url+xpath(including 
> option to enter seach terms).selenium" 
> --
>
> Key: NUTCH-2110
> URL: https://issues.apache.org/jira/browse/NUTCH-2110
> Project: Nutch
>  Issue Type: Sub-task
>  Components: fetcher
>Affects Versions: 1.10
>Reporter: Asitang Mishra
>  Labels: memex
>
> Create the capability to provide seeds in the form of "url+xpath(including 
> option to enter search terms).selenium", to be used by Selenium 
> protocols/plugins as URLs/flows to reach a specific AJAX-based page, or to 
> save the state of a Selenium operation for the next fetching round.





[jira] [Updated] (NUTCH-2095) WARC exporter for the CommonCrawlDataDumper

2015-09-21 Thread Jorge Luis Betancourt Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Luis Betancourt Gonzalez updated NUTCH-2095:
--
Attachment: NUTCH-2095.patch

> WARC exporter for the CommonCrawlDataDumper
> ---
>
> Key: NUTCH-2095
> URL: https://issues.apache.org/jira/browse/NUTCH-2095
> Project: Nutch
>  Issue Type: Improvement
>  Components: commoncrawl, tool
>Affects Versions: 1.11
>Reporter: Jorge Luis Betancourt Gonzalez
>Priority: Minor
>  Labels: tools, warc
> Attachments: NUTCH-2095.patch
>
>
> Adds the possibility of exporting Nutch segments to WARC files.
> From a usage point of view, a couple of new command-line options are 
> available:
> {{-warc}}: enables export into WARC files; if not specified, the default 
> JACKSON formatter is used.
> {{-warcSize}}: defines the maximum size of each WARC file; if not specified, 
> a default of 1GB per file is used, as recommended by the WARC ISO standard.
> The usual {{-gzip}} flag can be used to enable compression on the WARC files.
> Some changes were made to the default {{CommonCrawlDataDumper}}, mainly to 
> the Factory and to the Formats. These changes avoid creating a new instance 
> of {{CommonCrawlFormat}} for each URL read from the segments.
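The -warcSize option amounts to a rolling writer: records go into the current file until the next record would push it past the limit, and then a new file is started. The Java sketch below shows only that size bookkeeping. RollingSizePlanner and assign are hypothetical names, not code from the attached patch; 1GB is taken here as 2^30 bytes, and the issue does not say whether the real exporter measures compressed or uncompressed bytes.

```java
/** Hypothetical size bookkeeping for a rolling output: picks the file index for each record. */
final class RollingSizePlanner {
  /** 1GB default (assumed here to mean 2^30 bytes), mirroring the default described for -warcSize. */
  static final long DEFAULT_MAX_BYTES = 1024L * 1024 * 1024;

  private final long maxBytes;
  private int fileIndex = 0;
  private long bytesInCurrentFile = 0;

  RollingSizePlanner(long maxBytes) {
    if (maxBytes <= 0) {
      throw new IllegalArgumentException("maxBytes must be positive");
    }
    this.maxBytes = maxBytes;
  }

  /**
   * Returns the index of the file the record should be written to. A record larger
   * than the limit still gets a file of its own rather than being split.
   */
  int assign(long recordBytes) {
    if (bytesInCurrentFile > 0 && bytesInCurrentFile + recordBytes > maxBytes) {
      fileIndex++; // roll over to a new file
      bytesInCurrentFile = 0;
    }
    bytesInCurrentFile += recordBytes;
    return fileIndex;
  }
}
```

Rolling before the write (rather than after) means no file exceeds the limit unless a single record is itself larger than the limit.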





[jira] [Updated] (NUTCH-2095) WARC exporter for the CommonCrawlDataDumper

2015-09-21 Thread Jorge Luis Betancourt Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Luis Betancourt Gonzalez updated NUTCH-2095:
--
Attachment: (was: NUTCH-2095.patch)

> WARC exporter for the CommonCrawlDataDumper
> ---
>
> Key: NUTCH-2095
> URL: https://issues.apache.org/jira/browse/NUTCH-2095
> Project: Nutch
>  Issue Type: Improvement
>  Components: commoncrawl, tool
>Affects Versions: 1.11
>Reporter: Jorge Luis Betancourt Gonzalez
>Priority: Minor
>  Labels: tools, warc
>
> Adds the possibility of exporting Nutch segments to WARC files.
> From a usage point of view, a couple of new command-line options are 
> available:
> {{-warc}}: enables export into WARC files; if not specified, the default 
> JACKSON formatter is used.
> {{-warcSize}}: defines the maximum size of each WARC file; if not specified, 
> a default of 1GB per file is used, as recommended by the WARC ISO standard.
> The usual {{-gzip}} flag can be used to enable compression on the WARC files.
> Some changes were made to the default {{CommonCrawlDataDumper}}, mainly to 
> the Factory and to the Formats. These changes avoid creating a new instance 
> of {{CommonCrawlFormat}} for each URL read from the segments.





[jira] [Commented] (NUTCH-2094) Stopping and Restarting a crawl has issues in the Web UI

2015-09-21 Thread Prerna Satija (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901357#comment-14901357
 ] 

Prerna Satija commented on NUTCH-2094:
--

Thanks!

> Stopping and Restarting a crawl has issues in the Web UI
> 
>
> Key: NUTCH-2094
> URL: https://issues.apache.org/jira/browse/NUTCH-2094
> Project: Nutch
>  Issue Type: Bug
>  Components: web gui
>Reporter: Prerna Satija
>Assignee: Chris A. Mattmann
> Fix For: 2.4
>
>
> I have added a "stop" button to the Nutch webapp to stop a running crawl from 
> the UI. While testing, I found that I can stop a crawl successfully, but when 
> I restart a stopped crawl and try to stop it again, it does not stop.
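The report does not identify the cause of the failed second stop. One common pattern that rules out a stale stop signal is to give each run its own cancellation flag, so that stopping always targets the run that is currently active and a restart never inherits the previous run's state. The Java sketch below illustrates only that general pattern; CrawlController, start, stop and runRounds are hypothetical names, not the Nutch webapp's API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical controller: every started run gets a fresh stop flag. */
final class CrawlController {
  private final AtomicReference<AtomicBoolean> currentRun = new AtomicReference<>();

  /** Starts a new run and returns its own stop flag. */
  AtomicBoolean start() {
    AtomicBoolean stopFlag = new AtomicBoolean(false);
    currentRun.set(stopFlag);
    return stopFlag;
  }

  /** Signals the run that is active now; returns false if nothing was ever started. */
  boolean stop() {
    AtomicBoolean flag = currentRun.get();
    if (flag == null) {
      return false;
    }
    flag.set(true);
    return true;
  }

  /** Runs up to maxRounds crawl rounds, checking this run's flag between rounds. */
  static int runRounds(AtomicBoolean stopFlag, int maxRounds, Runnable round) {
    int done = 0;
    while (done < maxRounds && !stopFlag.get()) {
      round.run();
      done++;
    }
    return done;
  }
}
```

Stopping between rounds rather than mid-round keeps each round's output consistent, at the cost of a stop request taking effect only when the current round finishes.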





[jira] [Assigned] (NUTCH-2106) Runtime to contain Selenium and dependencies only once

2015-09-21 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reassigned NUTCH-2106:
--

Assignee: Sebastian Nagel



[jira] [Resolved] (NUTCH-2106) Runtime to contain Selenium and dependencies only once

2015-09-21 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2106.

Resolution: Fixed

Committed to trunk, r1704425. Thanks, Lewis!

> Runtime to contain Selenium and dependencies only once
> --
>
> Key: NUTCH-2106
> URL: https://issues.apache.org/jira/browse/NUTCH-2106
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.11
>Reporter: Sebastian Nagel
> Fix For: 1.11
>
> Attachments: NUTCH-2106.patch
>
>
> All Selenium-based plugins contain the same dependent jars, which 
> significantly increases the size of the runtime and the bin package:
> {noformat}
> % du -hs runtime/local/plugins/*selenium/ runtime/deploy/*.job
> 25M runtime/local/plugins/lib-selenium/
> 25M runtime/local/plugins/protocol-interactiveselenium/
> 25M runtime/local/plugins/protocol-selenium/
> 182M runtime/deploy/apache-nutch-1.11-SNAPSHOT.job
> {noformat}
> Since all plugins depend on the same Selenium version, we could bundle the 
> dependencies in lib-selenium and let the other plugins load them from there:
> - let lib-selenium export all dependent libs, e.g.:
> {code:xml|title=lib-selenium/plugin.xml}
> 
>   ...
>   
> 
>   
> {code}
> - both protocol plugins already import lib-selenium: the dependencies in 
> ivy.xml can be removed
> As expected, these changes make the runtime smaller:
> {noformat}
> 25M runtime/local/plugins/lib-selenium/
> 20K runtime/local/plugins/protocol-interactiveselenium/
> 16K runtime/local/plugins/protocol-selenium/
> 138M runtime/deploy/apache-nutch-1.11-SNAPSHOT.job
> {noformat}
> Open points:
> - I've tested only protocol-selenium using chromedriver. Should 
> protocol-interactiveselenium also be tested?
> - What about phantomjsdriver-1.2.1.jar? It was contained in lib-selenium and 
> protocol-selenium but not protocol-interactiveselenium. Is there a reason for 
> this?





[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901450#comment-14901450
 ] 

ASF GitHub Bot commented on NUTCH-2108:
---

GitHub user asitang opened a pull request:

https://github.com/apache/nutch/pull/62

made changes for NUTCH-2108 and formatted the previously unformatted code for this plugin

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/asitang/nutch NUTCH-2108

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/62.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #62


commit cd742353283410cc2f750af16a3ca6e286525193
Author: Asitang Mishra 
Date:   2015-09-21T21:32:41Z

made changes for NUTCH-2108 and formatted the previously unformatted code 
for this plugin




> Add a function to the selenium interactive plugin interface to do multiple 
> manipulation of driver and then return the data
> --
>
> Key: NUTCH-2108
> URL: https://issues.apache.org/jira/browse/NUTCH-2108
> Project: Nutch
>  Issue Type: Sub-task
>  Components: fetcher
>Affects Versions: 1.10
>Reporter: Asitang Mishra
>Priority: Minor
>  Labels: memex
>
> In the interactive selenium plugin we have to create a handler class for each 
> manipulation of a page. Sometimes we need to manipulate a page in many ways 
> and keep track of those manipulations, for example clicking on each link in a 
> table and then refreshing to get the original page back, since even one click 
> can make all the other links disappear. This can be done in a single loop, 
> whereas doing it with multiple handlers would be much more work and far more 
> complicated. So I am proposing a new function 
> "String multiProcessDriver(WebDriver driver)" that takes the driver and 
> returns a concatenated String, alongside the already present 
> "void processDriver(WebDriver driver)".
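The proposal keeps void processDriver(WebDriver driver) and adds String multiProcessDriver(WebDriver driver), which performs several manipulations in one loop and returns their results concatenated. The Java sketch below shows the click-each-link-then-restore loop from the description. To run without Selenium it uses a minimal hypothetical Page interface in place of WebDriver; with Selenium the same steps would map onto driver.findElements(By.xpath(...)), WebElement.click() and driver.navigate().refresh(). Page, InteractiveHandler and ClickEachLinkHandler are illustrative names, not the code in the pull request.

```java
/** Hypothetical stand-in for the parts of a browser driver this example needs. */
interface Page {
  int countLinks(String tableXpath);
  void clickLink(String tableXpath, int index);
  String content();
  void reload();
}

/** Hypothetical handler shape: the existing single-step hook plus the proposed multi-step one. */
interface InteractiveHandler {
  void processDriver(Page page);
  String multiProcessDriver(Page page);
}

final class ClickEachLinkHandler implements InteractiveHandler {
  private final String tableXpath;

  ClickEachLinkHandler(String tableXpath) {
    this.tableXpath = tableXpath;
  }

  @Override
  public void processDriver(Page page) {
    // Single manipulation: click only the first link, if any.
    if (page.countLinks(tableXpath) > 0) {
      page.clickLink(tableXpath, 0);
    }
  }

  @Override
  public String multiProcessDriver(Page page) {
    StringBuilder out = new StringBuilder();
    // Count once up front: a click may remove the other links, so the page is
    // reloaded after each click to restore them before the next iteration.
    int n = page.countLinks(tableXpath);
    for (int i = 0; i < n; i++) {
      page.clickLink(tableXpath, i);
      out.append(page.content()).append('\n');
      page.reload();
    }
    return out.toString();
  }
}
```

Returning one concatenated String keeps the handler interface simple, but the caller then has to split the results again; a List<String> would avoid that if the interface were open to change.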





[GitHub] nutch pull request: made changes for NUTCH-2108 and formatted the ...

2015-09-21 Thread asitang
GitHub user asitang opened a pull request:

https://github.com/apache/nutch/pull/62

made changes for NUTCH-2108 and formatted the previously unformatted code for this plugin

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/asitang/nutch NUTCH-2108

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/62.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #62


commit cd742353283410cc2f750af16a3ca6e286525193
Author: Asitang Mishra 
Date:   2015-09-21T21:32:41Z

made changes for NUTCH-2108 and formatted the previously unformatted code 
for this plugin






[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-21 Thread Asitang Mishra (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901458#comment-14901458
 ] 

Asitang Mishra commented on NUTCH-2108:
---

Hi [~jo...@apache.org],

Can you take a look at the changes?
