[jira] [Created] (NUTCH-2403) Nutch Selenium: Wrong documentation about PhantomJS

2017-07-21 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2403: --- Summary: Nutch Selenium: Wrong documentation about PhantomJS Key: NUTCH-2403 URL: https://issues.apache.org/jira/browse/NUTCH-2403 Project: Nutch

[jira] [Created] (NUTCH-2486) Compiler Warning: Unchecked / unsafe operations in MimeTypeIndexingFilter

2017-12-19 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2486: --- Summary: Compiler Warning: Unchecked / unsafe operations in MimeTypeIndexingFilter Key: NUTCH-2486 URL: https://issues.apache.org/jira/browse/NUTCH-2486

[jira] [Created] (NUTCH-2473) Elasticsearch REST Indexer broken due to wrong dependency

2017-12-07 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2473: --- Summary: Elasticsearch REST Indexer broken due to wrong dependency Key: NUTCH-2473 URL: https://issues.apache.org/jira/browse/NUTCH-2473 Project: Nutch

[jira] [Assigned] (NUTCH-2473) Elasticsearch REST Indexer broken due to wrong dependency

2017-12-07 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2473: --- Assignee: Sebastian Nagel > Elasticsearch REST Indexer broken due to wrong dependency

[jira] [Created] (NUTCH-2493) Add configuration parameter for sitemap processing to crawler script

2018-01-08 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2493: --- Summary: Add configuration parameter for sitemap processing to crawler script Key: NUTCH-2493 URL: https://issues.apache.org/jira/browse/NUTCH-2493 Project:

[jira] [Created] (NUTCH-2491) Integrate sitemap processing and HostDB into crawl script

2018-01-03 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2491: --- Summary: Integrate sitemap processing and HostDB into crawl script Key: NUTCH-2491 URL: https://issues.apache.org/jira/browse/NUTCH-2491 Project: Nutch

[jira] [Updated] (NUTCH-2493) Add configuration parameter for sitemap processing to crawler script

2018-01-08 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher updated NUTCH-2493: Description: While using the crawler script with the sitemap processing feature introduced

[jira] [Updated] (NUTCH-2499) Elastic REST Indexer: Duplicate values

2018-01-16 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher updated NUTCH-2499: Description: Due to a change in

[jira] [Commented] (NUTCH-2496) Speed up link inversion step in crawling script

2018-01-15 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326640#comment-16326640 ] Moreno Feltscher commented on NUTCH-2496: - [~markus17]: Thanks for that hint. This is something I

[jira] [Created] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering

2018-01-23 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2502: --- Summary: Any23 Plugin: Add Content-Type filtering Key: NUTCH-2502 URL: https://issues.apache.org/jira/browse/NUTCH-2502 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-2499) Elastic REST Indexer: Duplicate values

2018-01-16 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher updated NUTCH-2499: Environment: (was: Due to a change in

[jira] [Created] (NUTCH-2499) Elastic REST Indexer: Duplicate values

2018-01-16 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2499: --- Summary: Elastic REST Indexer: Duplicate values Key: NUTCH-2499 URL: https://issues.apache.org/jira/browse/NUTCH-2499 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2496) Speed up link inversion step in crawling script

2018-01-17 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329760#comment-16329760 ] Moreno Feltscher commented on NUTCH-2496: - Thanks again for clearing things up even more. One

[jira] [Created] (NUTCH-2497) Elastic REST Indexer: Allow multiple hosts

2018-01-12 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2497: --- Summary: Elastic REST Indexer: Allow multiple hosts Key: NUTCH-2497 URL: https://issues.apache.org/jira/browse/NUTCH-2497 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2496) Speed up link inversion step in crawling script

2018-01-12 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324737#comment-16324737 ] Moreno Feltscher commented on NUTCH-2496: - One thing I found out is that if I do the link

[jira] [Assigned] (NUTCH-2496) Speed up link inversion step in crawling script

2018-01-12 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2496: --- Assignee: Lewis John McGibbney > Speed up link inversion step in crawling script >

[jira] [Created] (NUTCH-2496) Speed up link inversion step in crawling script

2018-01-12 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2496: --- Summary: Speed up link inversion step in crawling script Key: NUTCH-2496 URL: https://issues.apache.org/jira/browse/NUTCH-2496 Project: Nutch Issue

[jira] [Created] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing

2018-01-12 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2495: --- Summary: Use -deleteGone instead of clean job in crawler script while indexing Key: NUTCH-2495 URL: https://issues.apache.org/jira/browse/NUTCH-2495 Project:

[jira] [Commented] (NUTCH-1129) Any23 Nutch plugin

2018-01-11 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323026#comment-16323026 ] Moreno Feltscher commented on NUTCH-1129: - [~lewismc]: Thanks for merging! A special thank you

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347742#comment-16347742 ] Moreno Feltscher commented on NUTCH-2466: - I absolutely get your point and I'm a 100% with you on

[jira] [Created] (NUTCH-2508) Misleading documentation about http.proxy.exception.list

2018-01-31 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2508: --- Summary: Misleading documentation about http.proxy.exception.list Key: NUTCH-2508 URL: https://issues.apache.org/jira/browse/NUTCH-2508 Project: Nutch

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347718#comment-16347718 ] Moreno Feltscher commented on NUTCH-2466: - Is there any way to configure this so that nutch

[jira] [Updated] (NUTCH-2490) Sitemap processing: Sitemap index files not working

2018-01-02 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher updated NUTCH-2490: Description: The [sitemap processing feature|https://wiki.apache.org/nutch/SitemapFeature]

[jira] [Created] (NUTCH-2492) Add more configuration parameters to crawl script

2018-01-03 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2492: --- Summary: Add more configuration parameters to crawl script Key: NUTCH-2492 URL: https://issues.apache.org/jira/browse/NUTCH-2492 Project: Nutch Issue

[jira] [Created] (NUTCH-2490) Sitemap processing: Sitemap index files not working

2018-01-02 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2490: --- Summary: Sitemap processing: Sitemap index files not working Key: NUTCH-2490 URL: https://issues.apache.org/jira/browse/NUTCH-2490 Project: Nutch

[jira] [Created] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script

2018-01-22 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2501: --- Summary: Take into account $NUTCH_HEAPSIZE when crawling using crawl script Key: NUTCH-2501 URL: https://issues.apache.org/jira/browse/NUTCH-2501 Project:

[jira] [Assigned] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering

2018-01-23 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2502: --- Assignee: Lewis John McGibbney (was: Moreno Feltscher) > Any23 Plugin: Add

[jira] [Assigned] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script

2018-01-23 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2501: --- Assignee: Lewis John McGibbney (was: Moreno Feltscher) > Take into account

[jira] [Assigned] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing

2018-01-23 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2495: --- Assignee: Lewis John McGibbney (was: Moreno Feltscher) > Use -deleteGone instead of

[jira] [Assigned] (NUTCH-2499) Elastic REST Indexer: Duplicate values

2018-01-23 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2499: --- Assignee: Lewis John McGibbney (was: Moreno Feltscher) > Elastic REST Indexer:

[jira] [Created] (NUTCH-2503) Add option to run tests for a single plugin

2018-01-23 Thread Moreno Feltscher (JIRA)
Moreno Feltscher created NUTCH-2503: --- Summary: Add option to run tests for a single plugin Key: NUTCH-2503 URL: https://issues.apache.org/jira/browse/NUTCH-2503 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script

2018-01-23 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335999#comment-16335999 ] Moreno Feltscher commented on NUTCH-2501: - Pull request: https://github.com/apache/nutch/pull/279

[jira] [Commented] (NUTCH-2503) Add option to run tests for a single plugin

2018-01-23 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335991#comment-16335991 ] Moreno Feltscher commented on NUTCH-2503: - Pull request: https://github.com/apache/nutch/pull/281

[jira] [Commented] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering

2018-01-23 Thread Moreno Feltscher (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335994#comment-16335994 ] Moreno Feltscher commented on NUTCH-2502: - Pull request: https://github.com/apache/nutch/pull/280