[jira] [Created] (NUTCH-2741) Remove ivy/ivy-2.2.0.jar

2019-10-01 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2741: Summary: Remove ivy/ivy-2.2.0.jar; Key: NUTCH-2741; URL: https://issues.apache.org/jira/browse/NUTCH-2741; Project: Nutch; Issue Type: Bug

[jira] [Resolved] (NUTCH-1805) Remove unnecessary transitive dependencies from Hadoop core

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1805. Resolution: Resolved. We now rely only on a fixed set of Hadoop sub-dependencies

[jira] [Updated] (NUTCH-1917) index.parse.md, index.content.md and index.db.md should support wildcard

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1917: --- Fix Version/s: (was: 1.16) 1.17 > index.parse.md, index.content.md

[jira] [Commented] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941784#comment-16941784 ] ASF GitHub Bot commented on NUTCH-1403: --- sebastian-nagel commented on pull request #458: fix for

[jira] [Commented] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941787#comment-16941787 ] ASF GitHub Bot commented on NUTCH-1403: --- sebastian-nagel commented on pull request #458: fix for

[jira] [Commented] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941786#comment-16941786 ] ASF GitHub Bot commented on NUTCH-1403: --- sebastian-nagel commented on pull request #458: fix for

[jira] [Commented] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941785#comment-16941785 ] ASF GitHub Bot commented on NUTCH-1403: --- sebastian-nagel commented on pull request #458: fix for

[jira] [Commented] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941788#comment-16941788 ] ASF GitHub Bot commented on NUTCH-1403: --- sebastian-nagel commented on pull request #458: fix for

[jira] [Resolved] (NUTCH-1220) Upgrade Solr deps

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1220. Resolution: Resolved. Obsoleted by multiple Solr upgrades; NUTCH-2600 is the latest one.

[jira] [Resolved] (NUTCH-1035) Tune Solr config for Nutch users

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1035. Resolution: Abandoned. Definitely outdated; the Solr schema.xml has been reworked multiple

[jira] [Commented] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941851#comment-16941851 ] ASF GitHub Bot commented on NUTCH-1403: --- aalbahem commented on pull request #458: fix for

[jira] [Updated] (NUTCH-1380) Fetcher reducer not to configure filter/normalizers

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1380: Fix Version/s: 1.17

[jira] [Commented] (NUTCH-1380) Fetcher reducer not to configure filter/normalizers

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941863#comment-16941863 ] Sebastian Nagel commented on NUTCH-1380: This should be fixed by NUTCH-2375 which has moved the

[jira] [Updated] (NUTCH-1749) Optionally exclude title from content field

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1749: --- Fix Version/s: (was: 1.16) 1.17 > Optionally exclude title from

[jira] [Commented] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941869#comment-16941869 ] ASF GitHub Bot commented on NUTCH-1403: --- sebastian-nagel commented on pull request #458: fix for

[jira] [Updated] (NUTCH-2278) Handle alpha-2 language codes consistently

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2278: --- Fix Version/s: (was: 1.16) 1.17 > Handle alpha-2 language codes

[jira] [Updated] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1403: --- Fix Version/s: (was: 1.16) 1.17 > Add default ScoringFilter for

[jira] [Updated] (NUTCH-2248) CSS parser plugin

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2248: Fix Version/s: (was: 1.16) 1.17

[jira] [Updated] (NUTCH-2506) host is not available for filtering on the JEXL indexing plugin

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2506: --- Fix Version/s: (was: 1.16) 1.17 > host is not available for filtering

[jira] [Updated] (NUTCH-2419) Domain blacklist URL filter does not respect command-line override for file

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2419: --- Fix Version/s: (was: 1.16) 1.17 > Domain blacklist URL filter does

[jira] [Resolved] (NUTCH-2740) Generator: generate.max.count overflow not logged

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2740. Resolution: Fixed

[jira] [Resolved] (NUTCH-2738) Generator: document property generate.restrict.status

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2738. Resolution: Fixed

[jira] [Resolved] (NUTCH-2737) Generator: count and log reason of rejections during selection

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2737. Resolution: Implemented

[jira] [Commented] (NUTCH-2737) Generator: count and log reason of rejections during selection

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941887#comment-16941887 ] ASF GitHub Bot commented on NUTCH-2737: --- sebastian-nagel commented on pull request #477: NUTCH-2737

[jira] [Commented] (NUTCH-2740) Generator: generate.max.count overflow not logged

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941888#comment-16941888 ] Sebastian Nagel commented on NUTCH-2740: Fixed in

[jira] [Resolved] (NUTCH-1176) Fix all javadoc warnings from nightly builds

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1176. Resolution: Abandoned. Outdated.

[jira] [Updated] (NUTCH-1194) CrawlDB lock should be released earlier

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1194: Fix Version/s: 1.17

[jira] [Updated] (NUTCH-1186) FreeGenerator always normalizes

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1186: Fix Version/s: 1.17

[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941824#comment-16941824 ] Sebastian Nagel commented on NUTCH-1186: Disabling normalization can be done by setting:

[jira] [Commented] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941867#comment-16941867 ] Sebastian Nagel commented on NUTCH-2735: Ok, moving to 1.17. We need a clean list without open

[jira] [Updated] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2735: --- Fix Version/s: (was: 1.16) > Update the indexer-solr documentation about the schema.xml

[jira] [Updated] (NUTCH-1559) parse-metatags duplicates extracted metatags in combination with parse-tika

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1559: --- Fix Version/s: (was: 1.16) 1.17 > parse-metatags duplicates extracted

[jira] [Updated] (NUTCH-2511) SitemapProcessor limited by http.content.limit

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2511: --- Fix Version/s: (was: 1.16) 1.17 > SitemapProcessor limited by

[jira] [Commented] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression

2019-10-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941889#comment-16941889 ] ASF GitHub Bot commented on NUTCH-2279: --- sebastian-nagel commented on pull request #478: NUTCH-2279

[jira] [Resolved] (NUTCH-1076) Solrindex has no documents following bin/nutch solrindex when using protocol-file

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1076. Resolution: Duplicate > Solrindex has no documents following bin/nutch solrindex when

[jira] [Resolved] (NUTCH-1342) Read time out protocol-http

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1342. Resolution: Not A Problem. Can be fixed via configuration. Thanks, everybody! > Read time

[jira] [Updated] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2525: --- Fix Version/s: (was: 1.16) 1.17 > Metadata indexer cannot handle

[jira] [Updated] (NUTCH-2309) Scoring-Similarity Plugin raises NullPointerException when error occurs in fetching URL

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2309: --- Fix Version/s: (was: 1.16) 1.17 > Scoring-Similarity Plugin raises

[jira] [Updated] (NUTCH-2353) Create seed file with metadata using the REST API

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2353: --- Fix Version/s: (was: 1.16) 1.17 > Create seed file with metadata

[jira] [Resolved] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2279. Resolution: Fixed. Thanks, [~naegelejd]! > LinkRank fails when using Hadoop MR output

[jira] [Commented] (NUTCH-2738) Generator: document property generate.restrict.status

2019-10-01 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942046#comment-16942046 ] Hudson commented on NUTCH-2738: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3649 (See

[jira] [Commented] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression

2019-10-01 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942048#comment-16942048 ] Hudson commented on NUTCH-2279: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3649 (See

[jira] [Commented] (NUTCH-2737) Generator: count and log reason of rejections during selection

2019-10-01 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942045#comment-16942045 ] Hudson commented on NUTCH-2737: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3649 (See

[jira] [Commented] (NUTCH-2740) Generator: generate.max.count overflow not logged

2019-10-01 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942047#comment-16942047 ] Hudson commented on NUTCH-2740: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3649 (See

ApacheCon North America 2020, project participation

2019-10-01 Thread Rich Bowen
Hi, folks, (Note: You're receiving this email because you're on the dev@ list for one or more Apache Software Foundation projects.) For ApacheCon North America 2019, we asked projects to participate in the creation of project/topic specific tracks. This was very successful, with about 15