[jira] [Commented] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-04-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096630#comment-17096630 ] ASF GitHub Bot commented on NUTCH-2753: --- sebastian-nagel opened a new pull request #523: URL:

[GitHub] [nutch] sebastian-nagel opened a new pull request #523: NUTCH-2753 Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-04-30 Thread GitBox
sebastian-nagel opened a new pull request #523: URL: https://github.com/apache/nutch/pull/523

[jira] [Commented] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags

2020-04-30 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096645#comment-17096645 ] Markus Jelsma commented on NUTCH-2434: -- Ah, thanks! > Add methods to reset parameters HTMLMetaTags

[jira] [Updated] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2434: --- Summary: Add methods to reset parameters HTMLMetaTags (was: Option to reset parameters

[jira] [Updated] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2434: --- Component/s: parser > Add methods to reset parameters HTMLMetaTags >

[jira] [Commented] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096633#comment-17096633 ] Sebastian Nagel commented on NUTCH-2434: +1 [~markus17], nothing to complain, as this does not

[jira] [Resolved] (NUTCH-2784) Add tool to list Nutch and Hadoop properties

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2784. Resolution: Implemented > Add tool to list Nutch and Hadoop properties >

[jira] [Commented] (NUTCH-2776) Fetcher to temporarily deduplicate followed redirects

2020-04-30 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096338#comment-17096338 ] Hudson commented on NUTCH-2776: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3677 (See

[jira] [Commented] (NUTCH-2772) Debugging parse filter to show serialized DOM tree

2020-04-30 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096337#comment-17096337 ] Hudson commented on NUTCH-2772: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3677 (See

[jira] [Resolved] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2495. Resolution: Fixed > Use -deleteGone instead of clean job in crawler script while indexing

[jira] [Resolved] (NUTCH-2743) Add list of Nutch properties (nutch-default.xml) to documentation

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2743. Resolution: Implemented > Add list of Nutch properties (nutch-default.xml) to

[jira] [Commented] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing

2020-04-30 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096388#comment-17096388 ] Hudson commented on NUTCH-2495: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3678 (See

[jira] [Commented] (NUTCH-2743) Add list of Nutch properties (nutch-default.xml) to documentation

2020-04-30 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096389#comment-17096389 ] Hudson commented on NUTCH-2743: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3678 (See

[jira] [Commented] (NUTCH-2784) Add tool to list Nutch and Hadoop properties

2020-04-30 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096387#comment-17096387 ] Hudson commented on NUTCH-2784: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3678 (See

[jira] [Resolved] (NUTCH-2772) Debugging parse filter to show serialized DOM tree

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2772. Resolution: Implemented > Debugging parse filter to show serialized DOM tree >

[jira] [Resolved] (NUTCH-2776) Fetcher to temporarily deduplicate followed redirects

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2776. Resolution: Implemented Merged. This feature has been successfully tested in production in

[jira] [Updated] (NUTCH-2771) Tests in nightly builds: speed up long runners

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2771: --- Fix Version/s: (was: 1.17) 1.18 > Tests in nightly builds: speed up

[jira] [Commented] (NUTCH-2771) Tests in nightly builds: speed up long runners

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096297#comment-17096297 ] Sebastian Nagel commented on NUTCH-2771: Moving to 1.18 for now. After a closer look: all these

[jira] [Resolved] (NUTCH-2507) NutchTutorial wiki pages as a lot of outdated command line calls when it starts with the solr interaction

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2507. Assignee: Sebastian Nagel Resolution: Fixed Thanks, [~artodeto]! The section in

[jira] [Updated] (NUTCH-2423) Update contributor info page

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2423: --- Fix Version/s: (was: 1.17) 1.18 > Update contributor info page >

[jira] [Commented] (NUTCH-2423) Update contributor info page

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096452#comment-17096452 ] Sebastian Nagel commented on NUTCH-2423: Applies to: -

[jira] [Resolved] (NUTCH-2425) Update GettingNutchRunningWithUbuntu wiki article

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2425. Fix Version/s: (was: 1.17) Resolution: Abandoned The wiki page

[jira] [Commented] (NUTCH-2743) Add list of Nutch properties (nutch-default.xml) to documentation

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096396#comment-17096396 ] Sebastian Nagel commented on NUTCH-2743: Current properties are now available through nightly

[jira] [Commented] (NUTCH-2743) Add list of Nutch properties (nutch-default.xml) to documentation

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096398#comment-17096398 ] Sebastian Nagel commented on NUTCH-2743: Also note that properties can be addressed via page

[GitHub] [nutch] sebastian-nagel opened a new pull request #521: NUTCH-2002 parse and index checkers to check robots.txt

2020-04-30 Thread GitBox
sebastian-nagel opened a new pull request #521: URL: https://github.com/apache/nutch/pull/521 - applied Julien's patch to recent code base - also check redirects whether they are allowed - add command-line parameter `-checkRobotsTxt` enabling this check

[jira] [Commented] (NUTCH-2002) ParserChecker to check robots.txt

2020-04-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096424#comment-17096424 ] ASF GitHub Bot commented on NUTCH-2002: --- sebastian-nagel opened a new pull request #521: URL:

[jira] [Assigned] (NUTCH-2002) ParserChecker and IndexingFiltersChecker to check robots.txt

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2002: -- Assignee: Sebastian Nagel > ParserChecker and IndexingFiltersChecker to check

[jira] [Updated] (NUTCH-2002) ParserChecker and IndexingFiltersChecker to check robots.txt

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2002: --- Summary: ParserChecker and IndexingFiltersChecker to check robots.txt (was: ParserChecker

[jira] [Commented] (NUTCH-2758) Add plugin READMEs to binary release packages

2020-04-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096472#comment-17096472 ] ASF GitHub Bot commented on NUTCH-2758: --- sebastian-nagel opened a new pull request #522: URL:

[jira] [Assigned] (NUTCH-2758) Add plugin READMEs to binary release packages

2020-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2758: -- Assignee: Sebastian Nagel > Add plugin READMEs to binary release packages >

[GitHub] [nutch] sebastian-nagel opened a new pull request #522: NUTCH-2758 Add plugin READMEs to binary release packages

2020-04-30 Thread GitBox
sebastian-nagel opened a new pull request #522: URL: https://github.com/apache/nutch/pull/522