[jira] [Resolved] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2434. Resolution: Implemented Applied [~markus17]'s patch to master in

[jira] [Resolved] (NUTCH-1652) Avoid instantiation of MimeUtil for each Content object created

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1652. Resolution: Done This has been fixed as part of NUTCH-2578 for Nutch 1.15. Thanks,

[jira] [Assigned] (NUTCH-1945) Test for XLSX parser

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-1945: -- Assignee: Sebastian Nagel > Test for XLSX parser

[jira] [Updated] (NUTCH-1945) Test for XLSX parser

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1945: --- Fix Version/s: 1.17 > Test for XLSX parser > > > Key:

[jira] [Commented] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags

2020-05-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099740#comment-17099740 ] Hudson commented on NUTCH-2434: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3679 (See

[jira] [Assigned] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2753: -- Assignee: Sebastian Nagel > Add -listen option to command-line help of CrawlDbReader

[jira] [Resolved] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2753. Resolution: Fixed > Add -listen option to command-line help of CrawlDbReader and

[jira] [Resolved] (NUTCH-2002) ParserChecker and IndexingFiltersChecker to check robots.txt

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2002. Resolution: Implemented > ParserChecker and IndexingFiltersChecker to check robots.txt >

[jira] [Commented] (NUTCH-2002) ParserChecker and IndexingFiltersChecker to check robots.txt

2020-05-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099977#comment-17099977 ] Hudson commented on NUTCH-2002: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3681 (See

[jira] [Commented] (NUTCH-2785) FreeGenerator: command-line option to define number of generated fetch lists

2020-05-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099976#comment-17099976 ] Hudson commented on NUTCH-2785: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3681 (See

[jira] [Commented] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-05-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099979#comment-17099979 ] Hudson commented on NUTCH-2753: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3681 (See

[jira] [Commented] (NUTCH-2758) Add plugin READMEs to binary release packages

2020-05-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099978#comment-17099978 ] Hudson commented on NUTCH-2758: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3681 (See

[jira] [Resolved] (NUTCH-2785) FreeGenerator: command-line option to define number of generated fetch lists

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2785. Resolution: Fixed > FreeGenerator: command-line option to define number of generated fetch

[jira] [Resolved] (NUTCH-2758) Add plugin READMEs to binary release packages

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2758. Resolution: Fixed > Add plugin READMEs to binary release packages >

[GitHub] [nutch] sebastian-nagel commented on pull request #514: NUTCH-1194 Generator: CrawlDB lock should be released earlier

2020-05-05 Thread GitBox
sebastian-nagel commented on pull request #514: URL: https://github.com/apache/nutch/pull/514#issuecomment-624003247 Rebased to master and squashed commits.

[jira] [Commented] (NUTCH-1194) Generator: CrawlDB lock should be released earlier

2020-05-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099805#comment-17099805 ] ASF GitHub Bot commented on NUTCH-1194: --- sebastian-nagel commented on pull request #514: URL:

[jira] [Updated] (NUTCH-1806) Delegate processing of URL domains to crawler commons

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1806: --- Fix Version/s: 1.18 > Delegate processing of URL domains to crawler commons >

[jira] [Commented] (NUTCH-1194) Generator: CrawlDB lock should be released earlier

2020-05-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099852#comment-17099852 ] Hudson commented on NUTCH-1194: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3680 (See

[jira] [Resolved] (NUTCH-1194) Generator: CrawlDB lock should be released earlier

2020-05-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1194. Resolution: Fixed > Generator: CrawlDB lock should be released earlier >

[GitHub] [nutch] sebastian-nagel opened a new pull request #525: NUTCH-1945 Test for XLSX parser

2020-05-05 Thread GitBox
sebastian-nagel opened a new pull request #525: URL: https://github.com/apache/nutch/pull/525 - add Tika unit test for XLSX files - bundle instance variables and utility methods in class TikaParserTest - clean up javadoc comments See patch attached to

[jira] [Commented] (NUTCH-1945) Test for XLSX parser

2020-05-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099801#comment-17099801 ] ASF GitHub Bot commented on NUTCH-1945: --- sebastian-nagel opened a new pull request #525: URL: