[jira] [Commented] (NUTCH-1945) Test for XLSX parser

2020-05-12 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105455#comment-17105455 ] Hudson commented on NUTCH-1945: SUCCESS: Integrated in Jenkins build Nutch-trunk #3682 (See

[jira] [Resolved] (NUTCH-1945) Test for XLSX parser

2020-05-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1945. Resolution: Implemented

[jira] [Commented] (NUTCH-2419) Domain blacklist URL filter does not respect command-line override for file

2020-05-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105413#comment-17105413 ] Sebastian Nagel commented on NUTCH-2419: Working on a patch. Turned out that the situation is

[jira] [Updated] (NUTCH-2419) Some URL filters and normalizers do not respect command-line override for rule file

2020-05-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2419: Summary: Some URL filters and normalizers do not respect command-line override for rule file

[jira] [Updated] (NUTCH-2419) Some URL filters and normalizers do not respect command-line override for rule file

2020-05-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2419: Component/s: urlnormalizer, urlfilter, plugin

[jira] [Commented] (NUTCH-2419) Domain blacklist URL filter does not respect command-line override for file

2020-05-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105418#comment-17105418 ] Sebastian Nagel commented on NUTCH-2419: Hi [~markus17], a PR is open to fix this issue in all

[jira] [Updated] (NUTCH-2318) Text extraction in HtmlParser adds too much whitespace.

2020-05-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2318: Component/s: plugin