[jira] [Commented] (NUTCH-2703) parse-tika: Boilerpipe should not run for non-(X)HTML pages

2019-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815285#comment-16815285 ] ASF GitHub Bot commented on NUTCH-2703: sebastian-nagel commented on pull request #449: NUTCH-2703

[jira] [Commented] (NUTCH-2703) parse-tika: Boilerpipe should not run for non-(X)HTML pages

2019-04-11 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815305#comment-16815305 ] Markus Jelsma commented on NUTCH-2703: remote: To git@github:apache/nutch.git remote:

[jira] [Resolved] (NUTCH-2703) parse-tika: Boilerpipe should not run for non-(X)HTML pages

2019-04-11 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2703. Resolution: Fixed Assignee: Markus Jelsma > parse-tika: Boilerpipe should not run for

[jira] [Commented] (NUTCH-2704) Upgrade crawler-commons dependency to 1.0

2019-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815274#comment-16815274 ] ASF GitHub Bot commented on NUTCH-2704: sebastian-nagel commented on pull request #448: NUTCH-2704

Build failed in Jenkins: Nutch-trunk #3620

2019-04-11 Thread Apache Jenkins Server
See Changes: [markus] NUTCH-2703 parse-tika: Boilerpipe should not run for non-(X)HTML pages -- [...truncated 5.65 KB...] [javac] Compiling 298 source files to

[jira] [Commented] (NUTCH-2703) parse-tika: Boilerpipe should not run for non-(X)HTML pages

2019-04-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815308#comment-16815308 ] Hudson commented on NUTCH-2703: FAILURE: Integrated in Jenkins build Nutch-trunk #3620 (See

[jira] [Commented] (NUTCH-2700) Indexchecker: improve command-line help

2019-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815310#comment-16815310 ] ASF GitHub Bot commented on NUTCH-2700: sebastian-nagel commented on pull request #446: NUTCH-2700

[jira] [Commented] (NUTCH-2690) Configurable and fast URL filter

2019-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815355#comment-16815355 ] Sebastian Nagel commented on NUTCH-2690: PR updated, squashed and rebased to current master. I'll

[jira] [Comment Edited] (NUTCH-2690) Configurable and fast URL filter

2019-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815355#comment-16815355 ] Sebastian Nagel edited comment on NUTCH-2690 at 4/11/19 11:53 AM: PR

[jira] [Assigned] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression

2019-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2279: Assignee: Sebastian Nagel > LinkRank fails when using Hadoop MR output compression >

[jira] [Created] (NUTCH-2708) urlfilter-automaton: update library dependency (dk.brics.automaton)

2019-04-11 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2708: Summary: urlfilter-automaton: update library dependency (dk.brics.automaton) Key: NUTCH-2708 URL: https://issues.apache.org/jira/browse/NUTCH-2708 Project: Nutch

[jira] [Commented] (NUTCH-2703) parse-tika: Boilerpipe should not run for non-(X)HTML pages

2019-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815282#comment-16815282 ] Sebastian Nagel commented on NUTCH-2703: +1 But I would opt to make it configurable. I'll open a

[jira] [Work started] (NUTCH-2700) Indexchecker: improve command-line help

2019-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2700 started by Sebastian Nagel. > Indexchecker: improve command-line help >

[jira] [Assigned] (NUTCH-2700) Indexchecker: improve command-line help

2019-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2700: Assignee: Sebastian Nagel > Indexchecker: improve command-line help >

[jira] [Resolved] (NUTCH-2700) Indexchecker: improve command-line help

2019-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2700. Resolution: Implemented > Indexchecker: improve command-line help >

[jira] [Commented] (NUTCH-2708) urlfilter-automaton: update library dependency (dk.brics.automaton)

2019-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815360#comment-16815360 ] ASF GitHub Bot commented on NUTCH-2708: sebastian-nagel commented on pull request #450: NUTCH-2708

[jira] [Updated] (NUTCH-2703) parse-tika: Boilerpipe should not run for non-(X)HTML pages

2019-04-11 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2703: Priority: Minor (was: Critical) > parse-tika: Boilerpipe should not run for non-(X)HTML pages >

[jira] [Commented] (NUTCH-2703) parse-tika: Boilerpipe should not run for non-(X)HTML pages

2019-04-11 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815302#comment-16815302 ] Markus Jelsma commented on NUTCH-2703: Thanks for not missing both MIME types, text/html AND

Build failed in Jenkins: Nutch-trunk #3621

2019-04-11 Thread Apache Jenkins Server
See Changes: [snagel] NUTCH-2700 Indexchecker: improve command-line help - add options -- [...truncated 5.66 KB...] [javac] Compiling 298 source files to

[jira] [Commented] (NUTCH-2700) Indexchecker: improve command-line help

2019-04-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815350#comment-16815350 ] Hudson commented on NUTCH-2700: FAILURE: Integrated in Jenkins build Nutch-trunk #3621 (See

[jira] [Commented] (NUTCH-2703) parse-tika: Boilerpipe should not run for non-(X)HTML pages

2019-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815383#comment-16815383 ] ASF GitHub Bot commented on NUTCH-2703: sebastian-nagel commented on pull request #449: NUTCH-2703