[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549367#comment-15549367 ] Markus Jelsma commented on NUTCH-2320: -- Julien, you are right. I will revert it tomorrow morning, and

[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549245#comment-15549245 ] Sebastian Nagel commented on NUTCH-2320: Right, change logs are generated from Jira. >

[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549206#comment-15549206 ] Julien Nioche commented on NUTCH-2320: -- Hi @markus17, you haven't left much time for people to

[jira] [Commented] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl

2016-10-05 Thread Zuber (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548759#comment-15548759 ] Zuber commented on NUTCH-2319: -- I am using parse-tika as the HTML parser. Do I still need to upgrade to 1.12? > Link

Build failed in Jenkins: Nutch-trunk #3396

2016-10-05 Thread Apache Jenkins Server
See Changes: [markus] NUTCH-2320 URLFilterChecker to run as TCP Telnet service -- [...truncated 5348 lines...] [javac] Compiling 1 source file to

[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548786#comment-15548786 ] Hudson commented on NUTCH-2320: --- FAILURE: Integrated in Jenkins build Nutch-trunk #3396 (See

[jira] [Commented] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl

2016-10-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548768#comment-15548768 ] Markus Jelsma commented on NUTCH-2319: -- Yes, 1.4 has a really old Tika on board. > Link with

[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548627#comment-15548627 ] Markus Jelsma commented on NUTCH-2320: -- We don't change CHANGES.txt anymore? > URLFilterChecker to

[jira] [Resolved] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2320. -- Resolution: Fixed Commit to trunk in (whatever this is) e53b34b..836b2e0 master -> master >

[jira] [Commented] (NUTCH-2318) Text extraction in HtmlParser adds too much whitespace.

2016-10-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548610#comment-15548610 ] Markus Jelsma commented on NUTCH-2318: -- This is a known problem; it also affects the TikaParser and

[jira] [Commented] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl

2016-10-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548591#comment-15548591 ] Markus Jelsma commented on NUTCH-2319: -- Try upgrading to 1.12 and/or using parse-tika as your HTML

[jira] [Created] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2320: Summary: URLFilterChecker to run as TCP Telnet service Key: NUTCH-2320 URL: https://issues.apache.org/jira/browse/NUTCH-2320 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2320: - Attachment: NUTCH-2320.patch Patch for trunk. > URLFilterChecker to run as TCP Telnet service >

[jira] [Updated] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2320: - Description: Allow testing URL filters for web applications, just like the indexing filters checker.