NutchServer

2017-08-17 Thread kenneth mcfarland
Quick question about the NutchServer class. The JAXRSServerFactoryBean documentation seems to ask for a call to close or destroy on the server it creates, but I notice the NutchServer close simply calls System.exit(0) and doesn't maintain a reference after the call to create(). Is this a possible

[jira] [Created] (NUTCH-2410) Unit test for jsoup-extractor not to depend on external resources

2017-08-17 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2410: -- Summary: Unit test for jsoup-extractor not to depend on external resources Key: NUTCH-2410 URL: https://issues.apache.org/jira/browse/NUTCH-2410 Project: Nutch

[jira] [Commented] (NUTCH-2409) Injector: complete command-line help and counters

2017-08-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130245#comment-16130245 ] ASF GitHub Bot commented on NUTCH-2409: --- sebastian-nagel opened a new pull request #215: NUTCH-2409

[jira] [Created] (NUTCH-2409) Injector: complete command-line help and counters

2017-08-17 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2409: -- Summary: Injector: complete command-line help and counters Key: NUTCH-2409 URL: https://issues.apache.org/jira/browse/NUTCH-2409 Project: Nutch Issue

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-08-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130206#comment-16130206 ] Markus Jelsma commented on NUTCH-2335: -- Ok, with this modification, it doesn't print with -noFilter,

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-08-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130204#comment-16130204 ] Markus Jelsma commented on NUTCH-2335: -- I moved the sys.out to: {code} if (filters != null)

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-08-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130191#comment-16130191 ] Markus Jelsma commented on NUTCH-2335: -- I see, will modify the println to the correct branch. The

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-08-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130178#comment-16130178 ] Sebastian Nagel commented on NUTCH-2335: Hi Markus, here is a simple test. I've run it both with

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-08-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130110#comment-16130110 ] Markus Jelsma commented on NUTCH-2335: -- Passing -noFilter does not change anything. I am staring at

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-08-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130105#comment-16130105 ] Markus Jelsma commented on NUTCH-2335: -- Sebastian, there is a problem with either this patch, or the

[jira] [Updated] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-08-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2335: - Attachment: Injector.java > Injector not to filter and normalize existing URLs in CrawlDb >