[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Patch Info: Patch Available Description: h1. HostDB for Apache Nutch 1.x * automatically

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Attachment: NUTCH-1325.patch Updated patch for trunk contains more thorough config descriptions

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Attachment: NUTCH-1325.patch Updated patch to use TDigest for streaming percentiles. But because

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Attachment: NUTCH-1325.patch TDigest is awesome! Here's with support for user configurable list

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Fix Version/s: 1.12 > HostDB for Nutch > > > Key: NUTCH-1325 >

[jira] [Resolved] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1325. -- Resolution: Fixed Committed to trunk in revision 1725952. Many thanks to all contributors! >

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1325: - Component/s: hostdb > HostDB for Nutch > > > Key: NUTCH-1325 >

[jira] [Commented] (NUTCH-1233) Rely on Tika for outlink extraction

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110375#comment-15110375 ] Markus Jelsma commented on NUTCH-1233: -- Yes, we'll get this support with Tika 1.12. Timothy Allison

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110373#comment-15110373 ] Markus Jelsma commented on NUTCH-961: - Hello - that doesn't seem related to this issue as it doesn't

[jira] [Updated] (NUTCH-2201) Remove loops program from webgraph package

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2201: - Attachment: NUTCH-2201.patch Patch for trunk which removed the loops program and all references.

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110702#comment-15110702 ] Lewis John McGibbney commented on NUTCH-1325: - What a patch. Real nice. I really like th

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110733#comment-15110733 ] Lewis John McGibbney commented on NUTCH-1325: - Nice Markus, the conversation in this ticket is

[jira] [Updated] (NUTCH-2201) Remove loops program from webgraph package

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2201: - Patch Info: Patch Available > Remove loops program from webgraph package >

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110708#comment-15110708 ] Hudson commented on NUTCH-1325: --- SUCCESS: Integrated in Nutch-trunk #3339 (See

[jira] [Commented] (NUTCH-2197) Add solr5 solrcloud indexer support

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110797#comment-15110797 ] Markus Jelsma commented on NUTCH-2197: -- This Solr 5 plugin is capable of indexing to Solr 5 in cloud

[jira] [Resolved] (NUTCH-2201) Remove loops program from webgraph package

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2201. -- Resolution: Fixed Committed to trunk revision 1725981. Thanks Dennis! > Remove loops program

[ANNOUNCE] Apache Nutch 2.3.1 Release

2016-01-21 Thread lewis john mcgibbney
Hi Folks, !!Apologies for cross posting!! The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.3.1, we advise all current users and developers of the 2.X series to upgrade to this release. Nutch is a well matured, production ready Web crawler. Nutch 2.X branch

[RESULT] WAS Re: [VOTE] Release Apache Nutch 2.3.1rc2

2016-01-21 Thread Lewis John Mcgibbney
Hi Folks, I am bringing this VOTE to a close with the following results [3] +1 Release this package as Apache Nutch 2.3.1. Lewis John McGibbney* Sebastian Nagel* Chris Mattmann* [0] -1 Do not release this package because… *Nutch PMC Member I am really happy to therefore announce that the VOTE

[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110947#comment-15110947 ] Markus Jelsma commented on NUTCH-2202: -- Yes, a patch would be a good place to start. I've read the

[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110867#comment-15110867 ] Lewis John McGibbney commented on NUTCH-2202: - I agree [~robertmeusel], this would be good to

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-01-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111292#comment-15111292 ] Markus Jelsma commented on NUTCH-961: - Some news, the upstream Tika issue has been committed and