Jenkins build is back to normal : Nutch-trunk #2617

2014-04-26 Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/2617/

[jira] [Commented] (NUTCH-1714) Nutch 2.x upgrade to use GORA_94 branch

2014-04-26 Navid Shekoufa (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981964#comment-13981964 ] Navid Shekoufa commented on NUTCH-1714: --- Thanks for this very suitable patch! I've

Re: Why are web urls not assumed to be http

2014-04-26 Sebastian Nagel
Hi Diaa, "Why doesn't Nutch assume that web links that have www. at the beginning are of the http protocol?" It would not be a big problem to do so. The URL normalizer provides scopes (inject, fetch, etc.): you only have to point the property urlnormalizer.regex.file.inject to a special
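The kind of rule such an inject-scoped rules file would need can be sketched in Java. This is a hypothetical illustration of a single pattern/substitution pair (prepend "http://" to seeds that start with "www."), not Nutch's normalizer API; the class and method names are made up, and the actual rules-file syntax should be taken from the regex-normalize.xml shipped in Nutch's conf directory.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: one regex normalization rule that an
 * inject-scoped rules file might contain. Seeds written as "www...."
 * without a scheme get "http://" prepended; URLs that already carry a
 * scheme are left unchanged. Not Nutch's RegexURLNormalizer API.
 */
public class WwwSchemeRule {

    // Anchored at the start, so only scheme-less seeds beginning with "www." match.
    private static final Pattern WWW_WITHOUT_SCHEME = Pattern.compile("^www\\.");

    /** Applies the rule to one (trimmed) seed URL. */
    public static String normalize(String url) {
        return WWW_WITHOUT_SCHEME.matcher(url.trim()).replaceFirst("http://www.");
    }

    public static void main(String[] args) {
        System.out.println(normalize("www.example.com/index.html")); // http://www.example.com/index.html
        System.out.println(normalize("https://www.example.com/"));   // https://www.example.com/
    }
}
```

Anchoring the pattern with `^` keeps the rule from touching URLs that already have a scheme, which is why it is safe to run only in the inject scope.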

[jira] [Resolved] (NUTCH-566) Sun's URL class has bug in creation of relative query URLs

2014-04-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-566. --- Resolution: Fixed Was fixed by NUTCH-797 with version 1.4 (2.x will be patched soon), the
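The underlying problem concerns links that consist only of a query, such as href="?page=2". RFC 3986 (section 5.4.1) resolves "?y" against "http://a/b/c/d;p?q" to "http://a/b/c/d;p?y", keeping the last path segment, whereas java.net.URL follows the older RFC 2396 rules and may drop that segment. A minimal workaround, sketched here under the assumption that re-attaching the base URL's last path segment is enough, looks like this; the class name is hypothetical and this is not necessarily the code committed for NUTCH-797.

```java
import java.net.MalformedURLException;
import java.net.URL;

/**
 * Hypothetical sketch of resolving pure-query relative links
 * ("?a=b") the way RFC 3986 and browsers do: keep the base URL's
 * last path segment instead of letting java.net.URL drop it.
 */
public class PureQueryResolver {

    public static URL resolve(URL base, String target) throws MalformedURLException {
        target = target.trim();
        if (target.startsWith("?")) {
            String path = base.getPath();
            // Last path segment of the base, e.g. "Search.aspx" for "/Careers/Search.aspx";
            // empty if the base path ends with "/" or is empty.
            String lastSegment = path.substring(path.lastIndexOf('/') + 1);
            target = lastSegment + target;
        }
        // All other targets are resolved by java.net.URL unchanged.
        return new URL(base, target);
    }

    /** String convenience wrapper; malformed input becomes an IllegalArgumentException. */
    public static String resolve(String base, String target) {
        try {
            return resolve(new URL(base), target).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Example 7 of RFC 3986, section 5.4.1
        System.out.println(resolve("http://a/b/c/d;p?q", "?y")); // http://a/b/c/d;p?y
    }
}
```

Rewriting the target to "lastSegment?query" turns it into an ordinary relative path, so the rest of the resolution is left to java.net.URL.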

[jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser

2014-04-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-952: -- Attachment: test_nutch_952.html Was fixed by NUTCH-797 for v 1.4 (2.x will follow soon).

[jira] [Resolved] (NUTCH-952) fix outlink which started with '?' in html parser

2014-04-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-952. --- Resolution: Fixed fix outlink which started with '?' in html parser

[jira] [Updated] (NUTCH-566) Sun's URL class has bug in creation of relative query URLs

2014-04-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-566: -- Fix Version/s: (was: 1.9) Sun's URL class has bug in creation of relative query URLs

[jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser

2014-04-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-952: -- Fix Version/s: (was: 1.9) fix outlink which started with '?' in html parser

[jira] [Commented] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2014-04-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982116#comment-13982116 ] Sebastian Nagel commented on NUTCH-797: --- Hi [~jnioche], is there anything left

[jira] [Resolved] (NUTCH-1764) readdb to show command-line help if no action (-stats, -dump, etc.) given

2014-04-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1764. Resolution: Fixed Fix Version/s: (was: 1.8) +1 Thanks, [~diaa_abdallah]!

[jira] [Created] (NUTCH-1765) SolrClean to remove redirected URLs from Solr

2014-04-26 Iain Lopata (JIRA)
Iain Lopata created NUTCH-1765: -- Summary: SolrClean to remove redirected URLs from Solr Key: NUTCH-1765 URL: https://issues.apache.org/jira/browse/NUTCH-1765 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1764) readdb to show command-line help if no action (-stats, -dump, etc.) given

2014-04-26 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982128#comment-13982128 ] Hudson commented on NUTCH-1764: --- SUCCESS: Integrated in Nutch-trunk #2618 (See