[jira] Created: (NUTCH-488) Avoid parsing unnecessary links and get a more relevant outlink list

2007-05-22 Thread Emmanuel Joke (JIRA)
Avoid parsing unnecessary links and get a more relevant outlink list Key: NUTCH-488 URL: https://issues.apache.org/jira/browse/NUTCH-488 Project: Nutch Issue Type:

[jira] Updated: (NUTCH-488) Avoid parsing unnecessary links and get a more relevant outlink list

2007-05-22 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Joke updated NUTCH-488: Attachment: DOMContentUtils.patch Avoid parsing unnecessary links and get a more relevant outlink

[jira] Updated: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters

2007-05-22 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Joke updated NUTCH-489: Attachment: SuffixURLFilter.java.patch suffix-urlfilter.txt.patch URLFilter-suffix
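One reading of the NUTCH-489 title is that a suffix filter should test only the URL path, not the whole URL string, because a trailing query string hides the real suffix. The Java sketch below illustrates that idea under stated assumptions; it is not the attached SuffixURLFilter patch. The class name, method name, and the hard-coded suffix list are hypothetical (the attached suffix-urlfilter.txt.patch suggests the plugin's real list lives in a config file).

```java
import java.net.URI;
import java.util.List;

public class SuffixCheckSketch {
    // Hypothetical, hard-coded list of rejected suffixes (illustration only).
    private static final List<String> REJECTED = List.of(".gif", ".jpg", ".zip");

    /**
     * Returns true if the URL's path, with any query string ignored,
     * ends with one of the rejected suffixes.
     */
    static boolean rejectedBySuffix(String url) {
        // URI.getPath() excludes the "?query" and "#fragment" parts.
        String path = URI.create(url).getPath();
        if (path == null) {
            return false; // opaque URIs such as "mailto:..." have no path
        }
        path = path.toLowerCase();
        for (String suffix : REJECTED) {
            if (path.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A naive check on the full string would miss ".gif" here (it ends in "large").
        System.out.println(rejectedBySuffix("http://example.com/img/logo.gif?size=large"));
        // A naive check on the full string would wrongly reject this page (it ends in ".zip").
        System.out.println(rejectedBySuffix("http://example.com/page.html?file=a.zip"));
    }
}
```

Checking the path alone avoids both failure modes shown in `main`: a rejected file type hidden behind a query string, and an HTML page wrongly rejected because a query parameter value happens to end in a blocked suffix.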

[jira] Commented: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters

2007-05-22 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497770 ] Doğacan Güney commented on NUTCH-489: - This is obviously useful but: * Your patches both in this issue and in

[jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2007-05-22 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-490: - Attachment: HtmlParser.java.diff Patch for HtmlParser. Extension point with filters for

[jira] Updated: (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2007-05-22 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-490: - Attachment: nutch-extensionpoins_plugin.xml.diff Patch for plugin.xml in

[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implme

2007-05-22 Thread Vadim Bauer (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497851 ] Vadim Bauer commented on NUTCH-427: --- There is an error in the plugin.xml file: the plugin id should be protocol-smb

[jira] Commented: (NUTCH-25) needs 'character encoding' detector

2007-05-22 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498041 ] Doug Cook commented on NUTCH-25: Thanks! I'll take a look at your proposed patch... (that was fast! ask and ye shall

[jira] Updated: (NUTCH-489) URLFilter-suffix management of the url path when the url contains some query parameters

2007-05-22 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Joke updated NUTCH-489: Attachment: SuffixURLFilter_v2.java.patch My mistake... I've added a new patch which is supposed