[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

2007-10-31 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539028 ] Andrzej Bialecki commented on NUTCH-552: We definitely need to do this; things would crash and burn otherwise.

How to extract specified information from html?

2007-10-31 Thread zhao xiuwen
Hi, I have read http://wiki.apache.org/nutch/WritingPluginExample, but I don't understand it clearly. I need to extract specific information from specific web sites in Nutch. First, I define a set of URLs. Second, I check whether the current page's URL is contained in that set.

Re: How to extract specified information from html?

2007-10-31 Thread zhao xiuwen
Should I implement HtmlParseFilter? If so, how do I invoke my method in filter() of HtmlParseFilter? Thanks. 2007/10/31, zhao xiuwen [EMAIL PROTECTED]: Hi, I have seen the http://wiki.apache.org/nutch/WritingPluginExample, but I don't understand clearly. I need extract specified

[jira] Updated: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

2007-10-31 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-552: Attachment: NUTCH-552-3.patch. New patch. Fixes problems with path handling changes in Hadoop

Re: How to extract specified information from html?

2007-10-31 Thread Adam Lofts
Hi,
On 31/10/2007, zhao xiuwen [EMAIL PROTECTED] wrote:
> Should I implement HtmlParseFilter?
Yes.
> If it is, how to invoke my method in filter() of HtmlParseFilter?
Load your plugin in the Nutch config, and filter() will be called for every HTML file that you crawl.
Best, Adam
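A minimal sketch of the logic discussed in this thread, assuming the goal is to pull one field (here, the page title) only from pages whose URL is in a predefined set. The class TargetedExtractor, its extract method, and the regex are hypothetical and are not part of Nutch. In a real plugin, a class implementing HtmlParseFilter would call something like this from filter(), whose exact signature depends on the Nutch version, and the plugin would be enabled through the plugin.includes property in the Nutch configuration.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Nutch class): extracts the title from a page,
 * but only if the page URL belongs to a predefined set of target URLs.
 */
class TargetedExtractor {
    // Simplistic pattern for illustration; real HTML is better handled via a parsed DOM.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private final Set<String> targetUrls;

    TargetedExtractor(Set<String> targetUrls) {
        // Defensive copy so later changes to the caller's set do not affect us.
        this.targetUrls = new HashSet<>(targetUrls);
    }

    /** Returns the trimmed title if pageUrl is a target and a title exists, else empty. */
    Optional<String> extract(String pageUrl, String html) {
        if (!targetUrls.contains(pageUrl)) {
            return Optional.empty(); // step 2 of the question: skip pages outside the URL set
        }
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```

The URL set is matched by exact string comparison here; a real filter would probably normalize URLs first and walk the parsed document rather than apply a regex to raw HTML.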

[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2007-10-31 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539123 ] Doğacan Güney commented on NUTCH-567: Hi Dawid, If TagSoup is not going to release a new version soon, then

[jira] Commented: (NUTCH-566) Sun's URL class has bug in creation of relative query URLs

2007-10-31 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539127 ] Doğacan Güney commented on NUTCH-566: - I am going to commit this one, but I am not sure what needs to be updated

[jira] Commented: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-10-31 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539131 ] Doğacan Güney commented on NUTCH-559: - Hi Susam, Your last patch looks great! I have one minor nit: I think it

[jira] Assigned: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-10-31 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney reassigned NUTCH-559: --- Assignee: Doğacan Güney NTLM, Basic and Digest Authentication schemes for web/proxy server

[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat

2007-10-31 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539133 ] Doğacan Güney commented on NUTCH-548: - I think this is ready for commit, but I would like to get an approval from

[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2007-10-31 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539135 ] Andrzej Bialecki commented on NUTCH-567: - I'm slightly worried about losing track of what has been patched

Re: Next move with JIRA ticket

2007-10-31 Thread Doğacan Güney
Hi, On 10/31/07, Ned Rockson [EMAIL PROTECTED] wrote: I submitted a JIRA ticket regarding URL ordering in Generator.java as well as a patch (NUTCH-570) and I'm wondering what else I need to do to get this committed. Obviously it's low priority so I may be getting too antsy. Since NUTCH-570

Re: Next move with JIRA ticket

2007-10-31 Thread Ned Rockson
Thanks for the information. I'll have to run a fresh fetch to get some correct stats so I'll submit it in a day or two. On 10/31/07, Doğacan Güney [EMAIL PROTECTED] wrote: Hi, On 10/31/07, Ned Rockson [EMAIL PROTECTED] wrote: I submitted a JIRA ticket regarding URL ordering in

[jira] Commented: (NUTCH-566) Sun's URL class has bug in creation of relative query URLs

2007-10-31 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539146 ] Doug Cook commented on NUTCH-566: - Hi Doğacan. Thanks for following up. The issue has gotten a little more
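For readers unfamiliar with the problem named in the NUTCH-566 title: a relative link consisting only of a query string (such as "?page=2") may be resolved by java.net.URL against the base's directory rather than the base document. The following is a minimal sketch of one possible workaround, not the patch attached to the issue. The class QueryUrlResolver and its resolve method are hypothetical names; the idea is to prepend the base's last path segment to a pure-query target before resolving, so the result does not depend on how the JDK treats the pure-query case.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper (not a Nutch class) for resolving pure-query relative links. */
class QueryUrlResolver {

    /** Resolves target against base; a target starting with '?' keeps the base document name. */
    static String resolve(String base, String target) {
        try {
            URL baseUrl = new URL(base);
            if (target.startsWith("?")) {
                String path = baseUrl.getPath();
                int slash = path.lastIndexOf('/');
                // Last path segment of the base, e.g. "index.html"; empty if the path ends in '/'.
                String lastSegment = slash >= 0 ? path.substring(slash + 1) : "";
                target = lastSegment + target;
            }
            return new URL(baseUrl, target).toString();
        } catch (MalformedURLException e) {
            // Wrap the checked exception to keep the sketch's call sites simple.
            throw new IllegalArgumentException("Cannot resolve " + target + " against " + base, e);
        }
    }
}
```

For example, resolving "?a=b" against "http://www.example.com/path/index.html" with this helper gives "http://www.example.com/path/index.html?a=b".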

[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2007-10-31 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539162 ] Dawid Weiss commented on NUTCH-567: --- I agree. What we used to do in Carrot2 was to include the patch (against the