[
https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539028
]
Andrzej Bialecki commented on NUTCH-552:
-
We definitely need to do this; things would crash and burn otherwise.
Hi,
I have read http://wiki.apache.org/nutch/WritingPluginExample, but I
don't understand it clearly.
I need to extract specific information from specific web sites in Nutch.
First, I determine a set of URLs.
Second, I check whether the current page URL is contained in that set.
Should I implement HtmlParseFilter? If so, how do I invoke my method in
filter() of HtmlParseFilter?
Thanks.
2007/10/31, zhao xiuwen [EMAIL PROTECTED]:
Hi,
I have read http://wiki.apache.org/nutch/WritingPluginExample, but
I don't understand it clearly.
I need to extract specific
[
https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-552:
---
Attachment: NUTCH-552-3.patch
New patch. Fixes problems with path handling changes in Hadoop
Hi,
On 31/10/2007, zhao xiuwen [EMAIL PROTECTED] wrote:
Should I implement HtmlParseFilter?
Yes
If so, how do I invoke my method in
filter() of HtmlParseFilter?
Load your plugin in the Nutch config and filter() will be called for every
HTML file that you crawl.
Best,
Adam
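
The URL-set check described in the question can be sketched on its own, without any Nutch dependency. This is only a minimal illustration under stated assumptions: the class name UrlSetMatcher, its methods, and the example.com URLs are hypothetical and not part of Nutch. The idea is that your HtmlParseFilter implementation would hold such a helper and call it from filter() with the URL of the page being parsed; the exact filter() signature depends on the Nutch version, so the wiring is deliberately left out. Plugins are typically enabled through the plugin.includes property in the Nutch configuration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of Nutch): decides whether a page URL
 * belongs to a predetermined set and, if so, runs the extraction step.
 */
class UrlSetMatcher {
    private final Set<String> targetUrls;

    UrlSetMatcher(Set<String> targetUrls) {
        // Defensive copy so later changes by the caller do not affect matching.
        this.targetUrls = new HashSet<String>(targetUrls);
    }

    /** Returns true when the page URL is in the predetermined URL set. */
    boolean contains(String pageUrl) {
        return pageUrl != null && targetUrls.contains(pageUrl);
    }

    /**
     * Placeholder for site-specific extraction. Here it only trims the page
     * text; real logic would pick out the fields you care about.
     * Returns null for pages outside the URL set.
     */
    String extract(String pageUrl, String pageText) {
        if (!contains(pageUrl)) {
            return null;
        }
        return pageText.trim();
    }

    public static void main(String[] args) {
        UrlSetMatcher matcher = new UrlSetMatcher(new HashSet<String>(
                Arrays.asList("http://example.com/a", "http://example.com/b")));

        // In a real plugin, this call would sit inside filter(), using the
        // URL and parsed text of the page that Nutch hands to the filter.
        System.out.println(matcher.extract("http://example.com/a", "  some text  "));
        System.out.println(matcher.extract("http://example.com/other", "ignored"));
    }
}
```

Keeping the matching logic in a plain class like this makes it easy to test outside a crawl; the filter itself then only needs to pass the page URL and text through and return the parse unchanged for pages that are not in the set.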
[
https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539123
]
Doğacan Güney commented on NUTCH-567:
-
Hi Dawid,
If tagsoup is not going to release a new version soon, then
[
https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539127
]
Doğacan Güney commented on NUTCH-566:
-
I am going to commit this one, but I am not sure what needs to be updated
[
https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539131
]
Doğacan Güney commented on NUTCH-559:
-
Hi Susam,
Your last patch looks great!
I have one minor nit: I think it
[
https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney reassigned NUTCH-559:
---
Assignee: Doğacan Güney
NTLM, Basic and Digest Authentication schemes for web/proxy server
[
https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539133
]
Doğacan Güney commented on NUTCH-548:
-
I think this is ready for commit, but I would like to get an approval from
[
https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539135
]
Andrzej Bialecki commented on NUTCH-567:
-
I'm slightly worried about losing track of what has been patched
Hi,
On 10/31/07, Ned Rockson [EMAIL PROTECTED] wrote:
I submitted a JIRA ticket regarding URL ordering in Generator.java as
well as a patch (NUTCH-570) and I'm wondering what else I need to do to
get this committed. Obviously it's low priority so I may be getting too
antsy.
Since NUTCH-570
Thanks for the information. I'll have to run a fresh fetch to get
some correct stats so I'll submit it in a day or two.
On 10/31/07, Doğacan Güney [EMAIL PROTECTED] wrote:
Hi,
On 10/31/07, Ned Rockson [EMAIL PROTECTED] wrote:
I submitted a JIRA ticket regarding URL ordering in
[
https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539146
]
Doug Cook commented on NUTCH-566:
-
Hi Doğacan.
Thanks for following up. The issue has gotten a little more
[
https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539162
]
Dawid Weiss commented on NUTCH-567:
---
I agree. What we used to do in Carrot2 was to include the patch (against the