[jira] Updated: (NUTCH-669) Consolidate code for Fetcher and Fetcher2
[ https://issues.apache.org/jira/browse/NUTCH-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sami Siren updated NUTCH-669:

    Fix Version/s: 1.0.0 (was: 1.1)

Moving this back to 1.0.

Are you close with your patch? As discussed in this thread, we should just replace Fetcher with Fetcher2, change the Crawl class, and check that the tests pass. Other issues we can deal with in their own tickets. I can also help with this if you don't have the time.

Consolidate code for Fetcher and Fetcher2

                Key: NUTCH-669
                URL: https://issues.apache.org/jira/browse/NUTCH-669
            Project: Nutch
         Issue Type: Improvement
         Components: fetcher
  Affects Versions: 0.9.0
           Reporter: Todd Lipcon
            Fix For: 1.0.0

I'd like to consolidate a lot of the common code between Fetcher and Fetcher2.java. It seems to me like there are the following differences:

- Fetcher relies on the Protocol to obey robots.txt and crawl delay settings, whereas Fetcher2 implements them itself.
- Fetcher2 uses a different queueing model (one queue per crawl host) to accomplish the per-host limiting without making the Protocol do it.

I've begun work on this but want to check with people on the following:

- What reason is there for Fetcher existing at all, since Fetcher2 seems to be a superset of its functionality?
- Is it on the road map to remove the robots/delay logic from the Http protocol and make Fetcher2's delegation of duties the standard?
- Any other improvements wanted for Fetcher while I am in and around the code?

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
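For readers unfamiliar with the queue-per-host model mentioned in the issue description, here is a rough sketch of the idea. It is only an illustration under assumptions, not code from Nutch: the class name PerHostQueues, its methods, and the single fixed delay are all hypothetical, and robots.txt handling is left out entirely.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of per-host fetch queues; not Nutch's Fetcher2. */
public class PerHostQueues {
    // Assumed: one fixed politeness delay applied to every host.
    private final long crawlDelayMs;
    // One FIFO queue of URLs per host, kept in insertion order.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    // Earliest time (in ms) at which each host may be fetched again.
    private final Map<String, Long> nextFetchTime = new LinkedHashMap<>();

    public PerHostQueues(long crawlDelayMs) {
        this.crawlDelayMs = crawlDelayMs;
    }

    /** Adds a URL to the queue that belongs to its host. */
    public void add(String url) {
        String host = URI.create(url).getHost();
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    /**
     * Returns the next URL whose host is allowed to be fetched at nowMs,
     * or null if every non-empty queue is still inside its delay window.
     */
    public String next(long nowMs) {
        for (Map.Entry<String, Deque<String>> entry : queues.entrySet()) {
            Deque<String> queue = entry.getValue();
            if (queue.isEmpty()) {
                continue;
            }
            long readyAt = nextFetchTime.getOrDefault(entry.getKey(), 0L);
            if (nowMs >= readyAt) {
                // The queue, not the protocol layer, enforces the delay.
                nextFetchTime.put(entry.getKey(), nowMs + crawlDelayMs);
                return queue.poll();
            }
        }
        return null;
    }
}
```

The point of the sketch is that the per-host limit lives in the scheduler, so a protocol implementation would only need to fetch bytes. Time is passed in explicitly so the behavior can be checked without sleeping.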
[jira] Updated: (NUTCH-669) Consolidate code for Fetcher and Fetcher2
[ https://issues.apache.org/jira/browse/NUTCH-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated NUTCH-669:

         Priority: Major (was: Minor)
    Fix Version/s: 1.0.0

+1 -- people, vote for it. This could go in 1.0, right?

Consolidate code for Fetcher and Fetcher2

                Key: NUTCH-669
                URL: https://issues.apache.org/jira/browse/NUTCH-669