[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1314#comment-1314 ]

Ferdy Galema commented on NUTCH-1289:

This is a showstopper for the upcoming release. I will cook up a patch using your input and commit it asap.

In distributed mode URL's are not partitioned

Key: NUTCH-1289
URL: https://issues.apache.org/jira/browse/NUTCH-1289
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: nutchgora
Reporter: Dan Rosher
Fix For: nutchgora
Attachments: NUTCH-1289.patch

In distributed mode URL's are not partitioned to a specific machine which means the politeness policy is voided

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222324#comment-13222324 ]

Ferdy Galema commented on NUTCH-1289:

Committed. Dan, could you verify this issue for closing?

Attachments: NUTCH-1289-v2.patch, NUTCH-1289.patch
[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222415#comment-13222415 ]

Dan Rosher commented on NUTCH-1289:

Hi Ferdy,

Thanks for adding the tests, looks good to me.

Cheers,
Dan
[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222982#comment-13222982 ]

Hudson commented on NUTCH-1289:

Integrated in Nutch-nutchgora #184 (See [https://builds.apache.org/job/Nutch-nutchgora/184/])
    NUTCH-1289 In distributed mode URL's are not partitioned (Revision 1297039)

    Result = SUCCESS
ferdy :
Files :
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/URLPartitioner.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/PartitionUrlByHost.java
* /nutch/branches/nutchgora/src/test/org/apache/nutch/crawl/TestURLPartitioner.java
[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217155#comment-13217155 ]

Lewis John McGibbney commented on NUTCH-1289:

Hi Dan, thanks for opening this issue and for the patch. Are you using trunk at all? If so is it possible to confirm if this functionality is already running in trunk... if not then we can get a patch cooked up.
[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217157#comment-13217157 ]

Markus Jelsma commented on NUTCH-1289:

In trunk records of the same queue end up in the same fetch list which corresponds to a single mapper.
[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217159#comment-13217159 ]

Lewis John McGibbney commented on NUTCH-1289:

Markus, what is your opinion as to which suits best? Or is it the case in Nutchgora that Dan's patch is more appropriate?
[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217172#comment-13217172 ]

Mathijs Homminga commented on NUTCH-1289:

Nice catch. The PartitionUrlByHost seems broken indeed.

I would suggest that we use the existing o.a.n.crawl.URLPartitioner class which has support for three URL partition modes (host, domain, IP) and which is used by the GeneratorJob too.
Pros: support for different partition modes in the Fetcher + no duplicate code.
Or is there a reason why the Fetcher has its own partition logic?

The URLPartitioner class is a Partitioner<SelectorEntry, WebPage> instead of a Partitioner<IntWritable, FetchEntry> but you can perhaps extract a method and use it from both classes, or create one URLPartitioner with two specific inner classes for the Generator and Fetcher.
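
Editorial sketch of the "extract a method and use it from both classes" idea from the comment above. This is not the committed patch and does not use the Hadoop or Nutch APIs: every name here (UrlPartitionSketch, Mode, partitionKey, partitionFor, GeneratorSide, FetcherSide) is hypothetical, URLs are passed as plain strings rather than SelectorEntry or FetchEntry objects, the IP mode is left out because it would need a DNS lookup, and the domain mode is a naive "last two labels" guess. It only illustrates that one shared key-extraction method keeps all URLs of a host (or domain) in the same partition for both jobs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Standalone illustration; none of these names come from the Nutch code base. */
final class UrlPartitionSketch {

    /** Two of the three modes mentioned in the thread (IP omitted: needs DNS). */
    enum Mode { HOST, DOMAIN }

    /** Shared helper: derives the string that decides a URL's partition. */
    static String partitionKey(String url, Mode mode) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            host = null;
        }
        if (host == null) {
            return url; // unparsable or host-less URL: fall back to the whole string
        }
        host = host.toLowerCase(Locale.ROOT);
        if (mode == Mode.DOMAIN) {
            // Naive assumption: the domain is the last two labels of the host.
            String[] labels = host.split("\\.");
            if (labels.length > 2) {
                return labels[labels.length - 2] + "." + labels[labels.length - 1];
            }
        }
        return host;
    }

    /** Maps the key to a partition index in [0, numPartitions). */
    static int partitionFor(String url, Mode mode, int numPartitions) {
        return Math.floorMod(partitionKey(url, mode).hashCode(), numPartitions);
    }

    /** Stand-in for the generator-side partitioner. */
    static final class GeneratorSide {
        static int getPartition(String url, Mode mode, int numReduceTasks) {
            return partitionFor(url, mode, numReduceTasks);
        }
    }

    /** Stand-in for the fetcher-side partitioner; delegates to the same helper. */
    static final class FetcherSide {
        static int getPartition(String url, Mode mode, int numReduceTasks) {
            return partitionFor(url, mode, numReduceTasks);
        }
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.example.com/a",
            "http://www.example.com/b?x=1",
            "http://news.example.org/index.html"
        };
        for (String url : urls) {
            System.out.println(url + " -> partition "
                + FetcherSide.getPartition(url, Mode.HOST, 4));
        }
    }
}
```

Because both stand-in classes delegate to partitionFor, two URLs on the same host always receive the same partition index, which is the property the politeness policy depends on.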