[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-03-05 Thread Ferdy Galema (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1314#comment-1314
 ] 

Ferdy Galema commented on NUTCH-1289:
-

This is a showstopper for the upcoming release. I will cook up a patch using 
your input and commit it ASAP.

 In distributed mode URL's are not partitioned
 -

 Key: NUTCH-1289
 URL: https://issues.apache.org/jira/browse/NUTCH-1289
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Affects Versions: nutchgora
Reporter: Dan Rosher
 Fix For: nutchgora

 Attachments: NUTCH-1289.patch


 In distributed mode, URLs are not partitioned to a specific machine, which 
 means the politeness policy is voided.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-03-05 Thread Ferdy Galema (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222324#comment-13222324
 ] 

Ferdy Galema commented on NUTCH-1289:
-

Committed. 

Dan, could you verify this issue for closing?

 In distributed mode URL's are not partitioned
 -

 Key: NUTCH-1289
 URL: https://issues.apache.org/jira/browse/NUTCH-1289
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Affects Versions: nutchgora
Reporter: Dan Rosher
 Fix For: nutchgora

 Attachments: NUTCH-1289-v2.patch, NUTCH-1289.patch


 In distributed mode, URLs are not partitioned to a specific machine, which 
 means the politeness policy is voided.





[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-03-05 Thread Dan Rosher (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222415#comment-13222415
 ] 

Dan Rosher commented on NUTCH-1289:
---

Hi Ferdy,

Thanks for adding the tests. Looks good to me.

Cheers,
Dan





[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-03-05 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222982#comment-13222982
 ] 

Hudson commented on NUTCH-1289:
---

Integrated in Nutch-nutchgora #184 (See 
[https://builds.apache.org/job/Nutch-nutchgora/184/])
NUTCH-1289 In distributed mode URL's are not partitioned (Revision 1297039)

 Result = SUCCESS
ferdy : 
Files : 
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/URLPartitioner.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/PartitionUrlByHost.java
* /nutch/branches/nutchgora/src/test/org/apache/nutch/crawl/TestURLPartitioner.java






[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-02-27 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217155#comment-13217155
 ] 

Lewis John McGibbney commented on NUTCH-1289:
-

Hi Dan, thanks for opening this issue and for the patch. Are you using trunk at 
all? If so, could you confirm whether this functionality already works in 
trunk? If not, we can get a patch cooked up.





[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-02-27 Thread Markus Jelsma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217157#comment-13217157
 ] 

Markus Jelsma commented on NUTCH-1289:
--

In trunk, records of the same queue end up in the same fetch list, which 
corresponds to a single mapper.
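
To illustrate the idea only (this is not the trunk code), here is a minimal 
sketch in which the partition number is derived from the host alone, so every 
URL of one host lands in the same fetch list. The class and method names 
(FetchListSketch, partitionByHost, buildFetchLists) are made up for this 
example, and it assumes the queue key is the host name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: hypothetical names, not the Nutch implementation.
class FetchListSketch {

    // Map a URL to a partition (fetch list) using only its host, so that
    // all URLs of one host share a partition and thus a single fetcher task.
    static int partitionByHost(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        // Assumption: fall back to the whole URL when no host can be parsed.
        String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Group URLs into fetch lists keyed by partition number.
    static Map<Integer, List<String>> buildFetchLists(List<String> urls, int numPartitions) {
        Map<Integer, List<String>> lists = new TreeMap<>();
        for (String url : urls) {
            int partition = partitionByHost(url, numPartitions);
            lists.computeIfAbsent(partition, p -> new ArrayList<>()).add(url);
        }
        return lists;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "http://example.com/a",
            "http://example.com/b",
            "http://example.org/x");
        // The two example.com URLs always appear in the same list.
        buildFetchLists(urls, 4).forEach((p, l) -> System.out.println(p + " -> " + l));
    }
}
```

If the hash were taken over the full URL instead of the host, the two 
example.com URLs could end up on different machines, and the per-host delay 
could no longer be enforced in one place.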





[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-02-27 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217159#comment-13217159
 ] 

Lewis John McGibbney commented on NUTCH-1289:
-

Markus, what is your opinion as to which suits best? Or is it the case in 
Nutchgora that Dan's patch is more appropriate?





[jira] [Commented] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-02-27 Thread Mathijs Homminga (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217172#comment-13217172
 ] 

Mathijs Homminga commented on NUTCH-1289:
-

Nice catch. The PartitionUrlByHost seems broken indeed.
I would suggest that we use the existing o.a.n.crawl.URLPartitioner class which 
has support for three URL partition modes (host, domain, IP) and which is used 
by the GeneratorJob too.

Pros: support for different partition modes in the Fetcher + no duplicate code.
Or is there a reason why the Fetcher has its own partition logic?

The URLPartitioner class is a Partitioner<SelectorEntry, WebPage> instead of a 
Partitioner<IntWritable, FetchEntry>, but you can perhaps extract a method and 
use it from both classes, or create one URLPartitioner with two specific inner 
classes for the Generator and Fetcher.
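
A rough sketch of the second option, purely for illustration: one class holds 
the shared partition logic, and two thin inner classes adapt it to the 
different key/value types. To keep the example free of Hadoop dependencies, 
SimplePartitioner is a local stand-in with the same method shape as Hadoop's 
Partitioner, and plain String/Integer types stand in for SelectorEntry, 
WebPage, IntWritable and FetchEntry. All names here are hypothetical, and the 
domain extraction is a naive simplification that ignores multi-part public 
suffixes and IP-literal hosts.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Locale;

// Illustrative sketch only; names and types are stand-ins, not Nutch's API.
class UrlPartitionSketch {

    enum Mode { HOST, DOMAIN, IP }

    // Stand-in for the Hadoop Partitioner contract (same method shape).
    interface SimplePartitioner<K, V> {
        int getPartition(K key, V value, int numPartitions);
    }

    // Shared logic used by both adapters below.
    static int partition(String url, Mode mode, int numPartitions) {
        String host = URI.create(url).getHost();
        String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
        if (host != null && mode == Mode.DOMAIN) {
            key = naiveDomain(key);
        } else if (host != null && mode == Mode.IP) {
            try {
                key = InetAddress.getByName(key).getHostAddress();
            } catch (UnknownHostException e) {
                // Resolution failed: keep partitioning by host name.
            }
        }
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Simplification: keep the last two labels ("a.b.example.com" -> "example.com").
    static String naiveDomain(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        return (n <= 2) ? host : labels[n - 2] + "." + labels[n - 1];
    }

    // Generator side: the URL is carried by the key (as with SelectorEntry).
    static class GeneratorPartitioner implements SimplePartitioner<String, Object> {
        private final Mode mode;
        GeneratorPartitioner(Mode mode) { this.mode = mode; }
        @Override
        public int getPartition(String urlKey, Object value, int numPartitions) {
            return partition(urlKey, mode, numPartitions);
        }
    }

    // Fetcher side: the key is just a number, the URL is carried by the value
    // (as with IntWritable and FetchEntry).
    static class FetcherPartitioner implements SimplePartitioner<Integer, String> {
        private final Mode mode;
        FetcherPartitioner(Mode mode) { this.mode = mode; }
        @Override
        public int getPartition(Integer key, String urlValue, int numPartitions) {
            return partition(urlValue, mode, numPartitions);
        }
    }
}
```

Because both adapters delegate to the same method, the Generator and the 
Fetcher cannot drift apart in how they assign a URL to a partition.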

