[
https://issues.apache.org/jira/browse/NUTCH-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508445 ]
Doğacan Güney commented on NUTCH-289:
---
It seems this issue has kind of died down, but this would be a great
[
http://issues.apache.org/jira/browse/NUTCH-289?page=comments#action_12450315 ]
Uros Gruber commented on NUTCH-289:
---
One question. Why does IP need to be in CrawlDatum and not in metadata?
CrawlDatum should store IP address
[
http://issues.apache.org/jira/browse/NUTCH-289?page=comments#action_12413996 ]
Andrzej Bialecki commented on NUTCH-289:
---
Re: lookup in ParseOutputFormat: I respectfully disagree. Consider the scenario
when you run Fetcher in non-parsing mode.
[
http://issues.apache.org/jira/browse/NUTCH-289?page=comments#action_12414114 ]
Doug Cutting commented on NUTCH-289:
It should be possible to partition by IP and limit fetchlists by IP. Resolving
only in the fetcher is too late to implement these
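
The idea of partitioning by IP and limiting fetchlists by IP can be sketched roughly as follows. This is only an illustration under stated assumptions, not Nutch's generator code: `partition_for`, `build_fetchlists`, `host_to_ip`, and `max_per_ip` are hypothetical names, and the `host_to_ip` map stands in for an IP address that would already be recorded with each entry before fetchlists are generated.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import zlib


def partition_for(ip: str, num_partitions: int) -> int:
    """Map an IP string to a partition index using a stable hash (CRC32)."""
    return zlib.crc32(ip.encode("ascii")) % num_partitions


def build_fetchlists(urls, host_to_ip, num_partitions, max_per_ip):
    """Group URLs into per-partition fetchlists, capping the URLs per IP.

    host_to_ip is a hypothetical stand-in for an IP stored with each entry
    ahead of time. URLs whose host has no recorded IP are skipped here.
    """
    per_ip_count = defaultdict(int)
    fetchlists = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname
        ip = host_to_ip.get(host)
        if ip is None:
            continue  # no IP known yet, so it cannot be partitioned by IP
        if per_ip_count[ip] >= max_per_ip:
            continue  # this IP already has its share of the fetchlist
        per_ip_count[ip] += 1
        # All hosts behind one IP land in the same partition.
        fetchlists[partition_for(ip, num_partitions)].append(url)
    return dict(fetchlists)
```

Because both the partition and the cap are keyed on the IP, two virtual hosts served from one address share a partition and a quota, which is only possible if the address is known before the fetcher runs.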
[
http://issues.apache.org/jira/browse/NUTCH-289?page=comments#action_12413939 ]
Matt Kangas commented on NUTCH-289:
---
+1 to saving the IP address in CrawlDatum, wherever the value comes from
(Fetcher or otherwise).
[
http://issues.apache.org/jira/browse/NUTCH-289?page=comments#action_12413940 ]
Stefan Groschupf commented on NUTCH-289:
+1
Andrzej, I agree that looking up the IP in ParseOutputFormat would be best, as
Doug suggested.
The biggest problem Nutch has
[
http://issues.apache.org/jira/browse/NUTCH-289?page=comments#action_12413604 ]
Andrzej Bialecki commented on NUTCH-289:
---
I'm not sure how to address round-robin DNS with your approach ...
Also, I think the best place to resolve and record the