[
https://issues.apache.org/jira/browse/NUTCH-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898828#action_12898828
]
Doğacan Güney commented on NUTCH-888:
-
+1
> Remove parse-rss
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898827#action_12898827
]
Julien Nioche commented on NUTCH-887:
-
Have created https://issues.apache.org/jira/brows
Remove parse-rss
Key: NUTCH-888
URL: https://issues.apache.org/jira/browse/NUTCH-888
Project: Nutch
Issue Type: Task
Components: parser
Affects Versions: 2.0
Reporter: Julien Nioche
Assi
It's probably more an issue with DNS resolution than robots.txt. Even if you
respect the robots.txt instructions you can still have N host or even domain
names pointing to a single server. This can be avoided in Nutch by setting
'partition.url.mode' and 'fetcher.queue.mode' to 'byIP'.
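
A minimal sketch of that setting, assuming the overrides go in conf/nutch-site.xml. The file location is an assumption; the property names and the 'byIP' value are the ones given in the message above. Check the property descriptions in nutch-default.xml for your Nutch version before relying on it.

```xml
<!-- Assumed location: conf/nutch-site.xml (site-level overrides) -->
<configuration>
  <!-- Partition URLs across fetch lists by resolved IP address -->
  <property>
    <name>partition.url.mode</name>
    <value>byIP</value>
  </property>
  <!-- Group fetcher queues by resolved IP address rather than host name -->
  <property>
    <name>fetcher.queue.mode</name>
    <value>byIP</value>
  </property>
</configuration>
```

With both values set to 'byIP', host or domain names that resolve to the same server should end up in the same partition and queue, so the politeness limits apply per server rather than per name.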
On 16 Augus
Rather amusing :)
Something similar was what gave Grub a bit of a bad reputation...
Thank god we have the robots.txt file.
On Sat, Aug 14, 2010 at 7:48 PM, Mattmann, Chris A (388J)
wrote:
> LOL...
>
>
> On 8/14/10 8:57 AM, "Ken Krugler" wrote:
>
> Dear @80legs stop crushing metafilter.com fr