[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474068 ]
Dennis Kubes commented on NUTCH-247:
------------------------------------
I agree, but should we then make the check a configurable option? For
example, have an http.agent.name.check configuration option that defaults to
true, since the default setup is assumed to be for HTTP crawlers, but that can
be set to false to turn off the check? That way it could support setups where
the administrator doesn't care about having an agent name, such as intranet
crawling, or where it is not applicable, such as crawling the local file system
or Windows shares in the future.
If we go with this approach, should we check both agent name and advertised
agent or only agent name?
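
To make the proposal concrete, here is a minimal sketch of how such a switch
could behave. It is only an illustration under stated assumptions:
http.agent.name.check is the option proposed above and does not exist yet, the
class and method names are made up, and java.util.Properties stands in for the
real Nutch configuration object. The sketch checks only the agent name; whether
the advertised agent should be checked too is the open question above.

```java
import java.util.Properties;

// Hypothetical helper; not part of Nutch.
final class AgentNameCheck {
    // Proposed option from this comment (assumption: does not exist yet).
    static final String CHECK_KEY = "http.agent.name.check";
    static final String AGENT_KEY = "http.agent.name";

    /**
     * Returns true if the configuration may be used for fetching:
     * either the check is switched off, or a non-blank agent name is set.
     */
    static boolean isAgentConfigOk(Properties conf) {
        // The check defaults to true, matching the default HTTP crawling setup.
        boolean check = Boolean.parseBoolean(conf.getProperty(CHECK_KEY, "true"));
        if (!check) {
            return true; // e.g. intranet or local file system crawling
        }
        String agent = conf.getProperty(AGENT_KEY);
        return agent != null && !agent.trim().isEmpty();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isAgentConfigOk(conf)); // no agent name, check on

        conf.setProperty(CHECK_KEY, "false");
        System.out.println(isAgentConfigOk(conf)); // check switched off
    }
}
```

With the check left at its default, a missing agent name is rejected; setting
the option to false lets the same configuration through.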
> robot parser to restrict.
> -------------------------
>
> Key: NUTCH-247
> URL: https://issues.apache.org/jira/browse/NUTCH-247
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.8
> Reporter: Stefan Groschupf
> Assigned To: Dennis Kubes
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: agent-names.patch
>
>
> If the agent name and the robots agents are not properly configured, the robot
> rules parser uses LOG.severe to log the problem, but it also fixes it.
> Later on, the fetcher thread checks for severe errors and stops if there is one.
> RobotRulesParser:
>   if (agents.size() == 0) {
>     agents.add(agentName);
>     LOG.severe("No agents listed in 'http.robots.agents' property!");
>   } else if (!((String)agents.get(0)).equalsIgnoreCase(agentName)) {
>     agents.add(0, agentName);
>     LOG.severe("Agent we advertise (" + agentName
>         + ") not listed first in 'http.robots.agents' property!");
>   }
> Fetcher.FetcherThread:
>   if (LogFormatter.hasLoggedSevere()) // something bad happened
>     break;
> I suggest using warn or something similar instead of severe to log this
> problem.
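
The interaction described in the quoted issue can be sketched in isolation as
follows. This is a simplified illustration, not the Nutch code: SevereTracker
is a hypothetical stand-in for LogFormatter.hasLoggedSevere(), whose
implementation is not shown in this thread, and fetch() is a made-up loop that
mimics the fetcher thread's check using plain java.util.logging.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical demo class; not part of Nutch.
final class SevereFlagDemo {
    // Stand-in for LogFormatter.hasLoggedSevere(): remembers any SEVERE record.
    static final class SevereTracker extends Handler {
        private volatile boolean loggedSevere = false;

        @Override
        public void publish(LogRecord record) {
            if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
                loggedSevere = true;
            }
        }

        @Override public void flush() {}
        @Override public void close() {}

        boolean hasLoggedSevere() { return loggedSevere; }
    }

    /** Logs the misconfiguration at the given level, then "fetches" pages. */
    static int fetch(int pages, Level misconfigLevel) {
        Logger log = Logger.getLogger("demo." + misconfigLevel.getName());
        log.setUseParentHandlers(false); // keep the demo output quiet
        SevereTracker tracker = new SevereTracker();
        log.addHandler(tracker);

        // The parser has already repaired the agent list; it only reports it.
        log.log(misconfigLevel,
            "Agent we advertise not listed first in 'http.robots.agents' property!");

        int fetched = 0;
        for (int i = 0; i < pages; i++) {
            if (tracker.hasLoggedSevere()) { // something bad happened
                break;
            }
            fetched++;
        }
        log.removeHandler(tracker);
        return fetched;
    }

    public static void main(String[] args) {
        System.out.println(fetch(3, Level.SEVERE));
        System.out.println(fetch(3, Level.WARNING));
    }
}
```

In this sketch, logging the already-repaired misconfiguration at SEVERE stops
the loop before any page is fetched, while logging it at WARNING lets the loop
run to completion.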
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers