Petri Lehtinen <pe...@digip.org> added the comment:

> Because of the line break, clicking that link gives "Server error 404".

I don't see a line break, but the comma after the link seems to break it.
Sorry.

> The way I read the grammar, 'records' (which start with an agent
> line) cannot have blank lines and must be separated by blank lines.

Ah, true. But it seems to me that having blank lines elsewhere doesn't break
the parsing. If other robots.txt parser implementations allow arbitrary blank
lines, we could add a strict=False parameter to make the parser non-strict.
This would be a new feature, of course.
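
Roughly what I have in mind, as an untested sketch (the
LenientRobotFileParser name and the strict keyword are only placeholders;
a real patch would rather adjust the state machine inside
RobotFileParser.parse(), this just illustrates the intended behaviour,
using the Python 3 module name):

    import urllib.robotparser

    class LenientRobotFileParser(urllib.robotparser.RobotFileParser):
        """Sketch: with strict=False, blank lines are dropped before
        parsing instead of acting as record separators."""

        def __init__(self, url='', strict=True):
            super().__init__(url)
            self.strict = strict

        def parse(self, lines):
            if not self.strict:
                # Non-strict mode: ignore blank lines entirely, so a stray
                # blank line inside a record is not treated as a record
                # boundary.  Records should still be separated correctly,
                # since the base parser starts a new record at each
                # User-agent line anyway.
                lines = [line for line in lines if line.strip()]
            super().parse(lines)

With strict=True (the default) nothing changes, so existing behaviour is
preserved.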

Does the parser currently handle blank lines between full records (agentline(s) 
+ ruleline(s)) correctly?
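
FWIW, a quick way to check that is to feed the lines to the parser directly
so no network access is needed (bot names and example.com URLs are made up,
"figtree" is borrowed from the draft's own example; Python 3 module name
again):

    import urllib.robotparser

    lines = [
        "User-agent: figtree",
        "Disallow: /tmp",
        "",
        "User-agent: *",
        "Disallow: /private",
    ]

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(lines)

    # If the blank line between the two records is handled correctly,
    # this should print False, False, True.
    print(rp.can_fetch("figtree", "http://example.com/tmp"))
    print(rp.can_fetch("SomeOtherBot", "http://example.com/private"))
    print(rp.can_fetch("SomeOtherBot", "http://example.com/tmp"))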

> I also do not see "Crawl-delay" and "Sitemap" (from whitehouse.gov) in the 
> grammar referenced above. So I wonder if de facto practice has evolved.

The spec says:

   Lines with Fields not explicitly specified by this specification
   may occur in the /robots.txt, allowing for future extension of the
   format.

So these seem to be nonstandard extensions.
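
If I'm reading the current code right, lines with fields the parser doesn't
know about are simply skipped, which matches that clause. A quick sanity
check (made-up URLs, Python 3 module name):

    import urllib.robotparser

    lines = [
        "User-agent: *",
        "Crawl-delay: 10",      # extension field, not in the quoted grammar
        "Disallow: /private",
        "",
        "Sitemap: http://www.example.com/sitemap.xml",  # also an extension
    ]

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(lines)

    # The unknown fields should just be skipped; the Disallow rule must
    # still apply.  Expected output: False, True.
    print(rp.can_fetch("AnyBot", "http://www.example.com/private"))
    print(rp.can_fetch("AnyBot", "http://www.example.com/index.html"))

Actually exposing the Crawl-delay and Sitemap values would be a separate
feature request.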

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13281>
_______________________________________