Has anyone else thought about how robots.txt will work with
Internationalized Resource Identifiers (IRIs)? Frankly, I find the
IRI spec a bit hard to follow, but it seems like a robots.txt
Disallow line should use the path portion of the IRI after it has
been mapped to a URI (section 3.1 of the spec). Since a Disallow
value can be a path fragment (a prefix of a path, not a "#"
fragment identifier), we should probably require that the fragment
can legally be converted to an IRI. For example, the fragment could
not end in the middle of an escaped UTF-8 multi-byte sequence.
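
To make that concrete, here is a rough Python sketch of the two
pieces, as I read them. The function names are made up for
illustration, the set of characters left unescaped is my own
assumption, and any normalization the IRI draft may call for is
skipped, so treat this as a starting point for discussion rather
than what the spec requires.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# Hypothetical helper: map the path portion of an IRI to a URI path by
# percent-encoding non-ASCII characters as UTF-8 bytes.
def iri_path_to_uri_path(iri_path: str) -> str:
    # '%' is kept in the safe set so existing escapes are not encoded twice.
    # The rest of the safe set is an assumption about which characters
    # a path may carry unescaped.
    return quote(iri_path, safe="/%:@!$&'()*+,;=-._~")

# Hypothetical helper: check that a Disallow path fragment could be
# converted to an IRI, i.e. it does not stop partway through an escape
# or partway through a multi-byte UTF-8 sequence.
def fragment_is_convertible(uri_fragment: str) -> bool:
    # Reject a dangling "%" or "%X" (an escape cut off mid-way).
    if re.search(r"%(?![0-9A-Fa-f]{2})", uri_fragment):
        return False
    try:
        # Undo the percent-escapes, then require well-formed UTF-8.
        unquote_to_bytes(uri_fragment).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

if __name__ == "__main__":
    path = iri_path_to_uri_path("/caf\u00e9/menu")
    print("Disallow:", path)
    print(fragment_is_convertible("/caf%C3%A9"))  # complete sequence
    print(fragment_is_convertible("/caf%C3"))     # cut inside the sequence
```

Under this sketch, a Disallow line of "/caf%C3%A9" would be
accepted, while "/caf%C3" would be rejected because it stops after
the first byte of a two-byte sequence.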

We probably need to come up with some examples, and it
would not hurt to offer them to the IRI spec authors as
an appendix.

The current IRI draft is here:

  <http://www.w3.org/International/iri-edit/draft-duerst-iri.html>

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots