Has anyone else thought about how robots.txt will work with internationalized URIs (IRIs)? Frankly, I find the IRI spec a bit hard to follow, but it seems like a robots.txt Disallow line should use the path portion of the IRI mapped to a URI (section 3.1 of the spec). Since a Disallow value can be a partial path (a prefix of the full path), we should probably require that the partial path can legally be converted back to an IRI. For example, it could not end in the middle of an escaped UTF-8 multi-byte sequence.
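
To make that concrete, here is a rough sketch in Python of what I have in mind. The function names are made up for illustration, and the check is stricter than anything the IRI draft itself requires: it simply rejects a Disallow prefix whose escapes do not decode to complete, valid UTF-8. Treat it as one possible reading of the idea, not a proposed rule.

```python
import re
from urllib.parse import quote, unquote_to_bytes


def iri_path_to_uri_path(iri_path: str) -> str:
    """Map an IRI path to a URI path (hypothetical helper).

    Non-ASCII characters are encoded as UTF-8 and percent-escaped.
    '%' is treated as safe so existing escapes are not escaped twice.
    """
    return quote(iri_path, safe="/%:@!$&'()*+,;=-._~")


def prefix_is_convertible(uri_prefix: str) -> bool:
    """Return True if an escaped Disallow prefix decodes to complete UTF-8.

    Assumption: we want to reject prefixes that stop partway through
    an escape ('%' or '%C') or partway through a multi-byte sequence.
    """
    # A trailing '%' or '%X' means the escape itself was cut in half.
    if re.search(r"%[0-9A-Fa-f]?$", uri_prefix):
        return False
    raw = unquote_to_bytes(uri_prefix)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Covers a multi-byte sequence that was cut short.
        return False
    return True


if __name__ == "__main__":
    print(iri_path_to_uri_path("/caf\u00e9/"))   # path containing e-acute
    print(prefix_is_convertible("/caf%C3%A9"))   # complete sequence
    print(prefix_is_convertible("/caf%C3"))      # cut after the lead byte
```

With this check, a line like "Disallow: /caf%C3" would be flagged, since %C3 is only the first byte of a two-byte UTF-8 sequence, while "Disallow: /caf%C3%A9" would pass.
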
We probably need to come up with some examples, and it would not hurt to offer them to the IRI spec folk as an appendix. The current IRI draft is here: <http://www.w3.org/International/iri-edit/draft-duerst-iri.html>

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek
