On Thu, Aug 04, 2005 at 02:16:58PM +0200, Paweł Pałucha wrote:
>
> > > Should an argument for xmlParseFile (for
> > > example) already be escaped?
> >
> > I would expect the argument to be passed unescaped. Escaping should
> > normally be a no-op on an already escaped string. The escaping of the
> > ':' after the protocol is completely bogus; I don't know how it happened,
> > but it's completely wrong. I think this needs to be reexamined for all
> > cases, and escaping should understand URI semantics.
>
> The escaping is done in xmlCanonicPath: building a URI from
> 'http://alpha/a b' fails because xmlParseUri expects an _escaped_ string and
> fails on the space after 'a'. The whole string is then treated as the 'path'
> part of the URI:
>
> (uri.c, line 2268):
> uri->path = (char *) xmlStrdup((const xmlChar *) path);
> ...
> ret = xmlSaveUri(uri);
> ...
> return(ret);
We can't change xmlParseUri; it is clearly defined as parsing per the
RFC 2396 syntax. There are two ways around this:
- make a new entry point allowing unescaped strings to be parsed
- find a way to correctly escape unescaped strings even if they
  use spaces or other chars.
> xmlSaveUri escapes characters that shouldn't be in the 'path' segment,
> including ':', and returns 'http%3A//alpha/a%20b'.
xmlSaveUri is right; the problem is that the "http:" scheme should never
end up in the path part.
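To make the failure mode concrete, here is a minimal C sketch. It assumes a hypothetical escape_as_path() helper that keeps RFC 2396 unreserved characters plus "/;@&=+$," and percent-escapes everything else, including ':'. That rule mirrors the behaviour reported above; it is not libxml2's actual xmlSaveUri code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed path-safe set (hypothetical, mirrors the reported behaviour):
 * RFC 2396 unreserved characters plus "/;@&=+$,". Note ':' is NOT kept. */
static int path_char_ok(unsigned char c)
{
    return c != '\0' && (isalnum(c) || strchr("-_.!~*'()/;@&=+$,", c) != NULL);
}

/* Escapes the whole input as if it were a URI path.
 * Output is truncated (but still NUL-terminated) if outsz is too small. */
static void escape_as_path(const char *in, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t j = 0;

    assert(outsz > 0);
    for (const unsigned char *p = (const unsigned char *) in; *p; p++) {
        if (path_char_ok(*p)) {
            if (j + 1 >= outsz)
                break;
            out[j++] = (char) *p;
        } else {
            /* Unsafe byte: emit %XX with uppercase hex digits. */
            if (j + 3 >= outsz)
                break;
            out[j++] = '%';
            out[j++] = hex[*p >> 4];
            out[j++] = hex[*p & 0x0F];
        }
    }
    out[j] = '\0';
}
```

With that rule, escape_as_path("http://alpha/a b", buf, sizeof buf) leaves http%3A//alpha/a%20b in buf, the same bogus string reported above. The escaper is behaving consistently for a path; the fix has to keep the scheme out of the path before escaping.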
> If escaping should be done by the library (and not by the user calling
> xmlParseFile), perhaps the easiest way to fix it is to modify
> xmlCanonicPath/xmlParseUri.
We can modify xmlCanonicPath, but not xmlParseUri.
We could detect {protocol}:// and then force escaping of the
remaining chars like ' ', '\n', '\r' and '\t' before passing the result
to xmlParseUri.
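A minimal C sketch of that detection step, assuming a hypothetical escape_after_scheme() helper (not an existing libxml2 function). It recognises a leading scheme using the RFC 2396 syntax (a letter followed by letters, digits, '+', '-' or '.') followed by "://", copies that prefix verbatim, and percent-escapes only ' ', '\n', '\r' and '\t' in the rest. Everything else, including ':' and existing %xy sequences, passes through unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of a leading "scheme://" prefix, or 0 if there is none.
 * Scheme syntax per RFC 2396: alpha *( alpha | digit | "+" | "-" | "." ). */
static size_t scheme_prefix_len(const char *s)
{
    size_t i = 0;

    if (!isalpha((unsigned char) s[0]))
        return 0;
    while (isalnum((unsigned char) s[i]) || s[i] == '+' || s[i] == '-' || s[i] == '.')
        i++;
    return strncmp(s + i, "://", 3) == 0 ? i + 3 : 0;
}

/* Hypothetical pre-pass before a strict RFC 2396 parse.
 * Returns 1 on success, 0 if the input has no "scheme://" prefix (the caller
 * would treat it as a plain file path), -1 if out is too small. */
static int escape_after_scheme(const char *in, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t prefix = scheme_prefix_len(in);
    size_t j;

    assert(outsz > 0);
    if (prefix == 0)
        return 0;
    if (prefix >= outsz)
        return -1;
    memcpy(out, in, prefix);   /* keep "http://" exactly as written */
    j = prefix;
    for (const char *p = in + prefix; *p; p++) {
        if (strchr(" \n\r\t", *p) != NULL) {
            if (j + 3 >= outsz)
                return -1;
            out[j++] = '%';
            out[j++] = hex[(unsigned char) *p >> 4];
            out[j++] = hex[(unsigned char) *p & 0x0F];
        } else {
            if (j + 1 >= outsz)
                return -1;
            out[j++] = *p;
        }
    }
    out[j] = '\0';
    return 1;
}
```

For example, "http://alpha/a b" becomes "http://alpha/a%20b", which a strict RFC 2396 parser can accept, while "/tmp/a b" returns 0 and is left for ordinary file-path handling.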
> And escaping an already escaped string is not a no-op because of
> possible recursive escaping of '%' characters.
Yes, it should be: %xy, with x and y being hex digits, should not be
modified by any URI escaping routine; that would be another bug.
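The requirement that %xy survive escaping amounts to idempotence: escaping an already escaped string returns it unchanged. Below is a minimal C sketch of such a routine. The function name uri_escape_idempotent() and the set of characters it keeps (RFC 2396 unreserved plus reserved characters) are assumptions for illustration, not libxml2's API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed "leave alone" set: RFC 2396 unreserved and reserved characters. */
static int keep_char(unsigned char c)
{
    return c != '\0' && (isalnum(c) || strchr("-_.!~*'();/?:@&=+$,", c) != NULL);
}

/* Escapes unsafe bytes, but copies "%xy" (x, y hex digits) unchanged, so a
 * second pass is a no-op. A stray '%' becomes "%25".
 * Returns 0 on success, -1 if out is too small. */
static int uri_escape_idempotent(const char *in, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p = (const unsigned char *) in;
    size_t j = 0;

    assert(outsz > 0);
    while (*p) {
        if (p[0] == '%' && isxdigit(p[1]) && isxdigit(p[2])) {
            /* Already escaped: never re-escape the '%'. */
            if (j + 3 >= outsz)
                return -1;
            memcpy(out + j, p, 3);
            j += 3;
            p += 3;
        } else if (keep_char(*p)) {
            if (j + 1 >= outsz)
                return -1;
            out[j++] = (char) *p++;
        } else {
            if (j + 3 >= outsz)
                return -1;
            out[j++] = '%';
            out[j++] = hex[*p >> 4];
            out[j++] = hex[*p & 0x0F];
            p++;
        }
    }
    out[j] = '\0';
    return 0;
}
```

Run twice over "http://alpha/a b", it yields "http://alpha/a%20b" both times. A stray '%' as in "100% sure" is escaped once to "100%25%20sure" and then left alone, so the recursive escaping of '%' does not occur.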
> I also looked at the XSLT code: in transform.c there's an attempt to build a
> URI from a string and then, if that fails, it is retried with the escaped
> string. Perhaps the same thing could be done in xmlCanonicPath()?
The problem is that the escaping may turn http:// into http%3A// again.
That sounds wrong; we should fix that part once and for all, checking all
the different cases.
Daniel
--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml