On Thu, Aug 04, 2005 at 02:16:58PM +0200, Paweł Pałucha wrote:
> 
> > > Should an argument to xmlParseFile (for example) already be
> > > escaped?
> > 
> >   I would expect the argument to be passed unescaped. Escaping should
> > normally be a no-op on an already escaped string. The escaping of the
> > ':' after the protocol is completely bogus; I dunno how it happened, but
> > it's wrong. I think this needs to be reexamined for all cases, and
> > escaping should understand URI semantics.
> 
> The escaping is done in xmlCanonicPath - building a URI from
> 'http://alpha/a b' fails because xmlParseURI expects an _escaped_ string and
> fails on the space after 'a'. Next, the whole string is treated as the 'path'
> part of the URI:
> 
> (uri.c, line 2268):
> uri->path = (char *) xmlStrdup((const xmlChar *) path);
> ...
> ret = xmlSaveUri(uri);
> ...
> return(ret);

   We can't change xmlParseURI; it is clearly defined as parsing per the
rfc2396 syntax. There are two ways around this:
   - make a new entry point that allows parsing unescaped strings
   - find a way to correctly escape unescaped strings even if they
     use spaces or other chars.

> xmlSaveUri escapes characters that shouldn't be in the 'path' segment,
> including ':', and returns 'http%3A//alpha/a%20b'.

  xmlSaveUri is right; the problem is that the 'http' should never end up in
the path part.

> If escaping should be done by the library (and not by the user calling
> xmlParseFile), perhaps the easiest fix is to modify xmlCanonicPath/xmlParseURI.

  We can play with xmlCanonicPath, not with xmlParseURI.
We could detect {protocol}:// and then force an escaping of the
remaining chars like ' ' '\n' '\r' '\t', and then pass the result to
xmlParseURI.
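
  To make the idea concrete, here is a minimal sketch of that
detect-and-escape step in plain C. It is only an illustration under stated
assumptions: scheme_prefix_len and escape_after_scheme are made-up names,
not libxml2 functions, and the set of escaped characters is just the four
listed above. A real fix inside xmlCanonicPath would likely need to cover
more cases.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2: returns the length of a
 * leading "{scheme}://" prefix, or 0 if the string has none.
 * The scheme grammar follows rfc2396: alpha *( alpha | digit | + | - | . ) */
static size_t scheme_prefix_len(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    if (!isalpha((unsigned char) s[0]))
        return 0;
    while (isalnum((unsigned char) s[i]) || s[i] == '+' ||
           s[i] == '-' || s[i] == '.')
        i++;
    if (strncmp(s + i, "://", 3) != 0)
        return 0;
    return i + 3;
}

/* Hypothetical helper: copy the scheme prefix verbatim, then
 * percent-escape only ' ', '\n', '\r' and '\t' in the remainder.
 * Returns a malloc()ed string the caller must free(), or NULL if there
 * is no scheme prefix or the allocation fails. */
static char *escape_after_scheme(const char *s)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t skip = scheme_prefix_len(s);
    size_t len = strlen(s);
    size_t i;
    char *out, *p;

    if (skip == 0)
        return NULL;
    out = malloc(3 * len + 1);      /* worst case: every char escaped */
    if (out == NULL)
        return NULL;
    memcpy(out, s, skip);           /* "http://" is never escaped */
    p = out + skip;
    for (i = skip; i < len; i++) {
        unsigned char c = (unsigned char) s[i];

        if (c == ' ' || c == '\n' || c == '\r' || c == '\t') {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        } else {
            *p++ = (char) c;
        }
    }
    *p = '\0';
    return out;
}
```

  With this sketch, 'http://alpha/a b' becomes 'http://alpha/a%20b': the
':' stays untouched because the scheme is copied before any escaping
happens, and the result is in the escaped form that xmlParseURI expects.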

> And escaping an already escaped string is not a no-op because of
> possible recursive escaping of '%' characters.

  Yes it should be: %xy, with x and y being hex digits, should not be
modified by any URI escaping routine; that would be another bug.
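
  As a rough illustration of what "no-op on an already escaped string"
means, the sketch below copies an existing %XX triplet through unchanged
and escapes everything else outside a small safe set. The function name
escape_idempotent and the safe set are assumptions made for this example;
this is not the libxml2 escaping code, and the safe set is not the exact
rfc2396 table.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not libxml2 API: percent-escape a string while
 * leaving existing %XX escapes alone, so that applying it twice gives
 * the same result as applying it once.
 * Returns a malloc()ed string the caller must free(), or NULL. */
static char *escape_idempotent(const char *s)
{
    static const char hex[] = "0123456789ABCDEF";
    /* Assumption: an illustrative safe set, chosen for this example. */
    static const char safe[] = "-_.!~*'()/:@&=+$,;?#";
    size_t len, i;
    char *out, *p;

    assert(s != NULL);
    len = strlen(s);
    out = malloc(3 * len + 1);      /* worst case: every char escaped */
    if (out == NULL)
        return NULL;
    p = out;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) s[i];

        if (c == '%' && isxdigit((unsigned char) s[i + 1]) &&
            isxdigit((unsigned char) s[i + 2])) {
            /* Already an escape: copy the three characters as-is. */
            memcpy(p, s + i, 3);
            p += 3;
            i += 2;
        } else if (isalnum(c) || strchr(safe, c) != NULL) {
            *p++ = (char) c;
        } else {
            /* Includes a bare '%' that is not followed by two hex digits. */
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

  The trade-off is that a literal '%' followed by two hex digits in an
unescaped input cannot be told apart from a real escape, so the caller
still has to know which kind of string it is passing.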

> I also looked at the xslt code - in transform.c there's an attempt to build
> a URI from the string and then, when that fails, it is repeated with the
> escaped string. Perhaps the same thing could be done in xmlCanonicPath()?

  The problem is that the escaping may turn http:// into http%3A// again,
which sounds wrong. We should fix that part once and for all, checking all
the different cases.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
