On Wed, Apr 25, 2007 at 03:40:04PM +0100, Richard W.M. Jones wrote:
> Daniel Veillard wrote:
> >On Wed, Apr 25, 2007 at 11:19:36AM +0100, Richard W.M. Jones wrote:
> >>OK, so I'll rework to integrate this into the normal parsing and saving
> >>of URIs and put the results in the URI structure. (Is that right?)
> >
> > yes.
> >
> >>uri->query really must be deprecated though!
>
> There's a real problem with this ...
>
> When the URI's query string is parsed, xmlParseURIQuery unescapes the
> query string. Unfortunately, this means that
> application/x-www-form-urlencoded data cannot be decoded as per RFC 2396.
> Allow me to explain further ...
>
> Consider this test program:
>
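> #include <stdio.h>
> #include <stdlib.h>
> #include <libxml/uri.h>
>
> int
> main (void)
> {
>   const char *str = "/?field1=%26&field2=%26";
>   xmlURIPtr uri;
>
>   uri = xmlParseURI (str);
>   if (uri == NULL) { printf ("xmlParseURI returned NULL\n"); exit (1); }
>
>   /* uri->query holds the query string after it has been unescaped */
>   printf ("query = %s\n", uri->query ? uri->query : "(null)");
>
>   xmlFreeURI (uri);
>   return 0;
> }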
>
> This prints:
>
> $ ./test
> query = field1=&&field2=&
[...]
> So we can certainly proceed with parsing into pairs _if_ we either
> assume that we'll always do application/x-www-form-urlencoded encoding,
> and that the charset of the strings that come out is whatever charset
> the higher layers are expecting (they should know).
>
> Or can we add some extra flags/fields into xmlURIPtr so that the
> encoding at least can be fed into xmlParseURIReference?
>
> Or should we just add uri->query_raw and "deprecate" (ie. tell people to
> use with caution) uri->query?
Okay, so we need uri->query_raw to be added, fine.

W.r.t. encoding: if the URI comes from the application, there is no way we
can guess it; if the URI comes from an XML chunk (e.g. an attribute value),
then it should be UTF-8. Anyway, we don't need to interpret characters
outside of ASCII at that level (and if the encoding of that string is not
compatible with the ASCII range, all bets are off anyway). So I don't think
we need to do anything here: encoding-wise, we don't need to understand the
string, except that if the upper bit of a byte is 0 we must assume it is
the ASCII value.

So fine by me to add a query_raw field, with an explanation in the
structure (since it's public). And since we have to take the risk of
growing the xmlURI structure anyway, let's also add the interpreted array
of query values, if there are any.
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml