On Wed, Apr 25, 2007 at 11:19:36AM +0100, Richard W.M. Jones wrote:
> Daniel Veillard wrote:
> >>The current uri->query field is always unescaped during parsing.  I have 
> >>changed it so that it is always stored in its raw form.  This is because 
> >>otherwise it's impossible to parse query strings such as 
> >>file:///tmp/test.html?test=%26&second=%26, which can be generated by web 
> >>browsers.  If anyone was relying on the current semantics, then it seems 
> >>to me that they cannot parse such query strings correctly.
> >
> >  Aside from the number of new APIs added there, that's my main
> >issue with the patch. You are changing the default behaviour of a
> >functionality that has been exposed more or less forever.
> >  I guess I would really prefer an approach which hooked into the 
> >URI parsing itself and filled in an extra list of values (or rather
> >an array of xmlChar *, alternating names and values) in the xmlURI
> >structure at the end. That would allow us to keep the uri->query data
> >as it was, and still provide the functionality you suggest, based
> >on a preparsed xmlURIPtr. This would also avoid adding an extra list
> >type.
> 
> OK, so I'll rework to integrate this into the normal parsing and saving 
> of URIs and put the results in the URI structure.  (Is that right?)

  yes.
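
To make the %26 problem concrete, here is a minimal sketch in plain C of
splitting the raw query on '&' and '=' first and unescaping each piece
afterwards. The names query_param, unescape_n, parse_raw_query and
free_params are hypothetical and are not libxml2 APIs. If the whole string
were unescaped first, test=%26&second=%26 would become test=&&second=& and
the two values could no longer be recovered.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name/value pair; not part of libxml2. */
typedef struct {
    char *name;
    char *value;
} query_param;

/* Value of one hex digit, or -1 if c is not a hex digit. */
int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Percent-decode the first len bytes of s into a newly allocated string. */
char *unescape_n(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    size_t i, o = 0;

    if (out == NULL) return NULL;
    for (i = 0; i < len; i++) {
        int hi = -1, lo = -1;
        if (s[i] == '%' && i + 2 < len) {
            hi = hex_val((unsigned char) s[i + 1]);
            lo = hex_val((unsigned char) s[i + 2]);
        }
        if (hi >= 0 && lo >= 0) {
            out[o++] = (char) (hi * 16 + lo);
            i += 2;               /* skip the two hex digits */
        } else {
            out[o++] = s[i];      /* ordinary byte or malformed escape */
        }
    }
    out[o] = '\0';
    return out;
}

/* Release the first n entries filled in by parse_raw_query. */
void free_params(query_param *params, int n)
{
    while (n-- > 0) {
        free(params[n].name);
        free(params[n].value);
    }
}

/* Split the raw (still escaped) query, then unescape each name and value.
 * Returns the number of parameters stored, or -1 on allocation failure. */
int parse_raw_query(const char *raw, query_param *params, int max)
{
    int n = 0;

    assert(raw != NULL && params != NULL);
    while (*raw != '\0' && n < max) {
        size_t field_len = strcspn(raw, "&");
        const char *eq = memchr(raw, '=', field_len);
        size_t name_len = eq ? (size_t) (eq - raw) : field_len;

        if (field_len > 0) {
            params[n].name = unescape_n(raw, name_len);
            params[n].value = eq
                ? unescape_n(eq + 1, field_len - name_len - 1)
                : unescape_n(raw, 0);          /* no '=': empty value */
            if (params[n].name == NULL || params[n].value == NULL) {
                free_params(params, n + 1);
                return -1;
            }
            n++;
        }
        raw += field_len;
        if (*raw == '&') raw++;
    }
    return n;
}
```

Calling parse_raw_query("test=%26&second=%26", params, 4) under these
assumptions yields two parameters, each with the one-character value "&".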

> uri->query really must be deprecated though!

  there is basically no deprecation possible in libxml2, and I have
no plans so far for a libxml3, so ...

> >I'm not sure about the ignore flag in that list; what is it 
> >used for?
> 
> So this is really useful in the situation that I'm actually using this 
> for: I want to parse the URI, remove some of the parameters (basically 
> the ones which my code understands) and leave the rest of the parameters 
> in place for another piece of code down the line to use.
> 
> Now, removing a parameter from a linked list is annoyingly complicated, 
> but setting a flag to say "ignore this parameter - I've seen it" is a 
> lot easier.  On the other hand, if the complexity is hidden inside a 
> uri.c function then that doesn't matter (so long as it works :-)

  okay.
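
As an illustration of the "ignore" idea discussed above, the following is a
rough sketch, assuming a hypothetical singly linked list of parameters
(query_node, query_take and query_remaining are invented names, not libxml2
APIs). A consumer marks the parameters it understands instead of unlinking
them, and later code only looks at the entries that are still unmarked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node for one parsed query parameter. */
typedef struct query_node {
    const char *name;
    const char *value;
    int ignore;                 /* non-zero once a consumer has handled it */
    struct query_node *next;
} query_node;

/* Return the value of the first unclaimed parameter called name and mark
 * it as seen. Returns NULL if there is no such unclaimed parameter. */
const char *query_take(query_node *head, const char *name)
{
    query_node *cur;

    assert(name != NULL);
    for (cur = head; cur != NULL; cur = cur->next) {
        if (!cur->ignore && strcmp(cur->name, name) == 0) {
            cur->ignore = 1;    /* flag it; the list itself is untouched */
            return cur->value;
        }
    }
    return NULL;
}

/* Count the parameters left for code further down the line. */
size_t query_remaining(const query_node *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        if (!head->ignore)
            n++;
    return n;
}
```

The list links never change, so no unlinking or freeing logic is needed at
the call site; the cost is that every later walk has to test the flag.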

> >  - those simplified APIs would work immediately with the Python generator
> >    which would not find any char **, which it can't handle automatically.
> 
> So the use of char ** to return values is there because I couldn't see a 
> good way to return an error indication.
> 
> As an example, if xmlURIQueryGetSingle were defined as:
> 
>   char *xmlURIQueryGetSingle (xmlURIPtr uri, const char *name);
> 
> then returning NULL might mean either (1) there is no field with that 
> name, or (2) there was an error, eg. in memory allocation.

  No memory allocation error should occur at that stage, you will return
a const char * coming back from within the xmlURIPtr array. The life time
of that value will be the same as the xmlURI, which is IMHO a fine way
to do things.
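
A minimal sketch of that lookup style, assuming the parameters were already
parsed into an array owned by the URI structure. The parsed_uri and
uri_param types and the uri_query_get_single function are hypothetical
stand-ins, not the real xmlURI layout or an existing libxml2 call. The
getter allocates nothing, so NULL can only mean "no such field", and the
returned pointer stays valid for as long as the structure does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-parsed parameter stored inside the URI structure. */
typedef struct {
    const char *name;
    const char *value;
} uri_param;

/* Hypothetical stand-in for a URI structure carrying a parameter array. */
typedef struct {
    const uri_param *params;
    size_t nparams;
} parsed_uri;

/* Return the value of the first parameter called name, or NULL if absent.
 * The result points into uri and must not be freed by the caller. */
const char *uri_query_get_single(const parsed_uri *uri, const char *name)
{
    size_t i;

    if (uri == NULL || name == NULL)
        return NULL;
    for (i = 0; i < uri->nparams; i++) {
        if (strcmp(uri->params[i].name, name) == 0)
            return uri->params[i].value;
    }
    return NULL;
}
```

Because the result is a plain const char * rather than an output argument
of type char **, a signature like this is also the kind the Python
generator mentioned above should be able to wrap.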

> About the use of char vs xmlChar: I couldn't really see which one was 
> correct.  I understand that xmlChar is unsigned because of some bogosity 
> in the XML spec, so xmlChar is used for characters in XML documents. 
> URIs are different though, so which should I be using?

  xmlChar is unsigned because of the bogosity of C strings, which have
no associated encoding (and no, the current locale is not a decent answer
for XML processing). xmlChar * means a UTF-8 encoded string. char * means
"we don't know the encoding", basically. See the URI vs. IRI disaster
(sorry I don't have the IRI RFC number offhand). I assume we should stick
to char * (or rather const char *) for all of those APIs.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
