On Tue, Nov 22, 2005 at 01:39:21PM -0500, Stefan Seefeld wrote:
> >>I have been suggesting a new C++ API (wrapper around libxml2) on boost.org
> >>(http://boost.org/) and a big part of the discussion is precisely about how
> >>best to do that conversion (see 
> >>http://lists.boost.org/Archives/boost/2005/10/96129.php)
> >
> >
> >  Don't do conversion and keep UTF-8; converting back and forth all the time,
> >substring by substring, is gonna be quite costly, probably more than the cost
> >of parsing the data.
> 
> We agree that this is the best option. However, users may not be in control
> of all application layers, and so at some point a conversion may be required.
> Ideally some conversion mechanism can be provided that only allocates / copies
> data if absolutely necessary, and passes utf-8 strings through, if possible.

  My reply was precisely about the boost framework. Keep the APIs UTF-8
if you provide wrappers for libxml2, otherwise you may force a lot of
unnecessary conversions.
  I tried to explain why UTF-8 was the one making the most sense.
  http://xmlsoft.org/encoding.html#internal
I would also add in retrospect that 99% of the instances you see around
use markup names in the ASCII range, and hence the APIs using markup names
are usually the cheapest possible. At the instance level, converting the
full document once while streaming is less costly than converting back and
forth all tags, attributes and namespaces.
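
As an illustration of the pass-through idea discussed above, here is a
minimal C++ sketch, not code from libxml2 or Boost. The helper names
is_ascii and utf8_to_latin1 are hypothetical, the target encoding
(ISO-8859-1) is only an example, and the input is assumed to be well-formed
UTF-8. ASCII-only text, which covers most markup names, is returned as-is
with no allocation or copy; only text with non-ASCII bytes is converted into
a caller-supplied buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: true when every byte is below 0x80 (plain ASCII).
// Such text is byte-identical in UTF-8 and ISO-8859-1.
bool is_ascii(std::string_view s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}

// Hypothetical conversion for a caller that needs ISO-8859-1.
// Assumes `in` is well-formed UTF-8.
// ASCII input: returns `in` itself, `scratch` is left untouched.
// Other input: converts into `scratch` and returns a view of it.
// Code points above U+00FF cannot be represented and become '?'.
std::string_view utf8_to_latin1(std::string_view in, std::string& scratch) {
    if (is_ascii(in)) return in;  // pass-through, nothing allocated

    scratch.clear();
    for (std::size_t i = 0; i < in.size();) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // Single-byte (ASCII) character: copy unchanged.
            scratch.push_back(static_cast<char>(c));
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            // Two-byte sequence encoding U+0080..U+00FF.
            unsigned char d = static_cast<unsigned char>(in[i + 1]);
            scratch.push_back(static_cast<char>(((c & 0x03) << 6) | (d & 0x3F)));
            i += 2;
        } else {
            // Anything else is outside ISO-8859-1: substitute and skip
            // the lead byte plus its continuation bytes.
            scratch.push_back('?');
            i += 1;
            while (i < in.size() &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return scratch;
}
```

With this shape, a name such as "xsl:template" comes back as a view onto the
original bytes, while "caf\xC3\xA9" is converted into the scratch buffer.
The returned view is only valid as long as both the input and the scratch
buffer are alive and unmodified.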

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
