Re: [xml] New user, evaluating XML libraries

Todd Ditchendorf Mon, 25 Dec 2006 13:51:40 -0800

Will, FWIW, here's my 2 cents.

A few months back, I started a project using both Xerces-C++ and libxml (http://xmlnanny.com).

Initially, I was more attracted to Xerces (probably because it was C++/OO, which I preferred), so I used Xerces wherever I could in my project (basic XML parsing, DTD and XSD validation), and supplemented with libxml for features Xerces didn't have (RNG, pull parsing).

Although Xerces is a wonderful toolkit, after having used both, I'm a much bigger fan of libxml. I've done several xml projects since then, and chosen libxml exclusively for each. I'm currently planning a rewrite of my original project and will be using libxml exclusively and removing Xerces.

Some things I prefer about libxml that I couldn't get Xerces to do (maybe it was my fault, don't know for sure):

1. libxml allows you to dynamically specify a DTD for validation without having to hard code the Doctype in the source doc. AFAICT, you can't do this with XercesC.
2. dynamically specifying an XSD against which to validate a source document seems MUCH, MUCH easier with libxml.
3. I prefer libxml's error messages... they seem more complete and make more sense to humans (JMHO).
4. libxml has XmlReader (pull-parsing)
5. libxml has RNG support
6. libxml has XPath support
7. libxml has XInclude support (AFAICT, XercesC doesn't)
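
As a rough illustration of what pull parsing and XPath-style queries (items 4 and 6) look like in practice, here is a minimal sketch. It uses Python's standard library (xml.dom.pulldom and xml.etree.ElementTree) as a stand-in, not libxml's own XmlReader or XPath API, and the sample document and the record/id names are made up for the example.

```python
from xml.dom import pulldom
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
SAMPLE = (
    "<records>"
    "<record id='1'>scan</record>"
    "<record id='2'>report</record>"
    "</records>"
)

def record_ids(xml_text):
    """Pull-style parsing: the caller asks for events one at a time."""
    ids = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT and node.tagName == "record":
            ids.append(node.getAttribute("id"))
    return ids

def record_text(xml_text, record_id):
    """XPath-style lookup (ElementTree supports only a subset of XPath)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(f".//record[@id='{record_id}']")]

if __name__ == "__main__":
    print(record_ids(SAMPLE))
    print(record_text(SAMPLE, "2"))
```

In the pull loop the calling code drives the parser and can stop as soon as it has what it needs, whereas the query version loads the whole tree first and then selects nodes from it.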

To be fair, I found the quality of documentation of the two products to be about equal.

I don't understand anyone choosing XercesC because it has DOM support... everyone knows DOM is a pretty crap API, and libxml has its own tree api that can do the exact same things, more or less. Who cares if it's technically 'DOM' or not? I don't get it.


I hope you give libxml a fair shot... screw the DOM ;)


Todd Ditchendorf

Scandalous Software - Cocoa Developer Tools
http://scan.dalo.us


On Dec 19, 2006, at 9:23 AM, Will Sappington wrote:

Hello to all,
I’m a new user of libxml and new to XML in general. I’ve been asked to evaluate XML libraries, preferably Open Source projects, for some things we want to do with XML in our products. We provide an archival/retrieval system for medical records and images and we use XML for attaching metadata to the files we store. We have some front-end UI components that make some use of XML but currently most of the work is done in the transport layer and the backend database components. Due to the volume of data involved, efficiency and execution speed are a prime concern, though not necessarily an overriding one. Most of the XML work being done now is with roll-your-own string processing. Going forward we will need to be more sophisticated and standards-compliant.
Of the packages that turned up when I did a search, Xerces and libxml are the leading candidates. I’ve downloaded, installed, built, and written test code for both and based on my findings, I’m leaning very heavily toward recommending libxml. The person I report to has a very strong bias toward Xerces in general, and the W3C DOM standard in particular, as the hammer with which to pound all nails, even if the problem isn’t a nail. I’ve also received feedback from some of the users in the Xerces group and they make some points that I should at least consider. What I’d like to do is present my reasons for recommending libxml, given the job we need to do as described above, include some of the Xerces users’ comments, and hopefully get your thoughts as well. I like libxml because:
- It’s fast, about 3x faster than Xerces in some fairly rudimentary tests.
- It supports XPath (one of our big requirements) on its own; Xerces requires a bolt-on component like Xalan or Pathan to do XPath.
- Being written in C, it has a much simpler programming interface than Xerces’ C++ object model. Nothing against C++, it’s my primary language and I like it, but the interface to Xerces is more complicated, perhaps unnecessarily so, than most of the C++ I’ve been exposed to. To me, a simpler interface translates to better understanding by a wider range of programmers, faster up-and-running time, and potentially better, safer code.
- It’s better documented. In addition to the API reference manual, there’s the let-me-walk-you-through-it tutorial, well documented sample code, and many pages of additional information on a variety of topics. The information presented in all areas is more thorough. Xerces has the Doxygen-generated ref. man., a Programming Guide (equivalent to the tutorial, but sparse by comparison), and some commented sample code.
- (I may be mistaken about this, but…) for character encodings libxml uses a standard library (iconv) that is distributed with most versions of Linux and Unix (and has been ported to Win32); Xerces uses its own internal routines (?).
- In addition to a DOM-like interface and SAX support, libxml has the XMLTextReader interface, which I haven’t tried yet, but I’m assuming is a fast, efficient way to do simple XML queries. Xerces has only DOM and SAX.
I’ve likened the use of big packages like Xerces for some of the things we need to “using a blowtorch to light a cigarette”. Here is one response from a Xerces user:
“Libxml is a great library with somewhat different goals than Xerces. I don't think it's explicitly stated on the Web site, but Xerces and other projects that build on it tend to implement W3C standards (DOM, XML Schema), while libxml implements what its maintainer prefers (a unique API, RelaxNG), with a focus on efficiency. Both approaches are reasonable, and which is appropriate depends on your needs.

In your shoes, if I were certain that lighting a cigarette is all I would ever need to do, I'd probably use libxml. In my experience, though, XML is useful for so many things that I'd probably want to be prepared to bake, boil, weld, and power fighter jets as well - in a variety of local languages. I'm a nut for portability, and a DOM interface has the advantage of being similar or identical in a wide range of environments (C++, C#, JavaScript, etc).”
What about this? Is Xerces that much more powerful, as the writer suggests? Is portability the only advantage to W3C-compliant interfaces like DOM?
And then this:



“In cases where performance is critical, I think you'd be best off avoiding XPath altogether. (snip) An optimal Xerces SAX parser might well be more efficient than libxml parsing + XPath evaluation.”



Finally:



“One big difference between Xerces-C++ and Libxml2 is that the latter does not have a functional XML Schema validator. I don't know if it is important to you or not. Also note that much of the speed-up of Libxml2 compared to Xerces-C++ comes from the fact that Xerces-C++ uses 2-byte characters (UTF-16) while Libxml2 uses 1-byte characters (UTF-8). Since most performance tests that I am aware of are done on XML files that are either ASCII or UTF-8, Libxml2 has a natural advantage here. This is also something to consider depending on the type of applications you are planning to build.”
I’m unsure of the importance of an XML Schema validator so I can’t comment on this. I don’t think I agree with the comment about speed vis-a-vis UTF-8/16. Encoding conversions using UTF-8 are more computationally intensive than UTF-16, so what you lose by moving around double the number of bytes would, I think, be offset by the greater CPU requirement for translating the data. Does Xerces’ use of UTF-16 provide support for a wider range of encodings and local languages?
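
The byte-count half of this trade-off is easy to check. The sketch below, in Python and with made-up sample strings, compares encoded sizes only; it says nothing about conversion or parsing cost in either library.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Hypothetical samples: ASCII-only markup and two CJK characters.
ASCII_XML = '<record id="42"><name>chest x-ray</name></record>'
CJK_TEXT = "\u691c\u67fb"

if __name__ == "__main__":
    print(encoded_sizes(ASCII_XML))
    print(encoded_sizes(CJK_TEXT))
```

For ASCII-only markup, UTF-16 takes exactly twice as many bytes as UTF-8. For the two CJK characters the relationship flips: 6 bytes in UTF-8 against 4 in UTF-16. Both encodings can represent every Unicode character, so the difference is in size and processing, not in language coverage.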
I know this is rather long and I apologize in advance if it is too much so, but obviously there’s a lot to be considered, this is a hefty decision, and I want to provide anybody who might be inclined to help with as much to go on as possible. Thanks in advance for any responses,
-will

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml

