Hi Will,

See comments below.

"Will Sappington" <[EMAIL PROTECTED]> writes:

> We provide an
> archival/retrieval system for medical records and images and we use XML
> for attaching metadata to the files we store. We have some front-end UI
> components that make some use of XML but currently most of the work is
> done in the transport layer and the backend database components. Due to
> the volume of data involved, efficiency and execution speed is a prime
> concern, though not necessarily an overriding one. Most of the XML work
> being done now is with roll-your-own string processing. Going forward
> we will need to be more sophisticated and standards-compliant.
>
> Of the packages that turned up when I did a search, Xerces and libxml
> are the leading candidates. I've downloaded, installed, built, and
> written test code for both and based on my findings, I'm leaning very
> heavily toward recommending libxml. The person I report to has a very
> strong bias toward Xerces in general, and the W3C DOM standard in
> particular, as the hammer with which to pound all nails, even if the
> problem isn't a nail.

If the Xerces guy wins ;-), you may want to consider using data binding
on top of Xerces, which will hide all (or most) of the details of dealing
with XML. From the description above it appears that your application is
data-centric (as opposed to document-centric), so the XML data binding
approach should work nicely for you. One such data binding tool is
CodeSynthesis XSD (full disclosure: I am involved with the project). It
is open-source and supports a wide range of platforms and compilers:

http://www.codesynthesis.com/products/xsd/

> * (I may be mistaken about this, but...) for character encodings
> libxml uses a standard library (iconv) that is distributed with most
> versions of Linux and Unix (and has been ported to Win32), Xerces uses
> its own internal routines (?).

Yes, you are mistaken here. Xerces-C++ has built-in support for a small
set of essential encodings (UTF-8/16, UCS-4, etc.).
It can also be built to use external libraries for encoding. The
supported external libraries are Iconv and ICU.

> And then this:
>
> > "In cases where performance is critical, I think you'd be best off
> > avoiding XPath altogether. (snip) An optimal Xerces SAX parser might
> > well be more efficient than libxml parsing + XPath evaluation."

XPath is slow because it is an interpreted language. It is generally more
efficient to hand-code critical queries in a compiled language such as C
or C++. XML data binding has a big advantage here, since you can
implement your queries using the standard C++ algorithms, which lets you
maintain both sanity and speed.

> I'm unsure of the importance of an XML Schema validator so I can't
> comment on this. I don't think I agree with the comment about speed vis
> a vis UTF-8/16. Encoding conversions using UTF-8 are more
> computationally intensive than UTF-16 so what you lose by moving around
> double the number bytes would, I think be offset by the greater CPU
> requirement for translating the data. Does Xerces' use of UTF-16
> provide support for a wider range of encodings and local languages?

The speedup comes from the simple fact that when your XML instance is
UTF-8-encoded (as most XML instances are these days) and your parser
uses UTF-8 internally, you do not need to convert from one encoding to
the other. You can just use the strings as is. On the other hand, if
your parser uses UTF-16, you will need to convert every character in the
XML document from UTF-8 to UTF-16.
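
To make the conversion cost concrete, here is a minimal sketch of the
per-character work a UTF-16-based parser has to do on UTF-8 input. The
function name utf8_to_utf16 is made up for illustration; it is not the
Xerces-C++ or libxml API, and it assumes well-formed input (it does no
validation at all):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, for illustration only: transcode well-formed
// UTF-8 into UTF-16. Malformed input is NOT detected or rejected.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    out.reserve(in.size()); // upper bound on the number of UTF-16 units

    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;  // decoded code point
        std::size_t len;   // length of this UTF-8 sequence in bytes

        // The lead byte tells us how many bytes the sequence has.
        if (b < 0x80)             { cp = b;        len = 1; }
        else if ((b >> 5) == 0x6) { cp = b & 0x1F; len = 2; }
        else if ((b >> 4) == 0xE) { cp = b & 0x0F; len = 3; }
        else                      { cp = b & 0x07; len = 4; }

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A parser that keeps UTF-8 internally can skip this loop for UTF-8 input
and hand the bytes through unchanged. A real transcoder also has to
validate its input, so the actual cost is somewhat higher than this
sketch suggests.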

If you are interested in XML parsing performance, you may want to read
the "XSDBench XML Schema Benchmark 1.0.0 released" thread on the
xmlschema-dev mailing list:

http://lists.w3.org/Archives/Public/xmlschema-dev/2006Oct/

Particularly this message:

http://lists.w3.org/Archives/Public/xmlschema-dev/2006Oct/0061.html

hth,
-boris

--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
