Hi,
Hopefully someone can answer a couple of fairly simple :) questions:
1) Using the SAX2 API to process lots of small documents, I'm not really getting the performance I'd like. I looked at the documentation for setting features on the parser to minimise the work done, but the descriptions of each seem a little vague. Can anyone recommend the best settings for these options to avoid as much validation as possible, i.e. to get optimum performance? What effect does each one have? I'm currently doing:
m_xerces_parser->setFeature(XMLUni::fgSAX2CoreValidation, false);
m_xerces_parser->setFeature(XMLUni::fgSAX2CoreNameSpaces, true);
m_xerces_parser->setFeature(XMLUni::fgXercesSchema, false);
I tried switching off CoreNameSpaces, since I resolve and validate namespaces at the application level, but I didn't get the behaviour I was expecting. It seems to stop startPrefixMapping() from being called for namespace declarations and just swallows them. I was expecting that, with this flag off, the namespace declarations would simply be treated as regular attributes and be available in the Attributes list passed to the startElement() callback. Is this not the case?
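For the application-level approach, here is a minimal sketch of how xmlns declarations could be picked out of an element's attributes and kept in a per-element scope stack. It assumes the declarations do arrive as ordinary attributes in startElement(); in SAX2 that is governed by the standard namespace-prefixes feature (XMLUni::fgSAX2CoreNameSpacePrefixes in Xerces-C), so that flag may be worth checking for your version. AttrList and NamespaceScopes are hypothetical names, written in plain C++ rather than against the Xerces Attributes interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SAX attribute list: (qualified name, value) pairs.
using AttrList = std::vector<std::pair<std::string, std::string>>;

// Stack of prefix -> namespace URI bindings, one frame per open element.
class NamespaceScopes {
public:
    // Call from startElement(): records any xmlns / xmlns:p declarations
    // among the element's attributes in a new innermost frame.
    void push(const AttrList& attrs) {
        std::map<std::string, std::string> frame;
        for (const auto& [name, value] : attrs) {
            if (name == "xmlns") {
                frame[""] = value;              // default namespace
            } else if (name.rfind("xmlns:", 0) == 0) {
                frame[name.substr(6)] = value;  // prefixed declaration
            }
        }
        frames_.push_back(std::move(frame));
    }

    // Call from endElement(): drops the bindings of the element being closed.
    void pop() {
        assert(!frames_.empty() && "pop() without matching push()");
        frames_.pop_back();
    }

    // Resolves a prefix ("" means the default namespace), searching from the
    // innermost frame outwards. Returns an empty string if it is unbound.
    std::string resolve(const std::string& prefix) const {
        if (prefix == "xml") return "http://www.w3.org/XML/1998/namespace";
        for (auto it = frames_.rbegin(); it != frames_.rend(); ++it) {
            auto found = it->find(prefix);
            if (found != it->end()) return found->second;
        }
        return "";
    }

private:
    std::vector<std::map<std::string, std::string>> frames_;
};
```

startElement() would call push() with the element's attributes before resolving any prefixed names, and endElement() would call pop().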
On a related note, is there any performance difference between using the SAX (1?) parser and the SAX2 parser?
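Questions like SAX vs SAX2, or which feature settings help, can be answered roughly by measuring. The sketch below times a callable over many iterations with std::chrono::steady_clock. The work passed in is assumed to be a hypothetical "parse one small document" function built with whichever parser, feature settings or library build is being compared.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs `work` `iterations` times and returns the mean wall-clock time per
// call in microseconds. `work` stands in for a hypothetical "parse one small
// document" function using the configuration under test.
template <typename Work>
double meanMicros(Work&& work, std::size_t iterations) {
    assert(iterations > 0 && "need at least one iteration");
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Running the same set of documents through each configuration and comparing the means gives a first-order answer; an untimed warm-up pass first keeps one-off initialisation out of the numbers.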
2) We recently switched to using ICU. Although I haven't benchmarked it at all, I'm a bit nervous about its impact on performance. Does anyone know if there is much of a difference?
If there is, it would seem to make sense, rather than having either an all-ICU or an all-non-ICU library, to build the ICU library so that it can be used in either mode. This would allow the transcoding method to be chosen at runtime based on user requirements. In some cases users of an application that utilises Xerces would care about the extra support offered by ICU, but in others they would prefer to sacrifice it for performance. Any thoughts?
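The runtime switch suggested here amounts to putting the transcoder behind an interface and picking an implementation when the application starts. A minimal sketch under that assumption: Transcoder, TranscodeMode and makeTranscoder are hypothetical names, and the two concrete classes (a Latin-1 decoder and a small UTF-8 decoder) are plain C++ stand-ins, not Xerces or ICU code. In a real build the "full" branch would presumably wrap ICU.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical interface: decodes a byte string into UTF-32 code points.
class Transcoder {
public:
    virtual ~Transcoder() = default;
    virtual std::u32string decode(const std::string& bytes) const = 0;
};

// Cheap path: ISO-8859-1, where every byte maps directly to one code point.
class Latin1Transcoder : public Transcoder {
public:
    std::u32string decode(const std::string& bytes) const override {
        std::u32string out;
        out.reserve(bytes.size());
        for (unsigned char c : bytes) out.push_back(static_cast<char32_t>(c));
        return out;
    }
};

// Stand-in for a fuller back end: a minimal UTF-8 decoder. It checks lead and
// continuation byte structure only; overlong forms, surrogates and values
// above U+10FFFF are not rejected.
class Utf8Transcoder : public Transcoder {
public:
    std::u32string decode(const std::string& bytes) const override {
        std::u32string out;
        std::size_t i = 0;
        while (i < bytes.size()) {
            unsigned char lead = bytes[i];
            std::size_t len = 0;
            if (lead < 0x80) len = 1;
            else if ((lead >> 5) == 0x06) len = 2;
            else if ((lead >> 4) == 0x0E) len = 3;
            else if ((lead >> 3) == 0x1E) len = 4;
            if (len == 0 || i + len > bytes.size())
                throw std::runtime_error("malformed UTF-8");
            // Keep only the payload bits of the lead byte.
            char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
            for (std::size_t k = 1; k < len; ++k) {
                unsigned char cont = bytes[i + k];
                if ((cont & 0xC0) != 0x80) throw std::runtime_error("malformed UTF-8");
                cp = (cp << 6) | (cont & 0x3F);
            }
            out.push_back(cp);
            i += len;
        }
        return out;
    }
};

enum class TranscodeMode { Fast, Full };

// Chooses the back end at runtime, e.g. from a user or config setting.
std::unique_ptr<Transcoder> makeTranscoder(TranscodeMode mode) {
    if (mode == TranscodeMode::Fast) return std::make_unique<Latin1Transcoder>();
    return std::make_unique<Utf8Transcoder>();
}
```

The cost of the choice is one virtual call per decode, which is usually small next to the per-character work when whole buffers are decoded at once.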
Thanks and regards,
Simon
