Andy Heninger wrote:
>> About 6% of the time is taken in the UTF8 transcoder (this on a file
>>with no characters with code points higher than 127). Seems like
>>that could be whittled down a bit.
>I've tried to get this one down already, and couldn't find anything more
>to do to that loop. I tried several different forms without getting any
>further improvement. The percentage looks even bigger with SAX.
Profiling the thread test with SAX showed UTF8 transcoding taking about 10% of
the time. I was able to cut that down to 2.2% in a profile build, and saw Win32
release performance go up by around 3%, by adding a preliminary loop that copies
until it runs into the first multibyte sequence or exhausts either the source or
destination buffer.
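As a standalone illustration of that preliminary loop, here is a minimal sketch outside of the transcoder class. The function name copyAsciiPrefix and the use of char16_t in place of XMLCh are assumptions made so the example compiles on its own; they are not part of Xerces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Xerces: copies the leading run of ASCII
// bytes from src into dst, widening each one, and returns how many were
// copied. It stops at the first byte >= 0x80 or when either buffer runs out.
std::size_t copyAsciiPrefix(const unsigned char* src, std::size_t srcCount,
                            char16_t* dst, std::size_t maxChars)
{
    // Never read past the source or write past the destination.
    const std::size_t limit = (srcCount < maxChars) ? srcCount : maxChars;

    std::size_t i = 0;
    while (i < limit && src[i] < 0x80)
    {
        dst[i] = static_cast<char16_t>(src[i]);
        ++i;
    }
    return i;
}
```

The caller would then hand the remaining bytes, starting at src plus the returned count, to the general multibyte loop.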
I was originally trying to do something more elaborate (casting to long*'s and
AND'ing with 0x80808080 to check four bytes at a time), but this seems to do the
trick just fine without adding any dependencies on the length of longs.
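For comparison, the four-bytes-at-a-time idea can be sketched without depending on the size of long by copying into a fixed-width integer. This is only an illustration of the technique, not code from the patch; the function name asciiPrefixLength is hypothetical, and whether it beats the simple byte loop would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the number of leading bytes in src that are
// plain ASCII (high bit clear).
std::size_t asciiPrefixLength(const unsigned char* src, std::size_t count)
{
    std::size_t i = 0;

    // Test four bytes per iteration. Using memcpy into a uint32_t avoids
    // alignment problems and any assumption about sizeof(long).
    while (count - i >= 4)
    {
        std::uint32_t word;
        std::memcpy(&word, src + i, sizeof word);
        if (word & 0x80808080u)   // some byte in this group has its high bit set
            break;
        i += 4;
    }

    // Finish the tail, or locate the exact non-ASCII byte, one byte at a time.
    while (i < count && src[i] < 0x80)
        ++i;

    return i;
}
```

Because the mask is the same in every byte position, the check does not depend on byte order.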
unsigned int
XMLUTF8Transcoder::transcodeFrom(const XMLByte* const srcData
                                , const unsigned int srcCount
                                , XMLCh* const toFill
                                , const unsigned int maxChars
                                , unsigned int& bytesEaten
                                , unsigned char* const charSizes)
{
    // Watch for pathological scenario. Shouldn't happen, but...
    if (!srcCount || !maxChars)
        return 0;

    // If debugging, make sure that the block size is legal
    #if defined(XERCES_DEBUG)
    checkBlockSize(maxChars);
    #endif

    //
    // Get pointers to our start and end points of the input and output
    // buffers.
    //
    const XMLByte* srcPtr = srcData;
    const XMLByte* srcEnd = srcPtr + srcCount;
    XMLCh* outPtr = toFill;
    XMLCh* outEnd = outPtr + maxChars;
    unsigned char* sizePtr = charSizes;
+    //
+    // Copy characters until the first multibyte sequence or
+    // exhaustion of the source or destination buffers.
+    //
+    unsigned int bytesToEat = srcCount;
+    if (srcCount > maxChars) {
+        bytesToEat = maxChars;
+    }
+    for (unsigned int i = 0; i < bytesToEat && *srcPtr < 128; i++) {
+        *outPtr++ = *srcPtr++;
+        *sizePtr++ = 1;    // keep charSizes in step: one source byte per ASCII char
+    }
    //
    // We now loop until we either run out of input data, or room to store
    // output chars.
    //
    while ((srcPtr < srcEnd) && (outPtr < outEnd))
    {
        // Get the next leading byte out
        const XMLByte firstByte = *srcPtr;

        // Special-case ASCII, which is a leading byte value of <= 127
        if (firstByte <= 127)