Andy Heninger wrote:
>> About 6% of the time is taken in the UTF8 transcoder (this on a file
>>with no characters with code points higher than 127).  Seems like
>>that could be whittled down a bit.

>I've tried to get this one down already, and couldn't find anything more
>to do to that loop.  I tried several different forms without getting any
>further improvement.  The percentage looks even bigger with SAX.

Doing a profile of thread test using SAX showed UTF8 transcoding taking about 10% of
the time.  I was able to cut that down to 2.2% in a profile build, and saw Win32
release performance go up by around 3%, by adding a preliminary loop that copies
characters until it runs into the first multi-byte sequence or exhausts either the
source or destination buffer.

I originally was trying to do something more elaborate (casting to long*'s and AND'ing
with 0x80808080 to check four bytes at a time), but this seems to do the trick just
fine without adding any dependencies on the length of longs.
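
For reference, the four-bytes-at-a-time check could look roughly like the sketch below. This is not part of the patch: asciiPrefixLength is a hypothetical helper name, and it assumes a fixed-width std::uint32_t plus std::memcpy in place of the long* cast, which avoids depending on the length of longs or on buffer alignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Xerces): returns how many leading bytes
// of [src, src + count) are plain ASCII, i.e. have a value below 0x80.
std::size_t asciiPrefixLength(const unsigned char* src, std::size_t count)
{
    std::size_t i = 0;

    // Test four bytes at a time. If any of the four bytes has its high bit
    // set, the AND with 0x80808080 is non-zero.
    while (count - i >= 4)
    {
        std::uint32_t word;
        // memcpy sidesteps the alignment and aliasing problems of a pointer cast.
        std::memcpy(&word, src + i, sizeof word);
        if (word & 0x80808080u)
            break;
        i += 4;
    }

    // Handle the tail, or locate the exact non-ASCII byte, one byte at a time.
    while (i < count && src[i] < 0x80)
        ++i;

    return i;
}
```

The caller would still have to clamp count to the room left in the output buffer, as the patch below does with bytesToEat. Whether this beats the simple byte loop would need to be measured.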


unsigned int
XMLUTF8Transcoder::transcodeFrom(const  XMLByte* const          srcData
                                , const unsigned int            srcCount
                                ,       XMLCh* const            toFill
                                , const unsigned int            maxChars
                                ,       unsigned int&           bytesEaten
                                ,       unsigned char* const    charSizes)
{
    // Watch for pathological scenario. Shouldn't happen, but...
    if (!srcCount || !maxChars)
        return 0;

    // If debugging, make sure that the block size is legal
    #if defined(XERCES_DEBUG)
    checkBlockSize(maxChars);
    #endif

    //
    //  Get pointers to our start and end points of the input and output
    //  buffers.
    //
    const XMLByte*  srcPtr = srcData;
    const XMLByte*  srcEnd = srcPtr + srcCount;
    XMLCh*          outPtr = toFill;
    XMLCh*          outEnd = outPtr + maxChars;
    unsigned char*  sizePtr = charSizes;

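+    //
+    //  Copy characters until the first multibyte sequence or
+    //  exhaustion of the source or destination buffers.
+    //
+    unsigned int bytesToEat = srcCount;
+    if (srcCount > maxChars)
+        bytesToEat = maxChars;
+    for (unsigned int i = 0; i < bytesToEat && *srcPtr < 128; i++)
+    {
+        *outPtr++ = XMLCh(*srcPtr++);
+        // Assumed to be needed: the main loop is expected to record one
+        // source byte per ASCII char in charSizes, so do the same here.
+        *sizePtr++ = 1;
+    }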

    //
    //  We now loop until we either run out of input data, or room to store
    //  output chars.
    //
    while ((srcPtr < srcEnd) && (outPtr < outEnd))
    {
        // Get the next leading byte out
        const XMLByte firstByte = *srcPtr;

        // Special-case ASCII, which is a leading byte value of <= 127
        if (firstByte <= 127)
