[ 
http://nagoya.apache.org/jira/browse/XERCESC-1319?page=comments#action_57100 ]
     
Alex R. Herbstritt commented on XERCESC-1319:
---------------------------------------------

Alberto,
Thanks for looking at this so quickly. Here are my responses to your numbered 
items. (Not in order)

1) This was a copy and paste error. Sorry about that. It is indeed the 
ICULCPTranscoder::transcode(const XMLCh* const toTranscode) method where the 
bug is. I am including the whole function at the end of my comment because your 
comment 2) has me confused.

4) I am including an xml file that reproduces the bug. In particular the 
character data Mikšíková causes the over run.

2) My aim is not to fix the U_STRING_NOT_TERMINATED_WARNING but to ensure that 
there is no buffer overrun.
With the Mikšíková case, targetLen is 11, so targetLen + 1 is 12, that is what 
ucnv_fromUChars gets.
After it does the conversion it returns targetCap of 12 and 
U_STRING_NOT_TERMINATED_WARNING.
This is all fine.
But then retBuf[targetCap] = 0 places the null outside of the buffer, i.e. 
retBuf[12] = 0, but retBuf is only 12 bytes.
You mention that there is a test for U_STRING_NOT_TERMINATED_WARNING in the 
function. The function is included below, and there is no test for the return. 
Unless I am missing something. When I run it through the debugger, "if 
(U_FAILURE(err))" does not catch it.

3) In reference to the other bug. I don't see how it is possible for the 
character data to not be null termed since this is the last thing that the 
function does. Even if there is an error it will return null. That seems very 
strange. I have looked at the function in a number of versions of CVS and they 
all look the same. The version that I am looking at is the newest one in CVS: 
"Current tag: Xerces-C_1_6_0"

char* ICULCPTranscoder::transcode(const XMLCh* const toTranscode)
{
    char* retBuf = 0;

    // Check for a couple of special cases
    if (!toTranscode)
        return retBuf;

    if (!*toTranscode)
    {
        retBuf = new char[1];
        retBuf[0] = 0;
        return retBuf;
    }

    //
    //  Get the length of the source string since we'll have to use it in
    //  a couple places below.
    //
    const unsigned int srcLen = XMLString::stringLen(toTranscode);

    //
    //  If XMLCh and UChar are not the same size, then we have to make a
    //  temp copy of the text to pass to ICU.
    //
    const UChar* actualSrc;
    UChar* ncActual = 0;
    if (sizeof(XMLCh) == sizeof(UChar))
    {
        actualSrc = (const UChar*)toTranscode;
    }
     else
    {
        // Allocate a non-const temp buf, but store it also in the actual
        ncActual = convertToUChar(toTranscode);
        actualSrc = ncActual;
    }

    // Insure that the temp buffer, if any, gets cleaned up via the nc pointer
    ArrayJanitor<UChar> janTmp(ncActual);

    // Caculate a return buffer size not too big, but less likely to overflow
    int32_t targetLen = (int32_t)(srcLen * 1.25);

    // Allocate the return buffer
    retBuf = new char[targetLen + 1];

    //
    //  Lock now while we call the converter. Use a faux block to do the
    //  lock so that it unlocks immediately afterwards.
    //
    UErrorCode err = U_ZERO_ERROR;
    int32_t targetCap;
    {
        XMLMutexLock lockConverter(&fMutex);

        targetCap = ucnv_fromUChars
        (
            fConverter
            , retBuf
            , targetLen + 1
            , actualSrc
            , -1
            , &err
        );
    }

    // If targetLen is not enough then buffer overflow might occur
    if (err == U_BUFFER_OVERFLOW_ERROR)
    {
        //
        //  Reset the error, delete the old buffer, allocate a new one,
        //  and try again.
        //
        err = U_ZERO_ERROR;
        delete [] retBuf;
        retBuf = new char[targetCap + 1];

        // Lock again before we retry
        XMLMutexLock lockConverter(&fMutex);
        targetCap = ucnv_fromUChars
        (
            fConverter
            , retBuf
            , targetCap
            , actualSrc
            , -1
            , &err
        );
    }

    if (U_FAILURE(err))
    {
        delete [] retBuf;
        return 0;
    }

    // Cap it off and return
    retBuf[targetCap] = 0;
    return retBuf;
}


> Buffer overflow in ICULCPTranscoder::transcode
> ----------------------------------------------
>
>          Key: XERCESC-1319
>          URL: http://nagoya.apache.org/jira/browse/XERCESC-1319
>      Project: Xerces-C++
>         Type: Bug
>   Components: Utilities
>  Environment: All Platforms
>     Reporter: Alex R. Herbstritt
>  Attachments: saxbug01cz.xml
>
> I have found a bug in the transcoder code when transcoding from UTF-16 to 
> UTF-8. We use Xerces against an in house library so I cannot include the code 
> that reproduces the bug. But the bug has been reproduced on Windows and 
> HPUX32. Instead I will give the details of the bug - along with the fix.
> The bug is a buffer over run that happens in a very special case. The fix for 
> it is very simple. I find it hard to believe that nobody has seen this bug 
> before.
> The problem is located in the file
> xercesc/util/Transcoders/ICU/ICUTranService.cpp
> in the method
> XMLCh* ICULCPTranscoder::transcode(const char* const toTranscode)
> with the function call ucnv_fromUChars:
> targetCap = ucnv_fromUChars
>         (
>             fConverter
>             , retBuf
>             , targetLen + 1
>             , actualSrc
>             , -1
>             , &err
>         );
> This is the function that is doing the actual conversion. The problem is with 
> the "targetLen + 1" - this should be replaced with "targetLen". (Note that 
> the call that follows has "targetCap", not "targetCap + 1".)
> The problem is that ucnv_fromUChars can fill the buffer up, including the 
> space held for the null term. That is, targetCap is returned equaling 
> targetLen+1, along with a U_STRING_NOT_TERMINATED_WARNING. This is all fine, 
> until the end of the method where,
>     // Cap it off and return
>     retBuf[targetCap] = 0;
>     return retBuf;
> will place the null term outside of the buffer. That is, we should never let 
> targetCap be larger than targetLen. (The buffer overflow will only happen 
> when targetCap==targetLen+1.)
> Replacing "targetLen + 1" with "targetLen" results in a 
> U_BUFFER_OVERFLOW_ERROR. This is correct, because in the overflow case the 
> problem is that the new string created is one byte longer than the buffer 
> that was allocated. So we want the error to cause a new buffer to be 
> allocated.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://nagoya.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to