[ http://nagoya.apache.org/jira/browse/XERCESC-1319?page=comments#action_57100 ] Alex R. Herbstritt commented on XERCESC-1319: ---------------------------------------------
Alberto, Thanks for looking at this so quickly. Here are my responses to your numbered items. (Not in order) 1) This was a copy and paste error. Sorry about that. It is indeed the ICULCPTranscoder::transcode(const XMLCh* const toTranscode) method where the bug is. I am including the whole function at the end of my comment because your comment 2) has me confused. 4) I am including an xml file that reproduces the bug. In particular the character data Mikšíková causes the over run. 2) My aim is not to fix the U_STRING_NOT_TERMINATED_WARNING but to ensure that there is no buffer overrun. With the Mikšíková case, targetLen is 11, so targetLen + 1 is 12, that is what ucnv_fromUChars gets. After it does the conversion it returns targetCap of 12 and U_STRING_NOT_TERMINATED_WARNING. This is all fine. But then retBuf[targetCap] = 0 places the null outside of the buffer, i.e. retBuf[12] = 0, but retBuf is only 12 bytes. You mention that there is a test for U_STRING_NOT_TERMINATED_WARNING in the function. The function is included below, and there is no test for the return. Unless I am missing something. When I run it through the debugger, "if (U_FAILURE(err))" does not catch it. 3) In reference to the other bug. I don't see how it is possible for the character data to not be null termed since this is the last thing that the function does. Even if there is an error it will return null. That seems very strange. I have looked at the function in a number of versions of CVS and they all look the same. The version that I am looking at is the newest one in CVS: "Current tag: Xerces-C_1_6_0" char* ICULCPTranscoder::transcode(const XMLCh* const toTranscode) { char* retBuf = 0; // Check for a couple of special cases if (!toTranscode) return retBuf; if (!*toTranscode) { retBuf = new char[1]; retBuf[0] = 0; return retBuf; } // // Get the length of the source string since we'll have to use it in // a couple places below. // const unsigned int srcLen = XMLString::stringLen(toTranscode); // // If XMLCh and UChar are not the same size, then we have to make a // temp copy of the text to pass to ICU. // const UChar* actualSrc; UChar* ncActual = 0; if (sizeof(XMLCh) == sizeof(UChar)) { actualSrc = (const UChar*)toTranscode; } else { // Allocate a non-const temp buf, but store it also in the actual ncActual = convertToUChar(toTranscode); actualSrc = ncActual; } // Insure that the temp buffer, if any, gets cleaned up via the nc pointer ArrayJanitor<UChar> janTmp(ncActual); // Caculate a return buffer size not too big, but less likely to overflow int32_t targetLen = (int32_t)(srcLen * 1.25); // Allocate the return buffer retBuf = new char[targetLen + 1]; // // Lock now while we call the converter. Use a faux block to do the // lock so that it unlocks immediately afterwards. // UErrorCode err = U_ZERO_ERROR; int32_t targetCap; { XMLMutexLock lockConverter(&fMutex); targetCap = ucnv_fromUChars ( fConverter , retBuf , targetLen + 1 , actualSrc , -1 , &err ); } // If targetLen is not enough then buffer overflow might occur if (err == U_BUFFER_OVERFLOW_ERROR) { // // Reset the error, delete the old buffer, allocate a new one, // and try again. // err = U_ZERO_ERROR; delete [] retBuf; retBuf = new char[targetCap + 1]; // Lock again before we retry XMLMutexLock lockConverter(&fMutex); targetCap = ucnv_fromUChars ( fConverter , retBuf , targetCap , actualSrc , -1 , &err ); } if (U_FAILURE(err)) { delete [] retBuf; return 0; } // Cap it off and return retBuf[targetCap] = 0; return retBuf; } > Buffer overflow in ICULCPTranscoder::transcode > ---------------------------------------------- > > Key: XERCESC-1319 > URL: http://nagoya.apache.org/jira/browse/XERCESC-1319 > Project: Xerces-C++ > Type: Bug > Components: Utilities > Environment: All Platforms > Reporter: Alex R. Herbstritt > Attachments: saxbug01cz.xml > > I have found a bug in the transcoder code when transcoding from UTF-16 to > UTF-8. We use Xerces against an in house library so I cannot include the code > that reproduces the bug. But the bug has been reproduced on Windows and > HPUX32. Instead I will give the details of the bug - along with the fix. > The bug is a buffer over run that happens in a very special case. The fix for > it is very simple. I find it hard to believe that nobody has seen this bug > before. > The problem is located in the file > xercesc/util/Transcoders/ICU/ICUTranService.cpp > in the method > XMLCh* ICULCPTranscoder::transcode(const char* const toTranscode) > with the function call ucnv_fromUChars: > targetCap = ucnv_fromUChars > ( > fConverter > , retBuf > , targetLen + 1 > , actualSrc > , -1 > , &err > ); > This is the function that is doing the actual conversion. The problem is with > the "targetLen + 1" - this should be replaced with "targetLen". (Note that > the call that follows has "targetCap", not "targetCap + 1".) > The problem is that ucnv_fromUChars can fill the buffer up, including the > space held for the null term. That is, targetCap is returned equaling > targetLen+1, along with a U_STRING_NOT_TERMINATED_WARNING. This is all fine, > until the end of the method where, > // Cap it off and return > retBuf[targetCap] = 0; > return retBuf; > will place the null term outside of the buffer. That is, we should never let > targetCap be larger than targetLen. (The buffer overflow will only happen > when targetCap==targetLen+1.) > Replacing "targetLen + 1" with "targetLen" results in a > U_BUFFER_OVERFLOW_ERROR. This is correct, because in the overflow case the > problem is that the new string created is one byte longer than the buffer > that was allocated. So we want the error to cause a new buffer to be > allocated. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://nagoya.apache.org/jira/secure/Administrators.jspa - If you want more information on JIRA, or have a bug to report see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]