[jira] [Commented] (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.
[ https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052589#comment-13052589 ] Ben Griffin commented on XERCESC-1947:
--
That certainly fixed the problem that I saw. I (for one) am happy that this is resolved. Thanks very much,

XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.

Key: XERCESC-1947
URL: https://issues.apache.org/jira/browse/XERCESC-1947
Project: Xerces-C++
Issue Type: Bug
Components: Utilities
Affects Versions: 3.1.0, 3.1.1
Environment: Tested on Mac OS and Debian Linux. The failure only manifests on v3.1.x.
Reporter: Ben Griffin
Assignee: Alberto Massari
Priority: Minor
Fix For: 3.2.0
Attachments: TransService.cpp.patch, TransService.patch, transtest.cpp

This can be demonstrated with the following lines of code:

    const XMLCh uval[] = { 0x254B, 0x0000 }; // BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL (needs 3 bytes in UTF-8)
    char* uc = (char*)TranscodeToStr(uval, "UTF-8").adopt(); // faulty exception thrown here on 3.1.x
    cout << uc << endl << flush;
    XMLString::release(&uc);

The error is:

    terminate called after throwing an instance of 'xercesc_3_1::TranscodingException'

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org
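To see why the reproduction trips over U+254B, it helps to look at UTF-8's length rules: code points from U+0800 to U+FFFF take three bytes, while the same character occupies a single 16-bit XMLCh (2 bytes) in UTF-16. The following is a minimal standalone sketch, not Xerces code; `utf8Length` and `encodeUtf8` are hypothetical helper names. It computes the UTF-8 length of a code point and encodes it following the RFC 3629 bit layout.

```cpp
#include <cassert>
#include <string>

// Number of bytes UTF-8 uses for one Unicode code point (RFC 3629 ranges).
int utf8Length(char32_t cp) {
    if (cp < 0x80) return 1;     // U+0000..U+007F
    if (cp < 0x800) return 2;    // U+0080..U+07FF
    if (cp < 0x10000) return 3;  // U+0800..U+FFFF (includes U+254B)
    return 4;                    // U+10000..U+10FFFF
}

// Encode one code point as UTF-8. Surrogate code points are not rejected;
// this sketch only checks the upper bound of the Unicode range.
std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF && "code point out of range");
    std::string out;
    switch (utf8Length(cp)) {
    case 1:
        out += static_cast<char>(cp);
        break;
    case 2:
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        break;
    case 3:
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        break;
    default:
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+254B this yields the bytes E2 95 8B: one byte more than the two bytes the character occupies as an XMLCh.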
[jira] Commented: (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.
[ https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007374#comment-13007374 ] Lee Doron commented on XERCESC-1947:

I've attached a patch that addresses this bug and another one I discovered.

First, regarding the issue at hand: it seems to me that an empty string (len == 0) *should* be transcoded, with the result being another zero-terminated empty string. Otherwise the caller carries an undue burden of examining the string before attempting to transcode it. Also, the Throw at line 624 is warranted in case the input XMLCh string is malformed (in my book, that includes having a premature zero before len characters). So I avoid an early exit. Instead, I add enough space to allocSize for the 4 terminating zeroes. This has two beneficial effects: in some cases it avoids a reallocation, and it also guarantees enough space for at least one transcoded UTF-8 character, so we can safely keep the Throw. However, if the input string is empty, we just skip calling transcodeTo().

I applied a similar fix to TranscodeFromStr::transcode(), and that's where I found an entirely different bug. When it needs to reallocate, it calls memcpy(newBuf, fString, fCharsWritten) to copy the existing partial string into the new, larger buffer. However, memcpy() takes a count in bytes, while fCharsWritten is a count of XMLCh! The call should be memcpy(newBuf, fString, fCharsWritten * sizeof(XMLCh)).

I also made a couple of other minor changes to improve readability and to optimize.
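The memcpy() bug Lee describes is the classic mix-up between an element count and a byte count. Below is a minimal sketch of the corrected copy step, assuming a buffer of 16-bit code units standing in for XMLCh. `growBuffer` is a hypothetical helper, not the Xerces function, and it uses `std::unique_ptr` instead of the Xerces memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

using Char16 = char16_t;  // stand-in for XMLCh (assumed to be 16 bits wide)

// Move the `count` characters already written into a larger, zero-filled buffer.
// memcpy() counts bytes, so the character count must be scaled by sizeof(Char16).
std::unique_ptr<Char16[]> growBuffer(const Char16* old, std::size_t count,
                                     std::size_t newCapacity) {
    assert(newCapacity >= count);
    std::unique_ptr<Char16[]> fresh(new Char16[newCapacity]());  // value-initialized to 0
    if (count != 0)
        std::memcpy(fresh.get(), old, count * sizeof(Char16));    // bytes, not characters
    return fresh;
}
```

With the unscaled count, memcpy() would copy only `count` bytes, that is, the first half of the characters written so far, and the rest of the partial string would be lost on reallocation.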
[jira] Commented: (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.
[ https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920195#action_12920195 ] Boris Kolpackov commented on XERCESC-1947:
--
Ben, is this only a problem in TranscodeToStr, or can you also trigger it by parsing/serializing XML? My understanding is that it only affects TranscodeToStr.
[jira] Commented: (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.
[ https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920219#action_12920219 ] Ben Griffin commented on XERCESC-1947:
--
Hi Boris,

I'm pretty sure that any serializer that uses TranscodeToStr::transcode(const XMLCh *in, XMLSize_t len, XMLTranscoder* trans) will have this problem whenever the transcoder's target encoding uses variable-width characters, most especially when the number of bytes needed to transcode a character is greater than the number of bytes it occupies in the source encoding. The problem is most easily exposed by the patch.

Essentially, the failure happens because the output buffer handed to the transcoder is too small to hold even one transcoded character, so it cannot consume ("eat") any input, even though input remains. So when going from UCS-2 to UTF-8, there is no problem until you reach characters that need 3 or more bytes in UTF-8, i.e. characters at or above U+0800. When there is a single character to be transcoded, the initial allocSize is not large enough to hold that one character, so the transcoder returns 0 for 'charsRead'.

This error was exposed to me when querying attributes that were set to single-character values from around U+2500. My code was doing something like:

    DOMAttr* enoda = enod->getAttributeNode(a_name);
    const XMLCh* x_attrval = enoda->getNodeValue();
    if (x_attrval != NULL && x_attrval[0] != 0) {
        std::string attrval;
        char* value = (char*)TranscodeToStr(x_attrval, "UTF-8").adopt();
    }

I am not sure whether the supplied serializer uses TranscodeToStr in that sort of way; you are probably better informed than me about that. Maybe the component that I filed the bug under shouldn't be 'Utilities'?

I'm not sure I understand why you are interested in whether it affects parsing/serializing. It certainly affects being able to use TranscodeToStr::transcode().

I don't believe that the error is in XMLUTF8Transcoder::transcodeTo(), because AFAIK it doesn't have storage for partially consumed characters. I believe that the error is in TranscodeToStr::transcode().
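Ben's point about the initial allocSize can be checked with a little arithmetic. The sketch below assumes, as the thread implies, that the first output buffer is sized at len * sizeof(XMLCh) bytes, and compares that with the number of UTF-8 bytes the input actually needs. `utf8BytesNeeded` and `initialAllocSize` are hypothetical helpers, and `char16_t` stands in for XMLCh.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to encode a UTF-16 string as UTF-8 (terminator excluded).
// A valid surrogate pair (two 16-bit units) becomes one 4-byte sequence.
// Counting an unpaired surrogate as 3 bytes is an assumption of this sketch;
// a real transcoder may reject or replace it instead.
std::size_t utf8BytesNeeded(const char16_t* s, std::size_t len) {
    assert(s != nullptr || len == 0);
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const char16_t u = s[i];
        if (u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;  // high + low surrogate -> one supplementary code point
            ++i;
        } else {
            bytes += 3;  // rest of the BMP, e.g. U+254B
        }
    }
    return bytes;
}

// Assumed 3.1.x sizing: one XMLCh's worth of bytes per input code unit.
std::size_t initialAllocSize(std::size_t len) { return len * sizeof(char16_t); }
```

For a lone U+254B the buffer offers 2 bytes while the character needs 3, so no whole character fits and the transcoder reports zero characters consumed. With two or more input units the buffer holds at least 4 bytes, enough for any first character, which is consistent with the failure appearing only for single characters.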
[jira] Commented: (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.
[ https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12919245#action_12919245 ] Ben Griffin commented on XERCESC-1947:
--
XMLUTF8Transcoder::transcodeTo() names its fifth parameter 'charsEaten' and uses it to return how many characters were successfully transcoded before reaching the end of the output buffer. However, TranscodeToStr::transcode() calls transcodeTo() with the corresponding argument named 'charsRead', and expects a value greater than zero. This is clearly a mistake: a character whose multi-byte encoding does not fit in the remaining buffer will not be eaten, even though it was 'read'.

As I see it, the Throw at line 624 of TransService.cpp is unnecessary. There are obviously cases where no characters are 'read' because there isn't enough memory to hold them yet. TransService should be able to rely on the transcodeTo() method to handle exceptions, rather than being 'surprised' by what it gets back. Therefore, it is not an error when no characters are 'read'; instead, the memory should be increased and the transcoding should continue.

OTOH, it seems reasonable for transcode() to test for a zero-length string before calling transcodeTo().

Just my humble 2¢.
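The fixes proposed in this thread can be combined in a short sketch: reserve room for the 4 terminating zeroes up front (Lee's approach, which also guarantees space for at least one character), grow the buffer and keep going when a pass consumes nothing for lack of space (Ben's suggestion), and return an empty zero-terminated result for empty input. This is not the patch that was committed. `encodeSome` and `transcodeAll` are hypothetical names, the encoder handles only BMP characters (no surrogate pairs), and `std::vector` replaces the Xerces memory manager.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical whole-character UTF-16 -> UTF-8 step (BMP only, no surrogates).
// Writes as many complete characters as fit in maxBytes, reports how many
// input characters were consumed via charsEaten, and returns bytes written.
static std::size_t encodeSome(const char16_t* in, std::size_t len,
                              unsigned char* out, std::size_t maxBytes,
                              std::size_t& charsEaten) {
    std::size_t written = 0;
    for (charsEaten = 0; charsEaten < len; ++charsEaten) {
        const unsigned cp = in[charsEaten];
        const std::size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : 3;
        if (written + need > maxBytes) break;  // stop before a partial character
        if (need == 1) {
            out[written] = static_cast<unsigned char>(cp);
        } else if (need == 2) {
            out[written]     = static_cast<unsigned char>(0xC0 | (cp >> 6));
            out[written + 1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        } else {
            out[written]     = static_cast<unsigned char>(0xE0 | (cp >> 12));
            out[written + 1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            out[written + 2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        }
        written += need;
    }
    return written;
}

// TranscodeToStr-style loop: returns the UTF-8 bytes followed by 4 zero bytes.
std::vector<unsigned char> transcodeAll(const char16_t* in, std::size_t len) {
    assert(in != nullptr || len == 0);
    std::size_t allocSize = len * sizeof(char16_t) + 4;  // +4 for the terminating zeroes
    std::vector<unsigned char> buf(allocSize);
    std::size_t bytesWritten = 0;
    std::size_t charsDone = 0;
    while (charsDone < len) {  // empty input skips transcoding entirely
        std::size_t charsRead = 0;
        bytesWritten += encodeSome(in + charsDone, len - charsDone,
                                   buf.data() + bytesWritten,
                                   allocSize - bytesWritten, charsRead);
        // Zero progress is only an error if a whole character would have fit.
        if (charsRead == 0 && allocSize - bytesWritten >= 4)
            throw std::runtime_error("no progress: bad source sequence");
        charsDone += charsRead;
        if (charsDone < len) {  // output buffer full: double it and continue
            allocSize *= 2;
            buf.resize(allocSize);  // resize() keeps the bytes already written
        }
    }
    buf.resize(bytesWritten + 4);  // trim to the data plus 4 terminator bytes
    std::fill(buf.begin() + bytesWritten, buf.end(), 0);
    return buf;
}
```

With the extra 4 bytes, a lone U+254B fits on the first pass (6 bytes available, 3 needed). The zero-progress check only fires when the free space could have held a whole character and nothing was consumed, which in a full transcoder would point to a bad source sequence rather than a short buffer.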