[jira] [Commented] (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.

2011-06-21 Thread Ben Griffin (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052589#comment-13052589
 ] 

Ben Griffin commented on XERCESC-1947:
--

That certainly fixed the problem that I saw. 
I (for one) am happy that this is resolved. 

Thanks very much,


 XMLUTF8Transcoder::transcodeTo  fails with an exception when transcoding 
 single characters that require 3 or more bytes as UTF8.
 

 Key: XERCESC-1947
 URL: https://issues.apache.org/jira/browse/XERCESC-1947
 Project: Xerces-C++
  Issue Type: Bug
  Components: Utilities
Affects Versions: 3.1.0, 3.1.1
 Environment: Tested on mac os and debian linux. The failure is only 
 manifest on v3.1.x
Reporter: Ben Griffin
Assignee: Alberto Massari
Priority: Minor
 Fix For: 3.2.0

 Attachments: TransService.cpp.patch, TransService.patch, transtest.cpp


 This can be demonstrated with the following 2 lines of code.
   const XMLCh uval [] = { 0x254B, 0x}; //BOX DRAWINGS HEAVY VERTICAL 
 AND HORIZONTAL (needs 3 bytes for utf-8)
   char* uc = (char*)TranscodeToStr(uval,UTF-8).adopt(); cout  uc  
 endl  flush; XMLString::release(uc); //faulty exception;
 The error is: terminate called after throwing an instance of 
 'xercesc_3_1::TranscodingException'

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] Commented: (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.

2011-03-16 Thread Lee Doron (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007374#comment-13007374
 ] 

Lee Doron commented on XERCESC-1947:


I've attached a patch that addresses this bug and another I discovered. First, 
with regard to the issue at hand, it seems to me that an empty string (len == 
0) *should* be transcoded, with the result being another zero-terminated 
empty string. Otherwise the caller has an undue burden to examine the string 
before attempting to transcode it. Also, the Throw at line 624 is warranted, in 
case the input XMLCh string is malformed (in my book, that includes having a 
premature zero before len characters). So, I avoid an early exit. Instead, I 
add enough space to allocSize for the 4 terminating zeroes, which has two 
beneficial effects -- in some cases it avoids a reallocation, and it also 
guarantees enough space for at least one UTF-8 transcoded character, so we can 
safely keep the Throw. However, if the input string is empty, we just skip 
calling transcodeTo().

I applied a similar fix to TranscodeFromStr::transcode(), and that's where I 
found an entirely different bug. When it needs to reallocate, it does a 
memcpy(newBuf, fString, fCharsWritten) to copy the existing partial string to 
the new, larger buffer. However, memcpy() takes a count in units of bytes, 
while fCharsWritten is a count of XMLCh! The call should be memcpy(newBuf, 
fString, fCharsWritten * sizeof(XMLCh)).

I made a couple of other minor changes to improve readability and optimize.

 XMLUTF8Transcoder::transcodeTo  fails with an exception when transcoding 
 single characters that require 3 or more bytes as UTF8.
 

 Key: XERCESC-1947
 URL: https://issues.apache.org/jira/browse/XERCESC-1947
 Project: Xerces-C++
  Issue Type: Bug
  Components: Utilities
Affects Versions: 3.1.0, 3.1.1
 Environment: Tested on mac os and debian linux. The failure is only 
 manifest on v3.1.x
Reporter: Ben Griffin
Priority: Minor
 Fix For: 3.1.2, 3.2.0

 Attachments: TransService.cpp.patch, TransService.patch, transtest.cpp


 This can be demonstrated with the following 2 lines of code.
   const XMLCh uval [] = { 0x254B, 0x}; //BOX DRAWINGS HEAVY VERTICAL 
 AND HORIZONTAL (needs 3 bytes for utf-8)
   char* uc = (char*)TranscodeToStr(uval,UTF-8).adopt(); cout  uc  
 endl  flush; XMLString::release(uc); //faulty exception;
 The error is: terminate called after throwing an instance of 
 'xercesc_3_1::TranscodingException'

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] Commented: (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.

2010-10-12 Thread Boris Kolpackov (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920195#action_12920195
 ] 

Boris Kolpackov commented on XERCESC-1947:
--

Ben, is this only a problem in TranscodeToStr or can you also get this by 
parsing/serializing XML? It is my understanding that this only affects 
TranscodeToStr.

 XMLUTF8Transcoder::transcodeTo  fails with an exception when transcoding 
 single characters that require 3 or more bytes as UTF8.
 

 Key: XERCESC-1947
 URL: https://issues.apache.org/jira/browse/XERCESC-1947
 Project: Xerces-C++
  Issue Type: Bug
  Components: Utilities
Affects Versions: 3.1.0, 3.1.1
 Environment: Tested on mac os and debian linux. The failure is only 
 manifest on v3.1.x
Reporter: Ben Griffin
Priority: Critical
 Attachments: TransService.patch, transtest.cpp


 This can be demonstrated with the following 2 lines of code.
   const XMLCh uval [] = { 0x254B, 0x}; //BOX DRAWINGS HEAVY VERTICAL 
 AND HORIZONTAL (needs 3 bytes for utf-8)
   char* uc = (char*)TranscodeToStr(uval,UTF-8).adopt(); cout  uc  
 endl  flush; XMLString::release(uc); //faulty exception;
 The error is: terminate called after throwing an instance of 
 'xercesc_3_1::TranscodingException'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] Commented: (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.

2010-10-12 Thread Ben Griffin (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920219#action_12920219
 ] 

Ben Griffin commented on XERCESC-1947:
--

Hi Boris,

I'm pretty sure that any serializer that uses TranscodeToStr::transcode(const 
XMLCh *in, XMLSize_t len, XMLTranscoder* trans) will have this problem when the 
nature of the encoding that the transcoder is for is such that characters have 
variable sizes, most especially when the number of bytes needed to transcode a 
character is greater than the number of bytes used by the existing encoding. 
The problem is most easily exposed by the patch.  Essentially, the failure 
happens because there isn't enough memory given to return any bytes eaten - 
even though there is a need to eat them.

So when using UCS2 -- UTF-8, there is no problem until you get to 3-byte or 
more UTF-8 encodings:- characters larger than U+0x0800.  When there is a single 
character to be transcoded then the initial allocSize is not going to be large 
enough to hold that one character, so the transcoder will return 0 'charsRead'.

This error was exposed to me when querying attributes that were set with single 
byte Unicode values from around U+2500.

My code was doing something like...
DOMAttr* enoda = enod-getAttributeNode(a_name);
const XMLCh* x_attrval = enoda-getNodeValue();
if (x_attrval != NULL  x_attrval[0] != 0 ) {
std::string attrval;
char* value = (char*)TranscodeToStr(x_attrval,UTF-8).adopt();
}

I am not sure whether or not the supplied serializer uses TranscodeToStr in 
that sort of way - you are probably better informed than me about that.  
Maybe the component that I put the bug under shouldn't be 'Utilities' ?  I'm 
not sure that I understand why you are interested in whether it affects 
parsing/serializing?  It certainly affects being able to use 
TranscodeToStr::transcode().  I don't believe that the error is in 
XMLUTF8Transcoder::transcodeTo(), because AFAIK it doesn't have storage for 
semi-consumed characters. I believe that the error is with 
TranscodeToStr::transcode().



 XMLUTF8Transcoder::transcodeTo  fails with an exception when transcoding 
 single characters that require 3 or more bytes as UTF8.
 

 Key: XERCESC-1947
 URL: https://issues.apache.org/jira/browse/XERCESC-1947
 Project: Xerces-C++
  Issue Type: Bug
  Components: Utilities
Affects Versions: 3.1.0, 3.1.1
 Environment: Tested on mac os and debian linux. The failure is only 
 manifest on v3.1.x
Reporter: Ben Griffin
Priority: Critical
 Attachments: TransService.patch, transtest.cpp


 This can be demonstrated with the following 2 lines of code.
   const XMLCh uval [] = { 0x254B, 0x}; //BOX DRAWINGS HEAVY VERTICAL 
 AND HORIZONTAL (needs 3 bytes for utf-8)
   char* uc = (char*)TranscodeToStr(uval,UTF-8).adopt(); cout  uc  
 endl  flush; XMLString::release(uc); //faulty exception;
 The error is: terminate called after throwing an instance of 
 'xercesc_3_1::TranscodingException'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] Commented: (XERCESC-1947) XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.

2010-10-08 Thread Ben Griffin (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12919245#action_12919245
 ] 

Ben Griffin commented on XERCESC-1947:
--

XMLUTF8Transcoder::transcodeTo() names it's fifth character 'charsEaten', and 
returns the value of just how many characters were successfully transcoded 
before hitting the end-buffer.
However, TranscodeToStr::transcode() calls transcodeTo with the same parameter 
named as 'charsRead', and expects a value greater than zero.  This is clearly a 
mistake, as a single character that requires more than one byte will not be 
eaten, even though it was 'read'.  

As I see it, the Throw at line 624 of TransService.cpp is unnecessary.  There 
are obviously cases where there are no characters 'read' because there isn't 
enough memory to read them yet. The Transservice should be able to rely upon 
the transcodeTo() method to handle exceptions, rather than just be 'surprised' 
at what it gets back.

Therefore, there is no error when no bytes are 'read' - instead the memory 
should be increased and the transcoding should continue.

OTOH, it seems reasonable for transcode() to test for a zero length string 
before callint transcodeTo().

Just my humble 2ยข.

 XMLUTF8Transcoder::transcodeTo  fails with an exception when transcoding 
 single characters that require 3 or more bytes as UTF8.
 

 Key: XERCESC-1947
 URL: https://issues.apache.org/jira/browse/XERCESC-1947
 Project: Xerces-C++
  Issue Type: Bug
Affects Versions: 3.1.0, 3.1.1
 Environment: Tested on mac os and debian linux. The failure is only 
 manifest on v3.1.x
Reporter: Ben Griffin
Priority: Critical
 Attachments: transtest.cpp


 This can be demonstrated with the following 2 lines of code.
   const XMLCh uval [] = { 0x254B, 0x}; //BOX DRAWINGS HEAVY VERTICAL 
 AND HORIZONTAL (needs 3 bytes for utf-8)
   char* uc = (char*)TranscodeToStr(uval,UTF-8).adopt(); cout  uc  
 endl  flush; XMLString::release(uc); //faulty exception;
 The error is: terminate called after throwing an instance of 
 'xercesc_3_1::TranscodingException'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org