[jira] [Commented] (XERCESC-2130) UTF16 Surrogate values 0xD800-0xDFFF can no longer be written with xerces 3.2.0 (e.g. emoticons)
[ https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334409#comment-16334409 ] Andreas Krantz commented on XERCESC-2130:

https://issues.apache.org/jira/browse/XERCESC-1854 describes how xerces could be used to write files that could not be read back. [http://svn.apache.org/viewvc/xerces/c/trunk/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp?r1=768978&r2=1226891] introduced the new method DOMLSSerializerImpl::ensureValidString, which wrongly rejects the valid characters #x10000-#x10FFFF. These characters cannot be represented by a single 16-bit XMLCh; they need two. UTF-16 uses the range D800-DFFF for this [https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF]: one leading (high) 16-bit XMLCh followed by one trailing (low) 16-bit XMLCh. Checking [http://svn.apache.org/viewvc/xerces/c/trunk/src/xercesc/util/XMLChar.cpp] shows that the {{isXMLChar}} method already in use is aware of this and can validate such a pair of XMLChs.

*An easy fix would be:*
* *reopen XERCESC-1854*
* *clear the body of ensureValidString so that it does nothing*
* *make sure this is redistributed, so that characters #x10000-#x10FFFF can be written again*

I have used xerces for over a decade, and it has always been possible to write invalid files with it. So removing this broken feature (introduced in 3.2.0) again does no harm.

P.S.: Signing a CLA does not seem that easy. I am checking.

> UTF16 Surrogate values 0xD800-0xDFFF can no longer be written with xerces 3.2.0 (e.g. emoticons)
>
> Key: XERCESC-2130
> URL: https://issues.apache.org/jira/browse/XERCESC-2130
> Project: Xerces-C++
> Issue Type: Bug
> Components: DOM
> Affects Versions: 3.2.0
> Reporter: Andreas Krantz
> Priority: Critical
> Attachments: fix.patch, patch_.cpp, reproduce.cpp
>
> Solution for XERCESC-1854 introduced method {{DOMLSSerializerImpl::ensureValidString}}, which has an error in validation.
> The method validates XMLCh values, which are UTF-16 code units.
> [Valid Characters|https://www.w3.org/TR/REC-xml/#NT-Char] #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] are the valid UTF-32 characters.
> The UTF-16 surrogate range xD800-xDFFF is used to represent [#x10000-#x10FFFF] and should not be treated as invalid.
> *The reader treats this correctly and does not complain, which leads to asymmetric behavior:*
> Reading DOM => OK
> Save back DOM => Exception
> I tried to attach an example to show the behavior.
> The methods used, {{bool XMLChar1_1::isXMLChar(const XMLCh toCheck, const XMLCh toCheck2)}}, already have a second optional parameter to check surrogate values.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org
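A minimal sketch of the surrogate-aware check the description asks for, written against the XML 1.0 Char production. It is standalone C++ and does not use Xerces: {{char16_t}} stands in for XMLCh, and {{isValidXmlString}}, {{decodeSurrogatePair}} and the other helpers are hypothetical names, not Xerces API and not the actual fix. A high surrogate followed by a low surrogate is decoded into one code point and checked as a single character; an unpaired surrogate is still rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for Xerces' XMLCh (one 16-bit UTF-16 code unit); no Xerces headers used.
using Utf16Unit = char16_t;

constexpr bool isHighSurrogate(Utf16Unit c) { return c >= 0xD800 && c <= 0xDBFF; }
constexpr bool isLowSurrogate(Utf16Unit c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Combine a high/low surrogate pair into one code point in [0x10000, 0x10FFFF].
char32_t decodeSurrogatePair(Utf16Unit hi, Utf16Unit lo) {
    assert(isHighSurrogate(hi) && isLowSurrogate(lo));
    return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                   + (static_cast<char32_t>(lo) - 0xDC00);
}

// XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
constexpr bool isXml10Char(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical surrogate-aware validation of a UTF-16 string.
bool isValidXmlString(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const Utf16Unit c = s[i];
        if (isHighSurrogate(c)) {
            if (i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
                // A well-formed pair always decodes into [#x10000-#x10FFFF],
                // which the Char production allows; checked explicitly for clarity.
                if (!isXml10Char(decodeSurrogatePair(c, s[i + 1]))) return false;
                ++i;  // skip the low surrogate just consumed
                continue;
            }
            return false;  // high surrogate not followed by a low surrogate
        }
        if (isLowSurrogate(c)) return false;  // low surrogate without a preceding high one
        if (!isXml10Char(c)) return false;    // BMP character outside the Char production
    }
    return true;
}
```

For example, {{isValidXmlString(u"smile \U0001F600")}} accepts the emoticon U+1F600 (stored as 0xD83D 0xDE00), while a string holding only 0xD83D is rejected. Rejecting unpaired surrogates would arguably keep the intent of XERCESC-1854, which was to avoid writing files that cannot be read back.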
[ https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334309#comment-16334309 ] Scott Cantor commented on XERCESC-2130:

I don't believe the patch would cross any threshold of significance that requires a CLA, but that doesn't really matter. I am not touching this code; it would be just as likely to break ten other Unicode features as to fix anything (I say that from a position of ignorance; I couldn't possibly know what it would do). From my perspective, rolling back whatever broke this, depending on what *that* fixed, would be a more likely fix. But I don't know anything about the original change; I didn't make it. If somebody else understands any of this and wants to take responsibility for the change, have at it.
[ https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334103#comment-16334103 ] Roger Leigh commented on XERCESC-2130:

Regarding signing: I did at least some of my work on my employer's time, so I had to get them to sign a corporate CLA as well. It wasn't a problem, but it was a massive pain, because approval took about six months. It may be easier for smaller organisations with less tortuous bureaucracy!
[ https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334092#comment-16334092 ] Andreas Krantz commented on XERCESC-2130:

An alternative would be a rollback of {{DOMLSSerializerImpl}}, i.e. going back to not checking at all. Currently, the xerces 3.2.0 writer code is broken: you cannot save any messenger message that includes an emoticon. *So it is important to release a 3.2.1 before 3.2.0, with this issue, is picked up by many distributions.*

P.S.: I must first figure out whether I can simply sign this. I am not a legal expert either.
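To illustrate why every emoticon fails, here is a small sketch assuming, as the report describes, that the 3.2.0 check tests each XMLCh on its own. It is standalone C++ with hypothetical names ({{encodeSurrogatePair}}, {{isValidPerCodeUnit}}), not the actual ensureValidString code. U+1F600 is split into the pair 0xD83D 0xDE00, and neither unit lies in a BMP range of the XML 1.0 Char production.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split a supplementary code point (U+10000..U+10FFFF) into its
// UTF-16 high/low surrogate pair.
std::pair<char16_t, char16_t> encodeSurrogatePair(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const char32_t v = cp - 0x10000;                         // 20-bit offset
    return { static_cast<char16_t>(0xD800 + (v >> 10)),      // top 10 bits
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };  // bottom 10 bits
}

// Hypothetical per-code-unit check: each 16-bit unit is tested alone against
// the BMP ranges of the Char production. Surrogates (0xD800-0xDFFF) are in
// none of these ranges, so any supplementary character is rejected.
bool isValidPerCodeUnit(const std::u16string& s) {
    for (char16_t c : s) {
        const bool ok = c == 0x9 || c == 0xA || c == 0xD ||
                        (c >= 0x20 && c <= 0xD7FF) ||
                        (c >= 0xE000 && c <= 0xFFFD);
        if (!ok) return false;
    }
    return true;
}
```

{{isValidPerCodeUnit(u"hello")}} returns true, but any string containing {{u"\U0001F600"}} returns false, which matches the reported "Reading DOM => OK, Save back DOM => Exception" asymmetry.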
[ https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334086#comment-16334086 ] Roger Leigh commented on XERCESC-2130:

I'm not a legal expert, and I don't know where the Apache organisation draws the line between trivial and non-trivial contributions that require a CLA, but I suspect this counts as non-trivial. I think you would need to fill out an [individual CLA|https://www.apache.org/licenses/#clas] to allow this to be included. However, others may correct me if I'm wrong.
[ https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334076#comment-16334076 ] Andreas Krantz commented on XERCESC-2130:

Could someone please tell me what I need to do to get this bug fix included? *A 3.2.1 release should probably be produced!*
[ https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323900#comment-16323900 ] Andreas Krantz commented on XERCESC-2130:

I am new to Apache. I only found this issue while migrating our product to xerces 3.2.0, so I currently have no Apache CLA.
[ https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323829#comment-16323829 ] Roger Leigh commented on XERCESC-2130:

Ouch, emoticons were a bit of a low blow! I'm not sure what the threshold is for a contribution to require one, but have you already submitted the Apache CLA?

Thanks,
Roger