[jira] [Commented] (LANG-857) StringIndexOutOfBoundsException in CharSequenceTranslator
[ https://issues.apache.org/jira/browse/LANG-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502022#comment-13502022 ]

Kazuki Hamasaki commented on LANG-857:
--------------------------------------

I created additional test cases. However, the tests for {{escapeJava}} and {{escapeEcmaScript}} currently fail because of [LANG-858].

{code:java}
@Test
public void testEscapeSurrogatePairs() throws Exception {
    assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
    // Examples from https://en.wikipedia.org/wiki/UTF-16
    assertEquals("\uD800\uDC00", StringEscapeUtils.escapeCsv("\uD800\uDC00"));
    assertEquals("\uD834\uDD1E", StringEscapeUtils.escapeCsv("\uD834\uDD1E"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeCsv("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeHtml3("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeHtml4("\uDBFF\uDFFD"));
    assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeJava("\uDBFF\uDFFD")); // fail
    assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeEcmaScript("\uDBFF\uDFFD")); // fail
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeXml("\uDBFF\uDFFD"));
}

@Test
public void testUnEscapeSurrogatePairs() throws Exception {
    assertEquals("\uD83D\uDE30", StringEscapeUtils.unescapeCsv("\uD83D\uDE30"));
    // Examples from https://en.wikipedia.org/wiki/UTF-16
    assertEquals("\uD800\uDC00", StringEscapeUtils.unescapeCsv("\uD800\uDC00"));
    assertEquals("\uD834\uDD1E", StringEscapeUtils.unescapeCsv("\uD834\uDD1E"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeCsv("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeHtml3("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeHtml4("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeJava("\\uDBFF\\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeEcmaScript("\\uDBFF\\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeXml("\uDBFF\uDFFD"));
}
{code}

StringIndexOutOfBoundsException in CharSequenceTranslator
---------------------------------------------------------

Key: LANG-857
URL: https://issues.apache.org/jira/browse/LANG-857
Project: Commons Lang
Issue Type: Bug
Components: lang.text.translate.*
Affects Versions: 3.x
Reporter: Kazuki Hamasaki
Priority: Minor
Labels: patch
Fix For: 3.2
Attachments: CharSequenceTranslator_translate.patch

I found that CharSequenceTranslator handles surrogate pairs incorrectly.
This is a simple test case for the problem. \uD83D\uDE30 is a surrogate pair.

{code:java}
@Test
public void testEscapeSurrogatePairs() throws Exception {
    assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
}
{code}

You'll get the exception shown below.

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2
    at java.lang.String.charAt(String.java:658)
    at java.lang.Character.codePointAt(Character.java:4668)
    at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:95)
    at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:59)
    at org.apache.commons.lang3.StringEscapeUtils.escapeCsv(StringEscapeUtils.java:556)
{code}

Patch attached. The affected method:
# public final void translate(CharSequence input, Writer out) throws IOException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
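For readers unfamiliar with this class of bug, here is a minimal, self-contained sketch of the char-index versus code-point mismatch that the stack trace in the issue description points to. It is an illustration only and is not the attached patch. The class {{SurrogateSafeCopy}} and the method {{copyByCodePoint}} are hypothetical names, and a StringBuilder stands in for the Writer used by {{translate}}. The idea: a String holding one supplementary character has length 2 but only 1 code point, so a loop has to advance its char index by {{Character.charCount}} and must never call {{Character.codePointAt}} at an index equal to the length.

```java
final class SurrogateSafeCopy {

    /**
     * Hypothetical helper: copies the input one code point at a time.
     * The position is a char index and advances by 1 for BMP characters
     * and by 2 for surrogate pairs, so it never splits a pair.
     */
    static String copyByCodePoint(CharSequence input) {
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        final int len = input.length(); // length in chars, not code points
        while (pos < len) {
            int codePoint = Character.codePointAt(input, pos);
            out.appendCodePoint(codePoint);
            pos += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String face = "\uD83D\uDE30"; // one code point, two chars

        System.out.println("chars:       " + face.length());
        System.out.println("code points: " + Character.codePointCount(face, 0, face.length()));
        System.out.println("round trip:  " + copyByCodePoint(face).equals(face));

        // Reading a code point at index == length fails with the same
        // exception type as in the reported stack trace.
        try {
            Character.codePointAt(face, face.length());
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("codePointAt(face, 2) threw " + e.getClass().getSimpleName());
        }
    }
}
```

The sketch keeps a single char-based position instead of mixing a code-point counter with char indexes, which is one straightforward way to avoid stepping past the end of the input.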
[jira] [Commented] (LANG-857) StringIndexOutOfBoundsException in CharSequenceTranslator
[ https://issues.apache.org/jira/browse/LANG-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501654#comment-13501654 ]

Kazuki Hamasaki commented on LANG-857:
--------------------------------------

Currently the same issue appears only in StringEscapeUtils.escapeCsv and StringEscapeUtils.unescapeCsv, because the other escape methods use LookupTranslator, which never reaches the buggy code.
However, I think we should add a couple of tests for the other (un)escape methods.
[jira] [Commented] (LANG-857) StringIndexOutOfBoundsException in CharSequenceTranslator
[ https://issues.apache.org/jira/browse/LANG-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501683#comment-13501683 ]

Gary Gregory commented on LANG-857:
-----------------------------------

Ok, would you be willing to submit a patch for additional test coverage?

Thank you,
Gary