Re: RFR: 8311216: DataURI can lose information in some charset environments

Andy Goryachev Fri, 07 Jul 2023 12:21:58 -0700

On Fri, 7 Jul 2023 19:16:20 GMT, Andy Goryachev <[email protected]> wrote:


>> DataURI uses the following implementation to decode the percent-encoded 
>> payload of a "data" URI:
>> 
>> 
>> ...
>> String data = uri.substring(dataSeparator + 1);
>> Charset charset = Charset.defaultCharset();
>> ...
>> URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset)
>> 
>> 
>> This approach only works if the charset that is passed into 
>> `URLDecoder.decode` and `String.getBytes` doesn't lose information when 
>> converting between `String` and `byte[]` representations, as might happen in 
>> a US-ASCII environment.
>> 
>> This PR solves the problem by not using `URLDecoder`, but instead simply 
>> decoding percent-encoded escape sequences as specified by RFC 3986, page 11.
>> 
>> **Note to reviewers**: the failing test can only be observed when the JVM 
>> uses a default charset that can't represent the payload, which can be 
>> enforced by specifying the `-Dfile.encoding=US-ASCII` VM option.
>
> modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 
> 115:
> 
>> 113:             nameValuePairs,
>> 114:             base64,
>> 115:             base64 ? Base64.getDecoder().decode(data) : 
>> decodePercentEncoding(data));
> 
> I wonder if this is all necessary.  The data is supposed to be url-encoded, 
> so it's essentially ASCII, no?
> 
> passing default charset to getBytes() is not right, it probably should be
> 
> URLDecoder.decode(data.replace("+", "%2B"), 
> charset).getBytes(StandardCharsets.US_ASCII);
> 
> or am I missing something?

From https://datatracker.ietf.org/doc/html/rfc3986#page-11


   Therefore, the integer values used by the ABNF must be mapped back to
   their corresponding characters via US-ASCII in order to complete the
   syntax rules.
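
For illustration only, here is a minimal sketch of the difference under discussion. It is not the code from the PR: the class name `PercentDecodingSketch` and both helper methods are hypothetical, and it assumes Java 10 or later for `URLDecoder.decode(String, Charset)`. `lossyDecode` mirrors the original `URLDecoder`-based expression with an explicit charset, while `decodePercentEncoding` maps each `%XX` escape straight to one byte and never goes through a `String` in some charset.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PercentDecodingSketch {

    // Mirrors the original approach: decode to a String, then re-encode to bytes.
    // Any byte the charset cannot represent is replaced on the way through.
    static byte[] lossyDecode(String data, Charset charset) {
        return URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset);
    }

    // Hypothetical helper: turns each "%XX" escape into exactly one byte and
    // copies every other (ASCII) character unchanged, independent of any charset.
    static byte[] decodePercentEncoding(String data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '%') {
                if (i + 2 >= data.length()) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = hexValue(data.charAt(i + 1));
                int lo = hexValue(data.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c > 0x7F) {
                // Assumption of this sketch: a URI payload contains only ASCII characters.
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Returns the value of a hexadecimal digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        String payload = "%E2%82%AC"; // the UTF-8 encoding of the euro sign
        System.out.println(Arrays.toString(lossyDecode(payload, StandardCharsets.US_ASCII)));
        System.out.println(Arrays.toString(decodePercentEncoding(payload)));
    }
}
```

With `US_ASCII` passed to `lossyDecode`, the three escaped bytes do not survive the round trip through `String`, whereas the byte-wise decoder returns them unchanged. Passing `US_ASCII` only to `getBytes` would not help either, because non-ASCII bytes are already lost at that point.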

-------------

PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1256344029
