Re: RFR: 8260265: UTF-8 by Default [v9]

2021-08-28 Thread Alan Bateman
On Fri, 27 Aug 2021 23:14:58 GMT, Naoto Sato  wrote:

>> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
>> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
>> `file.encoding` system property being added in the spec, but another notable 
>> modification is in `java.io.PrintStream` where it continues to use the 
>> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
>> are mostly clarification of the term "default charset" and their links. 
>> Corresponding CSR has also been drafted.
>> 
>> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266
>
> Naoto Sato has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fixed a failing test

Marked as reviewed by alanb (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v9]

2021-08-27 Thread Naoto Sato
> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

Naoto Sato has updated the pull request incrementally with one additional 
commit since the last revision:

  Fixed a failing test

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4733/files
  - new: https://git.openjdk.java.net/jdk/pull/4733/files/67e1d4a9..28e79d4e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=4733=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=4733=07-08

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4733.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4733/head:pull/4733

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v8]

2021-08-27 Thread Naoto Sato
> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

Naoto Sato has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains 15 additional commits since the 
last revision:

 - Removed "default" alias
 - Merge branch 'master' into JDK-8260265
 - Merge branch 'master' into JDK-8260265
 - Merge branch 'master' into JDK-8260265
 - Merge branch 'master' into JDK-8260265
 - Merge branch 'master' into JDK-8260265
 - Merge branch 'master' into JDK-8260265
 - Refined `file.encoding` description
 - Merge branch 'master' into JDK-8260265
 - Reflects comments
 - ... and 5 more: https://git.openjdk.java.net/jdk/compare/66c3531f...67e1d4a9

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4733/files
  - new: https://git.openjdk.java.net/jdk/pull/4733/files/7d5137d3..67e1d4a9

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=4733=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=4733=06-07

  Stats: 15450 lines in 595 files changed: 11498 ins; 2067 del; 1885 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4733.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4733/head:pull/4733

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v7]

2021-08-12 Thread Naoto Sato
> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

Naoto Sato has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains 12 additional commits since the 
last revision:

 - Merge branch 'master' into JDK-8260265
 - Merge branch 'master' into JDK-8260265
 - Merge branch 'master' into JDK-8260265
 - Merge branch 'master' into JDK-8260265
 - Refined `file.encoding` description
 - Merge branch 'master' into JDK-8260265
 - Reflects comments
 - Removed leftover `console` references in `PrintStream`
 - Reflects comments
 - Reflects review comments
 - ... and 2 more: https://git.openjdk.java.net/jdk/compare/387b32b3...7d5137d3

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4733/files
  - new: https://git.openjdk.java.net/jdk/pull/4733/files/6f9e5eb4..7d5137d3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=4733=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=4733=05-06

  Stats: 53933 lines in 926 files changed: 45460 ins; 4145 del; 4328 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4733.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4733/head:pull/4733

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v6]

2021-07-22 Thread Alan Bateman
On Wed, 21 Jul 2021 17:34:15 GMT, Naoto Sato  wrote:

>> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
>> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
>> `file.encoding` system property being added in the spec, but another notable 
>> modification is in `java.io.PrintStream` where it continues to use the 
>> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
>> are mostly clarification of the term "default charset" and their links. 
>> Corresponding CSR has also been drafted.
>> 
>> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266
>
> Naoto Sato has updated the pull request with a new target base due to a merge 
> or a rebase. The incremental webrev excludes the unrelated changes brought in 
> by the merge/rebase. The pull request contains ten additional commits since 
> the last revision:
> 
>  - Merge branch 'master' into JDK-8260265
>  - Refined `file.encoding` description
>  - Merge branch 'master' into JDK-8260265
>  - Reflects comments
>  - Removed leftover `console` references in `PrintStream`
>  - Reflects comments
>  - Reflects review comments
>  - Merge branch 'master' into JDK-8260265
>  - 8260265: UTF-8 by Default

Thanks for incorporating the suggestion for the getProperties text. I think it 
looks good now.

-

Marked as reviewed by alanb (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v4]

2021-07-15 Thread Naoto Sato
> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

Naoto Sato has updated the pull request incrementally with one additional 
commit since the last revision:

  Removed leftover `console` references in `PrintStream`

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4733/files
  - new: https://git.openjdk.java.net/jdk/pull/4733/files/182dcb6d..38b8d988

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=4733=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=4733=02-03

  Stats: 14 lines in 1 file changed: 0 ins; 4 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4733.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4733/head:pull/4733

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v3]

2021-07-14 Thread Naoto Sato
On Wed, 14 Jul 2021 20:52:54 GMT, Naoto Sato  wrote:

>> src/java.base/share/classes/java/io/Console.java line 587:
>> 
>>> 585: try {
>>> 586: cs = Charset.forName(csname);
>>> 587: } catch (Exception ignored) { }
>> 
>> A separate enhancement...
>> I've long thought that should be a way to avoid the exception here.
>> For example,  a Charset.forName(csname, default);
>> The caller might have a default in mind or supply null and then be able to 
>> test for null.
>
> Agreed. Will file an RFE for this.

https://bugs.openjdk.java.net/browse/JDK-8270490

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v3]

2021-07-14 Thread Naoto Sato
On Wed, 14 Jul 2021 21:03:53 GMT, Roger Riggs  wrote:

>> That was intentional. Only those two are supported, others continue to work 
>> as before (but not supported).
>
> Still it leaves an uncomfortable feeling, perhaps remedied by an "other 
> values have unspecified behavior"
> or the "other values are implementation specific".

Added the clarifying sentence to the spec.

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v3]

2021-07-14 Thread Naoto Sato
> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

Naoto Sato has updated the pull request incrementally with one additional 
commit since the last revision:

  Reflects comments

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4733/files
  - new: https://git.openjdk.java.net/jdk/pull/4733/files/981200f7..182dcb6d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=4733=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=4733=01-02

  Stats: 28 lines in 2 files changed: 0 ins; 0 del; 28 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4733.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4733/head:pull/4733

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v2]

2021-07-14 Thread Roger Riggs
On Wed, 14 Jul 2021 20:53:34 GMT, Naoto Sato  wrote:

>> src/java.base/share/classes/java/io/FileReader.java line 41:
>> 
>>> 39:  * @see InputStreamReader
>>> 40:  * @see FileInputStream
>>> 41:  * @see java.nio.charset.Charset#defaultCharset()
>> 
>> The @ see duplicates the link above, the javadoc can do without the @ see.
>
> If I remove that `@see`, I don't see the link in `See Also` section. Am I 
> missing something?

In my view the @ linkplain is sufficient to allow the reader to navigate; but 
YMMV.

>> src/java.base/share/classes/java/lang/System.java line 802:
>> 
>>> 800:  * {@systemProperty file.encoding}
>>> 801:  * The name of the default charset. Users may specify
>>> 802:  * {@code UTF-8} or {@code COMPAT} on the command line to the 
>>> value.
>> 
>> The wording could imply that only those two values can be supplied.
>> It could be rephrased to say that *if* the property is supplied on the 
>> command line
>> it overrides the default UTF-8.
>
> That was intentional. Only those two are supported, others continue to work 
> as before (but not supported).

Still it leaves an uncomfortable feeling, perhaps remedied by an "other values 
have unspecified behavior"
or the "other values are implementation specific".

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v2]

2021-07-14 Thread Naoto Sato
On Wed, 14 Jul 2021 16:20:28 GMT, Jesse Glick 
 wrote:

>> src/java.base/share/classes/java/io/PrintStream.java line 49:
>> 
>>> 47:  *  All characters printed by a {@code PrintStream} are converted 
>>> into
>>> 48:  * bytes using the given encoding or charset, or the default
>>> 49:  * console charset if not specified.
>> 
>> JEP 400 doesn't give a rationale for using the console charset for 
>> PrintStream.
>> PrintStreams are used for output to files and other media other than just a 
>> tty/console.
>> The charset of system.out/err should use the console charset.
>
> This was my thinking in 
> https://github.com/openjdk/jdk/pull/4733#issuecomment-876793372.

OK, I am now conviced. Modified not to default to Console.charset() for generic 
PrintStream w/o charset constructor.

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v2]

2021-07-14 Thread Naoto Sato
> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

Naoto Sato has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains three additional commits since 
the last revision:

 - Reflects review comments
 - Merge branch 'master' into JDK-8260265
 - 8260265: UTF-8 by Default

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4733/files
  - new: https://git.openjdk.java.net/jdk/pull/4733/files/107210cf..981200f7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=4733=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=4733=00-01

  Stats: 6434 lines in 314 files changed: 3877 ins; 1452 del; 1105 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4733.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4733/head:pull/4733

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default [v2]

2021-07-14 Thread Naoto Sato
On Wed, 14 Jul 2021 14:55:39 GMT, Roger Riggs  wrote:

>> Naoto Sato has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains three additional 
>> commits since the last revision:
>> 
>>  - Reflects review comments
>>  - Merge branch 'master' into JDK-8260265
>>  - 8260265: UTF-8 by Default
>
> src/java.base/share/classes/java/io/Console.java line 587:
> 
>> 585: try {
>> 586: cs = Charset.forName(csname);
>> 587: } catch (Exception ignored) { }
> 
> A separate enhancement...
> I've long thought that should be a way to avoid the exception here.
> For example,  a Charset.forName(csname, default);
> The caller might have a default in mind or supply null and then be able to 
> test for null.

Agreed. Will file an RFE for this.

> src/java.base/share/classes/java/io/FileReader.java line 41:
> 
>> 39:  * @see InputStreamReader
>> 40:  * @see FileInputStream
>> 41:  * @see java.nio.charset.Charset#defaultCharset()
> 
> The @ see duplicates the link above, the javadoc can do without the @ see.

If I remove that `@see`, I don't see the link in `See Also` section. Am I 
missing something?

> src/java.base/share/classes/java/lang/System.java line 802:
> 
>> 800:  * {@systemProperty file.encoding}
>> 801:  * The name of the default charset. Users may specify
>> 802:  * {@code UTF-8} or {@code COMPAT} on the command line to the 
>> value.
> 
> The wording could imply that only those two values can be supplied.
> It could be rephrased to say that *if* the property is supplied on the 
> command line
> it overrides the default UTF-8.

That was intentional. Only those two are supported, others continue to work as 
before (but not supported).

> src/java.base/share/classes/java/nio/charset/Charset.java line 601:
> 
>> 599:  * value designates {@code COMPAT}, the default charset is derived 
>> from
>> 600:  * the {@code native.encoding} system property, which typically 
>> depends
>> 601:  * upon the locale and charset of the underlying operating system.
> 
> The description in java.lang.System of the file.encoding property does not 
> indicate it is 'implementation specific'.
> In that context, it appears to be part of the JavaSE spec.
> Having the spec in a single place with references to it from others could 
> avoid duplication.

`file.encoding` is listed under `@implNote` tag in `System.getProperties()`, so 
it is implementation-specific.

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default

2021-07-14 Thread Naoto Sato
On Wed, 14 Jul 2021 12:39:46 GMT, Giacomo Baso 
 wrote:

> > Consider an application that creates a java.io.FileWriter with its 
> > one-argument constructor and then uses it to write some text to a file. The 
> > resulting file will contain a sequence of bytes encoded using the default 
> > charset of the JDK running the application. A second application, run on a 
> > different machine or by a different user on the same machine, creates a 
> > java.io.FileReader with its one-argument constructor and uses it to read 
> > the bytes in that file. The resulting text contains a sequence of 
> > characters decoded using the default charset of the JDK running the second 
> > application. If the default charset differs between the JDK of the first 
> > application and the JDK of the second application, then the resulting text 
> > may be silently corrupted or incomplete, since these APIs replace erroneous 
> > input rather than fail.
> 
> It's even worse than that, because many OpenSSH installs are configured by 
> default to [forward](https://man.openbsd.org/ssh_config.5#SendEnv) and 
> [accept](https://man.openbsd.org/sshd_config.5#AcceptEnv) the user locale 
> (see e.g. for [RHEL 7](https://access.redhat.com/solutions/974273)).
> 
> So a single application, on a single remote machine, can be unknowingly 
> started by a single user with different locales, and therefore different 
> encodings, depending on how the user connected to the remote machine. For 
> example, on Windows connecting via powershell results in `LANG=en_US.UTF-8`, 
> while using WSL2 results in `LANG=C.UTF-8`. On Java 11 in a RHEL7 machine, 
> `file.encoding` results in `UTF-8` in the first case, but `ANSI_X3.4-1968` in 
> the second, leading to a default charset `ASCII`.
> 
> Worth mentioning is also that `Charset.forName("default")` is just an alias 
> to `ASCII`, per `sun.nio.cs.StandardCharsets$Aliases`.

Thanks. Updated the JEP.

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default

2021-07-14 Thread Jesse Glick
On Wed, 14 Jul 2021 15:09:41 GMT, Roger Riggs  wrote:

>> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
>> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
>> `file.encoding` system property being added in the spec, but another notable 
>> modification is in `java.io.PrintStream` where it continues to use the 
>> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
>> are mostly clarification of the term "default charset" and their links. 
>> Corresponding CSR has also been drafted.
>> 
>> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266
>
> src/java.base/share/classes/java/io/PrintStream.java line 49:
> 
>> 47:  *  All characters printed by a {@code PrintStream} are converted into
>> 48:  * bytes using the given encoding or charset, or the default
>> 49:  * console charset if not specified.
> 
> JEP 400 doesn't give a rationale for using the console charset for 
> PrintStream.
> PrintStreams are used for output to files and other media other than just a 
> tty/console.
> The charset of system.out/err should use the console charset.

This was my thinking in 
https://github.com/openjdk/jdk/pull/4733#issuecomment-876793372.

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default

2021-07-14 Thread Roger Riggs
On Thu, 8 Jul 2021 21:23:00 GMT, Naoto Sato  wrote:

> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

src/java.base/share/classes/java/io/ByteArrayOutputStream.java line 291:

> 289:  * method, which takes an encoding-name or charset argument,
> 290:  * or the {@code toString()} method, which uses the default
> 291:  * charset.

Fold to previous line.

src/java.base/share/classes/java/io/Console.java line 587:

> 585: try {
> 586: cs = Charset.forName(csname);
> 587: } catch (Exception ignored) { }

A separate enhancement...
I've long thought that should be a way to avoid the exception here.
For example,  a Charset.forName(csname, default);
The caller might have a default in mind or supply null and then be able to test 
for null.

src/java.base/share/classes/java/io/FileReader.java line 41:

> 39:  * @see InputStreamReader
> 40:  * @see FileInputStream
> 41:  * @see java.nio.charset.Charset#defaultCharset()

The @ see duplicates the link above, the javadoc can do without the @ see.

src/java.base/share/classes/java/io/InputStreamReader.java line 39:

> 37:  * java.nio.charset.Charset charset}.  The charset that it uses
> 38:  * may be specified by name or may be given explicitly, or the
> 39:  * {@link Charset#defaultCharset() default charset} may be accepted.

"may be accepted" seems like the API has some choice in the matter. 
Perhaps "accepted" -> "used".
And in other classes below if there's a suitable replacement.

src/java.base/share/classes/java/io/PrintStream.java line 49:

> 47:  *  All characters printed by a {@code PrintStream} are converted into
> 48:  * bytes using the given encoding or charset, or the default
> 49:  * console charset if not specified.

JEP 400 doesn't give a rationale for using the console charset for PrintStream.
PrintStreams are used for output to files and other media other than just a 
tty/console.
The charset of system.out/err should use the console charset.

src/java.base/share/classes/java/lang/System.java line 802:

> 800:  * {@systemProperty file.encoding}
> 801:  * The name of the default charset. Users may specify
> 802:  * {@code UTF-8} or {@code COMPAT} on the command line to the 
> value.

The wording could imply that only those two values can be supplied.
It could be rephrased to say that *if* the property is supplied on the command 
line
it overrides the default UTF-8.

src/java.base/share/classes/java/net/URLDecoder.java line 92:

> 90: 
> 91: // The default charset
> 92: static String dfltEncName = URLEncoder.dfltEncName;

Perhaps add the value of file.encoding to the StaticProperties either as a 
string or as the Charset.
That would allow a few different lookups of the property to be simplified.

src/java.base/share/classes/java/net/URLEncoder.java line 165:

> 163: try {
> 164: str = encode(s, dfltEncName);
> 165: } catch (UnsupportedEncodingException e) {

Perhaps a separate cleanup, the Charset should be cached, not just the name and 
use the `encode(s, charset)` method.

src/java.base/share/classes/java/nio/charset/Charset.java line 601:

> 599:  * value designates {@code COMPAT}, the default charset is derived 
> from
> 600:  * the {@code native.encoding} system property, which typically 
> depends
> 601:  * upon the locale and charset of the underlying operating system.

The description in java.lang.System of the file.encoding property does not 
indicate it is 'implementation specific'.
In that context, it appears to be part of the JavaSE spec.
Having the spec in a single place with references to it from others could avoid 
duplication.

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default

2021-07-14 Thread Giacomo Baso
On Thu, 8 Jul 2021 21:23:00 GMT, Naoto Sato  wrote:

> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

> Consider an application that creates a java.io.FileWriter with its 
> one-argument constructor and then uses it to write some text to a file. The 
> resulting file will contain a sequence of bytes encoded using the default 
> charset of the JDK running the application. A second application, run on a 
> different machine or by a different user on the same machine, creates a 
> java.io.FileReader with its one-argument constructor and uses it to read the 
> bytes in that file. The resulting text contains a sequence of characters 
> decoded using the default charset of the JDK running the second application. 
> If the default charset differs between the JDK of the first application and 
> the JDK of the second application, then the resulting text may be silently 
> corrupted or incomplete, since these APIs replace erroneous input rather than 
> fail.

It's even worse than that, because many OpenSSH installs are configured by 
default to [forward](https://man.openbsd.org/ssh_config.5#SendEnv) and 
[accept](https://man.openbsd.org/sshd_config.5#AcceptEnv) the user locale (see 
e.g. for [RHEL 7](https://access.redhat.com/solutions/974273)).

So a single application, on a single remote machine, can be unknowingly started 
by a single user with different locales, and therefore different encodings, 
depending on how the user connected to the remote machine. For example, on 
Windows connecting via powershell results in `LANG=en_US.UTF-8`, while using 
WSL2 results in `LANG=C.UTF-8`. On Java 11 in a RHEL7 machine, `file.encoding` 
results in `UTF-8` in the first case, but `ANSI_X3.4-1968` in the second, 
leading to a default charset `ASCII`.

Worth mentioning is also that `Charset.forName("default")` is just an alias to 
`ASCII`, per `sun.nio.cs.StandardCharsets$Aliases`.

-

PR: https://git.openjdk.java.net/jdk/pull/4733


Re: RFR: 8260265: UTF-8 by Default

2021-07-12 Thread Jesse Glick
On Thu, 8 Jul 2021 21:23:00 GMT, Naoto Sato  wrote:

> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of 
> the changes is `Charset.defaultCharset()` returning `UTF-8` and 
> `file.encoding` system property being added in the spec, but another notable 
> modification is in `java.io.PrintStream` where it continues to use the 
> `Console` encoding as the default charset instead of `UTF-8`. Other changes 
> are mostly clarification of the term "default charset" and their links. 
> Corresponding CSR has also been drafted.
> 
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

I would have expected `System.out` and `.err` to use the console encoding, but 
application-constructed `PrintStream`s to use UTF-8 by default.

-

PR: https://git.openjdk.java.net/jdk/pull/4733


RFR: 8260265: UTF-8 by Default

2021-07-08 Thread Naoto Sato
This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of the 
changes is `Charset.defaultCharset()` returning `UTF-8` and `file.encoding` 
system property being added in the spec, but another notable modification is in 
`java.io.PrintStream` where it continues to use the `Console` encoding as the 
default charset instead of `UTF-8`. Other changes are mostly clarification of 
the term "default charset" and their links. Corresponding CSR has also been 
drafted.

JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

-

Commit messages:
 - 8260265: UTF-8 by Default

Changes: https://git.openjdk.java.net/jdk/pull/4733/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=4733=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8260265
  Stats: 297 lines in 18 files changed: 184 ins; 9 del; 104 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4733.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4733/head:pull/4733

PR: https://git.openjdk.java.net/jdk/pull/4733