Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-12-05 Thread Erik Joelsson

I'm fine with this.

/Erik

On 2019-12-04 15:30, Dan Smith wrote:

On Dec 3, 2019, at 5:24 PM, Jonathan Gibbons  
wrote:

Hi Dan,

I think it's a combination of oral tradition and long-standing precedent.

Earlier this year, I raised this general issue, partly because of inconsistent 
use of -encoding in the build system.  The response was that there was some 
concern that not all tools in the tool chain could handle UTF-8 files.

$ find open/make -name \*.gmk | xargs grep -o -e '-encoding [^ ]*'
open/make/Docs.gmk:-encoding ISO-8859-1
open/make/Docs.gmk:-encoding ISO-8859-1
open/make/common/SetupJavaCompilers.gmk:-encoding ascii
open/make/common/SetupJavaCompilers.gmk:-encoding ascii

I think we should be consistent, but (at the time) it did not seem worth 
pushing for UTF-8 everywhere.

Yeah, I think I'll join you in not being ready for this crusade. Might be nice, 
but will require broad familiarity with everything in the JDK that touches text.

Instead, I'm happy to assert that, within the small space of spec processing 
tools, we need to support UTF-8, and so target my changeset to just the 
fixuppandoc tool.

Can I get a review on this specific change?:

-

diff -r 8c4c358272a9 -r 4c0e6c85037c 
make/jdk/src/classes/build/tools/fixuppandoc/Main.java
--- a/make/jdk/src/classes/build/tools/fixuppandoc/Main.javaFri Nov 15 
20:39:26 2019 +0800
+++ b/make/jdk/src/classes/build/tools/fixuppandoc/Main.javaWed Dec 04 
16:24:25 2019 -0700
@@ -46,6 +46,7 @@
  import java.util.Set;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;
+import static java.nio.charset.StandardCharsets.UTF_8;
  
  /**

   * Fixup HTML generated by pandoc.
@@ -184,7 +185,7 @@
  if (inFile != null) {
  read(inFile);
  } else {
-read(new BufferedReader(new InputStreamReader(System.in)));
+read(new BufferedReader(new InputStreamReader(System.in, 
UTF_8)));
  }
  }
  }
@@ -198,9 +199,9 @@
   */
  private Writer openWriter(Path file) throws IOException {
  if (file != null) {
-return Files.newBufferedWriter(file);
+return Files.newBufferedWriter(file, UTF_8);
  } else {
-return new BufferedWriter(new OutputStreamWriter(System.out) {
+return new BufferedWriter(new OutputStreamWriter(System.out, 
UTF_8) {
  @Override
  public void close() throws IOException {
  flush();
@@ -615,7 +616,7 @@
   * @param file the file
   */
  void read(Path file) {
-try (Reader r = Files.newBufferedReader(file)) {
+try (Reader r = Files.newBufferedReader(file, UTF_8)) {
  this.file = file;
  read(r);
  } catch (IOException e) {



Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-12-04 Thread Jonathan Gibbons

+1 for the change. +1 for Martin's suggestion for an additional comment.

-- Jon


On 12/4/19 3:52 PM, Martin Buchholz wrote:
Looks good ... but please add a comment pointing to 
https://pandoc.org/MANUAL.html

"""Pandoc uses the UTF-8 character encoding for both input and output."""

On Wed, Dec 4, 2019 at 3:30 PM Dan Smith > wrote:


> On Dec 3, 2019, at 5:24 PM, Jonathan Gibbons
mailto:jonathan.gibb...@oracle.com>>
wrote:
>
> Hi Dan,
>
> I think it's a combination of oral tradition and long-standing
precedent.
>
> Earlier this year, I raised this general issue, partly because
of inconsistent use of -encoding in the build system. The response
was that there was some concern that not all tools in the tool
chain could handle UTF-8 files.
>
> $ find open/make -name \*.gmk | xargs grep -o -e '-encoding [^ ]*'
> open/make/Docs.gmk:-encoding ISO-8859-1
> open/make/Docs.gmk:-encoding ISO-8859-1
> open/make/common/SetupJavaCompilers.gmk:-encoding ascii
> open/make/common/SetupJavaCompilers.gmk:-encoding ascii
>
> I think we should be consistent, but (at the time) it did not
seem worth pushing for UTF-8 everywhere.

Yeah, I think I'll join you in not being ready for this crusade.
Might be nice, but will require broad familiarity with everything
in the JDK that touches text.

Instead, I'm happy to assert that, within the small space of spec
processing tools, we need to support UTF-8, and so target my
changeset to just the fixuppandoc tool.

Can I get a review on this specific change?:

-

diff -r 8c4c358272a9 -r 4c0e6c85037c
make/jdk/src/classes/build/tools/fixuppandoc/Main.java
--- a/make/jdk/src/classes/build/tools/fixuppandoc/Main.java   Fri
Nov 15 20:39:26 2019 +0800
+++ b/make/jdk/src/classes/build/tools/fixuppandoc/Main.java   Wed
Dec 04 16:24:25 2019 -0700
@@ -46,6 +46,7 @@
 import java.util.Set;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
+import static java.nio.charset.StandardCharsets.UTF_8;

 /**
  * Fixup HTML generated by pandoc.
@@ -184,7 +185,7 @@
                 if (inFile != null) {
                     read(inFile);
                 } else {
-                    read(new BufferedReader(new
InputStreamReader(System.in)));
+                    read(new BufferedReader(new
InputStreamReader(System.in, UTF_8)));
                 }
             }
         }
@@ -198,9 +199,9 @@
          */
         private Writer openWriter(Path file) throws IOException {
             if (file != null) {
-                return Files.newBufferedWriter(file);
+                return Files.newBufferedWriter(file, UTF_8);
             } else {
-                return new BufferedWriter(new
OutputStreamWriter(System.out) {
+                return new BufferedWriter(new
OutputStreamWriter(System.out, UTF_8) {
                     @Override
                     public void close() throws IOException {
                         flush();
@@ -615,7 +616,7 @@
          * @param file the file
          */
         void read(Path file) {
-            try (Reader r = Files.newBufferedReader(file)) {
+            try (Reader r = Files.newBufferedReader(file, UTF_8)) {
                 this.file = file;
                 read(r);
             } catch (IOException e) {



Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-12-04 Thread Martin Buchholz
Looks good ... but please add a comment pointing to
https://pandoc.org/MANUAL.html
"""Pandoc uses the UTF-8 character encoding for both input and output."""

On Wed, Dec 4, 2019 at 3:30 PM Dan Smith  wrote:

> > On Dec 3, 2019, at 5:24 PM, Jonathan Gibbons <
> jonathan.gibb...@oracle.com> wrote:
> >
> > Hi Dan,
> >
> > I think it's a combination of oral tradition and long-standing precedent.
> >
> > Earlier this year, I raised this general issue, partly because of
> inconsistent use of -encoding in the build system.  The response was that
> there was some concern that not all tools in the tool chain could handle
> UTF-8 files.
> >
> > $ find open/make -name \*.gmk | xargs grep -o -e '-encoding [^ ]*'
> > open/make/Docs.gmk:-encoding ISO-8859-1
> > open/make/Docs.gmk:-encoding ISO-8859-1
> > open/make/common/SetupJavaCompilers.gmk:-encoding ascii
> > open/make/common/SetupJavaCompilers.gmk:-encoding ascii
> >
> > I think we should be consistent, but (at the time) it did not seem worth
> pushing for UTF-8 everywhere.
>
> Yeah, I think I'll join you in not being ready for this crusade. Might be
> nice, but will require broad familiarity with everything in the JDK that
> touches text.
>
> Instead, I'm happy to assert that, within the small space of spec
> processing tools, we need to support UTF-8, and so target my changeset to
> just the fixuppandoc tool.
>
> Can I get a review on this specific change?:
>
> -
>
> diff -r 8c4c358272a9 -r 4c0e6c85037c
> make/jdk/src/classes/build/tools/fixuppandoc/Main.java
> --- a/make/jdk/src/classes/build/tools/fixuppandoc/Main.javaFri Nov 15
> 20:39:26 2019 +0800
> +++ b/make/jdk/src/classes/build/tools/fixuppandoc/Main.javaWed Dec 04
> 16:24:25 2019 -0700
> @@ -46,6 +46,7 @@
>  import java.util.Set;
>  import java.util.regex.Matcher;
>  import java.util.regex.Pattern;
> +import static java.nio.charset.StandardCharsets.UTF_8;
>
>  /**
>   * Fixup HTML generated by pandoc.
> @@ -184,7 +185,7 @@
>  if (inFile != null) {
>  read(inFile);
>  } else {
> -read(new BufferedReader(new
> InputStreamReader(System.in)));
> +read(new BufferedReader(new
> InputStreamReader(System.in, UTF_8)));
>  }
>  }
>  }
> @@ -198,9 +199,9 @@
>   */
>  private Writer openWriter(Path file) throws IOException {
>  if (file != null) {
> -return Files.newBufferedWriter(file);
> +return Files.newBufferedWriter(file, UTF_8);
>  } else {
> -return new BufferedWriter(new
> OutputStreamWriter(System.out) {
> +return new BufferedWriter(new
> OutputStreamWriter(System.out, UTF_8) {
>  @Override
>  public void close() throws IOException {
>  flush();
> @@ -615,7 +616,7 @@
>   * @param file the file
>   */
>  void read(Path file) {
> -try (Reader r = Files.newBufferedReader(file)) {
> +try (Reader r = Files.newBufferedReader(file, UTF_8)) {
>  this.file = file;
>  read(r);
>  } catch (IOException e) {
>
>


Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-12-04 Thread Dan Smith
> On Dec 3, 2019, at 5:24 PM, Jonathan Gibbons  
> wrote:
> 
> Hi Dan,
> 
> I think it's a combination of oral tradition and long-standing precedent.
> 
> Earlier this year, I raised this general issue, partly because of 
> inconsistent use of -encoding in the build system.  The response was that 
> there was some concern that not all tools in the tool chain could handle 
> UTF-8 files.
> 
> $ find open/make -name \*.gmk | xargs grep -o -e '-encoding [^ ]*'
> open/make/Docs.gmk:-encoding ISO-8859-1
> open/make/Docs.gmk:-encoding ISO-8859-1
> open/make/common/SetupJavaCompilers.gmk:-encoding ascii
> open/make/common/SetupJavaCompilers.gmk:-encoding ascii
> 
> I think we should be consistent, but (at the time) it did not seem worth 
> pushing for UTF-8 everywhere.

Yeah, I think I'll join you in not being ready for this crusade. Might be nice, 
but will require broad familiarity with everything in the JDK that touches text.

Instead, I'm happy to assert that, within the small space of spec processing 
tools, we need to support UTF-8, and so target my changeset to just the 
fixuppandoc tool.

Can I get a review on this specific change?:

-

diff -r 8c4c358272a9 -r 4c0e6c85037c 
make/jdk/src/classes/build/tools/fixuppandoc/Main.java
--- a/make/jdk/src/classes/build/tools/fixuppandoc/Main.javaFri Nov 15 
20:39:26 2019 +0800
+++ b/make/jdk/src/classes/build/tools/fixuppandoc/Main.javaWed Dec 04 
16:24:25 2019 -0700
@@ -46,6 +46,7 @@
 import java.util.Set;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
+import static java.nio.charset.StandardCharsets.UTF_8;
 
 /**
  * Fixup HTML generated by pandoc.
@@ -184,7 +185,7 @@
 if (inFile != null) {
 read(inFile);
 } else {
-read(new BufferedReader(new InputStreamReader(System.in)));
+read(new BufferedReader(new InputStreamReader(System.in, 
UTF_8)));
 }
 }
 }
@@ -198,9 +199,9 @@
  */
 private Writer openWriter(Path file) throws IOException {
 if (file != null) {
-return Files.newBufferedWriter(file);
+return Files.newBufferedWriter(file, UTF_8);
 } else {
-return new BufferedWriter(new OutputStreamWriter(System.out) {
+return new BufferedWriter(new OutputStreamWriter(System.out, 
UTF_8) {
 @Override
 public void close() throws IOException {
 flush();
@@ -615,7 +616,7 @@
  * @param file the file
  */
 void read(Path file) {
-try (Reader r = Files.newBufferedReader(file)) {
+try (Reader r = Files.newBufferedReader(file, UTF_8)) {
 this.file = file;
 read(r);
 } catch (IOException e) {



Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-12-03 Thread Jonathan Gibbons

Hi Dan,

I think it's a combination of oral tradition and long-standing precedent.

Earlier this year, I raised this general issue, partly because of 
inconsistent use of -encoding in the build system.  The response was 
that there was some concern that not all tools in the tool chain could 
handle UTF-8 files.


$ find open/make -name \*.gmk | xargs grep -o -e '-encoding [^ ]*'
open/make/Docs.gmk:-encoding ISO-8859-1
open/make/Docs.gmk:-encoding ISO-8859-1
open/make/common/SetupJavaCompilers.gmk:-encoding ascii
open/make/common/SetupJavaCompilers.gmk:-encoding ascii

I think we should be consistent, but (at the time) it did not seem worth 
pushing for UTF-8 everywhere.


-- Jon


On 11/27/2019 07:23 PM, Dan Smith wrote:

For the other files, it seems strange to force the use of a charset
which is different from the charset of record for all our source files
(i.e. US-ASCII).

Can you clarify where this "charset of record" rule comes from? Is this written 
down somewhere, or more of an oral tradition?

The non-ASCII characters I'm working with are, in fact, in the original 
Markdown sources. If it's really important to avoid those in all sources, I 
could (reluctantly) use a different strategy.

If the consensus is that the build tools should standardize on US-ASCII, I 
guess there's a separate question about whether we're willing to rely on the 
implicit platform default (now uniformly US-ASCII via command-line args), or 
whether it's better to be explicit about it (s/UTF_8/US_ASCII/ in my changeset).




Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-11-27 Thread Dan Smith
> For the other files, it seems strange to force the use of a charset 
> which is different from the charset of record for all our source files 
> (i.e. US-ASCII).

Can you clarify where this "charset of record" rule comes from? Is this written 
down somewhere, or more of an oral tradition?

The non-ASCII characters I'm working with are, in fact, in the original 
Markdown sources. If it's really important to avoid those in all sources, I 
could (reluctantly) use a different strategy.

If the consensus is that the build tools should standardize on US-ASCII, I 
guess there's a separate question about whether we're willing to rely on the 
implicit platform default (now uniformly US-ASCII via command-line args), or 
whether it's better to be explicit about it (s/UTF_8/US_ASCII/ in my changeset).

Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-11-27 Thread Jonathan Gibbons

Martin,

Agreed, but this is not the email-thread to have that discussion on 
behalf of all OpenJDK.


-- Jon


On 11/27/2019 10:18 AM, Martin Buchholz wrote:

The ubiquitous use of UTF-8 is one of the few clear successes in the
software world.
In 2019, most software projects should declare that all their source files
are encoded in UTF-8, not US-ASCII.

On Wed, Nov 27, 2019 at 9:04 AM Dan Smith  wrote:


Please review this patch to make explicit use of the UTF-8 charset in
build tools' IO code.

JDK-8065704 changed the platform default to US-ASCII, so the intended
effect of this change is to address a regression and restore the typical
earlier behavior.

My particular interest is in fixuppandoc, but it seems like we might has
well patch all of this code to avoid relying on the platform default.

http://cr.openjdk.java.net/~dlsmith/8234835/




Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-11-27 Thread Martin Buchholz
The ubiquitous use of UTF-8 is one of the few clear successes in the
software world.
In 2019, most software projects should declare that all their source files
are encoded in UTF-8, not US-ASCII.

On Wed, Nov 27, 2019 at 9:04 AM Dan Smith  wrote:

> Please review this patch to make explicit use of the UTF-8 charset in
> build tools' IO code.
>
> JDK-8065704 changed the platform default to US-ASCII, so the intended
> effect of this change is to address a regression and restore the typical
> earlier behavior.
>
> My particular interest is in fixuppandoc, but it seems like we might has
> well patch all of this code to avoid relying on the platform default.
>
> http://cr.openjdk.java.net/~dlsmith/8234835/


Re: RFR: 8234835 Use UTF-8 charset in make support Java code

2019-11-27 Thread Jonathan Gibbons
fixuppandoc is notable because it reads generated files .. which 
presumably contain non-ASCII UTF-8 characters.


For the other files, it seems strange to force the use of a charset 
which is different from the charset of record for all our source files 
(i.e. US-ASCII).  I'm not saying that UTF-8 isn't a good choice, but it 
seems questionable to be inconsistent.


--Jon

On 11/27/19 9:04 AM, Dan Smith wrote:

Please review this patch to make explicit use of the UTF-8 charset in build 
tools' IO code.

JDK-8065704 changed the platform default to US-ASCII, so the intended effect of 
this change is to address a regression and restore the typical earlier behavior.

My particular interest is in fixuppandoc, but it seems like we might has well 
patch all of this code to avoid relying on the platform default.

http://cr.openjdk.java.net/~dlsmith/8234835/


RFR: 8234835 Use UTF-8 charset in make support Java code

2019-11-27 Thread Dan Smith
Please review this patch to make explicit use of the UTF-8 charset in build 
tools' IO code.

JDK-8065704 changed the platform default to US-ASCII, so the intended effect of 
this change is to address a regression and restore the typical earlier behavior.

My particular interest is in fixuppandoc, but it seems like we might has well 
patch all of this code to avoid relying on the platform default.

http://cr.openjdk.java.net/~dlsmith/8234835/