Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-17 Thread Ichiroh Takiguchi
On Mon, 2 May 2022 15:00:07 GMT, Roger Riggs  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> Can you clarify the use of different charset properties for the 
> ProcessEnvironment vs the CommandLine strings?
> 
> They are used to decode or encode strings in the APIs to native functions 
> respectively.
> If I understand `native.encoding` was introduced to reflect the default 
> encoding of file contents, while `sun.jnu.encoding` is used to encode 
> filenames.
> 
> I think I would have expected to see `sun.jnu.encoding` to be used in all 
> contexts when encoding/decoding strings to the native representations.  
> (Assuming its not required for backward compatibility).

Hello @RogerRiggs .
Do you have any comment ?

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-13 Thread Ichiroh Takiguchi
On Mon, 2 May 2022 15:00:07 GMT, Roger Riggs  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> Can you clarify the use of different charset properties for the 
> ProcessEnvironment vs the CommandLine strings?
> 
> They are used to decode or encode strings in the APIs to native functions 
> respectively.
> If I understand `native.encoding` was introduced to reflect the default 
> encoding of file contents, while `sun.jnu.encoding` is used to encode 
> filenames.
> 
> I think I would have expected to see `sun.jnu.encoding` to be used in all 
> contexts when encoding/decoding strings to the native representations.  
> (Assuming its not required for backward compatibility).

Hello @RogerRiggs and @naotoj .
I modified `ProcessEnvironment.java`and `ProcessImpl.java`.
`i18nEnvArg.java` checks return code instead of string detection.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-07 Thread Ichiroh Takiguchi
On Mon, 2 May 2022 15:00:07 GMT, Roger Riggs  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> Can you clarify the use of different charset properties for the 
> ProcessEnvironment vs the CommandLine strings?
> 
> They are used to decode or encode strings in the APIs to native functions 
> respectively.
> If I understand `native.encoding` was introduced to reflect the default 
> encoding of file contents, while `sun.jnu.encoding` is used to encode 
> filenames.
> 
> I think I would have expected to see `sun.jnu.encoding` to be used in all 
> contexts when encoding/decoding strings to the native representations.  
> (Assuming its not required for backward compatibility).

Hello @RogerRiggs .
When I executed the new testcase with JDK18.0.1, I got following errors: 

ERROR: argument EUC_JP_TEXT is:
  Actual:   \u7FB2\u221A\uFFFD
  Expected: \u6F22\u5B57
ERROR: Variable EUC_JP_TEXT is encoded by:
  Actual:   \xE6\xBC\xA2\xE5\xAD\x97
  Expected: \xB4\xC1\xBB\xFA
ERROR: Value EUC_JP_TEXT is encoded by:
  Actual:   \xE6\xBC\xA2\xE5\xAD\x97
  Expected: \xB4\xC1\xBB\xFA

The new testcase verifies internal byte data for EUC_JP_TEXT environment 
variable and value.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-06 Thread Roger Riggs
On Wed, 4 May 2022 03:01:19 GMT, Ichiroh Takiguchi  
wrote:

>> Do we need to verify the intermediate byte encoding? Could we simply compare 
>> the text given to environ.put() in Start process and System.getenv() in 
>> Verify process match?
>
> Hello @naotoj .
> I think if 2nd process' encoder (like UTF-8) and 3rd process' decoder are 
> same encoding, System.getenv() returns expected data.
> Actually the testcase checks 3 parts on Verify process:
> 1. Expected environment variable is defined or not (it uses System.getenv())
> 2. Expected argument is received or not
> 3. Expected environment variable is encoded by proper encoding
> 
> When I ran this testcase with jdk18.0.1, I got following result:
> 
> Unexpected argument was received: \u6F22\u5B57<->\u7FB2\u221A\uFFFD
> Unexpected environment variables: 
> \xE6\xBC\xA2\xE5\xAD\x97\x3D\xE6\xBC\xA2\xE5\xAD\x97
> 
> It means 1st test was passed, 2nd and 3rd test were failed.
> I don't think environment variable issue can be seen without 3rd test.
> Please let me know if you find out another way.

This part of the test is very brittle; I'm pretty sure it will fail on AIX that 
adds its own environment variables.
It should not fail if it finds the two entries it expects. It should ignore 
other entries.

I don't see what value it has over checking the entries from System.getEnv(), 
please elaborate.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-06 Thread Ichiroh Takiguchi
On Mon, 2 May 2022 15:00:07 GMT, Roger Riggs  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> Can you clarify the use of different charset properties for the 
> ProcessEnvironment vs the CommandLine strings?
> 
> They are used to decode or encode strings in the APIs to native functions 
> respectively.
> If I understand `native.encoding` was introduced to reflect the default 
> encoding of file contents, while `sun.jnu.encoding` is used to encode 
> filenames.
> 
> I think I would have expected to see `sun.jnu.encoding` to be used in all 
> contexts when encoding/decoding strings to the native representations.  
> (Assuming its not required for backward compatibility).

Hello @RogerRiggs and @naotoj .
I put sun.jnu.encoding related code into s`StaticProperty.java`.
Could you review the files again including javadoc comment ?

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-03 Thread Ichiroh Takiguchi
On Tue, 3 May 2022 18:35:33 GMT, Naoto Sato  wrote:

>> I think 3 processes required for this testing.
>> 1. Start test process on ja_JP.eucjp locale
>> 2. Set the test data by encoder
>> 3. Verify the encoded data by decoder
>> 
>> To verify the encoded data, I think I need to check the byte data.
>
> Do we need to verify the intermediate byte encoding? Could we simply compare 
> the text given to environ.put() in Start process and System.getenv() in 
> Verify process match?

Hello @naotoj .
I think if 2nd process' encoder (like UTF-8) and 3rd process' decoder are same 
encoding, System.getenv() returns expected data.
Actually the testcase checks 3 parts on Verify process:
1. Expected environment variable is defined or not (it uses System.getenv())
2. Expected argument is received or not
3. Expected environment variable is encoded by proper encoding

When I ran this testcase with jdk18.0.1, I got following result:

Unexpected argument was received: \u6F22\u5B57<->\u7FB2\u221A\uFFFD
Unexpected environment variables: 
\xE6\xBC\xA2\xE5\xAD\x97\x3D\xE6\xBC\xA2\xE5\xAD\x97

It means 1st test was passed, 2nd and 3rd test were failed.
I don't think environment variable issue can be seen without 3rd test.
Please let me know if you find out another way.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-03 Thread Naoto Sato
On Tue, 3 May 2022 17:30:34 GMT, Ichiroh Takiguchi  
wrote:

>> test/jdk/java/lang/System/i18nEnvArg.java line 63:
>> 
>>> 61: }
>>> 62: 
>>> 63: public static class Start {
>> 
>> Do we need `Start` class? Can it be merged with the parent to reduce the 
>> complexity?
>
> I'm not sure, it expected one...
> 
> I created i18nEnvArg.sh for start-up, then add `@modules jdk.charsets` and 
> remove `i18nEnvArg$Start` class.

Sorry if my comment was unclear. I meant to not create a shell script (shell 
scripts in the test are discouraged), but to re-use the main class twice, 
invoking with different arguments to main(), as both creating a testJvm process 
(one for eucjp jvm, and the other for Start).

>> test/jdk/java/lang/System/i18nEnvArg.java line 130:
>> 
>>> 128: environ_mid.setAccessible(true);
>>> 129: byte[][] environ = (byte[][]) environ_mid.invoke(null,
>>> 130: (Object[])null);
>> 
>> I am not sure I like this piece, invoking a package private method of 
>> `ProcessEnvironment` class. Can we simply verify the result of 
>> `System.getenv(EUC_JP_TEXT)`?
>
> I think 3 processes required for this testing.
> 1. Start test process on ja_JP.eucjp locale
> 2. Set the test data by encoder
> 3. Verify the encoded data by decoder
> 
> To verify the encoded data, I think I need to check the byte data.

Do we need to verify the intermediate byte encoding? Could we simply compare 
the text given to environ.put() in Start process and System.getenv() in Verify 
process match?

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-03 Thread Ichiroh Takiguchi
On Mon, 2 May 2022 20:50:11 GMT, Naoto Sato  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> test/jdk/java/lang/System/i18nEnvArg.java line 63:
> 
>> 61: }
>> 62: 
>> 63: public static class Start {
> 
> Do we need `Start` class? Can it be merged with the parent to reduce the 
> complexity?

I'm not sure, it expected one...

I created i18nEnvArg.sh for start-up, then add `@modules jdk.charsets` and 
remove `i18nEnvArg$Start` class.

> test/jdk/java/lang/System/i18nEnvArg.java line 87:
> 
>> 85: System.getProperty("java.class.path"),
>> 86: "i18nEnvArg$Verify",
>> 87: EUC_JP_TEXT);
> 
> Can utilize `ProcessTools` class.

Use `ProcessTools`

> test/jdk/java/lang/System/i18nEnvArg.java line 130:
> 
>> 128: environ_mid.setAccessible(true);
>> 129: byte[][] environ = (byte[][]) environ_mid.invoke(null,
>> 130: (Object[])null);
> 
> I am not sure I like this piece, invoking a package private method of 
> `ProcessEnvironment` class. Can we simply verify the result of 
> `System.getenv(EUC_JP_TEXT)`?

I think 3 processes required for this testing.
1. Start test process on ja_JP.eucjp locale
2. Set the test data by encoder
3. Verify the encoded data by decoder

To verify the encoded data, I think I need to check the byte data.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-03 Thread Ichiroh Takiguchi
On Mon, 2 May 2022 15:00:07 GMT, Roger Riggs  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> Can you clarify the use of different charset properties for the 
> ProcessEnvironment vs the CommandLine strings?
> 
> They are used to decode or encode strings in the APIs to native functions 
> respectively.
> If I understand `native.encoding` was introduced to reflect the default 
> encoding of file contents, while `sun.jnu.encoding` is used to encode 
> filenames.
> 
> I think I would have expected to see `sun.jnu.encoding` to be used in all 
> contexts when encoding/decoding strings to the native representations.  
> (Assuming its not required for backward compatibility).

Hello @RogerRiggs  and @naotoj .

I put new testcase.
I'm not sure, it's expected one...

During my code changes, SecurityManager related testcases were failed.
I'm not sure I can use `System.getProperty("sun.jnu.encoding")` or not. 
So I put SecurityManager related code.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-02 Thread Naoto Sato
On Sun, 1 May 2022 04:51:17 GMT, Ichiroh Takiguchi  
wrote:

>> On JDK19 with Linux ja_JP.eucjp locale,
>> System.getenv() returns unexpected value if environment variable has 
>> Japanese EUC characters.
>> It seems this issue happens because of JEP 400.
>> Arguments for ProcessBuilder have same kind of issue.
>
> Ichiroh Takiguchi has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   8285517: System.getenv() returns unexpected value if environment variable 
> has non ASCII character

test/jdk/java/lang/System/i18nEnvArg.java line 63:

> 61: }
> 62: 
> 63: public static class Start {

Do we need `Start` class? Can it be merged with the parent to reduce the 
complexity?

test/jdk/java/lang/System/i18nEnvArg.java line 87:

> 85: System.getProperty("java.class.path"),
> 86: "i18nEnvArg$Verify",
> 87: EUC_JP_TEXT);

Can utilize `ProcessTools` class.

test/jdk/java/lang/System/i18nEnvArg.java line 130:

> 128: environ_mid.setAccessible(true);
> 129: byte[][] environ = (byte[][]) environ_mid.invoke(null,
> 130: (Object[])null);

I am not sure I like this piece, invoking a package private method of 
`ProcessEnvironment` class. Can we simply verify the result of 
`System.getenv(EUC_JP_TEXT)`?

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-02 Thread Naoto Sato
On Sun, 1 May 2022 04:47:09 GMT, Ichiroh Takiguchi  
wrote:

>> test/jdk/java/lang/System/i18nEnvArg.java line 26:
>> 
>>> 24: /*
>>> 25:  * @test
>>> 26:  * @bug 8285517
>> 
>> If the test should work only on a particular env, should describe here and 
>> add `@requires`  tag.
>
> Add "@requires (os.family == "linux")"

Also `@modules jdk.charsets` should be added so that the test should be 
immediately ignored if it does not have `EUC-JP` charset.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-02 Thread Ichiroh Takiguchi
On Mon, 2 May 2022 15:00:07 GMT, Roger Riggs  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> Can you clarify the use of different charset properties for the 
> ProcessEnvironment vs the CommandLine strings?
> 
> They are used to decode or encode strings in the APIs to native functions 
> respectively.
> If I understand `native.encoding` was introduced to reflect the default 
> encoding of file contents, while `sun.jnu.encoding` is used to encode 
> filenames.
> 
> I think I would have expected to see `sun.jnu.encoding` to be used in all 
> contexts when encoding/decoding strings to the native representations.  
> (Assuming its not required for backward compatibility).

Hello @RogerRiggs .
You are right. Both case, `sun.jnu.encoding` should be used.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-05-02 Thread Roger Riggs
On Sun, 1 May 2022 04:51:17 GMT, Ichiroh Takiguchi  
wrote:

>> On JDK19 with Linux ja_JP.eucjp locale,
>> System.getenv() returns unexpected value if environment variable has 
>> Japanese EUC characters.
>> It seems this issue happens because of JEP 400.
>> Arguments for ProcessBuilder have same kind of issue.
>
> Ichiroh Takiguchi has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   8285517: System.getenv() returns unexpected value if environment variable 
> has non ASCII character

Can you clarify the use of different charset properties for the 
ProcessEnvironment vs the CommandLine strings?

They are used to decode or encode strings in the APIs to native functions 
respectively.
If I understand `native.encoding` was introduced to reflect the default 
encoding of file contents, while `sun.jnu.encoding` is used to encode filenames.

I think I would have expected to see `sun.jnu.encoding` to be used in all 
contexts when encoding/decoding strings to the native representations.  
(Assuming its not required for backward compatibility).

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-04-30 Thread Ichiroh Takiguchi
On Wed, 27 Apr 2022 17:20:11 GMT, Roger Riggs  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> src/java.base/unix/classes/java/lang/ProcessEnvironment.java line 68:
> 
>> 66: private static final Map theUnmodifiableEnvironment;
>> 67: static final int MIN_NAME_LENGTH = 0;
>> 68: static final Charset cs = 
>> Charset.forName(StaticProperty.nativeEncoding(),
> 
> cs should have an all-CAPS name to make it clear its a static value 
> throughout the implementation.
> And `private` too.

Change to "private static final"

> test/jdk/java/lang/System/i18nEnvArg.java line 41:
> 
>> 39: 
>> 40: public class i18nEnvArg {
>> 41: final static String text = "\u6F22\u5B57";
> 
> Can this be renamed to indicate it is the string under test?  like KANJI_TEXT 
> or EUC_JP_TEXT or similar.

Use EUC_JP_TEXT

> test/jdk/java/lang/System/i18nEnvArg.java line 49:
> 
>> 47: final static int maxSize = 4096;
>> 48: 
>> 49: public static void main(String[] args) throws Exception {
> 
> There are 3 main()'s in this test; it would be useful to add a comment 
> indicating the purpose of each.

Add comments

> test/jdk/java/lang/System/i18nEnvArg.java line 60:
> 
>> 58: Process p = pb.start();
>> 59: InputStream is = p.getInputStream();
>> 60: byte[] ba = new byte[maxSize];
> 
> You can use `InputStream.readAllBytes()` here to return a byte buffer.
> 
> Its not clear why all the bytes are read?  I assume its only for a possible 
> error from the child.

Use InputStream.readAllBytes()

> test/jdk/java/lang/System/i18nEnvArg.java line 128:
> 
>> 126: Method enviorn_mid = cls.getDeclaredMethod("environ");
>> 127: enviorn_mid.setAccessible(true);
>> 128: byte[][] environ = (byte[][]) enviorn_mid.invoke(null,
> 
> typo: "enviorn_mid" -> "environ_mid".

Fix typo

> test/jdk/java/lang/System/i18nEnvArg.java line 138:
> 
>> 136: bb.put(environ[i+1]);
>> 137: byte[] envb = Arrays.copyOf(ba, bb.position());
>> 138: if (Arrays.equals(kanji, envb)) return;
> 
> It seems odd to exit the loop on the first match.
> 
> I think AIX always has environment variables not set by the caller.

On this testing, use 2 environment variables:
* LANG=ja_JP.eucjp
* \u6F22\u5B57=\u6F22\u5B57

If the other environment variable is defined, it's printed.

> test/jdk/java/lang/System/i18nEnvArg.java line 146:
> 
>> 144: sb.append((char)b);
>> 145: } else {
>> 146: sb.append(String.format("\\x%02X", (int)b & 
>> 0xFF));
> 
> The methods of java.util.HexFormat may be useful here.
> It seems that the "5C" would fit into this "else" clause and not need a 
> separate statement.

Use HexFormat class

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-04-30 Thread Ichiroh Takiguchi
On Tue, 26 Apr 2022 21:17:34 GMT, Naoto Sato  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8285517: System.getenv() returns unexpected value if environment variable 
>> has non ASCII character
>
> Thanks for fixing the issue.
> As to the test, it has lots of redundancy, and too complicated to me. Can you 
> simplify the test?

Hello @naotoj and @RogerRiggs .
I appreciate your comments.
I applied the changes.
Additionally, I put bugid into test/jdk/java/lang/ProcessBuilder/Basic.java

> src/java.base/unix/classes/java/lang/ProcessImpl.java line 146:
> 
>> 144: private static final LaunchMechanism launchMechanism = 
>> platform.launchMechanism();
>> 145: private static final byte[] helperpath = 
>> toCString(StaticProperty.javaHome() + "/lib/jspawnhelper");
>> 146: private static Charset jnuCharset = null;
> 
> Can we simply call `jnuCharset()` here?

Change jnuCharset to "private static".

> test/jdk/java/lang/ProcessBuilder/Basic.java line 605:
> 
>> 603: String fileEncoding = System.getProperty("file.encoding");
>> 604: Charset cs;
>> 605: if (nativeEncoding != null && 
>> !nativeEncoding.equals(fileEncoding)) {
> 
> What's the reason for comparing `file.encoding`? Does 
> `Charset.forName(nativeEncoding, Charset.defaultCharset())` suffice?

Remove file.encoding related code.

> test/jdk/java/lang/System/i18nEnvArg.java line 26:
> 
>> 24: /*
>> 25:  * @test
>> 26:  * @bug 8285517
> 
> If the test should work only on a particular env, should describe here and 
> add `@requires`  tag.

Add "@requires (os.family == "linux")"

> test/jdk/java/lang/System/i18nEnvArg.java line 50:
> 
>> 48: 
>> 49: public static void main(String[] args) throws Exception {
>> 50: ProcessBuilder pb = new ProcessBuilder(javeExe,
> 
> Can utilize `jdk.test.lib.process.ProcessTools`.

Use ProcessTools class.

-

PR: https://git.openjdk.java.net/jdk/pull/8378


Re: RFR: 8285517: System.getenv() returns unexpected value if environment variable has non ASCII character [v2]

2022-04-30 Thread Ichiroh Takiguchi
> On JDK19 with Linux ja_JP.eucjp locale,
> System.getenv() returns unexpected value if environment variable has Japanese 
> EUC characters.
> It seems this issue happens because of JEP 400.
> Arguments for ProcessBuilder have same kind of issue.

Ichiroh Takiguchi has updated the pull request incrementally with one 
additional commit since the last revision:

  8285517: System.getenv() returns unexpected value if environment variable has 
non ASCII character

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8378/files
  - new: https://git.openjdk.java.net/jdk/pull/8378/files/89ee195a..050dd663

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8378=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8378=00-01

  Stats: 106 lines in 4 files changed: 26 ins; 41 del; 39 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8378.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8378/head:pull/8378

PR: https://git.openjdk.java.net/jdk/pull/8378