Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-23 Thread Raffaello Giulietti

Thanks for pushing changeset 865b5ca81009.
Raffaello



On 2020-07-21 19:18, Lance Andersen wrote:

Hi Roger,

I will plan to push tomorrow morning pending any last minute reviews

Best
Lance

On Jul 21, 2020, at 9:57 AM, Roger Riggs > wrote:


Hi Rafaello, Lance,

That looks good to me.

Thanks, Roger

On 7/19/20 2:31 PM, Lance Andersen wrote:

Hi Raffaello,

I think the changes look reasonable.

I have created a webrev, 
http://cr.openjdk.java.net/~lancea/8222187/webrev.00/, for others to 
review as it is a bit easier than the patch.


I have also run the JCK tests in this area as well as mach5 tiers1-3 
(which I believe Roger has also)


Best
Lance

On Jul 15, 2020, at 5:44 PM, Raffaello Giulietti 
> wrote:


Hi Roger,

On 2020-07-15 22:21, Roger Riggs wrote:

Hi Raffaello,
Base64.java:
2: Please update 2nd copyright year to 2020


I was unsure what to do here, thanks for guidance.
I also removed the seemingly useless import at former L33.




leftovers():
996: "&" the shortcutting && is more typical in a condition.
997: the -= is buried in an expression and a reader might miss it.


Corrected, as well as the analogous -= usage for wpos now at L1106-7


1053:  "can be reallocated to other vars":  not a useful comment, 
reflecting on only one possible implementation that is out of the 
control of the developer.
I'd almost rather see 'len' than 'limit - off'; it might be 
informative to the reader if 'limit' was declared final. (1056:)


Done, as well as declaring i and v final in the loop.




TestBase54.java:
2: Update 2nd copyright year to 2020
27:  Please add the bug number to the @bug line.
Style-wise, I would remove the blank lines at the beginning of 
blocks. (1095, 1115)


Done.



Thanks, Roger
On 7/14/20 11:47 AM, Raffaello Giulietti wrote:

Hi Roger,

here's the latest version of the patch.

I did:
* Withdraw the simplification in encodedOutLength(), as it is 
indeed out of scope for this bug fix.
* Restore field names in DecInputStream except for nextin (now 
wpos) and nextout (now rpos) because they have slightly different 
semantics. That's just for clarity. I would have nothing against 
reusing the old names for the proposed purpose.

* Switch back to the original no-arg read() implementation.
* Adjust comment continuation lines.
* Preserve the proposed logic of read(byte[], int, int) and the 
supporting methods.


Others needed changes are:
* Correct the copyright years: that's something better left to Oracle.
* Remove the apparently useless import at L33. I could build and 
run without it.



Greetings
Raffaello






# HG changeset patch
# User lello
# Date 1594848775 -7200
#  Wed Jul 15 23:32:55 2020 +0200
# Node ID f7407f35e83ab508f0e6deab4f776bb1a1c85e68
# Parent  a5649ebf94193770ca763391dd63807059d333b4
8222187: java.util.Base64.Decoder stream adds unexpected null bytes 
at the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti >


diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2012, 2019, Oracle and/or its affiliates. All 
rights reserved.
+ * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All 
rights reserved.

 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
@@ -30,7 +30,6 @@
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
-import java.nio.charset.StandardCharsets;

import sun.nio.cs.ISO_8859_1;

@@ -961,12 +960,15 @@

private final InputStream is;
private final boolean isMIME;
-    private final int[] base64;  // base64 -> byte mapping
-    private int bits = 0;    // 24-bit buffer for decoding
-    private int nextin = 18; // next available "off" in 
"bits" for input;

- // -> 18, 12, 6, 0
-    private int nextout = -8;    // next available "off" in 
"bits" for output;
- // -> 8, 0, -8 (no byte 
for output)

+    private final int[] base64; // base64 -> byte mapping
+    private int bits = 0;   // 24-bit buffer for decoding
+
+    /* writing bit pos inside bits; one of 24 (left, msb), 18, 
12, 6, 0 */

+    private int wpos = 0;
+
+    /* reading bit pos inside bits: one of 24 (left, msb), 16, 
8, 0 */

+    private int rpos = 0;
+
private boolean eof = false;
private boolean closed = false;

@@ -983,107 +985,153 @@
return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
}

-    private int eof(byte[] b, int off, int len, int oldOff)
-    throws IOException
-    {
+    private int 

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-21 Thread Lance Andersen
Hi Roger,

I will plan to push tomorrow morning pending any last minute reviews

Best
Lance

> On Jul 21, 2020, at 9:57 AM, Roger Riggs  wrote:
> 
> Hi Rafaello, Lance,
> 
> That looks good to me.  
> 
> Thanks, Roger
> 
> On 7/19/20 2:31 PM, Lance Andersen wrote:
>> Hi Raffaello,
>> 
>> I think the changes look reasonable.
>> 
>> I have created a webrev, 
>> http://cr.openjdk.java.net/~lancea/8222187/webrev.00/ 
>> , for others to 
>> review as it is a bit easier than the patch.
>> 
>> I have also run the JCK tests in this area as well as mach5 tiers1-3 (which 
>> I believe Roger has also)
>> 
>> Best
>> Lance
>> 
>>> On Jul 15, 2020, at 5:44 PM, Raffaello Giulietti 
>>> mailto:raffaello.giulie...@gmail.com>> 
>>> wrote:
>>> 
>>> Hi Roger,
>>> 
>>> On 2020-07-15 22:21, Roger Riggs wrote:
 Hi Raffaello,
 Base64.java:
 2: Please update 2nd copyright year to 2020
>>> 
>>> I was unsure what to do here, thanks for guidance.
>>> I also removed the seemingly useless import at former L33.
>>> 
>>> 
>>> 
 leftovers():
 996: "&" the shortcutting && is more typical in a condition.
 997: the -= is buried in an expression and a reader might miss it.
>>> 
>>> Corrected, as well as the analogous -= usage for wpos now at L1106-7
>>> 
>>> 
 1053:  "can be reallocated to other vars":  not a useful comment, 
 reflecting on only one possible implementation that is out of the control 
 of the developer.
 I'd almost rather see 'len' than 'limit - off'; it might be informative to 
 the reader if 'limit' was declared final. (1056:)
>>> 
>>> Done, as well as declaring i and v final in the loop.
>>> 
>>> 
>>> 
 TestBase54.java:
 2: Update 2nd copyright year to 2020
 27:  Please add the bug number to the @bug line.
 Style-wise, I would remove the blank lines at the beginning of blocks. 
 (1095, 1115)
>>> 
>>> Done.
>>> 
>>> 
 Thanks, Roger
 On 7/14/20 11:47 AM, Raffaello Giulietti wrote:
> Hi Roger,
> 
> here's the latest version of the patch.
> 
> I did:
> * Withdraw the simplification in encodedOutLength(), as it is indeed out 
> of scope for this bug fix.
> * Restore field names in DecInputStream except for nextin (now wpos) and 
> nextout (now rpos) because they have slightly different semantics. That's 
> just for clarity. I would have nothing against reusing the old names for 
> the proposed purpose.
> * Switch back to the original no-arg read() implementation.
> * Adjust comment continuation lines.
> * Preserve the proposed logic of read(byte[], int, int) and the 
> supporting methods.
> 
> Others needed changes are:
> * Correct the copyright years: that's something better left to Oracle.
> * Remove the apparently useless import at L33. I could build and run 
> without it.
> 
> 
> Greetings
> Raffaello
> 
>>> 
>>> 
>>> 
>>> 
>>> # HG changeset patch
>>> # User lello
>>> # Date 1594848775 -7200
>>> #  Wed Jul 15 23:32:55 2020 +0200
>>> # Node ID f7407f35e83ab508f0e6deab4f776bb1a1c85e68
>>> # Parent  a5649ebf94193770ca763391dd63807059d333b4
>>> 8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the 
>>> end
>>> Reviewed-by: TBD
>>> Contributed-by: Raffaello Giulietti >> >
>>> 
>>> diff --git a/src/java.base/share/classes/java/util/Base64.java 
>>> b/src/java.base/share/classes/java/util/Base64.java
>>> --- a/src/java.base/share/classes/java/util/Base64.java
>>> +++ b/src/java.base/share/classes/java/util/Base64.java
>>> @@ -1,5 +1,5 @@
>>> /*
>>> - * Copyright (c) 2012, 2019, Oracle and/or its affiliates. All rights 
>>> reserved.
>>> + * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All rights 
>>> reserved.
>>>  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>  *
>>>  * This code is free software; you can redistribute it and/or modify it
>>> @@ -30,7 +30,6 @@
>>> import java.io.IOException;
>>> import java.io.OutputStream;
>>> import java.nio.ByteBuffer;
>>> -import java.nio.charset.StandardCharsets;
>>> 
>>> import sun.nio.cs.ISO_8859_1;
>>> 
>>> @@ -961,12 +960,15 @@
>>> 
>>> private final InputStream is;
>>> private final boolean isMIME;
>>> -private final int[] base64;  // base64 -> byte mapping
>>> -private int bits = 0;// 24-bit buffer for decoding
>>> -private int nextin = 18; // next available "off" in "bits" 
>>> for input;
>>> - // -> 18, 12, 6, 0
>>> -private int nextout = -8;// next available "off" in "bits" 
>>> for output;
>>> - // -> 8, 0, -8 (no byte for 
>>> output)
>>> +private final int[] base64; // base64 -> byte mapping
>>> +private int bits = 0;   // 24-bit buffer for decoding
>>> +
>>> +/* 

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-21 Thread Roger Riggs

Hi Rafaello, Lance,

That looks good to me.

Thanks, Roger

On 7/19/20 2:31 PM, Lance Andersen wrote:

Hi Raffaello,

I think the changes look reasonable.

I have created a webrev, 
http://cr.openjdk.java.net/~lancea/8222187/webrev.00/, for others to 
review as it is a bit easier than the patch.


I have also run the JCK tests in this area as well as mach5 tiers1-3 
(which I believe Roger has also)


Best
Lance

On Jul 15, 2020, at 5:44 PM, Raffaello Giulietti 
> wrote:


Hi Roger,

On 2020-07-15 22:21, Roger Riggs wrote:

Hi Raffaello,
Base64.java:
2: Please update 2nd copyright year to 2020


I was unsure what to do here, thanks for guidance.
I also removed the seemingly useless import at former L33.




leftovers():
996: "&" the shortcutting && is more typical in a condition.
997: the -= is buried in an expression and a reader might miss it.


Corrected, as well as the analogous -= usage for wpos now at L1106-7


1053:  "can be reallocated to other vars":  not a useful comment, 
reflecting on only one possible implementation that is out of the 
control of the developer.
I'd almost rather see 'len' than 'limit - off'; it might be 
informative to the reader if 'limit' was declared final. (1056:)


Done, as well as declaring i and v final in the loop.




TestBase54.java:
2: Update 2nd copyright year to 2020
27:  Please add the bug number to the @bug line.
Style-wise, I would remove the blank lines at the beginning of 
blocks. (1095, 1115)


Done.



Thanks, Roger
On 7/14/20 11:47 AM, Raffaello Giulietti wrote:

Hi Roger,

here's the latest version of the patch.

I did:
* Withdraw the simplification in encodedOutLength(), as it is 
indeed out of scope for this bug fix.
* Restore field names in DecInputStream except for nextin (now 
wpos) and nextout (now rpos) because they have slightly different 
semantics. That's just for clarity. I would have nothing against 
reusing the old names for the proposed purpose.

* Switch back to the original no-arg read() implementation.
* Adjust comment continuation lines.
* Preserve the proposed logic of read(byte[], int, int) and the 
supporting methods.


Others needed changes are:
* Correct the copyright years: that's something better left to Oracle.
* Remove the apparently useless import at L33. I could build and 
run without it.



Greetings
Raffaello






# HG changeset patch
# User lello
# Date 1594848775 -7200
#  Wed Jul 15 23:32:55 2020 +0200
# Node ID f7407f35e83ab508f0e6deab4f776bb1a1c85e68
# Parent  a5649ebf94193770ca763391dd63807059d333b4
8222187: java.util.Base64.Decoder stream adds unexpected null bytes 
at the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti >


diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2012, 2019, Oracle and/or its affiliates. All 
rights reserved.
+ * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All 
rights reserved.

 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
@@ -30,7 +30,6 @@
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
-import java.nio.charset.StandardCharsets;

import sun.nio.cs.ISO_8859_1;

@@ -961,12 +960,15 @@

private final InputStream is;
private final boolean isMIME;
-    private final int[] base64;  // base64 -> byte mapping
-    private int bits = 0;    // 24-bit buffer for decoding
-    private int nextin = 18; // next available "off" in 
"bits" for input;

- // -> 18, 12, 6, 0
-    private int nextout = -8;    // next available "off" in 
"bits" for output;
- // -> 8, 0, -8 (no byte for 
output)

+    private final int[] base64; // base64 -> byte mapping
+    private int bits = 0;   // 24-bit buffer for decoding
+
+    /* writing bit pos inside bits; one of 24 (left, msb), 18, 
12, 6, 0 */

+    private int wpos = 0;
+
+    /* reading bit pos inside bits: one of 24 (left, msb), 16, 
8, 0 */

+    private int rpos = 0;
+
private boolean eof = false;
private boolean closed = false;

@@ -983,107 +985,153 @@
return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
}

-    private int eof(byte[] b, int off, int len, int oldOff)
-    throws IOException
-    {
+    private int leftovers(byte[] b, int off, int pos, int limit) {
eof = true;
-    if (nextin != 18) {
-    if (nextin == 12)
-    throw new IOException("Base64 stream has one 
un-decoded dangling byte.");

-    // treat ending xx/xxx 

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-19 Thread Lance Andersen
Hi Raffaello,

I think the changes look reasonable.

I have created a webrev, http://cr.openjdk.java.net/~lancea/8222187/webrev.00/ 
, for others to review 
as it is a bit easier than the patch.

I have also run the JCK tests in this area as well as mach5 tiers1-3 (which I 
believe Roger has also)

Best
Lance

> On Jul 15, 2020, at 5:44 PM, Raffaello Giulietti 
>  wrote:
> 
> Hi Roger,
> 
> On 2020-07-15 22:21, Roger Riggs wrote:
>> Hi Raffaello,
>> Base64.java:
>> 2: Please update 2nd copyright year to 2020
> 
> I was unsure what to do here, thanks for guidance.
> I also removed the seemingly useless import at former L33.
> 
> 
> 
>> leftovers():
>> 996: "&" the shortcutting && is more typical in a condition.
>> 997: the -= is buried in an expression and a reader might miss it.
> 
> Corrected, as well as the analogous -= usage for wpos now at L1106-7
> 
> 
>> 1053:  "can be reallocated to other vars":  not a useful comment, reflecting 
>> on only one possible implementation that is out of the control of the 
>> developer.
>> I'd almost rather see 'len' than 'limit - off'; it might be informative to 
>> the reader if 'limit' was declared final. (1056:)
> 
> Done, as well as declaring i and v final in the loop.
> 
> 
> 
>> TestBase54.java:
>> 2: Update 2nd copyright year to 2020
>> 27:  Please add the bug number to the @bug line.
>> Style-wise, I would remove the blank lines at the beginning of blocks. 
>> (1095, 1115)
> 
> Done.
> 
> 
>> Thanks, Roger
>> On 7/14/20 11:47 AM, Raffaello Giulietti wrote:
>>> Hi Roger,
>>> 
>>> here's the latest version of the patch.
>>> 
>>> I did:
>>> * Withdraw the simplification in encodedOutLength(), as it is indeed out of 
>>> scope for this bug fix.
>>> * Restore field names in DecInputStream except for nextin (now wpos) and 
>>> nextout (now rpos) because they have slightly different semantics. That's 
>>> just for clarity. I would have nothing against reusing the old names for 
>>> the proposed purpose.
>>> * Switch back to the original no-arg read() implementation.
>>> * Adjust comment continuation lines.
>>> * Preserve the proposed logic of read(byte[], int, int) and the supporting 
>>> methods.
>>> 
>>> Others needed changes are:
>>> * Correct the copyright years: that's something better left to Oracle.
>>> * Remove the apparently useless import at L33. I could build and run 
>>> without it.
>>> 
>>> 
>>> Greetings
>>> Raffaello
>>> 
> 
> 
> 
> 
> # HG changeset patch
> # User lello
> # Date 1594848775 -7200
> #  Wed Jul 15 23:32:55 2020 +0200
> # Node ID f7407f35e83ab508f0e6deab4f776bb1a1c85e68
> # Parent  a5649ebf94193770ca763391dd63807059d333b4
> 8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end
> Reviewed-by: TBD
> Contributed-by: Raffaello Giulietti  >
> 
> diff --git a/src/java.base/share/classes/java/util/Base64.java 
> b/src/java.base/share/classes/java/util/Base64.java
> --- a/src/java.base/share/classes/java/util/Base64.java
> +++ b/src/java.base/share/classes/java/util/Base64.java
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2012, 2019, Oracle and/or its affiliates. All rights 
> reserved.
> + * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All rights 
> reserved.
>  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>  *
>  * This code is free software; you can redistribute it and/or modify it
> @@ -30,7 +30,6 @@
> import java.io.IOException;
> import java.io.OutputStream;
> import java.nio.ByteBuffer;
> -import java.nio.charset.StandardCharsets;
> 
> import sun.nio.cs.ISO_8859_1;
> 
> @@ -961,12 +960,15 @@
> 
> private final InputStream is;
> private final boolean isMIME;
> -private final int[] base64;  // base64 -> byte mapping
> -private int bits = 0;// 24-bit buffer for decoding
> -private int nextin = 18; // next available "off" in "bits" 
> for input;
> - // -> 18, 12, 6, 0
> -private int nextout = -8;// next available "off" in "bits" 
> for output;
> - // -> 8, 0, -8 (no byte for output)
> +private final int[] base64; // base64 -> byte mapping
> +private int bits = 0;   // 24-bit buffer for decoding
> +
> +/* writing bit pos inside bits; one of 24 (left, msb), 18, 12, 6, 0 
> */
> +private int wpos = 0;
> +
> +/* reading bit pos inside bits: one of 24 (left, msb), 16, 8, 0 */
> +private int rpos = 0;
> +
> private boolean eof = false;
> private boolean closed = false;
> 
> @@ -983,107 +985,153 @@
> return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
> }
> 
> -private int eof(byte[] b, int off, int len, int oldOff)
> -throws IOException
> -{
> +private int leftovers(byte[] b, int off, int pos, int limit) {
>  

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-15 Thread Raffaello Giulietti

Hi Roger,

On 2020-07-15 22:21, Roger Riggs wrote:

Hi Raffaello,

Base64.java:
2: Please update 2nd copyright year to 2020



I was unsure what to do here, thanks for guidance.
I also removed the seemingly useless import at former L33.




leftovers():
996: "&" the shortcutting && is more typical in a condition.
997: the -= is buried in an expression and a reader might miss it.



Corrected, as well as the analogous -= usage for wpos now at L1106-7




1053:  "can be reallocated to other vars":  not a useful comment, 
reflecting on only one possible implementation that is out of the 
control of the developer.
I'd almost rather see 'len' than 'limit - off'; it might be informative 
to the reader if 'limit' was declared final. (1056:)




Done, as well as declaring i and v final in the loop.




TestBase54.java:

2: Update 2nd copyright year to 2020

27:  Please add the bug number to the @bug line.

Style-wise, I would remove the blank lines at the beginning of blocks. 
(1095, 1115)




Done.



Thanks, Roger


On 7/14/20 11:47 AM, Raffaello Giulietti wrote:

Hi Roger,

here's the latest version of the patch.

I did:
* Withdraw the simplification in encodedOutLength(), as it is indeed 
out of scope for this bug fix.
* Restore field names in DecInputStream except for nextin (now wpos) 
and nextout (now rpos) because they have slightly different semantics. 
That's just for clarity. I would have nothing against reusing the old 
names for the proposed purpose.

* Switch back to the original no-arg read() implementation.
* Adjust comment continuation lines.
* Preserve the proposed logic of read(byte[], int, int) and the 
supporting methods.


Others needed changes are:
* Correct the copyright years: that's something better left to Oracle.
* Remove the apparently useless import at L33. I could build and run 
without it.



Greetings
Raffaello






# HG changeset patch
# User lello
# Date 1594848775 -7200
#  Wed Jul 15 23:32:55 2020 +0200
# Node ID f7407f35e83ab508f0e6deab4f776bb1a1c85e68
# Parent  a5649ebf94193770ca763391dd63807059d333b4
8222187: java.util.Base64.Decoder stream adds unexpected null bytes at 
the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti 

diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2012, 2019, Oracle and/or its affiliates. All rights 
reserved.
+ * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All rights 
reserved.

  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -30,7 +30,6 @@
 import java.io.IOException;
 import java.io.OutputStream;
 import java.nio.ByteBuffer;
-import java.nio.charset.StandardCharsets;

 import sun.nio.cs.ISO_8859_1;

@@ -961,12 +960,15 @@

 private final InputStream is;
 private final boolean isMIME;
-private final int[] base64;  // base64 -> byte mapping
-private int bits = 0;// 24-bit buffer for decoding
-private int nextin = 18; // next available "off" in 
"bits" for input;

- // -> 18, 12, 6, 0
-private int nextout = -8;// next available "off" in 
"bits" for output;
- // -> 8, 0, -8 (no byte for 
output)

+private final int[] base64; // base64 -> byte mapping
+private int bits = 0;   // 24-bit buffer for decoding
+
+/* writing bit pos inside bits; one of 24 (left, msb), 18, 12, 
6, 0 */

+private int wpos = 0;
+
+/* reading bit pos inside bits: one of 24 (left, msb), 16, 8, 0 */
+private int rpos = 0;
+
 private boolean eof = false;
 private boolean closed = false;

@@ -983,107 +985,153 @@
 return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
 }

-private int eof(byte[] b, int off, int len, int oldOff)
-throws IOException
-{
+private int leftovers(byte[] b, int off, int pos, int limit) {
 eof = true;
-if (nextin != 18) {
-if (nextin == 12)
-throw new IOException("Base64 stream has one 
un-decoded dangling byte.");

-// treat ending xx/xxx without padding character legal.
-// same logic as v == '=' below
-b[off++] = (byte)(bits >> (16));
-if (nextin == 0) {   // only one padding byte
-if (len == 1) {  // no enough output space
-bits >>= 8;  // shift to lowest byte
-nextout = 0;
-} else {
-b[off++] = (byte) (bits >>  8);
-}
-}
+
+

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-15 Thread Roger Riggs

Hi Raffaello,

Base64.java:
2: Please update 2nd copyright year to 2020

leftovers():
996: "&" the shortcutting && is more typical in a condition.
997: the -= is buried in an expression and a reader might miss it.


1053:  "can be reallocated to other vars":  not a useful comment, 
reflecting on only one possible implementation that is out of the 
control of the developer.
I'd almost rather see 'len' than 'limit - off'; it might be informative 
to the reader if 'limit' was declared final. (1056:)


TestBase54.java:

2: Update 2nd copyright year to 2020

27:  Please add the bug number to the @bug line.

Style-wise, I would remove the blank lines at the beginning of blocks.  
(1095, 1115)


Thanks, Roger


On 7/14/20 11:47 AM, Raffaello Giulietti wrote:

Hi Roger,

here's the latest version of the patch.

I did:
* Withdraw the simplification in encodedOutLength(), as it is indeed 
out of scope for this bug fix.
* Restore field names in DecInputStream except for nextin (now wpos) 
and nextout (now rpos) because they have slightly different semantics. 
That's just for clarity. I would have nothing against reusing the old 
names for the proposed purpose.

* Switch back to the original no-arg read() implementation.
* Adjust comment continuation lines.
* Preserve the proposed logic of read(byte[], int, int) and the 
supporting methods.


Others needed changes are:
* Correct the copyright years: that's something better left to Oracle.
* Remove the apparently useless import at L33. I could build and run 
without it.



Greetings
Raffaello



# HG changeset patch
# User lello
# Date 1594741356 -7200
#  Tue Jul 14 17:42:36 2020 +0200
# Node ID 2bebe2aced4c3fa3b42b3c6ee445f9b7b0d20b5d
# Parent  a5649ebf94193770ca763391dd63807059d333b4
8222187: java.util.Base64.Decoder stream adds unexpected null bytes at 
the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti 

diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -961,12 +961,15 @@

 private final InputStream is;
 private final boolean isMIME;
-    private final int[] base64;  // base64 -> byte mapping
-    private int bits = 0;    // 24-bit buffer for decoding
-    private int nextin = 18; // next available "off" in 
"bits" for input;

- // -> 18, 12, 6, 0
-    private int nextout = -8;    // next available "off" in 
"bits" for output;
- // -> 8, 0, -8 (no byte for 
output)

+    private final int[] base64; // base64 -> byte mapping
+    private int bits = 0;   // 24-bit buffer for decoding
+
+    /* writing bit pos inside bits; one of 24 (left, msb), 18, 
12, 6, 0 */

+    private int wpos = 0;
+
+    /* reading bit pos inside bits: one of 24 (left, msb), 16, 8, 
0 */

+    private int rpos = 0;
+
 private boolean eof = false;
 private boolean closed = false;

@@ -983,107 +986,156 @@
 return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
 }

-    private int eof(byte[] b, int off, int len, int oldOff)
-    throws IOException
-    {
+    private int leftovers(byte[] b, int off, int pos, int limit) {
 eof = true;
-    if (nextin != 18) {
-    if (nextin == 12)
-    throw new IOException("Base64 stream has one 
un-decoded dangling byte.");

-    // treat ending xx/xxx without padding character legal.
-    // same logic as v == '=' below
-    b[off++] = (byte)(bits >> (16));
-    if (nextin == 0) {   // only one padding byte
-    if (len == 1) {  // no enough output space
-    bits >>= 8;  // shift to lowest byte
-    nextout = 0;
-    } else {
-    b[off++] = (byte) (bits >>  8);
-    }
-    }
+
+    /*
+ * We use a loop here, as this method is executed only a 
few times.
+ * Unrolling the loop would probably not contribute much 
here.

+ */
+    while (rpos - 8 >= wpos & pos != limit) {
+    b[pos++] = (byte) (bits >> (rpos -= 8));
 }
-    return off == oldOff ? -1 : off - oldOff;
+    return pos - off != 0 || rpos - 8 >= wpos ? pos - off : -1;
 }

-    private int padding(byte[] b, int off, int len, int oldOff)
-    throws IOException
-    {
-    // = shiftto==18 unnecessary padding
-    // x=    shiftto==12 dangling x, invalid unit
-    // xx=   shiftto==6 && missing last '='
-    // xx=y  or last is not '='
-    if (nextin == 18 || nextin == 12 ||
-

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-14 Thread Raffaello Giulietti

Hi Roger,

here's the latest version of the patch.

I did:
* Withdraw the simplification in encodedOutLength(), as it is indeed out 
of scope for this bug fix.
* Restore field names in DecInputStream except for nextin (now wpos) and 
nextout (now rpos) because they have slightly different semantics. 
That's just for clarity. I would have nothing against reusing the old 
names for the proposed purpose.

* Switch back to the original no-arg read() implementation.
* Adjust comment continuation lines.
* Preserve the proposed logic of read(byte[], int, int) and the 
supporting methods.


Others needed changes are:
* Correct the copyright years: that's something better left to Oracle.
* Remove the apparently useless import at L33. I could build and run 
without it.



Greetings
Raffaello



# HG changeset patch
# User lello
# Date 1594741356 -7200
#  Tue Jul 14 17:42:36 2020 +0200
# Node ID 2bebe2aced4c3fa3b42b3c6ee445f9b7b0d20b5d
# Parent  a5649ebf94193770ca763391dd63807059d333b4
8222187: java.util.Base64.Decoder stream adds unexpected null bytes at 
the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti 

diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -961,12 +961,15 @@

 private final InputStream is;
 private final boolean isMIME;
-private final int[] base64;  // base64 -> byte mapping
-private int bits = 0;// 24-bit buffer for decoding
-private int nextin = 18; // next available "off" in 
"bits" for input;

- // -> 18, 12, 6, 0
-private int nextout = -8;// next available "off" in 
"bits" for output;
- // -> 8, 0, -8 (no byte for 
output)

+private final int[] base64; // base64 -> byte mapping
+private int bits = 0;   // 24-bit buffer for decoding
+
+/* writing bit pos inside bits; one of 24 (left, msb), 18, 12, 
6, 0 */

+private int wpos = 0;
+
+/* reading bit pos inside bits: one of 24 (left, msb), 16, 8, 0 */
+private int rpos = 0;
+
 private boolean eof = false;
 private boolean closed = false;

@@ -983,107 +986,156 @@
 return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
 }

-private int eof(byte[] b, int off, int len, int oldOff)
-throws IOException
-{
+private int leftovers(byte[] b, int off, int pos, int limit) {
 eof = true;
-if (nextin != 18) {
-if (nextin == 12)
-throw new IOException("Base64 stream has one 
un-decoded dangling byte.");

-// treat ending xx/xxx without padding character legal.
-// same logic as v == '=' below
-b[off++] = (byte)(bits >> (16));
-if (nextin == 0) {   // only one padding byte
-if (len == 1) {  // no enough output space
-bits >>= 8;  // shift to lowest byte
-nextout = 0;
-} else {
-b[off++] = (byte) (bits >>  8);
-}
-}
+
+/*
+ * We use a loop here, as this method is executed only a 
few times.

+ * Unrolling the loop would probably not contribute much here.
+ */
+while (rpos - 8 >= wpos & pos != limit) {
+b[pos++] = (byte) (bits >> (rpos -= 8));
 }
-return off == oldOff ? -1 : off - oldOff;
+return pos - off != 0 || rpos - 8 >= wpos ? pos - off : -1;
 }

-private int padding(byte[] b, int off, int len, int oldOff)
-throws IOException
-{
-// = shiftto==18 unnecessary padding
-// x=shiftto==12 dangling x, invalid unit
-// xx=   shiftto==6 && missing last '='
-// xx=y  or last is not '='
-if (nextin == 18 || nextin == 12 ||
-nextin == 6 && is.read() != '=') {
-throw new IOException("Illegal base64 ending sequence:" 
+ nextin);
+private int eof(byte[] b, int off, int pos, int limit) throws 
IOException {

+/*
+ * pos != limit
+ *
+ * wpos == 18: x dangling single x, invalid unit
+ * accept ending xx or xxx without padding characters
+ */
+if (wpos == 18) {
+throw new IOException("Base64 stream has one un-decoded 
dangling byte.");

 }
-b[off++] = (byte)(bits >> (16));
-if (nextin == 0) {   // only one padding byte
-if (len == 1) {  // no enough output space
-   

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-08 Thread Lance Andersen
Hi Raffaello,

Thank you for your taking up this issue.  When working on a fix such as this, 
it is best to keep the changes narrowly focused to the issue at hand and not 
attempt to address additional issues within the same fix (though this can be 
tempting). 

To move forward, I would suggest for your revised patch, to just address the 
issue described in the bug report.  At the same time create a new issue(s) to 
address proposed readability issues and/or additional areas of improvement.

Best
Lance

> On Jul 8, 2020, at 3:50 PM, Raffaello Giulietti 
>  wrote:
> 
> Hello Roger,
> 
> I would of course retract the changes in encodedOutLength() because, as you 
> observe, it is out of scope w.r.t. the bug. I would also retract the other 
> minor changes you mention and clean up the appearance of the comments to 
> adhere to community shared practices.
> 
> Thus, if you think that doing these changes can facilitate my contribution to 
> get through the review process, I would be happy to prepare another patch.
> 
> On the other hand, I see no incentive in dropping the bulk of the new working 
> implementation to try to fix the current one. I would not feel as confident 
> as I am with the proposed code.
> 
> For the sake of completeness and love of experiment, I privately added many 
> cherry-picking and kind of self-documenting assertions and tested tons of 
> normal and edge cases without troubles.
> 
> 
> Greetings
> Raffaello
> 
> 
> 
> On 2020-07-08 18:45, Roger Riggs wrote:
>> Hi Rafaello,
>> Since you have expanded the scope of the fix, you will need to update the 
>> bug to
>> include the other changes you are making.  Or create a new issue for the 
>> additional work.
>> > Personal preference in style has very little place when maintaining
>> existing code.
>> The style should match the existing code.
>> There is little correspondence between the existing code and the new code 
>> that
>> it make is more difficult to determine that they are equivalent.
>> The messages in the exceptions should not be changed without a compelling 
>> reason.
>> 1014: It would be more informative to include the rogue character in the 
>> message than to drop it.
>> As is, it gives some indication what's wrong about the end.
>> Grautituous changes such as changing the condition in 979 are unnecessary.
>> The change from using "eof" to "eos" is unconventional for IO streams and 
>> unnecessary.
>> The note about allowing limit to overflow to MIN_VALUE is informative but it 
>> would
>> be better to avoid it and keep a future maintainer from falling into the 
>> trap.
>> Regards, Roger
>> On 7/6/20 4:48 PM, Raffaello Giulietti wrote:
>>> Hi Roger,
>>> 
>>> the structure of the original design is preserved, the details change with 
>>> the intent of having code that is easier to reason about.
>>> 
>>> The main loop has many exit points in the original code, as well as in the 
>>> new code. For this reason, I prefer not to express it as a while or as a 
>>> counted for loop. That's why it has the for (;;) form, which promptly warns 
>>> that more exit points at unusual positions should be expected.
>>> 
>>> The "Illegal base64 character" IOException in the original code is wrong, 
>>> probably due to oversight. That's why the new code has two vars, namely i 
>>> and v, to convey more meaningful diagnostic.
>>> 
>>> The new code keeps track of only one running variable pos instead of the 
>>> original two vars off and len, witĥout using more live variables for this 
>>> purpose. Less mutating vars help keeping the code simpler. The method has 
>>> only one mutating var.
>>> 
>>> With the use of writing and reading bit positions fields wpos and rpos, 
>>> checking when a byte in field bits is ready means testing rpos - 8 >= wpos.
>>> 
>>> I can add more comments, e.g., in the form of Dijkstra/Hoare-style loop 
>>> invariants, if this helps improving confidence. Or I can add 
>>> constant-guarded assertions to the proposed code instead of just comments. 
>>> Or both.
>>> 
>>> 
>>> 
>>> There's nothing I can find in the coding convention [1] about block 
>>> comments that states that continuation lines must start with the asterisk 
>>> *, although the examples do so. While I find these asterisks quite 
>>> distracting, it would be a trivial matter to add them in the next review 
>>> iteration.
>>> 
>>> The end-of-line comments either comply with the rules in [1] or, in the 
>>> description of the fields, have been inadvertently "inherited" as multiple 
>>> lines from the original code. I will of course correct them there in the 
>>> next review iteration.
>>> 
>>> 
>>> Greetings
>>> Raffaello
>>> 
>>> 
>>> 
>>> [1] 
>>> https://www.oracle.com/java/technologies/javase/codeconventions-comments.html
>>>  
>>> 
>>> 
>>> On 2020-07-06 20:17, Roger Riggs wrote:
 Hi Raffaello,
 
 I'm still not sure it needed this much of a re-write to fix the bug;
 It will take some time to look at the changes.
 
 

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-08 Thread Raffaello Giulietti

Hello Roger,

I would of course retract the changes in encodedOutLength() because, as 
you observe, it is out of scope w.r.t. the bug. I would also retract the 
other minor changes you mention and clean up the appearance of the 
comments to adhere to community shared practices.


Thus, if you think that doing these changes can facilitate my 
contribution to get through the review process, I would be happy to 
prepare another patch.


On the other hand, I see no incentive in dropping the bulk of the new 
working implementation to try to fix the current one. I would not feel 
as confident as I am with the proposed code.


For the sake of completeness and love of experiment, I privately added 
many cherry-picking and kind of self-documenting assertions and tested 
tons of normal and edge cases without troubles.



Greetings
Raffaello



On 2020-07-08 18:45, Roger Riggs wrote:

Hi Rafaello,

Since you have expanded the scope of the fix, you will need to update 
the bug to
include the other changes you are making.  Or create a new issue for the 
additional work.

 > Personal preference in style has very little place when maintaining
existing code.
The style should match the existing code.

There is little correspondence between the existing code and the new 
code that

it make is more difficult to determine that they are equivalent.

The messages in the exceptions should not be changed without a 
compelling reason.
1014: It would be more informative to include the rogue character in the 
message than to drop it.

As is, it gives some indication what's wrong about the end.

Grautituous changes such as changing the condition in 979 are unnecessary.

The change from using "eof" to "eos" is unconventional for IO streams 
and unnecessary.


The note about allowing limit to overflow to MIN_VALUE is informative 
but it would
be better to avoid it and keep a future maintainer from falling into the 
trap.


Regards, Roger

On 7/6/20 4:48 PM, Raffaello Giulietti wrote:

Hi Roger,

the structure of the original design is preserved, the details change 
with the intent of having code that is easier to reason about.


The main loop has many exit points in the original code, as well as in 
the new code. For this reason, I prefer not to express it as a while 
or as a counted for loop. That's why it has the for (;;) form, which 
promptly warns that more exit points at unusual positions should be 
expected.


The "Illegal base64 character" IOException in the original code is 
wrong, probably due to oversight. That's why the new code has two 
vars, namely i and v, to convey more meaningful diagnostic.


The new code keeps track of only one running variable pos instead of 
the original two vars off and len, witĥout using more live variables 
for this purpose. Less mutating vars help keeping the code simpler. 
The method has only one mutating var.


With the use of writing and reading bit positions fields wpos and 
rpos, checking when a byte in field bits is ready means testing rpos - 
8 >= wpos.


I can add more comments, e.g., in the form of Dijkstra/Hoare-style 
loop invariants, if this helps improving confidence. Or I can add 
constant-guarded assertions to the proposed code instead of just 
comments. Or both.




There's nothing I can find in the coding convention [1] about block 
comments that states that continuation lines must start with the 
asterisk *, although the examples do so. While I find these asterisks 
quite distracting, it would be a trivial matter to add them in the 
next review iteration.


The end-of-line comments either comply with the rules in [1] or, in 
the description of the fields, have been inadvertently "inherited" as 
multiple lines from the original code. I will of course correct them 
there in the next review iteration.



Greetings
Raffaello



[1] 
https://www.oracle.com/java/technologies/javase/codeconventions-comments.html 




On 2020-07-06 20:17, Roger Riggs wrote:

Hi Raffaello,

I'm still not sure it needed this much of a re-write to fix the bug;
It will take some time to look at the changes.

Regardless, OpenJDK conventions call for following the style of the 
existing code.
Your new comments follow neither the existing convention to use 
"//..." comments
nor the other prevalent comment form using /*... */ which use 
consistent indentation

and "* " on continuation lines.

Regards, Roger


On 7/3/20 11:48 AM, Raffaello Giulietti wrote:

Hello,

after Roger's notes, which escaped my attention before, I've 
withdrawn all the changes but for DecInputStream, *except* that I 
couldn't resist to simplify the maths in encodedOutLength(), while 
still using xxxExact() arithmetic.


Sorry for the confusion
Raffaello




Hi Raffaello,

There is way more code changed here than is needed to fix the bug.
General enhancement should be separated from bug fixes.
It makes it easier to review to see the bug was fixed
and easier to separately review other code to see that there are no 
unexpected 

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-08 Thread Roger Riggs

Hi Rafaello,

Since you have expanded the scope of the fix, you will need to update 
the bug to
include the other changes you are making.  Or create a new issue for the 
additional work.


Personal preference in style has very little place when maintaining 
existing code.

The style should match the existing code.

There is little correspondence between the existing code and the new 
code that

it make is more difficult to determine that they are equivalent.

The messages in the exceptions should not be changed without a 
compelling reason.
1014: It would be more informative to include the rogue character in the 
message than to drop it.

As is, it gives some indication what's wrong about the end.

Grautituous changes such as changing the condition in 979 are unnecessary.

The change from using "eof" to "eos" is unconventional for IO streams 
and unnecessary.


The note about allowing limit to overflow to MIN_VALUE is informative 
but it would
be better to avoid it and keep a future maintainer from falling into the 
trap.


Regards, Roger

On 7/6/20 4:48 PM, Raffaello Giulietti wrote:

Hi Roger,

the structure of the original design is preserved, the details change 
with the intent of having code that is easier to reason about.


The main loop has many exit points in the original code, as well as in 
the new code. For this reason, I prefer not to express it as a while 
or as a counted for loop. That's why it has the for (;;) form, which 
promptly warns that more exit points at unusual positions should be 
expected.


The "Illegal base64 character" IOException in the original code is 
wrong, probably due to oversight. That's why the new code has two 
vars, namely i and v, to convey more meaningful diagnostic.


The new code keeps track of only one running variable pos instead of 
the original two vars off and len, witĥout using more live variables 
for this purpose. Less mutating vars help keeping the code simpler. 
The method has only one mutating var.


With the use of writing and reading bit positions fields wpos and 
rpos, checking when a byte in field bits is ready means testing rpos - 
8 >= wpos.


I can add more comments, e.g., in the form of Dijkstra/Hoare-style 
loop invariants, if this helps improving confidence. Or I can add 
constant-guarded assertions to the proposed code instead of just 
comments. Or both.




There's nothing I can find in the coding convention [1] about block 
comments that states that continuation lines must start with the 
asterisk *, although the examples do so. While I find these asterisks 
quite distracting, it would be a trivial matter to add them in the 
next review iteration.


The end-of-line comments either comply with the rules in [1] or, in 
the description of the fields, have been inadvertently "inherited" as 
multiple lines from the original code. I will of course correct them 
there in the next review iteration.



Greetings
Raffaello



[1] 
https://www.oracle.com/java/technologies/javase/codeconventions-comments.html



On 2020-07-06 20:17, Roger Riggs wrote:

Hi Raffaello,

I'm still not sure it needed this much of a re-write to fix the bug;
It will take some time to look at the changes.

Regardless, OpenJDK conventions call for following the style of the 
existing code.
Your new comments follow neither the existing convention to use 
"//..." comments
nor the other prevalent comment form using /*... */ which use 
consistent indentation

and "* " on continuation lines.

Regards, Roger


On 7/3/20 11:48 AM, Raffaello Giulietti wrote:

Hello,

after Roger's notes, which escaped my attention before, I've 
withdrawn all the changes but for DecInputStream, *except* that I 
couldn't resist to simplify the maths in encodedOutLength(), while 
still using xxxExact() arithmetic.


Sorry for the confusion
Raffaello




Hi Raffaello,

There is way more code changed here than is needed to fix the bug.
General enhancement should be separated from bug fixes.
It makes it easier to review to see the bug was fixed
and easier to separately review other code to see that there are no 
unexpected changes.


If some of the changes are motivated by expected performance 
improvements,

there should be JMH tests comparing the before and after.
The change to use byte arrays seems useful, but even using char[]
there is little danger of cache thrashing.

If using the code using xxxExact was correct, don't change it.
Those methods are intrinsified and perform as well or better than 
using long.

Usually, it is better to leave code alone and not risk breaking it.

Special care needs taken when changing a method that is intrinsified.
The optimized version may use fields of the object and stop
working if they are changed.

In the test, the range of buffer sizes tests seems to waste a lot
of cycles on sizes greater than the encoded size of the input.

Regards, Roger



-

# HG changeset patch
# User lello
# Date 1593790152 -7200
#  Fri Jul 03 17:29:12 2020 +0200

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-06 Thread Raffaello Giulietti

Hi Roger,

the structure of the original design is preserved, the details change 
with the intent of having code that is easier to reason about.


The main loop has many exit points in the original code, as well as in 
the new code. For this reason, I prefer not to express it as a while or 
as a counted for loop. That's why it has the for (;;) form, which 
promptly warns that more exit points at unusual positions should be 
expected.


The "Illegal base64 character" IOException in the original code is 
wrong, probably due to oversight. That's why the new code has two vars, 
namely i and v, to convey more meaningful diagnostic.


The new code keeps track of only one running variable pos instead of the 
original two vars off and len, witĥout using more live variables for 
this purpose. Less mutating vars help keeping the code simpler. The 
method has only one mutating var.


With the use of writing and reading bit positions fields wpos and rpos, 
checking when a byte in field bits is ready means testing rpos - 8 >= wpos.


I can add more comments, e.g., in the form of Dijkstra/Hoare-style loop 
invariants, if this helps improving confidence. Or I can add 
constant-guarded assertions to the proposed code instead of just 
comments. Or both.




There's nothing I can find in the coding convention [1] about block 
comments that states that continuation lines must start with the 
asterisk *, although the examples do so. While I find these asterisks 
quite distracting, it would be a trivial matter to add them in the next 
review iteration.


The end-of-line comments either comply with the rules in [1] or, in the 
description of the fields, have been inadvertently "inherited" as 
multiple lines from the original code. I will of course correct them 
there in the next review iteration.



Greetings
Raffaello



[1] 
https://www.oracle.com/java/technologies/javase/codeconventions-comments.html



On 2020-07-06 20:17, Roger Riggs wrote:

Hi Raffaello,

I'm still not sure it needed this much of a re-write to fix the bug;
It will take some time to look at the changes.

Regardless, OpenJDK conventions call for following the style of the 
existing code.
Your new comments follow neither the existing convention to use "//..." 
comments
nor the other prevalent comment form using /*... */ which use consistent 
indentation

and "* " on continuation lines.

Regards, Roger


On 7/3/20 11:48 AM, Raffaello Giulietti wrote:

Hello,

after Roger's notes, which escaped my attention before, I've withdrawn 
all the changes but for DecInputStream, *except* that I couldn't 
resist to simplify the maths in encodedOutLength(), while still using 
xxxExact() arithmetic.


Sorry for the confusion
Raffaello




Hi Raffaello,

There is way more code changed here than is needed to fix the bug.
General enhancement should be separated from bug fixes.
It makes it easier to review to see the bug was fixed
and easier to separately review other code to see that there are no 
unexpected changes.


If some of the changes are motivated by expected performance 
improvements,

there should be JMH tests comparing the before and after.
The change to use byte arrays seems useful, but even using char[]
there is little danger of cache thrashing.

If using the code using xxxExact was correct, don't change it.
Those methods are intrinsified and perform as well or better than 
using long.

Usually, it is better to leave code alone and not risk breaking it.

Special care needs taken when changing a method that is intrinsified.
The optimized version may use fields of the object and stop
working if they are changed.

In the test, the range of buffer sizes tests seems to waste a lot
of cycles on sizes greater than the encoded size of the input.

Regards, Roger



-

# HG changeset patch
# User lello
# Date 1593790152 -7200
#  Fri Jul 03 17:29:12 2020 +0200
# Node ID 73370832de1d2bf3b450930f5cb2467e10528c69
# Parent  a7c0307232406c7b0c1a4b8c2de111077848203d
8222187: java.util.Base64.Decoder stream adds unexpected null bytes at 
the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti 

diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -255,14 +255,11 @@
  *
  */
 private final int encodedOutLength(int srclen, boolean 
throwOOME) {

-    int len = 0;
+    int len;
 try {
-    if (doPadding) {
-    len = Math.multiplyExact(4, 
(Math.addExact(srclen, 2) / 3));

-    } else {
-    int n = srclen % 3;
-    len = Math.addExact(Math.multiplyExact(4, (srclen 
/ 3)), (n == 0 ? 0 : n + 1));

-    }
+    len = doPadding
+    ? Math.multiplyExact(4, 
(Math.addExact(srclen, 2) / 3))
+  

Re: RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-06 Thread Roger Riggs

Hi Raffaello,

I'm still not sure it needed this much of a re-write to fix the bug;
It will take some time to look at the changes.

Regardless, OpenJDK conventions call for following the style of the 
existing code.
Your new comments follow neither the existing convention to use "//..." 
comments
nor the other prevalent comment form using /*... */ which use consistent 
indentation

and "* " on continuation lines.

Regards, Roger


On 7/3/20 11:48 AM, Raffaello Giulietti wrote:

Hello,

after Roger's notes, which escaped my attention before, I've withdrawn 
all the changes but for DecInputStream, *except* that I couldn't 
resist to simplify the maths in encodedOutLength(), while still using 
xxxExact() arithmetic.


Sorry for the confusion
Raffaello




Hi Raffaello,

There is way more code changed here than is needed to fix the bug.
General enhancement should be separated from bug fixes.
It makes it easier to review to see the bug was fixed
and easier to separately review other code to see that there are no 
unexpected changes.


If some of the changes are motivated by expected performance 
improvements,

there should be JMH tests comparing the before and after.
The change to use byte arrays seems useful, but even using char[]
there is little danger of cache thrashing.

If using the code using xxxExact was correct, don't change it.
Those methods are intrinsified and perform as well or better than 
using long.

Usually, it is better to leave code alone and not risk breaking it.

Special care needs taken when changing a method that is intrinsified.
The optimized version may use fields of the object and stop
working if they are changed.

In the test, the range of buffer sizes tests seems to waste a lot
of cycles on sizes greater than the encoded size of the input.

Regards, Roger



-

# HG changeset patch
# User lello
# Date 1593790152 -7200
#  Fri Jul 03 17:29:12 2020 +0200
# Node ID 73370832de1d2bf3b450930f5cb2467e10528c69
# Parent  a7c0307232406c7b0c1a4b8c2de111077848203d
8222187: java.util.Base64.Decoder stream adds unexpected null bytes at 
the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti 

diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -255,14 +255,11 @@
  *
  */
 private final int encodedOutLength(int srclen, boolean 
throwOOME) {

-    int len = 0;
+    int len;
 try {
-    if (doPadding) {
-    len = Math.multiplyExact(4, 
(Math.addExact(srclen, 2) / 3));

-    } else {
-    int n = srclen % 3;
-    len = Math.addExact(Math.multiplyExact(4, (srclen 
/ 3)), (n == 0 ? 0 : n + 1));

-    }
+    len = doPadding
+    ? Math.multiplyExact(4, 
(Math.addExact(srclen, 2) / 3))
+    : Math.addExact((Math.addExact(srclen, 2) / 
3), srclen);

 if (linemax > 0) { // line separators
 len = Math.addExact(len, (len - 1) / linemax * 
newline.length);

 }
@@ -961,14 +958,15 @@

 private final InputStream is;
 private final boolean isMIME;
-    private final int[] base64;  // base64 -> byte mapping
-    private int bits = 0;    // 24-bit buffer for decoding
-    private int nextin = 18; // next available "off" in 
"bits" for input;

- // -> 18, 12, 6, 0
-    private int nextout = -8;    // next available "off" in 
"bits" for output;
- // -> 8, 0, -8 (no byte for 
output)

-    private boolean eof = false;
-    private boolean closed = false;
+    private final int[] base64; // mapping from alphabet to 
values

+    private int bits;   // 24 bit buffer for decoding
+    private int wpos;   // writing bit pos inside bits
+    // one of 24 (left, msb), 18, 12, 6, 0
+    private int rpos;   // reading bit pos inside bits
+    // one of 24 (left, msb), 16, 8, 0
+    private boolean eos;
+    private boolean closed;
+    private byte[] onebuf = new byte[1];

 DecInputStream(InputStream is, int[] base64, boolean isMIME) {
 this.is = is;
@@ -976,114 +974,158 @@
 this.isMIME = isMIME;
 }

-    private byte[] sbBuf = new byte[1];
-
 @Override
 public int read() throws IOException {
-    return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
+    return read(onebuf, 0, 1) >= 0 ? onebuf[0] & 0xff : -1;
+    }
+
+    private int leftovers(byte[] b, int off, int pos, int limit) {
+    eos = true;
+    /*
+    We use a loop here, as this code is executed 

RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-03 Thread Raffaello Giulietti

Hello,

after Roger's notes, which escaped my attention before, I've withdrawn 
all the changes but for DecInputStream, *except* that I couldn't resist 
to simplify the maths in encodedOutLength(), while still using 
xxxExact() arithmetic.


Sorry for the confusion
Raffaello




Hi Raffaello,

There is way more code changed here than is needed to fix the bug.
General enhancement should be separated from bug fixes.
It makes it easier to review to see the bug was fixed
and easier to separately review other code to see that there are no 
unexpected changes.


If some of the changes are motivated by expected performance improvements,
there should be JMH tests comparing the before and after.
The change to use byte arrays seems useful, but even using char[]
there is little danger of cache thrashing.

If using the code using xxxExact was correct, don't change it.
Those methods are intrinsified and perform as well or better than using 
long.

Usually, it is better to leave code alone and not risk breaking it.

Special care needs taken when changing a method that is intrinsified.
The optimized version may use fields of the object and stop
working if they are changed.

In the test, the range of buffer sizes tests seems to waste a lot
of cycles on sizes greater than the encoded size of the input.

Regards, Roger



-

# HG changeset patch
# User lello
# Date 1593790152 -7200
#  Fri Jul 03 17:29:12 2020 +0200
# Node ID 73370832de1d2bf3b450930f5cb2467e10528c69
# Parent  a7c0307232406c7b0c1a4b8c2de111077848203d
8222187: java.util.Base64.Decoder stream adds unexpected null bytes at 
the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti 

diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -255,14 +255,11 @@
  *
  */
 private final int encodedOutLength(int srclen, boolean 
throwOOME) {

-int len = 0;
+int len;
 try {
-if (doPadding) {
-len = Math.multiplyExact(4, (Math.addExact(srclen, 
2) / 3));

-} else {
-int n = srclen % 3;
-len = Math.addExact(Math.multiplyExact(4, (srclen / 
3)), (n == 0 ? 0 : n + 1));

-}
+len = doPadding
+? Math.multiplyExact(4, (Math.addExact(srclen, 
2) / 3))
+: Math.addExact((Math.addExact(srclen, 2) / 3), 
srclen);
 if (linemax > 0) { // line 
separators
 len = Math.addExact(len, (len - 1) / linemax * 
newline.length);

 }
@@ -961,14 +958,15 @@

 private final InputStream is;
 private final boolean isMIME;
-private final int[] base64;  // base64 -> byte mapping
-private int bits = 0;// 24-bit buffer for decoding
-private int nextin = 18; // next available "off" in 
"bits" for input;

- // -> 18, 12, 6, 0
-private int nextout = -8;// next available "off" in 
"bits" for output;
- // -> 8, 0, -8 (no byte for 
output)

-private boolean eof = false;
-private boolean closed = false;
+private final int[] base64; // mapping from alphabet to values
+private int bits;   // 24 bit buffer for decoding
+private int wpos;   // writing bit pos inside bits
+// one of 24 (left, msb), 18, 12, 6, 0
+private int rpos;   // reading bit pos inside bits
+// one of 24 (left, msb), 16, 8, 0
+private boolean eos;
+private boolean closed;
+private byte[] onebuf = new byte[1];

 DecInputStream(InputStream is, int[] base64, boolean isMIME) {
 this.is = is;
@@ -976,114 +974,158 @@
 this.isMIME = isMIME;
 }

-private byte[] sbBuf = new byte[1];
-
 @Override
 public int read() throws IOException {
-return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
+return read(onebuf, 0, 1) >= 0 ? onebuf[0] & 0xff : -1;
+}
+
+private int leftovers(byte[] b, int off, int pos, int limit) {
+eos = true;
+/*
+We use a loop here, as this code is executed only once.
+Unrolling the loop would probably not contribute much here.
+ */
+while (rpos - 8 >= wpos && pos != limit) {
+b[pos++] = (byte) (bits >> (rpos -= 8));
+}
+return pos - off != 0 || rpos - 8 >= wpos ? pos - off : -1;
 }

-private int eof(byte[] b, int off, int len, int oldOff)
-throws IOException
-{
-eof = true;
-

RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

2020-07-03 Thread Raffaello Giulietti

Hi,

with respect to [1], this version of the patch includes performance 
enhancements for the case of small buffers, in particular for the 
implementation of the no-arg read(), which makes use of a one-byte array.


I also expanded the comments.

As I don't have a webrev repo, I beg the reviewer(s) to publish the 
patch on the own repo, if it makes sense.



Greetings
Raffaello



[1] 
https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-June/067538.html


---

# HG changeset patch
# User lello
# Date 1593779777 -7200
#  Fri Jul 03 14:36:17 2020 +0200
# Node ID 72f0e35ada03c4b850dc1bcd47e3296f14c6e531
# Parent  a7c0307232406c7b0c1a4b8c2de111077848203d
8222187: java.util.Base64.Decoder stream adds unexpected null bytes at 
the end

Reviewed-by: TBD
Contributed-by: Raffaello Giulietti 

diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java

--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -30,7 +30,6 @@
 import java.io.IOException;
 import java.io.OutputStream;
 import java.nio.ByteBuffer;
-import java.nio.charset.StandardCharsets;

 import sun.nio.cs.ISO_8859_1;

@@ -133,7 +132,7 @@
  */
 public static Encoder getMimeEncoder(int lineLength, byte[] 
lineSeparator) {

  Objects.requireNonNull(lineSeparator);
- int[] base64 = Decoder.fromBase64;
+ byte[] base64 = Decoder.fromBase64;
  for (byte b : lineSeparator) {
  if (base64[b & 0xff] != -1)
  throw new IllegalArgumentException(
@@ -216,7 +215,7 @@
  * index values into their "Base64 Alphabet" equivalents as 
specified

  * in "Table 1: The Base64 Alphabet" of RFC 2045 (and RFC 4648).
  */
-private static final char[] toBase64 = {
+private static final byte[] toBase64 = {
 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 
'L', 'M',
 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 
'Y', 'Z',
 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 
'l', 'm',

@@ -229,7 +228,7 @@
  * in Table 2 of the RFC 4648, with the '+' and '/' changed to 
'-' and

  * '_'. This table is used when BASE64_URL is specified.
  */
-private static final char[] toBase64URL = {
+private static final byte[] toBase64URL = {
 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 
'L', 'M',
 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 
'Y', 'Z',
 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 
'l', 'm',

@@ -238,7 +237,7 @@
 };

 private static final int MIMELINEMAX = 76;
-private static final byte[] CRLF = new byte[] {'\r', '\n'};
+private static final byte[] CRLF = {'\r', '\n'};

 static final Encoder RFC4648 = new Encoder(false, null, -1, true);
 static final Encoder RFC4648_URLSAFE = new Encoder(true, null, 
-1, true);

@@ -255,27 +254,19 @@
  *
  */
 private final int encodedOutLength(int srclen, boolean 
throwOOME) {

-int len = 0;
-try {
-if (doPadding) {
-len = Math.multiplyExact(4, (Math.addExact(srclen, 
2) / 3));

-} else {
-int n = srclen % 3;
-len = Math.addExact(Math.multiplyExact(4, (srclen / 
3)), (n == 0 ? 0 : n + 1));

-}
-if (linemax > 0) { // line 
separators
-len = Math.addExact(len, (len - 1) / linemax * 
newline.length);

-}
-} catch (ArithmeticException ex) {
-if (throwOOME) {
-throw new OutOfMemoryError("Encoded size is too 
large");

-} else {
-// let the caller know that encoded bytes length
-// is too large
-len = -1;
-}
+long len = doPadding
+? (srclen + 2L) / 3 * 4
+: (srclen + 2L) / 3 + srclen;
+if (linemax > 0) {
+len += (len - 1) / linemax * newline.length;
 }
-return len;
+if (len <= Integer.MAX_VALUE) {
+return (int) len;
+}
+if (throwOOME) {
+throw new OutOfMemoryError("Encoded size is too large");
+}
+return -1;
 }

 /**
@@ -421,24 +412,24 @@

 @HotSpotIntrinsicCandidate
 private void encodeBlock(byte[] src, int sp, int sl, byte[] 
dst, int dp, boolean isURL) {

-char[] base64 = isURL ? toBase64URL : toBase64;
+byte[] base64 = isURL ? toBase64URL : toBase64;
 for (int sp0 = sp, dp0 = dp ; sp0 < sl; ) {
 int bits = (src[sp0++] &