Re: Review/comment needed for the new public java.util.Base64 class

2012-11-30 Thread Ulf Zibis

Hi Sherman,

is this ssue still open ?

-Ulf


Am 03.11.2012 21:33, schrieb Ulf Zibis:

Am 30.10.2012 23:40, schrieb Xueming Shen:

I'm confused about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is also true 
alphabetical


should not be an issue. javadoc output should be in alphabetic order. If it is 
really
concerns you, I can do a copy/paste:-)


Yes, it doesn't matter much, but would be a nice conform style, so for me personally I would 
like the copy/paste:-)



- What (should) happen(s), if lineSeparator.length == 0 ?

do nothing. expected?


I would say, empty line separator should not be allowed, so you might check:
 Objects.requireNonEmpty(lineSeparator);


 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i  Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 
table to the outer class.

Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?


understood. 


It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the 
end ...


but it appears the hotspot likes it the constant/fixed length lookup
table is inside encoder. 

Yes, but see my suggestion 12 lines below.


Same as you suggestion in your previous email to use
String in source and expand it during runtime. It saves about 500 bytes but 
slows
the server vm.


Are you really sure? As it only runs once at class init, JIT should not compile 
this code.
Executing the bytecode to init the final int[], value for value, by interpreter should not be 
faster as expanding a String source in a loop.



Those repeating lines of isURL?  is also supposed to be a
performance tweak to eliminate the array boundary check in loop.


Yep, so I was thinking about:
 653 private final int[] base64;
 654 private final boolean isMIME;
 655
 656 private Decoder(boolean isURL, boolean isMIME) {
 657 base64 = isURL ? fromBase64URL : fromBase64;
 658 this.isMIME = isMIME;
 659 }
.
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) {
 912 int[] base64 = this.base64; // local copy for performance reasons (but maybe 
superfluous if HotSpot is intelligent enough)

or:
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] 
base64) {
.


Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


It is supposed reduce one if check for normal base64 character inside the
loop. I'm not that sure it really helps though.


 925 if (b == '=') {
-- causes one additional if in the _main_ path of the loop, so should be 
slower for regular input

 859 if (b == -2) {   // padding byte
-- causes one additional if in the _side_ path of the loop, so should only 
affect irregular input
Maybe the following code is little faster as no loading of the constant '-2' is 
required:
 858 if ((b = base64[b])  0) {
 859 if (++b  0) {   // padding byte '='


Once again (the following was meant for the decode algorithm, not 
initialisation):
Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?
Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster 
LoadB operations could be used by JIT. In an int[] table, the index offset must be shifted by 2 
before. Maybe this doesn't directly affect the performance on x86/IA64 CPU, but wastes space in 
CPU cache for other tasks as a side effect. On ARM architectures I imagine, directly addressing a 
byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint 
of the table would be smaller.


Little nit:
You could delete line 641 for conformity with 629.

-Ulf






Re: Review/comment needed for the new public java.util.Base64 class

2012-11-30 Thread Xueming Shen

Ulf,

The base64 implementation is in TL right now. It does address some of the
issue you suggested in this email. And I remember I did try use byte[]
alphabets, which I don't recall bring us any benefit, but I'm not sure in which
setup.  The latest is at

http://cr.openjdk.java.net/~sherman/8004212/webrev/

in which I'm trying to fix a corner case of incorrect return value from 
decode(buf, buf).

The Base64 now is in TL does not necessary mean the issue is closed. You
can continue suggest/comment on the latest version for any possible performance
improvement, bug fix and even API change, if necessary.

-Sherman


On 11/30/2012 02:01 PM, Ulf Zibis wrote:

Hi Sherman,

is this ssue still open ?

-Ulf


Am 03.11.2012 21:33, schrieb Ulf Zibis:

Am 30.10.2012 23:40, schrieb Xueming Shen:

I'm confused about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is also true 
alphabetical


should not be an issue. javadoc output should be in alphabetic order. If it is 
really
concerns you, I can do a copy/paste:-)


Yes, it doesn't matter much, but would be a nice conform style, so for me 
personally I would like the copy/paste:-)


- What (should) happen(s), if lineSeparator.length == 0 ?

do nothing. expected?


I would say, empty line separator should not be allowed, so you might check:
 Objects.requireNonEmpty(lineSeparator);


 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i  Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. So maybe 
move the toBase64 table to the outer class.
Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?


understood. 


It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the 
end ...


but it appears the hotspot likes it the constant/fixed length lookup
table is inside encoder. 

Yes, but see my suggestion 12 lines below.


Same as you suggestion in your previous email to use
String in source and expand it during runtime. It saves about 500 bytes but 
slows
the server vm.


Are you really sure? As it only runs once at class init, JIT should not compile 
this code.
Executing the bytecode to init the final int[], value for value, by interpreter 
should not be faster as expanding a String source in a loop.


Those repeating lines of isURL?  is also supposed to be a
performance tweak to eliminate the array boundary check in loop.


Yep, so I was thinking about:
 653 private final int[] base64;
 654 private final boolean isMIME;
 655
 656 private Decoder(boolean isURL, boolean isMIME) {
 657 base64 = isURL ? fromBase64URL : fromBase64;
 658 this.isMIME = isMIME;
 659 }
.
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) {
 912 int[] base64 = this.base64; // local copy for performance 
reasons (but maybe superfluous if HotSpot is intelligent enough)
or:
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] 
base64) {
.


Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


It is supposed reduce one if check for normal base64 character inside the
loop. I'm not that sure it really helps though.


 925 if (b == '=') {
-- causes one additional if in the _main_ path of the loop, so should be 
slower for regular input

 859 if (b == -2) {   // padding byte
-- causes one additional if in the _side_ path of the loop, so should only 
affect irregular input
Maybe the following code is little faster as no loading of the constant '-2' is 
required:
 858 if ((b = base64[b])  0) {
 859 if (++b  0) {   // padding byte '='


Once again (the following was meant for the decode algorithm, not 
initialisation):
Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?
Retrieving the bytes by b = base64[x] then could be done without address-shift 
and smaller/faster LoadB operations could be used by JIT. In an int[] table, 
the index offset must be shifted by 2 before. Maybe this doesn't directly 
affect the performance on x86/IA64 CPU, but wastes space in CPU cache for other 
tasks as a side effect. On ARM architectures I imagine, directly addressing a 
byte-stepped table could be faster than addressing a 4-byte-stepped table. At 
least the footprint of the table would be smaller.

Little nit:
You could delete line 641 for conformity with 629.

-Ulf








Re: Review/comment needed for the new public java.util.Base64 class

2012-11-03 Thread Ulf Zibis

Am 30.10.2012 23:40, schrieb Xueming Shen:

I'm confused about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is also true 
alphabetical


should not be an issue. javadoc output should be in alphabetic order. If it is 
really
concerns you, I can do a copy/paste:-)


Yes, it doesn't matter much, but would be a nice conform style, so for me personally I would like 
the copy/paste:-)



- What (should) happen(s), if lineSeparator.length == 0 ?

do nothing. expected?


I would say, empty line separator should not be allowed, so you might check:
 Objects.requireNonEmpty(lineSeparator);


 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i  Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 
table to the outer class.

Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?


understood. 


It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the 
end ...


but it appears the hotspot likes it the constant/fixed length lookup
table is inside encoder. 

Yes, but see my suggestion 12 lines below.


Same as you suggestion in your previous email to use
String in source and expand it during runtime. It saves about 500 bytes but 
slows
the server vm.


Are you really sure? As it only runs once at class init, JIT should not compile 
this code.
Executing the bytecode to init the final int[], value for value, by interpreter should not be faster 
as expanding a String source in a loop.



Those repeating lines of isURL?  is also supposed to be a
performance tweak to eliminate the array boundary check in loop.


Yep, so I was thinking about:
 653 private final int[] base64;
 654 private final boolean isMIME;
 655
 656 private Decoder(boolean isURL, boolean isMIME) {
 657 base64 = isURL ? fromBase64URL : fromBase64;
 658 this.isMIME = isMIME;
 659 }
.
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) {
 912 int[] base64 = this.base64; // local copy for performance reasons (but maybe 
superfluous if HotSpot is intelligent enough)

or:
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] 
base64) {
.


Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


It is supposed reduce one if check for normal base64 character inside the
loop. I'm not that sure it really helps though.


 925 if (b == '=') {
-- causes one additional if in the _main_ path of the loop, so should be 
slower for regular input

 859 if (b == -2) {   // padding byte
-- causes one additional if in the _side_ path of the loop, so should only 
affect irregular input
Maybe the following code is little faster as no loading of the constant '-2' is 
required:
 858 if ((b = base64[b])  0) {
 859 if (++b  0) {   // padding byte '='


Once again (the following was meant for the decode algorithm, not 
initialisation):
Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?
Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster 
LoadB operations could be used by JIT. In an int[] table, the index offset must be shifted by 2 
before. Maybe this doesn't directly affect the performance on x86/IA64 CPU, but wastes space in CPU 
cache for other tasks as a side effect. On ARM architectures I imagine, directly addressing a 
byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint of 
the table would be smaller.


Little nit:
You could delete line 641 for conformity with 629.

-Ulf



Re: Review/comment needed for the new public java.util.Base64 class

2012-10-30 Thread Ulf Zibis

Am 24.10.2012 04:56, schrieb Xueming Shen:

Thanks for the review. I hope I addressed most of the comments in the
updated webrev at

http://cr.openjdk.java.net/~sherman/4235519/webrev


Hi Sherman,

my additional comments:

I'm confused about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is also true 
alphabetical

1026 private Base64() {}
1027 private static final int MIMELINEMAX = 76;
1028 private static final byte[] crlf = new byte[] {'\r', '\n'};
- difficult to find, please move (more) to the beginning, at least not in between inner class 
definitions.

- I more would like CRLF than crlf

 118  * @param lineLengththe length of each output line (rounded down
 119  *  to nearest multiple of 4).
This could be interpreted as: the parameter must be rounded before, so maybe 
better:
 118  * @param lineLengththe length of each output line (if not a 
multiple of 4,
 119  *  it will be rounded down accordingly).


 126 public static Encoder getEncoder(int lineLength, byte[] lineSeparator) 
{
 127  Objects.requireNonNull(lineSeparator);
 128  return new Encoder(false, lineSeparator, lineLength / 4 * 4);
 129 }
- What (should) happen(s), if lineSeparator.length == 0 ?
- Isn't one of these little faster, or at least more clear? :
lineLength -= lineLength % 4
lineLength  0xFFFC
lineLength  2  2

Broken indentation (why at all, compared to lines 213..215?):
 208 private final byte[] newline;
 209 private final intlinemax;
 210 private final boolean isURL;

Unconventional indentation and even broken in lines 223..224 (maybe more?):
 247  * @param   src   the byte array to encode
 248  * @param   dst   the output byte array
 249  * @returnThe number of bytes written to the output byte 
array
 250  *
 251  * @throwsIllegalArgumentException if {@code dst} does not 
have
 252  *enough space for encoding all input bytes.

More conventional:
 247  * @param  src  the byte array to encode
 248  * @param  dst  the output byte array
 249  * @return The number of bytes written to the output byte array
 250  *
 251  * @throws IllegalArgumentException if {@code dst} does not have
 252  * enough space for encoding all input bytes.

 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i  Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 
table to the outer class.

Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?
Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster 
LoadB operations could be used by JIT. At least the footprint of the table would be smaller.


Missing spaces:
 805 return new DecInputStream(is, isURL?fromBase64URL:fromBase64, 
isMIME);
Why at all you repeat this code many times? Why not? :
 631 this.base64 = isURL ? fromBase64URL : fromBase64;
Also it would at least be helpful for readers to define final, e.g.:
 809 final int[] base64 = isURL ? fromBase64URL : fromBase64;

Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


-Ulf



Re: Review/comment needed for the new public java.util.Base64 class

2012-10-30 Thread Ulf Zibis

Oops:

lineLength  2  2

I meant::
lineLength  2  2

-Ulf



Re: Review/comment needed for the new public java.util.Base64 class

2012-10-30 Thread Ulf Zibis

Hi Sherman,

thanks for your details.
Has this discussion been on the list, and I have missed it?

I see a problem with hiding the singleton choice:
Developers might tend to repetitively invoke
public static Encoder getEncoder(int lineLength, byte[] lineSeparator)
instead reusing the the same instance.
(Note: It should not harm, to move the
 Objects.requireNonNull(lineSeparator);
to the general private constructor.)
In case of a public constant
Encoder.RFC4648
... developer would be aware about the re-usability of the encoder.

IMHO, the get...() methods are just waste of source lines and bytecode 
footprint.

But in contrast to the v4 I like the inner class approach: Base64.De/Encoder.

Little nit: In lines 89,100,127,128,138,149,159 the indentation is 5 instead 4.

-Ulf



Am 30.10.2012 02:51, schrieb Xueming Shen:

Ulf, my apology. Some how I missed your email.

We tried various prototypes for this simple utility class. See
http://cr.openjdk.java.net/~sherman/base64/

The v4 might be close to the static constant approach you suggested. While It's 
hard
to draw a clear line on which one is better, we concluded that the proposed 
approach
provides the best balance among usability, readability and extensibility. For 
example,
the get approach allows us to hide the singleton choice to the 
implementation. It
provides a consistent interface fixed basic/url/mime type en/decoder and the 
configurable
floating length/linefeed encoder.

-Sherman

On 10/29/12 11:15 AM, Ulf Zibis wrote:

Hi Sherman,

can you give me a short answer please?

-Ulf

Am 23.10.2012 16:57, schrieb Ulf Zibis:

Am 23.10.2012 15:04, schrieb Alan Bateman:
I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The 
reason is that the method name makes it sound like it returns a URLEncoder or or at least an 
encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a 
base64url encoder.


I'm wondering, why there are those get... methods at all.

Alternatively you could make the appropriate constructors and predifined static variants public. 
So one only should use:

Base64.Encoder encoder = new Base64.Encoder(...);
Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE;

No need for those looong method names.

-Ulf











Re: Review/comment needed for the new public java.util.Base64 class

2012-10-30 Thread Xueming Shen

Hi Ulf, thanks for the comments.

Webrev has been updated to address most of your comments.

http://cr.openjdk.java.net/~sherman/4235519/webrev

(1) all @param and @return tags have been normalized.
(2) private constructor BAse64() {} has been moved up.
(3) MIMELINEMAX and CRLF have been moved into encoder.
(4) - lineLength 22;

Your questions:




I'm confused about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is 
also true alphabetical


should not be an issue. javadoc output should be in alphabetic order. If 
it is really

concerns you, I can do a copy/paste:-)





- What (should) happen(s), if lineSeparator.length == 0 ?

do nothing. expected?




- Isn't one of these little faster, or at least more clear? :
lineLength -= lineLength % 4
lineLength  0xFFFC
lineLength  2  2


 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i  Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. 
So maybe move the toBase64 table to the outer class.
Have you compared performance with fromBase64 as byte[] (including 
local 'b' in decode() as byte)?


understood. but it appears the hotspot likes it the constant/fixed 
length lookup
table is inside encoder. Same as you suggestion in your previous email 
to use
String in source and expand it during runtime. It saves about 500 bytes 
but slows
the server vm. Those repeating lines of isURL?  is also supposed 
to be a

performance tweak to eliminate the array boundary check in loop.




Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


It is supposed reduce one if check for normal base64 character inside the
loop. I'm not that sure it really helps though.

-Sherman



Re: Review/comment needed for the new public java.util.Base64 class

2012-10-29 Thread Ulf Zibis

Hi Sherman,

can you give me a short answer please?

-Ulf

Am 23.10.2012 16:57, schrieb Ulf Zibis:

Am 23.10.2012 15:04, schrieb Alan Bateman:
I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason 
is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for 
HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder.


I'm wondering, why there are those get... methods at all.

Alternatively you could make the appropriate constructors and predifined static variants public. 
So one only should use:

Base64.Encoder encoder = new Base64.Encoder(...);
Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE;

No need for those looong method names.

-Ulf






Re: Review/comment needed for the new public java.util.Base64 class

2012-10-29 Thread Xueming Shen

Ulf, my apology. Some how I missed your email.

We tried various prototypes for this simple utility class. See
http://cr.openjdk.java.net/~sherman/base64/

The v4 might be close to the static constant approach you suggested. 
While It's hard
to draw a clear line on which one is better, we concluded that the 
proposed approach
provides the best balance among usability, readability and 
extensibility. For example,
the get approach allows us to hide the singleton choice to the 
implementation. It
provides a consistent interface fixed basic/url/mime type en/decoder 
and the configurable

floating length/linefeed encoder.

-Sherman

On 10/29/12 11:15 AM, Ulf Zibis wrote:

Hi Sherman,

can you give me a short answer please?

-Ulf

Am 23.10.2012 16:57, schrieb Ulf Zibis:

Am 23.10.2012 15:04, schrieb Alan Bateman:
I'm not sure that getUrlEncoder is the most suitable name to get a 
base64url encoder. The reason is that the method name makes it sound 
like it returns a URLEncoder or or at least an encoder for HTML 
forms. While more verbose, getBase64UrlEncoder is clear that it 
returns a base64url encoder.


I'm wondering, why there are those get... methods at all.

Alternatively you could make the appropriate constructors and 
predifined static variants public. So one only should use:

Base64.Encoder encoder = new Base64.Encoder(...);
Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE;

No need for those looong method names.

-Ulf








Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Alan Bateman

On 18/10/2012 03:10, Xueming Shen wrote:

:

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev
I took another pass over this, focusing on the API as that is what we 
have to get right. Performance is important too but I think the priority 
has to be the API first.


Overall I think it is quite nice and easy to use.

I wonder if the Base64 class description should try to establish the 
terms base64 and base64url so that they can be referenced from the 
various methods? That would avoid needing to duplicate references to the 
RFC 4668 and RFC 2045 in so many places.


I think it would also be useful if the specification indicated whether 
Encoders and Decoders are safe for use by concurrent threads. Also make 
it clear that NPE is thrown if any of the parameters are passed as null 
(unless otherwise specified of course).


I'm not sure that getUrlEncoder is the most suitable name to get a 
base64url encoder. The reason is that the method name makes it sound 
like it returns a URLEncoder or or at least an encoder for HTML forms. 
While more verbose, getBase64UrlEncoder is clear that it returns a 
base64url encoder.


Do you think getEncoder(int,byte[]) will be used much? If not then 
perhaps it should be left out for the first version (I guess part of 
this is getting used to providing the line separate as a byte array).


For the javadoc then Encoder and Decoder will need @since 1.8. They 
should probably cross reference each other too.


encode(byte[]) should be clearer that it encodes all bytes in the given 
array. Also I think it needs to be clear that the returned byte array is 
appropriately sized -- as currently worded it doesn't make it clear that 
they can't be extra elements at the end (odd as it might be).


Typo at line 215, byter array - byte array

Typo at line 246, methold - method.

Typo at line 247, arry - array

Type at line 254, encocde - encode

Typo at line 277, buffter - buffer. Another one at line 702.

Minor comment, but I assume you should move the 
@SuppressWarnings(deprecation) on encodeToString to after the method 
comment rather than before. I see the same thing in decode(String).


I think encode(ByteBuffer) needs to be clear as to the 
position/limit/capacity of the returned buffer.


I'm not sure so about encode(ba, null) returning the required length, it 
feels odd and a bit like some of the win32 APIs. If the intention is 
that the caller allocates the byte[] and then calls encode again then it 
would be easier to just encode(ba).


For the decoder methods then IllegalArgumentException may be thrown 
mid-flight, meaning that some bytes may have been written into output 
buffer or array even though an exception is thrown. I think this needs 
to be made clear in the spec. It also makes me wonder if this is the 
right exception, it feels like a more specialized malformed input exception.


Another point about the decode methods is that they stop at the padding 
bytes and so this allows for bytes after the padding. I'm curious about 
this choice and whether you considered this as an error? I see how this 
influences decode(ByteBuffer,ByteBuffer) to return a boolean and I just 
wonder if there are other choices to consider.


That's mostly it for now. I didn't check in the IDE but there are lot of 
imports that are don't appear to be  used, perhaps left over from 
previous iterations?


-Alan


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Ulf Zibis

Am 23.10.2012 15:04, schrieb Alan Bateman:
I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason 
is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for 
HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder.


I'm wondering, why there are those get... methods at all.

Alternatively you could make the appropriate constructors and predifined static variants public. So 
one only should use:

Base64.Encoder encoder = new Base64.Encoder(...);
Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE;

No need for those looong method names.

-Ulf



Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Xueming Shen

Hi Alan,

Thanks for the review. I hope I addressed most of the comments in the
updated webrev at

http://cr.openjdk.java.net/~sherman/4235519/webrev

mainly

(1) Pulled the base64 terms up to the class doc and then be referenced 
from various methods
(2) Gave up the C style de/encode(byte[], null), just leave the invoker 
to have enough space. I
 remember Paul once suggested to have a convenient method to return 
the estimated length
 of the byte[] needed to en/decode a specified input byte 
array/buffer. I think we can do it
 later when it is really desired. I agreed your argument that if 
people need to get the size and

 prepare a new array, they bet off just call de/encode(byte[])
(3) Gave up the liberal decoding design, tight the spec/impl to treat 
the input as illegal base64

 if the padding character appears in the middle of input.
(4) Some clarification of spec for thread safe, null exception and the 
pos/limit of input/output byte

 buffer as suggested.
(5) Fixed the typos

I still don't have a better name for method getUrlEn/Decoder(), 
Base64.getBase64UrlEn/Decoder()

does not feel good for me.

-Sherman


On 10/23/12 6:04 AM, Alan Bateman wrote:

On 18/10/2012 03:10, Xueming Shen wrote:

:

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev
I took another pass over this, focusing on the API as that is what we 
have to get right. Performance is important too but I think the 
priority has to be the API first.


Overall I think it is quite nice and easy to use.

I wonder if the Base64 class description should try to establish the 
terms base64 and base64url so that they can be referenced from the 
various methods? That would avoid needing to duplicate references to 
the RFC 4668 and RFC 2045 in so many places.


I think it would also be useful if the specification indicated whether 
Encoders and Decoders are safe for use by concurrent threads. Also 
make it clear that NPE is thrown if any of the parameters are passed 
as null (unless otherwise specified of course).


I'm not sure that getUrlEncoder is the most suitable name to get a 
base64url encoder. The reason is that the method name makes it sound 
like it returns a URLEncoder or or at least an encoder for HTML forms. 
While more verbose, getBase64UrlEncoder is clear that it returns a 
base64url encoder.


Do you think getEncoder(int,byte[]) will be used much? If not then 
perhaps it should be left out for the first version (I guess part of 
this is getting used to providing the line separate as a byte array).


For the javadoc then Encoder and Decoder will need @since 1.8. They 
should probably cross reference each other too.


encode(byte[]) should be clearer that it encodes all bytes in the 
given array. Also I think it needs to be clear that the returned byte 
array is appropriately sized -- as currently worded it doesn't make it 
clear that they can't be extra elements at the end (odd as it might be).


Typo at line 215, byter array - byte array

Typo at line 246, methold - method.

Typo at line 247, arry - array

Type at line 254, encocde - encode

Typo at line 277, buffter - buffer. Another one at line 702.

Minor comment, but I assume you should move the 
@SuppressWarnings(deprecation) on encodeToString to after the method 
comment rather than before. I see the same thing in decode(String).


I think encode(ByteBuffer) needs to be clear as to the 
position/limit/capacity of the returned buffer.


I'm not sure so about encode(ba, null) returning the required length, 
it feels odd and a bit like some of the win32 APIs. If the intention 
is that the caller allocates the byte[] and then calls encode again 
then it would be easier to just encode(ba).


For the decoder methods then IllegalArgumentException may be thrown 
mid-flight, meaning that some bytes may have been written into output 
buffer or array even though an exception is thrown. I think this needs 
to be made clear in the spec. It also makes me wonder if this is the 
right exception, it feels like a more specialized malformed input 
exception.


Another point about the decode methods is that they stop at the 
padding bytes and so this allows for bytes after the padding. I'm 
curious about this choice and whether you considered this as an error? 
I see how this influences decode(ByteBuffer,ByteBuffer) to return a 
boolean and I just wonder if there are other choices to consider.


That's mostly it for now. I didn't check in the IDE but there are lot 
of imports that are don't appear to be  used, perhaps left over from 
previous iterations?


-Alan




Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Xueming Shen



On 10/23/2012 8:50 PM, Mike Duigou wrote:

I'm eager to use this!

Some comments:

-ttcode/tt  is used in some places rather than {@code }

fixed


- The initialul  would perhaps be better as adl. I don't know if definition 
lists are allowed in javadoc.

- public OutputStream wrap(OutputStream os) closing the underlying output 
stream precludes many potential uses. FilterOutputStreams do not consistently 
close the underlying stream.
It has been an issue for a while. Obviously there are users that prefer 
the underlying stream not
get closed when the outer one is closed.  I think we have couple rfes 
that ask for a close this
stream only but not the underlying one close. But I don't think we can 
do this in the default
close(), otherwise, I'm sure we will gen another side of complain, the 
close() does not close the

underlying stream:-)


- spelling error methold

fixed


- InputStream wrap() it might be nice if the return stream EOF when the padding 
character(s) are consumed but not close() or otherwise alter the source 
InputStream

I need give it a little more time tomorrow.


- private static byte[] getBytes(ByteBuffer bb) should document, even though 
it's a private method, that the bytes returned should be not considered mutable 
since they might be shared.
I have been thinking of rewriting the de/encode(ByteBuffer) to get rid 
of getBytes(...). Probably
need to move some piece around. If I end up keeping it I will put some 
comment int.


- the if (ret != dst.length) // TBD: not necessary is worrisome as it will 
generate garbage.
while I'm pretty sure I don't need this for encode (it is definitely 
needed for

decode()), I just keep it for safe:-) I removed the TBD comment though.


- I'm still wishing String encode(byte[] buf) didn't use an intermediate byte 
array. :-)

- stream after use, in which -  stream after use, during which

fixed.

Thanks for taking a look.

-Sherman


On Oct 23 2012, at 19:56 , Xueming Shen wrote:


Hi Alan,

Thanks for the review. I hope I addressed most of the comments in the
updated webrev at

http://cr.openjdk.java.net/~sherman/4235519/webrev

mainly

(1) Pulled the base64 terms up to the class doc and then be referenced from 
various methods
(2) Gave up the C style de/encode(byte[], null), just leave the invoker to have 
enough space. I
 remember Paul once suggested to have a convenient method to return the 
estimated length
 of the byte[] needed to en/decode a specified input byte array/buffer. I 
think we can do it
 later when it is really desired. I agreed your argument that if people 
need to get the size and
 prepare a new array, they bet off just call de/encode(byte[])
(3) Gave up the liberal decoding design, tight the spec/impl to treat the input as 
illegal base64
 if the padding character appears in the middle of input.
(4) Some clarification of spec for thread safe, null exception and the 
pos/limit of input/output byte
 buffer as suggested.
(5) Fixed the typos

I still don't have a better name for method getUrlEn/Decoder(), 
Base64.getBase64UrlEn/Decoder()
does not feel good for me.

-Sherman


On 10/23/12 6:04 AM, Alan Bateman wrote:

On 18/10/2012 03:10, Xueming Shen wrote:

:

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

I took another pass over this, focusing on the API as that is what we have to 
get right. Performance is important too but I think the priority has to be the 
API first.

Overall I think it is quite nice and easy to use.

I wonder if the Base64 class description should try to establish the terms base64 and 
base64url so that they can be referenced from the various methods? That would avoid 
needing to duplicate references to the RFC 4668 and RFC 2045 in so many places.

I think it would also be useful if the specification indicated whether Encoders 
and Decoders are safe for use by concurrent threads. Also make it clear that 
NPE is thrown if any of the parameters are passed as null (unless otherwise 
specified of course).

I'm not sure that getUrlEncoder is the most suitable name to get a base64url 
encoder. The reason is that the method name makes it sound like it returns a 
URLEncoder or or at least an encoder for HTML forms. While more verbose, 
getBase64UrlEncoder is clear that it returns a base64url encoder.

Do you think getEncoder(int,byte[]) will be used much? If not then perhaps it 
should be left out for the first version (I guess part of this is getting used 
to providing the line separate as a byte array).

For the javadoc then Encoder and Decoder will need @since 1.8. They should 
probably cross reference each other too.

encode(byte[]) should be clearer that it encodes all bytes in the given array. 
Also I think it needs to be clear that the returned byte array is appropriately 
sized -- as currently worded it doesn't make it clear that they can't be extra 
elements at the end (odd as it might be).

Typo at line 215, byter array -  byte array

Typo at line 246, methold -  

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Xueming Shen


On 10/18/2012 6:19 AM, Ulf Zibis wrote:

Hi Sherman,

you could create the code table by help of a String constant, which 
would speed up the loading and save a little footprint on the class 
file. (you have forgotten those tricks from sun.nio.cs coders? ;-) )


The implementation probably needs more tuning.



But my main question:
Why don't you enqueue those coders in the well known sun.nio.cs.ext 
provider?


Base64 encoding is not a charset. Mainly it is supposed to be 
byte[]-byte[] coding.
Even for byte[]=char[]/String, Base64 encode byte[] to char[]/String 
and decode
from char[]/String to byte[]. The wrong direction compared to charset 
en/decoder.


At least, I think, class Base64 should be hosted in java.nio.charset 
rather than common java.util.

As I believe, that the usage of this encoding is far from common


you might be surprised how common it might be:-)

-Sherman

, adding it to the lazy loaded charsets.jar IMO should be more 
reasonable.


-Ulf


Am 18.10.2012 04:10, schrieb Xueming Shen:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the end of decoding
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman







Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Ulf Zibis

Am 18.10.2012 16:04, schrieb Xueming Shen:


On 10/18/2012 6:19 AM, Ulf Zibis wrote:


Oops, you are working at 6 in the morning?


Hi Sherman,

you could create the code table by help of a String constant, which would speed up the loading 
and save a little footprint on the class file. (you have forgotten those tricks from sun.nio.cs 
coders? ;-) )


The implementation probably needs more tuning.


Hopefully, some day we have arrays of primitive types in the class file's 
constant pool.


But my main question:
Why don't you enqueue those coders in the well known sun.nio.cs.ext provider?


Base64 encoding is not a charset. Mainly it is supposed to be byte[]-byte[] 
coding.
Even for byte[]=char[]/String, Base64 encode byte[] to char[]/String and 
decode
from char[]/String to byte[]. The wrong direction compared to charset 
en/decoder.


1:0 for you.
I only had the ASCII/Unicode characters - Base64 encoding in mind, which seems in fact a 
Java-String to byte[] to char[]/String 2-step-encoding.





At least, I think, class Base64 should be hosted in java.nio.charset rather 
than common java.util.
As I believe, that the usage of this encoding is far from common


you might be surprised how common it might be:-)


or surprised, why it wasn't part of the JDK before.

-Ulf



Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Michael Schierl
Am 18.10.2012 04:10, schrieb Xueming Shen:
 Hi
 
 Webrev has been updated with following changes
 
 (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods

I think line 443 needs an additional check for atom == 0. Same for the
array case. At least if you do not want an endOfInput flag in the
parameters. If you have an endOfInput flag, you can use that in the
condition instead of atom == 0.

Consider this example where trying to encode 25 bytes using an input
ByteBuffer of 16 bytes and an output ByteBuffer of 32 bytes (Of course,
you'd use larger buffers in practice, but the same can happen with
larger buffers too, as long as your buffer size is not divisible by
three, or you do not completely fill the buffer with each read. Both can
happen if the media you are reading from uses a different sector/block
size - like when attaching a base64 attachment of a large file from disk
to an email).

First you fill 16 bytes of input into input buffer, then you call encode
the first time. it will encode the first 15 bytes to 20 bytes of output,
then encode the 16th byte to XY==. You flush the output and fill the
next 9 bytes into the input buffer. The next call to encode will encode
the 9 bytes to 12 bytes. So, in total you have 36 bytes, but the padding
is in the middle.

With the check, the first call will encode the first 15 bytes to 20
bytes of output and leave the 16th byte in the buffer. You flush the
output and load 9 bytes to the 1 remaining byte in the input buffer. The
second call will encode the first 9 bytes to 12 bytes and leave one byte
left over. You flush the buffer and call a third time, which will encode
the last byte to XY==. Still 36 bytes, this time with padding at the end.


Regards,


Michael


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Xueming Shen

Michael,


The current approach assumes that the base64 en/decoding of a file, stream
will be handled by the en/decoder.wrap(...). The en/decode(bb, bb) is 
basically

for one time invocation input and output from/to existing buffers, with the
hope of avoiding the coding-life-circle management, so it assumes all input
will be passed in in the src buffer and only needs limit support for 
cases that
the existing output buffer is not sufficient. We will probably have to 
go with
the endOfInput flag approach, if the current version is still not 
enough for

most of common use cases.

-Sherman


On 10/18/2012 11:01 AM, Michael Schierl wrote:

Am 18.10.2012 04:10, schrieb Xueming Shen:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods

I think line 443 needs an additional check for atom == 0. Same for the
array case. At least if you do not want an endOfInput flag in the
parameters. If you have an endOfInput flag, you can use that in the
condition instead of atom == 0.

Consider this example where trying to encode 25 bytes using an input
ByteBuffer of 16 bytes and an output ByteBuffer of 32 bytes (Of course,
you'd use larger buffers in practice, but the same can happen with
larger buffers too, as long as your buffer size is not divisible by
three, or you do not completely fill the buffer with each read. Both can
happen if the media you are reading from uses a different sector/block
size - like when attaching a base64 attachment of a large file from disk
to an email).

First you fill 16 bytes of input into input buffer, then you call encode
the first time. it will encode the first 15 bytes to 20 bytes of output,
then encode the 16th byte to XY==. You flush the output and fill the
next 9 bytes into the input buffer. The next call to encode will encode
the 9 bytes to 12 bytes. So, in total you have 36 bytes, but the padding
is in the middle.

With the check, the first call will encode the first 15 bytes to 20
bytes of output and leave the 16th byte in the buffer. You flush the
output and load 9 bytes to the 1 remaining byte in the input buffer. The
second call will encode the first 9 bytes to 12 bytes and leave one byte
left over. You flush the buffer and call a third time, which will encode
the last byte to XY==. Still 36 bytes, this time with padding at the end.


Regards,


Michael




Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Mike Duigou
I wonder if there would be advantage in using a SharedSecrets mechanism to 
allow construction of a String directly from a char array. The intermediate 
byte array seems wasteful especially for what is likely to be a heavily used 
path.

Mike

On Oct 17 2012, at 19:10 , Xueming Shen wrote:

 Hi
 
 Webrev has been updated with following changes
 
 (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
 (2) some minor spec clarification regarding the end of decoding
 (3) performance tuning.
 
 webrev:
 http://cr.openjdk.java.net/~sherman/4235519/webrev
 
 some performance scores:
 http://cr.openjdk.java.net/~sherman/4235519/score3
 
 -Sherman



Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Xueming Shen

On 10/18/2012 07:43 AM, Alan Bateman wrote:

On 18/10/2012 03:10, Xueming Shen wrote:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the end of decoding
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman
Have you done any performance tuning on the new methods that 
encode/decode into an existing ByteBuffer? Just wondering as the 
direct buffer case seems more expensive that I would expect. It would 
be interesting to see using the put(int,byte) and fixing up the 
position at the end would help. It might also be something to look at 
with the compiler folks.


get(index) and put(index, byte) do help, see 20%+ improvement.

sherman@sherman-linux:~/Workspace/jdk8/test/java/util/Base64$ java 
PermBase64 20 1000

 j.u.Base64.encode(ba)  : 702000
 j.u.Base64.encode(bb)  : 690024
 j.u.Base64.encode(bb, bb)  : 910582
 j.u.Base64.encode(bb, bb)-D: 1198606
  migBase64.encode(ba)  : 661271
--
 j.u.Base64.decode(ba)  : 797152
 j.u.Base64.decode(bb)  : 802968
 j.u.Base64.decode(bb, bb)  : 999577
 j.u.Base64.decode(bb, bb)-D: 1472003
  migBase64.decode(ba)  : 1458450

webrev has been updated accordingly. Now implementations of
de/encodeArray and de/encodeBuffer are identical except the
access operation.

-Sherman



Re: Review/comment needed for the new public java.util.Base64 class

2012-10-17 Thread Xueming Shen

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the end of decoding
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-13 Thread Peter Levart
One way to do it without state in encoder is 1st as suggested to have 
the 'endOfStream' flag and in addition when new-line breaks are 
requested, to only encode full lines at each invocation. So If one 
wants to use this line-breakage feature he/she must make sure that there 
is enough space (line_size+2 at worst) left in destination buffer to do 
any encoding or the encoder will not do anything (return 0 or false).


Regards, Peter

On 10/12/2012 10:58 PM, Xueming Shen wrote:


Yes, I'm trying to figure out where to have this  position info stored...

On 10/12/2012 01:48 PM, Michael Schierl wrote:

Am 12.10.2012 22:12, schrieb Xueming Shen:

Hi,

It appears to be possible to do something like

boolean de/encode(ByteBuffer src, ByteBuffer dst);

returns true if all remaining bytes in src are en/decoded, false, 
the dst
is not big enough for all output bytes, the src.position() will be 
advanced
to the position of next un-en/decoded byte, dst.position() will be 
updated

accordingly as well.

to avoid the en/decoder to hold an internal state.

If it was unclear, that was what I tried to suggest. I thought you
refered to the automatic adding of newlines when you said you'd need
internal state (although that state is small enough - how many encoded
bytes remaining before adding the next newline - so that it might be
possible to pass it as a parameter, too). I'm not sure how important
that adding of newlines really is (in all cases where I was guilty of
using sun.misc.Base64Encoder, I always replaced them out after
encoding), so maybe the ByteBuffer API could live without it.

Note that without an explicit end of stream flag, it might not be easily
possible to encode the last block - if you have a 16 byte buffer with 14
bytes filled, you cannot be sure whether you can encode the last two
bytes to XYZ= because it is end of stream or you have to wait for the
next byte. An option without the flag might be to assume when the input
is less than 3 bytes that it is the end of the stream. But I doubt it
will make the API easier to understand or use. :-)


Regards,


Michael






Re: Review/comment needed for the new public java.util.Base64 class

2012-10-13 Thread Xueming Shen


With moving some code piece around (moved those lookup table into 
En/Decoder)
and some array access code tuning (with the hope of eliminating the 
array boundary
access) in encode0(), it appears the hotspot now can better optimize the 
decoder
loop for the -server mode for encoding. decode0 also get 10% +boost with 
simply

moving in the lookup table.

The latest version is at (decode0() vs decode1())

http://cr.openjdk.java.net/~sherman/4235519/Base64.java

The latest scores is at

http://cr.openjdk.java.net/~sherman/4235519/score2

(compared to http://cr.openjdk.java.net/~sherman/4235519/score1)

-Sherman


On 10/11/12 11:38 PM, Xueming Shen wrote:



On 10/11/2012 02:43 AM, Stephen Colebourne wrote:

There are lots of other public base 64 implementations to test/check 
against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html 

http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html 


http://migbase64.sourceforge.net/  (claims to be fast)




I did a quick performance check against migbase64 with the basic 
base64 de/encoding

using this non-scientific benchmark.
http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java

Here is the scores
http://cr.openjdk.java.net/~sherman/4235519/scores

It's a tie, j.u.Base64 wins the decoding, but needs some work on 
encoding side.


java -server PermBase64 20 50
 j.u.Base64.encode() : 1390212
  migBase64.encode() : 720517
 j.u.Base64.decode() : 1200642
  migBase64.decode() : 2070015

java -server PermBase64 20 100
 j.u.Base64.encode() : 1239407
  migBase64.encode() : 740404
 j.u.Base64.decode() : 1257092
  migBase64.decode() : 2012910

java -server PermBase64 20 1000
 j.u.Base64.encode() : 1062212
  migBase64.encode() : 657342
 j.u.Base64.decode() : 1133740
  migBase64.decode() : 1930612

java -client PermBase64 20 50
 j.u.Base64.encode() : 961331
  migBase64.encode() : 875635
 j.u.Base64.decode() : 1528790
  migBase64.decode() : 2188473

java -client PermBase64 20 100
 j.u.Base64.encode() : 966453
  migBase64.encode() : 858082
 j.u.Base64.decode() : 1459159
  migBase64.decode() : 2115045

java -client PermBase64 20 1000
 j.u.Base64.encode() : 954401
  migBase64.encode() : 854903
 j.u.Base64.decode() : 1444603
  migBase64.decode() : 2038865






Re: Review/comment needed for the new public java.util.Base64 class

2012-10-13 Thread Xueming Shen

On 10/13/12 10:21 AM, Xueming Shen wrote:


With moving some code piece around (moved those lookup table into 
En/Decoder)
and some array access code tuning (with the hope of eliminating the 
array boundary
access) in encode0(), it appears the hotspot now can better optimize 
the decoder
loop for the -server mode for encoding. decode0 also get 10% +boost 
with simply

moving in the lookup table.

The latest version is at (decode0() vs decode1())

-- encode0() vs encode1())

encode0 is the latest/faster one. encode1 is the one in webrev.

-Sherman



http://cr.openjdk.java.net/~sherman/4235519/Base64.java

The latest scores is at

http://cr.openjdk.java.net/~sherman/4235519/score2

(compared to http://cr.openjdk.java.net/~sherman/4235519/score1)

-Sherman


On 10/11/12 11:38 PM, Xueming Shen wrote:



On 10/11/2012 02:43 AM, Stephen Colebourne wrote:

There are lots of other public base 64 implementations to 
test/check against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html 

http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html 


http://migbase64.sourceforge.net/  (claims to be fast)




I did a quick performance check against migbase64 with the basic 
base64 de/encoding

using this non-scientific benchmark.
http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java

Here is the scores
http://cr.openjdk.java.net/~sherman/4235519/scores

It's a tie, j.u.Base64 wins the decoding, but needs some work on 
encoding side.


java -server PermBase64 20 50
 j.u.Base64.encode() : 1390212
  migBase64.encode() : 720517
 j.u.Base64.decode() : 1200642
  migBase64.decode() : 2070015

java -server PermBase64 20 100
 j.u.Base64.encode() : 1239407
  migBase64.encode() : 740404
 j.u.Base64.decode() : 1257092
  migBase64.decode() : 2012910

java -server PermBase64 20 1000
 j.u.Base64.encode() : 1062212
  migBase64.encode() : 657342
 j.u.Base64.decode() : 1133740
  migBase64.decode() : 1930612

java -client PermBase64 20 50
 j.u.Base64.encode() : 961331
  migBase64.encode() : 875635
 j.u.Base64.decode() : 1528790
  migBase64.decode() : 2188473

java -client PermBase64 20 100
 j.u.Base64.encode() : 966453
  migBase64.encode() : 858082
 j.u.Base64.decode() : 1459159
  migBase64.decode() : 2115045

java -client PermBase64 20 1000
 j.u.Base64.encode() : 954401
  migBase64.encode() : 854903
 j.u.Base64.decode() : 1444603
  migBase64.decode() : 2038865








Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Xueming Shen



On 10/11/2012 02:43 AM, Stephen Colebourne wrote:

There are lots of other public base 64 implementations to test/check 
against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html 

http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html 


http://migbase64.sourceforge.net/  (claims to be fast)




I did a quick performance check against migbase64 with the basic base64 
de/encoding

using this non-scientific benchmark.
http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java

Here is the scores
http://cr.openjdk.java.net/~sherman/4235519/scores

It's a tie, j.u.Base64 wins the decoding, but needs some work on 
encoding side.


java -server PermBase64 20 50
 j.u.Base64.encode() : 1390212
  migBase64.encode() : 720517
 j.u.Base64.decode() : 1200642
  migBase64.decode() : 2070015

java -server PermBase64 20 100
 j.u.Base64.encode() : 1239407
  migBase64.encode() : 740404
 j.u.Base64.decode() : 1257092
  migBase64.decode() : 2012910

java -server PermBase64 20 1000
 j.u.Base64.encode() : 1062212
  migBase64.encode() : 657342
 j.u.Base64.decode() : 1133740
  migBase64.decode() : 1930612

java -client PermBase64 20 50
 j.u.Base64.encode() : 961331
  migBase64.encode() : 875635
 j.u.Base64.decode() : 1528790
  migBase64.decode() : 2188473

java -client PermBase64 20 100
 j.u.Base64.encode() : 966453
  migBase64.encode() : 858082
 j.u.Base64.decode() : 1459159
  migBase64.decode() : 2115045

java -client PermBase64 20 1000
 j.u.Base64.encode() : 954401
  migBase64.encode() : 854903
 j.u.Base64.decode() : 1444603
  migBase64.decode() : 2038865




Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Alan Bateman

On 11/10/2012 19:30, Ariel Weisberg wrote:

:

I know that ByteBuffers are pain, but I did notice that you can't
specify a source/dest pair when using ByteBuffers and that ByteBuffers
without arrays have to be copied. I don't see a simple safe way to
normalize access to them the way you can if everything is a byte array.

I agree, encode/decode methods where the destination is a given 
ByteBuffer would be desirable (and probably more useful than returning a 
new ByteBuffer each time). Byte arrays are so commonly used that it 
probably justifying having both variants as proposed.


-Alan


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Michael Schierl
Hello,

(sorry if the threading is broken, but I was not subscribed to the list
and only found the discussion on Twitter and read it in the mailing list
archive)

Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012:
 I know that ByteBuffers are pain, but I did notice that you can't
 specify a source/dest pair when using ByteBuffers and that ByteBuffers
 without arrays have to be copied. I don't see a simple safe way to
 normalize access to them the way you can if everything is a byte array.

Agreed. One of the advantages of using byte buffers is reducing
allocations, resulting in fewer garbage collections.

In addition, in this implementation the ByteBuffers have to contain the
full data.

What I like about most byte buffers APIs is that I can pass in a
ByteBuffer with incomplete data or maybe an output ByteBuffer that is
too small to hold the complete result, and it will just process as much
as it can, and leave the rest for the next round (which should work well
for Base64, too, as it always processes chunks of 3 or 4 bytes).

So, a useful ByteBuffer API in my opinion needs a method like

public boolean encode(ByteBuffer in, ByteBuffer out,
  boolean endOfInput);

public boolean decode(ByteBuffer in, ByteBuffer out,
  boolean endOfInput);

(similar to CharsetEncoder#encode) that can process partial input and
will return true if all processable input has been processed (i. e. in
has to be refilled) or false if some input could not have been processed
(i. e. out has to be flushed).

Users have to call it again and again until they call it with
endOfInput=true and get true back (Using an enum as result similar to
CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option
if the boolean results are too cryptic).

Having a ByteBuffer Base64 API might be useful (although I'm not sure
yet if I ever need it), but as it is now, it is mostly useless for
serious ByteBuffer usage, as if I have to split and copy the data
manually anyway, I can as well use the array APIs.


Just my 0.02 EUR,


Michael


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-11 Thread Alan Bateman

On 11/10/2012 04:15, Xueming Shen wrote:
There is no plan yet. The sun.misc.BASE64En/Decoder should already 
trigger a compiler
warning for it's a sun private API. @Deprecated annotation might be a 
good fit.


-Sherman
Yes, I think we should deprecate them. The other thing suggested in the 
JEP is that we would also look usages in the JDK, also other 
implementations (as there are many) to see if they can be retired. This 
can of course be done later. It also an opportunity for contributions, 
say for example someone might like to look at java.util.prefs to retire 
the implementation there. Another one that might be able to retire is 
the implementation in the com.sun.net.httpserver.BasicAuthenticator 
class and there are others.


-Alan


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-11 Thread Stephen Colebourne
The class level javadoc is quite short. It also has hyperlinks in the
first sentence, which means that they are visible in package level
javadoc. Consider having no hyperlinks in the first sentence, and
expanding a little on what base 64 is.

There are lots of other public base 64 implementations to test/check against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html
http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html
http://migbase64.sourceforge.net/  (claims to be fast)

The arrays are defined inconsistently within the code (3 styles).
  private Encoder(byte[] base64, byte[] newline, int linemax)
  byte [] getBytes(ByteBuffer bb)
  private static final byte toBase64[] =
I'd strongly encourage one style be used, and that it is the first of
the three above.

Stephen


On 10 October 2012 18:54, Xueming Shen xueming.s...@oracle.com wrote:

 A standard/public API for Base64 encoding and decoding has been long
 overdue. JDK8  has a JEP [1] for this particular request.

 Here is the draft proposal to add a public Base64 utility class for JDK8.

 http://cr.openjdk.java.net/~sherman/4235519/webrev

 This class basically provides 4 variants of Base64 encoding scheme,

 (1) the default RFC 4648, which uses standard mapping, no line breaks,
 (2) the URL-safe RFE 4648, no line breaks, use - and _ to replace +
 and
 / for the mapping
 (3) the default MIME style, as in RFC 2045 (and its earlier versions), which
 uses
 standard base64 mapping, a 76 characters per line limit and uses crlf
 pair
  \r\n for line break.
 (4) an extend-able MIME base64, with the char-per-line and the line
 separator(s)
  specified by developer.

 The encoder/decoder now provides encoding and decoding for byte[], String,
 ByteBuffer and a pair of EncoderInputStream and DecoderOutputStrream,
 which we believe/hope should satisfy most of the common use cases.
 Additional
 method(s) can be added if strongly desired.

 We tried couple slightly different styles of design for such this simple
 utility
 class [2].  We kinda concluded that the version proposed probably provides
 the best balance among readability, usability and extensibility.

 Please help review and comment on the API and implementation.

 Thanks!
 -Sherman

 [1] http://openjdk.java.net/jeps/135
 [2] http://cr.openjdk.java.net/~sherman/base64/


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-11 Thread Xueming Shen

On 10/11/2012 02:43 AM, Stephen Colebourne wrote:

The class level javadoc is quite short. It also has hyperlinks in the
first sentence, which means that they are visible in package level
javadoc. Consider having no hyperlinks in the first sentence, and
expanding a little on what base 64 is.



Sure, will come up something longer.

There are lots of other public base 64 implementations to test/check against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html
http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html
http://migbase64.sourceforge.net/  (claims to be fast)


I did compare the result against the apache version, the difference 
appears to
be the apache (1)append line feeds at the end of the encoded bytes 
(2)skip the

padding '=' characters for URL-safe style.  Will try other implementations.


The arrays are defined inconsistently within the code (3 styles).
   private Encoder(byte[] base64, byte[] newline, int linemax)
   byte [] getBytes(ByteBuffer bb)
   private static final byte toBase64[] =
I'd strongly encourage one style be used, and that it is the first of
the three above.


Good catch, the later two are not intentional, the leftover of old code. 
webrev has

been updated according.

Thanks!
-Sherman



Stephen


On 10 October 2012 18:54, Xueming Shenxueming.s...@oracle.com  wrote:

A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace +
and
 / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions), which
uses
 standard base64 mapping, a 76 characters per line limit and uses crlf
pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String,
ByteBuffer and a pair of EncoderInputStream and DecoderOutputStrream,
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this simple
utility
class [2].  We kinda concluded that the version proposed probably provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/




RE: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Iris Clark
Hi, Sherman.

I'm glad to see this coming in.  As you said, long overdue.

I'm curious.  What are the plans are to encourage migration from the JDK 
private and unsupported sun.misc.BASE64{En,DE}coder classes?  Compile-time 
warning?  Documentation?  Something else?

Thanks,
iris

-Original Message-
From: Xueming Shen 
Sent: Wednesday, October 10, 2012 10:55 AM
To: core-libs-dev
Subject: Review/comment needed for the new public java.util.Base64 class


A standard/public API for Base64 encoding and decoding has been long overdue. 
JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace + and
 / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions), which 
uses
 standard base64 mapping, a 76 characters per line limit and uses crlf 
pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String, 
ByteBuffer and a pair of EncoderInputStream and DecoderOutputStrream, which 
we believe/hope should satisfy most of the common use cases. 
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this simple 
utility class [2].  We kinda concluded that the version proposed probably 
provides the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Joe Darcy

Hello,

On 10/10/2012 1:03 PM, Iris Clark wrote:

Hi, Sherman.

I'm glad to see this coming in.  As you said, long overdue.

I'm curious.  What are the plans are to encourage migration from the JDK 
private and unsupported sun.misc.BASE64{En,DE}coder classes?  Compile-time 
warning?  Documentation?  Something else?


On that point, sun.misc.BASE64{En,DE}coder may present a fine 
opportunity to use the @Deprecated annotation...


-Joe



Thanks,
iris

-Original Message-
From: Xueming Shen
Sent: Wednesday, October 10, 2012 10:55 AM
To: core-libs-dev
Subject: Review/comment needed for the new public java.util.Base64 class


A standard/public API for Base64 encoding and decoding has been long overdue. 
JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace + and
  / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions), which 
uses
  standard base64 mapping, a 76 characters per line limit and uses crlf 
pair
   \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
   specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of 
EncoderInputStream and DecoderOutputStrream, which we believe/hope should 
satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this simple 
utility class [2].  We kinda concluded that the version proposed probably provides the 
best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/




Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Weijun Wang

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax  (atom != 0 || sp  sl)) {

Maybe atom != 0 is not necessary?

2. Is it necessary to explicitly mention in the spec that there is no 
CrLf at the end of a MIME encoded string?


3. The test confirms decoding can correctly reverse the encoding but it 
says nothing about the correctness of the encoding. Maybe we can just 
use 10.  Test Vectors of RFC 4648?


Thanks
Max

On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace
+ and
 / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 standard base64 mapping, a 76 characters per line limit and uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String,
ByteBuffer and a pair of EncoderInputStream and DecoderOutputStrream,
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
simple utility
class [2].  We kinda concluded that the version proposed probably provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/


Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax  (atom != 0 || sp  sl)) {

Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if there
is still byte(s) in src (sp  sl), we will need to output the last 
special unit,

which has one or two padding charactere '=', in this case, we still need to
output the line separator(s).



2. Is it necessary to explicitly mention in the spec that there is no 
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output 
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last 
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line 
separator if the last line
happens to have 76 characters, I would assume this is a bug though). The 
current implement

tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing whatever 
behavior we choose.




3. The test confirms decoding can correctly reverse the encoding but 
it says nothing about the correctness of the encoding. Maybe we can 
just use 10.  Test Vectors of RFC 4648?




I do have a version of TestBase64 actually compares encoded results of 
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe 
I should

at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for 
JDK8.


http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace
+ and
 / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 standard base64 mapping, a 76 characters per line limit and uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], 
String,
ByteBuffer and a pair of EncoderInputStream and 
DecoderOutputStrream,

which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
simple utility
class [2].  We kinda concluded that the version proposed probably 
provides

the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/




Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen
There is no plan yet. The sun.misc.BASE64En/Decoder should already 
trigger a compiler
warning for it's a sun private API. @Deprecated annotation might be a 
good fit.


-Sherman

On 10/10/12 1:40 PM, Joe Darcy wrote:

Hello,

On 10/10/2012 1:03 PM, Iris Clark wrote:

Hi, Sherman.

I'm glad to see this coming in.  As you said, long overdue.

I'm curious.  What are the plans are to encourage migration from the 
JDK private and unsupported sun.misc.BASE64{En,DE}coder classes?  
Compile-time warning?  Documentation?  Something else?


On that point, sun.misc.BASE64{En,DE}coder may present a fine 
opportunity to use the @Deprecated annotation...


-Joe



Thanks,
iris

-Original Message-
From: Xueming Shen
Sent: Wednesday, October 10, 2012 10:55 AM
To: core-libs-dev
Subject: Review/comment needed for the new public java.util.Base64 class


A standard/public API for Base64 encoding and decoding has been long 
overdue. JDK8  has a JEP [1] for this particular request.


Here is the draft proposal to add a public Base64 utility class for 
JDK8.


http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace 
+ and

  / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier 
versions), which uses
  standard base64 mapping, a 76 characters per line limit and 
uses crlf pair

   \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
   specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], 
String, ByteBuffer and a pair of EncoderInputStream and 
DecoderOutputStrream, which we believe/hope should satisfy most of 
the common use cases.

Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this 
simple utility class [2].  We kinda concluded that the version 
proposed probably provides the best balance among readability, 
usability and extensibility.


Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/






Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Weijun Wang



On 10/11/2012 11:09 AM, Xueming Shen wrote:

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax  (atom != 0 || sp  sl)) {

Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if
there
is still byte(s) in src (sp  sl), we will need to output the last
special unit,
which has one or two padding charactere '=', in this case, we still need to
output the line separator(s).


But when atom != 0, it seems sp  sl should always be true.

-Max





2. Is it necessary to explicitly mention in the spec that there is no
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line
separator if the last line
happens to have 76 characters, I would assume this is a bug though). The
current implement
tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing whatever
behavior we choose.



3. The test confirms decoding can correctly reverse the encoding but
it says nothing about the correctness of the encoding. Maybe we can
just use 10.  Test Vectors of RFC 4648?



I do have a version of TestBase64 actually compares encoded results of
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe
I should
at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for
JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace
+ and
 / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 standard base64 mapping, a 76 characters per line limit and uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[],
String,
ByteBuffer and a pair of EncoderInputStream and
DecoderOutputStrream,
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
simple utility
class [2].  We kinda concluded that the version proposed probably
provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/




Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen

On 10/10/12 8:16 PM, Weijun Wang wrote:



On 10/11/2012 11:09 AM, Xueming Shen wrote:

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax  (atom != 0 || sp  
sl)) {


Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if
there
is still byte(s) in src (sp  sl), we will need to output the last
special unit,
which has one or two padding charactere '=', in this case, we still 
need to

output the line separator(s).


But when atom != 0, it seems sp  sl should always be true.


Yes, the sp  sl part only counts if atom == 0. Means if it's NOT last atom,
(atom != 0) output the line feeds, if it IS the last atom (atom == 0), only
output the line feeds if there are leftover byte (sp  sl), which means the
last 4-byte unit (with one or two '=') will be after this line feed.

-Sherman



-Max





2. Is it necessary to explicitly mention in the spec that there is no
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line
separator if the last line
happens to have 76 characters, I would assume this is a bug though). The
current implement
tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing whatever
behavior we choose.



3. The test confirms decoding can correctly reverse the encoding but
it says nothing about the correctness of the encoding. Maybe we can
just use 10.  Test Vectors of RFC 4648?



I do have a version of TestBase64 actually compares encoded results of
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe
I should
at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for
JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace
+ and
 / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 standard base64 mapping, a 76 characters per line limit and 
uses

crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[],
String,
ByteBuffer and a pair of EncoderInputStream and
DecoderOutputStrream,
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
simple utility
class [2].  We kinda concluded that the version proposed probably
provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/






Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Weijun Wang



On 10/11/2012 11:32 AM, Xueming Shen wrote:

On 10/10/12 8:16 PM, Weijun Wang wrote:



On 10/11/2012 11:09 AM, Xueming Shen wrote:

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax  (atom != 0 || sp 
sl)) {

Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if
there
is still byte(s) in src (sp  sl), we will need to output the last
special unit,
which has one or two padding charactere '=', in this case, we still
need to
output the line separator(s).


But when atom != 0, it seems sp  sl should always be true.


Yes, the sp  sl part only counts if atom == 0. Means if it's NOT last
atom,
(atom != 0) output the line feeds, if it IS the last atom (atom == 0), only
output the line feeds if there are leftover byte (sp  sl), which means the
last 4-byte unit (with one or two '=') will be after this line feed.


So, the value of sp  sl is always the same as (atom != 0 || sp  sl). 
That's why I said atom != 0 is not necessary.


-Max



-Sherman



-Max





2. Is it necessary to explicitly mention in the spec that there is no
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line
separator if the last line
happens to have 76 characters, I would assume this is a bug though). The
current implement
tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing whatever
behavior we choose.



3. The test confirms decoding can correctly reverse the encoding but
it says nothing about the correctness of the encoding. Maybe we can
just use 10.  Test Vectors of RFC 4648?



I do have a version of TestBase64 actually compares encoded results of
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe
I should
at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for
JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to replace
+ and
 / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 standard base64 mapping, a 76 characters per line limit and
uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[],
String,
ByteBuffer and a pair of EncoderInputStream and
DecoderOutputStrream,
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
simple utility
class [2].  We kinda concluded that the version proposed probably
provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/






Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen

On 10/10/12 8:39 PM, Weijun Wang wrote:



On 10/11/2012 11:32 AM, Xueming Shen wrote:

On 10/10/12 8:16 PM, Weijun Wang wrote:



On 10/11/2012 11:09 AM, Xueming Shen wrote:

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax  (atom != 0 || sp 
sl)) {

Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if
there
is still byte(s) in src (sp  sl), we will need to output the last
special unit,
which has one or two padding charactere '=', in this case, we still
need to
output the line separator(s).


But when atom != 0, it seems sp  sl should always be true.


Yes, the sp  sl part only counts if atom == 0. Means if it's NOT last
atom,
(atom != 0) output the line feeds, if it IS the last atom (atom == 
0), only
output the line feeds if there are leftover byte (sp  sl), which 
means the

last 4-byte unit (with one or two '=') will be after this line feed.


So, the value of sp  sl is always the same as (atom != 0 || sp  sl). 
That's why I said atom != 0 is not necessary.


You're absolutely correct! Somehow I kept thinking you were saying sp  
sl is not necessary:-)

not the other way around.

webrev will be updated shortly.

Thanks!
-Sherman




-Max



-Sherman



-Max





2. Is it necessary to explicitly mention in the spec that there is no
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line
separator if the last line
happens to have 76 characters, I would assume this is a bug 
though). The

current implement
tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing 
whatever

behavior we choose.



3. The test confirms decoding can correctly reverse the encoding but
it says nothing about the correctness of the encoding. Maybe we can
just use 10.  Test Vectors of RFC 4648?



I do have a version of TestBase64 actually compares encoded results of
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. 
Maybe

I should
at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for
JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line 
breaks,
(2) the URL-safe RFE 4648, no line breaks, use - and _ to 
replace

+ and
 / for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier 
versions),

which uses
 standard base64 mapping, a 76 characters per line limit and
uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[],
String,
ByteBuffer and a pair of EncoderInputStream and
DecoderOutputStrream,
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
simple utility
class [2].  We kinda concluded that the version proposed probably
provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/