Hi,
Here is patch for 6443578 and 6202130 also in webrev form.
http://files.paratix.ch/jdk/6372077and6443578/webrev.01/
http://files.paratix.ch/jdk/6372077and6443578/webrev.01.zip
Hope it helps. With all the patience, can I do anything to make it
easier to get feedback or find a sponsor?
Regards,
Philipp
On 02.05.2018 07:21, Philipp Kunz wrote:
Hi,
Recently, I tried to fix only bug 6202130 with the intention to fix
bug 6443578 later with the intention to get some opportunity for
feedback, but haven't got any, and propose now a fix for both together
which in my opinion makes more sense.
See attached patch.
Some considerations, assumptions, and explanations
* In my opinion, the code for writing manifests was distributed in
the two classes Attributes and Manifest in an elegant way but
somewhat difficult to explain the coherence. I chose to group the
code that writes manifests into a new class ManifestWriter. The
main incentive for that was to prevent or reduce duplicated code I
would have had to change twice otherwise. This also results in a
source file of a suitable size.
* I could not support the assumption that the write and writeMain
methods in Attributes couldn't be referenced anywhere so I
deprecated them rather than having them removed.
* I assumed the patch will not make it into JDK 10 and, hence, the
deprecated annotations are attributed with since = 11.
* I could not figure out any reason for the use of DataOutputStream
and did not use it.
* Performance-wise I assume that the code is approximately
comparable to the previous version. The biggest improvement in
this respect I hope comes from removing the String that contains
the byte array constructed with deprecated String(byte[], int,
int, int) and then copying it over again to a StringBuffer and
from there to a String again and then Characters. On the other
hand, keeping whole characters together when breaking lines might
make it slightly slower. I hope my changes are an overall
improvement, but I haven't measured it.
* For telling first from continuation bytes of utf-8 characters
apart I re-used a method isNotUtfContinuationByte from either
StringCoding or UTF_8.Decoder. Unfortunately I found no way not to
duplicate it.
* Where it said before "XXX Need to handle UTF8 values and break up
lines longer than 72 bytes" in Attributes#writeMain I did not dare
to remove the comment completely because it still does not deal
correctly with version headers longer than 72 bytes and the set of
allowed values. I changed it accordingly. Two similar comments are
removed in the patch.
* I added two tests, WriteDeprecated and NullKeysAndValues, to
demonstrate compatibility as good as I could. Might however not be
desired to keep and having to maintain.
* LineBrokenMultiByteCharacter for jarsigner should not be removed
or not so immediately because someone might attempt to sign an
older jarfile created without that patch with a newer jarsigner
that already contains it.
suggested changes or additions to the bug database: (i have no
permissions to edit it myself)
* Re-combine copies of isNotUtfContinuationByte (three by now).
Relates to 6184334. Worth to file another issue?
* Manifest versions have specific specifications, cannot break
across lines and can contain a subset of characters only. Bug
6910466 relates but is not exactly the same. If someone else is
convinced that writing a manifest should issue a warning or any
other way to deal with a version that does not conform to the
specification, I'd suggest to create a separate bug for that.
Now, I would be glad if someone sponsored a review. This is only my
third attempt to submit a patch which is why I chose a lesser
important subject to fix in order to get familiar and now I understand
it's not the most attractive patch to review. Please don't hesitate to
suggest what I could do better or differently.
As a bonus, with these changes, manifest files will always be
displayed correctly with just any utf capable viewer even if they
contain multi-byte utf characters that would have been broken across a
line break with the current/previous implementation and all manifests
will become also valid strings in Java.
Regards,
Philipp
On 20.04.2018 00:58, Philipp Kunz wrote:
Hi,
I tried to fix bug 6202130 about manifest utf support and come up now
with a test as suggested in the bug's comments that shows that utf
charset actually works before removing the comments from the code.
When I wanted to remove the XXX comments about utf it occurred to me
that version attributes ("Signature-Version" and "Manifest-Version")
would never be broken across lines and should anyway not support the
whole utf character set which sounds more like related to bugs
6910466 or 4935610 but it's not a real fit. Therefore, I could not
remove one such comment of Attributes#writeMain but I changed it. The
first comment in bug 6202130 mentions only two comments but there are
three in Attributes. In the attached patch I removed only two of
three and changed the remaining third to not mention utf anymore.
At the moment, at least until 6443578 is fixed, multi-byte utf
characters can be broken across lines. It might be worth a
consideration to test that explicitly as well but then I guess there
is not much of a point in testing the current behavior that will
change with 6443578, hopefully soon. There are in my opinion enough
characters broken across lines in the attached test that demonstrate
that this still works like it did before.
I would have preferred also to remove the calls to deprecated
String(byte[], int, int, int) but then figured it relates more to bug
6443578 than 6202130 and now prefer to do that in another separate
patch.
Bug 6202130 also states that lines are broken by String.length not by
byte length. While it looks so at first glance, I could not confirm.
The combination of getBytes("UTF8"), String(byte[], int, int, int),
and then DataOutputStream.writeBytes(String) in that combination does
not drop high-bytes because every byte (whether a whole character or
only a part of a multi-byte character) becomes a character in
String(...) containing that byte in its low-byte which will be read
again from writeBytes(...). Or put in a different way, every utf
encoded byte becomes a character and multi-byte utf characters are
converted into multiple string characters containing one byte each in
their lower bytes. The current solution is not nice, but at least
works. With that respect I'd like to suggest to deprecate
DataOutputStream.writeBytes(String) because it does something not
exactly expected when guessing from its name and that would suit a
byte[] parameter better very much like it has been done with
String(byte[], int, int, int). Any advice about the procedure to
deprecate something?
I was surprised that it was not trivial to list all valid utf
characters. If someone has a better idea than isValidUtfCharacter in
the attached test, let me know.
Altogether, I would not consider 6202130 resolved completely, unless
maybe all remaining points are copied to 6443578 and maybe another
bug about valid values for "Signature-Version" and "Manifest-Version"
if at all desired. But still I consider the attached patch an
improvement and most of the remainder can then be solved in 6443578
and so far I am looking forward to any kind of feedback.
Regards,
Philipp