Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
Hello Volker.
(Sorry for the duplicate post; I made a mistake when sending.)

> What's actually the difference between "ibm-942C" and "ibm-942"?

They differ in two ways in the single-byte part:
1. Control character rotation for 0x1A, 0x1C and 0x7F [2]
2. Character replacement for 0x5C and 0x7E (0xFE, 0xFF)

For IBM-942 [3]:
  0x1A <=> U+001C
  0x1C <=> U+007F
  0x5C <=> U+00A5
  0x7E <=> U+203E
  0x7F <=> U+001A
  0xFE <=> U+005C
  0xFF <=> U+007E

For IBM-942C (it is ASCII compatible):
  0x1A <=> U+001A
  0x1C <=> U+001C
  0x5C <=> U+005C
  0x7E <=> U+007E
  0x7F <=> U+007F
  0x5C <=  U+00A5   (encode only)
  0x7E <=  U+203E   (encode only)
  0xFE  => U+005C   (decode only)
  0xFF  => U+007E   (decode only)

The single-byte part of IBM-942 is IBM-1041 [4].
The single-byte part of IBM-932 is IBM-897 [5].
IBM-1041 is not the same as IBM-897: five characters were added in IBM-1041 [4] (0x80, 0xA0, 0xFD, 0xFE, 0xFF [2]).

[2] https://www-01.ibm.com/software/globalization/cdra/appendix_g.html
[3] https://en.wikipedia.org/wiki/Code_page_942
[4] http://www-01.ibm.com/software/globalization/cp/cp01041.html
[5] http://www-01.ibm.com/software/globalization/cp/cp00897.html

On 2018-04-17 23:52, Volker Simonis wrote:
> Hi Bhaktavatsal Reddy,
>
> you change looks good, although I can't really verify all the charset
> aliases. For example Wikipedia mentions that "ibm-932" is equivalent
> to "ibm-942" [1] but you made it an alias for "ibm-942C". What's
> actually the difference between "ibm-942C" and "ibm-942"?
>
> I can sponsor your change although I would appreciate if somebody else
> from IBM could have another look at your change. I tried to compare
> with "IBM Java 9" but it doesn't seem to exist. They only refer to
> AdoptOpenJDK and AdoptOpenJDK just uses a vanilla version of OpenJDK.
>
> Finally, I hope you won't mind if I update the copyright years on the
> files you changed before pushing (this is a convention in the OpenJDK
> project).
>
> Best regards,
> Volker
>
> [1] https://en.wikipedia.org/wiki/Code_page_932_(IBM)
>
> On Mon, Apr 16, 2018 at 1:10 PM, Bhaktavatsal R Maram wrote:
>> Hi All,
>>
>> I've regenerated webrev using "hg rename" to create template files.
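The single-byte differences listed in the message above can be written down as a small lookup model. This is only a sketch: the class and method names are hypothetical, the values are copied from the message rather than from the JDK's mapping tables, and it assumes that every other byte in 0x00-0x7F maps to the same ASCII code point in both variants.

```java
import java.util.Map;

/** Toy model of the IBM-942 / IBM-942C single-byte differences quoted in this thread. */
final class Ibm942SingleByte {
    // Bytes whose code point in IBM-942 differs from the byte value (values from the message).
    static final Map<Integer, Integer> IBM942 = Map.of(
        0x1A, 0x001C,
        0x1C, 0x007F,
        0x5C, 0x00A5,
        0x7E, 0x203E,
        0x7F, 0x001A,
        0xFE, 0x005C,
        0xFF, 0x007E);

    // IBM-942C keeps 0x00-0x7F as ASCII, so only the decode-only bytes 0xFE and 0xFF are listed.
    static final Map<Integer, Integer> IBM942C = Map.of(
        0xFE, 0x005C,
        0xFF, 0x007E);

    /** Decodes one byte; only 0x00-0x7F, 0xFE and 0xFF are covered by this model. */
    static int decode(Map<Integer, Integer> overrides, int b) {
        if (overrides.containsKey(b)) {
            return overrides.get(b);
        }
        if (b >= 0 && b <= 0x7F) {
            return b; // assumption: everything else in this range is plain ASCII
        }
        throw new IllegalArgumentException("byte outside this model: " + b);
    }

    /** Encodes for IBM-942C: U+00A5 and U+203E fold onto the ASCII bytes (one-way mappings). */
    static int encode942C(int codePoint) {
        if (codePoint == 0x00A5) return 0x5C;
        if (codePoint == 0x203E) return 0x7E;
        if (codePoint >= 0 && codePoint <= 0x7F) return codePoint;
        throw new IllegalArgumentException("code point outside this model: " + codePoint);
    }

    public static void main(String[] args) {
        for (int b : new int[] {0x1A, 0x1C, 0x5C, 0x7E, 0x7F}) {
            System.out.printf("0x%02X  IBM-942: U+%04X  IBM-942C: U+%04X%n",
                b, decode(IBM942, b), decode(IBM942C, b));
        }
    }
}
```

The one-way entries are why IBM-942C does not round-trip a yen sign: in this model U+00A5 encodes to 0x5C, which decodes back to a backslash.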
Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
Hi Volker,

Thank you for reviewing the patch.

> you change looks good, although I can't really verify all the charset
> aliases. For example Wikipedia mentions that "ibm-932" is equivalent
> to "ibm-942" [1] but you made it an alias for "ibm-942C". What's
> actually the difference between "ibm-942C" and "ibm-942"?

IBM-942C is a customized version of IBM-942 in which the following bytes are mapped to their ASCII code points, making the first 128 mappings (0x00-0x7F) the same as ASCII:

  Byte   IBM-942   IBM-942C
  0x1A   U+001C    U+001A
  0x1C   U+007F    U+001C
  0x5C   U+00A5    U+005C
  0x7E   U+203E    U+007E
  0x7F   U+001A    U+007F

Similarly, IBM-943C is a customization of IBM-943 in which the mappings for yen (¥) and overline (‾) are replaced by their ASCII equivalents, backslash (\) and tilde (~). So we should be mapping the OS code page IBM-943 to code page IBM-943C in Java. I am working on fixing these inconsistencies under a separate defect so as not to confuse things (I hope that is all right). The current patch mainly addresses moving the default code pages from the extended code page list to the standard code page list. There are also a few code pages that are missing in OpenJDK.

> I can sponsor your change although I would appreciate if somebody else
> from IBM could have another look at your change. I tried to compare
> with "IBM Java 9" but it doesn't seem to exist. They only refer to
> AdoptOpenJDK and AdoptOpenJDK just uses a vanilla version of OpenJDK.

Right! The OpenJ9 version of JDK 9 at AdoptOpenJDK is a vanilla version of OpenJDK with OpenJ9. I picked the aliases for this patch from IBM JDK 8.

> Finally, I hope you won't mind if I update the copyright years on the
> files you changed before pushing (this is a convention in the OpenJDK
> project).

Sorry, I forgot to take care of the copyright. Please change it this time before pushing. I will take care of it from now on.

Thanks,
Bhaktavatsal Reddy

-Volker Simonis wrote: -
To: Bhaktavatsal R Maram
From: Volker Simonis
Date: 04/17/2018 08:30PM
Cc: Alan Bateman, Tim Ellison, ppc-aix-port-...@openjdk.java.net, Java Core Libs
Subject: Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base

Hi Bhaktavatsal Reddy,

you change looks good, although I can't really verify all the charset aliases. For example Wikipedia mentions that "ibm-932" is equivalent to "ibm-942" [1] but you made it an alias for "ibm-942C". What's actually the difference between "ibm-942C" and "ibm-942"?

I can sponsor your change although I would appreciate if somebody else from IBM could have another look at your change. I tried to compare with "IBM Java 9" but it doesn't seem to exist. They only refer to AdoptOpenJDK and AdoptOpenJDK just uses a vanilla version of OpenJDK.

Finally, I hope you won't mind if I update the copyright years on the files you changed before pushing (this is a convention in the OpenJDK project).

Best regards,
Volker

[1] https://en.wikipedia.org/wiki/Code_page_932_(IBM)

On Mon, Apr 16, 2018 at 1:10 PM, Bhaktavatsal R Maram wrote:
> Hi All,
>
> I've regenerated webrev using "hg rename" to create template files. webrev
> looks much neat now.. Thanks Alan for suggestion.
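The byte-level differences described in the reply above can be checked against whatever charsets a given runtime actually provides. The sketch below uses only `Charset.isSupported`, `Charset.forName` and the `String(byte[], Charset)` constructor. The class name is hypothetical, and the charset names `x-IBM942`, `x-IBM942C`, `x-IBM943` and `x-IBM943C` are assumed to be the canonical names on the JDK being tested; where a name is unknown the probe reports that instead of failing.

```java
import java.nio.charset.Charset;
import java.util.Optional;

/** Probes how a single byte decodes under a named charset, if the runtime has it. */
final class CharsetProbe {
    /** Returns the decoded text for one byte, or empty when the charset is not supported. */
    static Optional<String> decodeByte(String charsetName, int b) {
        if (!Charset.isSupported(charsetName)) {
            return Optional.empty();
        }
        byte[] bytes = { (byte) b };
        return Optional.of(new String(bytes, Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        // Assumed canonical names; adjust for the runtime under test.
        String[] names = {"x-IBM942", "x-IBM942C", "x-IBM943", "x-IBM943C"};
        for (String name : names) {
            for (int b : new int[] {0x5C, 0x7E}) {
                String shown = decodeByte(name, b)
                    .filter(s -> !s.isEmpty())
                    .map(s -> String.format("U+%04X", (int) s.charAt(0)))
                    .orElse("charset not available");
                System.out.printf("%-10s 0x%02X -> %s%n", name, b, shown);
            }
        }
    }
}
```

If the explanation above holds, the "C" variants should report U+005C and U+007E while the plain variants report U+00A5 and U+203E.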
Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
Hi Bhaktavatsal Reddy,

your change looks good, although I can't really verify all the charset aliases. For example, Wikipedia mentions that "ibm-932" is equivalent to "ibm-942" [1], but you made it an alias for "ibm-942C". What's actually the difference between "ibm-942C" and "ibm-942"?

I can sponsor your change, although I would appreciate it if somebody else from IBM could have another look at it. I tried to compare with "IBM Java 9" but it doesn't seem to exist. They only refer to AdoptOpenJDK, and AdoptOpenJDK just uses a vanilla version of OpenJDK.

Finally, I hope you won't mind if I update the copyright years on the files you changed before pushing (this is a convention in the OpenJDK project).

Best regards,
Volker

[1] https://en.wikipedia.org/wiki/Code_page_932_(IBM)

On Mon, Apr 16, 2018 at 1:10 PM, Bhaktavatsal R Maram wrote:
> Hi All,
>
> I've regenerated webrev using "hg rename" to create template files. webrev
> looks much neat now.. Thanks Alan for suggestion.
>
> webrev - http://cr.openjdk.java.net/~gromero/8201540/v2/
>
> Thanks,
> Bhaktavatsal Reddy
>
>
> -"core-libs-dev" wrote: -
> To: Alan Bateman
> From: "Bhaktavatsal R Maram"
> Sent by: "core-libs-dev"
> Date: 04/16/2018 02:38PM
> Cc: Tim Ellison, ppc-aix-port-...@openjdk.java.net,
> Java Core Libs
> Subject: Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in
> java.base
>
> Hi Alan,
>
> I deleted IBM943C.java (using hg remove) and added new file
> IBM943C.java.template (using hg add). I now understand that using "hg rename"
> is giving more meaningful representation in webrev/index.html.
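Which canonical charset an alias such as "ibm-932" resolves to on a particular runtime can be checked directly, which is one way to verify the aliases discussed in the message above. A minimal sketch, assuming nothing about the result: the class name is hypothetical, and the alias strings in `main` are taken from this thread and may or may not be registered on a given JDK.

```java
import java.nio.charset.Charset;
import java.util.Optional;

/** Resolves charset aliases to canonical names using only the public Charset API. */
final class AliasLookup {
    /** Returns the canonical name the alias resolves to, or empty if it is not supported. */
    static Optional<String> canonicalName(String alias) {
        return Charset.isSupported(alias)
            ? Optional.of(Charset.forName(alias).name())
            : Optional.empty();
    }

    public static void main(String[] args) {
        // Aliases mentioned in the thread; the outcome depends on the runtime.
        String[] aliases = {"ibm-932", "ibm-942", "ibm-942C", "ibm-943", "ibm-943C"};
        for (String alias : aliases) {
            System.out.println(alias + " -> "
                + canonicalName(alias).orElse("(not supported on this runtime)"));
        }
    }
}
```

Running this on builds with and without the patch, or on IBM JDK 8, would show whether "ibm-932" lands on the same charset in each.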
Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
On 16/04/2018 18:43, Xueming Shen wrote:
> It looks good to me.

I agree, the main thing is that it's not adding charsets to java.base for the other builds.

-Alan
Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
It looks good to me.

-Sherman

On 4/16/18, 4:10 AM, Bhaktavatsal R Maram wrote:
> Hi All,
>
> I've regenerated webrev using "hg rename" to create template files. webrev
> looks much neat now.. Thanks Alan for suggestion.
>
> webrev - http://cr.openjdk.java.net/~gromero/8201540/v2/
>
> Thanks,
> Bhaktavatsal Reddy
>
> -"core-libs-dev" wrote: -
> To: Alan Bateman
> From: "Bhaktavatsal R Maram"
> Sent by: "core-libs-dev"
> Date: 04/16/2018 02:38PM
> Cc: Tim Ellison, ppc-aix-port-...@openjdk.java.net, Java Core Libs
> Subject: Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
>
> Hi Alan,
>
> I deleted IBM943C.java (using hg remove) and added new file
> IBM943C.java.template (using hg add). I now understand that using "hg rename"
> is giving more meaningful representation in webrev/index.html.
>
> I will re-generate webrev by renaming source files to templates using "hg rename"
>
> Thanks,
> Bhaktavatsal Reddy
>
> -Alan Bateman wrote: -
> To: Bhaktavatsal R Maram, Volker Simonis
> From: Alan Bateman
> Date: 04/16/2018 02:16PM
> Cc: Java Core Libs, Tim Ellison, ppc-aix-port-...@openjdk.java.net
> Subject: Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
>
> On 16/04/2018 09:22, Bhaktavatsal R Maram wrote:
>> 3. Source files for IBM-942C and IBM-943C are changed to template to support #1
>
> You might want to double check the webrev as it looks like you've added
> templates where as I assume you mean to use "hg rename" to rename
> IBM942C.java and IBM943C.java.
>
> -Alan
Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
Hi All,

I've regenerated the webrev using "hg rename" to create the template files. The webrev looks much neater now. Thanks, Alan, for the suggestion.

webrev - http://cr.openjdk.java.net/~gromero/8201540/v2/

Thanks,
Bhaktavatsal Reddy

-"core-libs-dev" wrote: -
To: Alan Bateman
From: "Bhaktavatsal R Maram"
Sent by: "core-libs-dev"
Date: 04/16/2018 02:38PM
Cc: Tim Ellison, ppc-aix-port-...@openjdk.java.net, Java Core Libs
Subject: Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base

Hi Alan,

I deleted IBM943C.java (using hg remove) and added new file IBM943C.java.template (using hg add). I now understand that using "hg rename" is giving more meaningful representation in webrev/index.html.

I will re-generate webrev by renaming source files to templates using "hg rename"

Thanks,
Bhaktavatsal Reddy

-Alan Bateman wrote: -
To: Bhaktavatsal R Maram, Volker Simonis
From: Alan Bateman
Date: 04/16/2018 02:16PM
Cc: Java Core Libs, Tim Ellison, ppc-aix-port-...@openjdk.java.net
Subject: Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base

On 16/04/2018 09:22, Bhaktavatsal R Maram wrote:
> 3. Source files for IBM-942C and IBM-943C are changed to template to support #1

You might want to double check the webrev as it looks like you've added templates where as I assume you mean to use "hg rename" to rename IBM942C.java and IBM943C.java.

-Alan
Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
Hi Alan,

I deleted IBM943C.java (using hg remove) and added a new file, IBM943C.java.template (using hg add). I now understand that using "hg rename" gives a more meaningful representation in webrev/index.html.

I will regenerate the webrev, renaming the source files to templates with "hg rename".

Thanks,
Bhaktavatsal Reddy

-Alan Bateman wrote: -
To: Bhaktavatsal R Maram, Volker Simonis
From: Alan Bateman
Date: 04/16/2018 02:16PM
Cc: Java Core Libs, Tim Ellison, ppc-aix-port-...@openjdk.java.net
Subject: Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base

On 16/04/2018 09:22, Bhaktavatsal R Maram wrote:
> 3. Source files for IBM-942C and IBM-943C are changed to template to support #1

You might want to double check the webrev as it looks like you've added templates where as I assume you mean to use "hg rename" to rename IBM942C.java and IBM943C.java.

-Alan
Re: RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
On 16/04/2018 09:22, Bhaktavatsal R Maram wrote:
> 3. Source files for IBM-942C and IBM-943C are changed to template to support #1

You might want to double-check the webrev, as it looks like you've added templates, whereas I assume you meant to use "hg rename" to rename IBM942C.java and IBM943C.java.

-Alan
RFR(S): 8201540: [AIX] Extend the set of supported charsets in java.base
Hi All,

Please review.

Bug: https://bugs.openjdk.java.net/browse/JDK-8201540
webrev: http://cr.openjdk.java.net/~gromero/8201540/v1/webrev/

In this patch:

1. The default charsets for the different locales supported on AIX are added to the standard charsets in the java.base module: Big5, Big5_Solaris, Big5_HKSCS, GB18030, IBM856, IBM921, IBM922, IBM942, IBM942C, IBM943, IBM943C, IBM950, IBM970, IBM1046, IBM1124, IBM1383, ISO_8859_6, ISO_8859_8, MS1252, TIS_620.
2. More aliases are added to existing charsets.
3. The source files for IBM-942C and IBM-943C are changed to templates to support #1.
4. make/jdk/src/classes/build/tools/charsetmapping/SPI.java is modified to increase the capacity of the Hashtable that holds the aliases of the standard charsets. Because this patch adds more charsets to the standard set, the existing capacity is no longer sufficient. Without this change the JDK does not build and the following error is seen:

Exception in thread "main" java.lang.RuntimeException: Cannot find a suitable size within given constraints
        at build.tools.charsetmapping.Hasher.build(Hasher.java:122)
        at build.tools.charsetmapping.Hasher.genClass(Hasher.java:261)
        at build.tools.charsetmapping.SPI.genClass(SPI.java:130)
        at build.tools.charsetmapping.Main.main(Main.java:99)

Please note that webrev/index.html does not show the two files below, which I deleted. However, the webrev/jdk11.changeset file shows them as deleted.

src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM942C.java
src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM943C.java

I've built the JDK with this patch on both Linux and AIX.

Thanks,
Bhaktavatsal Reddy

-Volker Simonis wrote: -
To: Bhaktavatsal R Maram
From: Volker Simonis
Date: 04/13/2018 08:51PM
Cc: Alan Bateman, Java Core Libs, Tim Ellison, ppc-aix-port-...@openjdk.java.net
Subject: Re: Missing many locales support on AIX platform

Hi Bhaktavatsal Reddy,

thanks for addressing this long standing issue.
I've opened "8201540: [AIX] Extend the set of supported charsets in java.base" [1] to track this issue.

As I wrote in the bug report, this problem is the consequence of an emergency fix (JDK-8081332) I did back in 2015 to fix the build on AIX after the integration of the modularity support change (see discussion: [2]). At that time I only added the minimal set of charsets which were required to fix the build.

It would be great if you can get in touch with your IBM colleagues to find out which are the default extended charsets supported by IBM J9 on AIX. I think we should try to use a similar set here in our OpenJDK port which is also used by the OpenJ9 for building OpenJ9 on AIX.

Also, your IBM colleagues can help you to host a webrev which will make the further review of these changes much easier.

I've put Tim from IBM on CC which should have an overview of all the IBM people involved in the OpenJDK project. Besides that I know at least the following people who might help you:

Michihiro Horie: ho...@jp.ibm.com
Hiroshi H Horii: ho...@jp.ibm.com
Gustavo Romero: grom...@linux.vnet.ibm.com
Matthew Brandyberyy: mbra...@linux.vnet.ibm.com

Thank you and best regards,
Volker

[1] https://bugs.openjdk.java.net/browse/JDK-8201540
[2] http://mail.openjdk.java.net/pipermail/2d-dev/2015-May/005431.html

On Fri, Apr 13, 2018 at 2:42 PM, Bhaktavatsal R Maram wrote:
> Hi Alan,
>
> Thank you for your response. I'm happy that my patch was attached. But, I
> don't see attachment. So, I inlined patch which contain diffs from 2
> changesets in mail text. If a Jira bug is opened for this issue, probably I
> can attach complete and consolidated patch there.
>
> At high level, I'm adding following charsets to standard charset in
> java.base. For this, I have to change IBM943C and IBM942C from source to
> template to handle java package and aliases. It is also required to add
> codepage 932 as alias for IBM942C because both are one and same.
>
> Big5, Big5_HKSCS, GB18030, IBM942, IBM942C, IBM943, IBM943C, IBM950, IBM970,
> IBM1124, TIS_620
>
>
> These are default charsets for some of locales supported by Operating System
> (AIX). Since these are not available in standard charset, JDK can't be used
> in those locale even if they are available in jdk.charset module (java
> -version fails).
>
> I've followed some of the discussions around this in