Re: GB18030-2022 Support in PostgreSQL

2025-10-18 Thread Chao Li
On Mon, Sep 29, 2025 at 12:03 PM John Naylor wrote: > On Wed, Sep 24, 2025 at 4:18 PM Chao Li wrote: > > I am not sure if you should also upgrade the UCM file to 2022 version, > but if we need, let’s do it with a separate commit. > > If they can all use the same file, we should just do that for

Re: GB18030-2022 Support in PostgreSQL

2025-10-18 Thread Chao Li
On Sep 24, 2025, at 15:04, Chao Li wrote: On Sep 24, 2025, at 14:42, John Naylor wrote: Sounds good. Were you also interested in seeing if EUC_CN can use the same UCM file? That would allow us to get rid of the XML file. Sure, let me take a look. I found that both EUC_CN and UHC use the sam

Re: GB18030-2022 Support in PostgreSQL

2025-10-18 Thread Chao Li
> On Sep 29, 2025, at 17:32, John Naylor wrote: > > On Wed, Sep 24, 2025 at 4:18 PM Chao Li wrote: >> >> I found that both EUC_CN and UHC use the same XML file, so I updated both. > > When you say "same file", that implies to me the file we have checked > in our repo. They have different nam

Re: GB18030-2022 Support in PostgreSQL

2025-10-18 Thread John Naylor
On Wed, Sep 24, 2025 at 4:18 PM Chao Li wrote: > > I found that both EUC_CN and UHC use the same XML file, so I updated both. When you say "same file", that implies to me the file we have checked in our repo. They have different names and the UHC file is downloaded on demand, so it doesn't seem l

Re: GB18030-2022 Support in PostgreSQL

2025-10-17 Thread Chao Li
On Tue, Sep 30, 2025 at 2:05 PM John Naylor wrote: > On Mon, Sep 29, 2025 at 5:36 PM Chao Li wrote: > > "same file" was a mistake. windows-949-2000.ucm is a different file from > gb-18030-2000(2022).ucm. > > > > In theory, we don’t need to change UHC if our goal is to delete > gb-18030-2000.xml.

Re: GB18030-2022 Support in PostgreSQL

2025-10-02 Thread John Naylor
On Fri, Oct 3, 2025 at 12:12 PM Chao Li wrote: > > * Do we want to switch UHC from using xml to ucm? That would not lead to map > file change, instead it just removes the code of parsing xml file, making > future maintenance easier. I seriously doubt there will be any future maintenance, in whi

Re: GB18030-2022 Support in PostgreSQL

2025-10-02 Thread Chao Li
Hi John, Thank you again very much for your support. > On Oct 2, 2025, at 13:44, John Naylor wrote: > > > Thanks, pushed after correcting the file name in the perl script > comment. I've marked the CF entry committed. > So the work for GB18030 is done. I just want to check with your two mo

Re: GB18030-2022 Support in PostgreSQL

2025-10-01 Thread John Naylor
On Tue, Sep 30, 2025 at 1:31 PM Chao Li wrote: > Sure, no problem. Please see the attached v4, I reverted UHC change from v3. > Again, please "git rm" the xml file when you push the commit. Thanks, pushed after correcting the file name in the perl script comment. I've marked the CF entry committ

Re: GB18030-2022 Support in PostgreSQL

2025-09-29 Thread John Naylor
On Mon, Sep 29, 2025 at 5:36 PM Chao Li wrote: > "same file" was a mistake. windows-949-2000.ucm is a different file from > gb-18030-2000(2022).ucm. > > In theory, we don’t need to change UHC if our goal is to delete > gb-18030-2000.xml. That was my goal, yes. Let's stay focused on that and not

Re: GB18030-2022 Support in PostgreSQL

2025-09-28 Thread John Naylor
On Wed, Sep 24, 2025 at 4:18 PM Chao Li wrote: > I am not sure if you should also upgrade the UCM file to 2022 version, but if > we need, let’s do it with a separate commit. If they can all use the same file, we should just do that for the sake of simplicity, in which case a separate commit is j

Re: GB18030-2022 Support in PostgreSQL

2025-09-24 Thread Chao Li
> On Sep 24, 2025, at 14:42, John Naylor wrote: > > > Sounds good. Were you also interested in seeing if EUC_CN can use the > same UCM file? That would allow us to get rid of the XML file. > Sure, let me take a look. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.hig

Re: GB18030-2022 Support in PostgreSQL

2025-09-23 Thread John Naylor
On Thu, Sep 18, 2025 at 2:59 PM John Naylor wrote: > The only change I made for v9 is to reword the regression test > addition from "upgrades" to "change". I'm planning to commit next week > unless there are objections. (If anyone otherwise busy with the PG18 > release wants a chance to weigh in,

Re: GB18030-2022 Support in PostgreSQL

2025-09-20 Thread Chao Li
Hi John, Thanks for working on v9. > On Sep 18, 2025, at 15:59, John Naylor wrote: > > > It'll be a good idea to communicate how to detect (unlikely but not > impossible) incompatibilities for existing systems, but I don't think > committing needs to wait for that piece. > > -- > John Naylor

Re: GB18030-2022 Support in PostgreSQL

2025-09-18 Thread John Naylor
On Thu, Sep 18, 2025 at 3:16 PM Chao Li wrote: > > When you say “communicate how to detect incompatibility for existing > systems”, what would be the communication channel? I am actually very new to > the PG development community, your guidance will be greatly appreciated. My first thought was

Re: GB18030-2022 Support in PostgreSQL

2025-09-18 Thread Chao Li
> On Sep 18, 2025, at 16:53, John Naylor wrote: > > On Thu, Sep 18, 2025 at 3:16 PM Chao Li wrote: >> >> When you say “communicate how to detect incompatibility for existing >> systems”, what would be the communication channel? I am actually very new to >> the PG development community, your

Re: GB18030-2022 Support in PostgreSQL

2025-09-16 Thread John Naylor
On Fri, Sep 12, 2025 at 8:57 AM Chao Li wrote: > * In 0003, updated a function comment in utf8_and_gb18030.c to address John's > comment about reference to the xml file. Thanks, but the entire point of that comment change was to remove the reference to the XML file, yet it didn't actually do tha

Re: GB18030-2022 Support in PostgreSQL

2025-09-16 Thread John Naylor
On Thu, Sep 11, 2025 at 4:09 PM Chao Li wrote: > Then I switched to the patch branch, it got 21 different lines. After I > updated the 18 known changes in the out file, then it got only 3 different > lines: > > ``` > - \x8135f437 | \xe1b8bf > + \x8135f437 | \xee9f87 > > - \xa3a0 | \xee
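The right-hand column in the quoted regression diff is a UTF-8 byte string, so the two variants for `\x8135f437` can be read as Unicode code points. The following is a minimal sketch for decoding them, not part of the patch or its tests; the variable names are made up for illustration, and it makes no claim about which side of the diff is the old or the new mapping.

```python
# Decode the two UTF-8 byte strings shown for \x8135f437 in the quoted diff.
# "minus_line" and "plus_line" are illustrative names for the "-" and "+" rows.
minus_line = bytes.fromhex("e1b8bf")
plus_line = bytes.fromhex("ee9f87")


def code_point(utf8_bytes: bytes) -> str:
    """Return the single code point encoded by utf8_bytes in U+XXXX form."""
    ch = utf8_bytes.decode("utf-8")
    return f"U+{ord(ch):04X}"


minus_cp = code_point(minus_line)
plus_cp = code_point(plus_line)
print(minus_cp, plus_cp)  # prints: U+1E3F U+E7C7
```

U+E7C7 lies in the Private Use Area (U+E000 to U+F8FF), while U+1E3F is an assigned Latin letter, which is the kind of difference these diff lines reflect.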

Re: GB18030-2022 Support in PostgreSQL

2025-09-11 Thread John Naylor
On Wed, Sep 10, 2025 at 6:54 PM Chao Li wrote: > I downloaded the tests from the referenced mail, but I cannot get the tests to run. After extracting the 2 patch files, src/test/encodings was added, but "make check" does not seem to run them. I tried to copy .out and .sql files to src/test/regress, b

Re: GB18030-2022 Support in PostgreSQL

2025-08-31 Thread Chao Li
> On Aug 18, 2025, at 16:50, Chao Li wrote: > > > Hi John, Any follow up on this patch? Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/

Re: GB18030-2022 Support in PostgreSQL

2025-08-18 Thread John Naylor
On Mon, Aug 18, 2025 at 1:36 PM Chao Li wrote: > I think that patch could be separate, because the makefile changes are > generic to all map files. The current GB18030 patch doesn't depend on that > makefile patch at all. The makefile patch just makes build a little bit > easier upon map file c

Re: GB18030-2022 Support in PostgreSQL

2025-08-17 Thread John Naylor
On Wed, Aug 13, 2025 at 3:08 PM Chao Li wrote: > Attached is the new patch. It downloads the UCM file in make: > After regenerating the map files, there is no change found in the map files. I can confirm, thanks. When a patch is split into multiple patches, it's customary to include all of them, since

Re: GB18030-2022 Support in PostgreSQL

2025-08-13 Thread Chao Li
On 2025/8/13 15:20, Chao Li wrote: Sounds good. Let me recreate the patch. Attached is the new patch. It downloads the UCM file in make: ``` Unicode % make gb18030_to_utf8.map wget -O gb-18030-2000.ucm --no-use-server-timestamps https://raw.githubusercontent.com/unicode-org/icu-data/d9d3

Re: GB18030-2022 Support in PostgreSQL

2025-08-13 Thread Chao Li
> On Aug 13, 2025, at 15:17, John Naylor wrote: > > On Wed, Aug 13, 2025 at 2:41 AM Peter Eisentraut wrote: >> Could we download this file on demand, like we do for the other input >> files for the conversion mappings? > > That sounds like the way to go. > > While poking around, I found that

Re: GB18030-2022 Support in PostgreSQL

2025-08-13 Thread John Naylor
On Wed, Aug 13, 2025 at 2:41 AM Peter Eisentraut wrote: > Could we download this file on demand, like we do for the other input > files for the conversion mappings? That sounds like the way to go. While poking around, I found that UCS_to_EUC_CN.pl also uses gb-18030-2000.xml for its input, so no

Re: GB18030-2022 Support in PostgreSQL

2025-08-12 Thread Peter Eisentraut
On 12.08.25 06:57, John Naylor wrote: Before getting to that, I thought I'd bring this up to the community: +# Copyright (C) 2000-2009, International Business Machines Corporation and others. +# All Rights Reserved. The previous XML file didn't contain a copyright notice -- does anyone want to

Re: GB18030-2022 Support in PostgreSQL

2025-08-11 Thread Chao Li
>> >> 3. Skip patch 2, directly go to patch 3. So that, patch 3 will include >> changes introduced by both 2005 and 2022. This way makes minimum changes to >> map files. > > #3 is what I had in mind to begin with unless we found some reason not > to. Minimizing churn is a lucky side effect tha

Re: GB18030-2022 Support in PostgreSQL

2025-08-11 Thread John Naylor
On Tue, Aug 12, 2025 at 9:09 AM Chao Li wrote: [bringing this back to the original thread] > So, I compared the 2000 ucm with the 2005 ucm and also compared the 2005 ucm with the 2022 ucm. > Then I found that some changes made in 2005 were reverted in 2022, which is why the diff > between 2000 and 2022 is small. For example, th

Re: GB18030-2022 Support in PostgreSQL

2025-08-11 Thread John Naylor
On Mon, Aug 11, 2025 at 4:25 PM Chao Li wrote: > > Sure, I can split the patch into two. The first patch only changes the .xml file to a > .ucm file and updates the perl script. As a result, the map files should not > change. > > Then the second patch will update the ucm file, so that the second patch

Re: GB18030-2022 Support in PostgreSQL

2025-08-11 Thread Chao Li
> That would match my expectation. In case it wasn't clear before, my > preference is to split this patch into two patches: First convert to > .ucm, then update to 2022 revision. Then the small diff will be > obvious to everyone who looks at the second commit. Sure, I can split the patch into two

Re: GB18030-2022 Support in PostgreSQL

2025-08-11 Thread John Naylor
On Mon, Aug 11, 2025 at 3:22 PM Chao Li wrote: Hi, For future reference, please don't quote my entire message below yours -- it clutters the archives and also removes context. > Yes, I did a diff between 2000.ucm and 2022.ucm when I worked on the patch. > The diff between 2000.ucm and 2022.ucm

Re: GB18030-2022 Support in PostgreSQL

2025-08-11 Thread Chao Li
Hi John, Thanks for your review. Yes, I did a diff between 2000.ucm and 2022.ucm when I worked on the patch. The diff between 2000.ucm and 2022.ucm is quite small: ```diff - omit the comment part > \x80 |3 > \xA3\xA0 |3 > \xA3\xA0 |4 > 28067a28099,28114 > \xFE\x59 |0 > \x82\x35\x90\x37 |3
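A comparison like the one described above can also be done on parsed mappings rather than on raw text lines. The following is a rough sketch, assuming the usual ICU UCM line shape `<UXXXX> \xNN... |n`; the sample entries are invented placeholders, not data from either gb-18030 file, and `parse_ucm` is a hypothetical helper rather than anything in the PostgreSQL tree.

```python
import re

# One mapping line: <UXXXX> followed by \xNN bytes and a |n precision flag.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def parse_ucm(text: str) -> set:
    """Collect (code_point, encoded_bytes, flag) tuples from UCM text."""
    mappings = set()
    for line in text.splitlines():
        m = UCM_LINE.match(line.strip())
        if m:  # comments and header lines simply do not match
            code_point = int(m.group(1), 16)
            encoded = bytes.fromhex(m.group(2).replace("\\x", ""))
            mappings.add((code_point, encoded, int(m.group(3))))
    return mappings


# Invented placeholder content standing in for two UCM revisions.
old_text = r"""
# comment lines are ignored
<U0041> \x41 |0
<U0042> \x42 |0
"""
new_text = r"""
<U0041> \x41 |0
<U0042> \x42 |3
<U0043> \x43 |0
"""

old_map = parse_ucm(old_text)
new_map = parse_ucm(new_text)
removed = old_map - new_map  # present only in the old revision
added = new_map - old_map    # present only in the new revision

for cp, enc, flag in sorted(added):
    print(f"+ U+{cp:04X} {enc.hex()} |{flag}")
for cp, enc, flag in sorted(removed):
    print(f"- U+{cp:04X} {enc.hex()} |{flag}")
```

Comparing sets of tuples ignores line order and comment changes, so only real mapping or flag changes show up.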

Re: GB18030-2022 Support in PostgreSQL

2025-08-10 Thread John Naylor
On Mon, Aug 11, 2025 at 9:01 AM Chao Li wrote: > > I have created a patch https://commitfest.postgresql.org/patch/5954/. > CommitFests requested a rebase, so I rebased the code and created the v2 > patch. > > BTW, I have tested all 66 new characters, 9 not-required characters and 18 > changed c

Re: GB18030-2022 Support in PostgreSQL

2025-08-06 Thread Peter Eisentraut
On 05.08.25 08:22, Chao Li wrote: I agree with Tom that we may just redefine GB18030 to comply with the 2022 standard. As John Naylor pointed out, 2022 is not backward compatible, that is true. However, I went through all the incompatible changes, and they all involve rarely used characters. So I would

Re: GB18030-2022 Support in PostgreSQL

2025-08-05 Thread John Naylor
On Tue, Aug 5, 2025 at 1:22 PM Chao Li wrote: > > On Aug 4, 2025, at 21:51, Tom Lane wrote: > > So on the whole I'd lean a bit towards just redefining GB18030 as > meaning the new standard. The fact that we don't support it as a > server-side encoding perhaps makes that idea more tenable than it > would b

Re: GB18030-2022 Support in PostgreSQL

2025-08-04 Thread Chao Li
> On Aug 4, 2025, at 21:51, Tom Lane wrote: > > > So on the whole I'd lean a bit towards just redefining GB18030 as > meaning the new standard. The fact that we don't support it as a > server-side encoding perhaps makes that idea more tenable than it > would be if the encoding governed the interpretati

Re: GB18030-2022 Support in PostgreSQL

2025-08-04 Thread Ken Marshall
On Mon, Aug 04, 2025 at 04:08:24PM +0800, JiaoShuntian wrote: > Hi hackers, > > I noticed that PostgreSQL currently supports GB18030 encoding based on the > older GB18030-2000 standard (as seen in commits like extend GB18030 > conversion). However, China has since updated its mandatory character

Re: GB18030-2022 Support in PostgreSQL

2025-08-04 Thread Tom Lane
Andrew Dunstan writes: > On 2025-08-04 Mo 6:35 AM, John Naylor wrote: >> There is a risk of breaking applications, although only a few dozen >> mappings changed. If it were added as a separate encoding, users could >> opt in. > That makes sense ... naming the new encoding so as to avoid confusion

Re: GB18030-2022 Support in PostgreSQL

2025-08-04 Thread Andrew Dunstan
On 2025-08-04 Mo 6:35 AM, John Naylor wrote: On Mon, Aug 4, 2025 at 3:08 PM JiaoShuntian wrote: I noticed that PostgreSQL currently supports GB18030 encoding based on the older GB18030-2000 standard (as seen in commits like extend GB18030 conversion). However, China has since updated its ma

Re: GB18030-2022 Support in PostgreSQL

2025-08-04 Thread John Naylor
On Mon, Aug 4, 2025 at 3:08 PM JiaoShuntian wrote: > I noticed that PostgreSQL currently supports GB18030 encoding based on the > older GB18030-2000 standard (as seen in commits like extend GB18030 > conversion). However, China has since updated its mandatory character set > standard to GB18030

Re: GB18030-2022 Support in PostgreSQL

2025-08-04 Thread wenhui qiu
Hi 😂, not long ago, many people were rushing to remove this character set because of a security vulnerability. I was honestly quite shocked when I saw it. Thanks On Mon, Aug 4, 2025 at 4:08 PM JiaoShuntian wrote: > Hi hackers, > > I noticed that PostgreSQL currently supports GB18030 encodin

Re: GB18030-2022 Support in PostgreSQL

2025-08-04 Thread 矫顺田
> I would like to ask: > > Are there any plans to upgrade PostgreSQL’s GB18030 support to the 2022 > version? Would the community be open to contributions in this area? I think we only need to update the perl script and map file to complete this task. JiaoShuntian HighGo Inc.

GB18030-2022 Support in PostgreSQL

2025-08-04 Thread JiaoShuntian
Hi hackers, I noticed that PostgreSQL currently supports GB18030 encoding based on the older GB18030-2000 standard (as seen in commits like extend GB18030 conversion). However, China has since updated its mandatory character set standard to GB18030-2022, which includes additional characters and