On Mon, Sep 29, 2025 at 12:03 PM John Naylor wrote:
> On Wed, Sep 24, 2025 at 4:18 PM Chao Li wrote:
> > I am not sure if you should also upgrade the UCM file to the 2022 version,
> > but if we need to, let’s do it in a separate commit.
>
> If they can all use the same file, we should just do that for
On Sep 24, 2025, at 15:04, Chao Li wrote:
On Sep 24, 2025, at 14:42, John Naylor wrote:
Sounds good. Were you also interested in seeing if EUC_CN can use the
same UCM file? That would allow us to get rid of the XML file.
Sure, let me take a look.
I found that both EUC_CN and UHC use the sam
> On Sep 29, 2025, at 17:32, John Naylor wrote:
>
> On Wed, Sep 24, 2025 at 4:18 PM Chao Li wrote:
>>
>> I found that both EUC_CN and UHC use the same XML file, so I updated both.
>
> When you say "same file", that implies to me the file we have checked
> in our repo. They have different nam
On Wed, Sep 24, 2025 at 4:18 PM Chao Li wrote:
>
> I found that both EUC_CN and UHC use the same XML file, so I updated both.
When you say "same file", that implies to me the file we have checked
in our repo. They have different names and the UHC file is downloaded
on demand, so it doesn't seem l
On Tue, Sep 30, 2025 at 2:05 PM John Naylor wrote:
> On Mon, Sep 29, 2025 at 5:36 PM Chao Li wrote:
> > “same file” was a mistake. windows-949-2000.ucm is a different file from
> > gb-18030-2000(2022).ucm.
> >
> > In theory, we don’t need to change UHC if our goal is to delete
> > gb-18030-2000.xml.
On Fri, Oct 3, 2025 at 12:12 PM Chao Li wrote:
>
> * Do we want to switch UHC from using XML to UCM? That would not lead to any
> map file changes; it would just remove the XML-parsing code, making future
> maintenance easier.
I seriously doubt there will be any future maintenance, in whi
Hi John,
Thank you very much again for your support.
> On Oct 2, 2025, at 13:44, John Naylor wrote:
>
>
> Thanks, pushed after correcting the file name in the perl script
> comment. I've marked the CF entry committed.
>
So the work for GB18030 is done.
I just want to check with your two mo
On Tue, Sep 30, 2025 at 1:31 PM Chao Li wrote:
> Sure, no problem. Please see the attached v4; I reverted the UHC change from v3.
> Again, please "git rm" the xml file when you push the commit.
Thanks, pushed after correcting the file name in the perl script
comment. I've marked the CF entry committ
On Mon, Sep 29, 2025 at 5:36 PM Chao Li wrote:
> “same file” was a mistake. windows-949-2000.ucm is a different file from
> gb-18030-2000(2022).ucm.
>
> In theory, we don’t need to change UHC if our goal is to delete
> gb-18030-2000.xml.
That was my goal, yes. Let's stay focused on that and not
On Wed, Sep 24, 2025 at 4:18 PM Chao Li wrote:
> I am not sure if you should also upgrade the UCM file to the 2022 version, but if
> we need to, let’s do it in a separate commit.
If they can all use the same file, we should just do that for the sake
of simplicity, in which case a separate commit is j
> On Sep 24, 2025, at 14:42, John Naylor wrote:
>
>
> Sounds good. Were you also interested in seeing if EUC_CN can use the
> same UCM file? That would allow us to get rid of the XML file.
>
Sure, let me take a look.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.hig
On Thu, Sep 18, 2025 at 2:59 PM John Naylor wrote:
> The only change I made for v9 is to reword the regression test
> addition from "upgrades" to "change". I'm planning to commit next week
> unless there are objections. (If anyone otherwise busy with the PG18
> release wants a chance to weigh in,
Hi John,
Thanks for working on v9.
> On Sep 18, 2025, at 15:59, John Naylor wrote:
>
>
> It'll be a good idea to communicate how to detect (unlikely but not
> impossible) incompatibilities for existing systems, but I don't think
> committing needs to wait for that piece.
>
> --
> John Naylor
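One possible shape for such a detection aid is a scan of existing text for characters whose conversion changed. The following is a minimal Python sketch under stated assumptions, not something proposed in the thread: the set `AFFECTED` is deliberately incomplete and holds only the two code points visible in the regression diff quoted in this thread for the GB18030 sequence `\x8135f437` (UTF-8 `\xe1b8bf` is U+1E3F, `\xee9f87` is U+E7C7), and `find_affected` is a hypothetical helper name. A real check would need the full list of changed mappings.

```python
# Hypothetical sketch: flag text containing characters whose GB18030
# conversion changed. AFFECTED is deliberately incomplete.

# UTF-8 bytes paired with GB18030 \x8135f437 in the quoted regression diff.
DIFF_MINUS = b"\xe1\xb8\xbf"  # the "-" line: U+1E3F
DIFF_PLUS = b"\xee\x9f\x87"   # the "+" line: U+E7C7 (Private Use Area)

AFFECTED = {ord(DIFF_MINUS.decode("utf-8")), ord(DIFF_PLUS.decode("utf-8"))}


def find_affected(text: str) -> list[tuple[int, str]]:
    """Return (offset, "U+XXXX") for each character of text found in AFFECTED."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ord(ch) in AFFECTED]


if __name__ == "__main__":
    for row in ["plain ASCII", "x\u1e3f", "\ue7c7"]:
        print(repr(row), find_affected(row))
```

Scanning by code point rather than by GB18030 bytes fits the fact that GB18030 is only a client-side encoding, so stored data would be in a server encoding such as UTF-8.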
On Thu, Sep 18, 2025 at 3:16 PM Chao Li wrote:
>
> When you say “communicate how to detect incompatibility for existing
> systems”, what would be the communication channel? I am actually very new to
> the PG development community, so your guidance would be greatly appreciated.
My first thought was
> On Sep 18, 2025, at 16:53, John Naylor wrote:
>
> On Thu, Sep 18, 2025 at 3:16 PM Chao Li wrote:
>>
>> When you say “communicate how to detect incompatibility for existing
>> systems”, what would be the communication channel? I am actually very new to
>> the PG development community, so your
On Fri, Sep 12, 2025 at 8:57 AM Chao Li wrote:
> * In 0003, updated a function comment in utf8_and_gb18030.c to address John's
> comment about the reference to the XML file.
Thanks, but the entire point of that comment change was to remove the
reference to the XML file, yet it didn't actually do tha
On Thu, Sep 11, 2025 at 4:09 PM Chao Li wrote:
> Then I switched to the patch branch and got 21 different lines. After I
> updated the 18 known changes in the .out file, only 3 different lines
> remained:
>
> ```
> - \x8135f437 | \xe1b8bf
> + \x8135f437 | \xee9f87
>
> - \xa3a0 | \xee
On Wed, Sep 10, 2025 at 6:54 PM Chao Li wrote:
> I downloaded the tests from the referenced mail, but I cannot get the
> tests to run. After extracting the 2 patch files, src/test/encodings was
> added, but "make check" does not seem to run them. I tried to copy the
> .out and .sql files to src/test/regress, b
> On Aug 18, 2025, at 16:50, Chao Li wrote:
>
>
>
Hi John,
Any follow-up on this patch?
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Mon, Aug 18, 2025 at 1:36 PM Chao Li wrote:
> I think that patch could be separate, because the makefile changes are
> generic to all map files. The current GB18030 patch doesn't depend on that
> makefile patch at all. The makefile patch just makes the build a little bit
> easier upon map file c
On Wed, Aug 13, 2025 at 3:08 PM Chao Li wrote:
> Attached is the new patch. It downloads the UCM file in make:
> After regenerating the map files, there is no change found in the map files.
I can confirm, thanks.
When we split a patch into multiple patches, it's customary to include all of
them, since
On 2025/8/13 15:20, Chao Li wrote:
Sounds good. Let me recreate the patch.
Attached is the new patch. It downloads the UCM file in make:
```
Unicode % make gb18030_to_utf8.map
wget -O gb-18030-2000.ucm --no-use-server-timestamps
https://raw.githubusercontent.com/unicode-org/icu-data/d9d3
> On Aug 13, 2025, at 15:17, John Naylor wrote:
>
> On Wed, Aug 13, 2025 at 2:41 AM Peter Eisentraut wrote:
>> Could we download this file on demand, like we do for the other input
>> files for the conversion mappings?
>
> That sounds like the way to go.
>
> While poking around, I found that
On Wed, Aug 13, 2025 at 2:41 AM Peter Eisentraut wrote:
> Could we download this file on demand, like we do for the other input
> files for the conversion mappings?
That sounds like the way to go.
While poking around, I found that UCS_to_EUC_CN.pl also uses
gb-18030-2000.xml for its input, so no
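Since UCS_to_EUC_CN.pl reads the same GB18030 source, one way to picture that dependency is that EUC_CN needs only the GB2312 part of the GB18030 table, i.e. two-byte codes whose lead and trail bytes both lie in 0xA1 to 0xFE. The Python sketch below shows only that filtering idea; it assumes the table is already parsed into a `{bytes: code point}` dict, `euc_cn_subset` is a hypothetical name, and the real perl script may apply further exclusions or overrides that are not reproduced here.

```python
# Hypothetical sketch of deriving an EUC_CN table from a GB18030 mapping.
# Assumption: input is already parsed into {gb18030_bytes: unicode_code_point}.
# The real UCS_to_EUC_CN.pl may apply extra exclusions/overrides not shown here.


def euc_cn_subset(gb18030: dict[bytes, int]) -> dict[bytes, int]:
    """Keep only two-byte codes whose lead and trail bytes are both in 0xA1..0xFE."""
    return {
        code: ucs
        for code, ucs in gb18030.items()
        if len(code) == 2 and all(0xA1 <= b <= 0xFE for b in code)
    }


if __name__ == "__main__":
    sample = {
        b"\x41": 0x0041,              # single byte (ASCII): dropped
        b"\xb0\xa1": 0x554A,          # GB2312 range: kept
        b"\x81\x40": 0x4E02,          # GBK extension lead byte 0x81: dropped
        b"\x81\x35\xf4\x37": 0xE7C7,  # four-byte GB18030 sequence: dropped
    }
    print({k.hex(): hex(v) for k, v in euc_cn_subset(sample).items()})
```

Filtering a single source this way is what would let EUC_CN share whatever file GB18030 is generated from, which is the point of the observation above.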
On 12.08.25 06:57, John Naylor wrote:
Before getting to that, I thought I'd bring this up to the community:
+# Copyright (C) 2000-2009, International Business Machines Corporation and others.
+# All Rights Reserved.
The previous XML file didn't contain a copyright notice -- does anyone
want to
>>
>> 3. Skip patch 2 and go directly to patch 3, so that patch 3 includes the
>> changes introduced by both 2005 and 2022. This approach minimizes changes
>> to the map files.
>
> #3 is what I had in mind to begin with unless we found some reason not
> to. Minimizing churn is a lucky side effect tha
On Tue, Aug 12, 2025 at 9:09 AM Chao Li wrote:
[bringing this back to the original thread]
> So I compared the 2000 UCM with the 2005 UCM, and also compared the 2005 UCM
> with the 2022 UCM. I found that some changes made in 2005 were reverted in
> 2022; that's why the diff between 2000 and 2022 is small. For example, th
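The comparison described here (2000 against 2005, then 2005 against 2022) could be reproduced roughly as follows. This is a minimal Python sketch, not the perl script from the patch: it assumes ICU's UCM mapping-line format (`<Uxxxx> \xNN... |n`), keeps only entries that define the bytes-to-Unicode direction (precision `|0` roundtrip and `|3` reverse fallback), and runs on tiny hypothetical toy tables rather than the real files. The names `parse_ucm`, `changed`, and `compare_versions` are illustrative only.

```python
import re

# One mapping line of an ICU .ucm file, e.g. "<U554A> \xB0\xA1 |0".
UCM_LINE = re.compile(r"<U([0-9A-Fa-f]+)>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|([0-4])")


def parse_ucm(text: str) -> dict[bytes, int]:
    """Return {codepage bytes: code point} for the bytes-to-Unicode direction.

    Only |0 (roundtrip) and |3 (reverse fallback) entries are kept;
    other precision values are skipped.
    """
    table: dict[bytes, int] = {}
    for line in text.splitlines():
        m = UCM_LINE.match(line.strip())
        if m and m.group(3) in ("0", "3"):
            code = bytes(int(h, 16) for h in m.group(2).split("\\x")[1:])
            table[code] = int(m.group(1), 16)
    return table


def changed(a: dict[bytes, int], b: dict[bytes, int]) -> set[bytes]:
    """Byte sequences mapped in both tables, but to different code points."""
    return {k for k in a.keys() & b.keys() if a[k] != b[k]}


def compare_versions(v2000, v2005, v2022):
    """Three-way comparison: what 2005 changed, and which of those 2022 reverted."""
    c_2005 = changed(v2000, v2005)
    reverted = {k for k in c_2005 if v2022.get(k) == v2000[k]}
    return {
        "changed_2000_to_2005": c_2005,
        "reverted_in_2022": reverted,
        "changed_2000_to_2022": changed(v2000, v2022),
    }


# Toy inputs (hypothetical, NOT real GB18030 data) just to exercise the logic.
TOY_2000 = "<U0041> \\x41 |0\n<U1111> \\xF0\\xA1 |0\n<U2222> \\xF0\\xA2 |0\n"
TOY_2005 = "<U0041> \\x41 |0\n<U3333> \\xF0\\xA1 |0\n<U4444> \\xF0\\xA2 |0\n"
TOY_2022 = "<U0041> \\x41 |0\n<U1111> \\xF0\\xA1 |3\n<U4444> \\xF0\\xA2 |0\n"

if __name__ == "__main__":
    result = compare_versions(parse_ucm(TOY_2000), parse_ucm(TOY_2005), parse_ucm(TOY_2022))
    for name, codes in result.items():
        print(name, sorted(c.hex() for c in codes))
```

In the toy data, both sequences change in 2005, one is reverted in 2022, and so only one shows up in the direct 2000-to-2022 diff, which is the effect described above.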
On Mon, Aug 11, 2025 at 4:25 PM Chao Li wrote:
>
> Sure, I can split the patch into two. The first patch only changes the .xml file to
> a .ucm file and updates the perl script. As a result, the map files should not
> change.
>
> Then the second patch will update the ucm file, so that the second patch
> That would match my expectation. In case it wasn't clear before, my
> preference is to split this patch into two patches: First convert to
> .ucm, then update to 2022 revision. Then the small diff will be
> obvious to everyone who looks at the second commit.
Sure, I can split the patch into two
On Mon, Aug 11, 2025 at 3:22 PM Chao Li wrote:
Hi,
For future reference, please don't quote my entire message below yours
-- it clutters the archives and also removes context.
> Yes, I did a diff between 2000.ucm and 2022.ucm when I worked on the patch.
> The diff between 2000.ucm and 2022.ucm
Hi John,
Thanks for your review.
Yes, I did a diff between 2000.ucm and 2022.ucm when I worked on the patch. The
diff between 2000.ucm and 2022.ucm is quite small:
```diff - omit the comment part
> \x80 |3
> \xA3\xA0 |3
> \xA3\xA0 |4
>
28067a28099,28114
> \xFE\x59 |0
> \x82\x35\x90\x37 |3
On Mon, Aug 11, 2025 at 9:01 AM Chao Li wrote:
>
> I have created a patch https://commitfest.postgresql.org/patch/5954/.
> CommitFest requested a rebase, so I rebased the code and created the v2
> patch.
>
> BTW, I have tested all 66 new characters, 9 not-required characters and 18
> changed c
On 05.08.25 08:22, Chao Li wrote:
I agree with Tom that we may just redefine GB18030 to comply with the
2022 standard.
As John Naylor pointed out, 2022 is not backward compatible; that is true.
However, I went through all the incompatible changes, and they all involve
rarely used characters. So I would
On Tue, Aug 5, 2025 at 1:22 PM Chao Li wrote:
>
> On Aug 4, 2025, at 21:51, Tom Lane wrote:
>
> So on the whole I'd lean a bit towards just redefining GB18030 as
> meaning the new standard. The fact that we don't support it as a
> server-side encoding perhaps makes that idea more tenable than it
> would b
> On Aug 4, 2025, at 21:51, Tom Lane wrote:
>
>
> So on the whole I'd lean a bit towards just redefining GB18030 as
> meaning the new standard. The fact that we don't support it as a
> server-side encoding perhaps makes that idea more tenable than it
> would be if the encoding governed the interpretati
On Mon, Aug 04, 2025 at 04:08:24PM +0800, JiaoShuntian wrote:
> Hi hackers,
>
> I noticed that PostgreSQL currently supports GB18030 encoding based on the
> older GB18030-2000 standard (as seen in commits like extend GB18030
> conversion). However, China has since updated its mandatory character
Andrew Dunstan writes:
> On 2025-08-04 Mo 6:35 AM, John Naylor wrote:
>> There is a risk of breaking applications, although only a few dozen
>> mappings changed. If it were added as a separate encoding, users could
>> opt in.
> That makes sense ... naming the new encoding so as to avoid confusion
On 2025-08-04 Mo 6:35 AM, John Naylor wrote:
On Mon, Aug 4, 2025 at 3:08 PM JiaoShuntian wrote:
I noticed that PostgreSQL currently supports GB18030 encoding based on the
older GB18030-2000 standard (as seen in commits like extend GB18030
conversion). However, China has since updated its ma
On Mon, Aug 4, 2025 at 3:08 PM JiaoShuntian wrote:
> I noticed that PostgreSQL currently supports GB18030 encoding based on the
> older GB18030-2000 standard (as seen in commits like extend GB18030
> conversion). However, China has since updated its mandatory character set
> standard to GB18030
Hi
😂 Not long ago, many people were rushing to remove this character set
because of a security vulnerability. I was honestly quite shocked when I
saw it.
Thanks
On Mon, Aug 4, 2025 at 4:08 PM JiaoShuntian wrote:
> Hi hackers,
>
> I noticed that PostgreSQL currently supports GB18030 encodin
> I would like to ask:
>
> Are there any plans to upgrade PostgreSQL’s GB18030 support to the 2022
> version? Would the community be open to contributions in this area?
I think we only need to update the perl script and map file to complete this
task.
JiaoShuntian
HighGo Inc.
Hi hackers,
I noticed that PostgreSQL currently supports GB18030 encoding based on the
older GB18030-2000 standard (as seen in commits like extend GB18030
conversion). However, China has since updated its mandatory character set
standard to GB18030-2022, which includes additional characters and