Re: WIN32 pg_import_system_collations

2023-03-01 Thread Juan José Santamaría Flecha
On Tue, Feb 28, 2023 at 9:26 PM Andrew Dunstan  wrote:

> I think you missed my point, which was that the COLLATE clause above
> seemed particularly pointless. But I agree that all these are not much use,
> so I'll remove them as you suggest.
>

Maybe there has been some miscommunication, please let me try to explain
myself a little better. The whole test is an attempt to mimic
collate.linux.utf8, which has that same command, only for collate 'tr_TR',
and so does collate.icu.utf8 but commented out.

I've seen that you have committed this and now drongo is green, which is
great. Thank you for taking care of it.

Regards,

Juan José Santamaría Flecha


Re: WIN32 pg_import_system_collations

2023-02-28 Thread Andrew Dunstan


On 2023-02-28 Tu 11:40, Juan José Santamaría Flecha wrote:


On Tue, Feb 28, 2023 at 12:55 PM Andrew Dunstan  
wrote:


On 2023-02-27 Mo 17:20, Juan José Santamaría Flecha wrote:


Hmm, yeah. I'm not sure I understand the point of this test anyway:


SELECT to_char(date '2010-03-01', 'DD TMMON ' COLLATE "de_DE");


Uhm, they probably don't make much sense except for "tr_TR", so I'm 
fine with removing them. PFA a patch for so.





I think you missed my point, which was that the COLLATE clause above 
seemed particularly pointless. But I agree that all these are not much 
use, so I'll remove them as you suggest.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: WIN32 pg_import_system_collations

2023-02-28 Thread Juan José Santamaría Flecha
On Tue, Feb 28, 2023 at 12:55 PM Andrew Dunstan  wrote:

> On 2023-02-27 Mo 17:20, Juan José Santamaría Flecha wrote:
>
>
> Hmm, yeah. I'm not sure I understand the point of this test anyway:
>
>
> SELECT to_char(date '2010-03-01', 'DD TMMON ' COLLATE "de_DE");
>

Uhm, they probably don't make much sense except for "tr_TR", so I'm fine
with removing them. PFA a patch for so.

Regards,

Juan José Santamaría Flecha


0002-remove-useless-test-from-collate.windows.win1252.patch
Description: Binary data


0001-change-locale-name-for-test-collate.windows.win1252.patch
Description: Binary data


Re: WIN32 pg_import_system_collations

2023-02-28 Thread Andrew Dunstan


On 2023-02-27 Mo 17:20, Juan José Santamaría Flecha wrote:



El lun, 27 feb 2023, 23:05, Juan José Santamaría Flecha 
 escribió:



The command that's failing is "SET lc_time TO 'de_DE';", and that
area of code is untouched by this patch. As mentioned in [1],
the problem seems to come from a Windows bug that the CI images
and my development machines have patched out.


What I wanted to post as [1]:

https://www.postgresql.org/message-id/CAC%2BAXB1agvrgpyHEfqbDr2MOpcON3d%2BWYte_SLzn1E4TamLs9g%40mail.gmail.com



Hmm, yeah. I'm not sure I understand the point of this test anyway:


SELECT to_char(date '2010-03-01', 'DD TMMON ' COLLATE "de_DE");


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: WIN32 pg_import_system_collations

2023-02-27 Thread Juan José Santamaría Flecha
El lun, 27 feb 2023, 23:05, Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> escribió:

>
> The command that's failing is "SET lc_time TO 'de_DE';", and that area of
> code is untouched by this patch. As mentioned in [1], the problem seems to
> come from a Windows bug that the CI images and my development machines have
> patched out.
>

What I wanted to post as [1]:

https://www.postgresql.org/message-id/CAC%2BAXB1agvrgpyHEfqbDr2MOpcON3d%2BWYte_SLzn1E4TamLs9g%40mail.gmail.com


> Regards,
>
> Juan José Santamaría Flecha
>


Re: WIN32 pg_import_system_collations

2023-02-27 Thread Juan José Santamaría Flecha
On Mon, Feb 27, 2023 at 1:10 PM Andrew Dunstan  wrote:

> On 2023-02-26 Su 16:02, Andrew Dunstan wrote:
>
> Now that I have removed the barrier to testing this in the buildfarm, and
> added an appropriate locale setting to drongo, we can see that this test
> fails like this:
>
>
> diff -w -U3 
> c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
>  
> c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
> --- 
> c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
> 2023-01-23 04:39:06.755149600 +
> +++ 
> c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
>  2023-02-26 17:32:54.115515200 +
> @@ -363,16 +363,17 @@
>
>  -- to_char
>  SET lc_time TO 'de_DE';
> +ERROR:  invalid value for parameter "lc_time": "de_DE"
>  SELECT to_char(date '2010-03-01', 'DD TMMON ');
> to_char
>  -
> - 01 MRZ 2010
> + 01 MAR 2010
>  (1 row)
>
>  SELECT to_char(date '2010-03-01', 'DD TMMON ' COLLATE "de_DE");
> to_char
>  -
> - 01 MRZ 2010
> + 01 MAR 2010
>  (1 row)
>
>  -- to_date
>
>
> The last of these is especially an issue, as it doesn't even throw an
> error.
>
> See
> 
> 
>
>
> Further investigation shows that if we change the two instances of "de_DE"
> to "de-DE" the tests behave as expected, so it appears that while POSIX
> style aliases have been created for the BCP 47 style locales, using the
> POSIX aliases doesn't in fact work. I cant see anything that turns the
> POSIX locale name back into BCP 47 at the point of use, which seems to be
> what's needed.
>

The command that's failing is "SET lc_time TO 'de_DE';", and that area of
code is untouched by this patch. As mentioned in [1], the problem seems to
come from a Windows bug that the CI images and my development machines have
patched out.

I think we should change the locale name to make the test more robust, as
the attached. But I don't see a problem with making an alias for the
collations.

Regards,

Juan José Santamaría Flecha


0001-change-locale-name-for-test-collate.windows.win1252.patch
Description: Binary data


Re: WIN32 pg_import_system_collations

2023-02-27 Thread Andrew Dunstan


On 2023-02-26 Su 16:02, Andrew Dunstan wrote:



On 2023-01-03 Tu 08:48, Peter Eisentraut wrote:

On 09.12.22 13:48, Juan José Santamaría Flecha wrote:
On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut 
> wrote:



    What is the status of this now?  I think the other issue has been
    addressed?


Yes, that's addressed for MSVC builds. I think there are a couple of 
pending issues for MinGW, but those should have their own threads.


The patch had rotten, so PFA a rebased version.


committed




Now that I have removed the barrier to testing this in the buildfarm, 
and added an appropriate locale setting to drongo, we can see that 
this test fails like this:



diff -w -U3 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
--- 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
  2023-01-23 04:39:06.755149600 +
+++ 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
   2023-02-26 17:32:54.115515200 +
@@ -363,16 +363,17 @@
  
  -- to_char

  SET lc_time TO 'de_DE';
+ERROR:  invalid value for parameter "lc_time": "de_DE"
  SELECT to_char(date '2010-03-01', 'DD TMMON ');
 to_char
  -
- 01 MRZ 2010
+ 01 MAR 2010
  (1 row)
  
  SELECT to_char(date '2010-03-01', 'DD TMMON ' COLLATE "de_DE");

 to_char
  -
- 01 MRZ 2010
+ 01 MAR 2010
  (1 row)
  
  -- to_date



The last of these is especially an issue, as it doesn't even throw an 
error.


See 








Further investigation shows that if we change the two instances of 
"de_DE" to "de-DE" the tests behave as expected, so it appears that 
while POSIX style aliases have been created for the BCP 47 style 
locales, using the POSIX aliases doesn't in fact work. I cant see 
anything that turns the POSIX locale name back into BCP 47 at the point 
of use, which seems to be what's needed.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: WIN32 pg_import_system_collations

2023-02-26 Thread Andrew Dunstan


On 2023-01-03 Tu 08:48, Peter Eisentraut wrote:

On 09.12.22 13:48, Juan José Santamaría Flecha wrote:
On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut 
> wrote:



    What is the status of this now?  I think the other issue has been
    addressed?


Yes, that's addressed for MSVC builds. I think there are a couple of 
pending issues for MinGW, but those should have their own threads.


The patch had rotten, so PFA a rebased version.


committed




Now that I have removed the barrier to testing this in the buildfarm, 
and added an appropriate locale setting to drongo, we can see that this 
test fails like this:



diff -w -U3 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
--- 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
  2023-01-23 04:39:06.755149600 +
+++ 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
   2023-02-26 17:32:54.115515200 +
@@ -363,16 +363,17 @@
 
 -- to_char

 SET lc_time TO 'de_DE';
+ERROR:  invalid value for parameter "lc_time": "de_DE"
 SELECT to_char(date '2010-03-01', 'DD TMMON ');
to_char
 -
- 01 MRZ 2010
+ 01 MAR 2010
 (1 row)
 
 SELECT to_char(date '2010-03-01', 'DD TMMON ' COLLATE "de_DE");

to_char
 -
- 01 MRZ 2010
+ 01 MAR 2010
 (1 row)
 
 -- to_date



The last of these is especially an issue, as it doesn't even throw an error.

See 




cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: WIN32 pg_import_system_collations

2023-01-03 Thread Peter Eisentraut

On 09.12.22 13:48, Juan José Santamaría Flecha wrote:
On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut 
> wrote:



What is the status of this now?  I think the other issue has been
addressed?


Yes, that's addressed for MSVC builds. I think there are a couple of 
pending issues for MinGW, but those should have their own threads.


The patch had rotten, so PFA a rebased version.


committed





Re: WIN32 pg_import_system_collations

2022-12-09 Thread Juan José Santamaría Flecha
ate = ' ||
+  quote_literal(current_setting('lc_collate')) ||
+  ', lc_ctype = ' ||
+  quote_literal(current_setting('lc_ctype')) || ');';
+END
+$$;
+CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
+CREATE COLLATION testx (locale = 'nonsense'); -- fail
+
+CREATE COLLATION test4 FROM nonsense;
+CREATE COLLATION test5 FROM test0;
+
+SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
+
+ALTER COLLATION test1 RENAME TO test11;
+ALTER COLLATION test0 RENAME TO test11; -- fail
+ALTER COLLATION test1 RENAME TO test22; -- fail
+
+ALTER COLLATION test11 OWNER TO regress_test_role;
+ALTER COLLATION test11 OWNER TO nonsense;
+ALTER COLLATION test11 SET SCHEMA test_schema;
+
+COMMENT ON COLLATION test0 IS 'US English';
+
+SELECT collname, nspname, obj_description(pg_collation.oid, 'pg_collation')
+FROM pg_collation JOIN pg_namespace ON (collnamespace = pg_namespace.oid)
+WHERE collname LIKE 'test%'
+ORDER BY 1;
+
+DROP COLLATION test0, test_schema.test11, test5;
+DROP COLLATION test0; -- fail
+DROP COLLATION IF EXISTS test0;
+
+SELECT collname FROM pg_collation WHERE collname LIKE 'test%';
+
+DROP SCHEMA test_schema;
+DROP ROLE regress_test_role;
+
+
+-- ALTER
+
+ALTER COLLATION "en_US" REFRESH VERSION;
+
+
+-- dependencies
+
+CREATE COLLATION test0 FROM "C";
+
+CREATE TABLE collate_dep_test1 (a int, b text COLLATE test0);
+CREATE DOMAIN collate_dep_dom1 AS text COLLATE test0;
+CREATE TYPE collate_dep_test2 AS (x int, y text COLLATE test0);
+CREATE VIEW collate_dep_test3 AS SELECT text 'foo' COLLATE test0 AS foo;
+CREATE TABLE collate_dep_test4t (a int, b text);
+CREATE INDEX collate_dep_test4i ON collate_dep_test4t (b COLLATE test0);
+
+DROP COLLATION test0 RESTRICT; -- fail
+DROP COLLATION test0 CASCADE;
+
+\d collate_dep_test1
+\d collate_dep_test2
+
+DROP TABLE collate_dep_test1, collate_dep_test4t;
+DROP TYPE collate_dep_test2;
+
+-- test range types and collations
+
+create type textrange_c as range(subtype=text, collation="C");
+create type textrange_en_us as range(subtype=text, collation="en_US");
+
+select textrange_c('A','Z') @> 'b'::text;
+select textrange_en_us('A','Z') @> 'b'::text;
+
+drop type textrange_c;
+drop type textrange_en_us;
+
+
+-- nondeterministic collations
+-- (not supported with libc provider)
+
+CREATE COLLATION ctest_det (locale = 'en_US', deterministic = true);
+CREATE COLLATION ctest_nondet (locale = 'en_US', deterministic = false);
+
+
+-- cleanup
+SET client_min_messages TO warning;
+DROP SCHEMA collate_tests CASCADE;
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 1d86cd6..823fea0 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -185,6 +185,7 @@ sub installcheck
 sub check
 {
 	my $schedule = shift || 'parallel';
+	my $encoding = $ENV{ENCODING} || "SQL_ASCII";
 	# for backwards compatibility, "serial" runs the tests in
 	# parallel_schedule one by one.
 	my $maxconn = $maxconn;
@@ -199,7 +200,7 @@ sub check
 		"--bindir=",
 		"--schedule=${schedule}_schedule",
 		"--max-concurrent-tests=20",
-		"--encoding=SQL_ASCII",
+		"--encoding=${encoding}",
 		"--no-locale",
 		"--temp-instance=./tmp_check");
 	push(@args, $maxconn) if $maxconn;
-- 
2.11.0

From 4d334e48e468a05f1ef3136ce51c7f7668277026 Mon Sep 17 00:00:00 2001
From: Juan Jose Santamaria Flecha 
Date: Thu, 1 Dec 2022 09:58:29 -0500
Subject: [PATCH 1/2] WIN32 pg_import_system_collations

Windows can enumerate the locales that are either installed or supported by
calling EnumSystemLocalesEx(), similar to what is already done in the
READ_LOCALE_A_OUTPUT switch. We can refactor some of the logic already used
in that switch into a new function create_collation_from_locale().

The enumerated locales have BCP 47 shape, that is with a hyphen between
language and territory, instead of POSIX's underscore. The created collations
will retain the BCP 47 shape, but we will also create a POSIX alias, so xx-YY
will have an xx_YY alias.
---
 src/backend/commands/collationcmds.c | 236 +++
 1 file changed, 186 insertions(+), 50 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 81e54e0..df972bd 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -610,6 +610,167 @@ get_icu_locale_comment(const char *localename)
 #endif			/* USE_ICU */
 
 
+/* will we use EnumSystemLocalesEx in pg_import_system_collations? */
+#ifdef WIN32
+#define ENUM_SYSTEM_LOCALE
+#endif
+
+
+#if defined(READ_LOCALE_A_OUTPUT) || defined(ENUM_SYSTEM_LOCALE)
+/*
+ * Create a new collation using the input locale 'locale'.
+ *
+ * The parameter 'nvalid' is incremented if the locale has a valid encoding.
+ *
+ * The parameter 'ncreated' is incremented if the collatio

Re: WIN32 pg_import_system_collations

2022-11-30 Thread Peter Eisentraut

On 10.11.22 11:08, Juan José Santamaría Flecha wrote:

This looks pretty good to me.  The refactoring of the
non-Windows parts
makes sense.  The Windows parts look reasonable on manual
inspection,
but again, I don't have access to Windows here, so someone else
should
also look it over.

I was going to say that at least it is getting tested on the CI, but
I have found out that meson changes version(). That is fixed in this
version.


Now is currently failing due to [1], so maybe we can leave this patch on 
hold until that's addressed.


[1] 
https://www.postgresql.org/message-id/CAC%2BAXB1wJEqfKCuVcNpoH%3Dgxd61N%3D7c2fR3Ew6YRPpSfEUA%3DyQ%40mail.gmail.com 


What is the status of this now?  I think the other issue has been addressed?





Re: WIN32 pg_import_system_collations

2022-11-10 Thread Juan José Santamaría Flecha
On Wed, Nov 9, 2022 at 12:02 AM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

> On Mon, Nov 7, 2022 at 4:08 PM Peter Eisentraut <
> peter.eisentr...@enterprisedb.com> wrote:
>
>>
>> This looks pretty good to me.  The refactoring of the non-Windows parts
>> makes sense.  The Windows parts look reasonable on manual inspection,
>> but again, I don't have access to Windows here, so someone else should
>> also look it over.
>>
>> I was going to say that at least it is getting tested on the CI, but I
> have found out that meson changes version(). That is fixed in this version.
>

Now is currently failing due to [1], so maybe we can leave this patch on
hold until that's addressed.

[1]
https://www.postgresql.org/message-id/CAC%2BAXB1wJEqfKCuVcNpoH%3Dgxd61N%3D7c2fR3Ew6YRPpSfEUA%3DyQ%40mail.gmail.com


Regards,

Juan José Santamaría Flecha


Re: WIN32 pg_import_system_collations

2022-11-08 Thread Juan José Santamaría Flecha
On Mon, Nov 7, 2022 at 4:08 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

>
> This looks pretty good to me.  The refactoring of the non-Windows parts
> makes sense.  The Windows parts look reasonable on manual inspection,
> but again, I don't have access to Windows here, so someone else should
> also look it over.
>
> I was going to say that at least it is getting tested on the CI, but I
have found out that meson changes version(). That is fixed in this version.


> A small style issue: Change return (TRUE) to return TRUE.
>
> Fixed.


> The code
>
> +   if (strlen(localebuf) == 5 && localebuf[2] == '-')
>
> might be too specific.  At least on some POSIX systems, I have seen
> locales with a three-letter language name.  Maybe you should look with
> strchr() and not be too strict about the exact position.
>
> Ok, in this version the POSIX alias is created unconditionally.


> For the test patch, why is a separate test for non-UTF8 needed on
> Windows.  Does the UTF8 one not work?
>
> Windows locales will retain their CP_ACP encoding unless you change the OS
code page to UFT8, which is still experimental [1].


> +   version() !~ 'Visual C\+\+'
>
> This probably won't work for MinGW.
>
> When I proposed this patch it wouldn't have worked because of the
project's Windows minimum version requirement, now it should work in MinGW.
It actually doesn't because most locales are failing with "skipping locale
with unrecognized encoding", but checking what's wrong
with pg_get_encoding_from_locale() in MiNGW is subject for another thread.

[1]
https://stackoverflow.com/questions/56419639/what-does-beta-use-unicode-utf-8-for-worldwide-language-support-actually-do

Regards,

Juan José Santamaría Flecha


v8-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


v8-0002-Add-collate.windows.win1252-test.patch
Description: Binary data


Re: WIN32 pg_import_system_collations

2022-11-07 Thread Peter Eisentraut

On 04.11.22 23:08, Juan José Santamaría Flecha wrote:

Ok, I can definitely improve the comments for that function.

Also consider describing in the commit message what you are doing in
more detail, including some of the things that have been discussed in
this thread.

Going through the thread for the commit message, I think that maybe the 
collation naming remarks were not properly addressed. In the current 
version the collations retain their native name, but an alias is created 
for those with a shape that we can assume a POSIX equivalent exists.


This looks pretty good to me.  The refactoring of the non-Windows parts 
makes sense.  The Windows parts look reasonable on manual inspection, 
but again, I don't have access to Windows here, so someone else should 
also look it over.


A small style issue: Change return (TRUE) to return TRUE.

The code

+   if (strlen(localebuf) == 5 && localebuf[2] == '-')

might be too specific.  At least on some POSIX systems, I have seen 
locales with a three-letter language name.  Maybe you should look with 
strchr() and not be too strict about the exact position.


For the test patch, why is a separate test for non-UTF8 needed on 
Windows.  Does the UTF8 one not work?


+   version() !~ 'Visual C\+\+'

This probably won't work for MinGW.





Re: WIN32 pg_import_system_collations

2022-11-04 Thread Juan José Santamaría Flecha
On Mon, Oct 31, 2022 at 3:09 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

Thanks for taking a look into this patch.

>
> Consider this function you are introducing:
>
> +/*
> + * Create a collation if the input locale is valid for so.
> + * Also keeps track of the number of valid locales and collations created.
> + */
> +static int
> +CollationFromLocale(char *isolocale, char *localebuf, int *nvalid,
> +   int *ncreated, int nspid)
>
> This declaration is incomprehensible without studying all the callers
> and the surrounding code.
>
> Start with the name: What does "collation from locale" mean?  Does it
> make a collation?  Does it convert one?  Does it find one?  There should
> be a verb in there.
>
> (I think in the context of this file, a lower case name would be more
> appropriate for a static function.)
>
> Then the arguments.  The input arguments should be "const".  All the
> arguments should be documented.  What is "isolocale", what is
> "localebuf", how are they different?  What is being counted by "valid"
> (collatons?, locales?), and what makes a thing valid and invalid?  What
> is being "created"?  What is nspid?  What is the return value?
>
> Please make another pass over this.
>
> Ok, I can definitely improve the comments for that function.


> Also consider describing in the commit message what you are doing in
> more detail, including some of the things that have been discussed in
> this thread.
>
> Going through the thread for the commit message, I think that maybe the
collation naming remarks were not properly addressed. In the current
version the collations retain their native name, but an alias is created
for those with a shape that we can assume a POSIX equivalent exists.

Please find attached a new version.

Regards,

Juan José Santamaría Flecha


v7-0002-Add-collate.windows.win1252-test.patch
Description: Binary data


v7-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


Re: WIN32 pg_import_system_collations

2022-10-31 Thread Peter Eisentraut

On 12.07.22 21:32, Juan José Santamaría Flecha wrote:
Please find attached a rebased version. I have split the patch into two 
parts trying to make it easier to review, one with the code changes and 
the other with the test.


Other than that, there are minimal changes from the previous version to 
the code due to the update of _WIN32_WINNT and enabling the test on cirrus.


I'm not familiar with Windows, so I'm just looking at the overall 
structure of this patch.  I think it pretty much makes sense.  But we 
need to consider that this operates on the confluence of various 
different operating system interfaces that not all people will be 
familiar with, so we need to really get the documentation done well.


Consider this function you are introducing:

+/*
+ * Create a collation if the input locale is valid for so.
+ * Also keeps track of the number of valid locales and collations created.
+ */
+static int
+CollationFromLocale(char *isolocale, char *localebuf, int *nvalid,
+   int *ncreated, int nspid)

This declaration is incomprehensible without studying all the callers 
and the surrounding code.


Start with the name: What does "collation from locale" mean?  Does it 
make a collation?  Does it convert one?  Does it find one?  There should 
be a verb in there.


(I think in the context of this file, a lower case name would be more 
appropriate for a static function.)


Then the arguments.  The input arguments should be "const".  All the 
arguments should be documented.  What is "isolocale", what is 
"localebuf", how are they different?  What is being counted by "valid" 
(collatons?, locales?), and what makes a thing valid and invalid?  What 
is being "created"?  What is nspid?  What is the return value?


Please make another pass over this.

Also consider describing in the commit message what you are doing in 
more detail, including some of the things that have been discussed in 
this thread.






Re: WIN32 pg_import_system_collations

2022-07-12 Thread Juan José Santamaría Flecha
Please find attached a rebased version. I have split the patch into two
parts trying to make it easier to review, one with the code changes and the
other with the test.

Other than that, there are minimal changes from the previous version to the
code due to the update of _WIN32_WINNT and enabling the test on cirrus.

Regards,

Juan José Santamaría Flecha

>


v6-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


v6-0002-Add-collate.windows.win1252-test.patch
Description: Binary data


Re: WIN32 pg_import_system_collations

2022-04-11 Thread Juan José Santamaría Flecha
On Tue, Mar 22, 2022 at 2:00 AM Andres Freund  wrote:

>
> Currently fails to apply, please rebase:
> http://cfbot.cputube.org/patch_37_3450.log
>
> Marked as waiting-on-author.
>
> Please, find attached a rebased version, no other significant change.

Regards,

Juan José Santamaría Flecha


v5-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


Re: WIN32 pg_import_system_collations

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-25 15:49:01 +0100, Juan José Santamaría Flecha wrote:
> So, I'm doing that in the attached version, just changing the object name.

Currently fails to apply, please rebase: 
http://cfbot.cputube.org/patch_37_3450.log

Marked as waiting-on-author.

- Andres




Re: WIN32 pg_import_system_collations

2022-01-25 Thread Juan José Santamaría Flecha
On Tue, Jan 25, 2022 at 11:40 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 24.01.22 22:23, Dmitry Koval wrote:
>

Thanks for looking into this.


> > +/*
> > + * Windows will use hyphens between language and territory, where POSIX
> > + * uses an underscore. Simply make it POSIX looking.
> > + */
> > + hyphen = strchr(localebuf, '-');
> > + if (hyphen)
> > +*hyphen = '_';
> >
> > After this block modified collation name is used in function
> >
> > GetNLSVersionEx(COMPARE_STRING, wide_collcollate, )
> >
> > (see win32_read_locale() -> CollationFromLocale() -> CollationCreate()
> > call). Is it correct to use (wide_collcollate = "en_NZ") instead of
> > (wide_collcollate = "en-NZ") in GetNLSVersionEx() function?
>

The problem that David Rowley addressed was coming from Windows collations
in the shape of "English_New Zealand", GetNLSVersionEx() will work with
both "en_NZ" and "en-NZ". You can check collversion in pg_collation in the
patched version.

>
> I don't really know if this is necessary anyway.  Just create the
> collations with the names that the operating system presents.  There is
> no requirement to make the names match POSIX.
>
> If you want to make them match POSIX for some reason, you can also just
> change the object name but leave the collcollate/collctype fields the
> way they came from the OS.
>

I think there is some value in making collation names consistent across
different platforms, e.g. making user scripts more portable. So, I'm doing
that in the attached version, just changing the object name.

Regards,

Juan José Santamaría Flecha


v4-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


Re: WIN32 pg_import_system_collations

2022-01-25 Thread Peter Eisentraut

On 24.01.22 22:23, Dmitry Koval wrote:

+/*
+ * Windows will use hyphens between language and territory, where POSIX
+ * uses an underscore. Simply make it POSIX looking.
+ */
+ hyphen = strchr(localebuf, '-');
+ if (hyphen)
+    *hyphen = '_';

After this block modified collation name is used in function

GetNLSVersionEx(COMPARE_STRING, wide_collcollate, )

(see win32_read_locale() -> CollationFromLocale() -> CollationCreate()
call). Is it correct to use (wide_collcollate = "en_NZ") instead of
(wide_collcollate = "en-NZ") in GetNLSVersionEx() function?


I don't really know if this is necessary anyway.  Just create the 
collations with the names that the operating system presents.  There is 
no requirement to make the names match POSIX.


If you want to make them match POSIX for some reason, you can also just 
change the object name but leave the collcollate/collctype fields the 
way they came from the OS.





Re: WIN32 pg_import_system_collations

2022-01-24 Thread Dmitry Koval

Hi Juan José,

I a bit tested this feature and have small doubts about block:

+/*
+ * Windows will use hyphens between language and territory, where POSIX
+ * uses an underscore. Simply make it POSIX looking.
+ */
+ hyphen = strchr(localebuf, '-');
+ if (hyphen)
+*hyphen = '_';

After this block modified collation name is used in function

GetNLSVersionEx(COMPARE_STRING, wide_collcollate, )

(see win32_read_locale() -> CollationFromLocale() -> CollationCreate()
call). Is it correct to use (wide_collcollate = "en_NZ") instead of
(wide_collcollate = "en-NZ") in GetNLSVersionEx() function?

1) Documentation [1], [2], quote:
If it is a neutral locale for which the script is significant,
the pattern is 

Re: WIN32 pg_import_system_collations

2022-01-24 Thread Dmitry Koval

Hi Juan José,

I a bit tested this feature and have small doubts about block:

+/*
+ * Windows will use hyphens between language and territory, where POSIX
+ * uses an underscore. Simply make it POSIX looking.
+ */
+ hyphen = strchr(localebuf, '-');
+ if (hyphen)
+*hyphen = '_';

After this block modified collation name is used in function

GetNLSVersionEx(COMPARE_STRING, wide_collcollate, )

(see win32_read_locale() -> CollationFromLocale() -> CollationCreate()
call). Is it correct to use (wide_collcollate = "en_NZ") instead of
(wide_collcollate = "en-NZ") in GetNLSVersionEx() function?

1) Documentation [1], [2], quote:
If it is a neutral locale for which the script is significant,
the pattern is 

Re: WIN32 pg_import_system_collations

2022-01-19 Thread Juan José Santamaría Flecha
On Wed, Jan 19, 2022 at 10:53 AM Julien Rouhaud  wrote:

>
> This version doesn't apply anymore:
>
> Thanks for the heads up.

Please find attached a rebased patch. I have also rewritten some comments
to address previous reviews, code and test remain the same.

Regards,

Juan José Santamaría Flecha


v3-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


Re: WIN32 pg_import_system_collations

2022-01-19 Thread Julien Rouhaud
Hi,

On Mon, Dec 13, 2021 at 05:28:47PM +0100, Juan José Santamaría Flecha wrote:
> On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha <
> juanjo.santama...@gmail.com> wrote:
> 
> Per path tester.

This version doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3450.log
=== Applying patches on top of PostgreSQL commit ID 
e0e567a106726f6709601ee7cffe73eb6da8084e ===
=== applying patch ./v2-0001-WIN32-pg_import_system_collations.patch
[...]
patching file src/tools/msvc/vcregress.pl
Hunk #1 succeeded at 153 (offset -1 lines).
Hunk #2 FAILED at 170.
1 out of 2 hunks FAILED -- saving rejects to file 
src/tools/msvc/vcregress.pl.rej

Could you send a rebased version?  In the meantime I will switch the CF entry
to Waiting on Author.




Re: WIN32 pg_import_system_collations

2021-12-14 Thread Thomas Munro
On Wed, Dec 15, 2021 at 3:52 PM Michael Paquier  wrote:
> On Wed, Dec 15, 2021 at 10:45:28AM +1300, Thomas Munro wrote:
> > Ah, right.  I hope we can make the leap to 0x0A00 (Win10) soon and
> > just stop thinking about these old ghosts, as mentioned by various
> > people in various threads.
>
> Seeing your message here..  My apologies for the short digression.
> Would that mean that we could use CreateSymbolicLinkA() as a mapper
> for pgreadlink() rather than junction points?  I am wondering how much
> code in src/port/ such a move could allow us to do.

Sadly, (1) it wouldn't work unless running with a special privilege or
as admin, and (2) it wouldn't work on non-NTFS filesystems.  I think
it's mostly intended to allow things like unpacking tarballs, checking
out git repos etc etc etc that came from Unix systems, which is why it
works with 'developer mode' enabled[1], though obviously it wouldn't
be totally impossible for us to require that privilege.  Didn't seem
great to me, though, that's why I gave up on it over in
https://commitfest.postgresql.org/36/3090/ where this was recently
discussed.

[1] https://blogs.windows.com/windowsdeveloper/2016/12/02/symlinks-windows-10/




Re: WIN32 pg_import_system_collations

2021-12-14 Thread Michael Paquier
On Wed, Dec 15, 2021 at 10:45:28AM +1300, Thomas Munro wrote:
> Ah, right.  I hope we can make the leap to 0x0A00 (Win10) soon and
> just stop thinking about these old ghosts, as mentioned by various
> people in various threads.

Seeing your message here..  My apologies for the short digression.
Would that mean that we could use CreateSymbolicLinkA() as a mapper
for pgreadlink() rather than junction points?  I am wondering how much
code in src/port/ such a move could allow us to do.
--
Michael


signature.asc
Description: PGP signature


Re: WIN32 pg_import_system_collations

2021-12-14 Thread Thomas Munro
On Wed, Dec 15, 2021 at 9:14 AM Juan José Santamaría Flecha
 wrote:
> What I meant to say is that to run the test, you need a database that has 
> successfully run pg_import_system_collations. This would be also possible in 
> Mingw for _WIN32_WINNT> = 0x0600, but the current value in 
> src\include\port\win32.h is _WIN32_WINNT = 0x0501 when compiling with Mingw.

Ah, right.  I hope we can make the leap to 0x0A00 (Win10) soon and
just stop thinking about these old ghosts, as mentioned by various
people in various threads.  Do you happen to know if there are
complications for that, with the non-MSVC tool chains?




Re: WIN32 pg_import_system_collations

2021-12-14 Thread Juan José Santamaría Flecha
On Mon, Dec 13, 2021 at 9:54 PM Thomas Munro  wrote:

>
> I haven't tested yet but +1 for the feature.  I guess the API didn't
> exist at the time collation support was added.
>
> Good to hear.


> This conversion makes sense, to keep the user experience the same
> across platforms.   Nitpick on the comment: why ANSI?  I think we can
> call "en_NZ" a POSIX locale identifier[1], and I think we can call
> "en-NZ" a BCP 47 language tag.
>
> POSIX also works for me.


> When would the full set of locales not be installed on a Windows
> system, and why does this need Visual Studio?  Wondering if this test
> will work with some of the frankenstein/cross toolchains tool chains
> (not objecting if it doesn't and could be skipped, just trying to
> understand the comment).
>
> What I meant to say is that to run the test, you need a database that has
successfully run pg_import_system_collations. This would be also possible
in Mingw for _WIN32_WINNT> = 0x0600, but the current value in
src\include\port\win32.h is _WIN32_WINNT = 0x0501 when compiling with
Mingw.


> Regards,

 Juan José Santamaría Flecha


Re: WIN32 pg_import_system_collations

2021-12-13 Thread Thomas Munro
On Tue, Dec 14, 2021 at 5:29 AM Juan José Santamaría Flecha
 wrote:
> On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha 
>  wrote:
> Per path tester.

Hi Juan José,

I haven't tested yet but +1 for the feature.  I guess the API didn't
exist at the time collation support was added.

+/*
+ * Windows will use hyphens between language and territory, where ANSI
+ * uses an underscore. Simply make it ANSI looking.
+ */
+hyphen = strchr(localebuf, '-');
+if (hyphen)
+*hyphen = '_';
+

This conversion makes sense, to keep the user experience the same
across platforms.   Nitpick on the comment: why ANSI?  I think we can
call "en_NZ" a POSIX locale identifier[1], and I think we can call
"en-NZ" a BCP 47 language tag.

+/*
+ * This test is for Windows/Visual Studio systems and assumes that a full set
+ * of locales is installed. It must be run in a database with WIN1252 encoding,
+ * because of the locales' encondings. We lose some interesting cases from the
+ * UTF-8 version, like Turkish dotted and undotted 'i' or Greek sigma.
+ */

s/encondings/encodings/

When would the full set of locales not be installed on a Windows
system, and why does this need Visual Studio?  Wondering if this test
will work with some of the frankenstein/cross toolchains tool chains
(not objecting if it doesn't and could be skipped, just trying to
understand the comment).

Slightly related to this, in case you didn't see it, I'd also like to
use BCP 47 tags for the default locale for PostgreSQL 15[2].

[1] https://en.wikipedia.org/wiki/Locale_(computer_software)#POSIX_platforms
[2] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com




Re: WIN32 pg_import_system_collations

2021-12-13 Thread Juan José Santamaría Flecha
On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

Per path tester.


> Regards,
>
> Juan José Santamaría Flecha
>


v2-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


WIN32 pg_import_system_collations

2021-12-13 Thread Juan José Santamaría Flecha
I want to propose an implementation of pg_import_system_collations() for
WIN32 using EnumSystemLocalesEx() [1], which is available from Windows
Server 2008 onwards.

The patch includes a test emulating that of collate.linux.utf8, but for
Windows-1252. The main difference is that it doesn't have the tests for
Turkish dotted and undotted 'i', since that locale is WIN1254.

I am opening an item in the commitfest for this.

[1]
https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-enumsystemlocalesex

Regards,

Juan José Santamaría Flecha
From ddf3114e82dc32edb59278d9ed2cb995222a5adf Mon Sep 17 00:00:00 2001
From: Juan Jose Santamaria Flecha 
Date: Fri, 10 Dec 2021 17:44:15 -0500
Subject: [PATCH] WIN32 pg_import_system_collations . Adds the
 pg_import_system_collations() functionality for Windows newer than 2008 using
 EnumSystemLocalesEx(). Also adding a test emulating collate.linux.utf8 but
 for Windows-1252

---
 src/backend/commands/collationcmds.c   |  197 +++-
 .../regress/expected/collate.windows.win1252.out   | 1019 
 .../regress/expected/collate.windows.win1252_1.out |   12 +
 src/test/regress/parallel_schedule |2 +-
 src/test/regress/sql/collate.windows.win1252.sql   |  418 
 src/tools/msvc/vcregress.pl|3 +-
 6 files changed, 1599 insertions(+), 52 deletions(-)
 create mode 100644 src/test/regress/expected/collate.windows.win1252.out
 create mode 100644 src/test/regress/expected/collate.windows.win1252_1.out
 create mode 100644 src/test/regress/sql/collate.windows.win1252.sql

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 53fc579..c286113 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -522,6 +522,128 @@ get_icu_locale_comment(const char *localename)
 }
 #endif			/* USE_ICU */
 
+/* will we use EnumSystemLocalesEx in pg_import_system_collations? */
+#if defined(WIN32) && _WIN32_WINNT >= 0x0600
+#define ENUM_SYSTEM_LOCALE
+#endif
+
+#if defined(READ_LOCALE_A_OUTPUT) || defined(ENUM_SYSTEM_LOCALE)
+/*
+ * Create a collation if the input locale is valid for so.
+ * Also keeps track of the number of valid locales and collations created.
+ */
+static int
+CollationFromLocale(char *localebuf, int *nvalid, int *ncreated, int nspid)
+{
+	int			enc;
+	Oid			collid;
+
+	/*
+	 * Some systems have locale names that don't consist entirely of
+	 * ASCII letters (such as "bokml" or "franais").
+	 * This is pretty silly, since we need the locale itself to
+	 * interpret the non-ASCII characters. We can't do much with
+	 * those, so we filter them out.
+	 */
+	if (!pg_is_ascii(localebuf))
+	{
+		elog(DEBUG1, "skipping locale with non-ASCII name: \"%s\"", localebuf);
+		return -1;
+	}
+
+	enc = pg_get_encoding_from_locale(localebuf, false);
+	if (enc < 0)
+	{
+		elog(DEBUG1, "skipping locale with unrecognized encoding: \"%s\"",
+			 localebuf);
+		return -1;
+	}
+	if (!PG_VALID_BE_ENCODING(enc))
+	{
+		elog(DEBUG1, "skipping locale with client-only encoding: \"%s\"", localebuf);
+		return -1;
+	}
+	if (enc == PG_SQL_ASCII)
+		return -1;		/* C/POSIX are already in the catalog */
+
+	/* count valid locales found in operating system */
+	*nvalid += 1;
+
+	/*
+	 * Create a collation named the same as the locale, but quietly
+	 * doing nothing if it already exists.  This is the behavior we
+	 * need even at initdb time, because some versions of "locale -a"
+	 * can report the same locale name more than once.  And it's
+	 * convenient for later import runs, too, since you just about
+	 * always want to add on new locales without a lot of chatter
+	 * about existing ones.
+	 */
+	collid = CollationCreate(localebuf, nspid, GetUserId(),
+			 COLLPROVIDER_LIBC, true, enc,
+			 localebuf, localebuf,
+			 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
+			 true, true);
+	if (OidIsValid(collid))
+	{
+		*ncreated += 1;
+
+		/* Must do CCI between inserts to handle duplicates correctly */
+		CommandCounterIncrement();
+	}
+
+	return enc;
+}
+#endif			/* defined(READ_LOCALE_A_OUTPUT) || defined(ENUM_SYSTEM_LOCALE) */
+
+#ifdef ENUM_SYSTEM_LOCALE
+/* Parameter to be passed to the callback function win32_read_locale() */
+typedef struct
+{
+	int		   *ncreated;
+	int		   *nvalid;
+	Oid			nspid;
+} CollParam;
+
+/*
+ * Callback function for EnumSystemLocalesEx() in pg_import_system_collations()
+ * Creates a collation for every valid locale.
+ */
+BOOL CALLBACK
+win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
+{
+	CollParam  *param = (CollParam *) lparam;
+	char		localebuf[NAMEDATALEN];
+	int			result;
+	char	   *hyphen;
+
+	(void) dwFlags;
+
+	result = WideCharToMultiByte(CP_ACP, 0, pStr, -1, localebuf, NAMEDATALEN,
+ NULL, NULL);
+
+	if (result == 0)
+	{
+		if (GetLastError(