Re: [HACKERS] fixes for the Danish locale

2016-07-26 Thread Bjorn Munch
On 21/07 08.42, Jeff Janes wrote:
> In Danish, the sequence 'aa' is sometimes treated as a single letter
> which collates after 'z'.

For the record: this is also true for Norwegian. In both locales it
collates equal to 'å', which is the 29th letter of the alphabet. But
'aa' is no longer used in ordinary words, only in names (in Norwegian
only personal names, in Danish also place names).
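
A minimal Python sketch of that ordering, purely as an illustration. The
helper danish_like_key and its 29-letter table are assumptions made for
this example, not the real da_DK or nb_NO collation rules (which, per the
quoted text, treat 'aa' this way only sometimes). It folds 'aa' to 'å'
and ranks 'å' last, after 'z'.

```python
# Hypothetical 29-letter alphabet table; 'å' is the last entry.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def danish_like_key(word):
    """Toy sort key: treat 'aa' as the single letter 'å' (assumption)."""
    folded = word.lower().replace("aa", "å")
    # Characters outside the table sort after everything else.
    return [RANK.get(ch, len(ALPHABET)) for ch in folded]


words = ["z", "aa", "ab"]
print(sorted(words))                       # ['aa', 'ab', 'z']
print(sorted(words, key=danish_like_key))  # ['ab', 'z', 'aa']
```

A test that hard-codes the first ordering in its expected output fails
under a locale that behaves like the second.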

- Bjorn Munch


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fixes for the Danish locale

2016-07-22 Thread Noah Misch
On Thu, Jul 21, 2016 at 03:53:45PM -0400, Andrew Dunstan wrote:
> 
> 
> On 07/21/2016 02:26 PM, Greg Stark wrote:
> >On Thu, Jul 21, 2016 at 5:44 PM, Tom Lane wrote:
> >>Confirmed here.  Will deal with it, but I wonder why we have no buildfarm
> >>members covering this ...
> >We're not going to have a build farm member for every locale the local
> >systems support.
> >
> >Perhaps the build farm script should pick a random locale for each
> >run. Either a random locale from the set on the OS or a random
> >language from a list of locales that the regression tests are intended
> >to be safe for.
> >
> 
> 
> I don't see why we shouldn't have a buildfarm machine that tests a very
> large number of locales. It takes a very lightly resourced machine like
> nightjar just over two minutes per locale. The list of locales to test is a
> setting in the config file.

+1.  Ten animals of ~75 locales apiece would give fair per-animal runtime.
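
The arithmetic behind that estimate, as a rough Python sketch. It assumes
the 735 installed locales reported for the CentOS box elsewhere in this
thread and a flat two minutes per locale (the figure quoted above is "just
over two minutes", so the result is a lower bound). per_animal_load is a
hypothetical helper, not part of the buildfarm client.

```python
import math


def per_animal_load(total_locales, animals, minutes_per_locale):
    """Return (locales per animal, minutes per full run) for an even split."""
    locales_each = math.ceil(total_locales / animals)
    return locales_each, locales_each * minutes_per_locale


locales_each, minutes = per_animal_load(735, 10, 2)
print(locales_each, minutes)  # 74 148
```

That is roughly two and a half hours per animal per full pass.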




Re: [HACKERS] fixes for the Danish locale

2016-07-22 Thread Andreas Karlsson

On 07/22/2016 03:59 AM, Jeff Janes wrote:
> On Thu, Jul 21, 2016 at 2:11 PM, Tom Lane wrote:
>> I see that the core tests fall over in Turkish still :-(
>
> Turkish has never passed (at least back to 9.0).  It looks like it is
> in the stemming functions.  I don't understand why, I would think
> everything other than English would be failing those if the regression
> tests hard-code English stemming expectations but fail to arrange for
> English stemming rules.


If something fails for Turkish but not other languages it is usually due
to the upper/lower casing rules of the dotted and the dotless I (I -> ı
and İ -> i, whereas most languages have I -> i).
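
A small Python sketch of the difference. Python's str.lower() does not
consult the locale, so the Turkish rule is emulated here with an explicit
mapping; turkish_lower is a hypothetical helper written for this example
and is not how PostgreSQL or libc implement it.

```python
# Map the two Turkish special cases before applying the default lowercasing.
# U+0131 is dotless ı, U+0130 is dotted İ.
_TURKISH_SPECIALS = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(text):
    """Lowercase with the Turkish I/İ rules (illustrative only)."""
    return text.translate(_TURKISH_SPECIALS).lower()


print("TITLE".lower())         # title
print(turkish_lower("TITLE"))  # tıtle
```

Expected output that contains a lowercased ASCII 'I' therefore differs as
soon as lowercasing follows Turkish rules.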


Andreas




Re: [HACKERS] fixes for the Danish locale

2016-07-22 Thread Tom Lane
Jeff Janes writes:
> The attached patch fixes regression tests for Welsh (cy_GB), needed in
> 9.5 and 9.6.

Pushed, thanks.

regards, tom lane




Re: [HACKERS] fixes for the Danish locale

2016-07-22 Thread Jeff Janes
On Thu, Jul 21, 2016 at 11:49 AM, Jeff Janes wrote:
> On Thu, Jul 21, 2016 at 9:44 AM, Tom Lane wrote:
>> Jeff Janes writes:
>>> In Danish, the sequence 'aa' is sometimes treated as a single letter
>>> which collates after 'z'.
>>> Some regression tests got into 9.5, and are still in 9.6beta3, which
>>> fail due to assuming they know how things will sort or compare.
>>
>> Confirmed here.  Will deal with it, but I wonder why we have no buildfarm
>> members covering this ...
>>
>
> My CentOS box came with 735 locales installed, so testing all of them
> on a regular basis would be quite a task.  And it doesn't help that
> many of them seem to be very slow compared to C locale.
>
> I guess the good news is that nothing I tested which was working in
> 9.5 is broken in 9.6, but several things which were working in 9.4 did
> get broken in 9.5 and still are in 9.6.
>
> The Danish fix will probably also fix the (very large) Norwegian family.
>
> The Welsh locale (cy_GB) apparently puts 'dd' after 'f', which breaks row
> level security in much the same way as 'aa' does.
>
> I think that that will cover all of the ones that were working in 9.4.

The attached patch fixes regression tests for Welsh (cy_GB), needed in
9.5 and 9.6.

Cheers,

Jeff


welsh_rowlevel.patch
Description: Binary data



Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Tom Lane
Jeff Janes writes:
> On Thu, Jul 21, 2016 at 2:11 PM, Tom Lane wrote:
>> I see that the core tests fall over in Turkish still :-(

> Turkish has never passed (at least back to 9.0).  It looks like it is
> in the stemming functions.  I don't understand why, I would think
> everything other than English would be failing those if the regression
> tests hard-code English stemming expectations but fail to arrange for
> English stemming rules.

It looks to me like the 'simple' dictionary assumes it can apply the
lowercasing rules implied by LC_CTYPE regardless of which language
it's supposedly working on.  This is probably something we should
improve sometime, but I doubt it's an easy change.

regards, tom lane




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Jeff Janes
On Thu, Jul 21, 2016 at 2:11 PM, Tom Lane wrote:
> Jeff Janes writes:
>> In Danish, the sequence 'aa' is sometimes treated as a single letter
>> which collates after 'z'.
>> Some regression tests got into 9.5, and are still in 9.6beta3, which
>> fail due to assuming they know how things will sort or compare.
>
> As of HEAD, "LANG=danish make check-world" passes for me, which it
> did not before the round of fixes I just pushed.
>
> I see that the core tests fall over in Turkish still :-(

Turkish has never passed (at least back to 9.0).  It looks like it is
in the stemming functions.  I don't understand why, I would think
everything other than English would be failing those if the regression
tests hard-code English stemming expectations but fail to arrange for
English stemming rules.

Cheers,

Jeff




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Tom Lane
Jeff Janes writes:
> In Danish, the sequence 'aa' is sometimes treated as a single letter
> which collates after 'z'.
> Some regression tests got into 9.5, and are still in 9.6beta3, which
> fail due to assuming they know how things will sort or compare.

As of HEAD, "LANG=danish make check-world" passes for me, which it
did not before the round of fixes I just pushed.

I see that the core tests fall over in Turkish still :-(

regards, tom lane




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Andrew Dunstan



On 07/21/2016 02:26 PM, Greg Stark wrote:
> On Thu, Jul 21, 2016 at 5:44 PM, Tom Lane wrote:
>> Confirmed here.  Will deal with it, but I wonder why we have no buildfarm
>> members covering this ...
> We're not going to have a build farm member for every locale the local
> systems support.
>
> Perhaps the build farm script should pick a random locale for each
> run. Either a random locale from the set on the OS or a random
> language from a list of locales that the regression tests are intended
> to be safe for.




I don't see why we shouldn't have a buildfarm machine that tests a very 
large number of locales. It takes a very lightly resourced machine like 
nightjar just over two minutes per locale. The list of locales to test 
is a setting in the config file.


cheers

andrew





Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Peter Geoghegan
On Thu, Jul 21, 2016 at 11:49 AM, Jeff Janes wrote:
> Does testing in other locales ever uncover bugs other than those in
> the tests themselves?  Is it worth trying to maintain broad coverage?

Potentially, yes. The strxfrm() inconsistency issue disproportionately
affected de_DE.utf8, for example. There were other locales that were
affected less severely, and I think the majority were not shown to be
affected at all.

That being said, it probably wouldn't have caught that particular
issue if we had broad coverage. It probably would catch a broken test,
though.

-- 
Peter Geoghegan




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Peter Geoghegan
On Thu, Jul 21, 2016 at 11:44 AM, Tom Lane wrote:
> Note that there are certain locales we've deliberately chosen not to
> support in some regression tests (see e.g. plpython_unicode.sql), so
> I'm not really willing to buy into the idea that "any random locale found
> on a buildfarm animal should work" anyway.  I'm much more interested in
> supporting locales that someone cares enough about to configure a
> buildfarm animal for.

That seems like a high standard to me. Locale rules are known to
change, and are explicitly versioned by glibc, for example.

-- 
Peter Geoghegan




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Jeff Janes
On Thu, Jul 21, 2016 at 9:44 AM, Tom Lane wrote:
> Jeff Janes writes:
>> In Danish, the sequence 'aa' is sometimes treated as a single letter
>> which collates after 'z'.
>> Some regression tests got into 9.5, and are still in 9.6beta3, which
>> fail due to assuming they know how things will sort or compare.
>
> Confirmed here.  Will deal with it, but I wonder why we have no buildfarm
> members covering this ...
>

My CentOS box came with 735 locales installed, so testing all of them
on a regular basis would be quite a task.  And it doesn't help that
many of them seem to be very slow compared to C locale.

I guess the good news is that nothing I tested which was working in
9.5 is broken in 9.6, but several things which were working in 9.4 did
get broken in 9.5 and still are in 9.6.

The Danish fix will probably also fix the (very large) Norwegian family.

The Welsh locale (cy_GB) apparently puts 'dd' after 'f', which breaks row
level security in much the same way as 'aa' does.

I think that that will cover all of the ones that were working in 9.4.

Does testing in other locales ever uncover bugs other than those in
the tests themselves?  Is it worth trying to maintain broad coverage?

Cheers,

Jeff




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Tom Lane
Peter Geoghegan writes:
> On Thu, Jul 21, 2016 at 11:29 AM, Tom Lane wrote:
>> Nah, we have a hard enough time with reproducibility of buildfarm results
>> without deliberately injecting transient failures.

> It could be pseudo-random, and so deterministic per buildfarm animal.
> That's what I did.

I'm not impressed with that proposal either --- then we don't even have
any control over what set of locales are getting tested.

Note that there are certain locales we've deliberately chosen not to
support in some regression tests (see e.g. plpython_unicode.sql), so
I'm not really willing to buy into the idea that "any random locale found
on a buildfarm animal should work" anyway.  I'm much more interested in
supporting locales that someone cares enough about to configure a
buildfarm animal for.

regards, tom lane




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Peter Geoghegan
On Thu, Jul 21, 2016 at 11:29 AM, Tom Lane wrote:
>> Perhaps the build farm script should pick a random locale for each
>> run. Either a random locale from the set on the OS or a random
>> language from a list of locales that the regression tests are intended
>> to be safe for.
>
> Nah, we have a hard enough time with reproducibility of buildfarm results
> without deliberately injecting transient failures.

It could be pseudo-random, and so deterministic per buildfarm animal.
That's what I did.
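
One way such a deterministic choice could look, as a hedged Python sketch.
This is not the amcheck test code; pick_locale, the animal name, and the
locale list below are assumptions for illustration. A stable hash of the
animal name selects the locale, so the same animal always tests the same
one.

```python
import hashlib


def pick_locale(animal_name, locales):
    """Pick one locale per animal, stable across runs (illustrative only)."""
    # hashlib is used because Python's built-in hash() of a str is salted
    # per process and would change from run to run.
    digest = hashlib.sha256(animal_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(locales)
    # Sort first so the result does not depend on the input order.
    return sorted(locales)[index]


candidates = ["cy_GB.utf8", "da_DK.utf8", "de_DE.utf8", "tr_TR.utf8"]
print(pick_locale("nightjar", candidates))
```

Which locales end up covered still depends on the set of animal names, so
nobody chooses the tested set directly.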

-- 
Peter Geoghegan




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Tom Lane
Greg Stark writes:
> On Thu, Jul 21, 2016 at 5:44 PM, Tom Lane wrote:
>> Confirmed here.  Will deal with it, but I wonder why we have no buildfarm
>> members covering this ...

> We're not going to have a build farm member for every locale the local
> systems support.

Probably not, but Danish seems odd enough to be worth testing.  Aside
from this issue, I found one in the pltcl tests.

> Perhaps the build farm script should pick a random locale for each
> run. Either a random locale from the set on the OS or a random
> language from a list of locales that the regression tests are intended
> to be safe for.

Nah, we have a hard enough time with reproducibility of buildfarm results
without deliberately injecting transient failures.

regards, tom lane




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Peter Geoghegan
On Thu, Jul 21, 2016 at 11:26 AM, Greg Stark wrote:
> Perhaps the build farm script should pick a random locale for each
> run. Either a random locale from the set on the OS or a random
> language from a list of locales that the regression tests are intended
> to be safe for.

That's more or less what I did with the amcheck regression tests.

-- 
Peter Geoghegan




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Greg Stark
On Thu, Jul 21, 2016 at 5:44 PM, Tom Lane wrote:
>
> Confirmed here.  Will deal with it, but I wonder why we have no buildfarm
> members covering this ...

We're not going to have a build farm member for every locale the local
systems support.

Perhaps the build farm script should pick a random locale for each
run. Either a random locale from the set on the OS or a random
language from a list of locales that the regression tests are intended
to be safe for.

-- 
greg




Re: [HACKERS] fixes for the Danish locale

2016-07-21 Thread Tom Lane
Jeff Janes writes:
> In Danish, the sequence 'aa' is sometimes treated as a single letter
> which collates after 'z'.
> Some regression tests got into 9.5, and are still in 9.6beta3, which
> fail due to assuming they know how things will sort or compare.

Confirmed here.  Will deal with it, but I wonder why we have no buildfarm
members covering this ...

regards, tom lane

