Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-27 Thread Robert Haas
On Mon, Jan 26, 2015 at 9:52 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Tue, Jan 27, 2015 at 4:21 AM, Robert Haas robertmh...@gmail.com wrote:
 That's what I hope to find out.  :-)
 Buildfarm seems happy now. I just gave a try to that on one of my
 small Windows VMs and compared the performance with 9.4 for this
 simple test case when building with MSVC 2010:
 create table aa as select random()::text as a, 'filler filler filler'
 as b a from generate_series(1,100);
 create index aai in aa(a):
 On 9.4, the index creation took 26.5s, while on master it took 18s.
 That's nice, particularly for things like a restore from a dump.

Cool.  That's a little bit smaller win than I would have expected
given my Linux results, but it's certainly respectable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-26 Thread Robert Haas
On Mon, Jan 26, 2015 at 2:10 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Jan 26, 2015 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote:
 Now that these issues are fixed and the buildfarm is green again, I'm
 going to try re-enabling this optimization on Windows.  My working
 theory is that disabling that categorically was a mis-diagnosis of the
 real problem, and that now that the issues mentioned above are cleaned
 up, it'll just work.  That might not be right, but I think it's worth
 a try.

 Right. We're only supporting the C locale on Windows. How could
 Windows possibly be broken if locale-related functions like strxfrm()
 and strcoll() are not even called?

That's what I hope to find out.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-26 Thread Robert Haas
On Fri, Jan 23, 2015 at 12:34 PM, Robert Haas robertmh...@gmail.com wrote:
 In other words, even on systems that don't HAVE_LOCALE_T, we still
 have to support the default collation and the C collation, and they
 have to behave differently.  There's no way to make that work using
 only strxfrm(), because nothing gets passed to that function to tell
 it which of those two things it is supposed to do.

 Now even if the above were not an issue, for example because we
 dropped support for systems where !HAVE_LOCALE_T, I think it would be
 poor form to depend on strxfrm_l() to behave like memcpy() where we
 could just as easily call memcpy() and be *sure* that it was going to
 do what we wanted.  Much of writing good code is figuring out what
 could go wrong and then figuring out how to prevent it, and relying on
 strxfrm_l() would be an unnecessary risk regardless.  Given the
 existence of !HAVE_LOCALE_T systems, it's just plain broken.

Now that these issues are fixed and the buildfarm is green again, I'm
going to try re-enabling this optimization on Windows.  My working
theory is that disabling that categorically was a mis-diagnosis of the
real problem, and that now that the issues mentioned above are cleaned
up, it'll just work.  That might not be right, but I think it's worth
a try.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-26 Thread Peter Geoghegan
On Mon, Jan 26, 2015 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote:
 Now that these issues are fixed and the buildfarm is green again, I'm
 going to try re-enabling this optimization on Windows.  My working
 theory is that disabling that categorically was a mis-diagnosis of the
 real problem, and that now that the issues mentioned above are cleaned
 up, it'll just work.  That might not be right, but I think it's worth
 a try.

Right. We're only supporting the C locale on Windows. How could
Windows possibly be broken if locale-related functions like strxfrm()
and strcoll() are not even called?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-26 Thread Michael Paquier
On Tue, Jan 27, 2015 at 4:21 AM, Robert Haas robertmh...@gmail.com wrote:
 That's what I hope to find out.  :-)
Buildfarm seems happy now. I just gave a try to that on one of my
small Windows VMs and compared the performance with 9.4 for this
simple test case when building with MSVC 2010:
create table aa as select random()::text as a, 'filler filler filler'
as b a from generate_series(1,100);
create index aai in aa(a):
On 9.4, the index creation took 26.5s, while on master it took 18s.
That's nice, particularly for things like a restore from a dump.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-23 Thread Robert Haas
On Thu, Jan 22, 2015 at 5:51 PM, Peter Geoghegan p...@heroku.com wrote:
 That having been said, it's clearer to continue to handle each case (C
 locale vs other locales) separately within the new
 bttext_abbrev_convert() function, just to be consistent, but also to
 avoid NUL-terminating the text strings to pass to strxfrm(), which as
 you point out is an avoidable cost. So, I'm not asking you to restore
 the terser uniform use of strxfrm() we had before, but, for what it's
 worth, that was not the real issue. The real issue was that strxfrm()
 spuriously used the wrong locale, as you said. Provided strxfrm() had
 the correct locale (the C locale), then AFAICT it ought to have been
 fine, regardless of whether or not it then behave exactly equivalently
 to memcpy().

I don't agree.  On a system where HAVE_LOCALE_T is not defined, there
is *no way* to get strxfrm() to behave like memcpy(), because we're
not passing a locale to it.  Clearly, strxfrm() is going to do a
locale-aware transformation in that case whether the user wrote
collate C or not.  The comments in regc_pg_locale.c explain:

 * 1. In C/POSIX collations, we use hard-wired code.  We can't depend on
 * the ctype.h functions since those will obey LC_CTYPE.  Note that these
 * collations don't give a fig about multibyte characters.
 *
 * 2. In the default collation (which is supposed to obey LC_CTYPE):
 *
 * 2a. When working in UTF8 encoding, we use the wctype.h functions if
 * available.  This assumes that every platform uses Unicode codepoints
 * directly as the wchar_t representation of Unicode.  On some platforms
 * wchar_t is only 16 bits wide, so we have to punt for codepoints  0x.
 *
 * 2b. In all other encodings, or on machines that lack wctype.h, we use
 * the ctype.h functions for pg_wchar values up to 255, and punt for values
 * above that.  This is only 100% correct in single-byte encodings such as
 * LATINn.  However, non-Unicode multibyte encodings are mostly Far Eastern
 * character sets for which the properties being tested here aren't very
 * relevant for higher code values anyway.  The difficulty with using the
 * wctype.h functions with non-Unicode multibyte encodings is that we can
 * have no certainty that the platform's wchar_t representation matches
 * what we do in pg_wchar conversions.
 *
 * 3. Other collations are only supported on platforms that HAVE_LOCALE_T.
 * Here, we use the locale_t-extended forms of the wctype.h and ctype.h
 * functions, under exactly the same cases as #2.

In other words, even on systems that don't HAVE_LOCALE_T, we still
have to support the default collation and the C collation, and they
have to behave differently.  There's no way to make that work using
only strxfrm(), because nothing gets passed to that function to tell
it which of those two things it is supposed to do.

Now even if the above were not an issue, for example because we
dropped support for systems where !HAVE_LOCALE_T, I think it would be
poor form to depend on strxfrm_l() to behave like memcpy() where we
could just as easily call memcpy() and be *sure* that it was going to
do what we wanted.  Much of writing good code is figuring out what
could go wrong and then figuring out how to prevent it, and relying on
strxfrm_l() would be an unnecessary risk regardless.  Given the
existence of !HAVE_LOCALE_T systems, it's just plain broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 10:57 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 21, 2015 at 2:30 PM, Peter Geoghegan p...@heroku.com wrote:
 Even following Robert's disabling of abbreviated keys on Windows,
 buildfarm animals hamerkop, brolga, currawong and bowerbird remain
 unhappy, with failing regression tests for collate and sometimes
 (but not always) aggregates. Some of these only use the C locale.

 I think that aggregates does not fail for brolga because it's built
 without locale_t support (not sure how to interpret MSVC configure
 output, but I think that the other animals do have such support).  So
 maybe this code being executed within btsortsupport_worker(), for the
 C locale on Windows is at issue (note that abbreviation is still
 disabled):

 tss-locale = pg_newlocale_from_collation(collid);

 Yes, you seem to have rather unwisely changed around the order of the
 checks in btsortsupport_worker().  I've just rewritten it heavily to
 try to more clearly separate the decision about whether to do
 fmgr-elision, and if so which comparator to use, from the decision
 about whether to use abbreviation.  Naturally I can't promise that
 this is completely right, but I hope that, if it's not, it will at
 least be easier to fix.  There's no sanity in removing the
 lc_collate_is_c() check from the top of the function and then trying
 to remember to exclude that case individually from the multiple places
 further down that count on the fact that they'll never be reached in
 that case.  This function may even have a third decision to make
 someday, and they can't all be intertwined.

This seems to have broken more stuff.  My working hypothesis is that
the culprit is here:

/*
 * There is no special handling of the C locale here, unlike with
 * varstr_cmp().  strxfrm() is used indifferently.
 */

As far as I can see, that's just hoping that when the locale is C,
strxfrm() is memcpy().  If it isn't, then you will potentially get
different results when the abbreviated keys stuff is used vs. when it
isn't, because when it isn't, we're going to memcmp(), and when it is,
we're going to memcmp() the results of strxfrm().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Wed, Jan 21, 2015 at 2:30 PM, Peter Geoghegan p...@heroku.com wrote:
 Even following Robert's disabling of abbreviated keys on Windows,
 buildfarm animals hamerkop, brolga, currawong and bowerbird remain
 unhappy, with failing regression tests for collate and sometimes
 (but not always) aggregates. Some of these only use the C locale.

 I think that aggregates does not fail for brolga because it's built
 without locale_t support (not sure how to interpret MSVC configure
 output, but I think that the other animals do have such support).  So
 maybe this code being executed within btsortsupport_worker(), for the
 C locale on Windows is at issue (note that abbreviation is still
 disabled):

 tss-locale = pg_newlocale_from_collation(collid);

Yes, you seem to have rather unwisely changed around the order of the
checks in btsortsupport_worker().  I've just rewritten it heavily to
try to more clearly separate the decision about whether to do
fmgr-elision, and if so which comparator to use, from the decision
about whether to use abbreviation.  Naturally I can't promise that
this is completely right, but I hope that, if it's not, it will at
least be easier to fix.  There's no sanity in removing the
lc_collate_is_c() check from the top of the function and then trying
to remember to exclude that case individually from the multiple places
further down that count on the fact that they'll never be reached in
that case.  This function may even have a third decision to make
someday, and they can't all be intertwined.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 12:34 PM, Robert Haas robertmh...@gmail.com wrote:
 This isn't really Windows-specific.  The root of the problem is that
 when LC_COLLATE=C you were trying to use strxfrm() for the abbreviated
 key even though memcmp() is the authoritative comparator in that case.
 Exactly which platforms happened to blow up as a result of that is
 kind of beside the point.

I don't see how that could be a problem. Even if the strxfrm() blob
wasn't identical to the original string (I think it should always be,
accept maybe on MacOSX), it's still reasonable to assume that there is
equivalent behavior across the C locale and locales with collations
like en_US.UTF-8. It wasn't as if I was using strxfrm() within
bttextfastcmp_c() -- just within bttext_abbrev_convert(), to form an
abbreviated key for text that uses the C locale.

The C locale is just another locale - AFAICT, the only reason we have
treated it differently in PostgreSQL is because that tended to avoid
needing to copy and NUL-terminate, which strcoll() requires. It might
be that that doesn't work out for some reason (but not because
strxfrm() will not have behavior equivalent to memcpy() with the C
locale -- I was *not* relying on that).

That having been said, it's clearer to continue to handle each case (C
locale vs other locales) separately within the new
bttext_abbrev_convert() function, just to be consistent, but also to
avoid NUL-terminating the text strings to pass to strxfrm(), which as
you point out is an avoidable cost. So, I'm not asking you to restore
the terser uniform use of strxfrm() we had before, but, for what it's
worth, that was not the real issue. The real issue was that strxfrm()
spuriously used the wrong locale, as you said. Provided strxfrm() had
the correct locale (the C locale), then AFAICT it ought to have been
fine, regardless of whether or not it then behave exactly equivalently
to memcpy().

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 11:28 AM, Robert Haas robertmh...@gmail.com wrote:
 This seems to have broken more stuff.  My working hypothesis is that
 the culprit is here:

 /*
  * There is no special handling of the C locale here, unlike with
  * varstr_cmp().  strxfrm() is used indifferently.
  */

 As far as I can see, that's just hoping that when the locale is C,
 strxfrm() is memcpy().  If it isn't, then you will potentially get
 different results when the abbreviated keys stuff is used vs. when it
 isn't, because when it isn't, we're going to memcmp(), and when it is,
 we're going to memcmp() the results of strxfrm().

As noted also on the thread Kevin started, I've now pushed a fix for
this and am eagerly awaiting new buildfarm returns.  I wonder if this
might account for the ordering failures we were seeing on Windows
before.  We thought it was a Windows-specific issue, but I bet the
real problem was here:

if (collid != DEFAULT_COLLATION_OID)
{
if (!OidIsValid(collid))
{
/*
 * This typically means that the parser could
not resolve a
 * conflict of implicit collations, so report
it that way.
 */
ereport(ERROR,

(errcode(ERRCODE_INDETERMINATE_COLLATION),
 errmsg(could not determine
which collation to use for string comparison),
 errhint(Use the COLLATE
clause to set the collation explicitly.)));
}
#ifdef HAVE_LOCALE_T
tss-locale = pg_newlocale_from_collation(collid);
#endif
}

Before the abbreviated keys commit, this code only ran if we'd already
determined that lc_collate_is_c(collid) was false.  But that commit
made this also run when that value was true.  So if HAVE_LOCALE_T and
collid != DEFAULT_COLLATION_ID, then tss-locale was getting set.  If
it got set to something that made strxfrm() behave like memcmp(), then
all was well.  If not, then life was bad.  Now you'd hope that would
work out, but maybe not.

Also, suppose HAVE_LOCALE_T was defined but collid ==
DEFAULT_COLLATION_ID.  Then tss-locale didn't get initialized at all,
and bttext_abbrev_convert() just passed whatever stupid value it found
there to strxfrm_l().

Finally, suppose we *didn't* HAVE_LOCALE_T.   Then, the
non-abbreviated comparator knew it should use memcmp(), but the
abbreviated comparator was happy to use strxfrm() even though that
would use the current locale, but the C locale that the user
requested.

Long story short: this was broken.  It may be that when the dust
settles we can try re-enabling this on Windows.   It might work now
that this issue is (hopefully) fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 1:00 PM, Peter Geoghegan p...@heroku.com wrote:
 On Thu, Jan 22, 2015 at 9:55 AM, Robert Haas robertmh...@gmail.com wrote:
 Stay tuned for more exciting dispatches from the department of abbreviated 
 keys!

 I certainly suspected that we'd have problems with Windows, but
 nothing this bad.

This isn't really Windows-specific.  The root of the problem is that
when LC_COLLATE=C you were trying to use strxfrm() for the abbreviated
key even though memcmp() is the authoritative comparator in that case.
Exactly which platforms happened to blow up as a result of that is
kind of beside the point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:
 Long story short: this was broken.  It may be that when the dust
 settles we can try re-enabling this on Windows.   It might work now
 that this issue is (hopefully) fixed.

Uggh.  That still wasn't right; I've pushed commit
d060e07fa919e0eb681e2fa2cfbe63d6c40eb2cf to try to fix it.

Stay tuned for more exciting dispatches from the department of abbreviated keys!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 9:55 AM, Robert Haas robertmh...@gmail.com wrote:
 Stay tuned for more exciting dispatches from the department of abbreviated 
 keys!

I certainly suspected that we'd have problems with Windows, but
nothing this bad.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers