Re: [HACKERS] Audit of logout

2014-07-27 Thread Amit Kapila
On Thu, Jul 3, 2014 at 1:13 AM, Fujii Masao  wrote:
> On Wed, Jul 2, 2014 at 11:39 PM, Joe Conway  wrote:
>
> No. If we change it to PGC_SIGHUP, SHOW command does display
> the changed value after a reload. It's the same behavior as other
> PGC_SIGHUP parameters do. Attached patch just changes it to PGC_SIGHUP.
> You can test that by using the patch.

As this patch is marked as Needs Review, so I went ahead and
picked up for review, however after reading mail chain, it seems to
me that there is a general inclination to have a new category in
GucContext for this feature.  I don't see the patch implementing the
same in this thread, so I think it is better to move it to next CF
(2014-08).

Whats your opinion?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-07-27 Thread Thomas Munro
On 27 July 2014 14:31, David Rowley  wrote:
> On Sun, Jul 27, 2014 at 4:49 AM, Thomas Munro  wrote:
>>
>> Here is a new version of the patch with a single enum LockWaitPolicy
>> defined in utils/lockwaitpolicy.h.
>>
>
> That seems much cleaner
>
> A few more comments:
> You seem to have lost the comment which indicates that the values of the
> enum are important due to the code in applyLockingClause(), but I see now
> instead that you've added some assert checks in applyLockingClause(), likely
> these should use Assert() rather than StaticAssertExpr().

Here's a new version with explicit numerical values and a comment in
lockwaitpolicy.h to explain that the order is important and point to
the relevant code.  The assertions are about the relationship between
constant values known at compile time, so why would we want a runtime
assertion?  I have changed it from StaticAssertExpr to
StaticAssertStmt though.

> I've also been looking at the isolation tests and I see that you've added a
> series of tests for NOWAIT. I was wondering why you did that as that's
> really existing code, probably if you thought the tests were a bit thin
> around NOWAIT then maybe that should be a separate patch?

Since I was meddling with code that controls both NOWAIT and SKIP
LOCKED, I wanted to convince myself that I had not broken NOWAIT using
at least a basic smoke test .  I suppose by the same logic I should
also wite isolation tests for default blocking FOR UPDATE...  Ok, I've
taken nowait.spec out for now.

On the subject of isolation tests, I think skip-locked.spec is only
producing schedules that reach third of the three 'return
HeapTupleWouldBlock' statements in heap_lock_tuple.  I will follow up
with some more thorough isolation tests in the next week or so to
cover the other two, and some other scenarios and interactions with
other feature.

> In ExecLockRows(), is there a need to define the wait_policy variable now?
> It's just used once so you could probably just pass erm->waitPolicy directly
> to heap_lock_tuple().

Fixed.

> I'm a bit confused at some of the code in heap_lock_tuple(). If I'm not
> wrong then after the line that does have_tuple_lock = true; it's never
> possible for have_tuple_lock to be false, but I see you've added checks to
> ensure we only unlock if have_tuple_lock is true. I'm thinking you probably
> did this because in the goto failed situation the check is done, but I was
> thinking that was perhaps put there in case a goto jump was added before
> have_tuple_lock is set to true. I'm wondering if it would be ok just to
> replace the test with an Assert() instead, or maybe just no check.

Right, I have removed the redundant conditionals.

> Also, I'm just looking at some of the changes that you've done to function
> signatures... I see quite a few of them are now beyond 80 chars wide (see
> http://www.postgresql.org/docs/devel/static/source-format.html).

Fixed.

Best regards,
Thomas Munro
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 231dc6a..0469705 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -45,7 +45,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressioncount | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
 [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
-[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ]
+[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
 
@@ -1283,7 +1283,7 @@ FETCH { FIRST | NEXT } [ count ] {
 The locking clause has the general form
 
 
-FOR lock_strength [ OF table_name [, ...] ] [ NOWAIT ]
+FOR lock_strength [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ]
 
 
 where lock_strength can be one of
@@ -1359,11 +1359,17 @@ KEY SHARE
 

 To prevent the operation from waiting for other transactions to commit,
-use the NOWAIT option.  With NOWAIT, the statement
-reports an error, rather than waiting, if a selected row
-cannot be locked immediately.  Note that NOWAIT applies only
-to the row-level lock(s) — the required ROW SHARE
-table-level lock is still taken in the ordinary way (see
+use either the NOWAIT or SKIP LOCKED
+option.  With NOWAIT, the statement reports an error, rather
+than waiting, if a selected row cannot be locked immediately.
+With SKIP LOCKED, any selected rows that cannot be
+immediately locked are skipped.  Skipping locked rows provides an
+inconsistent view of the data, so this is not suitable for general purpose
+work, but can be used to avoid lock contention with multiple consumers
+accessing a queue-like table.  Note that NOWAIT
+and SKIP LOCKED apply only to the row-level lock(s)
+— the required ROW SHARE table-level lock is
+still taken in the ordinary way (see
 ).  You can use
 
 with the NOWAIT option first,
@@ -1386,14 +1392,14 @@ KEY

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-27 Thread Peter Geoghegan
On Sun, Jul 27, 2014 at 8:23 AM, Greg Stark  wrote:
> I haven't looked yet. Can you describe what exactly the AC_TRY_RUN is
> testing for?

It's more or less testing for a primary weight level (i.e. the first
part of the blob) that is no larger than the original characters of
the string, and has no "header bytes" or other redundancies.  It also
matches secondary and subsequently weight levels to ensure that they
match, since the two stings tested have identical case, use of
diacritics, etc (they're both lowercase ASCII-safe strings). I don't
set a locale, but that shouldn't matter. I have good reason to believe
that many strxfrm() implementations behave this way, based on the
Unicode standard, and some investigation. Still, that is something
that can be more formally verified as long as we're not trusting of
strxfrm() generally rather than just discriminating against Mac OS X
specifically. I think that the Mac OS X implementation is an anomaly
(I haven't really looked into why), and the FreeBSD one just isn't
very good. But even the FreeBSD one appears to append primary weights
(only) to the blob it returns, and so is essentially the same for my
purposes [1].

[1] http://lists.freebsd.org/pipermail/freebsd-current/2003-April/001273.html
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 pg_control corruption

2014-07-27 Thread Josh Berkus
On 07/27/2014 09:35 AM, Tom Lane wrote:
> =?utf-8?B?5p2O5rW36b6Z?=  writes:
>> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was 
>> initialized with PostgreSQL 9.4beta1,
>> [ and 9.4beta2 won't start with it ]
> 
> This is expected; you need to initdb.  Or use pg_upgrade to upgrade
> the cluster.  We had to change pg_control format post-beta1.

Thank you for testing that though!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 pg_control corruption

2014-07-27 Thread 李海龙


Understand!

Before I wrote last email, I had initialized a new db with PostgreSQL 
9.4beta2 and restored the pg_dumpall data of /export/pg94beta1_data/


Thanks

Best Regards!

at 2014-07-28 00:35 +08, Tom Lane wrote:
> =?utf-8?B?5p2O5rW36b6Z?=  writes:
>> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was
>> initialized with PostgreSQL 9.4beta1,
>> [ and 9.4beta2 won't start with it ]
> This is expected; you need to initdb.  Or use pg_upgrade to upgrade
> the cluster.  We had to change pg_control format post-beta1.
>
>   regards, tom lane
>
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 pg_control corruption

2014-07-27 Thread Tom Lane
=?utf-8?B?5p2O5rW36b6Z?=  writes:
> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was 
> initialized with PostgreSQL 9.4beta1,
> [ and 9.4beta2 won't start with it ]

This is expected; you need to initdb.  Or use pg_upgrade to upgrade
the cluster.  We had to change pg_control format post-beta1.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_loop_count() fails to ignore RELOPT_DEADREL rels

2014-07-27 Thread Tom Lane
David Rowley  writes:
> On Sun, Jul 27, 2014 at 2:35 AM, Tom Lane  wrote:
>> That patch is entirely bogus.  What you should be asking is why
>> get_loop_count is being applied to a relation that's supposedly been
>> removed from the query.

> hmm ok. After further investigation it seems that this is down to the
> EquivalenceClass still containing references to the dead rel. What seems to
> be happening is match_eclass_clauses_to_index() is calling
> generate_implied_equalities_for_column() which is generating RestrictInfo
> clauses which make reference to the dead relation.

Hm.  Maybe we need to improve the join removal code so that it cleans out
the eclasses better?  But, as you say, it's not clear why it's not failing
for the existing left-join cases if that's the problem.  You should
probably spend a bit of time to understand exactly how the difference is
arising.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] building pdfs

2014-07-27 Thread Tom Lane
Andrew Dunstan  writes:
> On 07/26/2014 06:44 PM, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> Yes, I did that and generated a PDF, but I got an enormous number of
>>> errors or warnings. See
>>>  for example.

>> If they're things like "overfull hbox" from the TeX step, they're
>> expected.

> That's rather sad. How would we find out that something has actually 
> gone wrong, short of it failing to write a PDF altogether? Searching 
> through 204,000 lines of output doesn't sound like fun.

I've always assumed that if there were something seriously wrong, it
*would* fail to generate a PDF.  It certainly does so when we hit
things like the link-crosses-a-page-boundary restriction.

> There are lots of these:
> Package Fancyhdr Warning: \fancyhead's `E' option without twoside
> option is use
> less on input line 83877.

Well, if you'd like to do the work to figure out a way to suppress
that, more power to you.

> and quite a few overfull hboxes that are more than 72pt too wide.

I can't see that we'd ever invest the effort to get rid of those.

Personally I find the PDF docs to be an anachronism: surely nobody
is printing them on dead trees any more, and for on-computer usage,
what do they offer that the HTML format doesn't?  So I'm unexcited
about making them slightly prettier.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-27 Thread Greg Stark
On Sun, Jul 27, 2014 at 8:00 AM, Peter Geoghegan  wrote:
> What may be of more interest to reviewers is the revised AC_TRY_RUN
> test program that "configure" consults.

I haven't looked yet. Can you describe what exactly the AC_TRY_RUN is
testing for?

If it's just whether the system supports strxfrm or UTF-8 at all that
might be ok. If it's detailed behaviour of the locales then that's a
problem since that could vary the platform the code is eventually run
on on compared to the one it's built on.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 pg_control corruption

2014-07-27 Thread 李海龙

Hi,dear steven && pgsql-hackers

I've encountered the similar phenonmenon with 9.4 .



1.  environment

1.1 OS version

postgres@lhl-Latitude-E5420:~$ cat /etc/issue
Ubuntu 13.10 \n \l

postgres@lhl-Latitude-E5420:~$ uname -av
Linux lhl-Latitude-E5420 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 
16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


1.2 PostgreSQL version


postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_controldata --version
pg_controldata (PostgreSQL) 9.4beta2
postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_config
BINDIR = /opt/pg94/bin
DOCDIR = /opt/pg94/share/doc/postgresql
HTMLDIR = /opt/pg94/share/doc/postgresql
INCLUDEDIR = /opt/pg94/include
PKGINCLUDEDIR = /opt/pg94/include/postgresql
INCLUDEDIR-SERVER = /opt/pg94/include/postgresql/server
LIBDIR = /opt/pg94/lib
PKGLIBDIR = /opt/pg94/lib/postgresql
LOCALEDIR = /opt/pg94/share/locale
MANDIR = /opt/pg94/share/man
SHAREDIR = /opt/pg94/share/postgresql
SYSCONFDIR = /opt/pg94/etc/postgresql
PGXS = /opt/pg94/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg94' '--with-perl' '--with-libxml' 
'--with-libxslt' '--with-ossp-uuid'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed 
-Wl,-rpath,'/opt/pg94/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lz -lreadline -lrt -lcrypt 
-ldl -lm
VERSION = PostgreSQL 9.4beta2


2.  phenonmenon

I have a PostgreSQL datadir named /export/pg94beta1_data/ which was 
initialized with PostgreSQL 9.4beta1,



postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_controldata 
/export/pg94beta1_data/
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:937
Catalog version number:   201405111
Database system identifier:   6014427290583411360
Database cluster state:   in production
pg_control last modified: 2014年07月27日 星期日 16时36分50秒
Latest checkpoint location:   0/17462890
Prior checkpoint location:0/17462828
Latest checkpoint's REDO location:0/17462890
Latest checkpoint's REDO WAL file:00010017
Latest checkpoint's TimeLineID:   1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:  0/1387
Latest checkpoint's NextOID:  0
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:715
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:2014年07月27日 星期日 16时36分50秒
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline:   0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
Current wal_level setting:minimal
Current wal_log_hints setting:off
Current max_connections setting:  100
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Size of a large-object chunk: 65793
Date/time type storage:   floating-point numbers
Float4 argument passing:  by reference
Float8 argument passing:  by reference
Data page checksum version:   307500851



but the server complained about the following when I started it with 
PostgreSQL 9.4beta2,

postgres@lhl-Latitude-E5420:~$  /opt/pg94/bin/pg_ctl -D 
/export/pg94beta1_data/ start
server starting
postgres@lhl-Latitude-E5420:~$ [2014-07-27 19:23:57.922 CST 27983 
53d4e14d.6d4f 1 0]FATAL:  database files are incompatible with server
[2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 2 0]DETAIL: The 
database cluster was initialized with PG_CONTROL_VERSION 937, but the 
server was compiled with PG_CONTROL_VERSION 942.
[2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 3 0]HINT:  It looks 
like you need to initdb.




I always think that it should not come up the PG_CONTROL_VERSION 
mismatch when the PostgreSQL version upgrade between the small version .

Is there some important differences in Postgr

Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-07-27 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > > > Of course, we handle this in 'GRANT' with 'GRANT ON ALL TABLES', so why
> > > > not 'ALTER TABLE ON ALL TABLES IN TABLESPACE '?  that does get
> > > > pretty darn verbose but is at least a bit more in-line with what we have
> > > > done before..
> > > 
> > > That's not a bad line of thought --- I doubt that verbosity is critical
> > > here.
> > 
> > Alright, sounds like this is more-or-less the concensus.  I'll see about
> > making it happen shortly.
> 
> Were you able to work on this?

Apologies, I've been gone more-or-less all of July.  I'm back now and
have time to spend on this.

> Can you be more specific on the exact grammar you're considering?  The
> proposal above,
> ALTER TABLE ON ALL TABLES IN TABLESPACE xyz
> doesn't seem very good to me.  I would think it'd be more like
> ALTER ALL TABLES IN TABLESPACE xyz
> but then if you return ALTER TABLE as a command tag that might be a bit
> strange.  Maybe
> ALTER TABLE ALL IN TABLESPACE xyz
> which AFAICS should work since ALL is already a reserved keyword.

Interesting idea.  I wonder if we might apply that to other capabilities
of ALTER TABLE too..  That is, make it a generic way to specify that all
tables in the tablespace (or schema..?) should be modified in a certain
way.  We'd have to consider which of the ALTER TABLE operations this
would work for, of course.  I don't believe it'd make sense for any of
the 'ADD/DROP COLUMN' or similar commands, but 'OWNER TO' would make
sense...

I'm not sure how much we want to work this over during beta though..  My
original thought was just to adjust the grammar and hopefully minimize
the other changes in this area, but it'd be good to have an idea about
where we want to take this, so the grammar can support the future
capabilities while not actually adding them at this time.

> Also, how would we document this?  Would we have it in the same page as
> all the ALTER TABLE variants, or would we create a separate page for
> ALTER TABLE ALL?  Keeping in mind that in the future we might want to
> allow things such as ALTER TABLE ALL IN SCHEMA xyz it might be better to
> have the selection logic documented neatly in its own little page
> instead of together with the ALTER TABLE mess which is already rather
> large.

I would think we'd simply use the ALTER TABLE page as I don't think the
documentation of this would be all that difficult to describe in a
reasonable paragraph.  A page dedicated to it feels like overkill, but
perhaps that might change as we add more.

Other thoughts on this?  We should really get any of the changes we're
doing here done soon.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-07-27 Thread David Rowley
On Sun, Jul 27, 2014 at 4:49 AM, Thomas Munro  wrote:

> Here is a new version of the patch with a single enum LockWaitPolicy
> defined in utils/lockwaitpolicy.h.
>
>
That seems much cleaner

A few more comments:
You seem to have lost the comment which indicates that the values of the
enum are important due to the code in applyLockingClause(), but I see now
instead that you've added some assert checks in applyLockingClause(),
likely these should use Assert() rather than StaticAssertExpr().

I've also been looking at the isolation tests and I see that you've added a
series of tests for NOWAIT. I was wondering why you did that as that's
really existing code, probably if you thought the tests were a bit thin
around NOWAIT then maybe that should be a separate patch?

In ExecLockRows(), is there a need to define the wait_policy variable now?
It's just used once so you could probably just pass erm->waitPolicy
directly to heap_lock_tuple().

I'm a bit confused at some of the code in heap_lock_tuple(). If I'm not
wrong then after the line that does have_tuple_lock = true; it's never
possible for have_tuple_lock to be false, but I see you've added checks to
ensure we only unlock if have_tuple_lock is true. I'm thinking you probably
did this because in the goto failed situation the check is done, but I was
thinking that was perhaps put there in case a goto jump was added before
have_tuple_lock is set to true. I'm wondering if it would be ok just to
replace the test with an Assert() instead, or maybe just no check.

Also, I'm just looking at some of the changes that you've done to function
signatures... I see quite a few of them are now beyond 80 chars wide (see
http://www.postgresql.org/docs/devel/static/source-format.html).

Regards

David Rowley


Re: [HACKERS] get_loop_count() fails to ignore RELOPT_DEADREL rels

2014-07-27 Thread David Rowley
On Sun, Jul 27, 2014 at 2:35 AM, Tom Lane  wrote:

> David Rowley  writes:
> > In order to get my patch working with an Assert enabled build I've had to
> > apply the attached patch.
>
> That patch is entirely bogus.  What you should be asking is why
> get_loop_count is being applied to a relation that's supposedly been
> removed from the query.  It should only get applied to rels that are
> required outer rels for a parameterized path, and thus certainly
> not dead.
>
>
hmm ok. After further investigation it seems that this is down to the
EquivalenceClass still containing references to the dead rel. What seems to
be happening is match_eclass_clauses_to_index() is calling
generate_implied_equalities_for_column() which is generating RestrictInfo
clauses which make reference to the dead relation.

With the existing left join removal code, in all the test cases I've thrown
at it, match_eclass_clauses_to_index() seems to early exit on if
(!index->rel->has_eclass_joins), but this is not the case when I'm removing
SEMI and ANTI joins. So likely this is something I'll have to tackle in
that patch, perhaps by stripping out the equivalence class members which
belong to the newly dead rel in remove_rel_from_query().

Apologies for the noise

Regards

David Rowley


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-27 Thread Peter Geoghegan
On Thu, Jun 12, 2014 at 2:09 PM, Peter Geoghegan  wrote:
> Thanks for looking into this.

Is anyone going to look at this?

I attach a new revision. The only real change to the code is that I
fixed an open item concerning what to do on WIN32 with UTF-8, where
the UTF-16 hacks that we do there cannot be worked around to make the
optimization still work, and yet we're still obligated to set up a
sort support state since there is a cataloged sort support function
(unfortunately I wasn't able to test it on Windows since I no longer
own a Windows machine, but it should be fine). I set up a comparison
shim within varlena.c for that case only. This is a kludge, but I
decided that it was better than preparing the sort support
infrastructure to not get a sane sort support state from all opclasses
just because of some platform-specific issue. I think that's justified
by the fact that it's very unlikely to happen again. text is a
datatype that is unusually tied to the operating system. This does
necessitate having the sort support struct record which sort operator
was originally used to lookup the sort support function, but it seems
reasonable to store that in all instances.

What may be of more interest to reviewers is the revised AC_TRY_RUN
test program that "configure" consults. While the code is unchanged,
there is now a detailed rationale for why it's reasonable to expect
that a significant amount of entropy will be concentrated in the first
8 bytes of a strxfrm() blob, with reference to how those algorithms
are more or less required to behave by the Unicode consortium. The
basic reason is that the algorithm for building binary sort keys
(described by a Unicode standard [1] defining the behavior of Unicode
collation algorithms, and implemented by strxfrm()) specifically works
by storing primary level, secondary level and tertiary level weights
in turn in the returned blob. There may be additional levels, too. As
the standard says:

"""
The primary level (L1) is for the basic sorting of the text, and the
non-primary levels (L2..Ln) are for adjusting string weights for other
linguistic elements in the writing system that are important to users
in ordering, but less important than the order of the basic sorting.
"""

Robert pointed out a case where varying character case of an English
word did not alter the primary level bytes (and thus the poor man's
normalized key was the same). He acknowledged that most of the entropy
of the first 8 bytes of the string went into the first 8 bytes of the
blob/key. This can actually be very useful to the optimization in some
cases. In particular, with most Latin alphabets you'll see the same
pattern when diacritics are used. This means that even though the
original string has (say) an accented character that would take 2
bytes to store in UTF-8, the weight in the primary level is the same
as an equivalent unaccented character (and so only takes one byte to
store at that level, with differences only in subsequent levels).
Whole strxfrm() blobs are the same length regardless of how many
accents appear in otherwise equivalent original Latin string, and you
get exactly the same high concentrations of entropy in the first 8
bytes in pretty much all Latin alphabets (the *absence* of diacritics
is stored in later weight levels too, even with the "en_US.UTF-8"
collation). As the standard says:

"""
By default, the algorithm makes use of three fully-customizable
levels. For the Latin script, these levels correspond roughly to:

alphabetic ordering
diacritic ordering
case ordering.

A final level may be used for tie-breaking between strings not
otherwise distinguished.
"""

In practice a huge majority of comparisons are expected to be resolved
at the primary weight level (in the case of a straight-up sort of
complete, all-unique strxfrm() blobs), which at least in the case of
glibc appear to require (at most) as many bytes to store as an
original UTF-8 string did. Performance of a sort using a
strcoll()-based collation could be *faster* than the "C" location with
this patch if there were plenty of Latin characters with diacritics.
The diacritics would effectively be removed from the poor man's
normalized key representations, and would only be considered when a
tie-breaker is required.

[1] http://www.unicode.org/reports/tr10/
-- 
Peter Geoghegan


poorman-hll-win32.2014_07_26.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers