Re: [HACKERS] condition variables

2016-08-14 Thread Thomas Munro
On Sun, Aug 14, 2016 at 9:04 AM, Thomas Munro
 wrote:
> On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas  wrote:
>> [condition-variable-v1.patch]
>
> Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

I poked at this a bit... OK, a lot... and have some feedback:

1.  As above, we need to clear cvSleeping before setting the latch.

2.  The following schedule corrupts the waitlist by trying to delete
something from it that isn't in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P2:   ConditionVariableSignal: pop P1 from waitlist
P1: 
P3: ConditionVariablePrepareToSleep: push self onto waitlist
P1: ConditionVariableCancelSleep: delete self from waitlist (!)

Without P3 coming along you probably wouldn't notice because the
waitlist will be happily cleared and look valid, but P3's entry gets
lost and then it hangs forever.

One solution is to teach ConditionVariableCancelSleep to check if
we're still actually there first.  That can be done in O(1) time by
clearing nodes' next and prev pointers when deleting, so that we can
rely on that in a new proclist_contains() function.  See attached.

3.  The following schedule corrupts the waitlist by trying to insert
something into it that is already in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P1: 
P1: ConditionVariableSleep
P1: ConditionVariablePrepareToSleep: push self onto waitlist (!)

Nodes before and after self's pre-existing position can be forgotten
when self's node is pushed to the front of the list.  That can be
fixed by making ConditionVariablePrepareToSleep also check if we're
already in the list.
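
To make that concrete, here is a standalone sketch of the mechanism both
fixes rely on (illustrative names only -- the attached patch does the
equivalent on proclist nodes addressed by procno): deleting a node clears
its links, so a cheap membership test can guard both
ConditionVariableCancelSleep and ConditionVariablePrepareToSleep.

#include <stdbool.h>
#include <stddef.h>

typedef struct wait_node
{
    struct wait_node *prev;
    struct wait_node *next;
} wait_node;

typedef struct wait_list
{
    wait_node   head;               /* dummy node of a circular list */
} wait_list;

static void
waitlist_init(wait_list *list)
{
    list->head.prev = list->head.next = &list->head;
}

static void
waitlist_node_init(wait_node *node)
{
    node->prev = node->next = NULL; /* NULL links mean "not on any list" */
}

static bool
waitlist_contains(const wait_node *node)
{
    /* O(1): a deleted (or never-inserted) node has had its links cleared */
    return node->next != NULL;
}

static void
waitlist_push_head(wait_list *list, wait_node *node)
{
    if (waitlist_contains(node))
        return;                     /* already queued: schedule 3 above */
    node->prev = &list->head;
    node->next = list->head.next;
    node->prev->next = node;
    node->next->prev = node;
}

static void
waitlist_delete(wait_node *node)
{
    if (!waitlist_contains(node))
        return;                     /* already popped by a signaller: schedule 2 */
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = NULL; /* keeps the membership test O(1) */
}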

4.  The waitlist is handled LIFO-ly.  Would it be better for the first
guy to start waiting to be woken up first, like we do for locks?  The
Pthreads equivalent says that it depends on "scheduling policy".  I
don't know if it matters much, just an observation.

5.  The new proclist function you added is the first to work in terms
of PGPROC* rather than procno.  Maybe the whole interface should work
with either PGPROC pointers or procnos?  No strong view.

Please find attached a -v2 of your patch which includes suggestions
1-3 above.  Like the -v1, it applies on top of
lwlocks-in-dsm-v3.patch.  Also, I have attached a v2->v3 diff to show
just my proposed changes.

-- 
Thomas Munro
http://www.enterprisedb.com


diff.patch
Description: Binary data


condition-variable-v2.patch
Description: Binary data



Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-14 Thread Andres Freund
On 2016-08-14 21:04:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
> >> Robert Haas  writes:
> >>> I think the whole idea of a fast temporary table is that there are no
> >>> catalog entries.  If there are no catalog entries, then dependencies
> >>> are not visible.  If there ARE catalog entries, to what do they refer?
> >>> Without a pg_class entry for the table, there's no table OID upon
> >>> which to depend.
> 
> >> TBH, I think that the chances of such a design getting committed are
> >> not distinguishable from zero.  Tables have to have OIDs; there is just
> >> too much code that assumes that.  And I seriously doubt that it will
> >> work (for any large value of "work") without catalog entries.
> 
> > That seems a bit too defeatist.
> 
> Huh?  I didn't say we shouldn't work on the problem --- I just think that
> this particular approach isn't good.  Which you seemed to agree with.

I took your statement to mean that they need a pg_class entry - even if
there were a partial solution to the pg_depend problem allowing us to avoid
pg_attribute entries, that'd still not really be a solution. If that's
not what you meant, sorry - and nice that we agree ;)




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-14 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
>> Robert Haas  writes:
>>> I think the whole idea of a fast temporary table is that there are no
>>> catalog entries.  If there are no catalog entries, then dependencies
>>> are not visible.  If there ARE catalog entries, to what do they refer?
>>> Without a pg_class entry for the table, there's no table OID upon
>>> which to depend.

>> TBH, I think that the chances of such a design getting committed are
>> not distinguishable from zero.  Tables have to have OIDs; there is just
>> too much code that assumes that.  And I seriously doubt that it will
>> work (for any large value of "work") without catalog entries.

> That seems a bit too defeatist.

Huh?  I didn't say we shouldn't work on the problem --- I just think that
this particular approach isn't good.  Which you seemed to agree with.

regards, tom lane




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-14 Thread Andres Freund
On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > I think the whole idea of a fast temporary table is that there are no
> > catalog entries.  If there are no catalog entries, then dependencies
> > are not visible.  If there ARE catalog entries, to what do they refer?
> >  Without a pg_class entry for the table, there's no table OID upon
> > which to depend.
>
> TBH, I think that the chances of such a design getting committed are
> not distinguishable from zero.  Tables have to have OIDs; there is just
> too much code that assumes that.  And I seriously doubt that it will
> work (for any large value of "work") without catalog entries.

That seems a bit too defeatist. It's obviously not a small change to get
there - and I don't think the patch upthread is really attacking the
relevant problems yet - but saying that we'll never have temp tables
without pg_class/pg_depend bloat seems pretty close to just giving
up.  Having 8-byte OIDs (as explicit columns instead of magic? Or just
oid64?) and then reserving ranges for temp objects stored in local
memory seems feasible.  The pinning problem could potentially be
solved by "session lifetime" pins in pg_depend, which would prevent
dependent objects from being dropped.  Obviously that's just spitballing;
but I think the problem is too big to just give up on.

Andres




[HACKERS] Improve handling of ^C in psql / top-level transaction abort

2016-08-14 Thread Jim Nasby
I recently had a client contact me thinking that a CREATE MATERIALIZED 
VIEW had somehow managed to keep running in the background after being 
^C'd in psql. Turns out this confusion was because their monitoring 
eventually complained about the still open (but now aborted) 
transaction. The user didn't realize that ^C of a command in an open 
transaction would still leave the transaction open.


I'm wondering if there's some way to improve this. One idea is that, when 
aborting in TBLOCK_INPROGRESS, we could also do some of what 
CleanupTransaction() does while still leaving the status as TBLOCK_ABORT. 
In particular, release snapshots. Since there's no subtransaction, there's 
no possibility of running further commands. Is there a risk to that with 
portals still open?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-14 Thread Andreas Joseph Krogh
On Sunday, 7 August 2016 at 18:35:56, Tom Lane wrote:
Dean Rasheed  writes:
 > On 5 August 2016 at 21:48, Tom Lane  wrote:
 >> OK, thanks.  What shall we do about Andreas' request to back-patch this?
 >> I'm personally willing to do it, but there is the old bugaboo of "maybe
 >> it will destabilize a plan that someone is happy with".

 > My inclination would be to back-patch it because arguably it's a
 > bug-fix -- at the very least the old behaviour didn't match the docs
 > for stadistinct:

 Yeah.  I suspect that situations like this are pretty rare, or we'd have
 recognized the problem sooner.  And at least for Andreas, it'd be fixing
 a real problem.  I'll apply the back-patch unless I hear objections in
 the next couple of hours.

 > Additionally, I think that example is misleading because it's only
 > really true if there are no null values in the column. Perhaps it
 > would help to have a more explicit example to illustrate how nulls
 > affect stadistinct, for example:

 Good idea, will do.

 regards, tom lane
 
 
(Sorry for the HTML-formatting, I'm using a web-based client which isn't very 
good at converting the HTML to plain-text)
 
Just wanted to thank you for fixing this.
 
Take this query (pg-9.5):
 
explain analyze
SELECT com.entity_id AS company_id
     , sum((coalesce(log.adjusted_time, log.duration) / (3600 * 1000)::numeric)
           * coalesce(log.adjusted_hourly_rate, log.hourly_rate)) AS not_invoiced_hours_amount_adjusted
     , sum((log.duration / (3600 * 1000)::numeric) * log.hourly_rate) AS not_invoiced_hours_amount_original
     , sum(coalesce(log.adjusted_time, log.duration)) AS not_invoiced_hours_duration_adjusted
     , sum(log.duration) AS not_invoiced_hours_duration_original
FROM onp_crm_relation com
JOIN onp_crm_activity_log log ON com.entity_id = log.relation_id
JOIN onp_crm_person logfor ON logfor.onp_user_id = log.logged_for AND logfor.is_resource = FALSE
LEFT OUTER JOIN (
    onp_crm_activity act
    INNER JOIN onp_crm_project proj ON act.project_id = proj.id
) ON log.activity_id = act.id
WHERE 1 = 1
  AND log.entry_type = 'TIMESHEET_ENTRY'
  AND log.start_date IS NOT NULL
  AND log.is_chargable = TRUE
  AND log.status IN (1, 2, 3, 5)
  AND log.start_date <= '2016-06-27'
  -- AND com.entity_id = 2246068
  AND NOT EXISTS (
      SELECT *
      FROM onp_crm_calendarentry_invoice_membership cem
      JOIN onp_crm_invoice_line il ON cem.invoice_line_id = il.id
      JOIN onp_crm_invoice inv ON il.invoice_id = inv.entity_id
      WHERE cem.calendar_entry_id = log.id
        AND inv.status_key = 'INVOICE_STATUS_INVOICED'
        AND inv.sent_date <= '2016-06-27'
        AND NOT EXISTS (
            SELECT *
            FROM onp_crm_invoice creditnote
            WHERE il.invoice_id = creditnote.credit_against
              AND creditnote.status_key = 'INVOICE_STATUS_INVOICED'
              AND creditnote.sent_date <= '2016-06-27'
        )
  )
GROUP BY com.entity_id;
 
Before the fix, the plan was like this:
 
                                  QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate (cost=13874.94..13894.38 rows=972 width=32) (actual time=420188.465..420189.232 rows=462 loops=1)
   Group Key: com.entity_id
   ->  Nested Loop Left Join (cost=1710.21..13848.21 rows=972 width=32) (actual time=1004.858..420166.736 rows=3882 loops=1)
         ->  Nested Loop Anti Join (cost=1709.51..12990.02 rows=972 width=36) (actual time=1003.001..419356.749 rows=3882 loops=1)
               Join Filter: (cem.calendar_entry_id = log.id)
               Rows Removed by Join Filter: 3996874113
               ->  Hash Join (cost=1172.69..10612.53 rows=972 width=44) (actual time=29.085..333.323 rows=89431 loops=1)
                     Hash Cond: (log.relation_id = com.entity_id)
                     ->  Hash Join (cost=1063.39..10489.85 rows=976 width=40) (actual time=27.598..261.822 rows=89431 loops=1)
                           Hash Cond: (log.logged_for = logfor.onp_user_id)
                           ->  Bitmap Heap Scan on onp_crm_activity_log log (cost=976.83..10055.99 rows=90010 width=44) (actual time=27.402..223.407 rows=89486 loops=1)
                                 Recheck Cond: ((status = ANY ('{1,2,3,5}'::integer[])) AND is_chargable AND (start_date IS NOT NULL) AND ((entry_type)::text = 'TIMESHEET_ENTRY'::text))
                                 Filter: (start_date <= '2016-06-27'::date)
                                 Rows Removed by Filter: 991
                                 Heap Blocks: exact=6095
                                 ->  Bitmap Index Scan on origo_calendar_entry_status_1_idx (cost=0.00..954.33 rows=93008 width=0) (actual time=26.339..26.339 rows=101274 loops=1)
                                       Index Cond: (status = ANY ('{1,2,3,5}'::integer[]))
                           ->  Hash (cost=39.46..39.46 rows=3768 width=8) (actual time=0.148..0.148 rows=36 loops=1)
                                 Buckets: 4096  Batches: 1  Memory Usage: 34kB
                                 ->  Bitmap Heap Scan on onp_crm_person logfor (cost=3.69..39.46 rows=3768 width=8) (actual time=0.049..0.137 rows=36 loops=1)
                                       Recheck Cond: (onp_user_id IS NOT NULL)
                                       Filter: (NOT is_resource)
                                       Rows Removed by Filter: 5
                                       Heap Blocks: exact=11
                                       ->  Bitmap Index Scan on onp_crm_person_onp_id_idx (cost=0.00..2.75 rows=41 width=0)

[HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-14 Thread Tom Lane
I did a trial run following the current pgindent README procedure, and
noticed that the perltidy step left me with a pile of '.bak' files
littering the entire tree.  This seems like a pretty bad idea because
a naive "git add ." would have committed them.  It's evidently because
src/tools/pgindent/perltidyrc includes --backup-and-modify-in-place.
Is there a good reason for that, and if so what is it?

Also, is there a reason why the perltidy invocation command hasn't
been packaged into a shell script, rather than expecting the committer
to copy-and-paste a rather large string?

regards, tom lane




[HACKERS] Small patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-14 Thread Ryan Murphy
Hello Postgres Hackers!

I sent this already a few hours ago but it got blocked because I had not
yet joined the mailing list.  Trying again, sorry for any redundancy or
confusion!

This is to fix an issue that came up for me when running initdb.

At the end of a successful initdb it says:

Success. You can now start the database server using:
pg_ctl -D /some/path/to/data -l logfile start

but this command did not work for me because my data directory
contained a space.  The src/bin/initdb/initdb.c source code
already had a QUOTE_PATH constant, but it was the empty
string for the non-Windows case.

Therefore, I added quotes via the existing QUOTE_PATH constant:

pg_ctl -D '/some/path/to/data' -l logfile start
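
In other words, the change boils down to something like this (a sketch of
the QUOTE_PATH definition only -- please see the attached patch for the
exact diff):

#ifdef WIN32
#define QUOTE_PATH  "\""    /* Windows already quoted the path */
#else
#define QUOTE_PATH  "'"     /* was "": now quote the path on non-Windows too */
#endif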

Best,
Ryan


0001-initdb-for-QUOTE_PATH-non-windows.patch
Description: Binary data



Re: [HACKERS] COPY FREEZE and PD_ALL_VISIBLE

2016-08-14 Thread Jeff Janes
On Tue, Nov 3, 2015 at 6:37 AM, Simon Riggs  wrote:
> On 3 November 2015 at 15:23, Amit Kapila  wrote:
>>
>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs 
>> wrote:
>>>
>>> On 21 October 2015 at 13:31, Jeff Janes  wrote:
>>>
 Index-only scans will visit the heap for each tuple until the first
 VACUUM is done.

 The first vacuum will read the entire table, but not need to write it
 anymore.  And will create the _vm file.

 I think we really want to create _vm file as well as set PD_ALL_VISIBLE,
 but I don't know the best way to do that.  Set a flag somewhere and then
 create it in bulk at the end of the transaction?  Set it bit by bit as the
 pages are extended and initialized?
>>>
>>>
>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>
>>
>> Here, we might want to consider that setting bit in visibility map
>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>> when wal_level is less than archive.  This can lead to extra disk
>> writes which can slow down Copy Freeze, but OTOH that might
>> be acceptable.
>
>
> I'm building the map as I go, in the latest version of this patch I'm
> working on.

Hi Simon,

Is this still on your radar?  If you would like someone else to pick
it up, can you post the WIP patch you have?

Thanks,

Jeff




Re: [HACKERS] [NOVICE] numeric data type upper limit.

2016-08-14 Thread Tom Lane
Aravind Kumar  writes:
> when I do
>  select 1.0e+1001::numeric;
> I get
> invalid input syntax for type numeric: "1.0e+1001"

> Postgresql documentation tells me that numeric data type has an upper
> limit of 131072 digits before decimal point.

You can successfully do
select pow(10::numeric, 131071);
or
select ('1' || repeat('0', 131071))::numeric;
so it seems pretty arbitrary that there's such a small limit on the
allowed exponent value.  This is coming from this bit in numeric.c's
set_var_from_str():

if (exponent > NUMERIC_MAX_PRECISION ||
exponent < -NUMERIC_MAX_PRECISION)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 errmsg("invalid input syntax for type numeric: \"%s\"",
str)));

which I think is just brain fade.  We should probably change that so that
it only restricts the exponent enough to prevent integer overflow in the
weight calculations, and leave it to make_result() to decide whether the
value is out of storage range.  (I notice it's not bothering to check for
an overflow signal from strtol(), either :-(.  Also it seems like this
should be an "overflow" errcode not an "invalid syntax" one.)
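
For illustration, the shape of the fix I have in mind is roughly this (a
sketch, untested): constrain the exponent only enough to keep the
weight/scale arithmetic from overflowing an int, and let make_result()
reject values that are genuinely too large to store.

    if (exponent >= INT_MAX / 2 || exponent <= -(INT_MAX / 2))
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("value overflows numeric format")));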

Poking around for any other possible misuses of NUMERIC_MAX_PRECISION,
I came across this bit in numeric_recv():

len = (uint16) pq_getmsgint(buf, sizeof(uint16));
if (len < 0 || len > NUMERIC_MAX_PRECISION + NUMERIC_MAX_RESULT_SCALE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 errmsg("invalid length in external \"numeric\" value")));

which again represents a totally misplaced assumption that
NUMERIC_MAX_PRECISION has something to do with the max number of digits to
the left of the decimal point.  This is actually pretty bad because it
creates a dump/reload failure hazard when using binary COPY:

regression=# create table bign (f1 numeric);
CREATE TABLE
regression=# insert into bign select pow(9.9::numeric, 10);
INSERT 0 1
regression=# \copy bign to 'bign.data' (format binary);
COPY 1
regression=# \copy bign from 'bign.data' (format binary);
ERROR:  invalid length in external "numeric" value
CONTEXT:  COPY bign, line 1, column f1

I think we can probably just remove the length test there altogether:
anything that fits in uint16 is probably OK.

Because of the dump/reload hazard, I think the numeric_recv() issue
is definitely a back-patchable bug fix.  You could argue that relaxing
the E-notation limit in numeric_in() is more of a feature addition,
but considering that the documentation has stated very specifically
since 9.1 that NUMERIC accepts large values (cf commit cabf5d84b),
I'm inclined to think that's a bug fix too.

regards, tom lane




Re: [HACKERS] WIP: Covering + unique indexes.

2016-08-14 Thread Andrey Borodin
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, failed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi hackers!

I've read the patch and here is my code review.

==PURPOSE==
I've used this feature from time to time with MS SQL. From my experience, 
INCLUDE is a 'sugar on top' feature.
Some MS SQL classes do not even mention INCLUDE despite it having been there 
since 2005 (though classes omit lots of important things, so that's not much of 
an indicator).
But those who use it, use it whenever possible. For example, the system view 
with recommended indexes rarely lists one without INCLUDE columns.
So, this feature is very important from the perspective of converting MS SQL 
DBAs to PostgreSQL. This is how I see it.

==SUGGESTIONS==
0. Index build is broken. This script 
https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sql
 SEGFAULTs and may leave you in a situation where you cannot insert anything 
into the table (I think dropping the index would help, but I didn't test that).
1. I think the MS SQL syntax INCLUDE instead of INCLUDING would be better (for 
the purpose listed above).
2. An empty line was added in ruleutils.c. Is it there for a reason?
3. We now have indnatts and indnkeyatts instead of just indnatts. I think it is 
worth considering renaming indnatts to something different from the old name. 
Someone somewhere could still assume it is the number of keys.

==PERFORMANCE==
Due to suggestion number 0 I could not measure the performance of index build. 
The index crashes when there are more than 1.1 million rows in the table.
The performance test script is here: 
https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql
The test scenario is as follows:
1. Create the table, then create the index, then add data.
2. Run a query touching data in the INCLUDING columns.
This scenario is tested against:
A. A table with an index that does not contain the touched columns, just the PK.
B. An index with all columns as key columns.
C. An index with the PK as key columns and INCLUDING all other columns.

Tests were executed 5 times on an Ubuntu VM under Hyper-V (i5 2500 CPU, 16 GB of 
RAM, SSD disk).
Time to insert 10M rows:
A. AVG 110 seconds, STD 4.8
B. AVG 121 seconds, STD 2.0
C. AVG 111 seconds, STD 5.7
Inserts into the INCLUDING index are almost as fast as inserts into an index 
without the extra columns.

Time to run the SELECT query:
A. AVG 2864 ms, STD 794
B. AVG 2329 ms, STD 84
C. AVG 2293 ms, STD 58
Selects with INCLUDING columns are almost as fast as with the full index.

Index size (deterministic measure, STD = 0):
A. 317 MB
B. 509 MB
C. 399 MB
The INCLUDING index size is in the middle between the full index and the 
minimal index.

I think these numbers agree with the expectations for the feature.

==CONCLUSION==
This patch brings a useful and important feature. The index build must be 
repaired; my other points are only suggestions.



Best regards, Andrey Borodin, Octonica & Ural Federal University.

The new status of this patch is: Waiting on Author





Re: [HACKERS] WIP: Barriers

2016-08-14 Thread konstantin knizhnik
Hi Thomas,

Barriers are a really simple and convenient mechanism for process 
synchronization.
But a barrier is actually a special case of a semaphore: given a semaphore 
primitive, it is trivial to implement a barrier.
We have semaphores in Postgres, but ... they cannot be used by extensions: 
there is a fixed number of semaphores, allocated based on the maximal number of 
connections, and there is no mechanism for requesting additional semaphores.
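
To illustrate how trivial it is, here is a minimal sketch of a reusable
barrier built on plain POSIX semaphores (illustrative code only -- the sem_t
objects would have to live in shared memory, and this deliberately ignores
Postgres' own semaphore API):

#include <semaphore.h>

typedef struct Barrier
{
    int     nworkers;           /* number of participants */
    int     arrived;            /* how many have arrived in the current phase */
    sem_t   mutex;              /* protects 'arrived' */
    sem_t   phase1;             /* released when everyone has arrived */
    sem_t   phase2;             /* released when everyone has left phase 1 */
} Barrier;

static void
BarrierInitSketch(Barrier *b, int nworkers)
{
    b->nworkers = nworkers;
    b->arrived = 0;
    sem_init(&b->mutex, 1, 1);      /* pshared = 1: usable across processes */
    sem_init(&b->phase1, 1, 0);
    sem_init(&b->phase2, 1, 0);
}

static void
BarrierWaitSketch(Barrier *b)
{
    int     i;

    /* phase 1: block until all workers have arrived */
    sem_wait(&b->mutex);
    if (++b->arrived == b->nworkers)
        for (i = 0; i < b->nworkers; i++)
            sem_post(&b->phase1);
    sem_post(&b->mutex);
    sem_wait(&b->phase1);

    /* phase 2: block until everyone has left phase 1, so the barrier is reusable */
    sem_wait(&b->mutex);
    if (--b->arrived == 0)
        for (i = 0; i < b->nworkers; i++)
            sem_post(&b->phase2);
    sem_post(&b->mutex);
    sem_wait(&b->phase2);
}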


Robert has recently proposed condition variables, which are also very useful.
Right now we have spinlocks, LW-locks and latches.
From my point of view that is not enough. While creating various extensions for 
Postgres I have always felt the lack of such synchronization primitives as 
events (condition variables) and semaphores. Events and semaphores are similar, 
and it is possible to implement either of them on top of the other. But from 
the user's point of view they have different semantics and use cases, so it is 
better to provide both of them.

I wonder if we should provide a system abstraction layer (SAL) for Postgres, 
where we can isolate all system-dependent code and which will be available to 
core developers as well as to developers of extensions?  The obvious candidates 
for such a SAL are:
1. Synchronization primitives (locks, events, semaphores, mutexes, spinlocks, 
latches, barriers)
2. Shared memory
3. File access
4. Network sockets
5. Process control (fork, ...)

Certainly it requires a lot of refactoring, but it would make the Postgres code 
much more elegant and easier to read and maintain.
Also, it is not necessary to do all these changes in one step: here we do not 
need atomic transactions :)
We could start, for example, with the synchronization primitives, as a lot of 
changes are being proposed there in any case.

Parallel execution is one of the most promising approaches to improving 
Postgres performance. I do not mean just parallel execution of a single query.
Parallel vacuum, parallel index creation, parallel sort, ... And to implement 
all this stuff we definitely need convenient and efficient synchronization 
primitives.
The set of such primitives can be discussed. IMHO it should include RW-locks 
(the current LW-locks), mutexes (a spinlock plus some mechanism to wait), 
events (condition variables), semaphores and barriers (based on semaphores). 
Latches could be kept for backward compatibility or be replaced with events.
I wonder if somebody has measured how much slower latches (signal + socket) are 
than POSIX semaphores or condition variables?



On Aug 14, 2016, at 2:18 AM, Thomas Munro wrote:

> Hi hackers,
> 
> I would like to propose "barriers" for Postgres processes.  A barrier
> is a very simple mechanism for coordinating parallel computation, as
> found in many threading libraries.
> 
> First, you initialise a Barrier object somewhere in shared memory,
> most likely in the DSM segment used by parallel query, by calling
> BarrierInit(, nworkers).  Then workers can call
> BarrierWait() when they want to block until all workers arrive
> at the barrier.  When the final worker arrives, BarrierWait returns in
> all workers, releasing them to continue their work.  One arbitrary
> worker receives a different return value as a way of "electing" it to
> perform serial phases of computation.  For parallel phases of
> computation, the return value can be ignored.  For example, there may
> be preparation, merging, or post-processing phases which must be done
> by just one worker, interspersed with phases where all workers do
> something.
> 
> My use case for this is coordinating the phases of parallel hash
> joins, but I strongly suspect there are other cases.  Parallel sort
> springs to mind, which is why I wanted to post this separately and
> earlier than my larger patch series, to get feedback from people
> working on other parallel features.
> 
> A problem that I'm still grappling with is how to deal with workers
> that fail to launch.  What I'm proposing so far is based on static
> worker sets, where you can only give the number of workers at
> initialisation time, just like pthread_barrier_init.  Some other
> libraries allow for adjustable worker sets, and I'm wondering if a
> parallel leader might need to be able to adjust the barrier when it
> hears of a worker not starting.  More on that soon.
> 
> Please see the attached WIP patch.  I had an earlier version with its
> own waitlists and signalling machinery etc, but I've now rebased it to
> depend on Robert Haas's proposed condition variables, making this code
> much shorter and sweeter.  So it depends on his
> condition-variable-vX.patch[1], which in turn depends on my
> lwlocks-in-dsm-vX.patch[2] (for proclist).
> 
> When Michaël Paquier's work on naming wait points[3] lands, I plan to
> include event IDs as an extra argument to BarrierWait which will be
> passed though so as to show up in pg_stat_activity.  Then you'll be
> able to see where workers are waiting for each other!  For now I
> didn't want to tangle this up with yet