Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-11 Thread Torsten Zühlsdorff



On 10.11.2015 07:23, Craig Ringer wrote:

On 10 November 2015 at 01:38, Jeff Janes  wrote:


this would be handy in conjunction with LIMIT
(which also doesn't exist for UPDATE right now).


... and, in turn, UPDATE ... ORDER BY ..., since LIMIT without ORDER
BY is usually not a great choice.

I'd quite like to see UPDATE ... ORDER BY for deadlock avoidance
anyway. Right now doing it really reliably seems to require a SELECT
... FOR UPDATE, then an UPDATE on the SELECTed tuples only. If you're
in READ COMMITTED you can't assume the UPDATE won't see new tuples
since the SELECT so you need to supply a key-list to the UPDATE
directly or via a wCTE.
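
For reference, that wCTE pattern looks roughly like this (a sketch with a
hypothetical "jobs" table and column names):

-- Lock the target rows in a deterministic order, then update only those keys.
WITH locked AS (
    SELECT id
    FROM jobs
    WHERE state = 'pending'
    ORDER BY id            -- consistent lock order avoids deadlocks
    FOR UPDATE
)
UPDATE jobs
SET state = 'processing'
FROM locked
WHERE jobs.id = locked.id;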

I'm constantly surprised that people don't seem to hit deadlocks
between updates more often. I guess the number of cases where
multi-row updates on overlapping but non-identical sets of rows occur
concurrently must be fairly limited in practice.


From my experience most databases are just too small. Their operations
finish before there could be a deadlock. Same for race conditions -
most developers don't know about them, because they never stumbled upon them.
I regularly see discussions about whether a database is already too big when
holding 10,000 records in the whole cluster...


Most of the time it is relatively predictable whether an application will hit
such a problem or not. But of course you should get it right.



Using SKIP LOCKED in a wCTE with an UPDATE is clunkier but not that
bad. So I don't think it's drastically important, but it would be
nice.
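
For example, roughly (same hypothetical queue table as above; SKIP LOCKED is
new in 9.5):

-- Grab up to 10 rows nobody else has locked and update only those,
-- skipping rows another worker is already processing.
WITH picked AS (
    SELECT id
    FROM jobs
    WHERE state = 'pending'
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET state = 'processing'
FROM picked
WHERE jobs.id = picked.id;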


This is my opinion too.

Greetings,
Torsten


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Thomas Munro
On Wed, Nov 11, 2015 at 9:42 PM, Heikki Linnakangas  wrote:

> On 11/11/2015 10:23 AM, Simon Riggs wrote:
>
>> On 11 November 2015 at 05:37, Thomas Munro wrote:
>>
>> Many sites use hot standby servers to spread read-heavy workloads over
>> more
>>
>>> hardware, or at least would like to.  This works well today if your
>>> application can tolerate some time lag on standbys.  The problem is that
>>> there is no guarantee of when a particular commit will become visible for
>>> clients connected to standbys.  The existing synchronous commit feature
>>> is
>>> no help here because it guarantees only that the WAL has been flushed on
>>> another server before commit returns.  It says nothing about whether it
>>> has
>>> been applied or whether it has been applied on the standby that you
>>> happen
>>> to be talking to.
>>>
>>
>> Thanks for working on this issue.
>>
>
> +1.
>
> 3.  Commit on the primary with "causal_reads = on" waits for all
>>> 'available' standbys either to apply the commit record, or to cease to be
>>> 'available' and begin raising the error if they are still alive (because
>>> their authorizations have expired).
>>>
>>>
>> This causes every writer to wait.
>>
>> What we want is to isolate the wait only to people performing a write-read
>> sequence, so I think it should be readers that wait. Let's have that
>> debate
>> up front before we start reviewing the patch.
>>
>
> Agreed. And in the write-read sequence, you don't need to wait at the
> write either, it's enough that you wait just before you start doing the
> read. An application might do a lot of other things between the two, so
> that in most cases, there would in fact be no waiting as the record is
> already applied when you perform the read.
>
> I'm thinking the client should get some kind of a token back from the
> commit, and it could use the token on the standby, to wait for that commit
> to be applied. The token could be just the XID, or the LSN of the commit
> record. Or the application could generate the token and pass it to the
> server in the commit, similar to how 2PC works. So the interaction would be
> something like:
>
> In master:
> BEGIN;
> INSERT INTO FOO ...;
> COMMIT;
> Server returns: COMMITted with token 1234
>
> Later, in standby:
> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
> SELECT * FROM foo;
> ...


I thought about this question, and considered three different approaches:

1.  Reader waits with exposed LSNs, as Heikki suggests.  This is what
BerkeleyDB does in "read-your-writes" mode.  It means that application
developers have the responsibility for correctly identifying transactions
with causal dependencies and dealing with LSNs (or whatever equivalent
tokens), potentially even passing them to other processes where the
transactions are causally dependent but run by multiple communicating
clients (for example, communicating microservices).  This makes it
difficult to retrofit load balancing to pre-existing applications and (like
anything involving concurrency) difficult to reason about as applications
grow in size and complexity.  It is efficient if done correctly, but it is
a tax on application complexity.
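
For concreteness, a minimal sketch of what (1) means in practice using only
functions that exist today (table, values and the token value are made up, and
the wait is shown as client-side polling since there is no built-in wait call):

-- On the primary: do the write, then capture a causality token.
BEGIN;
INSERT INTO orders (id, total) VALUES (42, 9.99);
COMMIT;
SELECT pg_current_xlog_location() AS token;   -- e.g. '0/3000060'

-- On the standby: poll until the token has been replayed, then read.
SELECT pg_last_xlog_replay_location() >= '0/3000060'::pg_lsn AS caught_up;
SELECT * FROM orders WHERE id = 42;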

2.  Reader waits for a conservatively chosen LSN.  This is roughly what
MySQL derivatives do in their "causal_reads = on" and "wsrep_sync_wait = 1"
modes.  Read transactions would start off by finding the current end of WAL
on the primary, since that must be later than any commit that already
completed, and then waiting for that to apply locally.  That means every
read transaction waits for a complete replication lag period, potentially
unnecessarily.  This is a tax on readers, paid in unnecessary waiting.

3.  Writer waits, as proposed.  In this model, there is no tax on readers
(they have zero overhead, aside from the added complexity of dealing with
the possibility of transactions being rejected when a standby falls behind
and is dropped from 'available' status; but database clients must already
deal with certain types of rare rejected queries/failures such as
deadlocks, serialization failures, server restarts etc).  This is a tax on
writers.

My thinking was that the reason for wanting to load balance over a set of
hot standbys is because you have a very read-heavy workload, so it makes
sense to tax the writers and leave the many dominant readers unburdened, so
(3) should be better than (2) for the majority of users who want such a
configuration.  (Note also that it's not a requirement to tax every write;
with this proposal you can set causal_reads to off for those transactions
where you know there is no possibility of a causally dependent read).

As for (1), my thinking was that most application developers would probably
prefer not to have to deal with that type of interface.  For users who do
want to do that, it would be comparatively simple to make that possible,
and would not conflict with this proposal.  This proposal could be used by
people 

Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Ants Aasma
On Wed, Nov 11, 2015 at 11:22 AM, Thomas Munro
 wrote:
> On Wed, Nov 11, 2015 at 9:42 PM, Heikki Linnakangas  wrote:
>> On 11/11/2015 10:23 AM, Simon Riggs wrote:
>>> Thanks for working on this issue.
>>
>> +1.

+1. I have seen a lot of interest for something along these lines.

>> I'm thinking the client should get some kind of a token back from the
>> commit, and it could use the token on the standby, to wait for that commit
>> to be applied. The token could be just the XID, or the LSN of the commit
>> record. Or the application could generate the token and pass it to the
>> server in the commit, similar to how 2PC works. So the interaction would be
>> something like:
>>
>> In master:
>> BEGIN;
>> INSERT INTO FOO ...;
>> COMMIT;
>> Server returns: COMMITted with token 1234
>>
>> Later, in standby:
>> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
>> SELECT * FROM foo;
>> ...

To avoid read anomalies (backwards timetravel) it should also be
possible to receive a token from read-only transactions based on the
latest snapshot used.
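
One conservative way to get such a read-side token with existing functions
(a sketch; the replay position read after the query is at least as new as the
snapshot the query used, so a later session that has replayed up to it cannot
observe an older state):

BEGIN;
SELECT count(*) FROM orders;                          -- the read (hypothetical table)
SELECT pg_last_xlog_replay_location() AS read_token;  -- token for later sessions
COMMIT;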

> My thinking was that the reason for wanting to load balance over a set of
> hot standbys is because you have a very read-heavy workload, so it makes
> sense to tax the writers and leave the many dominant readers unburdened, so
> (3) should be better than (2) for the majority of users who want such a
> configuration.  (Note also that it's not a requirement to tax every write;
> with this proposal you can set causal_reads to off for those transactions
> where you know there is no possibility of a causally dependent read).
>
> As for (1), my thinking was that most application developers would probably
> prefer not to have to deal with that type of interface.  For users who do
> want to do that, it would be comparatively simple to make that possible, and
> would not conflict with this proposal.  This proposal could be used by
> people retrofitting load balancing to an existing applications with relative
> ease, or simply not wanting to have to deal with LSNs and complexity.  (I
> have considered proposing pg_wait_for_xlog_replay_location(lsn, timeout)
> separately, which could be called on a standby with the lsn obtained from
> pg_current_xlog_location() on the primary any time after a COMMIT completes,
> but I was thinking of that as a different feature addressing a different
> user base: people prepared to do more work to squeeze out some extra
> performance.)

Although I still think that 1) is the correct long term solution I
must say that I agree with the reasoning presented. I think we should
review the API in the light that in the future we might have a mix of
clients, some clients that are able to keep track of causality tokens
and either want to wait when a read request arrives, or pick a host to
use based on the token, and then there are "dumb" clients that want to
use write side waits.

Also, it should be possible to configure which standbys are considered
for waiting on. Otherwise a reporting slave will occasionally catch up
enough to be considered "available" and then cause a latency peak when
a long query blocks apply again.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Atri Sharma
> I'm thinking the client should get some kind of a token back from the
commit, and it could use the token on the standby, to wait for that commit
to be applied. The token could be just the XID, or the LSN of the commit
record. Or the application could generate the token and pass it to the
server in the commit, similar to how 2PC works. So the interaction would be
something like:
>
> In master:
> BEGIN;
> INSERT INTO FOO ...;
> COMMIT;
> Server returns: COMMITted with token 1234
>
> Later, in standby:
> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
> SELECT * FROM foo;

+1.

The LSN should be good enough IMO.


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Heikki Linnakangas

On 11/11/2015 10:23 AM, Simon Riggs wrote:

On 11 November 2015 at 05:37, Thomas Munro 
wrote:

Many sites use hot standby servers to spread read-heavy workloads over more

hardware, or at least would like to.  This works well today if your
application can tolerate some time lag on standbys.  The problem is that
there is no guarantee of when a particular commit will become visible for
clients connected to standbys.  The existing synchronous commit feature is
no help here because it guarantees only that the WAL has been flushed on
another server before commit returns.  It says nothing about whether it has
been applied or whether it has been applied on the standby that you happen
to be talking to.


Thanks for working on this issue.


+1.


3.  Commit on the primary with "causal_reads = on" waits for all
'available' standbys either to apply the commit record, or to cease to be
'available' and begin raising the error if they are still alive (because
their authorizations have expired).



This causes every writer to wait.

What we want is to isolate the wait only to people performing a write-read
sequence, so I think it should be readers that wait. Let's have that debate
up front before we start reviewing the patch.


Agreed. And in the write-read sequence, you don't need to wait at the 
write either, it's enough that you wait just before you start doing the 
read. An application might do a lot of other things between the two, so 
that in most cases, there would in fact be no waiting as the record is 
already applied when you perform the read.


I'm thinking the client should get some kind of a token back from the 
commit, and it could use the token on the standby, to wait for that 
commit to be applied. The token could be just the XID, or the LSN of the 
commit record. Or the application could generate the token and pass it 
to the server in the commit, similar to how 2PC works. So the 
interaction would be something like:


In master:
BEGIN;
INSERT INTO FOO ...;
COMMIT;
Server returns: COMMITted with token 1234

Later, in standby:
BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
SELECT * FROM foo;
...

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: numeric scale functions

2015-11-11 Thread Marko Tiikkaja

Hi,

Dealing with "numeric"s right now in cases where it's really important 
that the scale is correct is quite painful.  For example, if I want to 
accept a EUR amount as an input, I often want to reject values such as 
'21.413', but I'd be fine with e.g. '21.41'.  My suggestion is to 
add two functions: one to return the number of decimal places in a 
numeric, and another one to remove non-significant decimal places. 
These two functions are useful on their own, and when combined allow you 
to look at the number of significant decimal places in a numeric.
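
For comparison, a sketch of the ad-hoc check this would replace today, using
only existing functions (numeric-to-text output preserves the scale):

SELECT length(split_part(21.413::numeric::text, '.', 2));  -- 3 decimal places: reject
SELECT length(split_part(21.41::numeric::text, '.', 2));   -- 2 decimal places: accept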


Any thoughts, objections?


.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-11-11 Thread YUriy Zhuravlev
On Friday 06 November 2015 12:55:44 you wrote:
> Omitted bounds are common in other languages and would be handy. I
> don't think they'd cause any issues with multi-dimensional arrays or
> variable start-pos arrays.
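
For reference, the current slice syntax and the omitted-bound forms under
discussion (the commented-out statements are proposed syntax, not current
behaviour):

SELECT (ARRAY[1,2,3,4,5])[2:4];     -- {2,3,4} works today
-- SELECT (ARRAY[1,2,3,4,5])[:3];   -- proposed: from the lower bound up to 3
-- SELECT (ARRAY[1,2,3,4,5])[3:];   -- proposed: from 3 up to the upper bound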

And yet, what about my patch?
The discussions about ~ and {:} seem optional.

Thanks.
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Python 3 compatibility fun

2015-11-11 Thread Tom Lane
According to
https://bugzilla.redhat.com/show_bug.cgi?id=1280404

we're failing to build against Python 3.5 because the python guys
have randomly changed some error message texts, again.

In the short run the answer must be to add some more variant
expected-files, but I wonder if we should be looking for another way.

One question worth asking is whether these specific test cases are
worth having at all ... they don't look all that useful to me.

regards, tom lane


*** 
/builddir/build/BUILD/postgresql-9.4.5/src/pl/plpython/expected/python3/plpython_error.out
  Wed Nov 11 14:44:44 2015
--- 
/builddir/build/BUILD/postgresql-9.4.5/src/pl/plpython/results/python3/plpython_error.out
   Wed Nov 11 14:44:51 2015
***
*** 245,251 
  plpy.nonexistent
  $$ LANGUAGE plpython3u;
  SELECT toplevel_attribute_error();
! ERROR:  AttributeError: 'module' object has no attribute 'nonexistent'
  CONTEXT:  Traceback (most recent call last):
PL/Python function "toplevel_attribute_error", line 2, in 
  plpy.nonexistent
--- 245,251 
  plpy.nonexistent
  $$ LANGUAGE plpython3u;
  SELECT toplevel_attribute_error();
! ERROR:  AttributeError: module 'plpy' has no attribute 'nonexistent'
  CONTEXT:  Traceback (most recent call last):
PL/Python function "toplevel_attribute_error", line 2, in 
  plpy.nonexistent
==
*** 
/builddir/build/BUILD/postgresql-9.4.5/src/pl/plpython/expected/python3/plpython_subtransaction.out
 Wed Nov 11 14:44:44 2015
--- 
/builddir/build/BUILD/postgresql-9.4.5/src/pl/plpython/results/python3/plpython_subtransaction.out
  Wed Nov 11 14:44:51 2015
***
*** 58,64 
  
  TRUNCATE subtransaction_tbl;
  SELECT subtransaction_test('Python');
! ERROR:  AttributeError: 'module' object has no attribute 'attribute_error'
  CONTEXT:  Traceback (most recent call last):
PL/Python function "subtransaction_test", line 13, in 
  plpy.attribute_error
--- 58,64 
  
  TRUNCATE subtransaction_tbl;
  SELECT subtransaction_test('Python');
! ERROR:  AttributeError: module 'plpy' has no attribute 'attribute_error'
  CONTEXT:  Traceback (most recent call last):
PL/Python function "subtransaction_test", line 13, in 
  plpy.attribute_error
***
*** 110,116 
  
  TRUNCATE subtransaction_tbl;
  SELECT subtransaction_ctx_test('Python');
! ERROR:  AttributeError: 'module' object has no attribute 'attribute_error'
  CONTEXT:  Traceback (most recent call last):
PL/Python function "subtransaction_ctx_test", line 8, in 
  plpy.attribute_error
--- 110,116 
  
  TRUNCATE subtransaction_tbl;
  SELECT subtransaction_ctx_test('Python');
! ERROR:  AttributeError: module 'plpy' has no attribute 'attribute_error'
  CONTEXT:  Traceback (most recent call last):
PL/Python function "subtransaction_ctx_test", line 8, in 
  plpy.attribute_error
==
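
For reference, the first hunk above corresponds to a regression-test function
along these lines; the same call yields the two different AttributeError texts
depending on the Python version:

CREATE FUNCTION toplevel_attribute_error() RETURNS void AS $$
plpy.nonexistent
$$ LANGUAGE plpython3u;

SELECT toplevel_attribute_error();
-- older Python 3:  AttributeError: 'module' object has no attribute 'nonexistent'
-- Python 3.5:      AttributeError: module 'plpy' has no attribute 'nonexistent'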


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Pavel Stehule
2015-11-11 19:03 GMT+01:00 Thom Brown :

> On 11 November 2015 at 17:59, Pavel Stehule 
> wrote:
> > Hi
> >
> > I have a first query
> >
> > I looked at the EXPLAIN ANALYZE output and the numbers of filtered rows are
> > different
> >
> > postgres=# set max_parallel_degree to 4;
> > SET
> > Time: 0.717 ms
> > postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> >
> ┌───┐
> > │  QUERY PLAN
> > │
> >
> ╞═══╡
> > │ Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual
> > time=142.541..142.541 rows=1 loops=1)   │
> > │   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual
> > time=0.633..130.926 rows=10 loops=1)  │
> > │ Number of Workers: 2
> > │
> > │ ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000
> > width=0) (actual time=0.052..411.303 rows=169631 loops=1) │
> > │   Filter: ((a % 10) = 0)
> > │
> > │   Rows Removed by Filter: 1526399
> > │
> > │ Planning time: 0.167 ms
> > │
> > │ Execution time: 144.519 ms
> > │
> >
> └───┘
> > (8 rows)
> >
> > Time: 145.374 ms
> > postgres=# set max_parallel_degree to 1;
> > SET
> > Time: 0.706 ms
> > postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> >
> ┌┐
> > │   QUERY PLAN
> > │
> >
> ╞╡
> > │ Aggregate  (cost=14462.50..14462.51 rows=1 width=0) (actual
> > time=163.355..163.355 rows=1 loops=1)  │
> > │   ->  Gather  (cost=1000.00..14450.00 rows=5000 width=0) (actual
> > time=0.485..152.827 rows=10 loops=1)  │
> > │ Number of Workers: 1
> > │
> > │ ->  Parallel Seq Scan on xxx  (cost=0.00..12950.00 rows=5000
> > width=0) (actual time=0.043..309.740 rows=145364 loops=1) │
> > │   Filter: ((a % 10) = 0)
> > │
> > │   Rows Removed by Filter: 1308394
> > │
> > │ Planning time: 0.129 ms
> > │
> > │ Execution time: 165.102 ms
> > │
> >
> └┘
> > (8 rows)
> >
> > Rows removed by filter: 1308394 X 1526399. Is it expected?
>
> Yeah, I noticed the same thing, but more pronounced:
>
> With set max_parallel_degree = 4:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
> QUERY PLAN
>
> -
>  Aggregate  (cost=49575.51..49575.52 rows=1 width=0) (actual
> time=744.267..744.267 rows=1 loops=1)
>Output: count(*)
>Buffers: shared hit=175423
>->  Gather  (cost=1000.00..49544.27 rows=12496 width=0) (actual
> time=0.351..731.662 rows=55151 loops=1)
>  Output: content
>  Number of Workers: 4
>  Buffers: shared hit=175423
>  ->  Parallel Seq Scan on public.js  (cost=0.00..47294.67
> rows=12496 width=0) (actual time=0.030..5912.118 rows=96062 loops=1)
>Output: content
>Filter: (js.content -> 'tags'::text) -> 0) ->>
> 'term'::text) ~~ 'design%'::text) OR js.content -> 'tags'::text)
> -> 0) ->> 'term'::text) ~~ 'web%'::text))
>Rows Removed by Filter: 2085546
>Buffers: shared hit=305123
>  Planning time: 0.123 ms
>  Execution time: 759.313 ms
> (14 rows)
>
>
> With set max_parallel_degree = 0:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>  QUERY PLAN
>
> ---
>  Aggregate  (cost=212857.25..212857.26 rows=1 width=0) (actual
> time=1235.082..1235.082 rows=1 loops=1)
>Output: count(*)
>Buffers: shared hit=175243
>->  Seq Scan on public.js  (cost=0.00..212826.01 rows=12496
> width=0) (actual time=0.019..1228.515 rows=55151 

Re: [HACKERS] [COMMITTERS] pgsql: Translation updates

2015-11-11 Thread Peter Eisentraut
On 11/10/15 12:03 PM, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
>> Translation updates
>>
>> Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
>> Source-Git-Hash: cd263526676705b4a8a3a708c9842461c4a2bcc3
> 
> Hi Peter,
> 
> Would you please document this process?

It's documented in src/tools/RELEASE_CHANGES (except I don't do the
tagging anymore).  You can try it yourself.

> (It would also be good to know how to create new branches and how to
> destroy one when it goes out of support.)

Yeah that one's a bit trickier and potentially confusing.  See recent
discussion about branch naming on the translators list.  I usually just
check the commit history for how it was done the previous time. ;-)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to install config/missing

2015-11-11 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/2/15 4:07 PM, Tom Lane wrote:
>> I wonder how much we need that script at all though.  If, say, configure
>> doesn't find bison, what's so wrong with just defining BISON=bison and
>> letting the usual shell "bison: command not found" error leak through?

> I agree.  Something like the attached patch.

I was thinking more of removing the "missing" script and associated logic
entirely, rather than making PGXS a special case.  I think we should do
our best to minimize differences between behaviors in core builds and
PGXS builds, if only because we don't test the latter very much and
might not notice problems there.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Pavel Stehule
Hi

I have a first query

I looked at the EXPLAIN ANALYZE output and the numbers of filtered rows are
different

postgres=# set max_parallel_degree to 4;
SET
Time: 0.717 ms
postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual time=142.541..142.541 rows=1 loops=1)
   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual time=0.633..130.926 rows=10 loops=1)
         Number of Workers: 2
         ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000 width=0) (actual time=0.052..411.303 rows=169631 loops=1)
               Filter: ((a % 10) = 0)
               Rows Removed by Filter: 1526399
 Planning time: 0.167 ms
 Execution time: 144.519 ms
(8 rows)

Time: 145.374 ms
postgres=# set max_parallel_degree to 1;
SET
Time: 0.706 ms
postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14462.50..14462.51 rows=1 width=0) (actual time=163.355..163.355 rows=1 loops=1)
   ->  Gather  (cost=1000.00..14450.00 rows=5000 width=0) (actual time=0.485..152.827 rows=10 loops=1)
         Number of Workers: 1
         ->  Parallel Seq Scan on xxx  (cost=0.00..12950.00 rows=5000 width=0) (actual time=0.043..309.740 rows=145364 loops=1)
               Filter: ((a % 10) = 0)
               Rows Removed by Filter: 1308394
 Planning time: 0.129 ms
 Execution time: 165.102 ms
(8 rows)

Rows removed by filter: 1308394 X 1526399. Is it expected?


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Thom Brown
On 11 November 2015 at 17:59, Pavel Stehule  wrote:
> Hi
>
> I have a first query
>
> I looked at the EXPLAIN ANALYZE output and the numbers of filtered rows are
> different
>
> postgres=# set max_parallel_degree to 4;
> SET
> Time: 0.717 ms
> postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> ┌───┐
> │  QUERY PLAN
> │
> ╞═══╡
> │ Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual
> time=142.541..142.541 rows=1 loops=1)   │
> │   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual
> time=0.633..130.926 rows=10 loops=1)  │
> │ Number of Workers: 2
> │
> │ ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000
> width=0) (actual time=0.052..411.303 rows=169631 loops=1) │
> │   Filter: ((a % 10) = 0)
> │
> │   Rows Removed by Filter: 1526399
> │
> │ Planning time: 0.167 ms
> │
> │ Execution time: 144.519 ms
> │
> └───┘
> (8 rows)
>
> Time: 145.374 ms
> postgres=# set max_parallel_degree to 1;
> SET
> Time: 0.706 ms
> postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> ┌┐
> │   QUERY PLAN
> │
> ╞╡
> │ Aggregate  (cost=14462.50..14462.51 rows=1 width=0) (actual
> time=163.355..163.355 rows=1 loops=1)  │
> │   ->  Gather  (cost=1000.00..14450.00 rows=5000 width=0) (actual
> time=0.485..152.827 rows=10 loops=1)  │
> │ Number of Workers: 1
> │
> │ ->  Parallel Seq Scan on xxx  (cost=0.00..12950.00 rows=5000
> width=0) (actual time=0.043..309.740 rows=145364 loops=1) │
> │   Filter: ((a % 10) = 0)
> │
> │   Rows Removed by Filter: 1308394
> │
> │ Planning time: 0.129 ms
> │
> │ Execution time: 165.102 ms
> │
> └┘
> (8 rows)
>
> Rows removed by filter: 1308394 X 1526399. Is it expected?

Yeah, I noticed the same thing, but more pronounced:

With set max_parallel_degree = 4:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';

QUERY PLAN
-
 Aggregate  (cost=49575.51..49575.52 rows=1 width=0) (actual
time=744.267..744.267 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=175423
   ->  Gather  (cost=1000.00..49544.27 rows=12496 width=0) (actual
time=0.351..731.662 rows=55151 loops=1)
 Output: content
 Number of Workers: 4
 Buffers: shared hit=175423
 ->  Parallel Seq Scan on public.js  (cost=0.00..47294.67
rows=12496 width=0) (actual time=0.030..5912.118 rows=96062 loops=1)
   Output: content
   Filter: (js.content -> 'tags'::text) -> 0) ->>
'term'::text) ~~ 'design%'::text) OR js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))
   Rows Removed by Filter: 2085546
   Buffers: shared hit=305123
 Planning time: 0.123 ms
 Execution time: 759.313 ms
(14 rows)


With set max_parallel_degree = 0:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';

 QUERY PLAN
---
 Aggregate  (cost=212857.25..212857.26 rows=1 width=0) (actual
time=1235.082..1235.082 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=175243
   ->  Seq Scan on public.js  (cost=0.00..212826.01 rows=12496
width=0) (actual time=0.019..1228.515 rows=55151 loops=1)
 Output: content
 Filter: (js.content -> 'tags'::text) -> 0) ->>
'term'::text) ~~ 'design%'::text) OR js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))
 Rows Removed by Filter: 1197822
 Buffers: shared hit=175243
 

Re: [HACKERS] Patch to install config/missing

2015-11-11 Thread Peter Eisentraut
On 11/2/15 4:07 PM, Tom Lane wrote:
> I wonder how much we need that script at all though.  If, say, configure
> doesn't find bison, what's so wrong with just defining BISON=bison and
> letting the usual shell "bison: command not found" error leak through?

I agree.  Something like the attached patch.

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 51f4797..5379c90 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -293,8 +293,12 @@ ifneq (@PERL@,)
 # quoted to protect pathname with spaces
 PERL		= '@PERL@'
 else
+ifdef PGXS
+PERL		= perl
+else
 PERL		= $(missing) perl
 endif
+endif
 perl_archlibexp		= @perl_archlibexp@
 perl_privlibexp		= @perl_privlibexp@
 perl_embed_ldflags	= @perl_embed_ldflags@
@@ -597,6 +601,15 @@ TAS = @TAS@
 #
 # Global targets and rules
 
+ifdef PGXS
+ifeq (,$(BISON))
+BISON = bison
+endif
+ifeq (,$(FLEX))
+FLEX = flex
+endif
+endif
+
 %.c: %.l
 ifdef FLEX
 	$(FLEX) $(if $(FLEX_NO_BACKUP),-b) $(FLEXFLAGS) -o'$@' $<

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-11-11 Thread Robert Haas
On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
 wrote:
> To test this change, I think we should update the postgres_fdw patch so as
> to add the RecheckForeignScan.
>
> Having said that, as I said previously, I don't see much value in adding the
> callback routine, to be honest.  I know KaiGai-san considers that that would
> be useful for custom joins, but I don't think that that would be useful even
> for foreign joins, because I think that in case of foreign joins, the
> practical implementation of that routine in FDWs would be to create a
> secondary plan and execute that plan by performing ExecProcNode, as my patch
> does [1].  Maybe I'm missing something, though.

I really don't see why you're fighting on this point.  Making this a
generic feature will require only a few extra lines of code for FDW
authors.  If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it.  But I see zero evidence
that this is actually the case.  From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans.  For this
purpose, we only need one, but it's no more work to enable both, so we
may as well.  If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)).  I don't think this should end up
being more than a few lines of code, although of course we should
verify that.  So no problem: postgres_fdw and any other FDWs where the
remote side is a database can easily delegate to a subplan, and
anybody who wants to do something else still can.

What is not to like about that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Python 3 compatibility fun

2015-11-11 Thread Peter Eisentraut
On 11/11/15 12:08 PM, Tom Lane wrote:
> we're failing to build against Python 3.5 because the python guys
> have randomly changed some error message texts, again.

This has already been fixed in 9.5.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Amit Langote
On Wed, Nov 11, 2015 at 11:53 PM, Robert Haas  wrote:
> For those following along at home, here's a demo:
>
> rhaas=# \timing
> Timing is on.
> rhaas=# select * from pgbench_accounts where filler like '%a%';
>  aid | bid | abalance | filler
> -+-+--+
> (0 rows)
>
> Time: 743.061 ms
> rhaas=# set max_parallel_degree = 4;
> SET
> Time: 0.270 ms
> rhaas=# select * from pgbench_accounts where filler like '%a%';
>  aid | bid | abalance | filler
> -+-+--+
> (0 rows)
>
> Time: 213.412 ms
>
> This is all pretty primitive at this point - there are still lots of
> things that need to be fixed and improved, and it applies to only the
> very simplest cases at present, but, hey, parallel query.  Check it
> out.

Yay! Great work guys!

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-11 Thread Tom Lane
I wrote:
> When I checked the behavior of 5d1ff6bd559ea8df, I must have only
> tried it for unshared catalogs.  Those are set up by
> RelationCacheInitializePhase3, which is post-authentication, so the
> message comes out and causes regression test failures as expected.

> This is kind of annoying :-(.  As noted in elog.c, it doesn't seem
> like a terribly good idea to send WARNING messages while the client
> is still in authentication mode; we can't be very sure that clients
> will react desirably.  So we can't fix it by mucking with that.

> One answer is to promote the case to an ERROR.  We could (probably) keep
> a bad initfile from becoming a permanent lockout condition by unlinking
> the initfile before reporting ERROR, but this way still seems like a
> reliability hazard that could be worse than the original problem.

After sleeping on it, the best compromise I can think of is to add an
"Assert(false)" after the WARNING report for the shared-catalogs case.
This will make the failure un-missable in any development build, while
not breaking production builds' ability to recover from corner cases
we might not've foreseen.

Of course, if you run an assert-enabled build in production, you might
possibly lose.  But that's never been recommended practice.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Robert Haas
On Mon, Nov 9, 2015 at 11:15 AM, Amit Kapila  wrote:
> Okay, I have updated the patch to make seq scan node parallel aware.
> To make that happen we need to have parallel_aware flag both in Plan
> as well as Path, so that we can pass that information from Path to Plan.
> I think the right place to copy parallel_aware info from path to
> plan is copy_path_costsize and ideally we should change the name
> of function to something like copy_generic_path_info(), but for
> now I have retained its original name as it is used in many places,
> let me know if you think we should go ahead and change the name
> of function as well.
>
> I have changed Explain as well so that it adds Parallel for Seq Scan if
> SeqScan node is parallel_aware.
>
> I have not integrated it with consider-parallel patch, so that this and
> Partial Seq Scan version of the patch can be compared without much
> difficulty.
>
> Thoughts?

I've committed most of this, except for some planner bits that I
didn't like, and after a bunch of cleanup.  Instead, I committed the
consider-parallel-v2.patch with some additional planner bits to make
up for the ones I removed from your patch.  So, now we have parallel
sequential scan!

For those following along at home, here's a demo:

rhaas=# \timing
Timing is on.
rhaas=# select * from pgbench_accounts where filler like '%a%';
 aid | bid | abalance | filler
-+-+--+
(0 rows)

Time: 743.061 ms
rhaas=# set max_parallel_degree = 4;
SET
Time: 0.270 ms
rhaas=# select * from pgbench_accounts where filler like '%a%';
 aid | bid | abalance | filler
-+-+--+
(0 rows)

Time: 213.412 ms

This is all pretty primitive at this point - there are still lots of
things that need to be fixed and improved, and it applies to only the
very simplest cases at present, but, hey, parallel query.  Check it
out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Thom Brown
On 11 November 2015 at 14:53, Robert Haas  wrote:
> On Mon, Nov 9, 2015 at 11:15 AM, Amit Kapila  wrote:
>> Okay, I have updated the patch to make seq scan node parallel aware.
>> To make that happen we need to have parallel_aware flag both in Plan
>> as well as Path, so that we can pass that information from Path to Plan.
>> I think the right place to copy parallel_aware info from path to
>> plan is copy_path_costsize and ideally we should change the name
>> of function to something like copy_generic_path_info(), but for
>> now I have retained its original name as it is used in many places,
>> let me know if you think we should go ahead and change the name
>> of function as well.
>>
>> I have changed Explain as well so that it adds Parallel for Seq Scan if
>> SeqScan node is parallel_aware.
>>
>> I have not integrated it with consider-parallel patch, so that this and
>> Partial Seq Scan version of the patch can be compared without much
>> difficulty.
>>
>> Thoughts?
>
> I've committed most of this, except for some planner bits that I
> didn't like, and after a bunch of cleanup.  Instead, I committed the
> consider-parallel-v2.patch with some additional planner bits to make
> up for the ones I removed from your patch.  So, now we have parallel
> sequential scan!
>
> For those following along at home, here's a demo:
>
> rhaas=# \timing
> Timing is on.
> rhaas=# select * from pgbench_accounts where filler like '%a%';
>  aid | bid | abalance | filler
> -+-+--+
> (0 rows)
>
> Time: 743.061 ms
> rhaas=# set max_parallel_degree = 4;
> SET
> Time: 0.270 ms
> rhaas=# select * from pgbench_accounts where filler like '%a%';
>  aid | bid | abalance | filler
> -+-+--+
> (0 rows)
>
> Time: 213.412 ms
>
> This is all pretty primitive at this point - there are still lots of
> things that need to be fixed and improved, and it applies to only the
> very simplest cases at present, but, hey, parallel query.  Check it
> out.

Congratulations to both you and Amit.  This is a significant landmark
in PostgreSQL feature development.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Pavel Stehule
2015-11-11 16:18 GMT+01:00 Thom Brown :

> On 11 November 2015 at 14:53, Robert Haas  wrote:
> > On Mon, Nov 9, 2015 at 11:15 AM, Amit Kapila 
> wrote:
> >> Okay, I have updated the patch to make seq scan node parallel aware.
> >> To make that happen we need to have parallel_aware flag both in Plan
> >> as well as Path, so that we can pass that information from Path to Plan.
> >> I think the right place to copy parallel_aware info from path to
> >> plan is copy_path_costsize and ideally we should change the name
> >> of function to something like copy_generic_path_info(), but for
> >> now I have retained its original name as it is used in many places,
> >> let me know if you think we should go ahead and change the name
> >> of function as well.
> >>
> >> I have changed Explain as well so that it adds Parallel for Seq Scan if
> >> SeqScan node is parallel_aware.
> >>
> >> I have not integrated it with consider-parallel patch, so that this and
> >> Partial Seq Scan version of the patch can be compared without much
> >> difficulty.
> >>
> >> Thoughts?
> >
> > I've committed most of this, except for some planner bits that I
> > didn't like, and after a bunch of cleanup.  Instead, I committed the
> > consider-parallel-v2.patch with some additional planner bits to make
> > up for the ones I removed from your patch.  So, now we have parallel
> > sequential scan!
> >
> > For those following along at home, here's a demo:
> >
> > rhaas=# \timing
> > Timing is on.
> > rhaas=# select * from pgbench_accounts where filler like '%a%';
> >  aid | bid | abalance | filler
> > -+-+--+
> > (0 rows)
> >
> > Time: 743.061 ms
> > rhaas=# set max_parallel_degree = 4;
> > SET
> > Time: 0.270 ms
> > rhaas=# select * from pgbench_accounts where filler like '%a%';
> >  aid | bid | abalance | filler
> > -+-+--+
> > (0 rows)
> >
> > Time: 213.412 ms
> >
> > This is all pretty primitive at this point - there are still lots of
> > things that need to be fixed and improved, and it applies to only the
> > very simplest cases at present, but, hey, parallel query.  Check it
> > out.
>
> Congratulations to both you and Amit.  This is a significant landmark
> in PostgreSQL feature development.
>

+1

Pavel


>
> Thom
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] pageinspect patch, for showing tuple data

2015-11-11 Thread Nikolay Shaplov
In a message from 28 October 2015 16:57:36, Michael Paquier wrote:
> On Sat, Oct 17, 2015 at 1:48 AM, Michael Paquier wrote:
> > On Sat, Oct 17, 2015 at 5:15 AM, Nikolay Shaplov wrote:
> >> Or it's ready to commit, and just not marked this way?
> > 
> > No, I don't think we have reached this state yet.
> > 
> >> I am going to make report based on this patch in Vienna. It would be
> >> nice to have it committed until then :)
> > 
> > This is registered in this November's CF and the current one is not
> > completely wrapped up, so I haven't rushed into looking at it.
> 
> So, I have finally been able to look at this patch in more details,
> resulting in the attached. I noticed a couple of things that should be
> addressed, mainly:
> - addition of a new routine text_to_bits to perform the reverse
> operation of bits_to_text. This was previously part of
> tuple_data_split, I think that it deserves its own function.
> - split_tuple_data should be static
> - t_bits_str should not be a text, rather a char* fetched using
> text_to_cstring(PG_GETARG_TEXT_PP(4)). This way there is no need to
> perform extra calculations with VARSIZE and VARHDRSZ
> - split_tuple_data can directly use the relation OID instead of the
> tuple descriptor of the relation
> - t_bits was leaking memory. For correctness I think that it is better
> to free it after calling split_tuple_data.
> - PG_DETOAST_DATUM_COPY allocates some memory, this was leaking as
> well in raw_attr actually. I refactored the code such as a bytea* is
> used and always freed when allocated to avoid leaks. Removing raw_attr
> actually simplified the code a bit.
> - I simplified the docs, that was largely too verbose in my opinion.
> - Instead of using VARATT_IS_1B_E and VARTAG_EXTERNAL, using
> VARATT_IS_EXTERNAL and VARATT_IS_EXTERNAL_ONDISK seems more adapted to
> me, those other ones are much more low-level and not really spread in
> the backend code.
> - Found some typos in the code, the docs and some comments. I reworked
> the error messages as well.
> - The code format was not really in line with the project guidelines.
> I fixed that as well.
> - I removed heap_page_item_attrs for now to get this patch in a
> bare-bone state. Though I would not mind if this is re-added (I
> personally don't think that's much necessary in the module
> actually...).
> - The calculation of the length of t_bits using HEAP_NATTS_MASK is
> more correct as you mentioned earlier, so I let it in its state.
> That's actually more accurate for error handling as well.
> That's everything I recall I have. How does this look?
You've completely rewritten everything ;-)

Let everything be the way you wrote it. This code is better than mine.

With one exception: I really need the heap_page_item_attrs function. I used it in
my Tuples Internals presentation
https://github.com/dhyannataraj/tuple-internals-presentation
and I am 100% sure that this function is needed for educational purposes, and
that it should be as simple as possible, so students can use it without
extra effort.
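
For context, the existing pageinspect interface these functions build on looks
like this (hypothetical table name):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Raw item headers for the first page; the patch adds functions that further
-- split the raw tuple data into per-attribute values.
SELECT lp, t_ctid, t_infomask, t_bits
FROM heap_page_items(get_raw_page('my_table', 0));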

I still think the documentation should be more verbose than your version, but
I can accept it.

Who is going to add heap_page_item_attrs to your patch, me or you?

-- 
Nikolay Shaplov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-11-11 Thread Alexander Korotkov
On Mon, Nov 9, 2015 at 8:23 PM, Pavel Stehule 
wrote:

> 2015-11-09 17:55 GMT+01:00 Alexander Korotkov :
>
>> On Mon, Nov 9, 2015 at 4:53 PM, Pavel Stehule 
>> wrote:
>>
>>> 2015-11-09 14:44 GMT+01:00 YUriy Zhuravlev :
>>>
 On Monday 09 November 2015 13:50:20 Pavel Stehule wrote:
 > New symbols increase a complexity of our code and our documentation.
 >
 > If some functionality can be implemented via functions without
 performance
 > impacts, we should not to create new operators or syntax - mainly for
 > corner use cases.
 >
 > Regards
 >
 > Pavel

 Ok we can use {:} instead [:] for zero array access.
 The function is the solution half.

>>>
>>> It isn't solution. The any syntax/behave change have to have stronger
>>> motivation. We had so talk about it 20 years ago :(
>>>
>>
>> Assuming array[~n] has a current meaning, could we give a try to new
>> syntax which doesn't have current meaning? Not yet sure what exactly it
>> could be...
>>
>
> Using this syntax can introduce compatibility issues -
> http://www.postgresql.org/docs/9.1/static/sql-createoperator.html
>

I actually meant some other syntax which doesn't introduce compatibility
issues. For instance, array{n} doesn't have meaning in current syntax
AFAICS.
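
For illustration, part of the compatibility concern with [~n] is that ~ already
parses as the integer bitwise-NOT operator, so the expression has a meaning
today:

SELECT ~2;                        -- -3
SELECT (ARRAY[1,2,3,4,5])[~2];    -- NULL: subscript -3 is simply out of bounds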

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Some questions about the array.

2015-11-11 Thread Pavel Stehule
2015-11-11 12:25 GMT+01:00 YUriy Zhuravlev :

> On Friday 06 November 2015 12:55:44 you wrote:
> > Omitted bounds are common in other languages and would be handy. I
> > don't think they'd cause any issues with multi-dimensional arrays or
> > variable start-pos arrays.
>
> And yet, what about my patch?
> The discussions about ~ and {:} seem optional.
>

In this case the syntax is the major issue. A language should not have every
possible feature in the world.

My opinion on this proposal is the same - the general benefit for users is
minimal and disputable - introducing new syntax is the wrong idea.

So -1

Pavel



>
> Thanks.
> --
> YUriy Zhuravlev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Robert Haas
On Wed, Nov 11, 2015 at 3:23 AM, Simon Riggs  wrote:
> This causes every writer to wait.
>
> What we want is to isolate the wait only to people performing a write-read
> sequence, so I think it should be readers that wait. Let's have that debate
> up front before we start reviewing the patch.

One advantage of having writers wait is that the master and its read
slaves can't ever get too far apart.  Suppose the master is generating
WAL much faster than the read slaves (or one of them) can replay it.
You might say it sucks to slow down the master to the speed the slaves
can keep up with, and that's true.  On the other hand, if the master
is allowed to run ahead, then a process that sends a read query to a
standby which has gotten far behind might have to wait minutes or
hours for it to catch up.  I think a lot of people are enabling
synchronous replication today just for the purpose of avoiding this
problem - keeping the two machines "together in time" makes the
overall system behavior a lot more predictable.
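
For reference, that existing setup is just (standby name hypothetical):

ALTER SYSTEM SET synchronous_standby_names = 'standby1';
SELECT pg_reload_conf();
-- This makes commit wait until the named standby has *flushed* the WAL, but
-- not until it has *applied* it, which is exactly the gap causal_reads targets.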

Also, if we made readers wait, wouldn't that require a network
roundtrip to the master every time a query on a reader wanted a new
snapshot?  That seems like it would be unbearably expensive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-11-11 Thread Dean Rasheed
On 11 November 2015 at 06:04, Michael Paquier  wrote:
>>> I also modified some of the CHECKFLOATVAL() checks which didn't look
>>> right to me, unless there's some odd platform-specific behaviour that
>>> I'm not aware of, functions like sin and asin should never return
>>> infinity.
>
> -   CHECKFLOATVAL(result, isinf(arg1), true);
> +   CHECKFLOATVAL(result, false, true);
> PG_RETURN_FLOAT8(result);
>
> Hm. I would let them as-is, and update your patch to do the similar checks
> in the new functions introduced. See f9ac414 from 2007 which is the result
> of the following thread:
> http://www.postgresql.org/message-id/200612271844.kbriivb18...@momjian.us
> It doesn't seem wise to be backward regarding those Inf/NaN checks.
>

The conclusion of that thread seemed to be that we ought to allow
silent underflows to 0, but any finite inputs that produce infinite
results ought to consider reporting an error.

That seems like a reasonable principle, but it's not what the code
actually does. For example, 1e-300::float8 * 1e-300::float8 generates
an error rather than silently underflowing to 0. In addition, some of
the checks appear to be backwards, for example the division functions
like float4div do the following:

CHECKFLOATVAL(result, isinf(arg1) || isinf(arg2), arg1 == 0);

whereas the result is 0 not infinity if arg2 is infinite (unless arg1
is also infinite, in which case it ought to be NaN).

In the case of functions like sin(), I can well believe that there are
some platforms for which sin(Infinity) is NaN, and others for which it
is an error, but are there really any for which the result is
infinite? If so, I'd argue that we should throw an error anyway --
sin(x) is supposed to be between -1 and 1, so I don't think allowing
an infinite result ever makes sense.

Anyway, this looks like a wider discussion than the scope of this
patch, so I'll revert those changes in this patch, and the decision
about what (if anything) should be done with those CHECKFLOATVAL
checks can be discussed separately.


>> The alternative expected outputs for test float8 need to be updated,
>> this is causing regression failures particularly on win32 where 3
>> digits are used for exponentials and where tan('NaN') actually results
>> in an ERROR. See for example the attached regressions.diffs.
>
> It would be nice to split the results specific to NaN and Infinity into
> separate queries. The test for arctangent is one where things should be
> splitted.
>

Agreed. In fact, given the platform-dependent nature of those tests,
perhaps there's not much value in them at all.


> +   if (isinf(arg1) || arg1 < -1 || arg1 > 1)
> +   ereport(ERROR,
> +
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> +errmsg("input is out of range")));
> This should check on -1.0 and 1.0?
>

Perhaps. I admit that I'm not terribly consistent when I write such
code and sometimes I just rely on implicit type promotion, and other
times I prefer to be explicit. Is there a project recommended style on
this? The current code seems to be somewhat inconsistent (e.g., dpow,
dlog1, dlog10 and setseed all have similar comparisons of doubles with
1, whereas float8_regr_syy, etc. compare against 1.0). I don't have
any particular preference, so I'll happily go with whatever other
people prefer, if there's a clear consensus.


> +   if (arg1 > 180)
> +   {
> +   /* tand(360-x) = -tand(x) */
> +arg1 = 360 - arg1;
> +   sign = -sign;
> +   }
> Picky detail: be careful of incorrect tab spaces.
>

Oops. Will fix.


> =# select acos(-1.1);
>  acos
> --
>   NaN
> (1 row)

I get an error for that, so it's clearly platform-dependent.

> =# select acosd(-1.1);
> ERROR:  22003: input is out of range
> LOCATION:  dacosd, float.c:1752
> Some results are inconsistent, it seems that we should return NaN in the
> second case instead of an error.
>

I opted to have acosd() throw the same error that acos() does on my
platform, and I tend to think an error is more useful in this case.
Perhaps if consistency is important, we should modify the existing
functions to throw an error on all platforms, rather than being
platform-dependent.


> I had as well a look at the monotony of those functions, using rough queries
> like this one, and things are looking nice. The precise values are taken
> into account and their surroundings are monotone.

Thanks for testing. I'll post an updated patch sometime soon.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-11 Thread Craig Ringer
On 11 November 2015 at 16:02, Torsten Zühlsdorff <
mailingli...@toco-domains.de> wrote:


> From my experience most databases are just too small. Their operations
> finish before there can be a deadlock. Same for race conditions - most
> developers don't know about them, because they never stumbled upon them. I
> regularly see discussions about whether a database is already too big when
> holding 10,000 records in the whole cluster...
>

Ha. Yes. So true.

I see Stack Overflow posts where somebody explains that their query takes
ages on their Huge!!1! database. Then it turns out the query takes 0.2
seconds on a 400MB table.

Huge. Right.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] proposal: numeric scale functions

2015-11-11 Thread Peter Eisentraut
On 11/11/15 5:03 AM, Marko Tiikkaja wrote:
> Dealing with "numeric"s right now in cases where it's really important
> that the scale is correct is quite painful.  For example, if I want to
> accept a EUR amount as an input, I often want to reject values such as
> '21.413', but I'd be fine with e.g. '21.41'.  My suggestion is to
> add two functions: one to return the number of decimal places in a
> numeric, and another one to remove non-significant decimal places.

I can see both of these being useful, but the terminology is not
correct.  The trailing zeroes are "significant".  Maybe the function
could be called "trim", in analogy to trimming whitespace off strings.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-11-11 Thread Ildus Kurbangaliev

On 11/09/2015 10:32 PM, Ildus Kurbangaliev wrote:



On Nov 9, 2015, at 7:56 PM, Alvaro Herrera  wrote:

Ildus Kurbangaliev wrote:


Thanks for the review. I've attached a new version of SLRU patch. I've
removed add_postfix and fixed EXEC_BACKEND case.


Thanks.

Please do not use "committs" in commit_ts.c; people didn't like the
abbreviated name without the underscore.  But then, why are we
abbreviating here?  We could keep it complete and with a space instead
of underscore, so why not use just "commit timestamp", because it's just
a string, right?

In multixact.c, is there a reason to have underscore in the strings?  We
could substitute it with a space and it'd look prettier; but really, we
could also keep those names parallel to subdirectory names by using the
already existing string parameter as name here, and not add another one.


I do not insist on particular names or casing here, but I think that identifiers
are more useful when they don't contain spaces. For example, the name will be
exposed later in other places and can be part of some longer string.



Why do we have two per-buffer loops in SimpleLruInit?  I mean, why not
add the LWLockInitialize call to the second one?


Thanks. I didn't see that.



I'm not quite up to speed on how the LWLockTranche API works -- does assigning to
tranche_name a pstrdup string work okay?  Is the pstrdup really
necessary?


I think pstrdup can be removed here.




/* Initialize our shared state struct */
diff --git a/src/backend/access/transam/slru.c 
b/src/backend/access/transam/slru.c
index 90c7cf5..868b35a 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -157,6 +157,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
if (nlsns > 0)
sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));/* 
group_lsn[] */

+   sz += MAXALIGN(nslots * sizeof(LWLockPadded)); /* lwlocks[] */
+
return BUFFERALIGN(sz) + BLCKSZ * nslots;
}


What is the "lwlocks[]" comment supposed to mean?  I don't think there's
a struct member with that name, is there?


It just means that we are allocating memory for an array of lwlocks,
I'll change it.



Uhm, actually, why do we keep buffer_locks[] at all?  This arrangement
seems pretty odd, where if I understand correctly we have one array
which is the tranche and another array which points to each item in the
tranche ...


Actually yes, that is a good idea.


Attached a new version of the patch that moves SLRU tranches and LWLocks 
to SLRU control structs.


The `buffer_locks` field now contains the LWLocks themselves, so we save some
memory here. `pstrdup` was removed in SimpleLruInit. I didn't change the names
from the previous patch yet, but I don't mind if they are changed.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3a58f1e..ea83655 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -456,7 +456,7 @@ void
 CLOGShmemInit(void)
 {
 	ClogCtl->PagePrecedes = CLOGPagePrecedes;
-	SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
+	SimpleLruInit(ClogCtl, "clog", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
   CLogControlLock, "pg_clog");
 }
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b21a313..349228d 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -478,7 +478,7 @@ CommitTsShmemInit(void)
 	bool		found;
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
-	SimpleLruInit(CommitTsCtl, "CommitTs Ctl", CommitTsShmemBuffers(), 0,
+	SimpleLruInit(CommitTsCtl, "commit_timestamp", CommitTsShmemBuffers(), 0,
   CommitTsControlLock, "pg_commit_ts");
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 7d97085..b66a2b6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1838,10 +1838,10 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-  "MultiXactOffset Ctl", NUM_MXACTOFFSET_BUFFERS, 0,
+  "multixact_offset", NUM_MXACTOFFSET_BUFFERS, 0,
   MultiXactOffsetControlLock, "pg_multixact/offsets");
 	SimpleLruInit(MultiXactMemberCtl,
-  "MultiXactMember Ctl", NUM_MXACTMEMBER_BUFFERS, 0,
+  "multixact_member", NUM_MXACTMEMBER_BUFFERS, 0,
   MultiXactMemberControlLock, "pg_multixact/members");
 
 	/* Initialize our shared state struct */
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 90c7cf5..6a5c7c3 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -152,7 +152,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += 

Re: [HACKERS] proposal: multiple psql option -c

2015-11-11 Thread Michael Paquier
On Tue, Nov 10, 2015 at 2:18 AM, Pavel Stehule  wrote:
> Hi
>
> 2015-11-05 22:23 GMT+01:00 Robert Haas :
>>
>> On Thu, Nov 5, 2015 at 3:53 PM, Catalin Iacob 
>> wrote:
>> > On Thu, Nov 5, 2015 at 5:27 PM, Robert Haas 
>> > wrote:
>> >>> I wrote some text. But needs some work of native speaker.
>> >>
>> >> It does.  It would be nice if some kind reviewer could help volunteer
>> >> to clean that up.
>> >
>> > I'll give it a go sometime next week.
>>
>> Thanks, that would be great!
>>
>> I recommend comparing the section on -c and the section on -C, and
>> probably updating the former as well as adjusting the wording of the
>> latter.  We don't want to repeat all the same details in both places,
>> but we hopefully want to give people a little clue that if they're
>> thinking about using -c, they may wish to instead consider -C.

Just catching up with this thread... Using a separate option looks
fine to me, and it's definitely better to leave -c alone due to its
transactional behavior. I guess that it is true that more than one
person got caught by the fact that -c was running all its stuff within
a single transaction, particularly when having queries that do not
like transaction blocks.

> -g was replaced by the -C option, plus some other required changes.
>
> I have no idea about a good long name. For the moment I used "multi-command".
> It can be changed freely.

Or --command-multi, or --multiple-commands, though I have a good
history here at choosing bad names.

> The name of this patch is the same (although it doesn't use "group-command"
> internally anymore), for easier orientation.

I have been looking at this patch a bit, and here are some comments:

/*
 * process slash command if one was given to -c
 */
else if (options.action == ACT_SINGLE_SLASH)
This comment needs to be updated.

+   else if (options.action == ACT_COMMAND_LINE)
+   {
+   pset.notty = true;
+
+   /* use singleline mode, doesn't need semicolon on the
end line */
+   SetVariableBool(pset.vars, "SINGLELINE");
Er, enforcing an option is not user-friendly.

+   /* Is there some unprocessed multi command? */
"Check presence of unprocessed commands"

@@ -451,7 +491,6 @@ MainLoop(FILE *source)
return successResult;
 }  /* MainLoop() */

-
 /*
This is unnecessary diff noise.

+   fprintf(stderr, _("%s: options -c/--command and
-C/--multi_command cannot be used together\n"),
+   pset.progname);
I would rather formulate that as "cannot use --opt1 and --opt2 together".

+  -C command(s)
Don't think you need the "(s)" here.

+  
+  Specifies that psql is to execute one or
+  more command strings, commands,
+  and then exit. This is useful in shell scripts. Start-up files
+  (psqlrc and ~/.psqlrc) are
+  ignored with this option.
+  
This is a copy-paste of the same paragraph for option -c.

It seems to me that the documentation should specify that when -C is
used with -1 each individual series of commands is executed within a
transaction block. As far as I understood, this command:
psql -1 -C 'SELECT 1;SELECT 2' -C 'SELECT 3;SELECT 4'
is equivalent to this:
BEGIN;
SELECT 1;
SELECT 2;
COMMIT;
BEGIN;
SELECT 3;
SELECT 4;
COMMIT;

s/commads/commands/, and the documentation needs a good brush up:
- The first paragraph is a copy of what is used for -c
- Specifying -C multiple times concatenates those series of commands
into multiple subsets, each running in its own transaction.
- Documentation should clearly mention what the interactions with -1 are.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Thomas Munro
On Thu, Nov 12, 2015 at 12:10 AM, Ants Aasma  wrote:

> On Wed, Nov 11, 2015 at 11:22 AM, Thomas Munro
>  wrote:
> > On Wed, Nov 11, 2015 at 9:42 PM, Heikki Linnakangas 
> wrote:
> >> On 11/11/2015 10:23 AM, Simon Riggs wrote:
> >>> Thanks for working on this issue.
> >>
> >> +1.
>
> +1. I have seen a lot of interest for something along these lines.
>
> >> I'm thinking the client should get some kind of a token back from the
> >> commit, and it could use the token on the standby, to wait for that
> commit
> >> to be applied. The token could be just the XID, or the LSN of the commit
> >> record. Or the application could generate the token and pass it to the
> >> server in the commit, similar to how 2PC works. So the interaction
> would be
> >> something like:
> >>
> >> In master:
> >> BEGIN;
> >> INSERT INTO FOO ...;
> >> COMMIT;
> >> Server returns: COMMITted with token 1234
> >>
> >> Later, in standby:
> >> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
> >> SELECT * FROM foo;
> >> ...
>
> To avoid read anomalies (backwards timetravel) it should also be
> possible to receive a token from read-only transactions based on the
> latest snapshot used.
>
> > My thinking was that the reason for wanting to load balance over a set of
> > hot standbys is because you have a very read-heavy workload, so it makes
> > sense to tax the writers and leave the many dominant readers unburdened,
> so
> > (3) should be better than (2) for the majority of users who want such a
> > configuration.  (Note also that it's not a requirement to tax every
> write;
> > with this proposal you can set causal_reads to off for those transactions
> > where you know there is no possibility of a causally dependent read).
> >
> > As for (1), my thinking was that most application developers would
> probably
> > prefer not to have to deal with that type of interface.  For users who do
> > want to do that, it would be comparatively simple to make that possible,
> and
> > would not conflict with this proposal.  This proposal could be used by
> > people retrofitting load balancing to an existing applications with
> relative
> > ease, or simply not wanting to have to deal with LSNs and complexity.  (I
> > have considered proposing pg_wait_for_xlog_replay_location(lsn, timeout)
> > separately, which could be called on a standby with the lsn obtained from
> > pg_current_xlog_location() on the primary any time after a COMMIT
> completes,
> > but I was thinking of that as a different feature addressing a different
> > user base: people prepared to do more work to squeeze out some extra
> > performance.)
>
> Although I still think that 1) is the correct long term solution I
> must say that I agree with the reasoning presented. I think we should
> review the API in the light that in the future we might have a mix of
> clients, some clients that are able to keep track of causality tokens
> and either want to wait when a read request arrives, or pick a host to
> use based on the token, and then there are "dumb" clients that want to
> use write side waits.
>

Exactly!

I see the causality tokens approach (thank you for that terminology) not so
much as a "long term" solution, but rather as an expert feature likely to
interest a small number of sophisticated users willing to take on more
responsibility in exchange for greater control.  We should definitely add
support for that, and I expect the patch would be fairly simple and short.
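
For illustration, a write-then-read sequence using such a token might look
roughly like this (pg_wait_for_xlog_replay_location is only the name floated
above for a proposed function, not something that exists today):

-- on the primary, once the write transaction has committed:
SELECT pg_current_xlog_location();   -- returns a token, e.g. '0/3000060'

-- on the standby, before issuing the dependent read:
SELECT pg_wait_for_xlog_replay_location('0/3000060', 2000);  -- hypothetical
SELECT * FROM foo;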

But I believe the vast majority of users would like to be able to run new
and existing plain SQL on any node and see the data they just wrote, with
graceful failure modes, and without extra conceptual load or invasive code
changes.  So I think we should cater for that mode of usage too.

Also, it should be possible to configure which standbys are considered
> for waiting on. Otherwise a reporting slave will occasionally catch up
> enough to be considered "available" and then cause a latency peak when
> a long query blocks apply again.
>

Good point.  Here's a new version which adds the GUC
causal_reads_standby_names, defaulting to '*' (but as before, the feature
is not activated until you set causal_reads_timeout).  Now you can list
standby names explicitly if you want a way to exclude certain standbys.
Also, I noticed that cascaded standbys shouldn't be available for causal
reads, so I added a check for that.
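
For example, excluding a reporting standby might then look something like this
(just a sketch; the GUC names are from this patch and the values are made up):

ALTER SYSTEM SET causal_reads_timeout = '2s';
ALTER SYSTEM SET causal_reads_standby_names = 'standby1, standby2';
SELECT pg_reload_conf();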

-- 
Thomas Munro
http://www.enterprisedb.com


causal-reads-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-11 Thread Michael Paquier
On Thu, Nov 12, 2015 at 9:06 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Thu, Nov 12, 2015 at 6:27 AM, Alvaro Herrera
>>  wrote:
>>> I think you're remembering this:
>>> http://www.postgresql.org/message-id/20150402205713.gb22...@eldon.alvh.no-ip.org
>
>> Right. Thanks. Do you think we'd still want a patch to improve that?
>
> Give it a try if you like, and see whether it seems to improve matters.
> I did not try moving material around like that in the patch I committed
> earlier today.

Hm. I am not sure we are quite at the point of hacking something. The
sticking point for such a change would be where to gather a
description of all those storage parameters, which are already divided
by type: per-table and per-index. A new section called "Storage
Parameters" in the chapter "Server Configuration", just after "Query
Planning" for example, would make sense. Then we could have a section
for index parameters, one for table parameters, and one for parameters shared
by both, like fillfactor. Then we would need to add links to
this new section in the pages of CREATE/ALTER TABLE/INDEX.

Does that make sense? The fact that we have per-index and per-table
parameters is perhaps an argument sufficient to keep things the way
they are now, though we had better add an indexterm for example for
fillfactor.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-11-11 Thread Etsuro Fujita

On 2015/11/12 2:53, Robert Haas wrote:

On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
 wrote:

To test this change, I think we should update the postgres_fdw patch so as
to add the RecheckForeignScan.

Having said that, as I said previously, I don't see much value in adding the
callback routine, to be honest.  I know KaiGai-san considers that that would
be useful for custom joins, but I don't think that that would be useful even
for foreign joins, because I think that in case of foreign joins, the
practical implementation of that routine in FDWs would be to create a
secondary plan and execute that plan by performing ExecProcNode, as my patch
does [1].  Maybe I'm missing something, though.



I really don't see why you're fighting on this point.  Making this a
generic feature will require only a few extra lines of code for FDW
authors.  If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it.  But I see zero evidence
that this is actually the case.


Really?  I think there would be no small burden on an FDW author;
when postgres_fdw delegates the subplan to the remote server, for
example, it would need to create a remote join query by looking at
tuples possibly fetched and stored in estate->es_epqTuple[], send the
query and receive the result during the callback routine.  Furthermore,
what I'm most concerned about is that that wouldn't be efficient.  So, my
question about that approach is whether FDWs would really do something like
that during the callback routine, instead of performing a secondary join
plan locally.  As I said before, I know that KaiGai-san considers that
that approach would be useful for custom joins.  But I see zero evidence
that there is a good use-case for an FDW.



From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans.  For this
purpose, we only need one, but it's no more work to enable both, so we
may as well.  If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)).  I don't think this should end up
being more than a few lines of code, although of course we should
verify that.  So no problem: postgres_fdw and any other FDWs where the
remote side is a database can easily delegate to a subplan, and
anybody who wants to do something else still can.

What is not to like about that?





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-11-11 Thread Peter Geoghegan
On Wed, Nov 4, 2015 at 6:30 PM, David Rowley
 wrote:
>> Have you thought about *just* having an int64 pg_ltostr()? Are you
>> aware of an appreciable overhead from having int32 callers just rely
>> on a promotion to int64?
>
> I'd not thought of that. It would certainly add unnecessary overhead on
> 32bit machines.
> To be honest, I don't really like the idea of making fsec_t an int64, there
> are just too many places around the code base that need to be updated. Plus
> with float datetimes, we'd need to multiply the fractional seconds by
> 10, which is based on the MAX_TIME_PRECISION setting as defined when
> HAVE_INT64_TIMESTAMP is undefined.

OK.

>> I would just use a period/full stop here:
>>
>> + * Note str is not NUL terminated, callers are responsible for NUL
>> terminating
>> + * str themselves.
>>
>
> Do you mean change the comment to "Note str is not NUL terminated. Callers
> are responsible for NUL terminating str themselves." ?

Yes.

> Yes, that's a bit ugly. How about just Assert(padding > 0) just like above?
> That gets rid of one Note:
> The optimized part is probably not that important anyway. I just mentioned
> it to try and reduce the chances of someone using it when the padding is
> always too small.

Cool.

>> Maybe it's better to just special case INT_MIN and the do an Abs()?
>> Actually, would any likely caller of pg_ltostr_zeropad() actually care
>> if INT_MIN was a case that you cannot handle, and assert against? You
>> could perhaps reasonably make it the caller's problem. Haven't made up
>> my mind if your timestamp_delta.patch is better than that; cannot
>> immediately see a problem with putting it on the caller.
>>
>
> I can make that case work the same as pg_ltoa() for pg_ltostr(), that's not
> a problem, I did that in the delta patch. With pg_ltostr_zeropad() I'd need
> to add some corner case code to prepend the correct number of zeros onto
> -2147483648. This is likely not going to look very nice, and also it's
> probably going to end up about the same number of lines of code as what's in
> the patch already. If you think it's better for performance, then I've
> already done tests to show that it's not better. I really don't think it'll
> look very nice either. I quite like my negative number handling, since it's
> actually code that would be exercised 50% of the time rather than 1/2^32 of
> the time, assuming all numbers have an equal chance of being passed.

Fair enough.

>> More generally, maybe routines like EncodeDateTime() ought now to use
>> the Abs() macro.
>
> You might be right about that. Then we can just have pg_ltostr() do unsigned
> only. The reason I didn't do that is because I was trying to minimise what I
> was changing in the patch, and I thought it was less likely to make it if I
> started adding calls to Abs() around the code base.

I am very familiar with being conflicted about changing things beyond
what is actually strictly necessary. It's still something I deal with
a lot. One case where I do think I'm right to recommend a comment is the
need to centrally document that there are many exceptions to
the 1900-based behavior of struct pg_tm. As I said, we should not
continue to inconsistently locally note the exceptions, even if your
patch doesn't make it any worse.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-11 Thread Kouhei Kaigai
> On Wed, Nov 11, 2015 at 3:29 PM, Andres Freund  wrote:
> > On 2015-11-11 14:59:33 -0500, Robert Haas wrote:
> >> I don't see this as being a particularly good idea.  The same issue
> >> exists for FDWs, and we're just living with it in that case.
> >
> > It's absolutely horrible there. I don't see why that's a justification
> > for much.  To deal with the lack of extensible copy/out/readfuncs I've
> > just had to copy the entirety of readfuncs.c into an extension. Or you
> > build replacements for those (as e.g. postgres_fdw essentially has
> > done).
> >
> >> If we do want to improve it, I'm not sure this is the way to go,
> >> either.  I think there could be other designs where we focus on making
> >> the serialization and deserialization options better, rather than
> >> letting people tack stuff onto the struct.
> >
> > Just better serialization doesn't actually help all that much. Being
> > able to conveniently access data directly, i.e. as fields in a struct,
> > makes code rather more readable. And in many cases more
> > efficient. Having to serialize internal datastructures unconditionally,
> > just so copyfuncs.c works if actually used, makes for a fair amount of
> > inefficiency (forced deserialization, even when not copying) and uglier
> > code.
> 
> Fair enough, but I'm still not very interested in having something for
> CustomScan only.
>
I agree that we have no reason why only custom-scan should be allowed to have
serialize/deserialize capability. I can implement an equivalent mechanism
for foreign-scan as well, and it is helpful for extension authors, especially
those who try to implement a remote join feature, because an FDW driver has to
keep a larger number of private fields than a plain scan does.

Also, what is the reason why we allow extensions to define a larger
structure which contains CustomPath or CustomScanState? It seems to
me that these types are not (fully) supported by the current copyfuncs,
outfuncs and readfuncs, are they?
(Although outfuncs.c supports path nodes, and thus CustomPathMethods has
a TextOut callback, there is no copy/read handler at this moment.)

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-11 Thread Kouhei Kaigai
> Robert Haas  writes:
> > On Wed, Nov 11, 2015 at 3:29 PM, Andres Freund  wrote:
> >> Just better serialization doesn't actually help all that much. Being
> >> able to conveniently access data directly, i.e. as fields in a struct,
> >> makes code rather more readable. And in many cases more
> >> efficient. Having to serialize internal datastructures unconditionally,
> >> just so copyfuncs.c works if actually used, makes for a fair amount of
> >> inefficiency (forced deserialization, even when not copying) and uglier
> >> code.
> 
> > Fair enough, but I'm still not very interested in having something for
> > CustomScan only.
> 
> I'm with Robert here.  The fact is that postgres_fdw and other FDWs are
> living just fine with the restrictions we put in to allow copyfuncs.c to
> copy ForeignScan, and there's been no sufficient justification offered
> as to why CustomScan can't play by those rules.
>
So, I'd like to propose copy/out/read callbacks for both of ForeignScan
and CustomScan.

> Yes, it would be nice to be able to use a more-tailored struct definition,
> but it's not *necessary*.  We should not contort the core system
> enormously in order to provide a bit more notational cleanliness to
> extensions.
> 
> I'd be fine with fixing this if a nice solution were to present itself,
> but so far nothing's been offered that wasn't horrid.
> 
> BTW, the "inefficiency" argument is junk, unless you provide some hard
> evidence of a case where it hurts a lot.  If we were forcing people to
> use serialized representations at execution time, it would be meaningful,
> but we're not; you can convert as needed at executor setup time.
>
Indeed, it is a "nice to have" feature. We can survive without luxury
goods but more expensive tax.

Once an extension tries to implement own join logic, it shall have
much larger number of private fields than usual scan logic.
Andres got a shock when he looked at GpuJoin code of PG-Strom before
because of its pack/unpack routine at:
  https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L112
The custom_private/fdw_private have to be safe to copyObject(), thus,
we have to pack/unpack private variables to/from a List structure.
When we design serialization/deserialization of Plan nodes, it means
ForeignScan and CustomScan has to pay double cost for this.

I never say it is a "necessary" feature, but "very nice to have".

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DESIGN] ParallelAppend

2015-11-11 Thread Kouhei Kaigai
I'm now designing the parallel feature of Append...

Here is one challenge. How do we determine whether each sub-plan
allows execution in the background worker context?

The commit f0661c4e8c44c0ec7acd4ea7c82e85b265447398 added a
'parallel_aware' flag to the Path and Plan structures.
It tells us whether the plan/path node can be executed by multiple
background worker processes concurrently, but it tells us
nothing about whether the plan/path node is safe to run in a background
worker process context at all.

From the standpoint of parallel execution, I understand there are
three types of plan/path nodes.

Type-A) It can be performed in a background worker context and
   picked up by multiple worker processes concurrently.
   (e.g.: Parallel SeqScan)
Type-B) It can be performed in a background worker context but
   cannot be picked up by multiple worker processes.
   (e.g.: a non-parallel-aware node)
Type-C) It should not be performed in a background worker context.
   (e.g.: a plan/path node with volatile functions)

The 'parallel_aware' flag allows us to distinguish type-A from the
others; however, we cannot distinguish type-B from type-C.
From the standpoint of a parallel Append, it makes sense to launch
background workers even if all the sub-plans are type-B, with no
type-A node.

Sorry for the late reply. I'd like to suggest having 'int num_workers'
in the Path and Plan nodes as a common field.
We would give this field the following three meanings:
- If num_workers > 1, it is a type-A node, so a parallel-aware
  Append node may assign more than one worker to this node.
- If num_workers == 1, it is a type-B node, so more than one
  background worker process shall never be assigned.
- If num_workers == 0, it is a type-C node, so the planner shall
  give up running this node in a background worker context.

The num_workers state would propagate to the upper node.
For example, I expect that a HashJoin node that takes a Parallel
SeqScan with num_workers == 4 as its outer input will also have
num_workers == 4, as long as the join clauses are safe to run on
the background worker side.

How about the idea?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Saturday, October 31, 2015 1:35 AM
> To: Robert Haas
> Cc: pgsql-hackers@postgresql.org; Amit Kapila; Kyotaro HORIGUCHI
> Subject: Re: [HACKERS] [DESIGN] ParallelAppend
> 
> > On Wed, Oct 28, 2015 at 3:55 PM, Kouhei Kaigai  wrote:
> > > At PGconf.EU, I could have a talk with Robert about this topic,
> > > then it became clear we have same idea.
> > >
> > >> ++
> > >> |sub-plan |   * Sub-Plan 1 ... Index Scan on p1
> > >> |index on *-> * Sub-Plan 2 ... PartialSeqScan on p2
> > >> |shared   |   * Sub-Plan 2 ... PartialSeqScan on p2
> > >> |memory   |   * Sub-Plan 2 ... PartialSeqScan on p2
> > >> +-+   * Sub-Plan 3 ... Index Scan on p3
> > >>
> > > In the above example, I put non-parallel sub-plan to use only
> > > 1 slot of the array, even though a PartialSeqScan takes 3 slots.
> > > It is a strict rule; non-parallel aware sub-plan can be picked
> > > up once.
> > > The index of sub-plan array is initialized to 0, then increased
> > > to 5 by each workers when it processes the parallel-aware Append.
> > > So, once a worker takes non-parallel sub-plan, other worker can
> > > never take the same slot again, thus, no duplicated rows will be
> > > produced by non-parallel sub-plan in the parallel aware Append.
> > > Also, this array structure will prevent too large number of
> > > workers pick up a particular parallel aware sub-plan, because
> > > PartialSeqScan occupies 3 slots; that means at most three workers
> > > can pick up this sub-plan. If 1st worker took the IndexScan on
> > > p1, and 2nd-4th worker took the PartialSeqScan on p2, then the
> > > 5th worker (if any) will pick up the IndexScan on p3 even if
> > > PartialSeqScan on p2 was not completed.
> >
> > Actually, this is not exactly what I had in mind.  I was thinking that
> > we'd have a single array whose length is equal to the number of Append
> > subplans, and each element of the array would be a count of the number
> > of workers executing that subplan.  So there wouldn't be multiple
> > entries for the same subplan, as you propose here.  To distinguish
> > between parallel-aware and non-parallel-aware plans, I plan to put a
> > Boolean flag in the plan itself.
> >
> I don't have a strong preference here. Both designs can implement the
> requirement; non-parallel sub-plans are never picked up twice, and
> parallel-aware sub-plans can be picked up multiple times.
> So, I'll start with your suggestion above.
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei 
> 
> 
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your 

Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-11 Thread Greg Stark
On Wed, Nov 11, 2015 at 6:57 PM, Gavin Flower
 wrote:
> Don't you realize that 400MB is over 4 million of the old 100Kb floppy
> disks, and even with the new big 1.44MB 3.5 " disks, you'd need about 280!!!

Don't be silly. It's only four thousand 100Kb floppies.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Amit Langote
On 2015/11/12 4:26, Robert Haas wrote:
> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule  
> wrote:
>> I have a first query
>>
>> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
>> different
> 
> Hmm, I see I was right about people finding more bugs once this was
> committed.  That didn't take long.

I encountered one more odd behavior:

postgres=# EXPLAIN ANALYZE SELECT abalance FROM pgbench_accounts WHERE aid
= 23466;
QUERY PLAN

---
 Gather  (cost=1000.00..65207.88 rows=1 width=4) (actual
time=17450.595..17451.151 rows=1 loops=1)
   Number of Workers: 4
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..64207.78 rows=1
width=4) (actual time=55.934..157001.134 rows=2 loops=1)
 Filter: (aid = 23466)
 Rows Removed by Filter: 18047484
 Planning time: 0.198 ms
 Execution time: 17453.565 ms
(7 rows)


The #rows removed here is almost twice the number of rows in the table
(10m). Also, the #rows selected shown is 2 for Parallel Seq Scan whereas
only 1 row is selected.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making tab-complete.c easier to maintain

2015-11-11 Thread Thomas Munro
New version attached, merging recent changes.

-- 
Thomas Munro
http://www.enterprisedb.com


tab-complete-v8.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DESIGN] ParallelAppend

2015-11-11 Thread Amit Langote
On 2015/11/12 14:09, Kouhei Kaigai wrote:
> I'm now designing the parallel feature of Append...
> 
> Here is one challenge. How do we determine whether each sub-plan
> allows execution in the background worker context?
> 
> The commit f0661c4e8c44c0ec7acd4ea7c82e85b265447398 added
> 'parallel_aware' flag on Path and Plan structure.
> It tells us whether the plan/path-node can perform by multiple
> background worker processes concurrently, but also tells us
> nothing whether the plan/path-node are safe to run on background
> worker process context.

When I was looking at the recent parallelism related commits, I noticed a
RelOptInfo.consider_parallel flag. That and the function
set_rel_consider_parallel() may be of interest in this regard.
set_append_rel_size() passes the parent's state of this flag down to child
relations but I guess that's not  what you're after.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-11 Thread Craig Ringer
On 3 November 2015 at 02:58, Jim Nasby  wrote:

> On 11/2/15 8:36 AM, Craig Ringer wrote:
>
>> Here's the protocol documentation discussed in the README. It's
>> asciidoc at the moment, so it can be formatted into something with
>> readable tables.
>>
>
> Is this by chance up on github? It'd be easier to read the final output
> there than the raw asciidoctor. ;)
>

HTML for the protocol documentation attached.

The docs are being converted to SGML at the moment.

I'll be pushing an update of pglogical_output soon. Patch in a followup
mail.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
<<< text/html; charset=UTF-8; name="protocol.html": Unrecognized >>>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-11-11 Thread Michael Paquier
On Thu, Nov 12, 2015 at 9:35 AM, David G. Johnston
 wrote:
> On Wed, Nov 11, 2015 at 7:01 AM, Michael Paquier 
> wrote:
>>
>> It seems to me that the documentation should specify that when -C is
>> used with -1 each individual series of commands is executed within a
>> transaction block.
>
> In summary:
>
> Default (Not Single + Auto-Commit): One Transaction per parsed statement in
> all -Cs []
> Single + Auto-Commit: One Transaction per -C [--single-transaction] {same as
> --no-auto-commit]
> Not Single + Not Auto-Commit: One Transaction per -C [--no-auto-commit]
> {same as --single-transaction}
> Single + Not Auto-Commit: One Transaction covering all -Cs [--no-auto-commit
> --single-transaction]
> Explanation:

I am assuming you refer to this command in your analysis: "#1
Statement #1" being the first statement of the first -C, "#1 Statement
#2" the second statement in the first -C switch, and "Statement Only"
the statement of of a second -C switch. Or simply that:
psql -C 'Statement1,Statement2' -C 'Statement'

> The transactional behavior of -C
> can, with defaults, be described thusly:
> BEGIN;
> -C #1 Statement #1
> COMMIT;
> BEGIN;
> -C #1 Statement #2
> COMMIT;
> BEGIN;
> -C #2 Statement Only
> COMMIT;

Yep, that's what it does by going though MainLoop().

> Basically the explicit representation of Auto-Commit "on" Mode
>
> I don't understand how -c implements the promise of:
> """
> If the command string contains multiple SQL commands, they are processed in
> a single transaction, unless there are explicit BEGIN/COMMIT commands
> included in the string to divide it into multiple transactions.
> """

-c simply processes everything it has within libpq in one shot. This
code path does not care about --single-transaction, and it relies on a
backend check to be sure that nothing not allowed within a transaction
block runs when multiple queries are passed through it.

> But my gut (and Pavel) says that this is "legacy behavior" that should not
> be carried over to -C.  I would suggest going further and disallowing
> transaction control statements within -C commands.
>
> Now, in the presence of "--single-transaction" we would convert the
> transactional behavior from that shown above to:
>
> BEGIN;
> -C #1 Statement #1
> -C #1 Statement #2
> COMMIT; -- auto-committed;
> BEGIN;
> -C #2
> COMMIT;

Correct.

> Additionally, if the variable AUTOCOMMIT is "off" then the implicit script
> should look like:
>
> BEGIN;
> -C #1 Statement #1
> -C #2 Statement #2
> -C #2
> COMMIT;
> So a "true" single transaction requires setting AUTOCOMMIT to off otherwise
> you only get each -C singly.

Yeah, that's what -c does actually by relying on the backend to check
incompatibilities regarding stuff that should not run in transaction
blocks.

> I would suggest adding an action "--no-auto-commit" option to complete the
> existence of the "--single-transaction" option.  While the variable method
> works it doesn't feel as clean now that we are adding this option that (can)
> make direct use of it.
>
> Specifying only --no-auto-commit results in:
> BEGIN;
> -C #1 Statement #1
> -C #1 Statement #2
> COMMIT;
> BEGIN;
> -C #2
> COMMIT;
> Which is redundant with specifying only "--single-transaction".  Each -C
> still commits otherwise you would just use the default.

Don't you mean that actually:
BEGIN;
-C #1 Statement #1
-C #1 Statement #2
-C #2
COMMIT;
By the way, there is not much point to this option. It seems to me that
it is already possible to do it with --set AUTOCOMMIT=off.
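
That is, something along these lines (a sketch, assuming -C goes through the
normal startup variable handling):

psql --set AUTOCOMMIT=off -C 'cmd1;cmd2' -C 'cmd3;cmd4'
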
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-11-11 Thread Kyotaro HORIGUCHI
Hello,


> > I really don't see why you're fighting on this point.  Making this a
> > generic feature will require only a few extra lines of code for FDW
> > authors.  If this were going to cause some great inconvenience for FDW
> > authors, then I'd agree it isn't worth it.  But I see zero evidence
> > that this is actually the case.
> 
> Really?  I think there would be not a little burden on an FDW author;
> when postgres_fdw delegates to the subplan to the remote server, for
> example, it would need to create a remote join query by looking at
> tuples possibly fetched and stored in estate->es_epqTuple[], send the
> query and receive the result during the callback routine.

Are you concerned that an FDW cannot generate a plan that makes a tuple
from the epqTuples and then applies the fdw_quals using predefined executor
nodes?

The returned tuple itself can be stored in fdw_private, as I think
KaiGai-san said before. So it is enough if we can fabricate a
Result node whose outerPlan is the ForeignScan, which somehow
returns the tuple to examine.

I may be missing something, though.

regards,

> Furthermore, what I'm most concerned about is that wouldn't be
> efficient.  So, my question about that approach is whether FDWs really
> do some thing like that during the callback routine, instead of
> performing a secondary join plan locally.  As I said before, I know
> that KaiGai-san considers that that approach would be useful for
> custom joins.  But I see zero evidence that there is a good use-case
> for an FDW.
> 
> > From my point of view I'm now
> > thinking this solution has two parts:
> >
> > (1) Let foreign scans have inner and outer subplans.  For this
> > purpose, we only need one, but it's no more work to enable both, so we
> > may as well.  If we had some reason, we could add a list of subplans
> > of arbitrary length, but there doesn't seem to be an urgent need for
> > that.
> >
> > (2) Add a recheck callback.
> >
> > If the foreign data wrapper wants to adopt the solution you're
> > proposing, the recheck callback can call
> > ExecProcNode(outerPlanState(node)).  I don't think this should end up
> > being more than a few lines of code, although of course we should
> > verify that.  So no problem: postgres_fdw and any other FDWs where the
> > remote side is a database can easily delegate to a subplan, and
> > anybody who wants to do something else still can.
> >
> > What is not to like about that?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect patch, for showing tuple data

2015-11-11 Thread Michael Paquier
On Thu, Nov 12, 2015 at 9:04 AM, Michael Paquier wrote:
> On Thu, Nov 12, 2015 at 12:41 AM, Nikolay Shaplov wrote:
> I am still of the opinion that the documentation should be more verbose than your
>> version, but I can accept your version.
>
> I am not sure that's necessary, pageinspect is for developers.
>
>> Who is going to add heap_page_item_attrs to your patch? me or you?
>
> I recall some parts of the code I still did not like much :) I'll grab
> some room to have an extra look at it.

I have added back heap_page_item_attrs in the attached patch, fixing at
the same time some typos found while looking again at the code. If you
are fine with this version, let's have a committer look at it.
-- 
Michael
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index aec5258..91ab119 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -5,9 +5,9 @@ OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \
 		  brinfuncs.o ginfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
-DATA = pageinspect--1.3.sql pageinspect--1.2--1.3.sql \
-	pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \
-	pageinspect--unpackaged--1.0.sql
+DATA = pageinspect--1.4.sql pageinspect--1.3--1.4.sql \
+	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
+	pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
 ifdef USE_PGXS
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 8d1666c..fdbb66c 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -27,8 +27,11 @@
 
 #include "access/htup_details.h"
 #include "funcapi.h"
-#include "utils/builtins.h"
+#include "catalog/pg_type.h"
 #include "miscadmin.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
 
 
 /*
@@ -55,6 +58,42 @@ bits_to_text(bits8 *bits, int len)
 
 
 /*
+ * text_to_bits
+ *
+ * Converts a c-string representation of bits into a bits8-array. This is
+ * the reverse operation of previous routine.
+ */
+static bits8 *
+text_to_bits(char *str, int len)
+{
+	bits8	   *bits;
+	int			off = 0;
+	char		byte = 0;
+
+	bits = palloc(len + 1);
+
+	while (off < len)
+	{
+		if (off % 8 == 0)
+			byte = 0;
+
+		if ((str[off] == '0') || (str[off] == '1'))
+			byte = byte | ((str[off] - '0') << off % 8);
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DATA_CORRUPTED),
+	 errmsg("illegal character '%c' in t_bits string", str[off])));
+
+		if (off % 8 == 7)
+			bits[off / 8] = byte;
+
+		off++;
+	}
+
+	return bits;
+}
+
+/*
  * heap_page_items
  *
  * Allows inspection of line pointers and tuple headers of a heap page.
@@ -122,8 +161,8 @@ heap_page_items(PG_FUNCTION_ARGS)
 		HeapTuple	resultTuple;
 		Datum		result;
 		ItemId		id;
-		Datum		values[13];
-		bool		nulls[13];
+		Datum		values[14];
+		bool		nulls[14];
 		uint16		lp_offset;
 		uint16		lp_flags;
 		uint16		lp_len;
@@ -154,7 +193,8 @@ heap_page_items(PG_FUNCTION_ARGS)
 			lp_offset + lp_len <= raw_page_size)
 		{
 			HeapTupleHeader tuphdr;
-			int			bits_len;
+			bytea		*tuple_data_bytea;
+			int			tuple_data_len;
 
 			/* Extract information from the tuple header */
 
@@ -168,6 +208,14 @@ heap_page_items(PG_FUNCTION_ARGS)
 			values[9] = UInt32GetDatum(tuphdr->t_infomask);
 			values[10] = UInt8GetDatum(tuphdr->t_hoff);
 
+			/* Copy raw tuple data into bytea attribute */
+			tuple_data_len = lp_len - tuphdr->t_hoff;
+			tuple_data_bytea = (bytea *) palloc(tuple_data_len + VARHDRSZ);
+			SET_VARSIZE(tuple_data_bytea, tuple_data_len + VARHDRSZ);
+			memcpy(VARDATA(tuple_data_bytea), (char *) tuphdr + tuphdr->t_hoff,
+	 tuple_data_len);
+			values[13] = PointerGetDatum(tuple_data_bytea);
+
 			/*
 			 * We already checked that the item is completely within the raw
 			 * page passed to us, with the length given in the line pointer.
@@ -180,11 +228,11 @@ heap_page_items(PG_FUNCTION_ARGS)
 			{
 if (tuphdr->t_infomask & HEAP_HASNULL)
 {
-	bits_len = tuphdr->t_hoff -
-		offsetof(HeapTupleHeaderData, t_bits);
+	int	bits_len =
+			((tuphdr->t_infomask2 & HEAP_NATTS_MASK) / 8 + 1) * 8;
 
 	values[11] = CStringGetTextDatum(
- bits_to_text(tuphdr->t_bits, bits_len * 8));
+ bits_to_text(tuphdr->t_bits, bits_len));
 }
 else
 	nulls[11] = true;
@@ -208,7 +256,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 			 */
 			int			i;
 
-			for (i = 4; i <= 12; i++)
+			for (i = 4; i <= 13; i++)
 nulls[i] = true;
 		}
 
@@ -223,3 +271,205 @@ heap_page_items(PG_FUNCTION_ARGS)
 	else
 		SRF_RETURN_DONE(fctx);
 }
+
+/*
+ * split_tuple_data
+ *
+ * Split raw tuple data taken directly from a page into an array of bytea
+ * elements. This routine does a lookup on NULL values and creates array
+ * elements accordingly. This is a reimplementation of nocachegetattr()
+ * in heaptuple.c simplified for educational purposes.
+ */
+static Datum
+split_tuple_data(Oid relid, char *tupdata,
+ 

Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-11-11 Thread Kouhei Kaigai




> -Original Message-
> From: Etsuro Fujita [mailto:fujita.ets...@lab.ntt.co.jp]
> Sent: Thursday, November 12, 2015 2:54 PM
> To: Robert Haas
> Cc: Kaigai Kouhei(海外 浩平); Tom Lane; Kyotaro HORIGUCHI;
> pgsql-hackers@postgresql.org; Shigeru Hanada
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> 
> On 2015/11/12 2:53, Robert Haas wrote:
> > On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
> >  wrote:
> >> To test this change, I think we should update the postgres_fdw patch so as
> >> to add the RecheckForeignScan.
> >>
> >> Having said that, as I said previously, I don't see much value in adding 
> >> the
> >> callback routine, to be honest.  I know KaiGai-san considers that that 
> >> would
> >> be useful for custom joins, but I don't think that that would be useful 
> >> even
> >> for foreign joins, because I think that in case of foreign joins, the
> >> practical implementation of that routine in FDWs would be to create a
> >> secondary plan and execute that plan by performing ExecProcNode, as my 
> >> patch
> >> does [1].  Maybe I'm missing something, though.
> 
> > I really don't see why you're fighting on this point.  Making this a
> > generic feature will require only a few extra lines of code for FDW
> > authors.  If this were going to cause some great inconvenience for FDW
> > authors, then I'd agree it isn't worth it.  But I see zero evidence
> > that this is actually the case.
> 
> Really?  I think there would be not a little burden on an FDW author;
> when postgres_fdw delegates to the subplan to the remote server, for
> example, it would need to create a remote join query by looking at
> tuples possibly fetched and stored in estate->es_epqTuple[], send the
> query and receive the result during the callback routine.
>
I cannot understand why that is the only solution.
Our assumption is that the FDW driver knows the best way to do this. So you can
take the best approach for your FDW driver - including what you want to
implement in the built-in feature.

> Furthermore,
> what I'm most concerned about is that wouldn't be efficient. So, my
>
You have to add "because ..." sentence here because I and Robert
think a little inefficiency is not a problem. If you try to
persuade other parsons who have different opinion, you need to
introduce WHY you have different conclusion. (Of course, we might
oversight something)
Please don't start the sentence from "I think ...". We all knows
your opinion, but what I've wanted to see is "the reason why my
approach is valuable is ...".

I am not suggesting anything technically difficult here; it is
a communication problem.

> question about that approach is whether FDWs really do some thing like
> that during the callback routine, instead of performing a secondary join
> plan locally.
>
Nobody prohibits postgres_fdw from performing a secondary join here.
All you need to do is pick up the sub-plan tree from the FDW's private
field and then call ExecProcNode() inside the callback.

> As I said before, I know that KaiGai-san considers that
> that approach would be useful for custom joins.  But I see zero evidence
> that there is a good use-case for an FDW.
> 
> > From my point of view I'm now
> > thinking this solution has two parts:
> >
> > (1) Let foreign scans have inner and outer subplans.  For this
> > purpose, we only need one, but it's no more work to enable both, so we
> > may as well.  If we had some reason, we could add a list of subplans
> > of arbitrary length, but there doesn't seem to be an urgent need for
> > that.
> >
> > (2) Add a recheck callback.
> >
> > If the foreign data wrapper wants to adopt the solution you're
> > proposing, the recheck callback can call
> > ExecProcNode(outerPlanState(node)).  I don't think this should end up
> > being more than a few lines of code, although of course we should
> > verify that.  So no problem: postgres_fdw and any other FDWs where the
> > remote side is a database can easily delegate to a subplan, and
> > anybody who wants to do something else still can.
> >
> > What is not to like about that?
> >

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-11-11 Thread Pavel Stehule
2015-11-12 1:35 GMT+01:00 David G. Johnston :

> On Wed, Nov 11, 2015 at 7:01 AM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> It seems to me that the documentation should specify that when -C is
>> used with -1 each individual series of commands is executed within a
>> transaction block.
>
>
> In summary:
>
> Default (Not Single + Auto-Commit): One Transaction per parsed statement
> in all -Cs []
> Single + Auto-Commit: One Transaction per -C [--single-transaction] {same
> as --no-auto-commit]
> Not Single + Not Auto-Commit: One Transaction per -C [--no-auto-commit]
> {same as --single-transaction}
> Single + Not Auto-Commit: One Transaction covering all -Cs
> [--no-auto-commit --single-transaction]
>
> Explanation:
>
> The transactional behavior of -C
> can, with defaults, be described thusly:
>
>
> BEGIN;
> -C #1 Statement #1
> COMMIT;
> BEGIN;
> -C #1 Statement #2
> COMMIT;
> BEGIN;
> -C #2 Statement Only
> COMMIT;
>
> Basically the explicit representation of Auto-Commit "on" Mode
>
> I don't understand how -c implements the promise of:
> """
> If the command string contains multiple SQL commands, they are processed
> in a single transaction, unless there are explicit BEGIN/COMMIT commands
> included in the string to divide it into multiple transactions.
> """
> But my gut (and Pavel) says that this is "legacy behavior" that should not
> be carried over to -C.  I would suggest going further and disallowing
> transaction control statements within -C commands.
>

This is relatively difficult to implement - and from my view, it isn't
necessary.

The implementation of "-c" is relative simple and then the options
"--single-transaction" or active autocommit has not effect. The string with
commands is pushed to server in one packet and it is processed as one
multicommand on server side. The implementation of "-C" is much more close
to interactive work - by default it is working in autocommit on mode and
following statements will be executed:

psql -C "cmd1;cmd2" -C "cmd3;cmd4"

executed statements:
cmd1;
cmd2;
cmd3;
cmd4;

or, with the implicit transactions spelled out:

BEGIN; cmd1; COMMIT;
BEGIN; cmd2; COMMIT;
BEGIN; cmd3; COMMIT;
BEGIN; cmd4; COMMIT;

when I use "--single-transaction", then the sequence of commands looks like:

BEGIN;
cmd1;
cmd2;
cmd3;
cmd4;
COMMIT;

I wouldn't to attach --single-transaction" option with individual "-C"
option, because the I feeling "--single-transaction" as global option.
More, partial transactions can be simply ensured by explicit transactions.
So I would to allow BEGIN,COMMIT in "-C" statements:

if I allow 'psql -C "BEGIN; cmd1; cmd2; COMMIT" -C "BEGIN; cmd3;cmd4;
COMMIT"

I am not a big fan of implicit transaction mechanisms, and I prefer a
simple implementation of "-C" with minimal design differences
from interactive work. This design looks simple.

The autocommit-off mode is somewhat different, and I hadn't thought about
it. It requires an explicit COMMIT (if it is to make sense).

so if I run 'psql -C "cmd1;cmd2" -C"cmd3;cmd4"' in autocommit off mode,
then the result will be

BEGIN
cmd1;
cmd2;
cmd3;
cmd4;
<-- missing transaction end --- effectively a ROLLBACK -- it can be good for
some "dry run" work.

but this mode can allow

psql -C "cmd1;cmd2;COMMIT" -C "cmd3;cmd4; COMMIT"

It looks a little bit obscure, but why not.

Using autocommit off and "--single-transaction" together is equivalent to
"--single-transaction" - but only in this case.

BEGIN; BEGIN; COMMIT; COMMIT isn't an error

Regards

Pavel


>
> Now, in the presence of "--single-transaction" we would convert the
> transactional behavior from that shown above to:
>
> BEGIN;
> -C #1 Statement #1
> -C #1 Statement #2
> COMMIT; -- auto-committed;
> BEGIN;
> -C #2
> COMMIT;
>
> Additionally, if the variable AUTOCOMMIT is "off" then the implicit script
> should look like:
>
> BEGIN;
> -C #1 Statement #1
> -C #2 Statement #2
> -C #2
> COMMIT;
>
> So a "true" single transaction requires setting AUTOCOMMIT to off
> otherwise you only get each -C singly.
>
> I would suggest adding an action "--no-auto-commit" option to complete the
> existence of the "--single-transaction" option.  While the variable method
> works it doesn't feel as clean now that we are adding this option that
> (can) make direct use of it.
>
> Specifying only --no-auto-commit results in:
> BEGIN;
> -C #1 Statement #1
> -C #1 Statement #2
> COMMIT;
> BEGIN;
> -C #2
> COMMIT;
>
> Which is redundant with specifying only "--single-transaction".  Each -C
> still commits otherwise you would just use the default.
>
> David J.
>
>
>


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Thom Brown
On 11 November 2015 at 19:51, Thom Brown  wrote:
> On 11 November 2015 at 19:26, Robert Haas  wrote:
>> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule  
>> wrote:
>>> I have a first query
>>>
>>> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
>>> different
>>
>> Hmm, I see I was right about people finding more bugs once this was
>> committed.  That didn't take long.
>>
>> There's supposed to be code to handle this - see the
>> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
>> evidently a few bricks shy of a load.
>> ExecParallelReportInstrumentation is supposed to transfer the counts
>> from each worker to the DSM:
>>
>> ps_instrument = &instrumentation->ps_instrument[i];
>> SpinLockAcquire(&ps_instrument->mutex);
>> InstrAggNode(&ps_instrument->instr, planstate->instrument);
>> SpinLockRelease(&ps_instrument->mutex);
>>
>> And ExecParallelRetrieveInstrumentation is supposed to slurp those
>> counts back into the leader's PlanState objects:
>>
>> /* No need to acquire the spinlock here; workers have exited 
>> already. */
>> ps_instrument = &instrumentation->ps_instrument[i];
>> InstrAggNode(planstate->instrument, &ps_instrument->instr);
>>
>> This might be a race condition, or it might be just wrong logic.
>> Could you test what happens if you insert something like a 1-second
>> sleep in ExecParallelFinish just after the call to
>> WaitForParallelWorkersToFinish()?  If that makes the results
>> consistent, this is a race.  If it doesn't, something else is wrong:
>> then it would be useful to know whether the workers are actually
>> calling ExecParallelReportInstrumentation, and whether the leader is
>> actually calling ExecParallelRetrieveInstrumentation, and if so
>> whether they are doing it for the correct set of nodes.
>
> Hmm.. I made the change, but clearly it's not sleeping properly with
> my change (I'm expecting a total runtime in excess of 1 second):
>
> max_parallel_degree = 4:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
> QUERY PLAN
> -
>  Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual
> time=797.518..797.518 rows=1 loops=1)
>Output: count(*)
>Buffers: shared hit=174883 read=540
>->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual
> time=0.245..784.959 rows=55151 loops=1)
>  Output: content
>  Number of Workers: 4
>  Buffers: shared hit=174883 read=540
>  ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93
> rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
>Output: content
>                Filter: (((((js.content -> 'tags'::text) -> 0) ->>
> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
> -> 0) ->> 'term'::text) ~~ 'web%'::text))
>Rows Removed by Filter: 2051330
>Buffers: shared hit=299224 read=907
>  Planning time: 0.086 ms
>  Execution time: 803.026 ms
>
>
> max_parallel_degree = 0:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>  QUERY PLAN
> ---
>  Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual
> time=1278.717..1278.717 rows=1 loops=1)
>Output: count(*)
>Buffers: shared hit=174671 read=572
>->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500
> width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
>  Output: content
>          Filter: (((((js.content -> 'tags'::text) -> 0) ->>
> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
> -> 0) ->> 'term'::text) ~~ 'web%'::text))
>  Rows Removed by Filter: 1197822
>  Buffers: shared hit=174671 read=572
>  Planning time: 0.064 ms
>  Execution time: 1278.741 ms
> (10 rows)
>
> Time: 1279.145 ms
>
>
> I did, however, notice that repeated runs of the query with
> max_parallel_degree = 4 yield different counts of rows removed by
> filter:
>
> Run 1: 2051330
> Run 2: 2081252
> Run 3: 2065112
> Run 4: 2022045
> Run 5: 2025384
> Run 6: 2059360
> Run 7: 2079620
> Run 8: 2058541

Here's another oddity, with max_parallel_degree = 1:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->>'title' like '%design%';
                             QUERY PLAN

Re: [HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-11 Thread Michael Paquier
On Thu, Nov 12, 2015 at 6:27 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>
>> I recall that we had some talks about grouping all the relopts into a
>> single documentation section, perhaps not having one is at the origin
>> of the confusion?
>
> I think you're remembering this:
> http://www.postgresql.org/message-id/20150402205713.gb22...@eldon.alvh.no-ip.org

Right. Thanks. Do you think we'd still want a patch to improve that?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect patch, for showing tuple data

2015-11-11 Thread Michael Paquier
On Thu, Nov 12, 2015 at 12:41 AM, Nikolay Shaplov
 wrote:
> In a message of 28 October 2015 16:57:36, Michael Paquier wrote:
>> On Sat, Oct 17, 2015 at 1:48 AM, Michael Paquier wrote:
>> > On Sat, Oct 17, 2015 at 5:15 AM, Nikolay Shaplov wrote:
>> >> Or it's ready to commit, and just not marked this way?
>> >
>> > No, I don't think we have reached this state yet.
>> >
>> >> I am going to make report based on this patch in Vienna. It would be
>> >> nice to have it committed until then :)
>> >
>> > This is registered in this November's CF and the current one is not
>> > completely wrapped up, so I haven't rushed into looking at it.
>>
>> So, I have finally been able to look at this patch in more details,
>> resulting in the attached. I noticed a couple of things that should be
>> addressed, mainly:
>> - addition of a new routine text_to_bits to perform the reverse
>> operation of bits_to_text. This was previously part of
>> tuple_data_split, I think that it deserves its own function.
>> - split_tuple_data should be static
>> - t_bits_str should not be a text, rather a char* fetched using
>> text_to_cstring(PG_GETARG_TEXT_PP(4)). This way there is no need to
>> perform extra calculations with VARSIZE and VARHDRSZ
>> - split_tuple_data can directly use the relation OID instead of the
>> tuple descriptor of the relation
>> - t_bits was leaking memory. For correctness I think that it is better
>> to free it after calling split_tuple_data.
>> - PG_DETOAST_DATUM_COPY allocates some memory, this was leaking as
>> well in raw_attr actually. I refactored the code such as a bytea* is
>> used and always freed when allocated to avoid leaks. Removing raw_attr
>> actually simplified the code a bit.
>> - I simplified the docs, that was largely too verbose in my opinion.
>> - Instead of using VARATT_IS_1B_E and VARTAG_EXTERNAL, using
>> VARATT_IS_EXTERNAL and VARATT_IS_EXTERNAL_ONDISK seems more adapted to
>> me, those other ones are much more low-level and not really spread in
>> the backend code.
>> - Found some typos in the code, the docs and some comments. I reworked
>> the error messages as well.
>> - The code format was not really in line with the project guidelines.
>> I fixed that as well.
>> - I removed heap_page_item_attrs for now to get this patch in a
>> bare-bone state. Though I would not mind if this is re-added (I
>> personally don't think that's much necessary in the module
>> actually...).
>> - The calculation of the length of t_bits using HEAP_NATTS_MASK is
>> more correct as you mentioned earlier, so I let it in its state.
>> That's actually more accurate for error handling as well.
>> That's everything I recall I have. How does this look?
> You've completely rewritten everything ;-)
>
> Let everything be the way you wrote it. This code is better than mine.
>
> With one exception. I really need the heap_page_item_attrs function. I used it in
> my Tuples Internals presentation
> https://github.com/dhyannataraj/tuple-internals-presentation
> and I am 100% sure that this function is needed for educational purposes, and
> this function should be as simple as possible, so students can use it without
> extra effort.

Fine. That's your patch after all.

> I still think that the documentation should be more verbose than your
> version, but I can accept your version.

I am not sure that's necessary, pageinspect is for developers.

> Who is going to add heap_page_item_attrs to your patch? me or you?

I recall some parts of the code I still did not like much :) I'll grab
some room to have an extra look at it.
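
For readers following along, here is a minimal sketch of the kind of
inspection being discussed. get_raw_page() and heap_page_items() already
exist in pageinspect; the t_data column and tuple_data_split(), including
its argument list, come from the patch under review and may still change:

create extension pageinspect;

create table demo (id int, label text);
insert into demo values (1, 'one'), (2, 'two');

-- Existing output: line pointers and tuple headers of block 0.
select lp, t_xmin, t_ctid, t_infomask, t_infomask2, t_bits
  from heap_page_items(get_raw_page('demo', 0));

-- With the patch: split the raw tuple data into per-attribute raw values
-- (t_data is a column the patch adds to heap_page_items).
select tuple_data_split('demo'::regclass, t_data, t_infomask, t_infomask2, t_bits)
  from heap_page_items(get_raw_page('demo', 0));

heap_page_item_attrs, as requested above, is intended as a simpler
one-call wrapper around this for teaching purposes.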
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-11 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Nov 12, 2015 at 6:27 AM, Alvaro Herrera
>  wrote:
>> I think you're remembering this:
>> http://www.postgresql.org/message-id/20150402205713.gb22...@eldon.alvh.no-ip.org

> Right. Thanks. Do you think we'd still want a patch to improve that?

Give it a try if you like, and see whether it seems to improve matters.
I did not try moving material around like that in the patch I committed
earlier today.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-11 Thread Joe Conway
On 11/11/2015 11:31 AM, Tom Lane wrote:
> After sleeping on it, the best compromise I can think of is to add an
> "Assert(false)" after the WARNING report for the shared-catalogs case.
> This will make the failure un-missable in any development build, while
> not breaking production builds' ability to recover from corner cases
> we might not've foreseen.

That sounds like a good answer to me.

> Of course, if you run an assert-enabled build in production, you might
> possibly lose.  But that's never been recommended practice.

Yeah, there are plenty of other ways you would lose on that proposition.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Python 3 compatibility fun

2015-11-11 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/11/15 12:08 PM, Tom Lane wrote:
>> we're failing to build against Python 3.5 because the python guys
>> have randomly changed some error message texts, again.

> This has already been fixed in the 9.5.

Well, that's nice, but surely it should have been back-patched into
every branch that claims to support Python 3.x.  We cannot pretend
that the world is not a moving target.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to install config/missing

2015-11-11 Thread Tom Lane
Andrew Dunstan  writes:
> On 11/11/2015 12:34 PM, Tom Lane wrote:
>> I was thinking more of removing the "missing" script and associated logic
>> entirely, rather than making PGXS a special case.  I think we should do
>> our best to minimize differences between behaviors in core builds and
>> PGXS builds, if only because we don't test the latter very much and
>> might not notice problems there.

> At least two buildfarm members (crake and sitella) build FDWs using 
> PGXS. Of course, they aren't likely to uncover problems with missing 
> perl/bison/flex - especially perl ;-) But I don't want people to get the 
> idea we don't test PGXS regularly, because we do.

I know we have buildfarm coverage, but by its nature that's not going to
exercise scenarios like missing tools.  You only find out about problems
of that ilk from manual testing in non-controlled environments.  And
I think we have much more of that going on for the core build than for
any add-on.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-11 Thread Robert Haas
On Mon, Nov 9, 2015 at 10:26 PM, Kouhei Kaigai  wrote:
> It is a relevant topic of readfuncs support for custom-scan.
>
> Unlike CustomPath and CustomScanState, we don't allow a custom-scan
> provider to define its own larger structure that embeds CustomScan
> at the head of the structure, in order to support copyObject().
> Thus, a custom-scan provider needs its own logic to pack and
> unpack private fields, like:
>   https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L112
>
> At present, only create_customscan_plan() and _copyCustomScan()
> can produce a CustomScan node.
> create_customscan_plan() invokes the PlanCustomPath callback,
> so the CSP can return a pointer to the CustomScan field within
> a larger structure. So, we don't need to adjust this interface.
> On the other hand, _copyCustomScan() allocates a new node using
> makeNode(CustomScan), so there is no space for the larger
> structure if the CSP wants to use one to keep private values without
> packing and unpacking.
> Only the CSP knows the exact size of the larger structure, so all we can
> do is ask the CSP to allocate a new node and copy its private fields.
> This patch therefore adds a new NodeCopyCustomScan callback to support
> copyObject.
>
> Also, v9.6 will support nodeToString/stringToNode on plan nodes.
> So, we need to re-define the role of the TextOutCustomScan callback
> that is also defined in v9.5.
> If the CSP extends CustomScan to save private values in the extra
> area, only the CSP knows what values and which data types are stored,
> so all we can do is ask the CSP to serialize and deserialize its
> private fields consistently.
> Even though the TextOutCustomScan callback shall be ripped out once
> to support readfuncs.c, a pair of TextOut and TextRead callbacks
> will re-define its responsibility; when a CSP defines
> a larger structure that embeds CustomScan, this pair of
> callbacks is used to serialize/deserialize the entire
> custom-defined object.

I don't see this as being a particularly good idea.  The same issue
exists for FDWs, and we're just living with it in that case.  There
was talk previously of improving it, and maybe some day somebody will,
but I can't get excited about it.  Writing serialization and
deserialization code is annoying but completely insufficient, in my
mind, to justify the hassle of creating a system for this that will be
used by very, very little code.

If we do want to improve it, I'm not sure this is the way to go,
either.  I think there could be other designs where we focus on making
the serialization and deserialization options better, rather than
letting people tack stuff onto the struct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Translation updates

2015-11-11 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 11/10/15 12:03 PM, Alvaro Herrera wrote:
> > Peter Eisentraut wrote:
> >> Translation updates
> >>
> >> Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
> >> Source-Git-Hash: cd263526676705b4a8a3a708c9842461c4a2bcc3
> > 
> > Hi Peter,
> > 
> > Would you please document this process?
> 
> It's documented in src/tools/RELEASE_CHANGES (except I don't do the
> tagging anymore).  You can try it yourself.

Ah, that's it, thanks.  I remember doing it once in the past and
couldn't remember where I had gotten the procedure from.  I guess I will
add a pointer to that file in the recently created Release Process wiki
page.

> > (It would also be good to know how to create new branches and how to
> > destroy one when it goes out of support.)
> 
> Yeah that one's a bit trickier and potentially confusing.  See recent
> discussion about branch naming on the translators list.  I usually just
> check the commit history for how it was done the previous time. ;-)

Aha.  (I should get a fresh clone of that repository, I guess. I
probably have extra branches now.)

BTW I'm confused about the message-branches.txt file.  Aren't we
supposed to have an entry for REL9_5_STABLE in there?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Pavel Stehule
2015-11-11 20:26 GMT+01:00 Robert Haas :

> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule 
> wrote:
> > I have a first query
> >
> > I looked at the EXPLAIN ANALYZE output and the numbers of filtered rows are
> > different
>
> Hmm, I see I was right about people finding more bugs once this was
> committed.  That didn't take long.
>

It is a super feature, nobody can wait to check it :). Many more people
can give feedback and do tests now.


>
> There's supposed to be code to handle this - see the
> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
> evidently a few bricks shy of a load.
> ExecParallelReportInstrumentation is supposed to transfer the counts
> from each worker to the DSM:
>
> ps_instrument = &instrumentation->ps_instrument[i];
> SpinLockAcquire(&ps_instrument->mutex);
> InstrAggNode(&ps_instrument->instr, planstate->instrument);
> SpinLockRelease(&ps_instrument->mutex);
>
> And ExecParallelRetrieveInstrumentation is supposed to slurp those
> counts back into the leader's PlanState objects:
>
> /* No need to acquire the spinlock here; workers have exited
> already. */
> ps_instrument = &instrumentation->ps_instrument[i];
> InstrAggNode(planstate->instrument, &ps_instrument->instr);
>
> This might be a race condition, or it might be just wrong logic.
> Could you test what happens if you insert something like a 1-second
> sleep in ExecParallelFinish just after the call to
> WaitForParallelWorkersToFinish()?  If that makes the results
> consistent, this is a race.  If it doesn't, something else is wrong:
> then it would be useful to know whether the workers are actually
> calling ExecParallelReportInstrumentation, and whether the leader is
> actually calling ExecParallelRetrieveInstrumentation, and if so
> whether they are doing it for the correct set of nodes.
>

I added pg_usleep(1000000L) there, without success

postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
 QUERY
PLAN
═
Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual
time=154.535..154.535 rows=1 loops=1)
  ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual
time=0.675..142.320 rows=100000 loops=1)
Number of Workers: 2
->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000
width=0) (actual time=0.075..445.999 rows=168927 loops=1)
  Filter: ((a % 10) = 0)
  Rows Removed by Filter: 1520549
Planning time: 0.117 ms
Execution time: 1155.505 ms
(8 rows)

expected

postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
  QUERY
PLAN
═══
Aggregate  (cost=19437.50..19437.51 rows=1 width=0) (actual
time=171.233..171.233 rows=1 loops=1)
  ->  Seq Scan on xxx  (cost=0.00..19425.00 rows=5000 width=0) (actual
time=0.187..162.627 rows=100000 loops=1)
Filter: ((a % 10) = 0)
Rows Removed by Filter: 900000
Planning time: 0.119 ms
Execution time: 171.322 ms
(6 rows)

The test is based on table xxx

create table xxx(a int);
insert into xxx select generate_series(1,1000000);


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Patch to install config/missing

2015-11-11 Thread Peter Eisentraut
On 11/11/15 12:34 PM, Tom Lane wrote:
> I was thinking more of removing the "missing" script and associated logic
> entirely, rather than making PGXS a special case.  I think we should do
> our best to minimize differences between behaviors in core builds and
> PGXS builds, if only because we don't test the latter very much and
> might not notice problems there.

Well, about a year ago people were arguing for the opposite change in
the documentation build.  It used to default all the build tool
variables to programs that weren't there, and people got all confused
about that, so we stuck "missing" in there across the board.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-11-11 Thread YUriy Zhuravlev
On Wednesday 11 November 2015 17:29:31 you wrote:
> In this case the syntax is the major issue. A language should not have every
> possible feature in the world.

I am talking about omitted boundaries. It hardly changes the syntax and
conflicts with nothing.
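
For concreteness, the slicing behavior being proposed (the forms with an
omitted boundary are only a proposal here and are not accepted by any
released server):

SELECT (ARRAY[1,2,3,4,5])[2:4];   -- existing slice syntax: {2,3,4}

-- Proposed: an omitted boundary defaults to the corresponding array end.
SELECT (ARRAY[1,2,3,4,5])[2:];    -- would behave like [2:5], i.e. {2,3,4,5}
SELECT (ARRAY[1,2,3,4,5])[:3];    -- would behave like [1:3], i.e. {1,2,3}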

Thanks.
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to install config/missing

2015-11-11 Thread Andrew Dunstan



On 11/11/2015 12:34 PM, Tom Lane wrote:

Peter Eisentraut  writes:

On 11/2/15 4:07 PM, Tom Lane wrote:

I wonder how much we need that script at all though.  If, say, configure
doesn't find bison, what's so wrong with just defining BISON=bison and
letting the usual shell "bison: command not found" error leak through?

I agree.  Something like the attached patch.

I was thinking more of removing the "missing" script and associated logic
entirely, rather than making PGXS a special case.  I think we should do
our best to minimize differences between behaviors in core builds and
PGXS builds, if only because we don't test the latter very much and
might not notice problems there.







At least two buildfarm members (crake and sitella) build FDWs using 
PGXS. Of course, they aren't likely to uncover problems with missing 
perl/bison/flex - especially perl ;-) But I don't want people to get the 
idea we don't test PGXS regularly, because we do.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Thom Brown
On 11 November 2015 at 19:26, Robert Haas  wrote:
> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule  
> wrote:
>> I have a first query
>>
>> I looked at the EXPLAIN ANALYZE output and the numbers of filtered rows are
>> different
>
> Hmm, I see I was right about people finding more bugs once this was
> committed.  That didn't take long.
>
> There's supposed to be code to handle this - see the
> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
> evidently a few bricks shy of a load.
> ExecParallelReportInstrumentation is supposed to transfer the counts
> from each worker to the DSM:
>
> ps_instrument = &instrumentation->ps_instrument[i];
> SpinLockAcquire(&ps_instrument->mutex);
> InstrAggNode(&ps_instrument->instr, planstate->instrument);
> SpinLockRelease(&ps_instrument->mutex);
>
> And ExecParallelRetrieveInstrumentation is supposed to slurp those
> counts back into the leader's PlanState objects:
>
> /* No need to acquire the spinlock here; workers have exited already. */
> ps_instrument = &instrumentation->ps_instrument[i];
> InstrAggNode(planstate->instrument, &ps_instrument->instr);
>
> This might be a race condition, or it might be just wrong logic.
> Could you test what happens if you insert something like a 1-second
> sleep in ExecParallelFinish just after the call to
> WaitForParallelWorkersToFinish()?  If that makes the results
> consistent, this is a race.  If it doesn't, something else is wrong:
> then it would be useful to know whether the workers are actually
> calling ExecParallelReportInstrumentation, and whether the leader is
> actually calling ExecParallelRetrieveInstrumentation, and if so
> whether they are doing it for the correct set of nodes.

Hmm.. I made the change, but clearly it's not sleeping properly with
my change (I'm expecting a total runtime in excess of 1 second):

max_parallel_degree = 4:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';

QUERY PLAN
-
 Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual
time=797.518..797.518 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174883 read=540
   ->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual
time=0.245..784.959 rows=55151 loops=1)
 Output: content
 Number of Workers: 4
 Buffers: shared hit=174883 read=540
 ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93
rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
   Output: content
           Filter: (((((js.content -> 'tags'::text) -> 0) ->>
'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))
   Rows Removed by Filter: 2051330
   Buffers: shared hit=299224 read=907
 Planning time: 0.086 ms
 Execution time: 803.026 ms


max_parallel_degree = 0:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';

 QUERY PLAN
---
 Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual
time=1278.717..1278.717 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174671 read=572
   ->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500
width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
 Output: content
     Filter: (((((js.content -> 'tags'::text) -> 0) ->>
'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))
 Rows Removed by Filter: 1197822
 Buffers: shared hit=174671 read=572
 Planning time: 0.064 ms
 Execution time: 1278.741 ms
(10 rows)

Time: 1279.145 ms


I did, however, notice that repeated runs of the query with
max_parallel_degree = 4 yield different counts of rows removed by
filter (a cross-check is sketched after this list):

Run 1: 2051330
Run 2: 2081252
Run 3: 2065112
Run 4: 2022045
Run 5: 2025384
Run 6: 2059360
Run 7: 2079620
Run 8: 2058541
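
As a cross-check of the varying numbers above, the expected value can be
computed directly, independent of the per-worker instrumentation (same
table and filter as in the plans above; max_parallel_degree as used
elsewhere in this thread):

SET max_parallel_degree = 0;   -- take parallel workers out of the picture

-- "Rows Removed by Filter" counts rows for which the filter is false or
-- null, so count the complement of the WHERE clause directly:
SELECT count(*) AS rows_removed_by_filter
  FROM js
 WHERE (content->'tags'->0->>'term' LIKE 'design%'
     OR content->'tags'->0->>'term' LIKE 'web%') IS NOT TRUE;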

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-11-11 Thread David G. Johnston
On Wed, Nov 11, 2015 at 7:01 AM, Michael Paquier 
wrote:

> It seems to me that the documentation should specify that when -C is
> used with -1 each individual series of commands is executed within a
> transaction block.


In summary:

Default (Not Single + Auto-Commit): One Transaction per parsed statement
in all -Cs []
Single + Auto-Commit: One Transaction per -C [--single-transaction] {same
as --no-auto-commit}
Not Single + Not Auto-Commit: One Transaction per -C [--no-auto-commit]
{same as --single-transaction}
Single + Not Auto-Commit: One Transaction covering all -Cs
[--no-auto-commit --single-transaction]

Explanation:

The transactional behavior of -C can, with defaults, be described thusly:


BEGIN;
-C #1 Statement #1
COMMIT;
BEGIN;
-C #1 Statement #2
COMMIT;
BEGIN;
-C #2 Statement Only
COMMIT;

Basically, this is the explicit representation of Auto-Commit "on" mode.

I don't understand how -c implements the promise of:
"""
If the command string contains multiple SQL commands, they are processed in
a single transaction, unless there are explicit BEGIN/COMMIT commands
included in the string to divide it into multiple transactions.
"""
But my gut (and Pavel) says that this is "legacy behavior" that should not
be carried over to -C.  I would suggest going further and disallowing
transaction control statements within -C commands.

Now, in the presence of "--single-transaction" we would convert the
transactional behavior from that shown above to:

BEGIN;
-C #1 Statement #1
-C #1 Statement #2
COMMIT; -- auto-committed;
BEGIN;
-C #2
COMMIT;

Additionally, if the variable AUTOCOMMIT is "off" then the implicit script
should look like:

BEGIN;
-C #1 Statement #1
-C #1 Statement #2
-C #2
COMMIT;

So a "true" single transaction requires setting AUTOCOMMIT to off otherwise
you only get each -C singly.

I would suggest adding an action "--no-auto-commit" option to complete the
existence of the "--single-transaction" option.  While the variable method
works it doesn't feel as clean now that we are adding this option that
(can) make direct use of it.

Specifying only --no-auto-commit results in:
BEGIN;
-C #1 Statement #1
-C #1 Statement #2
COMMIT;
BEGIN;
-C #2
COMMIT;

Which is redundant with specifying only "--single-transaction".  Each -C
still commits otherwise you would just use the default.

David J.


Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-11 Thread Gavin Flower

On 12/11/15 13:52, Greg Stark wrote:

On Wed, Nov 11, 2015 at 6:57 PM, Gavin Flower
 wrote:

Don't you realize that 400MB is over 4 million of the old 100Kb floppy
disks, and even with the new big 1.44MB 3.5 " disks, you'd need about 280!!!

Don't be silly. It's only four thousand 100Kb floppies.



You're right, of course!

So you won't mind helping me back up 400MB on 100 kB floppies then???



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-11 Thread Andres Freund
On 2015-11-11 14:59:33 -0500, Robert Haas wrote:
> I don't see this as being a particularly good idea.  The same issue
> exists for FDWs, and we're just living with it in that case.

It's absolutely horrible there. I don't see why that's a justification
for much.  To deal with the lack of extensible copy/out/readfuncs I've
just had to copy the entirety of readfuncs.c into an extension. Or you
build replacements for those (as e.g. postgres_fdw essentially has
done).

> If we do want to improve it, I'm not sure this is the way to go,
> either.  I think there could be other designs where we focus on making
> the serialization and deserialization options better, rather than
> letting people tack stuff onto the struct.

Just better serialization doesn't actually help all that much. Being
able to conveniently access data directly, i.e. as fields in a struct,
makes code rather more readable. And in many cases more
efficient. Having to serialize internal datastructures unconditionally,
just so copyfuncs.c works if actually used, makes for a fair amount of
inefficiency (forced deserialization, even when not copying) and uglier
code.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-11 Thread Alvaro Herrera
Michael Paquier wrote:

> I recall that we had some talks about grouping all the relopts into a
> single documentation section, perhaps not having one is at the origin
> of the confusion?

I think you're remembering this:
http://www.postgresql.org/message-id/20150402205713.gb22...@eldon.alvh.no-ip.org

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to install config/missing

2015-11-11 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 11/11/15 12:34 PM, Tom Lane wrote:
> > I was thinking more of removing the "missing" script and associated logic
> > entirely, rather than making PGXS a special case.  I think we should do
> > our best to minimize differences between behaviors in core builds and
> > PGXS builds, if only because we don't test the latter very much and
> > might not notice problems there.
> 
> Well, about a year ago people were arguing for the opposite change in
> the documentation build.  It used to default all the build tool
> variables to programs that weren't there, and people got all confused
> about that, so we stuck "missing" in there across the board.

Ah, right :-(  It's obviously difficult to arrange a compromise that
pleases everyone here.  I think it's fair to keep "missing" for the doc
build and remove it from Perl/bison/flex, regardless of pgxs; extensions
cannot build doc files anyway.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-11 Thread Gavin Flower

On 12/11/15 02:07, Craig Ringer wrote:



On 11 November 2015 at 16:02, Torsten Zühlsdorff wrote:


From my experience most databases are just too small. Their
operations finish before there can be a deadlock. Same for race
conditions - most developers don't know about them, because they
never stumbled upon them. I regularly see discussions about whether a
database is already too big when holding 10.000 records in the
whole cluster...


Ha. Yes. So true.

I see Stack Overflow posts where somebody explains that their query 
takes ages on their Huge!!1! database. Then it turns out the query 
takes 0.2 seconds on a 400MB table.


Huge. Right.

--
 Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


I'll say it's huge.

Don't you realize that 400MB is over 4 million of the old 100Kb floppy 
disks, and even with the new big 1.44MB 3.5 " disks, you'd need about 280!!!


Though, I suspect that the people who are saying that 400MB is huge, 
never actually used those floppies I mentioned above.


Nowadays I would not regard 400GB as huge, even though when I started 
programming, mainframes often had less than 1MB of core memory, and the 
big 12" tape reels could hold a max of 35MB of data.  Now I'm sitting 
at a Linux box where even my SSD has hundreds of times the 
storage (let alone the capacity of my HDs) of what the entire New Zealand 
Post Office had in 1980!  How times change...


The reality is that people tend to compare things against their direct 
experience, and don't have a feel for the 'size' of the computers 
they use in terms of storage & processing power.  The mainframe I 
mentioned above (an ICL 4/72) had washing-machine-sized boxes each holding 
a 60MB disk - everything was impressively sized; now you can fit a TB HD 
in a matchbox.



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Peter Eisentraut
On 11/11/15 4:22 AM, Thomas Munro wrote:
> My thinking was that the reason for wanting to load balance over a set
> of hot standbys is because you have a very read-heavy workload, so it
> makes sense to tax the writers and leave the many dominant readers
> unburdened, so (3) should be better than (2) for the majority of users
> who want such a configuration.

One problem I can see is that even if you have a read-heavy workload,
the writes can still be a bottleneck, since they are necessarily bound
to one node.  And so if the feature proposal is, we can make your reads
more consistent but the writes will become slower, then that's not a
good deal.

More generally, no matter whether you pick the writers or the readers to
wait, if you assume that read-only slaves are an application performance
feature, then it's questionable how much better such applications will
perform overall when network-bound waits are introduced in the system.

I think in practice applications that are busy enough to worry about
this don't really work like that anyway.  For example, the writes should
go to a message queue and are written out whenever, with a copy kept in
a cache for display in the meantime.  Maybe there could be additional
features to make managing this easier.

I think there are a lot of different variations of this in practice, not
only depending on the workload and other measurables, but also
business-dependent decisions on application behavior and degradability.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-11-11 Thread Robert Haas
On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule  wrote:
> I have a first query
>
> I looked at the EXPLAIN ANALYZE output and the numbers of filtered rows are
> different

Hmm, I see I was right about people finding more bugs once this was
committed.  That didn't take long.

There's supposed to be code to handle this - see the
SharedPlanStateInstrumentation stuff in execParallel.c - but it's
evidently a few bricks shy of a load.
ExecParallelReportInstrumentation is supposed to transfer the counts
from each worker to the DSM:

ps_instrument = &instrumentation->ps_instrument[i];
SpinLockAcquire(&ps_instrument->mutex);
InstrAggNode(&ps_instrument->instr, planstate->instrument);
SpinLockRelease(&ps_instrument->mutex);

And ExecParallelRetrieveInstrumentation is supposed to slurp those
counts back into the leader's PlanState objects:

/* No need to acquire the spinlock here; workers have exited already. */
ps_instrument = &instrumentation->ps_instrument[i];
InstrAggNode(planstate->instrument, &ps_instrument->instr);

This might be a race condition, or it might be just wrong logic.
Could you test what happens if you insert something like a 1-second
sleep in ExecParallelFinish just after the call to
WaitForParallelWorkersToFinish()?  If that makes the results
consistent, this is a race.  If it doesn't, something else is wrong:
then it would be useful to know whether the workers are actually
calling ExecParallelReportInstrumentation, and whether the leader is
actually calling ExecParallelRetrieveInstrumentation, and if so
whether they are doing it for the correct set of nodes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-11 Thread Tom Lane
Robert Haas  writes:
> On Wed, Nov 11, 2015 at 3:29 PM, Andres Freund  wrote:
>> Just better serialization doesn't actually help all that much. Being
>> able to conveniently access data directly, i.e. as fields in a struct,
>> makes code rather more readable. And in many cases more
>> efficient. Having to serialize internal datastructures unconditionally,
>> just so copyfuncs.c works if actually used, makes for a fair amount of
>> inefficiency (forced deserialization, even when not copying) and uglier
>> code.

> Fair enough, but I'm still not very interested in having something for
> CustomScan only.

I'm with Robert here.  The fact is that postgres_fdw and other FDWs are
living just fine with the restrictions we put in to allow copyfuncs.c to
copy ForeignScan, and there's been no sufficient justification offered
as to why CustomScan can't play by those rules.

Yes, it would be nice to be able to use a more-tailored struct definition,
but it's not *necessary*.  We should not contort the core system
enormously in order to provide a bit more notational cleanliness to
extensions.

I'd be fine with fixing this if a nice solution were to present itself,
but so far nothing's been offered that wasn't horrid.

BTW, the "inefficiency" argument is junk, unless you provide some hard
evidence of a case where it hurts a lot.  If we were forcing people to
use serialized representations at execution time, it would be meaningful,
but we're not; you can convert as needed at executor setup time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-11 Thread Robert Haas
On Wed, Nov 11, 2015 at 3:29 PM, Andres Freund  wrote:
> On 2015-11-11 14:59:33 -0500, Robert Haas wrote:
>> I don't see this as being a particularly good idea.  The same issue
>> exists for FDWs, and we're just living with it in that case.
>
> It's absolutely horrible there. I don't see why that's a justification
> for much.  To deal with the lack of extensible copy/out/readfuncs I've
> just had to copy the entirety of readfuncs.c into an extension. Or you
> build replacements for those (as e.g. postgres_fdw essentially has
> done).
>
>> If we do want to improve it, I'm not sure this is the way to go,
>> either.  I think there could be other designs where we focus on making
>> the serialization and deserialization options better, rather than
>> letting people tack stuff onto the struct.
>
> Just better serialization doesn't actually help all that much. Being
> able to conveniently access data directly, i.e. as fields in a struct,
> makes code rather more readable. And in many cases more
> efficient. Having to serialize internal datastructures unconditionally,
> just so copyfuncs.c works if actually used, makes for a fair amount of
> inefficiency (forced deserialization, even when not copying) and uglier
> code.

Fair enough, but I'm still not very interested in having something for
CustomScan only.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-11 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Nov 11, 2015 at 12:07 AM, Tom Lane  wrote:
>> Should it read "Overrides log_autovacuum_min_duration for autovacuum
>> operations on this specific table or toast table"?

> The same applied for all the other autovacuum and autoanalyze
> parameters. Do you think that we should add in the top paragraph of
> section "Storage Parameters" a mention of the type "If this parameter
> has a server-wide equivalent parameter, the per-table value overrides
> its server-wide equivalent if defined" or similar.

There's a whole lot of inconsistency in this area, apparently.  Some of
the entries in runtime-config-autovacuum are marked as being overridable
by storage parameters, some aren't (in particular this one is not, which
may be the origin of Bruce's complaint).  Some of the entries in
SQL-CREATETABLE-storage-parameters use the "custom" phraseology, some
don't but instead duplicate (or more often, rephrase poorly) the
documentation of the underlying GUC.  I think duplication is a bad
strategy here.  But I still don't care for "custom", perhaps because it's
been stretched to the point of being nearly meaningless elsewhere in the
system.  "Per-table" is used in other sentences in this same area, and
that seems like a better description.
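
For reference, the per-table override being discussed looks like this
(the table name is just an example; the toast.* form applies the setting
to the table's TOAST table):

ALTER TABLE my_table SET (log_autovacuum_min_duration = 250);
ALTER TABLE my_table SET (toast.log_autovacuum_min_duration = 250);

-- Revert to the server-wide GUC:
ALTER TABLE my_table RESET (log_autovacuum_min_duration);
ALTER TABLE my_table RESET (toast.log_autovacuum_min_duration);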

I'll try to make this better.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-11 Thread Alvaro Herrera
Tom Lane wrote:
> Michael Paquier  writes:
> > On Wed, Nov 11, 2015 at 12:07 AM, Tom Lane  wrote:
> >> Should it read "Overrides log_autovacuum_min_duration for autovacuum
> >> operations on this specific table or toast table"?
> 
> > The same applied for all the other autovacuum and autoanalyze
> > parameters. Do you think that we should add in the top paragraph of
> > section "Storage Parameters" a mention of the type "If this parameter
> > has a server-wide equivalent parameter, the per-table value overrides
> > its server-wide equivalent if defined" or similar.
> 
> There's a whole lot of inconsistency in this area, apparently.  Some of
> the entries in runtime-config-autovacuum are marked as being overridable
> by storage parameters, some aren't (in particular this one is not, which
> may be the origin of Bruce's complaint).

log_autovacuum_min_duration was made changeable per-table recently by
commit 4ff695b17d32 of April 2015, having been introduced by commit
9d3b50244357ef4, Nov 2011, while the others have been changeable for a
much longer while, c.f. 834a6da4f72d of Feb 2009.

> Some of the entries in
> SQL-CREATETABLE-storage-parameters use the "custom" phraseology, some
> don't but instead duplicate (or more often, rephrase poorly) the
> documentation of the underlying GUC.  I think duplication is a bad
> strategy here.  But I still don't care for "custom", perhaps because it's
> been stretched to the point of being nearly meaningless elsewhere in the
> system.  "Per-table" is used in other sentences in this same area, and
> that seems like a better description.

Sounds fair.

> I'll try to make this better.

Thanks!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Simon Riggs
On 11 November 2015 at 05:37, Thomas Munro 
wrote:

Many sites use hot standby servers to spread read-heavy workloads over more
> hardware, or at least would like to.  This works well today if your
> application can tolerate some time lag on standbys.  The problem is that
> there is no guarantee of when a particular commit will become visible for
> clients connected to standbys.  The existing synchronous commit feature is
> no help here because it guarantees only that the WAL has been flushed on
> another server before commit returns.  It says nothing about whether it has
> been applied or whether it has been applied on the standby that you happen
> to be talking to.
>

Thanks for working on this issue.


> 3.  Commit on the primary with "causal_reads = on" waits for all
> 'available' standbys either to apply the commit record, or to cease to be
> 'available' and begin raising the error if they are still alive (because
> their authorizations have expired).
>

This causes every writer to wait.

What we want is to isolate the wait only to people performing a write-read
sequence, so I think it should be readers that wait. Let's have that debate
up front before we start reviewing the patch.
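
To frame that debate, this is the write-then-read sequence the feature
targets, using the GUC name from the proposal (causal_reads is not part
of any released server, and the accounts table is hypothetical):

-- On the primary, in the writer's session; as proposed, the COMMIT waits:
SET causal_reads = on;
INSERT INTO accounts (id, balance) VALUES (42, 100);

-- Immediately afterwards, on whichever standby the load balancer picks:
SELECT balance FROM accounts WHERE id = 42;   -- must see the row just committed

The question raised above is whether that guarantee should be paid for at
the writer's COMMIT, as in the proposal, or at the reader's first query on
the standby.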

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services