Re: [HACKERS] A couple of cosmetic changes around shared memory code

2016-05-18 Thread Piotr Stefaniak

On 2016-05-17 19:05, Tom Lane wrote:

Michael Paquier  writes:

On Tue, May 17, 2016 at 4:40 AM, Piotr Stefaniak
 wrote:
-toc_bytes = offsetof(shm_toc, toc_entry) +nentry * sizeof(shm_toc_entry)
+toc_bytes = offsetof(shm_toc, toc_entry) + nentry * sizeof(shm_toc_entry)
 + allocated_bytes;



I don't recall the exact reason, but this is intentional style
(memories from a patchwork with Tom).


Well, it's not so much intentional as that pgindent will make it look like
that no matter what you do --- it's got some weird interaction with
sizeof, offsetof, and typedef names versus operators later on the same
line.  I'd call that a pgindent bug myself, but have no particular desire
to try to fix it.


From my understanding, it's because pg_bsd_indent thinks that 
"(shm_toc)" is a cast since shm_toc is a keyword (type alias fetched 
from the "list of typedefs" file in this case, to be precise), forcing 
the "+" to be a unary operator instead of binary.


One easy way to work around this bug is putting shm_toc into its own 
parentheses, but I'd prefer dropping this particular fix from my 
"cosmetic changes" patch until pg_bsd_indent is fixed.


I may come up with a possible fix for this bug, but don't hold your breath.




Re: [HACKERS] foreign table batch inserts

2016-05-18 Thread Craig Ringer
On 19 May 2016 at 01:39, Michael Paquier  wrote:

> On Wed, May 18, 2016 at 12:27 PM, Craig Ringer 
> wrote:
> > On 18 May 2016 at 06:08, Michael Paquier 
> wrote:
> >> > Wouldn’t it make sense to do the insert batch wise e.g. 100 rows ?
> >>
> >> Using a single query string with multiple values, perhaps, but after
> >> that the query string length limit comes into consideration, particularly
> >> for large text values... The query used for the insertion is a prepared
> >> statement, since writable queries have been supported since 9.3, which
> >> makes the code quite simple actually.
> >
> > This should be done how PgJDBC does batches. It'd require a libpq
> > enhancement, but it's one we IMO need anyway: allow pipelined query
> > execution from libpq.
>
> That's also something that would be useful for the ODBC driver. Since
> it is using libpq as a hard dependency and does not speak the protocol
> directly, it is doing additional round trips to the server for this
> exact reason when preparing a statement.
>

Good to know. It'll hurt especially badly when statement-level rollback is
enabled, since psqlODBC then uses savepoints, and pipelining would let it get
rid of an extra pair of round trips.

It looks like there's plenty of use for this. FDWs, psqlODBC, client
applications doing batches, and Postgres-XL would benefit from it too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Peter Geoghegan
On Wed, May 18, 2016 at 3:57 PM, Alvaro Herrera
 wrote:
> Since we were considering a new VACUUM option, surely this is serious
> enough to warrant more than just contrib.

I would like to see us consider the long-term best place for amcheck's
functionality at the same time. Ideally, verification would be a
somewhat generic operation, with AM-specific code invoked as
appropriate.

-- 
Peter Geoghegan




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Alvaro Herrera
Andres Freund wrote:

> > (AFAIK, "select count(*) from table" would offer a similar amount of
> > sanity checking as a full-table VACUUM scan does, so it's not like
> > we've removed functionality with no near-term replacement.)
> 
> I don't think that'd do anything comparable to
> 	/*
> 	 * As of PostgreSQL 9.2, the visibility map bit should never be set if
> 	 * the page-level bit is clear.  However, it's possible that the bit
> 	 * got cleared after we checked it and before we took the buffer
> 	 * content lock, so we must recheck before jumping to the conclusion
> 	 * that something bad has happened.
> 	 */
> 	else if (all_visible_according_to_vm && !PageIsAllVisible(page)
> 			 && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
> 	{
> 		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
> 			 relname, blkno);
> 		visibilitymap_clear(onerel, blkno, vmbuffer);
> 	}
> 
> If we had a checking module for all this it'd possibly be sufficient,
> but we don't.

Here's an idea.  We need core-blessed extensions (src/extensions/, you
know I've proposed this before), so why not take this opportunity to
create the first one, and make it carry a function that scans a table
completely to do this task.

Since we were considering a new VACUUM option, surely this is serious
enough to warrant more than just contrib.
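To make that concrete, here is a rough sketch of what the core loop of
such a scanning function might look like.  This is a simplified
illustration under the assumption of no concurrent activity; the function
name is invented, and a real implementation would need the same recheck
logic as the vacuum code quoted above:

/* Illustrative sketch only (not existing code): walk the heap and warn
 * when the visibility map claims all-visible but the page-level bit
 * disagrees.  Needs storage/bufmgr.h, access/visibilitymap.h, utils/rel.h. */
static void
check_vm_consistency(Relation rel)
{
	BlockNumber	nblocks = RelationGetNumberOfBlocks(rel);
	Buffer		vmbuffer = InvalidBuffer;
	BlockNumber	blkno;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf = ReadBuffer(rel, blkno);

		LockBuffer(buf, BUFFER_LOCK_SHARE);
		if (!PageIsAllVisible(BufferGetPage(buf)) &&
			VM_ALL_VISIBLE(rel, blkno, &vmbuffer))
			elog(WARNING, "visibility map bit set but page %u of \"%s\" is not all-visible",
				 blkno, RelationGetRelationName(rel));
		UnlockReleaseBuffer(buf);
	}
	if (BufferIsValid(vmbuffer))
		ReleaseBuffer(vmbuffer);
}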

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Andres Freund
On 2016-05-18 18:42:16 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-05-18 18:25:39 -0400, Tom Lane wrote:
> >> Yes, I've been wondering that too.  VACUUM is not meant as a corruption
> >> checker, and should not be made into one, so what is the point of this
> >> flag exactly?
> 
> > Well, so far a VACUUM FREEZE (or just setting vacuum_freeze_table_age =
> > 0) verified the correctness of the visibility map; and that found a
> > number of bugs.  Now visibilitymap grew additional responsibilities,
> > with a noticeable risk of data eating bugs, and there's no way to verify
> > whether visibilitymap's frozen bits are set correctly.
> 
> Meh.  I'm not sure we should grow a rather half-baked feature we'll never
> be able to remove as a substitute for a separate sanity checker.  The
> latter is really the right place for this kind of thing.

It's not a new feature; it's a feature we removed as a side effect, and
one that allows us to evaluate whether the new feature actually works.




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Tom Lane
Andres Freund  writes:
> On 2016-05-18 18:25:39 -0400, Tom Lane wrote:
>> Yes, I've been wondering that too.  VACUUM is not meant as a corruption
>> checker, and should not be made into one, so what is the point of this
>> flag exactly?

> Well, so far a VACUUM FREEZE (or just setting vacuum_freeze_table_age =
> 0) verified the correctness of the visibility map; and that found a
> number of bugs.  Now visibilitymap grew additional responsibilities,
> with a noticeable risk of data eating bugs, and there's no way to verify
> whether visibilitymap's frozen bits are set correctly.

Meh.  I'm not sure we should grow a rather half-baked feature we'll never
be able to remove as a substitute for a separate sanity checker.  The
latter is really the right place for this kind of thing.

regards, tom lane




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Andres Freund
On 2016-05-18 18:25:39 -0400, Tom Lane wrote:
> Josh berkus  writes:
> > Maybe this is the wrong perspective.  I mean, is there a reason we even
> > need this option, other than a lack of any other way to do a full table
> > scan to check for corruption, etc.?  If we're only doing this for
> > integrity checking, then maybe it's better if it becomes a function,
> > which could be later extended with additional forensic features?
> 
> Yes, I've been wondering that too.  VACUUM is not meant as a corruption
> checker, and should not be made into one, so what is the point of this
> flag exactly?

Well, so far a VACUUM FREEZE (or just setting vacuum_freeze_table_age =
0) verified the correctness of the visibility map; and that found a
number of bugs.  Now visibilitymap grew additional responsibilities,
with a noticeable risk of data eating bugs, and there's no way to verify
whether visibilitymap's frozen bits are set correctly.


> (AFAIK, "select count(*) from table" would offer a similar amount of
> sanity checking as a full-table VACUUM scan does, so it's not like
> we've removed functionality with no near-term replacement.)

I don't think that'd do anything comparable to

	/*
	 * As of PostgreSQL 9.2, the visibility map bit should never be set if
	 * the page-level bit is clear.  However, it's possible that the bit
	 * got cleared after we checked it and before we took the buffer
	 * content lock, so we must recheck before jumping to the conclusion
	 * that something bad has happened.
	 */
	else if (all_visible_according_to_vm && !PageIsAllVisible(page)
			 && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
	{
		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
			 relname, blkno);
		visibilitymap_clear(onerel, blkno, vmbuffer);
	}

If we had a checking module for all this it'd possibly be sufficient,
but we don't.

Greetings,

Andres Freund




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Tom Lane
Josh berkus  writes:
> Maybe this is the wrong perspective.  I mean, is there a reason we even
> need this option, other than a lack of any other way to do a full table
> scan to check for corruption, etc.?  If we're only doing this for
> integrity checking, then maybe it's better if it becomes a function,
> which could be later extended with additional forensic features?

Yes, I've been wondering that too.  VACUUM is not meant as a corruption
checker, and should not be made into one, so what is the point of this
flag exactly?

(AFAIK, "select count(*) from table" would offer a similar amount of
sanity checking as a full-table VACUUM scan does, so it's not like
we've removed functionality with no near-term replacement.)

regards, tom lane




[HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-05-18 Thread Michael Paquier
Hi all,

A couple of months back it was reported to pgsql-bugs that WAL
segments were always switched with a low value of archive_timeout even
if a system is completely idle:
http://www.postgresql.org/message-id/20151016203031.3019.72...@wrigleys.postgresql.org
In short, a closer look at the problem showed that the logic in
charge of checking whether a checkpoint should be skipped is
currently broken, because it completely ignores standby snapshots in
its calculation of the WAL activity. So a checkpoint always occurs
after checkpoint_timeout on an idle system, ever since hot_standby was
introduced as a wal_level. This did not get better from 9.4, since
standby snapshots are logged every 15s by the background writer
process. In 9.6, since wal_level = 'archive' and 'hot_standby'
actually have the same meaning, the skip logic that worked with
wal_level = 'archive' does not do its job anymore.

One solution that has been discussed is to track the progress of WAL
activity at record insertion time, by being able to mark some records
as not updating the progress of WAL. Standby snapshot records fall
into this category, making the checkpoint skip logic more robust.

Attached is a patch implementing a solution for it, by adding to
WALInsertLock a new field that gets updated for each record to track
the LSN progress. This makes it possible to reliably skip the generation
of standby snapshots in the bgwriter, or checkpoints, on an idle system.
Per discussion with Andres at PGCon, we decided that this is an
optimization, only for 9.7~, because this has been broken for a long
time. I have also changed XLogIncludeOrigin() into a more generic
routine that sets status flags for a record being inserted:
XLogSetFlags(). This routine can use two flags:
- INCLUDE_ORIGIN to decide if the origin should be logged or not
- NO_PROGRESS to decide at insertion if a record should update the LSN
progress or not.
Andres mentioned to me that we'd want to have something similar to
XLogIncludeOrigin, but while hacking I noticed that grouping both
things under the same umbrella made more sense.
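As a rough illustration of the proposed interface (the flag names are
taken from the description above; the actual definitions in the attached
patch may differ):

/* Hypothetical flag bits for XLogSetFlags(); illustrative, not
 * necessarily the patch's actual definitions. */
#define XLOG_INCLUDE_ORIGIN		0x01	/* log the replication origin */
#define XLOG_NO_PROGRESS		0x02	/* don't advance the LSN progress */

/* Where a caller previously used XLogIncludeOrigin(): */
XLogBeginInsert();
XLogSetFlags(XLOG_INCLUDE_ORIGIN);

/* A standby snapshot record would be marked as not updating progress,
 * so an otherwise idle system can skip the next checkpoint: */
XLogBeginInsert();
XLogSetFlags(XLOG_NO_PROGRESS);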

I am adding that to the commit fest of September.

Regards,
-- 
Michael


hs-checkpoints-v11.patch
Description: invalid/octet-stream


hs-checkpoints-v11-2.patch
Description: invalid/octet-stream



Re: [HACKERS] PgLogical 1.1 feedback

2016-05-18 Thread Joshua D. Drake

On 05/18/2016 01:17 PM, Joshua D. Drake wrote:

And then there are things like this:

postgres=# \c logical
You are now connected to database "logical" as user "postgres".
logical=# select * from pglogical.node;
-[ RECORD 1 ]--
node_id   | 3276292710
node_name | replica2
-[ RECORD 2 ]--
node_id   | 2125761069
node_name | subscriber0

logical=# select pglogical.drop_node(node_name := 'replica2');
ERROR:  cannot drop node "replica2" because it still has subscriptions associated with it
HINT:  drop the subscriptions first
logical=# select * from pglogical.subscription ;
-[ RECORD 1 ]+--
sub_id   | 1763399739
sub_name | subscription1
sub_origin   | 2125761069
sub_target   | 3276292710
sub_origin_if| 2466493301
sub_target_if| 182959776
sub_enabled  | t
sub_slot_name| pgl_logical_subscriber0_subscription1
sub_replication_sets | {default,default_insert_only,ddl_sql}
sub_forward_origins  | {all}

logical=# select pglogical.drop_subscription(subscription_name := 'subscription1');

The last function call never returns; it just hangs indefinitely.

Sincerely,

JD
--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




[HACKERS] PgLogical 1.1 feedback

2016-05-18 Thread Joshua D. Drake

Hello,

Continuing my testing with pglogical:

It is very easy to end up in a broken state with pglogical. This isn't 
all that big of a deal, because it is relatively easy to end up in a 
broken state with streaming replication as well. However, with streaming 
replication the various components are a known quantity. Using 
the wrong replication slot? Change the replication slot name to the 
correct one. Can't connect to the master? Check the log for pg_hba.conf 
errors, things like that.


With pg_logical it is a little more obtuse. You can end up with a 
replication slot (not to mention other things) that you are not able to 
get rid of because it is in use, but it is difficult to figure out what 
is using it. I was able to plow through this by investigating the 
relations that reside under the pglogical schema. I do not think that is 
what our users need, though.


In order for pg_logical to be further considered for core, I think we 
need a lot more documentation. Every single relation needs to be 
documented. I was able to reasonably tell myself, "Oh, 
pglogical.subscriptions probably relates to 
pglogical.drop/create_subscription()", but that isn't going to be the norm.


Further, we may also want to consider having proper views in order to 
get information about what is going on:


pglogical.list_subscribers
pglogical.list_nodes

etc

Sincerely,

Joshua D. Drake

P.S. There is also a bug where pglogical does not release LWLocks, which 
will result in a dot release shortly (per GitHub).


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Josh berkus
On 05/18/2016 03:51 PM, Peter Geoghegan wrote:
> On Wed, May 18, 2016 at 8:52 AM, Jeff Janes  wrote:
>> How about going with something that says more about why we are doing
>> it, rather than trying to describe in one or two words what it is
>> doing?
>>
>> VACUUM (FORENSIC)
>>
>> VACUUM (DEBUG)
>>
>> VACUUM (LINT)
> 
> +1

Maybe this is the wrong perspective.  I mean, is there a reason we even
need this option, other than a lack of any other way to do a full table
scan to check for corruption, etc.?  If we're only doing this for
integrity checking, then maybe it's better if it becomes a function,
which could be later extended with additional forensic features?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Peter Geoghegan
On Wed, May 18, 2016 at 8:52 AM, Jeff Janes  wrote:
> How about going with something that says more about why we are doing
> it, rather than trying to describe in one or two words what it is
> doing?
>
> VACUUM (FORENSIC)
>
> VACUUM (DEBUG)
>
> VACUUM (LINT)

+1


-- 
Peter Geoghegan




Re: [HACKERS] Parallel query and temp_file_limit

2016-05-18 Thread Peter Geoghegan
On Wed, May 18, 2016 at 3:40 AM, Robert Haas  wrote:
>> I'll write a patch to fix the issue, if there is a consensus on a solution.
>
> I think for 9.6 we just have to document this issue.  In the next
> release, we could (and might well want to) try to do something more
> clever.

Works for me. You may wish to update comments within fd.c at the same time.

-- 
Peter Geoghegan




Re: [HACKERS] Declarative partitioning

2016-05-18 Thread Tom Lane
Amit Langote  writes:
> On 2016/05/18 2:22, Tom Lane wrote:
>> The two ways that we've dealt with this type of hazard are to copy data
>> out of the relcache before using it; or to give the relcache the
>> responsibility of not moving a particular portion of data if it did not
>> change.  From memory, the latter applies to the tuple descriptor and
>> trigger data, but we've done most other things the first way.

After actually looking at the code, we do things that way for the
tupledesc, the relation's rules if any, and RLS policies --- see
RelationClearRelation().

> It seems that tuple descriptor is reference-counted; however trigger data
> is copied.  The former seems to have been done on performance grounds (I
> found 06e10abc).

We do refcount tuple descriptors, but we've been afraid to try to rely
completely on that; there are too many places that assume a relcache
entry's tupdesc is safe to reference.  It's not that easy to go over to
a fully refcounted approach, because that creates a new problem of being
sure that refcounts are decremented when necessary --- that's a pain,
particularly when a query is abandoned due to an error.

> So for a performance-sensitive relcache data structure, refcounting is the
> way to go (although done quite rarely)?

I'd be suspicious of this because of the cleanup problem.  The
don't-replace-unless-changed approach is the one that's actually battle
tested.

regards, tom lane




Re: [HACKERS] foreign table batch inserts

2016-05-18 Thread Michael Paquier
On Wed, May 18, 2016 at 12:27 PM, Craig Ringer  wrote:
> On 18 May 2016 at 06:08, Michael Paquier  wrote:
>> > Wouldn’t it make sense to do the insert batch wise e.g. 100 rows ?
>>
>> Using a single query string with multiple values, perhaps, but after
>> that the query string length limit comes into consideration, particularly
>> for large text values... The query used for the insertion is a prepared
>> statement, since writable queries have been supported since 9.3, which
>> makes the code quite simple actually.
>
> This should be done how PgJDBC does batches. It'd require a libpq
> enhancement, but it's one we IMO need anyway: allow pipelined query
> execution from libpq.

That's also something that would be useful for the ODBC driver. Since
it is using libpq as a hard dependency and does not speak the protocol
directly, it is doing additional round trips to the server for this
exact reason when preparing a statement.

> [design follows]
> This would require libpq to be smarter about how it tracks queries. Right
> now it keeps track of current query, query results, etc directly in the
> connection object, and it sends a Sync after each operation then expects to
> wait in a busy state until it gets the results from that operation.

Yep.

> Instead we'd have to have a FIFO queue of messages libpq expects responses
> for. Variants of PQsendPrepare, PQsendQueryPrepared, PQsendDescribePrepared,
> etc would not send a Sync message and would append an entry to the expected
> result queue instead of setting the current query, etc on the connection.
> They'd still mark the connection as busy, so no non-queue-aware calls could
> be run until the queue is consumed and empty.

Yep. That's exactly the ODBC regression, which becomes a huge problem
with more latency.
-- 
Michael




Re: [HACKERS] Postgres_fdw join pushdown - getting server crash in left outer join of three table

2016-05-18 Thread Ashutosh Bapat
On Fri, May 13, 2016 at 10:14 AM, Robert Haas  wrote:

> On Wed, Mar 23, 2016 at 12:53 PM, Robert Haas 
> wrote:
> > On Wed, Mar 23, 2016 at 5:24 AM, Rajkumar Raghuwanshi
> >  wrote:
> >> Thanks Ashutosh for the patch. I have apply and retested it, now not
> getting
> >> server crash.
> >
> > Thanks for the report and the testing.  I have committed the patch.
>
> So it's been pointed out to me off-list that I have committed a
> different patch than the one I intended to commit - that is, the one
> Michael sent, not the one Ashutosh sent.  Oops.
>
> Reverting Michael's patch and applying Ashutosh's doesn't work any
> more due to conflicts in the regression tests.  And in re-rereviewing
> Ashutosh's patch I came to feel like the comments in this area needed
> a lot more work.
>

Thanks for improving comments. Those are much better than mine.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] explain analyze does not report actual rows correctly?

2016-05-18 Thread Tom Lane
chang chao  writes:
> The actual rows (rows=9950) part of the following line in the above
> query plan seems strange:
> "  ->  Sort  (cost=10.64..11.14 rows=200 width=520) (actual time=0.045..0.561 rows=9950 loops=1)"
> Shouldn't it be 200?

No, that's probably correct, seeing that this node is the righthand child
of a mergejoin.  The discrepancy is from extra fetches due to the same row
being re-fetched multiple times thanks to mark/restore rescanning.  What
explain is reporting is the number of rows pulled from the node, not the
number of unique rows it processed.

regards, tom lane




Re: [HACKERS] foreign table batch inserts

2016-05-18 Thread Craig Ringer
On 18 May 2016 at 06:08, Michael Paquier  wrote:


> > Wouldn’t it make sense to do the insert batch wise e.g. 100 rows ?
>
> Using a single query string with multiple values, perhaps, but after
> that the query string length limit comes into consideration, particularly
> for large text values... The query used for the insertion is a prepared
> statement, since writable queries have been supported since 9.3, which
> makes the code quite simple actually.
>

This should be done how PgJDBC does batches. It'd require a libpq
enhancement, but it's one we IMO need anyway: allow pipelined query
execution from libpq.

[design follows]

What this should be doing is:

- send Parse to create an unnamed prepared statement; then
- loop, and:
  - send a Bind & an Execute for the query with values if the send buffer
is not full
  - If there are no more values to send, send a Sync message
  - Receive and process results if the receive buffer is not empty
  - Check each result and mark it off against the list of dispatched queries
  - If an ERROR is received, bail out
  - If a Sync is received, check that all results have been retrieved as
expected then return OK

This would require libpq to be smarter about how it tracks queries. Right
now it keeps track of current query, query results, etc directly in the
connection object, and it sends a Sync after each operation then expects to
wait in a busy state until it gets the results from that operation.

Instead we'd have to have a FIFO queue of messages libpq expects responses
for. Variants of
PQsendPrepare, PQsendQueryPrepared, PQsendDescribePrepared, etc would not
send a Sync message and would append an entry to the expected result queue
instead of setting the current query, etc on the connection. They'd still
mark the connection as busy, so no non-queue-aware calls could be run until
the queue is consumed and empty.

These functions might return some kind of handle value that can be used to
identify the queue entry they created; it'd be pretty useless at the
moment, but needed if we ever get "cancel queries up to X" functionality on
the protocol or if we later added buffering of multiple query results.

A new PQsendSync or similar would be added to send a synchronisation point,
which would go into the FIFO. Clients would call that after enqueueing a
batch of work, e.g. after sending a commit for a batched xact. That's
required for error recovery.

Clients would use PQgetResults as before. When it returns null, they'd call
a new PQnextResult(...) function to initiate processing of the next
operation's input; this would pop the next operation from the FIFO, or
return null if there's nothing more in the queue. PQisBusy returns true
until there are no items left in the queue.

We'd still use the connection object for result sets, fetching rows, etc,
as there can still only be one "current" query for which a response is
being received from the server. Nothing much would change with PQgetResult
etc. There wouldn't be any PQgetResult variant to wait for results of the
nth query or for some kind of query handle, because we need the client to
consume the results of all prior queries. The client must process query
results in FIFO order. We could have per-query result buffers, etc, but it
seems pretty pointless; the client can do this for itself if it wants.

If the server sends an error, libpq would pop queries off the queue until
it gets to the Sync there, and consume input on the socket until it gets
to a Sync on the wire. PQgetResult for each queued operation so skipped
would return a state indicating that it didn't execute because of an error
in a prior operation.

Such an API would benefit immensely from the "cancel up to" functionality
we discussed here recently; without it, it's hard to cancel anything
reliably and know what exactly you're cancelling, but it doesn't need it.
The cancel problem isn't much worse than before.

If we wanted to allow batch execution from the sync API we'd need a new
function that takes a prepared query and an array of values and manages the
send and receive buffer polling using the async API internally, since we
need to use nonblocking sockets to avoid deadlocking.

I don't think this would look that different from current libpq code to the
user. Ignoring the details about error handling on command dispatch, etc.,
the app would just call a series of PQqueuePrepare, PQqueueQueryPrepared,
etc (bikeshed as desired), then PQsendSync(...). Then it'd call PQgetResults
until it returns null, call PQgetNextResult(...) and resume calling
PQgetResults(...). Repeat until PQgetNextResult(...) returns null, and
check that the most recent result was a PGRES_SYNC_OK, which is what we'd
return from processing a PQsendSync(...) result.
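As a rough illustration only: every PQqueue*, PQsendSync and PQgetNextResult
name and the PGRES_SYNC_OK status below are hypothetical, taken from the
design sketched above, and none of them exist in libpq today. Client code
might end up looking roughly like this:

/* Hypothetical batch of 100 inserts under the proposed queued API.
 * Assumes an open PGconn *conn; needs stdio.h and libpq-fe.h. */
char		buf[32];
const char *vals[1] = { buf };
int			i;

PQqueuePrepare(conn, "ins", "INSERT INTO t(a) VALUES ($1)", 1, NULL);
for (i = 0; i < 100; i++)
{
	snprintf(buf, sizeof(buf), "%d", i);
	PQqueueQueryPrepared(conn, "ins", 1, vals, NULL, NULL, 0);
}
PQsendSync(conn);				/* error-recovery point for the batch */

do
{
	PGresult   *res;

	while ((res = PQgetResults(conn)) != NULL)
	{
		/* the entry matching PQsendSync should report PGRES_SYNC_OK */
		PQclear(res);
	}
} while (PQgetNextResult(conn));	/* pop the next queued operation */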

If the client wants to be totally nonblocking it can do the PQconsumeInput
and PQflush dance as normal. It's strongly preferable for the client to use
non-blocking writes, because if it doesn't then it risks creating a

[HACKERS] explain analyze does not report actual rows correctly?

2016-05-18 Thread chang chao
Hi all

The test data and sql is in the following mail.

http://www.postgresql.org/message-id/sg2pr06mb114954351fa5dae7566854a984...@sg2pr06mb1149.apcprd06.prod.outlook.com

After I disabled hash join and nested-loop join (set enable_hashjoin = off;
set enable_nestloop = off;) and executed the query again, the query plan
changed to the following one.


"Merge Join  (cost=3179.24..3183.55 rows=200 width=1044) (actual 
time=2.409..8.969 rows= loops=1)"
"  Merge Cond: (als.node_no = l1.level1_no)"
"  ->  Merge Join  (cost=3168.59..3249.95 rows=200 width=528) (actual 
time=2.362..5.933 rows= loops=1)"
"Merge Cond: (als.node_no = l2.parent_no)"
"->  Index Scan using all_level_status_pkey on all_level_status als  
(cost=0.29..109.10 rows=200 width=4) (actual time=0.023..0.152 rows=200 
loops=1)"
"  Index Cond: (level = 1)"
"  Filter: (status = 0)"
"->  Materialize  (cost=3168.30..3218.30 rows= width=524) (actual 
time=2.334..3.888 rows= loops=1)"
"  ->  Sort  (cost=3168.30..3193.30 rows= width=524) (actual 
time=2.330..2.778 rows= loops=1)"
"Sort Key: l2.parent_no"
"Sort Method: quicksort  Memory: 853kB"
"->  Seq Scan on level2_table l2  (cost=0.00..144.99 
rows= width=524) (actual time=0.007..0.924 rows= loops=1)"
"  ->  Sort  (cost=10.64..11.14 rows=200 width=520) (actual time=0.045..0.561 
rows=9950 loops=1)"
"Sort Key: l1.level1_no"
"Sort Method: quicksort  Memory: 34kB"
"->  Seq Scan on level1_table l1  (cost=0.00..3.00 rows=200 width=520) 
(actual time=0.006..0.016 rows=200 loops=1)"
"Planning time: 1.379 ms"
"Execution time: 9.422 ms"

The actual rows (rows=9950) part of the following line in the above
query plan seems strange:
"  ->  Sort  (cost=10.64..11.14 rows=200 width=520) (actual time=0.045..0.561 rows=9950 loops=1)"
Shouldn't it be 200?

Chao.



Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Jeff Janes
On Wed, May 18, 2016 at 7:09 AM, Joe Conway  wrote:
> On 05/18/2016 09:55 AM, Victor Yegorov wrote:
>> 2016-05-18 16:45 GMT+03:00 Robert Haas:
>>
>> No, that's what the existing FREEZE option does.  This new option is
>> about unnecessarily vacuuming pages that don't need it.  The
>> expectation is that vacuuming all-frozen pages will be a no-op.
>>
>>
>> VACUUM (INCLUDING ALL) ?
>
> VACUUM (FORCE ALL) ?


How about going with something that says more about why we are doing
it, rather than trying to describe in one or two words what it is
doing?

VACUUM (FORENSIC)

VACUUM (DEBUG)

VACUUM (LINT)

Cheers,

Jeff




Re: [HACKERS] The rewriting of join conditions caused a very slow query plan.

2016-05-18 Thread chang chao
Hi, all

This is my ddl and test data.

CREATE TABLE level1_table
(
  level1_no serial NOT NULL ,
  level1_node_name varchar(255),
   PRIMARY KEY (level1_no)
) WITHOUT OIDS;

CREATE TABLE level2_table
(
  level2_no serial NOT NULL ,
  parent_no int NOT NULL,
  level1_node_name varchar(255),
   PRIMARY KEY (level2_no)
) WITHOUT OIDS;

ALTER TABLE level2_table
   ADD FOREIGN KEY (parent_no)
   REFERENCES level1_table (level1_no)
   ON UPDATE RESTRICT
   ON DELETE RESTRICT
;

CREATE TABLE all_level_status
(
  level int NOT NULL,
  node_no int NOT NULL,
  status int,
   PRIMARY KEY (level, node_no)
) WITHOUT OIDS;


--create test data
INSERT INTO level1_table(
level1_no)
select generate_series(1,200);

INSERT INTO level2_table(
level2_no, parent_no)
select
level2_no,level2_no/50+1 as parent_no
from
generate_series(1,) level2_no ;

INSERT INTO all_level_status(
level, node_no, status)
select 1,level1_no,0
from level1_table;

INSERT INTO all_level_status(
level, node_no, status)
select 2,level2_no,0
from level2_table;

--analyze table
analyze level1_table;
analyze level2_table;
analyze all_level_status;

--execute sql
explain analyze
select * from level2_table l2
join (
 select l1.* from level1_table l1
 join all_level_status als on (als.level=1 and als.node_no=l1.level1_no)
 where  als.status=0
) normal_l1 on l2.parent_no=normal_l1.level1_no;

result :
"Hash Join  (cost=16.39..198.92 rows=200 width=1044) (actual time=0.246..3.772 
rows= loops=1)"
"  Hash Cond: (l2.parent_no = als.node_no)"
"  ->  Seq Scan on level2_table l2  (cost=0.00..144.99 rows= width=524) 
(actual time=0.011..0.840 rows= loops=1)"
"  ->  Hash  (cost=16.34..16.34 rows=4 width=524) (actual time=0.226..0.226 
rows=200 loops=1)"
"Buckets: 1024  Batches: 1  Memory Usage: 8kB"
"->  Merge Join  (cost=10.93..16.34 rows=4 width=524) (actual 
time=0.077..0.193 rows=200 loops=1)"
"  Merge Cond: (als.node_no = l1.level1_no)"
"  ->  Index Scan using all_level_status_pkey on all_level_status 
als  (cost=0.29..109.10 rows=200 width=4) (actual time=0.021..0.069 rows=200 
loops=1)"
"Index Cond: (level = 1)"
"Filter: (status = 0)"
"  ->  Sort  (cost=10.64..11.14 rows=200 width=520) (actual 
time=0.055..0.062 rows=200 loops=1)"
"Sort Key: l1.level1_no"
"Sort Method: quicksort  Memory: 34kB"
"->  Seq Scan on level1_table l1  (cost=0.00..3.00 rows=200 
width=520) (actual time=0.007..0.020 rows=200 loops=1)"
"Planning time: 1.158 ms"
"Execution time: 4.054 ms"

The main reason for the big gap between the estimated (200) and actual ()
rows in line 1 is that, in actuality, rows for the join condition
column (node_no) in the all_level_status table are not evenly distributed
across the filter condition column (level), but in row estimation the
optimizer assumes that they are evenly distributed. So, as a lesson, I think
we should try to make the distribution of rows for the join condition column
even when possible.

Thus, can we draw the conclusion that the current structure of the
all_level_status table is a bad design and should be avoided?

Chao.

From: chang chao 
Sent: Monday, May 16, 2016 17:23
To: pgsql-hackers@postgresql.org
Subject: The rewriting of join conditions caused a very slow query plan.

Hi, all

I have a query that is very slow, and the reason may be in the rewriting of
join conditions.

This is the simplified version of the tables and the key part of the SQL.

level1_table and level2_table hold the tree data nodes,
and the all_level_status table holds the current status of all nodes of all levels.
(I know that there would be much less performance trouble if
all_level_status were divided into two tables, namely level1_status and
level2_status.)

table1: level1_table
  level1_no        PK: serial
  level1_node_name : varchar

table2: level2_table
  level2_no        PK: serial
  parent_no        FK to level1_table.level1_no
  level2_node_name : varchar

table3: all_level_status
  level   : 1 OR 2   PK1
  node_no : level1_table.level1_no or level2_table.level2_no   PK2
  status  : 0 OR 1 (normal or abnormal)


The SQL to find all level2 nodes whose parent nodes are in normal status:

explain analyze
select * from level2_table l2
join (
 select l1.* from level1_table l1
 join all_level_status als on (als.level=1 and als.node_no=l1.level1_no)
 where  als.status=0
) normal_l1 on l2.parent_no=normal_l1.level1_no;


This is the query plan:

"Merge Join  (cost=3.38..5.13 rows=3 width=158) (actual time=0.087..0.179 
rows=21 loops=1)"
"  Merge Cond: (als.node_no = l2.parent_no)"
"  ->  Merge Join  (cost=1.63..7.66 rows=19 width=80) (actual time=0.067..0.126 
rows=18 loops=1)"
"    Merge Cond: (als.node_no = l1.level1_no)"
"    ->  Index Scan using all_level_status_pkey 

Re: [HACKERS] Declarative partitioning

2016-05-18 Thread Ildar Musin

Hi Amit and all,

On 18.05.2016 04:26, Amit Langote wrote:

On 2016/05/18 2:22, Tom Lane wrote:

The two ways that we've dealt with this type of hazard are to copy data
out of the relcache before using it; or to give the relcache the
responsibility of not moving a particular portion of data if it did not
change.  From memory, the latter applies to the tuple descriptor and
trigger data, but we've done most other things the first way.

It seems that tuple descriptor is reference-counted; however trigger data
is copied.  The former seems to have been done on performance grounds (I
found 06e10abc).

So for a performance-sensitive relcache data structure, refcounting is the
way to go (although done quite rarely)?  In this particular case, it is a
"partition descriptor" that could get big for a partitioned table with
partitions in hundreds or thousands.

Thanks,
Amit


Here is an experimental patch that optimizes planning time for range-partitioned 
tables (it could be considered a "proof of concept"). 
The patch should be applied on top of Amit's declarative partitioning patch. 
It handles only a very special case (though an often-used one) where the 
partitioning key consists of just a single attribute and doesn't contain 
expressions.


The main idea is the following:
* we are looking for clauses like 'VAR OP CONST' (where VAR is a
partitioning key attribute and OP is a comparison operator);
* using binary search, find the partition (X) that fits the CONST value;
* based on the OP operator, determine which partitions are also covered by
the clause. The possible cases are:
   1. If OP is '<' or '<=' then we need the partitions standing left of X
(inclusive);
   2. If OP is '>' or '>=' then we need the partitions standing right of X
(inclusive);
   3. If OP is '=' then we need only partition X.
  (For the '<' and '>' operators we also check whether the CONST value is
equal to the lower or upper boundary, respectively, and if so exclude X.)

For boolean expressions we evaluate the left and right sides according to
the algorithm above and then, based on the boolean operator, take the
intersection (for AND) or union (for OR); a rough sketch follows.
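For concreteness, here is a rough C sketch of that idea for a single
integer partitioning key. The names and the bounds representation are
illustrative assumptions, not the patch's actual code:

/* Illustrative sketch of the pruning described above.  Assumes bounds[i]
 * is the inclusive lower bound of range partition i (sorted ascending,
 * contiguous ranges) and that val falls inside the partitioned domain. */
#include <stdbool.h>

typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT } CompareOp;

/* Binary search for the partition X whose range contains val. */
static int
find_partition(const int *bounds, int nparts, int val)
{
	int		lo = 0,
			hi = nparts - 1;

	while (lo < hi)
	{
		int		mid = (lo + hi + 1) / 2;

		if (val >= bounds[mid])
			lo = mid;
		else
			hi = mid - 1;
	}
	return lo;
}

/* Mark in keep[] the partitions covered by "key OP val". */
static void
covered_partitions(const int *bounds, int nparts, CompareOp op, int val,
				   bool *keep)
{
	int		x = find_partition(bounds, nparts, val);

	for (int i = 0; i < nparts; i++)
	{
		switch (op)
		{
			case OP_LT:
				/* exclude X itself when val sits exactly on X's lower bound */
				keep[i] = (i < x) || (i == x && val > bounds[x]);
				break;
			case OP_LE:
				keep[i] = (i <= x);
				break;
			case OP_EQ:
				keep[i] = (i == x);
				break;
			case OP_GE:
			case OP_GT:
				keep[i] = (i >= x);	/* the '>' boundary check is analogous */
				break;
		}
	}
}

For AND and OR, the two resulting keep[] sets would then be intersected or
unioned, as described above.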


I ran some benchmarks on:
1. the original constraint exclusion mechanism,
2. the optimized version (this patch), and
3. the optimized version using the relation->rd_partdesc pointer instead of
the RelationGetPartitionDesc() function (see previous discussion).


Initial conditions:

CREATE TABLE abc (id SERIAL NOT NULL, a INT, dt TIMESTAMP) PARTITION BY 
RANGE (a);

CREATE TABLE abc_1 PARTITION OF abc FOR VALUES START (0) END (1000);
CREATE TABLE abc_2 PARTITION OF abc FOR VALUES START (1000) END (2000);
...
etc
INSERT INTO %s (a) SELECT generate_series(0,  * 1000);

pgbench scripts:
https://gist.github.com/zilder/872e634a8eeb405bd045465fc9527e53
(where :partitions is the number of partitions).
The first script tests fetching a single row from the partitioned table.
Results (tps):


 # of partitions | constraint excl. | optimized | optimized (using pointer)
-----------------+------------------+-----------+---------------------------
             100 |              658 |      2906 |                      3079
            1000 |               45 |      2174 |                      3021
            2000 |               22 |      1667 |                      2919


The second script tests fetching all data from a single partition. 
Results (tps):


 # of partitions | constraint excl. | optimized | optimized (using pointer)
-----------------+------------------+-----------+---------------------------
             100 |              317 |      1001 |                      1051
            1000 |               34 |       941 |                      1023
            2000 |               15 |       813 |                      1016

The optimized version works much faster with a large number of partitions 
and degrades more slowly than constraint exclusion. But there is still 
noticeable performance degradation from copying the PartitionDesc structure: 
with 2000 partitions, the RelationGetPartitionDesc() function spent more than 
40% of all execution time on copying in the first benchmark (measured with 
`perf`). Using reference counting, as Amit suggests, would significantly 
reduce this degradation.


Any comments and suggestions are very welcome. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index abec2b8..cd4c727 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -47,6 +47,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+// #include "utils/rangeset.h"
 
 /*
  * Information about a single partition
@@ -77,6 +78,29 @@ typedef struct PartitionDescData
 	PartitionRangeBound **rangeuppers;	/* range upper bounds */
 } PartitionDescData;
 
+/*
+ * Context
+ */
+typedef struct walk_expr_tree_context
+{
+	Oid				parentId;
+	Index			index;
+	PartitionKey	key;
+	PartitionDesc	desc;
+} walk_expr_tree_context;
+
+typedef struct var_const_pair
+{
+	Var		*pvar;
+	Const	*pconst;
+} var_const_pair;
+
+typedef struct 

Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Joe Conway
On 05/18/2016 09:55 AM, Victor Yegorov wrote:
> 2016-05-18 16:45 GMT+03:00 Robert Haas:
> 
> No, that's what the existing FREEZE option does.  This new option is
> about unnecessarily vacuuming pages that don't need it.  The
> expectation is that vacuuming all-frozen pages will be a no-op.
> 
> 
> VACUUM (INCLUDING ALL) ?

VACUUM (FORCE ALL) ?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





[HACKERS] Improve BEGIN tab completion

2016-05-18 Thread Andreas Karlsson

Hi,

I noticed that the tab completion was not aware that TRANSACTION/WORK 
is optional in BEGIN, and that we do not complete [NOT] DEFERRABLE.


While fixing it I also improved the completion support for SET SESSION 
slightly.


Andreas
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a62ffe6..f07d598 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1873,8 +1873,11 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches5("ALTER", "GROUP", MatchAny, "ADD|DROP", "USER"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_roles);
 
-/* BEGIN, END, ABORT */
-	else if (Matches1("BEGIN|END|ABORT"))
+/* BEGIN */
+	else if (Matches1("BEGIN"))
+		COMPLETE_WITH_LIST6("WORK", "TRANSACTION", "ISOLATION LEVEL", "READ", "DEFERRABLE", "NOT DEFERRABLE");
+/* END, ABORT */
+	else if (Matches1("END|ABORT"))
 		COMPLETE_WITH_LIST2("WORK", "TRANSACTION");
 /* COMMIT */
 	else if (Matches1("COMMIT"))
@@ -2727,18 +2730,32 @@ psql_completion(const char *text, int start, int end)
 	/* Complete "SET TRANSACTION" */
 	else if (Matches2("SET|BEGIN|START", "TRANSACTION") ||
 			 Matches2("BEGIN", "WORK") ||
+			 Matches1("BEGIN") ||
 		  Matches5("SET", "SESSION", "CHARACTERISTICS", "AS", "TRANSACTION"))
-		COMPLETE_WITH_LIST2("ISOLATION LEVEL", "READ");
+		COMPLETE_WITH_LIST4("ISOLATION LEVEL", "READ", "DEFERRABLE", "NOT DEFERRABLE");
+	else if (Matches3("SET|BEGIN|START", "TRANSACTION|WORK", "NOT") ||
+			 Matches2("BEGIN", "NOT") ||
+			 Matches6("SET", "SESSION", "CHARACTERISTICS", "AS", "TRANSACTION", "NOT"))
+		COMPLETE_WITH_CONST("DEFERRABLE");
 	else if (Matches3("SET|BEGIN|START", "TRANSACTION|WORK", "ISOLATION") ||
+			 Matches2("BEGIN", "ISOLATION") ||
 			 Matches6("SET", "SESSION", "CHARACTERISTICS", "AS", "TRANSACTION", "ISOLATION"))
 		COMPLETE_WITH_CONST("LEVEL");
-	else if (Matches4("SET|BEGIN|START", "TRANSACTION|WORK", "ISOLATION", "LEVEL"))
+	else if (Matches4("SET|BEGIN|START", "TRANSACTION|WORK", "ISOLATION", "LEVEL") ||
+			 Matches3("BEGIN", "ISOLATION", "LEVEL") ||
+			 Matches7("SET", "SESSION", "CHARACTERISTICS", "AS", "TRANSACTION", "ISOLATION", "LEVEL"))
 		COMPLETE_WITH_LIST3("READ", "REPEATABLE READ", "SERIALIZABLE");
-	else if (Matches5("SET|BEGIN|START", "TRANSACTION|WORK", "ISOLATION", "LEVEL", "READ"))
+	else if (Matches5("SET|BEGIN|START", "TRANSACTION|WORK", "ISOLATION", "LEVEL", "READ") ||
+			 Matches4("BEGIN", "ISOLATION", "LEVEL", "READ") ||
+			 Matches8("SET", "SESSION", "CHARACTERISTICS", "AS", "TRANSACTION", "ISOLATION", "LEVEL", "READ"))
 		COMPLETE_WITH_LIST2("UNCOMMITTED", "COMMITTED");
-	else if (Matches5("SET|BEGIN|START", "TRANSACTION|WORK", "ISOLATION", "LEVEL", "REPEATABLE"))
+	else if (Matches5("SET|BEGIN|START", "TRANSACTION|WORK", "ISOLATION", "LEVEL", "REPEATABLE") ||
+			 Matches4("BEGIN", "ISOLATION", "LEVEL", "REPEATABLE") ||
+			 Matches8("SET", "SESSION", "CHARACTERISTICS", "AS", "TRANSACTION", "ISOLATION", "LEVEL", "REPEATABLE"))
 		COMPLETE_WITH_CONST("READ");
-	else if (Matches3("SET|BEGIN|START", "TRANSACTION|WORK", "READ"))
+	else if (Matches3("SET|BEGIN|START", "TRANSACTION|WORK", "READ") ||
+			 Matches2("BEGIN", "READ") ||
+			 Matches6("SET", "SESSION", "CHARACTERISTICS", "AS", "TRANSACTION", "READ"))
 		COMPLETE_WITH_LIST2("ONLY", "WRITE");
 	/* SET CONSTRAINTS */
 	else if (Matches2("SET", "CONSTRAINTS"))



Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Victor Yegorov
2016-05-18 16:45 GMT+03:00 Robert Haas :

> No, that's what the existing FREEZE option does.  This new option is
> about unnecessarily vacuuming pages that don't need it.  The
> expectation is that vacuuming all-frozen pages will be a no-op.
>

VACUUM (INCLUDING ALL) ?


-- 
Victor Y. Yegorov


Re: [HACKERS] memory layouts for binary search in nbtree

2016-05-18 Thread Simon Riggs
On 18 May 2016 at 14:25, Andres Freund  wrote:

> Hi,
>
> currently we IIRC use linearly sorted datums for the search in
> individual btree nodes.  Not surprisingly that's often one of the
> dominant entries in profiles. We could probably improve upon that by
> using an order more optimized for efficient binary search.
>
> See e.g.  http://cglab.ca/~morin/misc/arraylayout/ for benchmarks
> showing benefits.
>

There's some stuff from >10 years ago about cache-conscious btree layout as
well. That led to the adoption of 64kB pages in some benchmarks.

I think it's a good area of work.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Robert Haas
On Wed, May 18, 2016 at 9:42 AM, Joshua D. Drake  wrote:
>> It's not a bad thought, but I do think it might be a bit confusing.
>> My main priority for this new option is that people aren't tempted to
>> use it very often, and I think a name like "even_frozen_pages" is more
>> likely to accomplish that than just "frozen".
>
> freeze_all_pages?

No, that's what the existing FREEZE option does.  This new option is
about unnecessarily vacuuming pages that don't need it.  The
expectation is that vacuuming all-frozen pages will be a no-op.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Joshua D. Drake

On 05/18/2016 05:51 AM, Robert Haas wrote:

On Wed, May 18, 2016 at 8:41 AM, David Steele  wrote:

I think we should give this a name that hints more strongly at this
being an exceptional thing, like vacuum (even_frozen_pages).


How about just FROZEN?  Perhaps it's too confusing to have that and FREEZE,
but I thought I would throw it out there.


It's not a bad thought, but I do think it might be a bit confusing.
My main priority for this new option is that people aren't tempted to
use it very often, and I think a name like "even_frozen_pages" is more
likely to accomplish that than just "frozen".



freeze_all_pages?

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




[HACKERS] memory layouts for binary search in nbtree

2016-05-18 Thread Andres Freund
Hi,

currently we IIRC use linearly sorted datums for the search in
individual btree nodes.  Not surprisingly that's often one of the
dominant entries in profiles. We could probably improve upon that by
using an order more optimized for efficient binary search.

See e.g.  http://cglab.ca/~morin/misc/arraylayout/ for benchmarks
showing benefits.
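For reference, a minimal sketch of one such layout from that page, the
Eytzinger (BFS) order, where the implicit search tree is stored level by
level so the first few comparisons of every search touch the same small,
cache-resident prefix of the array. Illustrative only, not nbtree code:

#include <stddef.h>

/* a[1..n] holds keys arranged so the children of slot i live at slots 2i
 * and 2i+1 (slot 0 unused).  Returns the slot of the smallest key >= x,
 * or 0 if all keys are smaller.  Sketch only. */
static size_t
eytzinger_lower_bound(const int *a, size_t n, int x)
{
	size_t	i = 1,
			best = 0;

	while (i <= n)
	{
		if (a[i] >= x)
		{
			best = i;		/* remember candidate, look for a smaller one */
			i = 2 * i;		/* descend left */
		}
		else
			i = 2 * i + 1;	/* descend right */
	}
	return best;
}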

Andres




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Robert Haas
On Wed, May 18, 2016 at 8:41 AM, David Steele  wrote:
>> I think we should give this a name that hints more strongly at this
>> being an exceptional thing, like vacuum (even_frozen_pages).
>
> How about just FROZEN?  Perhaps it's too confusing to have that and FREEZE,
> but I thought I would throw it out there.

It's not a bad thought, but I do think it might be a bit confusing.
My main priority for this new option is that people aren't tempted to
use it very often, and I think a name like "even_frozen_pages" is more
likely to accomplish that than just "frozen".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread David Steele

On 5/18/16 6:37 AM, Robert Haas wrote:

On Tue, May 17, 2016 at 5:47 PM, Gavin Flower
 wrote:

On 18/05/16 09:34, Vik Fearing wrote:

On 17/05/16 21:32, Alvaro Herrera wrote:


Is SCAN_ALL really the best we can do here?  The business of having an
underscore in an option name has no precedent (other than
CURRENT_DATABASE and the like).


ALTER DATABASE has options for ALLOW_CONNECTIONS, CONNECTION_LIMIT, and
IS_TEMPLATE.


How about COMPLETE, TOTAL, or WHOLE?


Sure, I'll play this game.  I like EXHAUSTIVE.


I prefer 'WHOLE', as it seems more obvious (and not because of the pun
relating to 'wholesomeness'!!!)


I think that users might believe that they need VACUUM (WHOLE) a lot
more often than they will actually need this option.  "Of course I
want to vacuum my whole table!"

I think we should give this a name that hints more strongly at this
being an exceptional thing, like vacuum (even_frozen_pages).


How about just FROZEN?  Perhaps it's too confusing to have that and 
FREEZE, but I thought I would throw it out there.


--
-David
da...@pgmasters.net




Re: [HACKERS] Parallel query and temp_file_limit

2016-05-18 Thread David Rowley
On 18 May 2016 at 22:40, Robert Haas  wrote:

> On Tue, May 17, 2016 at 6:40 PM, Peter Geoghegan  wrote:
> > On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan  wrote:
> >> Fundamentally, since temporary_files_size enforcement simply
> >> piggy-backs on low-level fd.c file management, without any
> >> consideration of what the temp files contain, it'll be hard to be sure
> >> that parallel workers will not have issues. I think it'll be far
> >> easier to fix the problem then it would be to figure out if it's
> >> possible to get away with it.
> >
> > I'll write a patch to fix the issue, if there is a consensus on a
> solution.
>
> I think for 9.6 we just have to document this issue.  In the next
> release, we could (and might well want to) try to do something more
> clever.
>
> What I'm tempted to do is trying to document that, as a point of
> policy, parallel query in 9.6 uses up to (workers + 1) times the
> resources that a single session might use.  That includes not only CPU
> but also things like work_mem and temp file space.  This obviously
> isn't ideal, but it's what could be done by the ship date.
>

I was asked (internally, I believe) about abuse of work_mem during my work
on parallel aggregates; at the time I didn't really feel like I was abusing
it any more than parallel hash join was.  My thought was that one day it
would be nice if work_mem could be granted to a query and we had some query
marshal system which ensured that the total grants did not exceed the
server-wide memory dedicated to work_mem. Of course that's lots of work, as
there's at least one node (HashAgg) which can still blow out work_mem on
bad estimates. For this release, I assumed it wouldn't be too big an issue
if we're shipping with max_parallel_degree = 0, as we could just decorate
the docs with warnings that work_mem is per node / per worker, to caution
users against setting this setting any higher. That might be enough to give
us wriggle room for the future where we can make improvements, so I agree
with Robert: the docs seem like the best solution for 9.6.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel query and temp_file_limit

2016-05-18 Thread Robert Haas
On Tue, May 17, 2016 at 6:40 PM, Peter Geoghegan  wrote:
> On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan  wrote:
>> Fundamentally, since temporary_files_size enforcement simply
>> piggy-backs on low-level fd.c file management, without any
>> consideration of what the temp files contain, it'll be hard to be sure
>> that parallel workers will not have issues. I think it'll be far
>> easier to fix the problem then it would be to figure out if it's
>> possible to get away with it.
>
> I'll write a patch to fix the issue, if there is a consensus on a solution.

I think for 9.6 we just have to document this issue.  In the next
release, we could (and might well want to) try to do something more
clever.

What I'm tempted to do is trying to document that, as a point of
policy, parallel query in 9.6 uses up to (workers + 1) times the
resources that a single session might use.  That includes not only CPU
but also things like work_mem and temp file space.  This obviously
isn't ideal, but it's what could be done by the ship date.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Robert Haas
On Tue, May 17, 2016 at 5:47 PM, Gavin Flower
 wrote:
> On 18/05/16 09:34, Vik Fearing wrote:
>> On 17/05/16 21:32, Alvaro Herrera wrote:
>>>
>>> Is SCAN_ALL really the best we can do here?  The business of having an
>>> underscore in an option name has no precedent (other than
>>> CURRENT_DATABASE and the like).
>>
>> ALTER DATABASE has options for ALLOW_CONNECTIONS, CONNECTION_LIMIT, and
>> IS_TEMPLATE.
>>
>>> How about COMPLETE, TOTAL, or WHOLE?
>>
>> Sure, I'll play this game.  I like EXHAUSTIVE.
>
> I prefer 'WHOLE', as it seems more obvious (and not because of the pun
> relating to 'wholesomeness'!!!)

I think that users might believe that they need VACUUM (WHOLE) a lot
more often than they will actually need this option.  "Of course I
want to vacuum my whole table!"

I think we should give this a name that hints more strongly at this
being an exceptional thing, like vacuum (even_frozen_pages).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [COMMITTERS] pgsql: Allocate all page images at once in generic wal interface

2016-05-18 Thread Mikael Kjellström



Could somebody explain to me what's going on?


That seems entirely unrelated to what you changed, and curculio's next
run failed even more bizarrely:

commands/explain.o: could not read symbols: File format not recognized
collect2: ld returned 1 exit status


As that is my animal I will take a look at it.  I agree that the error 
message is very bizarre.



It looks to me like that machine is suffering disk or filesystem
problems.


Looks like it.  I have taken it offline until I have had time to 
investigate more.


/Mikael

