Re: Is there a cache consistent interface to tables ?

2018-02-09 Thread Gary M
Thanks for the replies.

@Craig

Realtime? Well, not really; I guess it depends on your definition of realtime. I
usually think of micro- to nanoseconds as real-time. If I were still
designing chips, I'd be calling picoseconds real-time these days.

Thank you for the recommendation on performance tuning tools. I am familiar
with OS tuning; I spent many years as a performance consultant targeting OS
integration and array controller design for major storage vendors. Most
don't consider major design changes, but that's another story.

"Expect to do some work on latency spikes - scheduling issues, checkpoints,
etc."
Yes, I have been bumping into them. I'm trying to profile those issues as
I'm typing this.

"How sure are you that it's not viable for SQL queries?"
I'm topping out at 20K inserts/s due to locking and scheduling issues.
There are peaks reaching 50K inserts, but not sustained. I also placed the
log on a separate disk to prevent collisions. I have backing storage with
50us latency and 400 MB/s bandwidth, so backing storage is not the issue.

"And, if not, what makes you think that a lower level interface will help
you? "
Well, I was thinking SQL parsing and planning are unnecessary steps that
can be removed from the pipeline. Instead of parsing the data stream to
another format and creating a query, I'll just have the stream parser(s)
generate the table structure and write directly to the tables.

 "Has profiling and tracing/timing shown that significant time/delays are
arising from layers you can bypass in a sensible way?"
"Sensible" is why I'm posting the question here.  I'm not familiar enough
with the code and processing pipelines to understand subtleties effecting
high volume insert performance.  There is a lot of "stuff" going on, much
of it event driven.  My first impulse is a pipeline scheduler taking
advantage of processor affinity.  Its an ugly, brute force approach, but it
does work.

"You can probably gain a fair bit with some caching of all the type and
relation oids etc, but of course you must ensure you subscribe to the
necessary relcache/syscache invalidations and act on them appropriately.
See inval.[ch] ."
Roger.. Thank you...

"You'll definitely want to batch into txns and use async commit. But beware
the data durability implications."
I'm assuming batches based on block size.
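
For what it's worth, here is a minimal libpq sketch of that kind of batching
(the connection string, table name and batch size are placeholders, and error
handling is reduced to a bare minimum):

/*
 * Hypothetical batching sketch: one transaction per batch, with asynchronous
 * commit to trade a small durability window for lower commit latency.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=sensors");
    PGresult   *res;
    int         i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    PQclear(PQexec(conn, "SET synchronous_commit = off"));
    PQclear(PQexec(conn, "BEGIN"));
    for (i = 0; i < 1000; i++)
    {
        res = PQexec(conn, "INSERT INTO readings VALUES (now(), 42)");
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}

In practice COPY or prepared statements would replace the plain INSERTs; the
transaction boundaries and synchronous_commit=off are the relevant parts here.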

"BDR and pglogical do some of this, you can take a look at them for some
ideas/examples."
Thanks... I'll look at them today...

"Make sure you have a buffering layer that can accumulate rows if there's a
DB failure/outage etc. Otherwise you can never, ever, ever upgrade,
diagnostics and maintenance are harder, etc. Don't fire-and-forget. It can
be a simple producer/consumer that writes sequentially to a collection of
buffer files or whatever."

The final design will likely require some type of shared block storage.
I've always liked the Solaris ZFS LVM layer, although it is not distributed.
Fire-and-forget is not an option here. I was hoping to leverage existing
Postgres facilities for that heavy lifting. That's why I had
originally looked at the WAL interface.

I'm also looking at approaches from a project called "Bottled-Water":
https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/

One more fact I forgot to add: the insert load into the database is about
2 KB/record, or about 200 MB/s.

thank you
gary

On Fri, Feb 9, 2018 at 7:40 AM, Craig Ringer  wrote:

> On 9 February 2018 at 15:56, Garym  wrote:
>
>> Hi,
>> This is an odd request for help. I'm looking to expose an interface so an
>> external app can insert to a table while maintaining cache consistency and
>> inserts be promoted via wal.
>>
>> I need to support about 100k+ inserts/sec from a sensor data stream. It
>> simply won't work using sql queries.  If the call overhead is too high for
>> single calls, multiple records per call is better. The data must be
>> available for selects in 500ms.  I currently only have 24 GB RAM for pg, but
>> production will be 56 GB.
>>
>> I'm taking this approach because pgpool2 chokes, delaying past
>> requirements. I initially wanted to use wal, but masters don't want wal in
>> feeds and slaves have unpredictable delays of seconds before provisioning
>> occurs.
>>
>>
> So you're looking to use Pg in a near-realtime application?
>
> Expect to do some work on latency spikes - scheduling issues, checkpoints,
> etc. I strongly advise you to spend some quality time getting faimiliar
> with perf, DTrace, systemtap, Linux eBPF tracing (
> http://www.brendangregg.com/ebpf.html), or the like. Tuning of kernel
> options related to I/O and writeback is likely to be needed, also scheduler
> and memory settings.
>
> How sure are you that it's not viable for SQL queries? And, if not, what
> makes you think that a lower level interface will help you? Has profiling
> and tracing/timing shown that significant time/delays are arising from
> layers you can bypass in a sensible way?

Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-09 Thread Thomas Munro
On Sat, Feb 10, 2018 at 4:32 PM, Thomas Munro
 wrote:
> I agree that it would be nice if the build farm (and my unofficial
> patch tester for that matter) could automatically test the LDAP stuff
> when running on a suitable system, but I think it would need to be
> based not just on ifeq ($(with_ldap),yes) as you showed.  I think the
> test would need a way to report that it can't find slapd so it can't
> run, but not consider that to be a hard failure.  Or something like
> that...

Hmm.  I guess I just changed the subject a bit there to running those
tests from check-world, if possible, by default.  But as
src/test/Makefile says, those test suites "are not secure to run on a
multi-user system".  You're talking about making the tests not fail if
you tried to run them directly (not from check-world, but by direct
invocation) when you didn't build with the right options.  I take back
what I said: it's probably better if you run those tests explicitly
when you know you have the prerequisites installed and you're OK with
the security implications (for example I should probably just do that
on those Travis CI patch tester builds).  Sorry for the noise.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-09 Thread Thomas Munro
On Sat, Feb 10, 2018 at 1:07 PM, Michael Paquier  wrote:
> On Thu, Feb 08, 2018 at 09:28:10AM -0500, Tom Lane wrote:
>> Michael Paquier  writes:
>> That seems like possibly not such a great idea.  If somebody were using
>> a run of src/test/ssl to check their build, they would now get no
>> notification if they'd forgotten to build --with-openssl.
>>
>> Perhaps we could introduce some new targets, "check-if-enabled" or so,
>> that act like you want.
>
> Per this argument, we need to do something even for check and
> installcheck anyway, no?  Except that what you are suggesting is to make
> the tests fail instead of silently being bypassed.  Copying an
> expression you used recently, this boils down to not spending CPU cycles
> for nothing.  The TAP tests showing in red all over your screen don't
> show any useful information either, as one may be building with SSL
> support and still getting failures because he/she is working on an
> SSL-related feature.
>
> Personally, I prefer making the tests not fail: when building without
> SSL, one needs to go back and run ./configure again, so he likely knows
> what he is doing.  Bypassing them also has the advantage of not requiring
> failure-check dances, particularly in bash when temporarily using set
> +e/-e to avoid a problem, so this makes things easier for most cases.

One complication with the LDAP tests is that
src/test/ldap/t/001_auth.pl has greater requirements than --with-ldap.
For example, on a Debian system you probably only need the
libldap2-dev package installed for --with-ldap to build, but
001_auth.pl also requires the slapd package to be installed (the
OpenLDAP server).  On some other systems including macOS and maybe
AIX, a conforming LDAP client library (maybe OpenLDAP or maybe some
other implementation) is part of the base system but I don't think
slapd is.

I agree that it would be nice if the build farm (and my unofficial
patch tester for that matter) could automatically test the LDAP stuff
when running on a suitable system, but I think it would need to be
based not just on ifeq ($(with_ldap),yes) as you showed.  I think the
test would need a way to report that it can't find slapd so it can't
run, but not consider that to be a hard failure.  Or something like
that...

It might also be a good idea if you could tell 001_auth.pl where slapd
and openssl (used by 001_auth.pl to make certificates) are through
environment variables, in case anyone wants to run those tests against
an installation other than one in the hardcoded paths we told it about
for Linux, macOS (with Brew) and FreeBSD.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] MERGE SQL Statement for PG11

2018-02-09 Thread Tomas Vondra
Hi,

On 02/07/2018 10:24 AM, Pavan Deolasee wrote:
> 
> ...
>
> Here is v15 of the patch.
>

I've been looking at this version of the patch, mostly to educate myself
before attempting to write the "status summary".

One bit that I don't quite understand is GetXactWALBytes(). It pretty
much just returns XactLastRecEnd and is used in ExecMerge like this:

    int64   startWAL = GetXactWALBytes();
    bool    qual = ExecQual(action->whenqual, econtext);

    /*
     * SQL Standard says that WHEN AND conditions must not
     * write to the database, so check we haven't written
     * any WAL during the test. Very sensible that is, since
     * we can end up evaluating some tests multiple times if
     * we have concurrent activity and complex WHEN clauses.
     *
     * XXX If we had some clear form of functional labelling
     * we could use that, if we trusted it.
     */
    if (startWAL < GetXactWALBytes())
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot write to database ...")));

I think this actually fails to enforce the rule, because some writes may
not produce WAL (think of unlogged tables). I also suspect it may be
incorrect "in the opposite direction" because a query may not do any
changes and yet it may produce WAL (e.g. due to wal_hint_bits=true).

So we may need to think of a different way to enforce this ...

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ldapi support

2018-02-09 Thread Peter Eisentraut
On 2/8/18 22:56, Thomas Munro wrote:
> The test doesn't actually succeed in reloading the pg_hba.conf
> file though:
> 
> 2018-02-09 16:41:15.886 NZDT [24472] LOG:  received SIGHUP, reloading
> configuration files
> 2018-02-09 16:41:15.893 NZDT [24472] LOG:  unsupported LDAP URL scheme: ldapi
> 2018-02-09 16:41:15.893 NZDT [24472] LOG:  pg_hba.conf was not reloaded

Hmm.  I think the ldap test suite should be changed to use
$node->restart instead of $node->reload, so we can be sure that the
various pg_hba.conf lines are actually accepted.  It appears that doing
so wouldn't impact the run time significantly.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Wait event names mismatch: oldserxid

2018-02-09 Thread Michael Paquier
On Fri, Feb 09, 2018 at 07:33:57PM +0530, Ashutosh Bapat wrote:
> I didn't pay attention to the second one. Thanks for pointing that
> out. But then like me a user may first land on OldSerXidLock since
> that is first in the list and get confused. May be we should use names
> which are not prefix of others.

Yeah, I don't think that it would be good to have users abusing patterns
like ~ 'oldserxid' to track different event types.  Having one name be a
prefix of another encourages such filters on pg_stat_activity.
--
Michael




Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-09 Thread Michael Paquier
On Thu, Feb 08, 2018 at 09:28:10AM -0500, Tom Lane wrote:
> Michael Paquier  writes:
> That seems like possibly not such a great idea.  If somebody were using
> a run of src/test/ssl to check their build, they would now get no
> notification if they'd forgotten to build --with-openssl.
> 
> Perhaps we could introduce some new targets, "check-if-enabled" or so,
> that act like you want.

Per this argument, we need to do something even for check and
installcheck anyway, no?  Except that what you are suggesting is to make
the tests fail instead of silently being bypassed.  Copying an
expression you used recently, this boils down to not spending CPU cycles
for nothing.  The TAP tests showing in red all over your screen don't
show any useful information either, as one may be building with SSL
support and still getting failures because he/she is working on an
SSL-related feature.

Personally, I prefer making the tests not fail: when building without
SSL, one needs to go back and run ./configure again, so he likely knows
what he is doing.  Bypassing them also has the advantage of not requiring
failure-check dances, particularly in bash when temporarily using set
+e/-e to avoid a problem, so this makes things easier for most cases.
--
Michael




Re: Built-in connection pooling

2018-02-09 Thread Ryan Pedela
On Fri, Feb 9, 2018 at 4:14 PM, Shay Rojansky  wrote:

> I am a bit late to this thread, sorry if I'm slightly rehashing things. I'd
> like to go back to basics on this.
>
> Unless I'm mistaken, at least in the Java and .NET world, clients are
> almost always expected to have their own connection pooling, either
> implemented inside the driver (ADO.NET model) or as a separate modular
> component (JDBC). This approach has a few performance advantages:
>
> 1. "Opening" a new pooled connection is virtually free - no TCP connection
> needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of
> synchronization).
> 2. Important client state can be associated with physical connections. For
> example, prepared statements can be tracked on the physical connection, and
> persisted when the connection is returned to the pool. The next time the
> physical connection is returned from the pool, if the user tries to
> server-prepare a statement, we can check on the connection if it has
> already been prepared in a "previous lifetime", and if so, no need to
> prepare again. This is vital for scenarios with short-lived (pooled)
> connections, such as web. Npgsql does this.
>
> Regarding the problem of idle connections being kept open by clients, I'd
> argue it's a client-side problem. If the client is using a connection pool,
> the pool should be configurable to close idle connections after a certain
> time (I think this is relatively standard behavior). If the client isn't
> using a pool, it seems to be the application's responsibility to release
> connections when they're no longer needed.
>
> The one drawback is that the pooling is application-specific, so it can't
> be shared by multiple applications/hosts. So in some scenarios it may make
> sense to use both client pooling and proxy/server pooling.
>
> To sum it up, I would argue that connection pooling should first and
> foremost be considered as a client feature, rather than a proxy feature
> (pgpool) or server feature (the PostgreSQL pooling being discussed here).
> This isn't to say server-side pooling has no value though.
>

Recently, I did a large amount of parallel data processing where the
results were stored in PG. I had about 1000 workers each with their own PG
connection. As you pointed out, application pooling doesn't make sense in
this scenario. I tried pgpool and pgbouncer, and both ended up as the
bottleneck. Overall throughput was not great but it was highest without a
pooler. That aligns with Konstantin's benchmarks too. As far as I know,
server pooling is the only solution to increase throughput, without
upgrading hardware, for this use case.

I hope this PR gets accepted!


Re: Built-in connection pooling

2018-02-09 Thread Shay Rojansky
I am a bit late to this thread, sorry if I'm slightly rehashing things. I'd
like to go back to basics on this.

Unless I'm mistaken, at least in the Java and .NET world, clients are
almost always expected to have their own connection pooling, either
implemented inside the driver (ADO.NET model) or as a separate modular
component (JDBC). This approach has a few performance advantages:

1. "Opening" a new pooled connection is virtually free - no TCP connection
needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of
synchronization).
2. Important client state can be associated with physical connections. For
example, prepared statements can be tracked on the physical connection, and
persisted when the connection is returned to the pool. The next time the
physical connection is returned from the pool, if the user tries to
server-prepare a statement, we can check on the connection if it has
already been prepared in a "previous lifetime", and if so, no need to
prepare again. This is vital for scenarios with short-lived (pooled)
connections, such as web. Npgsql does this.

Regarding the problem of idle connections being kept open by clients, I'd
argue it's a client-side problem. If the client is using a connection pool,
the pool should be configurable to close idle connections after a certain
time (I think this is relatively standard behavior). If the client isn't
using a pool, it seems to be the application's responsibility to release
connections when they're no longer needed.

The one drawback is that the pooling is application-specific, so it can't
be shared by multiple applications/hosts. So in some scenarios it may make
sense to use both client pooling and proxy/server pooling.

To sum it up, I would argue that connection pooling should first and
foremost be considered as a client feature, rather than a proxy feature
(pgpool) or server feature (the PostgreSQL pooling being discussed here).
This isn't to say server-side pooling has no value though.


Re: [PATCH][PROPOSAL] Add enum releation option type

2018-02-09 Thread Alvaro Herrera
Nikolay Shaplov wrote:

> I found out that all relation options of string type in current postgres
> are actually behaving as "enum" type.

If this patch gets in, I wonder if there are any external modules that
use actual strings.  A hypothetical example would be something like an
SSL cipher list; it needs to be free-form enough that an enum would
not cut it.  If there are such modules, then even if we remove all
existing in-core use cases we should keep the support code for strings.
Maybe we need some in-core user to verify the string case still works.
A new module in src/test/modules perhaps?  On the other hand, if we can
find no use for these string reloptions, maybe we should just remove the
support, since as I recall it's messy enough.

> [...] But each time this behavior is implemented as a validate function
> plus strcmp to compare the option value against one of the possible
> values.
> 
> I think it is not the best practice. It is better to have an enum type
> where it is technically an enum, and keep the string type for further
> usage (for example for some kind of index patterns or something like this).

Agreed with the goal, for code simplicity and hopefully reducing
code duplication.

> Possible flaws:
> 
> 1. I've changed the error message from 'Valid values are "XXX", "YYY" and
> "ZZZ".' to 'Valid values are "XXX", "YYY", "ZZZ".' to make the code a bit
> simpler. If it is not acceptable, please let me know, I will add "and" to
> the string.

I don't think we care about this, but is this still the case if you use
a stringinfo?

> 2. Also about the string with the list of acceptable values: the code that 
> creates this string is inside parse_one_reloption function now.

I think you could save most of that mess by using appendStringInfo and
friends.

I don't much like the way you've represented the list of possible values
for each enum.  I think it'd be better to have a struct that represents
everything about each value (string name and C symbol.  Maybe the
numerical value too if that's needed, but is it?  I suppose all code
should use the C symbol only, so why do we care about the numerical
value?).
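
To sketch what I mean, a per-value descriptor could look roughly like this
(the struct and the sample option values are invented for illustration, not
taken from the patch):

/* hypothetical per-value descriptor for an enum reloption */
typedef struct relopt_enum_element
{
    const char *name;       /* string accepted in the WITH (...) clause */
    int         symval;     /* C-level symbol the string maps to */
} relopt_enum_element;

/* example table for a made-up option with three allowed values */
enum my_option_value
{
    MY_OPTION_ON,
    MY_OPTION_OFF,
    MY_OPTION_AUTO
};

static const relopt_enum_element my_option_elements[] =
{
    {"on", MY_OPTION_ON},
    {"off", MY_OPTION_OFF},
    {"auto", MY_OPTION_AUTO},
    {NULL, 0}               /* list terminator */
};

The list of acceptable values for error messages could then be built by
walking such an array with appendStringInfo, as suggested above.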

-- 
Álvaro Herrera   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2018-02-09 Thread Andres Freund
Hi,

First off: This patch has way too many different types of changes as
part of one huge commit. This needs to be split into several
pieces. First the cleanups (e.g. the fields -> flag changes), then the
individual infrastructure pieces (like the twophase.c changes, best
split into several pieces as well, the locking stuff), then the main
feature, then support for it in the output plugin.  Each should have an
individual explanation about why the change is necessary and not a bad
idea.


On 2018-02-06 17:50:40 +0530, Nikhil Sontakke wrote:
> @@ -46,6 +48,9 @@ typedef struct
>   boolskip_empty_xacts;
>   boolxact_wrote_changes;
>   boolonly_local;
> + booltwophase_decoding;
> + booltwophase_decode_with_catalog_changes;
> + int decode_delay; /* seconds to sleep after every 
> change record */

This seems too big a crock to add just for testing. It'll also make the
testing timing dependent...

>  } TestDecodingData;

>  void
>  _PG_init(void)
> @@ -85,9 +106,15 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
>   cb->begin_cb = pg_decode_begin_txn;
>   cb->change_cb = pg_decode_change;
>   cb->commit_cb = pg_decode_commit_txn;
> + cb->abort_cb = pg_decode_abort_txn;

>   cb->filter_by_origin_cb = pg_decode_filter;
>   cb->shutdown_cb = pg_decode_shutdown;
>   cb->message_cb = pg_decode_message;
> + cb->filter_prepare_cb = pg_filter_prepare;
> + cb->filter_decode_txn_cb = pg_filter_decode_txn;
> + cb->prepare_cb = pg_decode_prepare_txn;
> + cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
> + cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
>  }

Why does this introduce both abort_cb and abort_prepared_cb? That seems
to conflate two separate features.


> +/* Filter out unnecessary two-phase transactions */
> +static bool
> +pg_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> + TransactionId xid, const char *gid)
> +{
> + TestDecodingData *data = ctx->output_plugin_private;
> +
> + /* treat all transactions as one-phase */
> + if (!data->twophase_decoding)
> + return true;
> +
> + if (txn && txn_has_catalog_changes(txn) &&
> + !data->twophase_decode_with_catalog_changes)
> + return true;

What? I'm INCREDIBLY doubtful this is a sane thing to expose to output
plugins. As in, unless I hear a very very convincing reason I'm strongly
opposed.


> +/*
> + * Check if we should continue to decode this transaction.
> + *
> + * If it has aborted in the meanwhile, then there's no sense
> + * in decoding and sending the rest of the changes, we might
> + * as well ask the subscribers to abort immediately.
> + *
> + * This should be called if we are streaming a transaction
> + * before it's committed or if we are decoding a 2PC
> + * transaction. Otherwise we always decode committed
> + * transactions
> + *
> + * Additional checks can be added here, as needed
> + */
> +static bool
> +pg_filter_decode_txn(LogicalDecodingContext *ctx,
> +ReorderBufferTXN *txn)
> +{
> + /*
> +  * Due to caching, repeated TransactionIdDidAbort calls
> +  * shouldn't be that expensive
> +  */
> + if (txn != NULL &&
> + TransactionIdIsValid(txn->xid) &&
> + TransactionIdDidAbort(txn->xid))
> + return true;
> +
> + /* if txn is NULL, filter it out */

Why can this be NULL?

> + return (txn != NULL)? false:true;
> +}


This definitely shouldn't be a task for each output plugin. Even if we
want to make this configurable, I'm doubtful that it's a good idea to do
so here - it makes it much less likely to hit edge cases.



>  static bool
>  pg_decode_filter(LogicalDecodingContext *ctx,
>RepOriginId origin_id)
> @@ -409,8 +622,18 @@ pg_decode_change(LogicalDecodingContext *ctx, 
> ReorderBufferTXN *txn,
>   }
>   data->xact_wrote_changes = true;
>  
> + if (!LogicalLockTransaction(txn))
> + return;

It really really can't be right that this is exposed to output plugins.


> + /* if decode_delay is specified, sleep with above lock held */
> + if (data->decode_delay > 0)
> + {
> + elog(LOG, "sleeping for %d seconds", data->decode_delay);
> + pg_usleep(data->decode_delay * 100L);
> + }

Really not on board.




> @@ -1075,6 +1077,21 @@ EndPrepare(GlobalTransaction gxact)
>   Assert(hdr->magic == TWOPHASE_MAGIC);
>   hdr->total_len = records.total_len + sizeof(pg_crc32c);
>  
> + replorigin = (replorigin_session_origin != InvalidRepOriginId &&
> +   replorigin_session_origin != 
> DoNotReplicateId);
> +
> + if (replorigin)
> + {
> + Assert(replorigin_session_origin_lsn != InvalidXLogRecPtr);
> + 

Re: Add PGDLLIMPORT to enable_hashagg

2018-02-09 Thread Robert Haas
On Wed, Feb 7, 2018 at 6:32 AM, Metin Doslu  wrote:
> i. The list of Pascal (max_worker_processes was already with
> PGDLLIMPORT, so I also added to max_parallel_workers)
> ii. Some others in cost.h to make the file more readable.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: configure issue - warnings sort: No such file or directory

2018-02-09 Thread mxbi
Hi,

Were you ever able to solve this issue? I am encountering exactly the same
problem on Ubuntu 16.04.

Thank you,
Mikel.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: [COMMITTERS] pgsql: Rearm statement_timeout after each executed query.

2018-02-09 Thread Robert Haas
On Wed, Feb 7, 2018 at 7:04 PM, Thomas Munro
 wrote:
> On Tue, Feb 6, 2018 at 5:21 PM, Peter Eisentraut
>  wrote:
>> On 9/18/17 22:41, Andres Freund wrote:
>>> Rearm statement_timeout after each executed query.
>>
>> This appears to have broken statement_timeout behavior in master such
>> that only every second query is affected by it.
>
> Yeah, I also just ran into this while testing a nearby complaint about
> statement timeouts vs parallel query.  In the error path
> stmt_timeout_active remains true, so the next statement does nothing
> in enable_statement_timeout().  I think we just need to clear that
> flag in the error path, right where we call disable_all_timeouts().
> See attached.

Looks right.  Committed, but I thought the comment was strange (forget
about?) so I just left that out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2018-02-09 Thread Robert Haas
On Fri, Feb 9, 2018 at 7:53 AM, Amit Kapila  wrote:
> I am not saying to allow other things.  I am just replying to your
> question that why can't we use PQsetSingleRowMode.  I mean to say that
> one can fetch the data parallelly via the usage of cursors (by having
> restrictions like don't allow other parallel unsafe statements) in
> plpgsql.

I'm pretty sure that other parallel-unsafe statements is an
insufficient restriction.  I think we're just going around in circles
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-09 Thread Robert Haas
On Fri, Feb 9, 2018 at 1:33 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Me neither.  I just ran the postgres_fdw regression tests 713 times in
>> a row without a failure.  Tom, since you seem to be able to reproduce
>> the problem locally, could you have a look at this proposed fix?
>
> I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
> a sleep.  The attached is enough to reproduce rhinoceros' results
> for me.

Not for me, but when I pushed the pg_sleep up to 180 seconds, then it failed.

With the proposed patch, it passes repeatedly for me with no sleep,
and also passes for me with the sleep.  So I guess I'll commit this
and see what the buildfarm thinks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compiling with LLVM v10.0

2018-02-09 Thread Dmitry Dolgov
> On 8 February 2018 at 21:26, Thomas Munro  
> wrote:
> On Fri, Feb 9, 2018 at 3:14 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> $ ./configure --prefix=/build/postgres-jit/ --with-llvm
>> --enable-debug --enable-depend --enable-cassert
>
>> /usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This
>> file requires compiler and library support for the ISO C++ 2011
>> standard. This support must be enabled with the -std=c++11 or
>> -std=gnu++11 compiler options.
>
> Did you try passing CXXFLAGS="-std=c++11" to configure?

Yes, it solved the issue, thanks.



Re: [HACKERS] Transactions involving multiple postgres foreign servers

2018-02-09 Thread Robert Haas
On Thu, Feb 8, 2018 at 3:58 AM, Masahiko Sawada  wrote:
>> Overall, what's the status of this patch?  Are we hung up on this
>> issue only, or are there other things?
>
> AFAIK there is no more technical issue in this patch so far other than
> this issue. The patch has tests and docs, and includes all stuff to
> support atomic commit to distributed transactions: the introducing
> both the atomic commit ability to distributed transactions and some
> corresponding FDW APIs, and having postgres_fdw support 2pc. I think
> this patch needs to be reviewed, especially the functionality of
> foreign transaction resolution which is re-designed before.

OK.  I'm going to give 0002 a read-through now, but I think it would
be a good thing if others also contributed to the review effort.
There is a lot of code here, and there are a lot of other patches
competing for attention.  That said, here we go:

In the documentation for pg_prepared_fdw_xacts, the first two columns
have descriptions ending in a preposition.  That's typically to be
avoided in formal writing.  The first one can be fixed by moving "in"
before "which".  The second can be fixed by changing "that" to "with
which" and then dropping the trailing "with".  The first three columns
have descriptions ending in a period; the latter two do not.  Make it
consistent with whatever the surrounding style is, or at least
internally consistent if the surrounding style varies.  Also, some of
the descriptions begin with "the" and others do not; again, seems
better to be consistent and adhere to surrounding style.  The
documentation of the serverid column seems to make no sense.  Possibly
you mean "OID of the foreign server on which this foreign transaction
is prepared"?  As it is, you use "foreign server" twice, which is why
I'm confused.

The documentation of max_prepared_foreign_transactions seems a bit
brief.  I think that if I want to be able to use 2PC for N
transactions each across K servers, this variable needs to be set to
N*K, not just N.  That's not the right way to word it here, but
somehow you probably want to explain that a single local transaction
can give rise to multiple foreign transactions and that this should be
taken into consideration when setting a value for this variable.
Maybe also include a link to where the user can find more information,
which brings me to another point: there doesn't seem to be any
general, user-facing explanation of this system.  You explain the
catalogs, the GUCs, the interface, etc. but there's nothing anywhere
that explains the overall point of the whole thing, which seems pretty
important. The closest thing you've got is a description for people
writing FDWs, but we really need a place to explain the concept to
*users*.  One idea is to add a new chapter to the "Server
Administration" section, maybe just after "Logical Replication" and
before "Regression Tests".  But I'm open to other ideas.

It's important that the documentation of the various GUCs provide
users with some clue as to how to set them.  I notice this
particularly for foreign_transaction_resolution_interval; off the top
of my head, without having read the rest of this patch, I don't know
why I shouldn't want this to be zero.  But the others could use more
explanation as well.

It is unclear from reading the documentation for GetPreparedId why
this should be the responsibility of the FDW, or how the FDW is
supposed to guarantee uniqueness.

PrepareForignTransaction is spelled wrong.  Nearby typos: prepareing,
tranasaction.  A bit further down, "can _prepare" has an unwanted
space in the middle.  Various places in this section capitalize
certain words in the middle of sentences which is not standard English
style.  For example, in "Every Foreign Data Wrapper is required..."
the word "Every" is appropriately capitalized because it begins a
sentence, but there's no reason to capitalize the others.  Likewise
for "...employs Two-phase commit protocol..." and other similar cases.

EndForeignTransaction doesn't explain what sorts of things the FDW can
legally do when called, or how this method is intended to be used.
Those seem like important details.  Especially, an FDW that tries to
do arbitrary stuff during abort cleanup will probably cause big
problems.

The fdw-transactions section of the documentation seems to imply that
henceforth every FDW must call FdwXactRegisterForeignServer, which I
think is an unacceptable API break.

It doesn't seem advisable to make this behavior depend on
max_prepared_foreign_transactions.  I think that it should be an
server-level option: use 2PC for this server, or not?  FDWs that don't
have 2PC default to "no"; but others can be set to "yes" if the user
wishes.  But we shouldn't force that decision to be on a cluster-wide
basis.

+ shows the functions
+available for foreign transaction managements.

management

+Resolve a foreign transaction. This function search foreign transaction

searches 

Re: JIT compiling with LLVM v9.0

2018-02-09 Thread Andres Freund
On 2018-02-09 09:10:25 -0600, Merlin Moncure wrote:
> Question:  when watching the compilation log, I see quite a few files
> being compiled with both O2 and O1, for example:
> 
> clang -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -Wno-unused-command-line-argument -O2 -O1
> -Wno-ignored-attributes -Wno-unknown-warning-option
> -Wno-ignored-optimization-argument -I../../../../src/include
> -D_GNU_SOURCE -I/home/mmoncure/llvm/include -DLLVM_BUILD_GLOBAL_ISEL
> -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
> -D__STDC_LIMIT_MACROS  -flto=thin -emit-llvm -c -o nbtsort.bc
> nbtsort.c
> 
> Is this intentional?  (didn't check standard compilation, it just jumped out).

It stemms from the following hunk in Makefile.global.in about emitting
bitcode:
# Add -O1 to the options as clang otherwise will emit 'noinline'
# attributes everywhere, making JIT inlining impossible to test in a
# debugging build.
#
# FIXME: While LLVM will re-optimize when emitting code (after
# inlining), it'd be better to only do this if -O0 is specified.
%.bc : CFLAGS +=-O1

%.bc : %.c
$(COMPILE.c.bc) -o $@ $<

Inspecting the clang source code, it seems impossible to stop clang from
emitting noinline attributes for every function at -O0.

I think it makes sense to change this to filtering out -O0 and only
adding -O1 if that's not present. :/

Greetings,

Andres Freund



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-09 Thread Tom Lane
Robert Haas  writes:
> Me neither.  I just ran the postgres_fdw regression tests 713 times in
> a row without a failure.  Tom, since you seem to be able to reproduce
> the problem locally, could you have a look at this proposed fix?

I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep.  The attached is enough to reproduce rhinoceros' results
for me.

regards, tom lane

diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 0b2c528..3f8bd6d 100644
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
*** INSERT INTO ft2 (c1,c2,c3)
*** 1133,1138 
--- 1133,1139 
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;-- can't be pushed down
  UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
+ select pg_sleep(60);
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c3 = 'baz'
FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)


Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-09 Thread Robert Haas
On Fri, Feb 9, 2018 at 5:24 AM, Etsuro Fujita
 wrote:
> I tried to reproduce that in my environment, but I couldn't.  On reflection
> I think an easy and reliable way to address that concern is to use local
> stats on foreign tables.  Attached is a patch for that.

Me neither.  I just ran the postgres_fdw regression tests 713 times in
a row without a failure.  Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2018-02-09 Thread Claudio Freire
On Fri, Feb 9, 2018 at 10:32 AM, Alvaro Herrera  wrote:
> Claudio Freire wrote:
>> On Thu, Feb 8, 2018 at 8:39 PM, Alvaro Herrera  
>> wrote:
>
>> During the process of developing the patch, I got seriously broken
>> code that passed the tests nonetheless. The test as it was was very
>> ineffective at actually detecting issues.
>>
>> This new test may be slow, but it's effective. That's a very good
>> reason to make it slower, if you ask me.
>
> OK, I don't disagree with improving the test, but if we can make it fast
> *and* effective, that's better than slow and effective.

I'd love to have a test that uses multiple segments of dead tuples,
but for that, it needs to use more than 64MB of mwm. That amounts to,
basically, ~12M rows.

Is there a "slow test suite" where such a test could be added that
won't bother regular "make check"?

That, or we turn the initial segment size into a GUC, but I don't
think it's a useful GUC outside of the test suite.

>> > 3. Figure out the minimum size for the table that triggers the behavior
>> >you want.  Right now you use 400k tuples -- maybe 100k are sufficient?
>> >Don't know.
>>
>> For that test, I need enough *dead* tuples to cause several passes.
>> Even small mwm settings require tons of tuples for this. In fact, I'm
>> thinking that number might be too low for its purpose, even. I'll
>> re-check, but I doubt it's too high. If anything, it's too low.
>
> OK.

Turns out that it was a tad oversized. 300k tuples seems enough.

Attached is a new patch version that:

- Uses an unlogged table to make the large mwm test faster
- Uses a wait_barrier helper that waits for concurrent transactions
  to finish before vacuuming tables, to make sure deleted tuples
  actually are vacuumable
- Tweaks the size of the large mwm test to be as small as possible
- Optimizes the delete to avoid expensive operations yet attain
  the same end result
From 20ca670215a7bf63f1bc2290efd5a89bcafabb20 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH 1/2] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.

Improve test ability to spot vacuum errors
---
 src/backend/commands/vacuumlazy.c| 402 ---
 src/test/regress/expected/vacuum.out |  72 ++-
 src/test/regress/sql/vacuum.sql  |  40 +++-
 3 files changed, 438 insertions(+), 76 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index cf7f5e1..1e82d26 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -11,11 +11,24 @@
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
+ * initially allocate an array of TIDs of 128MB, or an upper limit that
  * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
+ * uselessly for vacuuming small tables). Additional arrays of increasingly
+ * large sizes are allocated as they become necessary.
+ *
+ * The TID array is thus represented as a list of multiple segments of
+ * varying size, beginning with the initial size of up to 128MB, and growing
+ * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
+ * is used up.
+ *
+ * Lookup in that structure happens with a binary search of segments, and then
+ * with a binary search within each segment. Since segment's size grows
+ * exponentially, this retains O(log N) lookup complexity.
+ *
+ * If the multiarray's total size threatens to grow beyond maintenance_work_mem,
  * we suspend the heap scan phase and perform a pass of index cleanup and page
- * compaction, then resume the heap scan with an empty TID array.
+ * compaction, then resume the heap scan with an array of logically empty but
+ * already preallocated TID segments to be refilled with more dead tuple TIDs.
  *
  * If we're processing a table with no indexes, we can just vacuum each page
  * as we go; there's no need to save up multiple tuples to minimize the number
@@ -92,6 +105,14 @@
 #define LAZY_ALLOC_TUPLES		MaxHeapTuplesPerPage
 
 /*
+ * Minimum (starting) size of the dead_tuples array segments. Will allocate
+ * space for 128MB worth of tid pointers in the first segment, further segments
+ * will grow in size exponentially. Don't make it too small or the segment list
+ * will grow bigger than the sweetspot for search efficiency on big vacuums.
+ */
+#define LAZY_INIT_TUPLES		Max(MaxHeapTuplesPerPage, 

Re: [HACKERS] Bug in to_timestamp().

2018-02-09 Thread Dmitry Dolgov
> On 7 February 2018 at 22:51, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> On 6 February 2018 at 10:17, Arthur Zakirov  wrote:
>> It is strange. I still can apply both v9 [1] and v10 [2] via 'git
>> apply'. And Patch Tester [3] says that it is applied. But maybe
>> it is because of my git (git version 2.16.1).
>>
>> You can try also 'patch -p1':
>
> Yes, looks like the problem is on my side, sorry.

I went through this thread, and want to summarize a bit:

From what I see, this patch addresses the most important concerns that were
mentioned in the thread, i.e. it makes `to_timestamp` less confusing and
closer to Oracle's behavior. The code itself looks clear and sufficient, with
the required documentation and green tests.

Looks like there are just two questions left so far:

* I've noticed what I think is a difference between the current behavior
introduced in this patch and Oracle. In the following case we can have any
number of spaces after a separator `+` in the input string:

SELECT to_timestamp('2000+JUN', 'YYYY/MON');
      to_timestamp
------------------------
 2000-06-01 00:00:00+02
(1 row)

SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
      to_timestamp
------------------------
 2000-06-01 00:00:00+02
(1 row)

But no spaces before it (it actually depends on how many separators we have
in the format string):

SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
ERROR:  22007: invalid value "+JU" for "MON"
DETAIL:  The given value did not match any of the allowed values
for this field.
LOCATION:  from_char_seq_search, formatting.c:2410

SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
      to_timestamp
------------------------
 2000-06-01 00:00:00+02
(1 row)

SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
ERROR:  22007: invalid value "+JU" for "MON"
DETAIL:  The given value did not match any of the allowed values
for this field.
LOCATION:  from_char_seq_search, formatting.c:2410

Judging from this http://rextester.com/l/oracle_online_compiler, in Oracle
it's possible to have any number of spaces before or after `+`, independently
of the number of separators in the format string. Is this intended?

SELECT to_timestamp('2000  +  JUN', 'YYYY/MON') FROM dual
01.06.2000 00:00:00

* About the usage of locale-dependent functions, e.g. `isalpha`. Yes, it looks
like it's better to have a locale-agnostic implementation, but then I'm
confused - all functions except `isdigit`/`isxdigit` are locale-dependent,
including `isspace`, which is also in use.
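
For illustration, locale-agnostic (ASCII-only) replacements for those
classifiers could be as simple as the following hypothetical helpers; this is
just to show what "locale-agnostic" could mean here, not code from the patch:

#include <stdbool.h>

static inline bool
ascii_isalpha(unsigned char ch)
{
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
}

static inline bool
ascii_isspace(unsigned char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' ||
           ch == '\r' || ch == '\v' || ch == '\f';
}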



Re: Built-in connection pooling

2018-02-09 Thread Konstantin Knizhnik
Attached please find a new version of built-in connection pooling
supporting temporary tables and session GUCs.

Also Win32 support was added.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 93c4bbf..dfc072c 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -194,6 +194,7 @@ char	   *namespace_search_path = NULL;
 /* Local functions */
 static void recomputeNamespacePath(void);
 static void InitTempTableNamespace(void);
+static Oid  GetTempTableNamespace(void);
 static void RemoveTempRelations(Oid tempNamespaceId);
 static void RemoveTempRelationsCallback(int code, Datum arg);
 static void NamespaceCallback(Datum arg, int cacheid, uint32 hashvalue);
@@ -441,9 +442,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (strcmp(newRelation->schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(newRelation->schemaname, false);
@@ -452,9 +451,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 	else if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 	else
 	{
@@ -463,8 +460,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -2902,9 +2898,7 @@ LookupCreationNamespace(const char *nspname)
 	if (strcmp(nspname, "pg_temp") == 0)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 
 	namespaceId = get_namespace_oid(nspname, false);
@@ -2967,9 +2961,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (strcmp(schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(schemaname, false);
@@ -2982,8 +2974,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -3250,8 +3241,11 @@ void
 SetTempNamespaceState(Oid tempNamespaceId, Oid tempToastNamespaceId)
 {
 	/* Worker should not have created its own namespaces ... */
-	Assert(myTempNamespace == InvalidOid);
-	Assert(myTempToastNamespace == InvalidOid);
+	if (!ActiveSession)
+	{
+		Assert(myTempNamespace == InvalidOid);
+		Assert(myTempToastNamespace == InvalidOid);
+	}
 	Assert(myTempNamespaceSubID == InvalidSubTransactionId);
 
 	/* Assign same namespace OIDs that leader has */
@@ -3771,6 +3765,22 @@ recomputeNamespacePath(void)
 	list_free(oidlist);
 }
 
+static Oid
+GetTempTableNamespace(void)
+{
+	if (ActiveSession)
+	{
+		if (!OidIsValid(ActiveSession->tempNamespace))
+			InitTempTableNamespace();
+	}
+	else
+	{
+		if (!OidIsValid(myTempNamespace))
+			InitTempTableNamespace();
+	}
+	return myTempNamespace;
+}
+
 /*
  * InitTempTableNamespace
  *		Initialize temp table namespace on first use in a particular backend
@@ -3782,8 +3792,6 @@ InitTempTableNamespace(void)
 	Oid			namespaceId;
 	Oid			toastspaceId;
 
-	Assert(!OidIsValid(myTempNamespace));
-
 	/*
 	 * First, do permission check to see if we are authorized to make temp
 	 * tables.  We use a nonstandard error message here since "databasename:
@@ -3822,7 +3830,10 @@ InitTempTableNamespace(void)
 (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
  errmsg("cannot create temporary tables during a parallel operation")));
 
-	snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
+	if (ActiveSession)
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d_%s", MyBackendId, ActiveSession->id);
+	else
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);
 
 	namespaceId = get_namespace_oid(namespaceName, true);
 	if (!OidIsValid(namespaceId))
@@ -3854,8 +3865,10 @@ InitTempTableNamespace(void)
 	 * it. (We assume there is no need to clean it out if it does exist, since
 	 * dropping a parent 

Re: JIT compiling with LLVM v9.0

2018-02-09 Thread Merlin Moncure
On Thu, Feb 1, 2018 at 8:16 PM, Thomas Munro
 wrote:
> On Fri, Feb 2, 2018 at 2:05 PM, Andres Freund  wrote:
>> On 2018-02-01 09:32:17 -0800, Jeff Davis wrote:
>>> On Wed, Jan 31, 2018 at 12:03 AM, Konstantin Knizhnik
>>>  wrote:
>>> > The same problem takes place with old versions of GCC: I have to upgrade 
>>> > GCC
>>> > to 7.2 to make it possible to compile this code.
>>> > The problem in not in compiler itself, but in libc++ headers.
>>>
>>> How can I get this branch to compile on ubuntu 16.04? I have llvm-5.0
>>> and gcc-5.4 installed. Do I need to compile with clang or gcc? Any
>>> CXXFLAGS required?
>>
>> Just to understand: You're running in the issue with the header being
>> included from within the extern "C" {}?  Hm, I've pushed a quick fix for
>> that.
>
> That change wasn't quite enough: to get this building against libc++
> (Clang's native stdlb) I also needed this change to llvmjit.h so that
>  wouldn't be included with the wrong linkage (perhaps
> you can find a less ugly way):
>
> +#ifdef __cplusplus
> +}
> +#endif
>  #include 
> +#ifdef __cplusplus
> +extern "C"
> +{
> +#endif

This did the trick -- thanks.  Sitting through 20-minute, computer-crashing
link times really brings back C++ nightmares -- if anyone
else needs to compile llvm/clang as I did (I'm stuck on 3.2 with my
aging mint box), I strongly encourage you to use the gold linker.

Question:  when watching the compilation log, I see quite a few files
being compiled with both O2 and O1, for example:

clang -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -Wno-unused-command-line-argument -O2 -O1
-Wno-ignored-attributes -Wno-unknown-warning-option
-Wno-ignored-optimization-argument -I../../../../src/include
-D_GNU_SOURCE -I/home/mmoncure/llvm/include -DLLVM_BUILD_GLOBAL_ISEL
-D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
-D__STDC_LIMIT_MACROS  -flto=thin -emit-llvm -c -o nbtsort.bc
nbtsort.c

Is this intentional?  (didn't check standard compilation, it just jumped out).

merlin



Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

2018-02-09 Thread Claudio Freire
On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada  wrote:
> On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire  
> wrote:
>> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada  
>> wrote:
>>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire  
>>> wrote:
 I can look into doing 3, that *might* get rid of the need to do that
 initial FSM vacuum, but all other intermediate vacuums are still
 needed.
>>>
>>> Understood. So how about that this patch focuses only make FSM vacuum
>>> more frequently and leaves the initial FSM vacuum and the handling
>>> cancellation cases? The rest can be a separate patch.
>>
>> Ok.
>>
>> Attached split patches. All but the initial FSM pass is in 1, the
>> initial FSM pass as in the original patch is in 2.
>>
>> I will post an alternative patch for 2 when/if I have one. In the
>> meanwhile, we can talk about 1.
>
> Thank you for updating the patch!
>
> +        /* Tree pruning for partial vacuums */
> +        if (threshold)
> +        {
> +            child_avail = fsm_get_avail(page, slot);
> +            if (child_avail >= threshold)
> +                continue;
> +        }
>
> I think slots in a non-leaf page might not have a correct value
> because fsm_set_avail doesn't propagate up beyond pages. So do we need
> to check the root of the child page rather than a slot in the non-leaf
> page? I might be missing something though.

I'm not sure I understand what you're asking.

Yes, a partial FSM vacuum won't correct those inconsistencies. It
doesn't matter though, because as long as the root node (or whichever
inner node we're looking at the time) reports free space, the lookup
for free space will recurse and find the lower nodes, even if they
report more space than the inner node reported. The goal of making
free space discoverable will have been achieved nonetheless.

The final FSM vacuum pass isn't partial, to finally correct all those
small inconsistencies.



Re: CALL stmt, ERROR: unrecognized node type: 113 bug

2018-02-09 Thread David G. Johnston
On Fri, Feb 9, 2018 at 7:42 AM, Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Fri, Feb 9, 2018 at 6:23 AM, Michael Paquier 
> wrote:
> >> On Fri, Feb 09, 2018 at 12:02:57PM +0100, Pavel Stehule wrote:
> >>> Blocking subqueries in CALL parameters is possible solution.
>
> > To me this feels like an interaction between two features that users are
> > going to expect to just work.
>
> Meh.  It doesn't look significantly different to me than the restriction
> that you can't have sub-selects in CHECK expressions, index expressions,
> etc.  Obviously we need a clean failure like you get for those cases.
> But otherwise it's an OK restriction that stems from exactly the same
> cause: we do not want to invoke the full planner in this context (and
> even if we did, we don't want to use the full executor to execute the
> result).
>

Does/Should:

CALL test(func(10)); --with or without an extra set of parentheses

work here too?

David J.


Re: CALL stmt, ERROR: unrecognized node type: 113 bug

2018-02-09 Thread Tom Lane
"David G. Johnston"  writes:
> On Fri, Feb 9, 2018 at 6:23 AM, Michael Paquier  wrote:
>> On Fri, Feb 09, 2018 at 12:02:57PM +0100, Pavel Stehule wrote:
>>> Blocking subqueries in CALL parameters is possible solution.

> To me this feels like an interaction between two features that users are
> going to expect to just work.

Meh.  It doesn't look significantly different to me than the restriction
that you can't have sub-selects in CHECK expressions, index expressions,
etc.  Obviously we need a clean failure like you get for those cases.
But otherwise it's an OK restriction that stems from exactly the same
cause: we do not want to invoke the full planner in this context (and
even if we did, we don't want to use the full executor to execute the
result).

regards, tom lane



Re: Is there a cache consistent interface to tables ?

2018-02-09 Thread Craig Ringer
On 9 February 2018 at 15:56, Garym  wrote:

> Hi,
> This is an odd request for help. I'm looking to expose an interface so an
> external app can insert to a table while maintaining cache consistency and
> inserts be promoted via wal.
>
> I need to support about 100k+ inserts/sec from a sensor data stream. It
> simply won't work using sql queries.  If the call overhead is too high for
> single calls, multiple records per call is better. The data must be
> available for selects in 500ms.  I currently only have 24 GB RAM for pg, but
> production will be 56 GB.
>
> I'm taking this approach because pgpool2 chokes, delaying past
> requirements. I initially wanted to use wal, but masters don't want wal in
> feeds and slaves have unpredictable delays of seconds before provisioning
> occurs.
>
>
So you're looking to use Pg in a near-realtime application?

Expect to do some work on latency spikes - scheduling issues, checkpoints,
etc. I strongly advise you to spend some quality time getting familiar
with perf, DTrace, systemtap, Linux eBPF tracing (
http://www.brendangregg.com/ebpf.html), or the like. Tuning of kernel
options related to I/O and writeback is likely to be needed, as are
scheduler and memory settings.

How sure are you that it's not viable for SQL queries? And, if not, what
makes you think that a lower level interface will help you? Has profiling
and tracing/timing shown that significant time/delays are arising from
layers you can bypass in a sensible way?

You definitely *can* use the heapam and indexam at a lower level to form
tuples and insert into tables, then update the indexes. See genam.c for one
example, but it's optimised for ease of use more than tight performance
AFAIK. You're looking for heap_open, heap_form_tuple, heap_insert, etc.
Beware of index maintenance.
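
A rough, untested sketch of that lower-level path (placeholder names
throughout; index and trigger maintenance, snapshot setup and error
handling are all omitted, so treat it as an illustration of the API
rather than working code):

    #include "postgres.h"
    #include "access/heapam.h"
    #include "access/htup_details.h"
    #include "utils/rel.h"

    /* Insert one pre-built row into a table, bypassing SQL entirely. */
    static void
    insert_one_row(Oid my_table_oid, Datum *values, bool *isnull)
    {
        Relation    rel;
        HeapTuple   tup;

        rel = heap_open(my_table_oid, RowExclusiveLock);

        /* Build a tuple matching the relation's tuple descriptor. */
        tup = heap_form_tuple(RelationGetDescr(rel), values, isnull);

        /* Write it to the heap; this wraps heap_insert and does WAL logging. */
        simple_heap_insert(rel, tup);

        /* Index maintenance is NOT done here -- see ExecInsertIndexTuples. */

        heap_freetuple(tup);
        heap_close(rel, RowExclusiveLock);
    }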

You can probably gain a fair bit with some caching of all the type and
relation oids etc, but of course you must ensure you subscribe to the
necessary relcache/syscache invalidations and act on them appropriately.
See inval.[ch] .
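
As an illustration only, a cache like that could register for relcache
invalidations roughly like this (untested; cached_relid and the callback
name are made up for the example):

    #include "postgres.h"
    #include "utils/inval.h"

    static Oid cached_relid = InvalidOid;

    /* Called whenever a relcache entry is invalidated. */
    static void
    my_relcache_cb(Datum arg, Oid relid)
    {
        /* InvalidOid means "all relations were invalidated". */
        if (relid == InvalidOid || relid == cached_relid)
            cached_relid = InvalidOid;   /* force a fresh lookup next time */
    }

    void
    my_cache_init(void)
    {
        CacheRegisterRelcacheCallback(my_relcache_cb, (Datum) 0);
    }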

You'll definitely want to batch into txns and use async commit. But beware
the data durability implications.
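
Purely as a hedged illustration of the batching idea from a libpq client
(the helper names and batch handling are placeholders; remember that with
synchronous_commit = off the most recent commits can be lost on a crash):

    #include <libpq-fe.h>

    static void
    run(PGconn *conn, const char *sql)
    {
        PGresult *res = PQexec(conn, sql);
        PQclear(res);               /* error checking omitted for brevity */
    }

    /* Send one batch of INSERT statements in a single async-commit txn. */
    static void
    flush_batch(PGconn *conn, const char *const *inserts, int n)
    {
        run(conn, "SET synchronous_commit = off");
        run(conn, "BEGIN");
        for (int i = 0; i < n; i++)
            run(conn, inserts[i]);
        run(conn, "COMMIT");
    }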

BDR and pglogical do some of this, you can take a look at them for some
ideas/examples.

Make sure you have a buffering layer that can accumulate rows if there's a
DB failure/outage etc. Otherwise you can never, ever, ever upgrade,
diagnostics and maintenance are harder, etc. Don't fire-and-forget. It can
be a simple producer/consumer that writes sequentially to a collection of
buffer files or whatever.
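
A deliberately naive sketch of the "simple producer" half of such a
buffering layer, just to make the idea concrete (the file name, record
framing and durability policy are assumptions, not recommendations):

    #include <stdio.h>
    #include <string.h>

    /* Append one length-prefixed record to the current spool file. */
    static int
    spool_record(FILE *spool, const char *rec, size_t len)
    {
        if (fwrite(&len, sizeof(len), 1, spool) != 1 ||
            fwrite(rec, 1, len, spool) != len)
            return -1;
        return fflush(spool);       /* fsync() here if you need durability */
    }

    int
    main(void)
    {
        FILE       *spool = fopen("buffer-000001.dat", "ab");
        const char *rec = "one serialized sensor reading";

        if (spool == NULL)
            return 1;
        spool_record(spool, rec, strlen(rec));
        fclose(spool);
        return 0;
    }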

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: CALL stmt, ERROR: unrecognized node type: 113 bug

2018-02-09 Thread Pavel Stehule
2018-02-09 15:15 GMT+01:00 David G. Johnston :

> On Fri, Feb 9, 2018 at 6:23 AM, Michael Paquier 
> wrote:
>
>> On Fri, Feb 09, 2018 at 12:02:57PM +0100, Pavel Stehule wrote:
>> > 2018-02-09 7:56 GMT+01:00 Michael Paquier :
>> > > The second problem involves a cache lookup failure for a type when
>> > > trying to use pg_get_functiondef on a procedure.  Luckily, it is
>> > > possible to make the difference between a procedure and a function by
>> > > checking if prorettype is InvalidOid or not.  There is room for a new
>> > > patch which supports pg_get_proceduredef() to generate the definition
>> of
>> > > a procedure, with perhaps a dedicated psql shortcut, but that could
>> > > always be done later on.
>> >
>> > Blocking subqueries in CALL parameters is possible solution.
>>
>> ExecuteCallStmt has visibly been written so as it is able to handle the
>> input set of arguments with a minimal infrastructure in place.  SubLink
>> nodes require more advanced handling as those need to go through the
>> planner.  There is also additional processing in the rewriter.  At the
>> end I tend to think that any user would just turn their back on calling
>> a function for such cases anyway, so it seems to me that the potential
>> benefits are not worth the code complexity.
>>
>
> CALL is not just a different syntax for function invocation - if you want
> the properties of CALL then falling back to SELECT function() is not a
> valid alternative.
>

+1


>
> To me this feels like an interaction between two features that users are
> going to expect to just work.  Current discussions lead me to think that is
> something we strive to provide unless a strong argument against is
> provided.  I'm not sure added code complexity here is going to make the
> grade even if I cannot reasonably judge just how much complexity is
> involved.
>

When a procedure can do transaction control, or (in the future) can return
a set or multiple result sets, then I agree 100% that it is a different
creature than a function. But if not, then it should be specified why it is
different from a void function.

I don't understand why we should prohibit subqueries as procedure
parameters - within some limits. I can understand a requirement to not
change any data.

Regards

Pavel




>
> David J.
>
>


Re: CALL stmt, ERROR: unrecognized node type: 113 bug

2018-02-09 Thread David G. Johnston
On Fri, Feb 9, 2018 at 6:23 AM, Michael Paquier  wrote:

> On Fri, Feb 09, 2018 at 12:02:57PM +0100, Pavel Stehule wrote:
> > 2018-02-09 7:56 GMT+01:00 Michael Paquier :
> > > The second problem involves a cache lookup failure for a type when
> > > trying to use pg_get_functiondef on a procedure.  Luckily, it is
> > > possible to make the difference between a procedure and a function by
> > > checking if prorettype is InvalidOid or not.  There is room for a new
> > > patch which supports pg_get_proceduredef() to generate the definition
> of
> > > a procedure, with perhaps a dedicated psql shortcut, but that could
> > > always be done later on.
> >
> > Blocking subqueries in CALL parameters is possible solution.
>
> ExecuteCallStmt has visibly been written so as it is able to handle the
> input set of arguments with a minimal infrastructure in place.  SubLink
> nodes require more advanced handling as those need to go through the
> planner.  There is also additional processing in the rewriter.  At the
> end I tend to think that any user would just turn their back on calling
> a function for such cases anyway, so it seems to me that the potential
> benefits are not worth the code complexity.
>

CALL is not just a different syntax for function invocation - if you want
the properties of CALL then falling back to SELECT function() is not a
valid alternative.

To me this feels like an interaction between two features that users are
going to expect to just work.  Current discussions lead me to think that is
something we strive to provide unless a strong argument against is
provided.  I'm not sure added code complexity here is going to make the
grade even if I cannot reasonably judge just how much complexity is
involved.

David J.


Re: Wait event names mismatch: oldserxid

2018-02-09 Thread Ashutosh Bapat
On Fri, Feb 9, 2018 at 7:23 PM, Michael Paquier  wrote:
> On Fri, Feb 09, 2018 at 06:04:39PM +0530, Ashutosh Bapat wrote:
>> Name for wait event "LWTRANCHE_OLDSERXID_BUFFERS" is printed as
>> "oldserxid", but documentation at
>> https://www.postgresql.org/docs/10/static/monitoring-stats.html does
>> not have exact same event there. Instead it has
>>
>> OldSerXidLock Waiting to read or record conflicting serializable
>> transactions.
>
> I see two events defined here in the code of type LWLock dedicated to
> oldserxid:
> - OldSerXidLock which is a wait event defined as it is part of
> LWLockNames.
> - oldserxid, which gets defined in SimpleLruInit(), which itself calls
> LWLockRegisterTranche() to define a second event of type LWLock.

I didn't pay attention to the second one. Thanks for pointing that
out. But then, like me, a user may first land on OldSerXidLock, since
that is first in the list, and get confused. Maybe we should use names
that are not prefixes of one another.

>
> So the docs look correct to me on this side.  What I find weird is the
> phrasing used to define oldserxid.  The current description is:
> Waiting to I/O on an oldserxid buffer.
> I would suggest "waiting *for* I/O"
>

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Is there a cache consistent interface to tables ?

2018-02-09 Thread David G. Johnston
On Fri, Feb 9, 2018 at 12:56 AM, Garym  wrote:

> Hi,
> This is an odd request for help. I'm looking to expose an interface so an
> external app can insert to a table while maintaining cache consistency and
> inserts be promoted via wal.
>

I don't understand what that all means (what cache? inserts be promoted
via wal?) but the most efficient loading mechanism in PostgreSQL is the
"COPY" SQL command.

​David J.


Re: [HACKERS] FOSDEM PGDay_2018_Developer_Meeting notes

2018-02-09 Thread Tomas Vondra


On 02/09/2018 02:27 PM, Michael Paquier wrote:
> On Fri, Feb 09, 2018 at 07:55:32AM -0500, Stephen Frost wrote:
>> * Michael Paquier (mich...@paquier.xyz) wrote:
>>> Translate that with "Friday 13th as a lucky or unlucky date for Postgres
>>> 13".  And also translate it to a joke.
>>
>> Indeed.  Besides, the real date will be 2020-11-13. ;)
> 
> This must happen.  Now we just need to do a development so bad that we
> get a bunch of open items to deal with, with a laxist release team
> behind :p

You can leave the bad development part to me. Shitty code is my natural
talent.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Wait event names mismatch: oldserxid

2018-02-09 Thread Michael Paquier
On Fri, Feb 09, 2018 at 06:04:39PM +0530, Ashutosh Bapat wrote:
> Name for wait event "LWTRANCHE_OLDSERXID_BUFFERS" is printed as
> "oldserxid", but documentation at
> https://www.postgresql.org/docs/10/static/monitoring-stats.html does
> not have exact same event there. Instead it has
> 
> OldSerXidLock Waiting to read or record conflicting serializable
> transactions.

I see two events defined here in the code of type LWLock dedicated to
oldserxid:
- OldSerXidLock which is a wait event defined as it is part of
LWLockNames.
- oldserxid, which gets defined in SimpleLruInit(), which itself calls
LWLockRegisterTranche() to define a second event of type LWLock.

So the docs look correct to me on this side.  What I find weird is the
phrasing used to define oldserxid.  The current description is:
Waiting to I/O on an oldserxid buffer.
I would suggest "waiting *for* I/O"

A small patch is attached.

Thanks,
--
Michael
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e138d1ef07..82c015806e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1068,7 +1068,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
 
 
  oldserxid
- Waiting to I/O on an oldserxid buffer.
+ Waiting for I/O on an oldserxid buffer.
 
 
  wal_insert


signature.asc
Description: PGP signature


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2018-02-09 Thread Alvaro Herrera
Claudio Freire wrote:
> On Thu, Feb 8, 2018 at 8:39 PM, Alvaro Herrera  
> wrote:

> During the process of developing the patch, I got seriously broken
> code that passed the tests nonetheless. The test as it was was very
> ineffective at actually detecting issues.
> 
> This new test may be slow, but it's effective. That's a very good
> reason to make it slower, if you ask me.

OK, I don't disagree with improving the test, but if we can make it fast
*and* effective, that's better than slow and effective.

> > Another argument against the LOCK pg_class idea is that it causes an
> > unnecessary contention point across the whole parallel test group --
> > with possible weird side effects.  How about a deadlock?
> 
> The real issue with lock pg_class is that locks on pg_class are
> short-lived, so I'm not waiting for whole transactions.

Doh.

> > Other than the wait loop I proposed, I think we can make a couple of
> > very simple improvements to this test case to avoid a slowdown:
> >
> > 1. the DELETE takes about 1/4th of the time and removes about the same
> > number of rows as the one using the IN clause:
> >   delete from vactst where random() < 3.0 / 4;
> 
> I did try this at first, but it causes random output, so the test
> breaks randomly.

OK.  Still, your query seqscans the table twice.  Maybe it's possible to
use a CTID scan to avoid that, but I'm not sure how.

> > 3. Figure out the minimum size for the table that triggers the behavior
> >you want.  Right now you use 400k tuples -- maybe 100k are sufficient?
> >Don't know.
> 
> For that test, I need enough *dead* tuples to cause several passes.
> Even small mwm settings require tons of tuples for this. In fact, I'm
> thinking that number might be too low for its purpose, even. I'll
> re-check, but I doubt it's too high. If anything, it's too low.

OK.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] FOSDEM PGDay_2018_Developer_Meeting notes

2018-02-09 Thread Michael Paquier
On Fri, Feb 09, 2018 at 07:55:32AM -0500, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > Translate that with "Friday 13th as a lucky or unlucky date for Postgres
> > 13".  And also translate it to a joke.
> 
> Indeed.  Besides, the real date will be 2020-11-13. ;)

This must happen.  Now we just need to do a development so bad that we
get a bunch of open items to deal with, with a laxist release team
behind :p
--
Michael


signature.asc
Description: PGP signature


Re: CALL stmt, ERROR: unrecognized node type: 113 bug

2018-02-09 Thread Michael Paquier
On Fri, Feb 09, 2018 at 12:02:57PM +0100, Pavel Stehule wrote:
> 2018-02-09 7:56 GMT+01:00 Michael Paquier :
> > The second problem involves a cache lookup failure for a type when
> > trying to use pg_get_functiondef on a procedure.  Luckily, it is
> > possible to make the difference between a procedure and a function by
> > checking if prorettype is InvalidOid or not.  There is room for a new
> > patch which supports pg_get_proceduredef() to generate the definition of
> > a procedure, with perhaps a dedicated psql shortcut, but that could
> > always be done later on.
> 
> Blocking subqueries in CALL parameters is possible solution.

ExecuteCallStmt has visibly been written so as it is able to handle the
input set of arguments with a minimal infrastructure in place.  SubLink
nodes require more advanced handling as those need to go through the
planner.  There is also additional processing in the rewriter.  At the
end I tend to think that any user would just turn their back on calling
a function for such cases anyway, so it seems to me that the potential
benefits are not worth the code complexity.

> But blocking
> func def for procedures without any substitution doesn't look correct for
> me.

I don't disagree with you here, there is room for such a function, but
on the other side not having it does not make the existing feature less
useful.  As it is Peter's and Andrew's feature, the last word should
come from them.  Here is my opinion for what it's worth:
- Procedures are not functions; the code is pretty clear about that, so
generating the definition of a procedure should not happen through
pg_get_functiondef().  They share a lot of infrastructure, so you can
reuse a lot of the existing code.
- A different psql shortcut should be used, and I would assume that \sp
is suitable.
- Aggregates are also in pg_proc, and we generate an error on them because
they are of a different type, so an error for procedures when trying to
fetch a function definition looks like the right answer.

If folks feel that having a way to retrieve the procedure definition
easily via ruleutils.c is a must-have, then we have material for a good
open item :)
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] logical decoding of two-phase transactions

2018-02-09 Thread Nikhil Sontakke
Hi Stas,


> Reading through patch I’ve noticed that you deleted call to 
> SnapBuildCommitTxn()
> in DecodePrepare(). As you correctly spotted upthread there was unnecessary
> code that marked transaction as running after decoding of prepare. However 
> call
> marking it as committed before decoding of prepare IMHO is still needed as
> SnapBuildCommitTxn does some useful thing like setting base snapshot for 
> parent
> transactions which were skipped because of SnapBuildXactNeedsSkip().
>
> E.g. current code will crash in assert for following transaction:
>
> BEGIN;
> SAVEPOINT one;
> CREATE TABLE test_prepared_savepoints (a int);
> PREPARE TRANSACTION 'x';
> COMMIT PREPARED 'x';
> :get_with2pc_nofilter
> :get_with2pc_nofilter  <- second call will crash decoder
>

Thanks for taking a look!

The first ":get_with2pc_nofilter" call consumes the data appropriately.

The second ":get_with2pc_nofilter" sees that it has to skip, and hence
enters the ReorderBufferForget() function in the skip code path,
causing the assert. If we have to skip anyway, why do we need to set up
SnapBuildCommitTxn() for such a transaction? I don't see the need for
doing that for skipped transactions.

Will continue to look at this and will add this scenario to the test
cases. Further comments/feedback appreciated.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



Re: [HACKERS] FOSDEM PGDay_2018_Developer_Meeting notes

2018-02-09 Thread Stephen Frost
* Michael Paquier (mich...@paquier.xyz) wrote:
> On Fri, Feb 09, 2018 at 08:55:26AM +, Pantelis Theodosiou wrote:
> > On Tue, Feb 6, 2018 at 10:07 AM, Stephen Frost  wrote:
> >> SIDE TOPIC:
> >>
> >> Release date for PostgreSQL 13 agreed: Friday 13th September 2019!!
> > 
> > Isn't Postgres 12 to be released in 2019? And 13 in 2020?
> 
> Translate that with "Friday 13th as a lucky or unlucky date for Postgres
> 13".  And also translate it to a joke.

Indeed.  Besides, the real date will be 2020-11-13. ;)

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2018-02-09 Thread Amit Kapila
On Thu, Feb 8, 2018 at 7:49 PM, Robert Haas  wrote:
> On Thu, Feb 8, 2018 at 9:04 AM, Amit Kapila  wrote:
>>> Also, if you're OK with not being able to do anything except fetch
>>> from the cursor until you reach the end, then couldn't you just issue
>>> the query without the cursor and use PQsetSingleRowMode?
>>
>> I think there is a lot of cursor usage via plpgsql in which it could be 
>> useful.
>
> I don't see how.  If you can't do anything but fetch from the cursor
> while the cursor is open, then you can't go start doing things in
> plpgsql code either.
>

I am not saying we should allow other things.  I am just replying to your
question about why we can't use PQsetSingleRowMode.  My point is that
one can fetch the data in parallel via cursors (with restrictions such
as disallowing other parallel-unsafe statements) in plpgsql.
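
For reference, a rough sketch of the libpq single-row-mode pattern being
discussed (untested; the connection handling and query text are
placeholders):

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    stream_rows(PGconn *conn)
    {
        PGresult *res;

        if (!PQsendQuery(conn, "SELECT * FROM big_table"))
            return;
        PQsetSingleRowMode(conn);    /* must follow PQsendQuery immediately */

        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("row: %s\n", PQgetvalue(res, 0, 0));
            /* PGRES_TUPLES_OK marks the end of the result set */
            PQclear(res);
        }
    }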

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Proposal: partition pruning by secondary attributes

2018-02-09 Thread Alvaro Herrera
Ildar Musin wrote:

> But if we filter the table by 'id' then planner has no other way but to
> append every partition to the plan.
> 
> EXPLAIN (COSTS OFF) SELECT * FROM events WHERE id = 123;
>  Append
>->  Seq Scan on events_0
>  Filter: (id = 123)
>->  Seq Scan on events_1
>  Filter: (id = 123)
>->  Seq Scan on events_2
>  Filter: (id = 123)

I think it should be possible to prune at runtime based on a brin index.
As Andres says this means we cannot prune at plan time, and you still
need to open the relations and indexes to perform pruning, but the
contention problem is solved.

A pretty crazy related idea is to allow BRIN indexes to be global -- so
you create a brin index on the partitioned table in such a way that it
doesn't cascade to create local indexes, but instead a single index
represents the whole hierarchy.  This requires a lot of other changes,
but seems to match your design.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Wait event names mismatch: oldserxid

2018-02-09 Thread Ashutosh Bapat
Hi,
Name for wait event "LWTRANCHE_OLDSERXID_BUFFERS" is printed as
"oldserxid", but documentation at
https://www.postgresql.org/docs/10/static/monitoring-stats.html does
not have exact same event there. Instead it has

OldSerXidLock Waiting to read or record conflicting serializable
transactions.

Are they the same? If so, should we change the documentation or the
code in OldSerXidInit()?
 808 SimpleLruInit(OldSerXidSlruCtl, "oldserxid",
 809   NUM_OLDSERXID_BUFFERS, 0, OldSerXidLock, "pg_serial",
 810   LWTRANCHE_OLDSERXID_BUFFERS);

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2018-02-09 Thread Claudio Freire
On Thu, Feb 8, 2018 at 8:39 PM, Alvaro Herrera  wrote:
> Claudio Freire wrote:
>
>> I don't like looping, though, seems overly cumbersome. What's worse?
>> maintaining that fragile weird loop that might break (by causing
>> unexpected output), or a slight slowdown of the test suite?
>>
>> I don't know how long it might take on slow machines, but in my
>> machine, which isn't a great machine, while the vacuum test isn't fast
>> indeed, it's just a tiny fraction of what a simple "make check" takes.
>> So it's not a huge slowdown in any case.
>
> Well, what about a machine running tests under valgrind, or the weird
> cache-clobbering infuriatingly slow code?  Or buildfarm members running
> on really slow hardware?  These days, a few people have spent a lot of
> time trying to reduce the total test time, and it'd be bad to lose back
> the improvements for no good reason.

It's not for no good reason. The old tests were woefully inadequate.

During the process of developing the patch, I got seriously broken
code that passed the tests nonetheless. The test as it was was very
ineffective at actually detecting issues.

This new test may be slow, but it's effective. That's a very good
reason to make it slower, if you ask me.

> I grant you that the looping I proposed is more complicated, but I don't
> see any reason why it would break.
>
> Another argument against the LOCK pg_class idea is that it causes an
> unnecessary contention point across the whole parallel test group --
> with possible weird side effects.  How about a deadlock?

The real issue with lock pg_class is that locks on pg_class are
short-lived, so I'm not waiting for whole transactions.

> Other than the wait loop I proposed, I think we can make a couple of
> very simple improvements to this test case to avoid a slowdown:
>
> 1. the DELETE takes about 1/4th of the time and removes about the same
> number of rows as the one using the IN clause:
>   delete from vactst where random() < 3.0 / 4;

I did try this at first, but it causes random output, so the test
breaks randomly.

> 2. Use a new temp table rather than vactst.  Everything is then faster.

I might try that.

> 3. Figure out the minimum size for the table that triggers the behavior
>you want.  Right now you use 400k tuples -- maybe 100k are sufficient?
>Don't know.

For that test, I need enough *dead* tuples to cause several passes.
Even small mwm settings require tons of tuples for this. In fact, I'm
thinking that number might be too low for its purpose, even. I'll
re-check, but I doubt it's too high. If anything, it's too low.



Re: non-bulk inserts and tuple routing

2018-02-09 Thread Etsuro Fujita

(2018/02/09 14:32), Amit Langote wrote:

I had mistakenly tagged these patches v24, but they were actually supposed
to be v5.  So the attached updated patch is tagged v6.


OK.


On 2018/02/07 19:36, Etsuro Fujita wrote:

(2018/02/05 14:34), Amit Langote wrote:

The code in tupconv_map_for_subplan() currently assumes that it can rely
on all leaf partitions having been initialized.


On reflection I noticed this analysis is not 100% correct; I think what
that function actually assumes is that all subplans' partitions have
already been initialized, not all leaf partitions.


Yes, you're right.


Since we're breaking that
assumption with this proposal, that needed to be changed. So the patch
contained some refactoring to make it not rely on that assumption.


I don't think we really need this refactoring: as in another patch you
posted, we could initialize all subplans' partitions in
ExecSetupPartitionTupleRouting, so tupconv_map_for_subplan could be
called without any changes to that function, given what I said above.


What my previous approach failed to consider is that in the update case,
we'd already have ResultRelInfo's for some of the leaf partitions
initialized, which could be saved into proute->partitions array right away.

Updated patch does things that way, so all the changes I had proposed to
tupconv_map_for_subplan are rendered unnecessary.


OK, thanks for the updated patch!


Here are comments for the other patch (patch
v24-0002-During-tuple-routing-initialize-per-partition-ob.patch):

o On changes to ExecSetupPartitionTupleRouting:

* The comment below wouldn't be correct; no ExecInitResultRelInfo in the
patch.

-   proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-  sizeof(ResultRelInfo *));
+   /*
+* Actual ResultRelInfo's and TupleConversionMap's are allocated in
+* ExecInitResultRelInfo().
+*/
+   proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+   sizeof(ResultRelInfo

*));

I removed the comment altogether, as the comments elsewhere make the point
clear.


* The patch removes this from the initialization step for a subplan's
partition, but I think it would be better to keep this here because I
think it's a good thing to put the initialization stuff together into one
place.

-   /*
-* This is required in order to we convert the partition's
-* tuple to be compatible with the root partitioned table's
-* tuple descriptor.  When generating the per-subplan result
-* rels, this was not set.
-*/
-   leaf_part_rri->ri_PartitionRoot = rel;


It wasn't needed here with the previous approach, because we didn't touch
any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it
back in the updated patch.


* I think it would be better to keep this comment here.

-   /* Remember the subplan offset for this ResultRelInfo */


Fixed.


* Why is this removed from that initialization?

-   proute->partitions[i] = leaf_part_rri;


Because of the old approach.  Now it's back in.


o On changes to ExecInitPartitionInfo:

* I don't understand the step starting from this, but I'm wondering if
that step can be removed by keeping the above setup of proute->partitions
for the subplan's partition in ExecSetupPartitionTupleRouting.

+   /*
+* If we are doing tuple routing for update, try to reuse the
+* per-subplan resultrel for this partition that ExecInitModifyTable()
+* might already have created.
+*/
+   if (mtstate && mtstate->operation == CMD_UPDATE)


Done, as mentioned above.

On 2018/02/08 19:16, Etsuro Fujita wrote:

Here are some minor comments:

o On changes to ExecInsert

* This might be just my taste, but I think it would be better to (1)
change ExecInitPartitionInfo so that it creates and returns a
newly-initialized ResultRelInfo for an initialized partition, and (2)
rewrite this bit:

+   /* Initialize partition info, if not done already. */
+   ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+ leaf_part_index);
+
 /*
  * Save the old ResultRelInfo and switch to the one corresponding to
  * the selected partition.
  */
 saved_resultRelInfo = resultRelInfo;
 resultRelInfo = proute->partitions[leaf_part_index];
+   Assert(resultRelInfo != NULL);

to something like this (I would say the same thing to the copy.c changes):

 /*
  * Save the old ResultRelInfo and switch to the one corresponding to
  * the selected partition.
  */
 saved_resultRelInfo = resultRelInfo;
 resultRelInfo = proute->partitions[leaf_part_index];
 if (resultRelInfo == NULL);
 {
 /* Initialize partition info. */
 resultRelInfo = ExecInitPartitionInfo(mtstate,

Re: Proposal: partition pruning by secondary attributes

2018-02-09 Thread Ildar Musin



On 08.02.2018 21:01, Andres Freund wrote:

On 2018-02-08 14:48:34 -0300, Alvaro Herrera wrote:

Ildar Musin wrote:


The idea is to store min and max values of secondary attributes
(like 'id' in the example above) for each partition somewhere in
catalog and use it for partition pruning along with partitioning
key. You can think of it as somewhat like BRIN index but for
partitions.


What is the problem with having a BRIN index?


Building plans to scan the individual partitions, lock them, open
the relevant files, etc is often going to be significantly more
expensive than pruning at plan time.

But there also seems to be a number of fairly nasty locking issues
with this proposal, leaving the amount of required code aside.



Sorry, I probably didn't describe it right. I wasn't talking about using
a BRIN index for partition pruning or anything like that; I just used it
as a reference point for the idea. I'll try to explain it in a more
detailed way.

Let's say we have a table to store some events, which is partitioned by
timestamp column:

CREATE TABLE events (
id serial,
dt timestamp,
...
) PARTITION BY RANGE (dt);

In some cases it is queried by 'dt' column and partition pruning is
working fine because 'dt' is a partitioning key:

EXPLAIN (COSTS OFF) SELECT ... FROM events WHERE dt >= '2018-01-01' AND
dt < '2018-02-01';
 Append
   ->  Seq Scan on events_0
 Filter: ((dt >= '2018-01-01 00:00:00'::timestamp without time
zone) AND (dt < '2018-02-01 00:00:00'::timestamp without time zone))

But if we filter the table by 'id' then planner has no other way but to
append every partition to the plan.

EXPLAIN (COSTS OFF) SELECT * FROM events WHERE id = 123;
 Append
   ->  Seq Scan on events_0
 Filter: (id = 123)
   ->  Seq Scan on events_1
 Filter: (id = 123)
   ->  Seq Scan on events_2
 Filter: (id = 123)

We can see though that values of 'dt' and 'id' both monotonically
increase over time and so we can potentially use 'id' column to do
partition pruning at plan time too. To do so we need to store min and
max values of 'id' column per partition somewhere in catalog and use
them to decide which partition should be added to the plan by matching
them to the query restrictions.

Each time the table is updated we must check whether the new value exceeds
the stored min/max values, and update those too if needed. This raises a
few issues. One of them, as Ashutosh Bapat mentioned, is the need to
change the catalog very often, which could result in high catalog
contention. I can't think of a comprehensive solution for this problem,
but for numerical and datetime types we could shift the min or max bounds
by a margin so that not every update results in a catalog change. The
other issue is the need to change a partition's min/max when rows are
removed, which is less of a problem since we can postpone it and do it
later (during autovacuum, for instance).

A new ALTER TABLE command should also be introduced to specify a column
or expression which is not a partition key but can be used for partition
pruning as described above. This command would scan each partition,
gather the min/max values, and store them in the catalog.

What do you think?

--
Ildar Musin
i.mu...@postgrespro.ru



Re: Using scalar function as set-returning: bug or feature?

2018-02-09 Thread Pavel Stehule
2018-02-09 12:02 GMT+01:00 Marko Tiikkaja :

> On Fri, Feb 9, 2018 at 9:58 AM, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>> Attached please find patch reporting error in case of empty attribute
>> list for SELECT INTO
>>
>
> This is quite short-sighted.  The better way to do this is to complain if
> the number of expressions is different from the number of target variables
> (and the target variable is not a record-ish type).  There's been at least
> two patches for this earlier (one by me, and one by, I think, Pavel
> Stehule).  I urge you to dig around in the archives to avoid wasting your
> time.
>

This issue can be detected by plpgsql_check, and there is a patch in the
commitfest queue that raises a warning or error in this case.

Regards

Pavel


>
>
> .m
>


Re: CALL stmt, ERROR: unrecognized node type: 113 bug

2018-02-09 Thread Pavel Stehule
Hi

2018-02-09 7:56 GMT+01:00 Michael Paquier :

> On Fri, Feb 02, 2018 at 04:07:28PM +0900, Michael Paquier wrote:
> > You need to read that as "only a SubPlan can be executed after a SubLink
> > has been processed by the planner", so please replace the last "latter"
> > by "planner".
>
> (I forgot to add Peter and Andrew in CC: previously, so done now.)
>
> e4128ee7 makes it clear that SubLink nodes are authorized when
> transforming them in transformSubLink(); however, I cannot think of a
> use case, so should we just forbid them? This path is actually untested,
> so the patch attached does so.
>
> The second problem involves a cache lookup failure for a type when
> trying to use pg_get_functiondef on a procedure.  Luckily, it is
> possible to make the difference between a procedure and a function by
> checking if prorettype is InvalidOid or not.  There is room for a new
> patch which supports pg_get_proceduredef() to generate the definition of
> a procedure, with perhaps a dedicated psql shortcut, but that could
> always be done later on.
>

Blocking subqueries in CALL parameters is possible solution. But blocking
func def for procedures without any substitution doesn't look correct for
me.

Regards

Pavel


> --
> Michael
>


Re: Using scalar function as set-returning: bug or feature?

2018-02-09 Thread Marko Tiikkaja
On Fri, Feb 9, 2018 at 9:58 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Attached please find patch reporting error in case of empty attribute list
> for SELECT INTO
>

This is quite short-sighted.  The better way to do this is to complain if
the number of expressions is different from the number of target variables
(and the target variable is not a record-ish type).  There's been at least
two patches for this earlier (one by me, and one by, I think, Pavel
Stehule).  I urge you to dig around in the archives to avoid wasting your
time.


.m


Re: non-bulk inserts and tuple routing

2018-02-09 Thread Etsuro Fujita

(2018/02/08 23:21), Robert Haas wrote:

On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita
  wrote:

 if (resultRelInfo == NULL);
 {
 /* Initialize partition info. */
 resultRelInfo = ExecInitPartitionInfo(mtstate,
   saved_resultRelInfo,
   proute,
   estate,
   leaf_part_index);
 }


I'm pretty sure that code has one semicolon too many.


Good catch!

Best regards,
Etsuro Fujita



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-09 Thread Etsuro Fujita

(2018/02/09 10:48), Etsuro Fujita wrote:

(2018/02/09 4:32), Robert Haas wrote:

On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane wrote:

there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.


Will look into this.


I tried to reproduce that in my environment, but I couldn't.  On 
reflection I think an easy and reliable way to address that concern is 
to use local stats on foreign tables.  Attached is a patch for that.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 4244,4249  explain (verbose, costs off) select * from ft3 f, loct3 l
--- 4244,4253 
  -- ===
  -- test writable foreign table stuff
  -- ===
+ -- Autovacuum on the remote side might affect remote estimates,
+ -- so use local stats on ft2 as well
+ ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'false');
+ ANALYZE ft2;
  EXPLAIN (verbose, costs off)
  INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
  QUERY PLAN
***
*** 5520,5551  UPDATE ft2 SET c3 = 'baz'
FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
RETURNING ft2.*, ft4.*, ft5.*;-- can't be pushed down
!   QUERY PLAN  
! --
   Update on public.ft2
 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
 Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
!->  Nested Loop
   Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
!  Join Filter: (ft2.c2 === ft4.c1)
!  ->  Foreign Scan on public.ft2
!Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
!Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
   ->  Foreign Scan
!Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
!Relations: (public.ft4) INNER JOIN (public.ft5)
!Remote SQL: SELECT CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r3.c1, r3.c2, r3.c3 FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1
!->  Hash Join
!  Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
!  Hash Cond: (ft4.c1 = ft5.c1)
   ->  Foreign Scan on public.ft4
 Output: ft4.*, ft4.c1, ft4.c2, ft4.c3
 Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
!  ->  Hash
!Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!->  Foreign Scan on public.ft5
!  Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
  (24 rows)
  
  UPDATE ft2 SET c3 = 'baz'
--- 5524, 
FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
RETURNING ft2.*, ft4.*, ft5.*;-- can't be pushed down
! QUERY PLAN 
! 

Re: [HACKERS] FOSDEM PGDay_2018_Developer_Meeting notes

2018-02-09 Thread Michael Paquier
On Fri, Feb 09, 2018 at 08:55:26AM +, Pantelis Theodosiou wrote:
> On Tue, Feb 6, 2018 at 10:07 AM, Stephen Frost  wrote:
>> SIDE TOPIC:
>>
>> Release date for PostgreSQL 13 agreed: Friday 13th September 2019!!
> 
> Isn't Postgres 12 to be released in 2019? And 13 in 2020?

Translate that with "Friday 13th as a lucky or unlucky date for Postgres
13".  And also translate it to a joke.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] FOSDEM PGDay_2018_Developer_Meeting notes

2018-02-09 Thread Dave Page
On Fri, Feb 9, 2018 at 8:55 AM, Pantelis Theodosiou 
wrote:

> On Tue, Feb 6, 2018 at 10:07 AM, Stephen Frost  wrote:
>
>>
>> That was also what seemed to be the consensus coming out of the FOSDEM
>> Developer meeting (notes here:
>> https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2018_Developer_Meeting).
>>
>>
> In the notes there is this, which confused me:
>
>
> > SIDE TOPIC:
> >
> > Release date for PostgreSQL 13 agreed: Friday 13th September 2019!!
>
>
> Isn't Postgres 12 to be released in 2019? And 13 in 2020?
>
>
It was a joke in the meeting.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using scalar function as set-returning: bug or feature?

2018-02-09 Thread Konstantin Knizhnik



On 09.02.2018 11:58, Konstantin Knizhnik wrote:



On 09.02.2018 11:02, Konstantin Knizhnik wrote:



On 09.02.2018 10:47, Sergei Kornilov wrote:

Hello


select into b from my_insert('from func atx');

You missed select something into b. For example,
select ret into b from my_insert('from func atx') as ret;

Using a scalar function in FROM is not a bug.
Silently assigning NULL to the variables in "into" doesn't match what the
same "select" does... I think it would be better to raise a warning.


regards, Sergei

Thank you.
Really the problem is caused by empty source list for INTO.
If I rewrite query as

    select my_insert into b from my_insert('from func atx');

then it works as expected.
But I wonder if the original statement should be considered as error 
or at least we should produce warning for such empty projections?



Attached please find patch reporting error in case of empty attribute 
list for SELECT INTO.




Sorry, this patch doesn't correctly handle the case:

SELECT INTO [STRICT] target select_expressions FROM ...;

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Using scalar function as set-returning: bug or feature?

2018-02-09 Thread Konstantin Knizhnik



On 09.02.2018 11:02, Konstantin Knizhnik wrote:



On 09.02.2018 10:47, Sergei Kornilov wrote:

Hello


select into b from my_insert('from func atx');

You missed select something into b. For example,
select ret into b from my_insert('from func atx') as ret;

Using a scalar function in FROM is not a bug.
Silently assigning NULL to the variables in "into" doesn't match what the
same "select" does... I think it would be better to raise a warning.


regards, Sergei

Thank you.
Really the problem is caused by empty source list for INTO.
If I rewrite query as

    select my_insert into b from my_insert('from func atx');

then it works as expected.
But I wonder if the original statement should be considered as error 
or at least we should produce warning for such empty projections?



Attached please find patch reporting error in case of empty attribute 
list for SELECT INTO.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 7f52849..c43d572 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -2853,6 +2853,7 @@ make_execsql_stmt(int firsttoken, int location)
 	int	prev_tok;
 	boolhave_into = false;
 	boolhave_strict = false;
+	boolhave_attributes = false;
 	int	into_start_loc = -1;
 	int	into_end_loc = -1;
 
@@ -2907,6 +2908,8 @@ make_execsql_stmt(int firsttoken, int location)
 continue;		/* INSERT INTO is not an INTO-target */
 			if (firsttoken == K_IMPORT)
 continue;		/* IMPORT ... INTO is not an INTO-target */
+			if (!have_attributes)
+yyerror("Empty list for INTO-target");
 			if (have_into)
 yyerror("INTO specified more than once");
 			have_into = true;
@@ -2915,6 +2918,8 @@ make_execsql_stmt(int firsttoken, int location)
 			read_into_target(&rec, &row, &have_strict);
 			plpgsql_IdentifierLookup = IDENTIFIER_LOOKUP_EXPR;
 		}
+		else
+			have_attributes = true;
 	}
 
 	plpgsql_IdentifierLookup = save_IdentifierLookup;


Re: [HACKERS] FOSDEM PGDay_2018_Developer_Meeting notes

2018-02-09 Thread Pantelis Theodosiou
On Tue, Feb 6, 2018 at 10:07 AM, Stephen Frost  wrote:

>
> That was also what seemed to be the consensus coming out of the FOSDEM
> Developer meeting (notes here:
> https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2018_Developer_Meeting).
>
>
In the notes there is this, which confused me:


> SIDE TOPIC:
>
> Release date for PostgreSQL 13 agreed: Friday 13th September 2019!!


Isn't Postgres 12 to be released in 2019? And 13 in 2020?


Re: [HACKERS] More stats about skipped vacuums

2018-02-09 Thread Kyotaro HORIGUCHI
Hello.

At Thu, 08 Feb 2018 18:21:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180208.182156.96551245.horiguchi.kyot...@lab.ntt.co.jp>
> I suppose that the problem has not been resolved yet..

I found several bugs while studying this, but my conclusion is that the
decision required here is whether we regard the unavailability of DSM as
a fatal error, as we do for out of memory. Maybe we can go in that
direction, but just doing that would certainly kill off some buildfarm
animals (at least anole? smew is not found).

I have not found the exact cause of the problem whereby regtest on those
buildfarm animals always succeeded using SysV shmem, while "postgres
--boot" run by initdb alone can fail using the same mechanism. But
regtest seems to keep working if initdb sets max_connections to 20 or
more. At least it succeeds for me with max_connections = 20 and
shared_buffers = 50MB on CentOS.

Finally, I'd like to propose the following:

 - Kill dynamic_shared_memory_type = none right now.

   * The server stops at startup if DSM is not available.

 - Let initdb set max_connections = 20 as the fallback value in
   that case (another proposed patch). The regression tests should
   succeed with that.

If we agree on this, I will be able to go forward.


I'd like to hear opinions on this from the experienced people.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Parallel bt build crashes when DSM_NONE

2018-02-09 Thread Kyotaro HORIGUCHI
Hello.

I happened to find that the server crashes during regtest when
DSM_NONE is enforced. The attached patch fixes that.

The cause is that _bt_spools_heapscan runs _bt_begin_parallel() even if
dynamic_shared_memory_type is DSM_NONE, because
plan_create_index_workers() ignores dynamic_shared_memory_type.

We can reproduce this by letting initdb set
dynamic_shared_memory_type=none regardless of actual availability
(second attachment) and then running "make check".

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 740de49..3e8cd14 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5825,7 +5825,8 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 	double		allvisfrac;
 
 	/* Return immediately when parallelism disabled */
-	if (max_parallel_maintenance_workers == 0)
+	if (dynamic_shared_memory_type == DSM_IMPL_NONE ||
+		max_parallel_maintenance_workers == 0)
 		return 0;
 
 	/* Set up largely-dummy planner state */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2efd3b7..876e153 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -871,6 +871,7 @@ choose_dsm_implementation(void)
 #ifdef HAVE_SHM_OPEN
 	int			ntries = 10;
 
+	return "none";
 	while (ntries > 0)
 	{
 		uint32		handle;


Re: Using scalar function as set-returning: bug or feature?

2018-02-09 Thread Konstantin Knizhnik



On 09.02.2018 10:47, Sergei Kornilov wrote:

Hello


select into b from my_insert('from func atx');

You missed select something into b. For example,
select ret into b from my_insert('from func atx') as ret;

Using a scalar function in FROM is not a bug.
Silently assigning NULL to the variables in "into" doesn't match what the
same "select" does... I think it would be better to raise a warning.

regards, Sergei

Thank you.
Really the problem is caused by empty source list for INTO.
If I rewrite query as

    select my_insert into b from my_insert('from func atx');

then it works as expected.
But I wonder if the original statement should be considered as error or 
at least we should produce warning for such empty projections?



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company