Re: [HACKERS] show precise repos version for dev builds?

2017-10-13 Thread Fabien COELHO


Hello Robert,


Mmph.  I understand the desire to identify the exact commit used for a
build somehow, but something whose output depends on whether or not I
left a branch lying around locally doesn't seem that great.


Indeed, the branch/tag search can behave a little strangely.

Probably you would be happier with just the commit (--always) and an
indication that the tree was changed (--dirty).


 sh> git describe --always --dirty
 b81eba6

 sh> vi README
 # edit

 sh> git describe --always --dirty
 b81eba6-dirty

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SIGSEGV in BRIN autosummarize

2017-10-13 Thread Justin Pryzby
I upgraded one of our customers to PG10 Tuesday night, and on Wednesday
replaced a BTREE index with a BRIN index (WITH autosummarize).

Today I see:
< 2017-10-13 17:22:47.839 -04  >LOG:  server process (PID 32127) was terminated 
by signal 11: Segmentation fault
< 2017-10-13 17:22:47.839 -04  >DETAIL:  Failed process was running: 
autovacuum: BRIN summarize public.gtt 747263

postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in 
postgres[40+692000]

[pryzbyj@database ~]$ rpm -qa postgresql10
postgresql10-10.0-1PGDG.rhel6.x86_64

Oct 13 17:22:45 database kernel: postmaster[32127] general protection ip:4bd467 
sp:7ffd9b349990 error:0 in postgres[40+692000]
Oct 13 17:22:47 database abrtd: Directory 'ccpp-2017-10-13-17:22:47-32127' 
creation detected
Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 
(/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 
(15040512 bytes)

...unfortunately:
Oct 13 17:22:47 database abrtd: Package 'postgresql10-server' isn't signed with 
proper key
Oct 13 17:22:47 database abrtd: 'post-create' on 
'/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127' exited with 1
Oct 13 17:22:47 database abrtd: DELETING PROBLEM DIRECTORY 
'/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127'

postgres=# SELECT * FROM bak_postgres_log_2017_10_13_1700 WHERE pid=32127 ORDER 
BY log_time DESC LIMIT 9;
-[ RECORD 1 ]--+-
log_time   | 2017-10-13 17:22:45.56-04
pid| 32127
session_id | 59e12e67.7d7f
session_line   | 2
command_tag| 
session_start_time | 2017-10-13 17:21:43-04
error_severity | ERROR
sql_state_code | 57014
message| canceling autovacuum task
context| processing work entry for relation 
"gtt.public.cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx"
-[ RECORD 2 ]--+-
log_time   | 2017-10-13 17:22:44.557-04
pid| 32127
session_id | 59e12e67.7d7f
session_line   | 1
session_start_time | 2017-10-13 17:21:43-04
error_severity | ERROR
sql_state_code | 57014
message| canceling autovacuum task
context| automatic analyze of table 
"gtt.public.cdrs_huawei_sgsnpdprecord_2017_10_13"

Time: 375.552 ms

This table was apparently being inserted into simultaneously by a python
program using multiprocessing; each subprocess was INSERTing into several
tables, each of which has one BRIN index on a timestamp column.
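
For reference, such an index would have been created along these lines (a
sketch only; the column name is inferred from the index name shown below):

  CREATE INDEX cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx
      ON cdrs_eric_egsnpdprecord_2017_10_13
      USING brin (recordopeningtime) WITH (autosummarize = on);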

gtt=# \dt+ cdrs_eric_egsnpdprecord_2017_10_13
 public | cdrs_eric_egsnpdprecord_2017_10_13 | table | gtt   | 5841 MB | 

gtt=# \di+ cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx 
 public | cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx | index | 
gtt   | cdrs_eric_egsnpdprecord_2017_10_13 | 136 kB | 

I don't have any reason to believe there's a memory issue on the server, so I
suppose this is just a "heads up" to early adopters, until/in case it happens
again and I can at least provide a stack trace.

Justin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))

2017-10-13 Thread Noah Misch
On Mon, Oct 09, 2017 at 05:19:11PM -0700, Peter Geoghegan wrote:
> On Sun, Oct 16, 2016 at 6:46 PM, Noah Misch  wrote:
> > - Verify agreement between CLOG, MULTIXACT, and hint bits.
> 
> This is where it gets complicated, I think. This is what I really want
> to talk about.
> 
> The recent commit 20b65522 pretty clearly established that a tuple
> could technically escape freezing (having HEAP_XMIN_FROZEN set)
> despite actually being before a relfrozenxid cut-off. The idea was
> that as long as we reliably set hint bits on heap-only tuples through
> WAL-logging, it doesn't matter that their CLOG could be truncated,
> because nobody will ever need to interrogate the CLOG anyway (to coin
> a phrase, the heap-only tuple nevertheless still had its xmax
> "logically frozen"). If nothing else, this leaves me with a very
> squishy set of invariant conditions to work with when I go to verify
> agreement with CLOG, MULTIXACT, and hint bits.
> 
> So, the question is: What is the exact set of invariant conditions
> that I can check when I go to verify agreement between a heap relation
> and the various SLRUs? In particular, at what precise XID-wise point
> do CLOG and MULTIXACT stop reliably storing transaction status? And,
> is there a different point for the xmax of tuples that happen to be
> heap-only tuples?
> 
> Another important concern here following 20b65522 is: Why is it safe
> to assume that nobody will ever call TransactionIdDidCommit() for
> "logically frozen" heap-only tuples that are not at the end of their
> HOT chain, and in so doing get a wrong answer? I can't find a rule
> that implies that there is no dangerous ambiguity that's written down
> anywhere. I *can* find a comment that suggests that it's wrong, though
> -- the "N.B." comment at the top of heap_prepare_freeze_tuple().
> (Granted, that comment doesn't seem to acknowledge the fact that the
> caller does WAL-log as part of freezing; there has been some churn in
> this area.)

All good questions; I don't know offhand.  Discovering those answers is
perhaps the chief labor required of such a project.  The checker should
consider circumstances potentially carried from past versions via pg_upgrade.
Fortunately, if you get some details wrong, it's cheap to recover from checker
bugs.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_control_recovery() return value when not in recovery

2017-10-13 Thread Andres Freund
On 2017-10-13 16:31:37 -0700, Joe Conway wrote:
> On 09/17/2017 11:29 PM, Andres Freund wrote:
> > On 2017-09-18 07:24:43 +0100, Simon Riggs wrote:
> >> On 18 September 2017 at 05:50, Andres Freund  wrote:
> >> > Hi,
> >> >
> >> > Just noticed that we're returning the underlying values for
> >> > pg_control_recovery() without any checks:
> >> > postgres[14388][1]=# SELECT * FROM pg_control_recovery();
> >> > ┌──────────────────────┬───────────────────────────┬──────────────────┬────────────────┬───────────────────────────────┐
> >> > │ min_recovery_end_lsn │ min_recovery_end_timeline │ backup_start_lsn │ backup_end_lsn │ end_of_backup_record_required │
> >> > ├──────────────────────┼───────────────────────────┼──────────────────┼────────────────┼───────────────────────────────┤
> >> > │ 0/0                  │                         0 │ 0/0              │ 0/0            │ f                             │
> >> > └──────────────────────┴───────────────────────────┴──────────────────┴────────────────┴───────────────────────────────┘
> >> > (1 row)
> >> 
> >> Yes, that would have made sense for these to be NULL
> > 
> > Yea, that's what I think was well.  Joe, IIRC that's your code, do you
> > agree as well?
> 
> Sorry for the slow response, but thinking back on this now, the idea of
> these functions, in my mind at least, was to provide as close to the
> same output as possible to what pg_controldata outputs. So:
> 
> # pg_controldata
> ...
> Minimum recovery ending location: 0/0
> Min recovery ending loc's timeline:   0
> Backup start location:0/0
> Backup end location:  0/0
> End-of-backup record required:no
> ...
> 
> So if we make a change here, do we also change pg_controldata?

I'm unconvinced that they need to be kept that close. For one, text
output doesn't have the concept of NULLs. Secondly, pg_controldata
also outputs the cluster state at the same time.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] show precise repos version for dev builds?

2017-10-13 Thread Michael Paquier
On Sat, Oct 14, 2017 at 4:47 AM, Robert Haas  wrote:
> Mmph.  I understand the desire to identify the exact commit used for a
> build somehow, but something whose output depends on whether or not I
> left a branch lying around locally doesn't seem that great.

Similarly to Peter, I prefer a minimum amount of information, so I tend
to just use `git rev-parse --short HEAD` with --extra-version for my
own builds. Looking at the timestamps of the installed files is enough
to know when you worked on them, and when testing a patch and
committing it on a local branch before compiling, you can easily tell
where you left things off. git branch --contains is also useful to find
which branch a commit came from.
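
For example, the whole dance is just (a sketch of the usual invocation;
the resulting version string will vary):

  sh> ./configure --with-extra-version=-$(git rev-parse --short HEAD)
  sh> make && make install
  # SELECT version(); then reports something like "PostgreSQL 11devel-b81eba6"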
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_control_recovery() return value when not in recovery

2017-10-13 Thread Joe Conway
On 09/17/2017 11:29 PM, Andres Freund wrote:
> On 2017-09-18 07:24:43 +0100, Simon Riggs wrote:
>> On 18 September 2017 at 05:50, Andres Freund  wrote:
>> > Hi,
>> >
>> > Just noticed that we're returning the underlying values for
>> > pg_control_recovery() without any checks:
>> > postgres[14388][1]=# SELECT * FROM pg_control_recovery();
>> > ┌──────────────────────┬───────────────────────────┬──────────────────┬────────────────┬───────────────────────────────┐
>> > │ min_recovery_end_lsn │ min_recovery_end_timeline │ backup_start_lsn │ backup_end_lsn │ end_of_backup_record_required │
>> > ├──────────────────────┼───────────────────────────┼──────────────────┼────────────────┼───────────────────────────────┤
>> > │ 0/0                  │                         0 │ 0/0              │ 0/0            │ f                             │
>> > └──────────────────────┴───────────────────────────┴──────────────────┴────────────────┴───────────────────────────────┘
>> > (1 row)
>> 
>> Yes, that would have made sense for these to be NULL
> 
> Yea, that's what I think was well.  Joe, IIRC that's your code, do you
> agree as well?

Sorry for the slow response, but thinking back on this now, the idea of
these functions, in my mind at least, was to provide as close to the
same output as possible to what pg_controldata outputs. So:

# pg_controldata
...
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline:   0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
...

So if we make a change here, do we also change pg_controldata?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Continuous integration on Windows?

2017-10-13 Thread Thomas Munro
On Sat, Oct 14, 2017 at 11:27 AM, Thomas Munro
 wrote:
> On Sat, Oct 14, 2017 at 7:47 AM, legrand legrand
>  wrote:
>> Is it stored somewhere so that users like me,
>> who want to test pg 10 on windows,
>> can get it without having to build it?

Erm, wait, you said pg 10.  That's already released.  This thread is
about building and automatically testing development code on Windows.
If it's pre-built releases of PostgreSQL you're looking for, they're over
here:  https://www.postgresql.org/download/windows/

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Tom Lane
I wrote:
> Moving on to the exact color of the bikeshed: it seems like the right
> way to present this to users of CREATE AGGREGATE is in terms of "does
> the final function modify the transition state?".  So maybe the values
> could be spelled
> SMODIFY = READ_ONLY   ffunc never touches state, ok as window agg
> SMODIFY = SHARABLEffunc does some one-time change like sorting,
>   so state merging is OK but not window agg
> SMODIFY = READ_WRITE  ffunc trashes state, can't do merging either
> I'm not set on these names by any means; anyone have a better idea?

After contemplating the existing CREATE AGGREGATE parameters, particularly
[M]FINALFUNC_EXTRA, it seemed to me that better nomenclature is

[M]FINALFUNC_MODIFY = READ_ONLY
[M]FINALFUNC_MODIFY = STOP_UPDATES
[M]FINALFUNC_MODIFY = READ_WRITE

where "stop updates" is intended to imply "you can't call the transfn
anymore after the first finalfn call".  I'm still not that much in
love with that terminology, but don't have a better idea now.
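
To illustrate the intended usage (a sketch; the finalfn named here is
hypothetical, and the spelling is per this WIP patch, not necessarily final):

  CREATE AGGREGATE sorted_vals(anyelement) (
      SFUNC = array_append,
      STYPE = anyarray,
      FINALFUNC = array_sort_final,    -- hypothetical finalfn that sorts the
                                       -- transition state in place
      FINALFUNC_MODIFY = STOP_UPDATES  -- one-time change: state merging is OK,
                                       -- but no transfn calls afterwards
  );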

Attached is a WIP patch; I believe it's code-complete but the SGML
docs are lacking.  Barring objections I'll finish up the docs and
push this.

regards, tom lane

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9af77c1..cfec246 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***
*** 487,492 
--- 487,512 
True to pass extra dummy arguments to aggmfinalfn
   
   
+   aggfinalmodify
+   char
+   
+   Whether aggfinalfn modifies the
+transition state value:
+r if it is read-only,
+s if the aggtransfn
+cannot be applied after the aggfinalfn, or
+w if it writes on the value
+   
+  
+  
+   aggmfinalmodify
+   char
+   
+   Like aggfinalmodify, but for
+the aggmfinalfn
+   
+  
+  
aggsortop
oid
pg_operator.oid
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index a920450..ca3fd81 100644
*** a/src/backend/catalog/pg_aggregate.c
--- b/src/backend/catalog/pg_aggregate.c
***
*** 19,24 
--- 19,25 
  #include "catalog/dependency.h"
  #include "catalog/indexing.h"
  #include "catalog/pg_aggregate.h"
+ #include "catalog/pg_aggregate_fn.h"
  #include "catalog/pg_language.h"
  #include "catalog/pg_operator.h"
  #include "catalog/pg_proc.h"
*** AggregateCreate(const char *aggName,
*** 65,70 
--- 66,73 
  List *aggmfinalfnName,
  bool finalfnExtraArgs,
  bool mfinalfnExtraArgs,
+ char finalfnModify,
+ char mfinalfnModify,
  List *aggsortopName,
  Oid aggTransType,
  int32 aggTransSpace,
*** AggregateCreate(const char *aggName,
*** 656,661 
--- 659,666 
  	values[Anum_pg_aggregate_aggmfinalfn - 1] = ObjectIdGetDatum(mfinalfn);
  	values[Anum_pg_aggregate_aggfinalextra - 1] = BoolGetDatum(finalfnExtraArgs);
  	values[Anum_pg_aggregate_aggmfinalextra - 1] = BoolGetDatum(mfinalfnExtraArgs);
+ 	values[Anum_pg_aggregate_aggfinalmodify - 1] = CharGetDatum(finalfnModify);
+ 	values[Anum_pg_aggregate_aggmfinalmodify - 1] = CharGetDatum(mfinalfnModify);
  	values[Anum_pg_aggregate_aggsortop - 1] = ObjectIdGetDatum(sortop);
  	values[Anum_pg_aggregate_aggtranstype - 1] = ObjectIdGetDatum(aggTransType);
  	values[Anum_pg_aggregate_aggtransspace - 1] = Int32GetDatum(aggTransSpace);
diff --git a/src/backend/commands/aggregatecmds.c b/src/backend/commands/aggregatecmds.c
index a63539a..07ef4c5 100644
*** a/src/backend/commands/aggregatecmds.c
--- b/src/backend/commands/aggregatecmds.c
***
*** 26,31 
--- 26,32 
  #include "catalog/dependency.h"
  #include "catalog/indexing.h"
  #include "catalog/pg_aggregate.h"
+ #include "catalog/pg_aggregate_fn.h"
  #include "catalog/pg_proc.h"
  #include "catalog/pg_type.h"
  #include "commands/alter.h"
***
*** 39,44 
--- 40,48 
  #include "utils/syscache.h"
  
  
+ static char extractModify(DefElem *defel);
+ 
+ 
  /*
   *	DefineAggregate
   *
*** DefineAggregate(ParseState *pstate, List
*** 67,72 
--- 71,78 
  	List	   *mfinalfuncName = NIL;
  	bool		finalfuncExtraArgs = false;
  	bool		mfinalfuncExtraArgs = false;
+ 	char		finalfuncModify = 0;
+ 	char		mfinalfuncModify = 0;
  	List	   *sortoperatorName = NIL;
  	TypeName   *baseType = NULL;
  	TypeName   *transType = NULL;
*** DefineAggregate(ParseState *pstate, List
*** 143,148 
--- 149,158 
  			finalfuncExtraArgs = defGetBoolean(defel);
  		else if (pg_strcasecmp(defel->defname, "mfinalfunc_extra") == 0)
  			mfinalfuncExtraArgs = defGetBoolean(defel);
+ 		else if (pg_strcasecmp(defel->defname, "finalfunc_modify") == 0)
+ 			finalfuncModify = extractModify(defel);
+ 		else if (pg_strcasecmp(defel->defname, "mfinalfunc_modify") == 0)
+ 			mfinalfuncModify = extractModify(defel);
  		else if 

Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Gourav Kumar
I want to get the join graph of a given query: a graph with a node for each
relation and an edge between two nodes if they have a join predicate between
them.
On 14-Oct-2017 3:58 AM, "Andres Freund"  wrote:

> On 2017-10-14 03:49:57 +0530, Gourav Kumar wrote:
> > But then is there some way to tell the optimizer not to consider
> > transitive joins?
>
> What are you actually trying to achieve here?
>
> - Andres
>


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Andres Freund
On 2017-10-14 03:49:57 +0530, Gourav Kumar wrote:
> But then is there some way to tell the optimizer not to consider
> transitive joins?

What are you actually trying to achieve here?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Continuous integration on Windows?

2017-10-13 Thread Thomas Munro
On Sat, Oct 14, 2017 at 7:47 AM, legrand legrand
 wrote:
> This may seem obvious to you,
>
> but where is the build result?

Each CI platform has a web page corresponding to your
GitHub/BitBucket/... user account that lists builds results.  You can
also get notifications by various means including email if there is a
failure.  Here's a randomly selected example build log:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/287749152

That particular build happened because a cronjob of mine pushed a
Commitfest patch into a branch on github, and I maintain plenty more
branches here:

https://github.com/postgresql-cfbot/postgresql/branches

Here are the resulting builds:

https://travis-ci.org/postgresql-cfbot/postgresql/branches

Here's a build from a personal development branch of my own, this time
on AppVeyor:

https://ci.appveyor.com/project/macdice/postgres/build/1.0.3

As Andrew mentioned, AppVeyor builds using his appveyor.yml with the
addition of the test_script that I showed above currently fail in "test
tablespace" for some reason that we'll need to sort out, and as I
mentioned you can't yet see the regressions.diff output etc... but
it's a start.  This is actually already useful for me, because I
changed a bunch of temporary-file cleanup code and I wanted to
confirm that it would work on Windows.  That's being exercised in one
of the tests.  So far so good.

> Is it stored somewhere so that users like me,
> who want to test pg 10 on windows,
> can get it without having to build it?

That's the idea.  It's quite similar to the build farm, except that it
tests your stuff *before* it gets committed to master and everyone
shouts at you, and can also be used to test strange experiments and
theories without disturbing anyone else.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Gourav Kumar
But then is there some way to tell the optimizer not to consider transitive
joins?

Or to know whether a join is transitive or not?
On 14-Oct-2017 3:43 AM, "Tom Lane"  wrote:

> Gourav Kumar  writes:
> > For e.g. I am checking for this query
> > ...
> >  where
> > and ss1.ca_county = ss2.ca_county
> > and ss2.ca_county = ss3.ca_county
> > and ss1.ca_county = ws1.ca_county
> > and ws1.ca_county = ws2.ca_county
> > and ws1.ca_county = ws3.ca_county
>
> > It doesn't have a join predicate between ss1 and ws2 or ss1 and ws3. But
> > the optimizer still considers a join among them.
>
> Sure it does, after transitive propagation of those equalities;
> for instance we can derive ss1.ca_county = ws2.ca_county from
> the above-quoted conditions.  And it would be very stupid of the
> optimizer not to consider those derived join conditions, because
> they may lead to the optimal join order.
>
> In general it's already true that the optimizer doesn't consider
> clauseless joins unless there's no other choice.  But this example
> isn't showing such a case.
>
> regards, tom lane
>


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Tom Lane
Gourav Kumar  writes:
> For e.g. I am checking for this query
> ...
>  where
> and ss1.ca_county = ss2.ca_county
> and ss2.ca_county = ss3.ca_county
> and ss1.ca_county = ws1.ca_county
> and ws1.ca_county = ws2.ca_county
> and ws1.ca_county = ws3.ca_county

> It doesn't have a join predicate between ss1 and ws2 or ss1 and ws3. But
> the optimizer still considers a join among them.

Sure it does, after transitive propagation of those equalities;
for instance we can derive ss1.ca_county = ws2.ca_county from
the above-quoted conditions.  And it would be very stupid of the
optimizer not to consider those derived join conditions, because
they may lead to the optimal join order.
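
For instance, in miniature (a minimal illustration):

  -- Only a.x = b.x and b.x = c.x are written, but the equivalence-class
  -- machinery derives a.x = c.x too, so the pair (a, c) is not treated
  -- as a clauseless join:
  select * from a, b, c where a.x = b.x and b.x = c.x;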

In general it's already true that the optimizer doesn't consider
clauseless joins unless there's no other choice.  But this example
isn't showing such a case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Discussion on missing optimizations

2017-10-13 Thread Andres Freund
Hi,

On 2017-10-14 10:38:13 +1300, David Rowley wrote:
> On 12 October 2017 at 04:50, Robert Haas  wrote:
> > We haven't really done a very good job figuring out what to do about
> > optimizations that some people (mostly you) think are marginal.  It's
> > obviously true that we can't just burn infinite planner cycles on
> > things that don't usually help, but at the same time, we can't just
> > keep ignoring requests by users for the same features and saying
> > "you'll be sad if we give this to you".  Those people don't give up on
> > wanting the optimization; they just don't use PostgreSQL.  I think we
> > need to stop just saying "no" and start thinking about what we could
> > do that would let us say "yes".

+1 to the general sentiment.


> I'm with Robert on this.  Planning time *is* important, but if we were
> to do a quick tally on the number of patches that we've seen in the
> last few years to improve execution time by either improving the
> executor code or adding some smarts to the planner to reduce executor
> work vs patches aimed to reduce planning time, I think we'd find the
> general focus is on the executor.

That's true. But I think that's partially because people benchmarking
with the goal of identifying bottlenecks just habitually use prepared
statements to remove planning overhead.

That works well enough in benchmarking scenarios, but in practice I
think that's less clearly a good tradeoff - e.g. in OLTP workloads you'll
often see custom plans defeating the intent of using prepared
statements.
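
For illustration, the usual benchmarking pattern (what pgbench -M prepared
does under the hood) is something like:

  PREPARE get_balance(int) AS
      SELECT abalance FROM pgbench_accounts WHERE aid = $1;
  EXECUTE get_balance(42);  -- repeated executions mostly skip parse/plan
                            -- cost, modulo the custom-vs-generic plan choice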


> Personally, I don't recall a single patch aimed at just speeding up the
> planner.

There were some a couple years back.


> It looks like there's plenty we could do in there, but nobody seems
> interested enough to go and do it; everyone who cares about
> performance is too busy trying to make execution run faster.

I think that's largely because our executor speed is quite bad, to some
degree because we paid much less attention to speed degradations there
for a couple of years, absurdly enough.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How does postgres store the join predicate for a relation in a given query

2017-10-13 Thread Gourav Kumar
Why do have_relevant_joinclause() and have_relevant_eclass_joinclause()
return true for all possible joins for the query given below,
even when the relations have no join predicate between them?
E.g. the join between ss1 & ws3, ss2 & ws3, etc.

The query is :
TPC-DS query 50

 -- query 50 in stream 0 using template query31.tpl
with ss as
 (select ca_county,d_qoy, d_year,sum(ss_ext_sales_price) as store_sales
 from store_sales,date_dim,customer_address
 where ss_sold_date_sk = d_date_sk
  and ss_addr_sk=ca_address_sk
 group by ca_county,d_qoy, d_year),
 ws as
 (select ca_county,d_qoy, d_year,sum(ws_ext_sales_price) as web_sales
 from web_sales,date_dim,customer_address
 where ws_sold_date_sk = d_date_sk
  and ws_bill_addr_sk=ca_address_sk
 group by ca_county,d_qoy, d_year)
 select /* tt */
ss1.ca_county
   ,ss1.d_year
   ,ws2.web_sales/ws1.web_sales web_q1_q2_increase
   ,ss2.store_sales/ss1.store_sales store_q1_q2_increase
   ,ws3.web_sales/ws2.web_sales web_q2_q3_increase
   ,ss3.store_sales/ss2.store_sales store_q2_q3_increase
 from
ss ss1
   ,ss ss2
   ,ss ss3
   ,ws ws1
   ,ws ws2
   ,ws ws3
 where
ss1.d_qoy = 1
and ss1.d_year = 2000
and ss1.ca_county = ss2.ca_county
and ss2.d_qoy = 2
and ss2.d_year = 2000
 and ss2.ca_county = ss3.ca_county
and ss3.d_qoy = 3
and ss3.d_year = 2000
and ss1.ca_county = ws1.ca_county
and ws1.d_qoy = 1
and ws1.d_year = 2000
and ws1.ca_county = ws2.ca_county
and ws2.d_qoy = 2
and ws2.d_year = 2000
and ws1.ca_county = ws3.ca_county
and ws3.d_qoy = 3
and ws3.d_year =2000
and case when ws1.web_sales > 0 then ws2.web_sales/ws1.web_sales else
null end
   > case when ss1.store_sales > 0 then ss2.store_sales/ss1.store_sales
else null end
and case when ws2.web_sales > 0 then ws3.web_sales/ws2.web_sales else
null end
   > case when ss2.store_sales > 0 then ss3.store_sales/ss2.store_sales
else null end
 order by web_q2_q3_increase;

-- end


On 13 October 2017 at 01:00, Gourav Kumar  wrote:

> Well for this given query it is possible. I haven't come across any such
> query yet.
>
> Possibly because I am more concerned about the TPCDS and TPCH benchmarks,
> where it's less likely to occur.
>
> On 13 October 2017 at 00:52, Tom Lane  wrote:
>
>> Gourav Kumar  writes:
>> > A Join clause/predicate will only mention 2 relations. It can't have 3
>> or
>> > more relations.
>>
>> Really?  What of, say,
>>
>> select ... from a,b,c where (a.x + b.y) = c.z;
>>
>> regards, tom lane
>>
>
>
>
> --
> Thanks,
> Gourav Kumar
> Computer Science and Automation
> Indian Institute of Science
>



-- 
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science


Re: [HACKERS] Discussion on missing optimizations

2017-10-13 Thread David Rowley
On 12 October 2017 at 04:50, Robert Haas  wrote:
> We haven't really done a very good job figuring out what to do about
> optimizations that some people (mostly you) think are marginal.  It's
> obviously true that we can't just burn infinite planner cycles on
> things that don't usually help, but at the same time, we can't just
> keep ignoring requests by users for the same features and saying
> "you'll be sad if we give this to you".  Those people don't give up on
> wanting the optimization; they just don't use PostgreSQL.  I think we
> need to stop just saying "no" and start thinking about what we could
> do that would let us say "yes".

I'm with Robert on this.  Planning time *is* important, but if we were
to do a quick tally on the number of patches that we've seen in the
last few years to improve execution time by either improving the
executor code or adding some smarts to the planner to reduce executor
work vs patches aimed to reduce planning time, I think we'd find the
general focus is on the executor. Personally, I don't recall a single
patch aimed at just speeding up the planner. Perhaps I missed some? It
looks like there's plenty we could do in there, but nobody seems
interested enough to go and do it; everyone who cares about
performance is too busy trying to make execution run faster.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve catcache/syscache performance.

2017-10-13 Thread Andres Freund
Hi,

On 2017-10-13 10:38:47 -0700, Andres Freund wrote:
> On 2017-10-13 13:06:41 -0400, Robert Haas wrote:
> > On Thu, Sep 14, 2017 at 2:12 AM, Andres Freund  wrote:
> > > This patch gives me roughly 8% speedup in a workload that consists
> > > of a fast query that returns a lot of columns.  If I apply a few
> > > other performance patches, this patch itself starts to make a bigger
> > > difference, of around 11%.
> > 
> > I did a read-through of this patch today.
> 
> Thanks!

Pushed after making the adaptations you suggested, pgindenting, and fixing
one bug (cstring lookups + datumCopy() = not good).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Gourav Kumar
For e.g. I am checking for this query

with ss as
 (select ca_county,d_qoy, d_year,sum(ss_ext_sales_price) as store_sales
 from store_sales,date_dim,customer_address
 where ss_sold_date_sk = d_date_sk
  and ss_addr_sk=ca_address_sk
 group by ca_county,d_qoy, d_year),
 ws as
 (select ca_county,d_qoy, d_year,sum(ws_ext_sales_price) as web_sales
 from web_sales,date_dim,customer_address
 where ws_sold_date_sk = d_date_sk
  and ws_bill_addr_sk=ca_address_sk
 group by ca_county,d_qoy, d_year)
 select /* tt */
ss1.ca_county
   ,ss1.d_year
   ,ws2.web_sales/ws1.web_sales web_q1_q2_increase
   ,ss2.store_sales/ss1.store_sales store_q1_q2_increase
   ,ws3.web_sales/ws2.web_sales web_q2_q3_increase
   ,ss3.store_sales/ss2.store_sales store_q2_q3_increase
 from
ss ss1
   ,ss ss2
   ,ss ss3
   ,ws ws1
   ,ws ws2
   ,ws ws3
 where
ss1.d_qoy = 1
and ss1.d_year = 2000
and ss1.ca_county = ss2.ca_county
and ss2.d_qoy = 2
and ss2.d_year = 2000
 and ss2.ca_county = ss3.ca_county
and ss3.d_qoy = 3
and ss3.d_year = 2000
and ss1.ca_county = ws1.ca_county
and ws1.d_qoy = 1
and ws1.d_year = 2000
and ws1.ca_county = ws2.ca_county
and ws2.d_qoy = 2
and ws2.d_year = 2000
and ws1.ca_county = ws3.ca_county
and ws3.d_qoy = 3
and ws3.d_year =2000
and case when ws1.web_sales > 0 then ws2.web_sales/ws1.web_sales else
null end
   > case when ss1.store_sales > 0 then ss2.store_sales/ss1.store_sales
else null end
and case when ws2.web_sales > 0 then ws3.web_sales/ws2.web_sales else
null end
   > case when ss2.store_sales > 0 then ss3.store_sales/ss2.store_sales
else null end
 order by web_q2_q3_increase;


It's a TPC-DS benchmark query.
It doesn't have a join predicate between ss1 and ws2 or ss1 and ws3. But
the optimizer still considers a join among them.

On 14 October 2017 at 02:20, Gourav Kumar  wrote:

> I tried debugging the code; at no point during execution was the function
> make_rels_by_clauseless_joins called, although the optimizer did
> consider some of the joins which are cross products.
>
> On 14 October 2017 at 01:57, Gourav Kumar  wrote:
>
>> Can I use something like joininfo, which stores the join predicates, to
>> check whether there is a join predicate between two relations and, if
>> not, skip considering them?
>>
>> On 14 October 2017 at 01:48, Fabrízio de Royes Mello <
>> fabriziome...@gmail.com> wrote:
>>
>>>
>>> On Fri, Oct 13, 2017 at 5:08 PM, Robert Haas 
>>> wrote:
>>> >
>>> > On Fri, Oct 13, 2017 at 4:06 PM, Gourav Kumar 
>>> wrote:
>>> > > Can you guide me where to look for it?
>>> >
>>> > Search for make_rels_by_clauseless_joins.
>>> >
>>>
>>> I wonder if it's possible to implement it as an extension using some hook
>>>
>>> --
>>> Fabrízio de Royes Mello
>>> Consultoria/Coaching PostgreSQL
>>> >> Timbira: http://www.timbira.com.br
>>> >> Blog: http://fabriziomello.github.io
>>> >> Linkedin: http://br.linkedin.com/in/fabriziomello
>>> >> Twitter: http://twitter.com/fabriziomello
>>> >> Github: http://github.com/fabriziomello
>>>
>>
>>
>>
>> --
>> Thanks,
>> Gourav Kumar
>> Computer Science and Automation
>> Indian Institute of Science
>>
>
>
>
> --
> Thanks,
> Gourav Kumar
> Computer Science and Automation
> Indian Institute of Science
>



-- 
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Gourav Kumar
I tried debugging the code; at no point during execution was the function
make_rels_by_clauseless_joins
called, although the optimizer did consider some of the joins which are
cross products.

On 14 October 2017 at 01:57, Gourav Kumar  wrote:

> Can I use something like joininfo, which stores the join predicates, to
> check whether there is a join predicate between two relations and, if
> not, skip considering them?
>
> On 14 October 2017 at 01:48, Fabrízio de Royes Mello <
> fabriziome...@gmail.com> wrote:
>
>>
>> On Fri, Oct 13, 2017 at 5:08 PM, Robert Haas 
>> wrote:
>> >
>> > On Fri, Oct 13, 2017 at 4:06 PM, Gourav Kumar 
>> wrote:
>> > > Can you guide me where to look for it?
>> >
>> > Search for make_rels_by_clauseless_joins.
>> >
>>
>> I wonder if it's possible to implement it as an extension using some hook
>>
>> --
>> Fabrízio de Royes Mello
>> Consultoria/Coaching PostgreSQL
>> >> Timbira: http://www.timbira.com.br
>> >> Blog: http://fabriziomello.github.io
>> >> Linkedin: http://br.linkedin.com/in/fabriziomello
>> >> Twitter: http://twitter.com/fabriziomello
>> >> Github: http://github.com/fabriziomello
>>
>
>
>
> --
> Thanks,
> Gourav Kumar
> Computer Science and Automation
> Indian Institute of Science
>



-- 
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science


Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType

2017-10-13 Thread David Rowley
On 14 October 2017 at 09:04, Robert Haas  wrote:
> On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
>  wrote:
>> -- Unpatched
>>  Planning time: 0.184 ms
>>  Execution time: 105.878 ms
>>
>> -- Patched
>>  Planning time: 2.175 ms
>>  Execution time: 106.326 ms
>
> This might not be the best example to show the advantages of the
> patch, honestly.

The focus was on the row estimate. I tried to highlight that by
mentioning "Note rows=1 vs rows=98 in the Gather node". I can't
imagine the test I added would have made the planner about 12 times
slower, but just for the record:

create table ab (a varchar, b varchar);
insert into ab select (x%1000)::varchar, (x%1)::Varchar from
generate_Series(1,100)x;
create statistics ab_a_b_stats (dependencies) on a,b from ab;
vacuum analyze ab;

$ cat a.sql
explain select * from ab where a = '1' and b = '1';

e9ef11ac8bb2acc2d2462fc17ec3291a959589e7 (Patched)

$ pgbench -f a.sql -T 60 -n
transaction type: a.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 496950
latency average = 0.121 ms
tps = 8282.481310 (including connections establishing)
tps = 8282.750821 (excluding connections establishing)

e9ef11ac8bb2acc2d2462fc17ec3291a959589e7~1 (Unpatched)

$ pgbench -f a.sql -T 60 -n
transaction type: a.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 511250
latency average = 0.117 ms
tps = 8520.822410 (including connections establishing)
tps = 8521.132784 (excluding connections establishing)

With the patch we are making use of the extended statistics, which we
do expect to mean more work for the planner. Then again, we didn't add
extended statistics to speed up the planner.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType

2017-10-13 Thread Tomas Vondra

On 10/13/2017 10:04 PM, Robert Haas wrote:
> On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
>  wrote:
>> -- Unpatched
>>  Planning time: 0.184 ms
>>  Execution time: 105.878 ms
>>
>> -- Patched
>>  Planning time: 2.175 ms
>>  Execution time: 106.326 ms
> 
> This might not be the best example to show the advantages of the
> patch, honestly.
> 

Not sure what exactly your point is? If you're suggesting this example
is bad because the planning time increased from 0.184 to 2.175 ms, then
perhaps consider that the plans were likely generated on an assert-enabled
build and on a laptop (both of which add quite a bit of noise to
occasional timings). The patch has no impact on planning time (at least
I've been unable to measure any).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-10-13 Thread Peter Geoghegan
On Fri, Oct 13, 2017 at 1:02 PM, Robert Haas  wrote:
>> I don't think it's our place to "interpret" the bits. Are we *also*
>> going to show HEAP_XMIN_FROZEN when xmin is physically set to
>> FrozenTransactionId?
>
> No, of course not.  We're talking about how to display the 256 and 512
> bits of t_infomask.  Those have four states: nothing, committed,
> invalid, frozen.  You're arguing that frozen isn't a real state, that
> it's somehow just a combination of committed and invalid, but I think
> that's the wrong way of thinking about it.

No, I'm arguing that they're just bits. Show the bits, rather than
interpreting what is displayed. Document that there are other logical
states that are represented as composites of contradictory/mutually
exclusive states.

Anyone who hopes to interpret these values has to be an expert anyway,
or willing to become something of an expert. There is a good chance
that they've taken an interest because something is already wrong.

> Before HEAP_XMIN_FROZEN existed, it would have been right to display
> the state where both bits are set as committed|invalid, because that
> would clearly show you that two things had been set that should never
> both be set at the same time.  But now that's a valid state with a
> well-defined meaning and I think we should display the actual meaning
> of that state.
>
>> Where does it end?
>
> I guess it ends wherever we decide to stop.

You can take what you're saying much further. What about
HEAP_XMAX_SHR_LOCK, and HEAP_MOVED? Code like HEAP_LOCKED_UPGRADED()
pretty strongly undermines the idea that these composite values are
abstractions.
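
For reference, those composites are defined in
src/include/access/htup_details.h as:

  #define HEAP_XMAX_SHR_LOCK  (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
  #define HEAP_MOVED          (HEAP_MOVED_OFF | HEAP_MOVED_IN)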

>> I think that we should prominently document that HEAP_XMIN_COMMITTED
>> |HEAP_XMIN_ABORTED == HEAP_XMIN_FROZEN, rather than trying to hide
>> complexity that we have no business hiding in a tool like pageinspect.
>
> I respect that opinion, but I don't think I'm trying to hide anything.
> I think I'm proposing that we display the information in what I
> believed to be the clearest and most accurate way.

pg_filedump doesn't display HEAP_XMIN_FROZEN, either. (Nor does it
ever display any of the other composite t_infomask/t_infomask2
values.)

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Gourav Kumar
Can I use something like joininfo, which stores the join predicates, to
check whether there is a join predicate between two relations and, if not,
skip considering them?

On 14 October 2017 at 01:48, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:

>
> On Fri, Oct 13, 2017 at 5:08 PM, Robert Haas 
> wrote:
> >
> > On Fri, Oct 13, 2017 at 4:06 PM, Gourav Kumar 
> wrote:
> > > Can you guide me where to look for it?
> >
> > Search for make_rels_by_clauseless_joins.
> >
>
> I wonder if it's possible to implement it as an extension using some hook
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Timbira: http://www.timbira.com.br
> >> Blog: http://fabriziomello.github.io
> >> Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
> >> Github: http://github.com/fabriziomello
>



-- 
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science


Re: [HACKERS] Parallel safety for extern params

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 1:19 AM, Amit Kapila  wrote:
> After fixing this problem, when I ran the regression tests with
> force_parallel_mode = regress, I saw multiple other failures.  All the
> failures boil down to two kinds of cases:
>
> 1. There was an assumption while extracting simple expressions that
> the target list of gather node can contain constants or Var's.  Now,
> once the above patch allows extern params as parallel-safe, that
> assumption no longer holds true.  We need to handle params as well.
> Attached patch fix_simple_expr_interaction_gather_v1.patch handles
> that case.

- * referencing the child node's output ... but setrefs.c might also have
- * copied a Const as-is.
+ * referencing the child node's output or a Param... but setrefs.c might
+ * also have copied a Const as-is.

I think the Param case should be mentioned after "... but" not before
- i.e. referencing the child node's output... but setrefs.c might also
have copied a Const or Param as-is.

> 2. We don't allow execution to use parallelism if the plan can be
> executed multiple times.  This has been enforced in ExecutePlan, but
> it seems like that we miss to handle the case where we are already in
> parallel mode by the time we enforce that condition.  So, what
> happens, as a result, is that it will allow to use parallelism when it
> shouldn't (when the same plan is executed multiple times) and lead to
> a crash.  One way to fix is that we temporarily exit the parallel mode
> in such cases and reenter in the same state once the current execution
> is over. Attached patch fix_parallel_mode_nested_execution_v1.patch
> fixes this problem.

This seems completely unsafe.  If somebody's already entered parallel
mode, they are counting on having the parallel-mode restrictions
enforced until they exit parallel mode.  We can't just disable those
restrictions for a while in the middle and then put them back.

I think the bug is in ExecGather(Merge): it assumes that if we're in
parallel mode, it's OK to start workers.  But actually, it shouldn't
do this unless ExecutePlan ended up with use_parallel_mode == true,
which isn't quite the same thing.  I think we might need ExecutePlan
to set a flag in the estate that ExecGather(Merge) can check.
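
A minimal sketch of what I mean (field name hypothetical, untested):

  /* In ExecutePlan(), record the decision in the EState ... */
  estate->es_use_parallel_mode = use_parallel_mode;

  /* ... and in ExecGather(), check that instead of just being in
   * parallel mode: */
  if (gather->num_workers > 0 && estate->es_use_parallel_mode)
  {
      /* OK to register and launch parallel workers */
  }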

Thanks for working on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Fabrízio de Royes Mello
On Fri, Oct 13, 2017 at 5:08 PM, Robert Haas  wrote:
>
> On Fri, Oct 13, 2017 at 4:06 PM, Gourav Kumar 
wrote:
> > Can you guide me where to look for it?
>
> Search for make_rels_by_clauseless_joins.
>

I wonder if it's possible to implement it as an extension using some hook
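
Something like this, perhaps (an untested sketch using the planner's
join_search_hook; the custom search logic itself is left as a stub):

  #include "postgres.h"
  #include "fmgr.h"
  #include "optimizer/paths.h"

  PG_MODULE_MAGIC;

  static RelOptInfo *
  no_crossjoin_search(PlannerInfo *root, int levels_needed, List *initial_rels)
  {
      /* A custom dynamic-programming search that skips relation pairs
       * lacking join clauses would go here; for now just delegate. */
      return standard_join_search(root, levels_needed, initial_rels);
  }

  void
  _PG_init(void)
  {
      join_search_hook = no_crossjoin_search;
  }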

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 4:06 PM, Gourav Kumar  wrote:
> Can you guide me where to look for it?

Search for make_rels_by_clauseless_joins.
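
That lives in src/backend/optimizer/path/joinrels.c. The crudest possible
experiment -- a sketch only, with the caveat that queries which genuinely
need a cross product then cannot be planned at all -- would be to empty
its body out:

  static void
  make_rels_by_clauseless_joins(PlannerInfo *root, RelOptInfo *old_rel,
                                ListCell *other_rels)
  {
      /* do nothing: never consider clauseless (cross-product) joins */
  }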

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Gourav Kumar
Can you guide me where to look for it?

On 14 October 2017 at 01:35, Robert Haas  wrote:

> On Fri, Oct 13, 2017 at 3:41 PM, Gourav Kumar 
> wrote:
> > Is there some way through which I can disable cross products in
> > postgresql?
> >
> > This will make the DP join search not consider a join between two
> > relations if they don't have a join predicate between them.
>
> I mean, it would be easy enough to modify the code.  We don't have a
> configuration option for it.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science


Re: [HACKERS] Disable cross products in postgres

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 3:41 PM, Gourav Kumar  wrote:
> Is there some way through which I can disable cross products in postgresql?
>
> This will make the DP join search not consider a join between two relations
> if they don't have a join predicate between them.

I mean, it would be easy enough to modify the code.  We don't have a
configuration option for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType

2017-10-13 Thread Robert Haas
On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
 wrote:
> -- Unpatched
>  Planning time: 0.184 ms
>  Execution time: 105.878 ms
>
> -- Patched
>  Planning time: 2.175 ms
>  Execution time: 106.326 ms

This might not be the best example to show the advantages of the
patch, honestly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-10-13 Thread Robert Haas
On Thu, Oct 12, 2017 at 7:35 PM, Peter Geoghegan  wrote:
> On Tue, Aug 15, 2017 at 10:54 AM, Robert Haas  wrote:
>>> Or at least make the filtering optional.
>>
>> I don't think "filtering" is the right way to think about it.  It's
>> just labeling each combination of bits with the meaning appropriate to
>> that combination of bits.
>
> I do. -1 to not just showing what's on the page -- if the
> HEAP_XMIN_COMMITTED and HEAP_XMIN_ABORTED bits are set, then I think
> we should show them. Yeah, I accept that there is a real danger of
> confusing people with that. Unfortunately, I think that displaying
> HEAP_XMIN_FROZEN will cause even more confusion. I don't think that
> HEAP_XMIN_FROZEN is an abstraction at all. It's a notational
> convenience.

Well, *I* think that HEAP_XMIN_FROZEN is an abstraction.  That's why
we have #define -- to help us create abstractions.

> I don't think it's our place to "interpret" the bits. Are we *also*
> going to show HEAP_XMIN_FROZEN when xmin is physically set to
> FrozenTransactionId?

No, of course not.  We're talking about how to display the 256 and 512
bits of t_infomask.  Those have four states: nothing, committed,
invalid, frozen.  You're arguing that frozen isn't a real state, that
it's somehow just a combination of committed and invalid, but I think
that's the wrong way of thinking about it.  When the 256-bit is clear,
the 512-bit tells you whether the xmin is known invalid, but when the
256-bit is set, the 512-bit tells you whether the tuple is also
frozen.
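
For reference, the definitions in src/include/access/htup_details.h:

  #define HEAP_XMIN_COMMITTED  0x0100  /* t_xmin committed */
  #define HEAP_XMIN_INVALID    0x0200  /* t_xmin invalid/aborted */
  #define HEAP_XMIN_FROZEN     (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)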

Before HEAP_XMIN_FROZEN existed, it would have been right to display
the state where both bits are set as committed|invalid, because that
would clearly show you that two things had been set that should never
both be set at the same time.  But now that's a valid state with a
well-defined meaning and I think we should display the actual meaning
of that state.

> Where does it end?

I guess it ends wherever we decide to stop.  This isn't some kind of
crazy slippery slope we're talking about here, where one day we're
labeling informask bits and the next day it's global thermonuclear
war.

> I think that we should prominently document that HEAP_XMIN_COMMITTED
> |HEAP_XMIN_ABORTED == HEAP_XMIN_FROZEN, rather than trying to hide
> complexity that we have no business hiding in a tool like pageinspect.

I respect that opinion, but I don't think I'm trying to hide anything.
I think I'm proposing that we display the information in what I
believed to be the clearest and most accurate way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] show precise repos version for dev builds?

2017-10-13 Thread Robert Haas
On Thu, Oct 12, 2017 at 4:50 PM, Fabien COELHO  wrote:
> "svnversion" adds a "M" for modified on the status. There is an option with
> "git describe" to get something similar:
>
> git describe --long --always --all --dirty

I tried this out a bit:

[rhaas pgsql]$ git describe --long --always --all --dirty
heads/master-0-gd133982d59
[rhaas pgsql]$ git checkout head~
...blah blah...
[rhaas pgsql]$ git describe --long --always --all --dirty
heads/x-218-g6393613b6a

Eh, what?  Hmm, maybe it's because I have a local branch called 'x'
lying around from something or other.

[rhaas pgsql]$ git branch -D x
Deleted branch x (was 77b6b5e9ce).
[rhaas pgsql]$ git describe --long --always --all --dirty
tags/REL_10_BETA3-458-g6393613b6a

Mmph.  I understand the desire to identify the exact commit used for a
build somehow, but something whose output depends on whether or not I
left a branch lying around locally doesn't seem that great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Disable cross products in postgres

2017-10-13 Thread Gourav Kumar
Hi all,

Is there some way through which I can disable cross products in postgresql?

This will make the DP join search not consider a join between two relations
if they don't have a join predicate between them.


Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science


Re: [HACKERS] Pluggable storage

2017-10-13 Thread Alexander Korotkov
On Fri, Oct 13, 2017 at 9:41 PM, Robert Haas  wrote:

> On Fri, Oct 13, 2017 at 1:59 PM, Peter Geoghegan  wrote:
> >> Fully agreed.
> >
> > If we implement that interface, where does that leave EvalPlanQual()?
>

At first glance, it seems that pluggable storage should override
EvalPlanQualFetch(); the rest of EvalPlanQual() looks quite generic.


> > Do those semantics have to be preserved?
>
> For a general-purpose heap storage format, I would say yes.


+1

> I mean, we don't really have control over how people use the API.  If
> somebody decides to implement a storage API that breaks EvalPlanQual
> semantics horribly, I can't stop them, and I don't want to stop them.
> Open source FTW.
>

Yeah.  We don't have any kind of "safe extensions".  Any extension can
break things really horribly.
For me that means the user must absolutely trust the extension developer.

> But I don't really want that code in our tree, either.


We keep things in our tree as correct as we can.  And for sure, we should
follow this policy for pluggable storages too.


> I think a
> storage engine is and should be about the format in which data gets
> stored on disk, and that it should only affect the performance of
> queries not the answers that they give.


Much the same idea as index access methods.  They also affect
performance, but not query answers.  When that's not true, the situation
is considered a bug, and it needs to be fixed.


> I am sure there will be cases
> where, for reasons of implementation complexity, that turns out not to
> be true, but I think in general we should try to avoid it as much as
> we can.


I think in some cases we can tolerate missing features (and document them),
but we can't tolerate wrong features.  For instance, we may have some
pluggable storage which doesn't support transactions at all (and that should
be documented for sure), but we shouldn't have a pluggable storage whose
transaction support is wrong.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] relkind check in DefineIndex

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 12:38 PM, Alvaro Herrera
 wrote:
> The relkind check in DefineIndex has grown into an ugly rat's nest of
> 'if' statements.  I propose to change it into a switch, as per the
> attached.

wfm

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 12:34 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> Implement table partitioning.
>
>> Currently, tables can be range-partitioned or list-partitioned.  List
>> partitioning is limited to a single column, but range partitioning can
>> involve multiple columns.  A partitioning "column" can be an
>> expression.
>
> I find the "partition strategy" thing a bit suspect code-wise.  I was a
> bit bothered by all the "default:" clauses in switches that deal with
> the possible values, and I was going to propose simply that we turn that
> into an enum -- a trivial patch, I thought.  Not so: the way it's used
> by the grammar is a bit odd.  Apparently, it starts life as a string
> (either "list" or "range"), and then transformPartitionSpec() has to go
> out of its way to return it as a char.
>
> I propose we have gram.y itself resolve the strings to either 'list' or
> 'range' and that the node contains only those values, not any string.
> Unless there is a reason why things are like this that I'm not seeing?

I don't think I feel strongly about this, but I'm also not sure
exactly what problem you are trying to solve.  Do you want to
elaborate on that a bit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] relkind check in DefineIndex

2017-10-13 Thread Alvaro Herrera
The relkind check in DefineIndex has grown into an ugly rat's nest of
'if' statements.  I propose to change it into a switch, as per the
attached.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 1a7420321f7f48bbc87dfdd38ba10fa57c6513da Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 13 Oct 2017 17:56:44 +0200
Subject: [PATCH] reword kind check using switch

---
 src/backend/commands/indexcmds.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b61aaac284..3f615b6260 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -375,25 +375,24 @@ DefineIndex(Oid relationId,
relationId = RelationGetRelid(rel);
namespaceId = RelationGetNamespace(rel);
 
-   if (rel->rd_rel->relkind != RELKIND_RELATION &&
-   rel->rd_rel->relkind != RELKIND_MATVIEW)
+   /* Ensure that it makes sense to index this kind of relation */
+   switch (rel->rd_rel->relkind)
{
-   if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
-
-   /*
-* Custom error message for FOREIGN TABLE since the 
term is close
-* to a regular table and can confuse the user.
-*/
+   case RELKIND_RELATION:
+   case RELKIND_MATVIEW:
+   /* OK */
+   break;
+   case RELKIND_FOREIGN_TABLE:
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("cannot create index on foreign 
table \"%s\"",

RelationGetRelationName(rel;
-   else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+   case RELKIND_PARTITIONED_TABLE:
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("cannot create index on 
partitioned table \"%s\"",

RelationGetRelationName(rel;
-   else
+   default:
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("\"%s\" is not a table or 
materialized view",
-- 
2.11.0




Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-13 Thread Alvaro Herrera
Robert Haas wrote:
> Implement table partitioning.

> Currently, tables can be range-partitioned or list-partitioned.  List
> partitioning is limited to a single column, but range partitioning can
> involve multiple columns.  A partitioning "column" can be an
> expression.

I find the "partition strategy" thing a bit suspect code-wise.  I was a
bit bothered by all the "default:" clauses in switches that deal with
the possible values, and I was going to propose simply that we turn that
into an enum -- a trivial patch, I thought.  Not so: the way it's used
by the grammar is a bit odd.  Apparently, it starts life as a string
(either "list" or "range"), and then transformPartitionSpec() has to go
out of its way to return it as a char.

I propose we have gram.y itself resolve the strings to either 'list' or
'range' and that the node contains only those values, not any string.
Unless there is a reason why things are like this that I'm not seeing?

-- 
Álvaro Herrera      https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

2017-10-13 Thread Robert Haas
On Thu, Oct 12, 2017 at 9:14 AM, Dilip Kumar  wrote:
>> Yep, this fixes the failures for me.
>>
> Thanks for confirming.

Committed and back-patched to v10.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pluggable storage

2017-10-13 Thread Alexander Korotkov
On Fri, Oct 13, 2017 at 9:37 PM, Robert Haas  wrote:

> On Fri, Oct 13, 2017 at 5:25 AM, Kuntal Ghosh
>  wrote:
> > For some other
> > storage engine, if we maintain the older version in different storage,
> > undo for example, and don't require a new index entry, should we still
> > call it HOT-chain?
>
> I would say, emphatically, no.  HOT is a creature of the existing
> heap.  If it's creeping into storage APIs they are not really
> abstracted from what we have currently.


+1,
a different storage engine may need to insert entries into only *some* of
the indexes, and these new index entries may have either the same or new TIDs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Continuous integration on Windows?

2017-10-13 Thread legrand legrand
This may seem obvious to you, but where is the build result?

Is it stored somewhere so that users like me, who want to test PG 10 on
Windows, can get it without having to build it?

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: [HACKERS] Pluggable storage

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 1:59 PM, Peter Geoghegan  wrote:
>> Fully agreed.
>
> If we implement that interface, where does that leave EvalPlanQual()?
> Do those semantics have to be preserved?

For a general-purpose heap storage format, I would say yes.

I mean, we don't really have control over how people use the API.  If
somebody decides to implement a storage API that breaks EvalPlanQual
semantics horribly, I can't stop them, and I don't want to stop them.
Open source FTW.

But I don't really want that code in our tree, either.  I think a
storage engine is and should be about the format in which data gets
stored on disk, and that it should only affect the performance of
queries not the answers that they give.  I am sure there will be cases
where, for reasons of implementation complexity, that turns out not to
be true, but I think in general we should try to avoid it as much as
we can.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pluggable storage

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 5:25 AM, Kuntal Ghosh
 wrote:
> For some other
> storage engine, if we maintain the older version in different storage,
> undo for example, and don't require a new index entry, should we still
> call it HOT-chain?

I would say, emphatically, no.  HOT is a creature of the existing
heap.  If it's creeping into storage APIs they are not really
abstracted from what we have currently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Improve catcache/syscache performance.

2017-10-13 Thread Tom Lane
Andres Freund  writes:
> On 2017-10-13 14:07:54 -0400, Tom Lane wrote:
>> One idea might be to see if we can precalculate all the control data
>> needed for the caches and set it up as compile-time constants,
>> a la Gen_fmgrtab.pl, rather than reading it from the catalogs during
>> startup.  That would make the code less dependent on initialization
>> order rather than more so.

> Hm. That sounds somewhat enticing. You're thinking of doing so for
> catcaches alone, or something grander, including the relcaches? I'd
> assume the former?

Yeah.  The relcaches are basically impossible to precalculate as constants
because they contain run-time variable data such as relfilenode.  I might
be wrong because I didn't go look, but offhand I think there is nothing in
the catcache control data that couldn't be predetermined.

> For catcaches the hardest part probably is that we need a TupleDesc. Per
> type function lookups, oids, should be fairly easy in contrast.

We already have a mechanism for precalculated TupleDescs for system
catalogs, cf src/backend/catalog/schemapg.h.  Right now we only apply
that for a few "bootstrap" system catalogs, but I don't really see
a reason we couldn't use it for every catalog that has a catcache.

regards, tom lane




Re: [HACKERS] Improve catcache/syscache performance.

2017-10-13 Thread Andres Freund
On 2017-10-13 14:07:54 -0400, Tom Lane wrote:
> One idea might be to see if we can precalculate all the control data
> needed for the caches and set it up as compile-time constants,
> a la Gen_fmgrtab.pl, rather than reading it from the catalogs during
> startup.  That would make the code less dependent on initialization
> order rather than more so.

Hm. That sounds somewhat enticing. You're thinking of doing so for
catcaches alone, or something grander, including the relcaches? I'd
assume the former?

For catcaches the hardest part probably is that we need a TupleDesc. Per
type function lookups, oids, should be fairly easy in contrast.

Greetings,

Andres Freund




Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)

2017-10-13 Thread Robert Haas
On Fri, Oct 6, 2017 at 5:07 PM, Badrul Chowdhury  wrote:
> I added a mechanism to fall back to v3.0 if the BE fails to start when FE 
> initiates a connection with v3.1 (with optional startup parameters). This 
> completely eliminates the need to backpatch older servers, ie newer FE can 
> connect to older BE. Please let me know what you think.

Well, I think it needs a good bit of cleanup before we can really get
to the substance of the patch.

+fe_utils \
 interfaces \
 backend/replication/libpqwalreceiver \
 backend/replication/pgoutput \
-fe_utils \

Useless change, omit.

+if (whereToSendOutput != DestRemote ||
+PG_PROTOCOL_MAJOR(FrontendProtocol) < 3)
+return -1;
+
+int sendStatus = 0;

Won't compile on older compilers.  We generally aim for C89
compliance, with a few exceptions for newer features.

Also, why initialize sendStatus and then overwrite the value in the
very next line of code?

Also, the PG_PROTOCOL_MAJOR check here seems to be redundant with the
one in the caller.

+/* NegotiateServerProtocol packet structure
+ *
+ * [ 'Y' | msgLength | min_version | max_version | param_list_len | list of param names ]
+ */
+

Please pgindent your patches.  I suspect you'll find this gets garbled.

Is there really any reason to separate NegotiateServerProtocol and
ServerProtocolVersion into separate functions?

-libpq = -L$(libpq_builddir) -lpq
+libpq = -L$(libpq_builddir) -lpq -L$(top_builddir)/src/common -lpgcommon -L$(top_builddir)/src/fe_utils -lpgfeutils
+$libpq->AddReference($libpgcommon, $libpgfeutils, $libpgport);

I haven't done any research to try to figure out why you did this, but
I don't think these are likely to be acceptable changes.

SendServerProtocolVersionMessage should be adjusted to use the new
facilities added by commit 1de09ad8eb1fa673ee7899d6dfbb2b49ba204818.

-/* Check we can handle the protocol the frontend is using. */
-
-if (PG_PROTOCOL_MAJOR(proto) < PG_PROTOCOL_MAJOR(PG_PROTOCOL_EARLIEST) ||
-PG_PROTOCOL_MAJOR(proto) > PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST) ||
-(PG_PROTOCOL_MAJOR(proto) == PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST) &&
- PG_PROTOCOL_MINOR(proto) > PG_PROTOCOL_MINOR(PG_PROTOCOL_LATEST)))
-ereport(FATAL,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("unsupported frontend protocol %u.%u: server supports %
u.0 to %u.%u",
-PG_PROTOCOL_MAJOR(proto), PG_PROTOCOL_MINOR(proto),
-PG_PROTOCOL_MAJOR(PG_PROTOCOL_EARLIEST),
-PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST),
-PG_PROTOCOL_MINOR(PG_PROTOCOL_LATEST;

The way you've arranged things here looks like it'll cause us to
accept connections even for protocol versions 4.x or higher; I don't
think we want that.  I suggest checking the major version number at
this point in the code; then, the code path for version 3+ needs no
additional check and the code path for version 2 can enforce 2.0.

+bool
+is_optional(const char *guc_name)
+{
+const char *optionalPrefix = "_pq_";
+bool isOptional = false;
+
+/* "_pq_" must be a proper prefix of the guc name in all encodings */
+if (guc_name_compare(guc_name, optionalPrefix) == 1 &&
+strstr(guc_name, optionalPrefix))
+isOptional = true;
+
+return isOptional;
+}

This seems like very strange coding for all kinds of reasons.  Why is
guc_name_compare() used to do the comparison and strstr() then used
afterwards?  Why do we need two tests instead of just one, and why
should one of them be case-sensitive and the other not?  Why not just
use strncmp?  Why write bool var = false; if (blah blah) var = true;
return var; instead of just return blah blah?  Why the comment about
encodings - that doesn't seem particularly relevant here?  Why
redeclare the prefix here instead of having a common definition
someplace that can be used by both the frontend and the backend,
probably a header file #define?
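
For illustration, a minimal standalone sketch of the simplification being
suggested here, using a single strncmp against a shared prefix definition;
the name OPTIONAL_PARAM_PREFIX is hypothetical, and whether the comparison
should be case-insensitive is left open above:

#include <stdbool.h>
#include <string.h>

/* hypothetical shared definition, e.g. in a protocol header */
#define OPTIONAL_PARAM_PREFIX "_pq_"

/* one case-sensitive strncmp replaces the guc_name_compare/strstr pair */
static bool
is_optional(const char *guc_name)
{
    return strncmp(guc_name, OPTIONAL_PARAM_PREFIX,
                   strlen(OPTIONAL_PARAM_PREFIX)) == 0;
}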

Also, this really doesn't belong in guc.c at all.  We should be
separating out these options in ProcessStartupPacket() just as we do
for existing protocol-level options.  When we actually have some
options, I think they should be segregated into a separate list
hanging off of the port, instead of letting them get mixed into
port->guc_options, but for right now we don't have any yet, so a bunch
of this complexity can go away.

+ListCell *gucopts = list_head(port->guc_options);
+while (gucopts)
+{
+char   *name;
+
+/* First comes key, which we need. */
+name = lfirst(gucopts);
+gucopts = lnext(gucopts);
+
+/* Then comes value, which we don't need. */
+gucopts = lnext(gucopts);
+
+pq_sendstring(&buf, name);
+}

This is another coding rule violation because the declaration of
gucopts follows executable statements.
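
A hedged sketch of the same loop with the declaration hoisted to satisfy
that rule; "buf" stands for whatever StringInfo the surrounding code is
building:

ListCell   *gucopts;

gucopts = list_head(port->guc_options);
while (gucopts != NULL)
{
    char       *name = lfirst(gucopts);     /* the key, which we need */

    gucopts = lnext(gucopts);               /* advance past the key */
    gucopts = lnext(gucopts);               /* skip the value we don't need */

    pq_sendstring(&buf, name);
}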

-SimpleStringList roles = {NULL, 

Re: [HACKERS] Improve catcache/syscache performance.

2017-10-13 Thread Tom Lane
Andres Freund  writes:
> On 2017-10-13 13:06:41 -0400, Robert Haas wrote:
>> I don't think it's this patch's job to do it, but it seems like we
>> ought to just invent some early-initialization step where things like
>> this can happen, so that we don't have to branch here at all.

> Yea, that'd be nice - it does show up in profiles.  I'd tried to do
> that, but it's not exactly trivial, so I decided to delay it. The
> ordering when syscache stuff gets initialized is, uh, fragile.  During
> InitCatalogCache() the catalog is not ready for access. After that, we
> access catcaches while not all catalogs are quite ready to be accessed.

Yeah, I think the fragility vs. benefit tradeoff here is not very
promising.

One idea might be to see if we can precalculate all the control data
needed for the caches and set it up as compile-time constants,
a la Gen_fmgrtab.pl, rather than reading it from the catalogs during
startup.  That would make the code less dependent on initialization
order rather than more so.
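
To make that concrete, a hedged sketch of what generated compile-time
control data could look like; the struct, field names, and entries are
hypothetical, and the real table would be emitted by a generator script
a la Gen_fmgrtab.pl:

/* "catcachetab.h", a hypothetical generated header */
typedef struct CatCacheControlData
{
    const char *cc_relname;     /* catalog being cached */
    int         cc_nkeys;       /* number of lookup key columns */
    int         cc_keyno[4];    /* attribute numbers of the keys */
    int         cc_nbuckets;    /* initial hash table size */
} CatCacheControlData;

static const CatCacheControlData CatCacheControl[] = {
    /* entries are illustrative only, not the real catcache definitions */
    {"pg_type", 1, {-2 /* oid */, 0, 0, 0}, 64},
    {"pg_namespace", 1, {1 /* nspname */, 0, 0, 0}, 16},
};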

regards, tom lane




Re: [HACKERS] Pluggable storage

2017-10-13 Thread Peter Geoghegan
On Thu, Oct 12, 2017 at 2:23 PM, Robert Haas  wrote:
> On Thu, Oct 12, 2017 at 4:38 PM, Alexander Korotkov
>> However I imply that alternative storage would share our "MVCC model".  So,
>> it
>> should share our transactional model including transactions,
>> subtransactions, snapshots etc.
>> Therefore, if alternative storage is transactional, then in particular it
>> should be able to fetch tuple with
>> given TID according to given snapshot.  However, how it's implemented
>> internally is
>> a black box for us.  Thus, we don't insist that tuple should have different
>> TID after update;
>> we don't insist there is any analogue of HOT; we don't insist alternative
>> storage needs vacuum
>> (or if even it needs vacuum, it might be performed in completely different
>> way) and so on.
>
> Fully agreed.

If we implement that interface, where does that leave EvalPlanQual()?
Do those semantics have to be preserved?

-- 
Peter Geoghegan




Re: [HACKERS] Improve catcache/syscache performance.

2017-10-13 Thread Andres Freund
Hi,

On 2017-10-13 13:06:41 -0400, Robert Haas wrote:
> On Thu, Sep 14, 2017 at 2:12 AM, Andres Freund  wrote:
> > This patch gives me roughly 8% speedup in a workload that consists out
> > of a fast query that returns a lot of columns.  If I apply a few
> > other performance patches, this patch itself starts to make a bigger
> > difference, of around 11%.
> 
> I did a read-through of this patch today.

Thanks!


> +/* not as performance critical & "complicated" */
> 
> This comment kinda sucks.  I don't think it will be clear to somebody
> in 3 years what this means.  It's clear enough in context but later I
> think it won't be.  I suggest dumping this altogether and expanding
> the comment up above to encompass this:
> 
> Hash and equality functions for system types that are used as cache
> key fields.  In some cases, we just call the regular SQL-callable
> functions for the appropriate data type, but that tends to be a little
> slow, and the speed of these functions is performance-critical.
> Therefore, for data types that frequently occur as catcache keys, we
> hard-code the logic here.  Avoiding the overhead of
> DirectFunctionCallN(...) is a substantial win, and in certain cases
> (like int4) we can adopt a faster hash algorithm as well.

K.


> -if (cache->cc_tupdesc == NULL)
> +if (unlikely(cache->cc_tupdesc == NULL))
>  CatalogCacheInitializeCache(cache);
> 
> I don't think it's this patch's job to do it, but it seems like we
> ought to just invent some early-initialization step where things like
> this can happen, so that we don't have to branch here at all.

Yea, that'd be nice - it does show up in profiles.  I'd tried to do
that, but it's not exactly trivial, so I decided to delay it. The
ordering when syscache stuff gets initialized is, uh, fragile.  During
InitCatalogCache() the catalog is not ready for access. After that, we
access catcaches while not all catalogs are quite ready to be accessed.

Greetings,

Andres Freund




Re: [HACKERS] Determine state of cluster (HA)

2017-10-13 Thread Joshua D. Drake

On 10/12/2017 05:50 PM, Joshua D. Drake wrote:

-Hackers,


Bumping this.



I had a long call with a firm developing front end proxy/cache/HA for 
Postgres today. Essentially the software is a replacement for PGPool in 
entirety but also supports analytics etc... When I was asking them about 
pain points they talked about the below and I was wondering if this is a 
problem we would like to solve:


  Per your request, here is our failover issue.

1.  In a modern devops environment, the database should be able to scale 
and morph over time based on need.
2.  Tools that are leveraging the database should be able to easily 
discover and potentially control (with permissions) the database. 
Currently, you can discover the master and what nodes are syncing off of 
it, but on a failure, a tool can't easily discover what orchestration 
has done on the back-end to make the cluster whole again, i.e. from the 
slave, you can't discover the master reliably and easily.


The logic that our code now uses is to:

1.  Find the master
2.  Add replication nodes per the master's configuration.

To find a master, we start with a list of candidate nodes that MAY be a 
master at any point, and:

1. issue "SELECT pg_is_in_recovery()" to find if it is a slave
a. If so, use "SELECT pg_read_file('recovery.conf')" to extract the host
b. Attempt to connect to the host directly, if not...
c. failing that, use the slave and the hostname via dblink to connect to the 
master, i.e. select * from dblink('" + connInfo + " 
dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr 
inet).  This is necessary in the event the hostname used in the 
recovery.conf file is not resolvable from the outside.
d. Use the dblink connection to ID the master node via select 
inet_server_addr();

e. connect to the IP provided by the master.
f.  Repeat through nodes until we get a master.
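
For what it's worth, step 1 of that loop is straightforward with libpq;
a minimal hedged sketch, with error handling reduced to a sentinel value:

#include <libpq-fe.h>

/* returns 1 if the node is a slave, 0 if a master, -1 on error */
static int
node_is_slave(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "SELECT pg_is_in_recovery()");
    int         result = -1;

    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        result = (PQgetvalue(res, 0, 0)[0] == 't');
    PQclear(res);
    return result;
}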

Issues:
1.  The dblink call doesn't have a way to specify a timeout, so we have 
to use Java futures to control how long this may take to a reasonable 
amount of time;
2.  NAT mapping may result in us detecting IP ranges that are not 
accessible to the application nodes.
3.  there is no easy way to monitor for state changes as they happen, 
allowing faster failovers, everything has to be polled based on events;
4.  It doesn't support cascading replication very well, although we 
could augment the logic to allow us to map the relationship between nodes.
5.  There is no way to connect to a db node with something akin to 
SQL-Server's "application intent" flags, to allow a connection to be 
rejected if we wish it to be a read/write connection.  This helps detect 
the state of the node directly without having to ask any further 
questions of the node, and makes it easier to "stall" during connection 
until a proper connection can be made.
6.  The master, on shutdown, will not actually close and disable 
connections as it shuts down, instead, it will issue an error that it is 
shutting down as it does so.


Fundamentally, the biggest issue is that it is very hard to determine 
the state of the cluster by asking all the nodes, in particular in the 
case of a failure.  Some state information is lost that is necessary to 
talk to the cluster moving forward in a reliable manner.






--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




Re: [HACKERS] pgsql: Avoid coercing a whole-row variable that is already coerced

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 12:56 PM, Tom Lane  wrote:
> It's better ... but after reading the patched code, a lot of my remaining
> beef is with the lack of clarity of the comments.  You need ESP to
> understand what the function is trying to accomplish and what the
> constraints are.  I'll take a whack at improving that and push.

OK, thanks.   The good thing is, now that we know you have ESP, you
can use it with the Ouija board Andres used to decide whether to apply
sizeof to the variable or the type.  That's got to be a win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Improve catcache/syscache performance.

2017-10-13 Thread Robert Haas
On Thu, Sep 14, 2017 at 2:12 AM, Andres Freund  wrote:
> This patch gives me roughly 8% speedup in a workload that consists out
> of a fast query that returns a lot of columns.  If I apply a few
> other performance patches, this patch itself starts to make a bigger
> difference, of around 11%.

I did a read-through of this patch today.  I don't see anything really
serious to complain about here.  Somebody might question the use of
the no-inline stuff, but it seems sensible to me in this context.

+/* not as performance critical & "complicated" */

This comment kinda sucks.  I don't think it will be clear to somebody
in 3 years what this means.  It's clear enough in context but later I
think it won't be.  I suggest dumping this altogether and expanding
the comment up above to encompass this:

Hash and equality functions for system types that are used as cache
key fields.  In some cases, we just call the regular SQL-callable
functions for the appropriate data type, but that tends to be a little
slow, and the speed of these functions is performance-critical.
Therefore, for data types that frequently occur as catcache keys, we
hard-code the logic here.  Avoiding the overhead of
DirectFunctionCallN(...) is a substantial win, and in certain cases
(like int4) we can adopt a faster hash algorithm as well.
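
As one data point, the faster int4 hash can be as cheap as an inlined
integer finalizer; a standalone hedged sketch (the function name is made
up, the constants are the standard 32-bit murmur3 finalizer):

#include <stdint.h>

static inline uint32_t
int4_hash_fast(uint32_t h)
{
    /* murmur3 32-bit finalizer: a few cycles, good avalanche behavior */
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}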

+{
+return false;
+}

Excess braces.

+ * The use of argument specific numbers is encouraged, they're faster, and
+ * insulates the caller from changes in the maximum number of keys.

s/, they're faster/. They're faster/

-if (cache->cc_tupdesc == NULL)
+if (unlikely(cache->cc_tupdesc == NULL))
 CatalogCacheInitializeCache(cache);

I don't think it's this patch's job to do it, but it seems like we
ought to just invent some early-initialization step where things like
this can happen, so that we don't have to branch here at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Discussion on missing optimizations

2017-10-13 Thread Adam Brusselback
So from following this discussion and others focused on making the
planner "smarter", there is always an argument to be had over wasting
planner cycles, and it's always a hard fought battle to get any changes made.

Now, i'm speaking without any knowledge of the Postgres internals, so
please bear with me.  It seems like the underlying "feature" which would
allow more edge case optimizations into the planner would be some
framework for knowing when to consider using those optimizations, and
when to not waste time planning and skip expensive planning optimizations.

It seems with a system like that, maybe Postgres could still maintain
(or even improve) its hard-fought quick planning time for OLTP queries,
while being able to spend more time planning for OLAP queries where
spending an extra 500ms planning may save minutes of execution time.

I know in my database, I have analytical queries which I wouldn't mind spending
multiple seconds planning if necessary to produce an optimal plan, because
execution time dominates everything.

Now what that system looks like is something I have no real opinion
or authority on.  It seems like there are three options though, automatic,
manual, and automatic with manual override.  Two of those, manual / automatic
with manual override seem to be a little too close to query hints for them to be
considered from all previous discussions I've heard (correct me if i'm wrong).
So that leaves automatic as the only option I can see being considered viable.

Now comes the question of whether it's possible to automatically classify queries in
such a way that we can cheaply know what optimizations we should attempt,
and what should not even be considered because the planning time would
dominate.


I'll leave my thoughts at that, and await comments from the smarter people
in the room who know exactly why this idea wouldn't work.  The general thought
behind it is to make it easier for more and more query optimizations to
make it into the planner, while not hurting queries which shouldn't spend
their time on it.

If there are better ways of going about it, great!

Thanks for listening to me ramble,
-Adam




Re: [HACKERS] pgsql: Avoid coercing a whole-row variable that is already coerced

2017-10-13 Thread Tom Lane
Robert Haas  writes:
> On Fri, Oct 13, 2017 at 5:57 AM, Amit Khandekar  
> wrote:
>> One thing we can do is : instead of calling
>> map_variable_attnos_mutator(), convert the var inside the if block for
>> "if (IsA(node, ConvertRowtypeExpr))". Please check the attached patch.
>> There, I have avoided coerced_var context variable.

> Tom, is this more like what you have in mind?

It's better ... but after reading the patched code, a lot of my remaining
beef is with the lack of clarity of the comments.  You need ESP to
understand what the function is trying to accomplish and what the
constraints are.  I'll take a whack at improving that and push.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

2017-10-13 Thread Peter Geoghegan
On Fri, Oct 13, 2017 at 4:54 AM, Robert Haas  wrote:
> I haven't really followed this thread in depth, but I'm very nervous
> about the idea that we should ever be examining the raw-xmin of a
> tuple that has been marked HEAP_XMIN_FROZEN for anything other than
> forensic purposes.

I'm nervous about it as well. These problems are very difficult to reason about.

> The design principle I followed in writing the
> original patch was that marking a tuple HEAP_XMIN_FROZEN was identical
> to setting the xmin to 2 except that the original xmin was still
> available for manual inspection.

I did point this out myself.

> If we're finding that we need the
> raw xmin after freezing, doesn't that mean that our freezing algorithm
> was flat busted prior to that commit?

I wouldn't put it that way at all. That commit provided us with the
opportunity to put in a better fix for a problem with update chain
traversal, a problem that was particularly critical when certain HOT
pruning + freezing races occur (the "failed to find parent tuple"
thing). It's clear that freezing was just as busted before and after
that commit, though.

> And maybe still busted when
> things wrap around multiple times and raw-xmins are reused?

I'm more concerned about the situation that will remain (or has now
been created) on 9.3, where we don't have raw xmin, and so must use
very forgiving FrozenTransactionId matching within
HeapTupleUpdateXmaxMatchesXmin().

-- 
Peter Geoghegan




Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Tom Lane
Heikki Linnakangas  writes:
> We've been doing that window agg thing for a long time, so I think 
> "works as window agg" should be the default for regular aggregates. For 
> ordered-set aggregates, "no merging, no more transfn() calls after 
> finalfn()" seems safest.

> It's a bit of a shame to have different defaults for regular and 
> ordered-set aggregates. But that is what we implicitly assume today, 
> anyway, and it's not too that hard to document.

It's kind of a pain in the rear, really, especially for purposes like
pg_dump, which will have to know the rule to decide whether it can
omit an SMODIFY clause.  (Which it should do if the value is default,
so as not to make back-porting of dumps harder than necessary.)

On reflection though I see little choice.  Right now it's impossible
to use an OSA as a window agg at all because the parser doesn't support
that combination of options, and nodeWindowAgg is lacking necessary
support too.  If we ever get around to relaxing that, it would really
be necessary that the default for user-created OSAs be "not safe as
window agg", because almost certainly they wouldn't be.  So I think
there's no option but to have different default safety levels for OSA
and normal aggs.  As long as we're forced into that anyway, it does
make sense to default to "not safe to merge" as well.  We don't know
for sure whether there are any user-created OSAs in the wild, but
if there are, they likely borrowed code from orderedsetaggs.c.

regards, tom lane




Re: [HACKERS] pgsql: Avoid coercing a whole-row variable that is already coerced

2017-10-13 Thread Robert Haas
On Fri, Oct 13, 2017 at 5:57 AM, Amit Khandekar  wrote:
> One thing we can do is : instead of calling
> map_variable_attnos_mutator(), convert the var inside the if block for
> "if (IsA(node, ConvertRowtypeExpr))". Please check the attached patch.
> There, I have avoided coerced_var context variable.

Tom, is this more like what you have in mind?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Predicate Locks for writes?

2017-10-13 Thread Alexander Korotkov
On Wed, Oct 11, 2017 at 7:27 PM, Robert Haas  wrote:

> On Wed, Oct 11, 2017 at 11:51 AM, Simon Riggs 
> wrote:
> > I'm inclined to believe Robert's hypothesis that there is some race
> > condition there.
> >
> > The question is whether that still constitutes a serializability
> > problem since we haven't done anything with the data, just passed it
> > upwards to be modified.
>
> Well, the question is whether passing it upwards constitutes reading
> it.  I kind of suspect it does.  The plan tree isn't just blindly
> propagating values upward but doing stuff with them.  There could be quite a
> bit between the ModifyTable node and the underlying scan if DELETE ..
> FROM or UPDATE .. USING is in use.
>

Right.  In general we can't skip SIReadLock just basing on the fact that
we're under ModifyTable node.
It seems still possible for me to skip SIReadLock in some very limited (but
probably typical) cases.
But before analyzing this deeper, it would be nice to estimate possible
benefits.
We can try some broken version which skip all SIReadLock's under
ModifyTable node and benchmark it.
If it wouldn't have significant performance benefit, then there is no
reason to do something further...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [COMMITTERS] pgsql: Improve performance of SendRowDescriptionMessage.

2017-10-13 Thread Tom Lane
Noah Misch  writes:
> I hacked psql to call PQtrace() and ran "psql -Xc 'select true'" in the
> defective configuration and in a working x64 GNU/Linux configuration.  I've
> attached both PQtrace() products.

Thanks.  It looks to me like the xlc build simply forgets to send some
of the T-message fields: the message length observed by the frontend
is too short, and the reported field values starting with "1140850688"
correspond to the actual contents of the subsequent D message, rather
than what should be there.

Studying the values that are returned, a plausible conclusion is that
in the sequence

pq_writestring(buf, NameStr(att->attname));
pq_writeint32(buf, resorigtbl);
pq_writeint16(buf, resorigcol);
pq_writeint32(buf, atttypid);
pq_writeint16(buf, att->attlen);
pq_writeint32(buf, atttypmod);
pq_writeint16(buf, format);

the pq_writeint32 calls are somehow becoming no-ops.  That would explain
the message being exactly 12 bytes too short, and the 6 bytes that are
there match what the pq_writeint16 calls should send.

Looking at the pq_writeintN function definitions, I'm annoyed by the fact
that Andres seems to have decided whether to write sizeof(ni) or sizeof(i)
with the aid of a ouija board.  That should be consistent.  I'd go with
sizeof(ni) myself, since that's the object actually being memcpy'd.
It seems unlikely that that could explain this bug, but maybe somehow
sizeof() misbehaves for a parameter that's been inlined away?
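
For concreteness, a hedged sketch of the shape under discussion, written
consistently with sizeof(ni); the StringInfo usage and the pg_hton32 macro
stand in for whatever the real code uses:

static inline void
pq_writeint32(StringInfo buf, int32 i)
{
    int32       ni = pg_hton32(i);      /* network byte order */

    Assert(buf->len + sizeof(ni) <= buf->maxlen);
    memcpy((char *) buf->data + buf->len, &ni, sizeof(ni));
    buf->len += sizeof(ni);
}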

Anyway, I will go make the sizeof() usages consistent, just to satisfy
my own uncontrollable neatnik-ism.  Assuming that hornet stays red,
which is probable, we should disable "restrict" for that compiler.

regards, tom lane




Re: [HACKERS] Continuous integration on Windows?

2017-10-13 Thread Andrew Dunstan


On 10/13/2017 08:09 AM, Thomas Munro wrote:
> On Fri, Oct 13, 2017 at 1:42 PM, Andrew Dunstan
>  wrote:
>> Well, as you can see here the appveyor.yml file can live outside the
>> tree that's being built.
> Here's a Wiki page where I hope we can collect how-to information on
> this general topic:
>
> https://wiki.postgresql.org/wiki/Continuous_Integration
>
> I tried your appveyor.yml, and added:
>
> test_script:
>   - cd src\tools\msvc && vcregress check
>
> That much I could figure out just by reading our manual and I could
> see that it worked first time, but to make this really useful I guess
> we'd have to teach it to dump out resulting regression.diffs files and
> backtraces from core files (as the Travis CI version accessible from
> that page does).
>


I'll add some info to the wiki. Unfortunately, the tests fail on the
tablespace test because they are running as a privileged user. We need
to find a way around that, still looking into it. (See

which describes how I get around that when running by hand).

cheers

andrew



-- 
Andrew Dunstan      https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Tom Lane
Heikki Linnakangas  writes:
> On 10/13/2017 02:08 AM, Tom Lane wrote:
>> I started to look into fixing orderedsetaggs.c so that we could revert
>> 52328727b, and soon found a rather nasty problem.  Although the plain
>> OSAs seem amenable to supporting multiple finalfn calls on the same
>> transition state, the "hypothetical set" functions are not at all.
>> What they do is to accumulate all the regular aggregate input into
>> a tuplesort, then add the direct arguments as an additional "hypothetical"
>> row, and finally sort the result.  There's no realistic way of undoing
>> the addition of the hypothetical row, so these finalfns simply can't
>> share tuplesort state.

> The current implementation, with the extra flag column, is quite naive. 
> We could add some support to tuplesort to find a row with given 
> attributes, and call that instead of actually adding the hypothetical 
> row to the tuplesort and iterating to re-find it after performsort. For 
> a result set that fits in memory, you could even do a binary search 
> instead of linearly iterating through the result set.

I'd had some idle thoughts about using a heap to support window
aggregation of an OSA, rather than ever performing a full sort.
It'd probably work well as long as the window doesn't get wide enough
that the heap stops fitting in memory.  But in any case, the immediate
point is that we shouldn't just assume that every aggregate
implementation is capable of supporting state merging, and while we're
at it, it'd be better to have a non-syntax-based distinction between
aggs that can support use as window functions and aggs that can't.

Moving on to the exact color of the bikeshed: it seems like the right
way to present this to users of CREATE AGGREGATE is in terms of "does
the final function modify the transition state?".  So maybe the values
could be spelled

SMODIFY = READ_ONLY   ffunc never touches state, ok as window agg
SMODIFY = SHARABLE    ffunc does some one-time change like sorting,
  so state merging is OK but not window agg
SMODIFY = READ_WRITE  ffunc trashes state, can't do merging either

I'm not set on these names by any means; anyone have a better idea?
Also, I'm not sure whether we'd need a separate MMODIFY flag for
the moving-aggregate sub-implementation, but it seems possible.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] v10 bottom-listed

2017-10-13 Thread Magnus Hagander
On Fri, Oct 6, 2017 at 3:59 AM, Amit Langote 
wrote:

> On 2017/10/05 22:28, Erik Rijkers wrote:
> > In the 'ftp' listing, v10 appears at the bottom:
> >   https://www.postgresql.org/ftp/source/
> >
> > With all the other v10* directories at the top, we could get a lot of
> > people installing wrong binaries...
> >
> > Maybe it can be fixed so that it appears at the top.
>
> Still see it at the bottom.  Maybe ping pgsql-www?
>

Turns out there was already quite an ugly hack to deal with this, so I just
made it a bit more ugly. Once the caches expire I believe sorting should be
correct.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] new function for tsquery creartion

2017-10-13 Thread Victor Drobny

On 2017-10-13 16:37, Alexey Chernyshov wrote:

Hi all,
I am extending the phrase operator <n> in such a way that it will have
the syntax <n,m>, meaning from n to m words, so I will use such syntax (<n,m>)
further. I found that a AROUND(N) b is exactly the same as a <-N,N> b
and it can be replaced while parsing. So, what do you think of such an
idea? In this patch I have noticed some unobvious behavior.


Thank you for the interest and review!


# select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat
AROUND(1) cat') as match;
match
---
 t

cat AROUND(1) cat is the same as "cat <1> cat || cat <0> cat" and:

# select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
 ?column?
---
 t

It seems to be a proper logic behavior but it is a possible pitfall,
maybe it should be documented?


It is a tricky question. I think that this interpretation is confusing, so
better to make it as <-N, -1> and <1, N>.


But more important question is how AROUND() operator should handle stop
words? Now it works as:

# select queryto_tsquery('cat <2> (a AROUND(10) rat)');
 queryto_tsquery
--
 'cat' <12> 'rat'
(1 row)

# select queryto_tsquery('cat <2> a AROUND(10) rat');
queryto_tsquery

 'cat' AROUND(12) 'rat'
(1 row)

In my opinion it should be like:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12>
rat


I think that correct version is:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-2,12> 
rat.



cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat = cat <-8, 12>
rat


It is a problem indeed. I did not catch it during implementation. Thank you
for pointing it out.


Now the <n,m> operator can be replaced with a combination of the phrase
operator <n>, AROUND(), and logical operators, but with the <n,m> operator
it will be much less painful. Correct me, please, if I am wrong.


I think that the <n,m> operator is more general than AROUND(N), so the
latter should be based on yours. However, I think that taking negative
parameters is not the best idea because it is confusing. On top of that,
it is not so necessary, and I think it won't be popular among users.
It seems to me that the AROUND operator can be easily implemented with
<n,m>; it also helps to avoid the problems that you showed above.

--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] new function for tsquery creartion

2017-10-13 Thread Alexey Chernyshov
Hi all,
I am extending the phrase operator <n> in such a way that it will have
the syntax <n,m>, meaning from n to m words, so I will use such syntax (<n,m>)
further. I found that a AROUND(N) b is exactly the same as a <-N,N> b
and it can be replaced while parsing. So, what do you think of such an
idea? In this patch I have noticed some unobvious behavior.

# select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat
AROUND(1) cat') as match;
match 
---
 t

cat AROUND(1) cat is the same as "cat <1> cat || cat <0> cat" and:

# select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
 ?column? 
---
 t

It seems to be a proper logic behavior but it is a possible pitfall,
maybe it should be documented?

But more important question is how AROUND() operator should handle stop
words? Now it works as:

# select queryto_tsquery('cat <2> (a AROUND(10) rat)');
 queryto_tsquery  
--
 'cat' <12> 'rat'
(1 row)

# select queryto_tsquery('cat <2> a AROUND(10) rat');
queryto_tsquery 

 'cat' AROUND(12) 'rat'
(1 row)

In my opinion it should be like:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12>
rat 

cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat = cat <-8, 12>
rat

Now  operator can be replaced with combination of phrase
operator , AROUND(), and logical operators, but with  operator
it will be much painless. Correct me, please, if I am wrong.

-- 
Alexey Chernyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Still another race condition in recovery TAP tests

2017-10-13 Thread Andrew Dunstan


On 10/13/2017 01:04 AM, Noah Misch wrote:
> On Fri, Oct 06, 2017 at 05:57:24PM +0800, Craig Ringer wrote:
>> On 6 October 2017 at 14:03, Noah Misch  wrote:
>>> On Fri, Sep 08, 2017 at 10:32:03PM -0400, Tom Lane wrote:
 (I do kinda wonder why we rolled our own RecursiveCopy; surely there's
 a better implementation in CPAN?)
>>> Fewer people will test as we grow the list of modules they must first 
>>> install.
>> Meh, I don't buy that. At worst, all we have to do is provide a script
>> that fetches them, from distro repos if possible, and failing that
>> from CPAN.
>>
>> With cpanminus, that's pretty darn simple too.
> If the tree had such a script and it were reliable, then yes, it would matter
> little whether the script procured one module or five.
>
>



Not everyone has cpanminus installed either. My approach in the
buildfarm code is to lean over backwards in order to avoid non-standard
modules. For the problem at hand we use cp/xcopy, but the tree being
copied is stable so we don't run into the disappearing/changing file
problem.


cheers

andrew

-- 
Andrew Dunstan      https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Continuous integration on Windows?

2017-10-13 Thread Thomas Munro
On Fri, Oct 13, 2017 at 1:42 PM, Andrew Dunstan
 wrote:
> Well, as you can see here the appveyor.yml file can live outside the
> tree that's being built.

Here's a Wiki page where I hope we can collect how-to information on
this general topic:

https://wiki.postgresql.org/wiki/Continuous_Integration

I tried your appveyor.yml, and added:

test_script:
  - cd src\tools\msvc && vcregress check

That much I could figure out just by reading our manual and I could
see that it worked first time, but to make this really useful I guess
we'd have to teach it to dump out resulting regression.diffs files and
backtraces from core files (as the Travis CI version accessible from
that page does).

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

2017-10-13 Thread Robert Haas
On Thu, Oct 12, 2017 at 7:31 PM, Alvaro Herrera  wrote:
>> Wouldn't this last "if" test, to cover the pg_upgrade case, be better
>> targeted by comparing *raw* xmin to FrozenTransactionId? You're using
>> the potentially distinct xmin value returned by
>> HeapTupleHeaderGetXmin() for the test here. I think we should be
>> directly targeting tuples frozen on or before 9.4 (prior to
>> pg_upgrade) instead.
>
> Yes, agreed, I should change that.  Thanks for continuing to think about
> this.

I haven't really followed this thread in depth, but I'm very nervous
about the idea that we should ever be examining the raw-xmin of a
tuple that has been marked HEAP_XMIN_FROZEN for anything other than
forensic purposes.  The design principle I followed in writing the
original patch was that marking a tuple HEAP_XMIN_FROZEN was identical
to setting the xmin to 2 except that the original xmin was still
available for manual inspection.  If we're finding that we need the
raw xmin after freezing, doesn't that mean that our freezing algorithm
was flat busted prior to that commit?  And maybe still busted when
things wrap around multiple times and raw-xmins are reused?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix a typo in execReplication.c

2017-10-13 Thread Simon Riggs
On 13 October 2017 at 09:13, Masahiko Sawada  wrote:
> On Thu, Oct 12, 2017 at 11:30 PM, Robert Haas  wrote:
>> On Thu, Oct 12, 2017 at 6:55 AM, Petr Jelinek
>>  wrote:
>>> Thanks for the patch, looks correct to me.
>>
>> Committed and back-patched to v10.

Well spotted both of you!

Shows that reading code and correcting comments is a useful activity.

-- 
Simon Riggs      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgsql: Avoid coercing a whole-row variable that is already coerced

2017-10-13 Thread Amit Khandekar
Bringing here the mail thread from pgsql-committers regarding this commit:

commit 1c497fa72df7593d8976653538da3d0ab033207f
Author: Robert Haas 
Date:   Thu Oct 12 17:10:48 2017 -0400

Avoid coercing a whole-row variable that is already coerced.

Marginal efficiency and beautification hack.  I'm not sure whether
this case ever arises currently, but the pending patch for update
tuple routing will cause it to arise.

Amit Khandekar

Discussion:
http://postgr.es/m/caj3gd9cazfppe7-wwubabpcq4_0subkipfd1+0r5_dkvnwo...@mail.gmail.com


Tom Lane  wrote:
> Robert Haas  wrote:
> > Avoid coercing a whole-row variable that is already coerced.
>
> This logic seems very strange and more than likely buggy: the
> added coerced_var flag feels like the sort of action at a distance
> that is likely to have unforeseen side effects.
>
> I'm not entirely sure what the issue is here, but if you're concerned
> about not applying two ConvertRowtypeExprs in a row, why not have the
> upper one just strip off the lower one?  We handle, eg, nested
> RelabelTypes that way.

We kind of do a similar thing. When a ConvertRowtypeExpr node is
encountered, we create a new ConvertRowtypeExpr that points to a new
var, and return this new ConvertRowtypeExpr instead of the existing
one. So we actually replace the old with a new one. But additionally,
we also want to change the vartype of the new var to
context->to_rowtype.

I think you are worried specifically about coerced_var causing
unexpected regression in existing scenarios, such as :
context->coerced_var getting set and prematurely unset in recursive
scenarios. But note that, when we call map_variable_attnos_mutator()
just after setting context->coerced_var = true,
map_variable_attnos_mutator() won't recurse further, because it is
always called with a Var, which does not have any further arguments to
process. So coerced_var won't be again changed until we return from
map_variable_attnos_mutator().

The only reason why we chose to call map_variable_attnos_mutator()
with a Var is so that we can re-use the code that converts the whole
row var.

One thing we can do is : instead of calling
map_variable_attnos_mutator(), convert the var inside the if block for
"if (IsA(node, ConvertRowtypeExpr))". Please check the attached patch.
There, I have avoided coerced_var context variable.
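
For comparison, a hedged sketch of what Tom's strip-the-lower-node
alternative might look like inside the mutator; this is not what the
attached patch does:

if (IsA(node, ConvertRowtypeExpr))
{
    ConvertRowtypeExpr *r = (ConvertRowtypeExpr *) node;

    /* collapse nested conversions, as is done for nested RelabelTypes */
    while (IsA(r->arg, ConvertRowtypeExpr))
        r->arg = ((ConvertRowtypeExpr *) r->arg)->arg;

    /* ... then map the underlying whole-row Var as usual ... */
}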


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


handle-redundant-ConvertRowtypeExpr-node.patch
Description: Binary data



Re: [HACKERS] Pluggable storage

2017-10-13 Thread Kuntal Ghosh
On Fri, Oct 13, 2017 at 1:58 PM, Haribabu Kommi
 wrote:
>
>
> On Fri, Oct 13, 2017 at 11:55 AM, Robert Haas  wrote:
>>
>> On Thu, Oct 12, 2017 at 8:00 PM, Haribabu Kommi
>>  wrote:
>>
>> That seems like a strange choice of API.  I think it should more
>> integrated with the scan logic.  For example, if I'm doing an index
>> scan, and I get a TID, then I should be able to just say "here's a
>> TID, give me any tuples associated with that TID that are visible to
>> the scan snapshot".  Then for the current heap it will do
>> heap_hot_search_buffer, and for zheap it will walk the undo chain and
>> return the relevant tuple from the chain.
>
>
> OK, Understood.
> I will check along these lines and come up with storage API's.
>
I've some doubts regarding the following function hook:

+typedef bool (*hot_search_buffer_hook) (ItemPointer tid, Relation relation,
+Buffer buffer, Snapshot snapshot, HeapTuple heapTuple,
+bool *all_dead, bool first_call);

As per my understanding, with the HOT feature a new tuple placed on the
same page, with all indexed columns the same as its parent row
version, does not get new index entries (README.HOT). For some other
storage engine, if we maintain the older version in different storage,
undo for example, and don't require a new index entry, should we still
call it HOT-chain? If not, IMHO, we may have something like
*search_buffer_hook(tid,,storageTuple,...). Depending on the
underlying storage, one can traverse hot-chain or undo-chain or some
other multi-version strategy for non-key updates.
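
To illustrate the storage-agnostic shape being suggested, a hedged
sketch; StorageTuple and the hook name are hypothetical, not from the
posted patch:

typedef void *StorageTuple;     /* opaque, storage-specific tuple */

typedef bool (*storage_search_buffer_hook) (ItemPointer tid,
                                            Relation relation,
                                            Snapshot snapshot,
                                            StorageTuple *tuple,
                                            bool *all_dead,
                                            bool first_call);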

After a successful index search, most of the index AMs set
(HeapTupleData)xs_ctup->t_self of IndexScanDescData to the tupleid in
the storage. IMHO, some changes are needed here to make it generic.

@@ -328,47 +376,27 @@ ExecStoreTuple(HeapTuple tuple,
  Assert(tuple != NULL);
  Assert(slot != NULL);
  Assert(slot->tts_tupleDescriptor != NULL);
+ Assert(slot->tts_storageslotam != NULL);
  /* passing shouldFree=true for a tuple on a disk page is not sane */
  Assert(BufferIsValid(buffer) ? (!shouldFree) : true);
For some storage engine, isn't it possible that the buffer is valid
and the tuple to be stored is formed in memory (for example, tuple
formed from UNDO, an in-memory decrypted tuple, etc.)?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Heikki Linnakangas

On 10/13/2017 02:41 AM, Tom Lane wrote:

David Rowley  writes:

On 13 October 2017 at 12:08, Tom Lane  wrote:

Therefore, I think we need to bite the bullet and provide an aggregate
property (CREATE AGGREGATE argument / pg_aggregate column) that tells
whether the aggregate supports transition state merging.  Likely this
should have been in the state-merging patch to begin with, but better
late than never.



Are you considering that this is an option only for ordered-set
aggregates or for all?


All.


If the user defines their normal aggregate as not safe for merging,
then surely it'll not be suitable to be used as a window function
either, since the final function will also be called there multiple
times per state.


Yeah, we would probably also want to check the flag in nodeWindowAgg.
Not sure exactly how that should play out --- maybe we end up with
a tri-valued property "works as normal agg without merging, works
as normal agg with merging, works as window agg".  But this would
arguably be an improvement over the current situation.  Right now
I'm sure there are user-written aggs out there that will just crash
if used as a window agg, and the authors don't have much choice because
the performance costs of not modifying the transition state in the
finalfn are higher than they're willing to bear.  At least with a
flag they could ensure that the case will fail cleanly.


Sounds right to me. I'm not so sure there really are aggregates out 
there that would crash today if used as a window aggregate, but it sure 
would be nice to give some control on that.


We've been doing that window agg thing for a long time, so I think 
"works as window agg" should be the default for regular aggregates. For 
ordered-set aggregates, "no merging, no more transfn() calls after 
finalfn()" seems safest.


It's a bit of a shame to have different defaults for regular and 
ordered-set aggregates. But that is what we implicitly assume today, 
anyway, and it's not too that hard to document.


- Heikki




Re: [HACKERS] [PATCH] Lockable views

2017-10-13 Thread Yugo Nagata
On Thu, 12 Oct 2017 13:11:45 +0900 (JST)
Tatsuo Ishii  wrote:

> >> test=# CREATE VIEW v3 AS SELECT count(*) FROM v1;
> >> CREATE VIEW
> >> test=# BEGIN;
> >> BEGIN
> >> test=# LOCK TABLE v3;
> >> ERROR:  cannot lock view "v3"
> >> DETAIL:  Views that return aggregate functions are not automatically 
> >> updatable.
> > 
> > It would be nice if the message would be something like:
> > 
> > DETAIL:  Views that return aggregate functions are not lockable

This uses the messages from view_query_is_auto_updatable() of the rewrite
system directly.
Although we could modify the messages, I think that is not necessary for now,
since we can lock only automatically updatable views.

> > 
> >> test=# END;
> >> ROLLBACK
> >> 
> >> test=# CREATE FUNCTION fnc() RETURNS trigger AS $$ BEGIN RETURN NEW; END; 
> >> $$ LANGUAGE plpgsql;
> >> CREATE FUNCTION
> >> test=# CREATE TRIGGER trg INSTEAD OF INSERT ON v1 FOR EACH ROW EXECUTE 
> >> PROCEDURE fnc();
> >> CREATE TRIGGER
> >> test=# BEGIN;
> >> BEGIN
> >> test=# LOCK TABLE v1;
> >> ERROR:  cannot lock view "v1"
> >> DETAIL:  views that have an INSTEAD OF trigger are not lockable
> >> test=# END;
> >> ROLLBACK
> > 
> > I wonder if we should lock tables in a subquery as well. For example,
> > 
> > create view v1 as select * from t1 where i in (select i from t2);
> > 
> > In this case should we lock t2 as well?
> 
> Currently the patch ignores t2 in the case above.
> 
> So we have options below:
> 
> - Leave as it is (ignore tables appearing in a subquery)
> 
> - Lock all tables including in a subquery
> 
> - Check subquery in the view definition. If there are some tables
>   involved, emit an error and abort.
> 
> The first one might be different from what users expect. There may be
> a risk that the second one could cause a deadlock. So the third one
> seems to be the safest IMO.

Makes sense. Even if the view is locked, when tables in a subquery are
modified, the contents of the view can change. To avoid that, we have to
lock those tables as well, or give up locking such views.

We can say the same thing for functions in a subquery. If the definition
of the functions is changed, the result of the view can change.
We cannot lock functions, but should we obtain a row-level lock on pg_proc
in such cases? (or, of course, give up locking such views)

BTW, though you mentioned the risk of deadlocks, even when there
is no subquery, a deadlock can occur with the current patch.

For example, lock a table T in Session 1, and then lock a view V
whose base relation is T in Session 2. Session 2 will wait for
Session 1 to release the lock on T. After this, when Session 1 tries to
lock view V, a deadlock occurs and the query is canceled.
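
With the patch applied, the scenario looks like this (t and v are just
illustrative names for the base table and the view):

  -- Session 1                       -- Session 2
  BEGIN;
  LOCK TABLE t;
                                     BEGIN;
                                     LOCK TABLE v;  -- blocks, waiting for t
  LOCK TABLE v;
  -- deadlock detected; one of the two transactions is canceled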

Is this unacceptable behavior?

> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp


-- 
Yugo Nagata 




Re: [HACKERS] Fix a typo in execReplication.c

2017-10-13 Thread Masahiko Sawada
On Thu, Oct 12, 2017 at 11:30 PM, Robert Haas  wrote:
> On Thu, Oct 12, 2017 at 6:55 AM, Petr Jelinek
>  wrote:
>> Thanks for the patch, looks correct to me.
>
> Committed and back-patched to v10.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Pluggable storage

2017-10-13 Thread Haribabu Kommi
On Fri, Oct 13, 2017 at 11:55 AM, Robert Haas  wrote:

> On Thu, Oct 12, 2017 at 8:00 PM, Haribabu Kommi
>  wrote:
> > Currently I added a snapshot_satisfies API to find out whether the tuple
> > satisfies the visibility or not with different types of visibility routines.
> > I feel these are somehow enough to develop different storage methods like UNDO.
> > The storage methods can decide internally how to provide the visibility.
> >
> > + amroutine->snapshot_satisfies[MVCC_VISIBILITY] = HeapTupleSatisfiesMVCC;
> > + amroutine->snapshot_satisfies[SELF_VISIBILITY] = HeapTupleSatisfiesSelf;
> > + amroutine->snapshot_satisfies[ANY_VISIBILITY] = HeapTupleSatisfiesAny;
> > + amroutine->snapshot_satisfies[TOAST_VISIBILITY] = HeapTupleSatisfiesToast;
> > + amroutine->snapshot_satisfies[DIRTY_VISIBILITY] = HeapTupleSatisfiesDirty;
> > + amroutine->snapshot_satisfies[HISTORIC_MVCC_VISIBILITY] = HeapTupleSatisfiesHistoricMVCC;
> > + amroutine->snapshot_satisfies[NON_VACUUMABLE_VISIBILTY] = HeapTupleSatisfiesNonVacuumable;
> > +
> > + amroutine->snapshot_satisfiesUpdate = HeapTupleSatisfiesUpdate;
> > + amroutine->snapshot_satisfiesVacuum = HeapTupleSatisfiesVacuum;
> >
> > Currently no changes are carried out in the snapshot logic, as that is
> > kept separate from the storage API.
>
> That seems like a strange choice of API.  I think it should more
> integrated with the scan logic.  For example, if I'm doing an index
> scan, and I get a TID, then I should be able to just say "here's a
> TID, give me any tuples associated with that TID that are visible to
> the scan snapshot".  Then for the current heap it will do
> heap_hot_search_buffer, and for zheap it will walk the undo chain and
> return the relevant tuple from the chain.


OK, understood.
I will check along these lines and come up with the storage APIs.
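
As a rough sketch, the scan-level API Robert describes might have a
shape like this (all names below are hypothetical, just to capture the
idea, not a committed interface):

  /*
   * Fetch any tuple at the given TID that is visible to the snapshot.
   * A heap AM would implement this via heap_hot_search_buffer(); an
   * UNDO-based AM would walk the undo chain and return the visible
   * version instead.
   */
  typedef bool (*scan_fetch_tuple_fn) (Relation relation,
                                       ItemPointer tid,
                                       Snapshot snapshot,
                                       TupleTableSlot *slot);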

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Discussion on missing optimizations

2017-10-13 Thread Laurenz Albe
Stephen Frost wrote:
> * Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> > Robert Haas wrote:
> > > One trick that some system use is avoid replanning as much as we do
> > > by, for example, saving plans in a shared cache and reusing them even
> > > in other sessions.  That's hard to do in our architecture because the
> > > controlling GUCs can be different in every session and there's not
> > > even any explicit labeling of which GUCs control planner behavior. But
> > > if you had it, then extra planning cycles would be, perhaps, more
> > > tolerable.
> > From my experience with Oracle I would say that that is a can of worms.
> > 
> > Perhaps it really brings the performance benefits they claim, but
> > a) there have been a number of bugs where the wrong plan got used
> >(you have to keep several plans for the same statement around,
> >since - as you say - different sessions have different environments)
> 
> I'm not sure this is really a fair strike against this concept- bugs
> happen, even bugs in planning, and what we need is more testing, imv, to
> reduce the number and minimize the risk as much as we can.

Right, I guess I didn't express my concern properly.
Bugs can certainly happen, but if a certain feature is particularly
rich in them, I take it as an indication that it is something difficult
to get right.

> > b) it is a frequent problem that this shared memory area grows
> >too large if the application does not use prepared statements
> >but dynamic SQL with varying constants.
> 
> This just requires that the memory area be managed somehow, not unlike
> how our shared buffers have to deal with evicting out old pages.
> There's a challenge there around making sure that it doesn't make the
> performance of the system be much worse than not having a cache at all,
> naturally, but given that a lot of people use pg_stat_statements to good
> effect and without much in the way of complaints, it seems like it might
> be possible make it work reasonably (just imagining a generic plan being
> attached to pg_stat_statements with some information about if the
> generic plan works well or not, blah blah, hand waving goes here).

pg_stat_statements is quite different, since it ignores constants.

I don't know how often I have seen the advice to use dynamic SQL
because a generic plan was not so great, and in OLAP you usually
want each individual query to be planned.
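
To illustrate (with invented table and column names; in v10 the plan
cache may switch to a generic plan after five executions of a prepared
statement):

  PREPARE q(int) AS SELECT * FROM orders WHERE customer_id = $1;
  EXPLAIN EXECUTE q(42);  -- early executions: a custom plan for this constant
  -- ...after several more executions...
  EXPLAIN EXECUTE q(42);  -- may now show a generic plan that keeps $1,
                          -- ignoring the selectivity of the actual value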

So I think it is important that a plan is only reused if it matches
the query exactly.  Oracle has an option CURSOR_SHARING = FORCE to
reuse plans even if they were planned with different constants, but
their own documentation warns against using it:
http://docs.oracle.com/database/122/TGSQL/improving-rwp-cursor-sharing.htm#GUID-7FF4E133-06A7-401E-9BFC-3B0B9C902346

Maybe it would be an idea to cache only generic plans in a shared area?

Yours,
Laurenz Albe





Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Heikki Linnakangas

On 10/13/2017 02:08 AM, Tom Lane wrote:

I started to look into fixing orderedsetaggs.c so that we could revert
52328727b, and soon found a rather nasty problem.  Although the plain
OSAs seem amenable to supporting multiple finalfn calls on the same
transition state, the "hypothetical set" functions are not at all.
What they do is to accumulate all the regular aggregate input into
a tuplesort, then add the direct arguments as an additional "hypothetical"
row, and finally sort the result.  There's no realistic way of undoing
the addition of the hypothetical row, so these finalfns simply can't
share tuplesort state.

You could imagine accumulating the regular input into a tuplestore
and then copying it into a tuplesort in each finalfn call.  But that
seems pretty icky, and I'm not sure it'd really be any more performant
than just keeping separate tuplesort objects for each aggregate.
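
(As a reminder, a hypothetical-set aggregate call has this shape, with
tbl and x as invented names:

  SELECT rank(42) WITHIN GROUP (ORDER BY x) FROM tbl;

i.e. it reports the rank the hypothetical value 42 would have among the
sorted values of x.)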


The current implementation, with the extra flag column, is quite naive. 
We could add some support to tuplesort to find a row with given 
attributes, and call that instead of actually adding the hypothetical 
row to the tuplesort and iterating to re-find it after performsort. For 
a result set that fits in memory, you could even do a binary search 
instead of linearly iterating through the result set.
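
Such a lookup could have a signature along these lines; this is purely
a hypothetical sketch, no such tuplesort function exists today:

  /* Report the rank the probe row would get in the sorted output,
   * without actually inserting it. */
  extern int64 tuplesort_rank_hypothetical(Tuplesortstate *state,
                                           TupleTableSlot *probe);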


I'm not suggesting that we do that right now, though. And even if we 
did, there might be other aggregates out there that are not safe to 
merge, so we should still add the option to CREATE AGGREGATE. But 
that'd be a nice little project, if someone wanted to make those 
functions faster.


- Heikki




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-13 Thread David Rowley
On 13 October 2017 at 19:36, Jeevan Chalke
 wrote:
> I have tried exactly the same tests to get to this factor on my local developer
> machine. And with parallelism enabled I got this number as 7.9. However, if
> I disable the parallelism (and I believe David too disabled that), I get
> this number as 1.8. Whereas for 10000 rows, I get this number as 1.7.
>
> -- With Gather
> # select current_Setting('cpu_tuple_cost')::float8 / ((10633.56 * (81.035 /
> 72.450) - 10633.56)  / 1000000);
> 7.9
>
> -- Without Gather
> # select current_Setting('cpu_tuple_cost')::float8 / ((16925.01 * (172.838 /
> 131.400) - 16925.01)  / 1000000);
> 1.8
>
> -- With 10000 rows (so no Gather too)
> # select current_Setting('cpu_tuple_cost')::float8 / ((170.01 * (1.919 /
> 1.424) - 170.01)  / 10000);
> 1.7
>
> So it is not so straightforward to come up with the correct heuristic here.
> Thus using 50% of cpu_tuple_cost looks good to me here.
>
> As suggested by Ashutosh and Robert, attached separate small WIP patch for
> it.

Good to see it stays fairly consistent at different tuple counts, and
is not too far away from what I got on this machine.

I looked over the patch and saw this:

@@ -1800,6 +1827,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
  */
  run_cost += cpu_operator_cost * tuples;

+ /* Add MergeAppend node overhead like we do it for the Append node */
+ run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+
  path->startup_cost = startup_cost + input_startup_cost;
  path->total_cost = startup_cost + run_cost + input_total_cost;
 }

You're doing that right after a comment that says we don't do that. It
also looks like the "run_cost += cpu_operator_cost * tuples" line is
trying to do the same thing, so perhaps it's worth just replacing
that, which by default will double that additional cost, although
doing so would make the planner slightly less likely to choose a
MergeAppend over an Append than previously.
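
In other words, something like this (just a sketch of that suggestion
against the same hunk, not a tested change):

  - run_cost += cpu_operator_cost * tuples;
  + run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;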

+#define DEFAULT_APPEND_COST_FACTOR 0.5

I don't really think the DEFAULT_APPEND_COST_FACTOR adds much. It
means very little by itself. It also seems that most of the other cost
functions just use the magic number.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-13 Thread Jeevan Chalke
On Tue, Oct 10, 2017 at 1:31 PM, David Rowley 
wrote:

> On 10 October 2017 at 17:57, Ashutosh Bapat
>  wrote:
> > Append node just returns the result of ExecProcNode(). Charging
> > cpu_tuple_cost may make it too expensive. In other places where we
> > charge cpu_tuple_cost there's some processing done to the tuple like
> > ExecStoreTuple() in SeqNext(). May be we need some other measure for
> > Append's processing of the tuple.
>
> I don't think there's any need to invent any new GUC. You could just
> divide cpu_tuple_cost by something.
>
> I did a quick benchmark on my laptop to see how much Append really
> costs, and with the standard costs the actual cost seems to be about
> cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
> realistic. create_set_projection_path() does something similar and
> brincostestimate() does some similar magic and applies 0.1 *
> cpu_operator_cost to the total cost.
>
> # create table p (a int, b int);
> # create table p1 () inherits (p);
> # insert into p1 select generate_series(1,1000000);
> # vacuum analyze p1;
> # \q
> $ echo "select count(*) from p1;" > p1.sql
> $ echo "select count(*) from p;" > p.sql
> $ pgbench -T 60 -f p1.sql -n
>
> latency average = 58.567 ms
>
> $ pgbench -T 60 -f p.sql -n
> latency average = 72.984 ms
>
> $ psql
> psql (11devel)
> Type "help" for help.
>
> # -- check the cost of the plan.
> # explain select count(*) from p1;
> QUERY PLAN
> --
>  Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
>->  Seq Scan on p1  (cost=0.00..14425.00 rows=1000000 width=0)
> (2 rows)
>
> # -- selecting from the parent is the same due to zero Append cost.
> # explain select count(*) from p;
>QUERY PLAN
> 
>  Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
>->  Append  (cost=0.00..14425.00 rows=1000001 width=0)
>  ->  Seq Scan on p  (cost=0.00..0.00 rows=1 width=0)
>  ->  Seq Scan on p1  (cost=0.00..14425.00 rows=1000000 width=0)
> (4 rows)
>
> # -- extrapolate the additional time taken for the Append scan and
> work out what the planner
> # -- should add to the plan's cost, then divide by the number of rows
> in p1 to work out the
> # -- tuple cost of pulling a row through the append.
> # select (16925.01 * (72.984 / 58.567) - 16925.01)  / 1000000;
> ?column?
> 
>  0.00416630302337493743
> (1 row)
>
> # show cpu_tuple_cost;
>  cpu_tuple_cost
> 
>  0.01
> (1 row)
>
> # -- How does that compare to the cpu_tuple_cost?
> # select current_Setting('cpu_tuple_cost')::float8 /
> 0.00416630302337493743;
> ?column?
> 
>  2.400209476818
> (1 row)
>
> Maybe it's worth trying with different row counts to see if the
> additional cost is consistent, but it's probably not worth being too
> critical here.
>

I have tried exactly the same tests to get to this factor on my local developer
machine. And with parallelism enabled I got this number as 7.9. However, if
I disable the parallelism (and I believe David too disabled that), I get
this number as 1.8. Whereas for 10000 rows, I get this number as 1.7.

-- With Gather
# select current_Setting('cpu_tuple_cost')::float8 / ((10633.56 * (81.035 /
72.450) - 10633.56)  / 1000000);
7.9

-- Without Gather
# select current_Setting('cpu_tuple_cost')::float8 / ((16925.01 * (172.838
/ 131.400) - 16925.01)  / 1000000);
1.8

-- With 10000 rows (so no Gather too)
# select current_Setting('cpu_tuple_cost')::float8 / ((170.01 * (1.919 /
1.424) - 170.01)  / 10000);
1.7

So it is not so straightforward to come up with the correct heuristic here.
Thus using 50% of cpu_tuple_cost looks good to me here.

As suggested by Ashutosh and Robert, attached separate small WIP patch for
it.

I think it will be better if we take this topic to another mail thread.
Do you agree?


>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ce32b8a4..c0bf602 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1742,6 +1742,33 @@ cost_sort(Path *path, PlannerInfo *root,
 }
 
 /*
+ * cost_append
+ *	  Determines and returns the cost of an Append node.
+ *
+ * Though Append doesn't do any selection or projection, it's not free.  So we
+ * try to add cost per input tuple which is arbitrarily calculated as
+ * DEFAULT_APPEND_COST_FACTOR * cpu_tuple_cost.
+ *
+ * 'input_startup_cost' is the sum of the input streams' startup costs
+ * 'input_total_cost' is the sum of the input streams' 

Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended

2017-10-13 Thread Amit Kapila
On Fri, Oct 13, 2017 at 10:32 AM, Michael Paquier
 wrote:
> On Thu, Oct 12, 2017 at 10:47 PM, Amit Kapila  wrote:
>> Today, I was trying to think about cases when we can return BLK_DONE
>> in XLogReadBufferForRedoExtended.  One thing that occurred to me is
>> that it can happen during the replay of WAL if full_page_writes is
>> off.  Basically, when full_page_writes is on, crash recovery will
>> always first restore the full-page image of a page and then apply the
>> WAL corresponding to that page, so we will never hit the case where
>> the LSN of the page is greater than the LSN of the WAL record.
>>
>> Are there other cases for which we can get BLK_DONE state?  Is it
>> mentioned somewhere in code/comments which I am missing?
>
> Remember the thread about meta page optimization... Some index AMs use
> XLogInitBufferForRedo() to redo their meta pages, and those don't have
> an FPW, so when doing crash recovery you may end up not replaying a
> meta page if the LSN on its page header is newer than what's being
> replayed.
>

I think for metapage usage, it will apply the changes anyway.  See
_bt_restore_page().

> That's another case where BLK_DONE can be reached, even if
> full_page_writes is on.
>

Yeah, and probably if someone uses REGBUF_NO_IMAGE.  However, I was
mainly thinking about cases where the caller is not doing anything to
prevent a full-page image explicitly.
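
For reference, the case in question boils down to the page-LSN check in
XLogReadBufferForRedoExtended (a simplified paraphrase of the logic in
xlogutils.c):

  /* after reading and locking the page */
  if (lsn <= PageGetLSN(page))
      return BLK_DONE;        /* page already reflects this record */
  else
      return BLK_NEEDS_REDO;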


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] oversight in EphemeralNamedRelation support

2017-10-13 Thread Julien Rouhaud
On Fri, Oct 13, 2017 at 5:22 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Fri, Oct 13, 2017 at 12:46 PM, Tom Lane  wrote:
>>> Yeah, I agree --- personally I'd never write a query like that.  But
>>> the fact that somebody ran into it when v10 has been out for barely
>>> a week suggests that people are doing it.
>
>> Not exactly -- Julien's bug report was about a *qualified* reference
>> being incorrectly rejected.
>
> Nonetheless, he was using a CTE name equivalent to the name of the
> query's target table.  That's already confusing IMV ... and it does
> not seem unreasonable to guess that he only qualified the target
> because it stopped working unqualified.

FWIW, the original (and much more complex) query Hugo sent me was
inserting data into a qualified table name (the schema wasn't public,
and I assume not in his search_path).
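
The shape of the query at issue is roughly the following (all names
invented; the real query was much more complex):

  WITH t AS (SELECT 1 AS x)
  INSERT INTO myschema.t SELECT x FROM t;

i.e. a CTE whose name matches the unqualified name of the
schema-qualified target table.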

