[HACKERS] a word-choice question

2015-12-05 Thread Chapman Flack
In PL/Java's function annotations, 'type' has been used for the
volatility category:


@Function(type=IMMUTABLE)
public static String hello() { return "Hello, world!"; }


It seems a bit infelicitous because you would probably sooner guess that
'type' was telling you something about the function's _data_ type (and
indeed there are times when PL/Java can't infer the SQL data type and you
need to say it in the annotation, which has to be done with a different,
less obvious keyword because 'type' has been used for the volatility
category).  :(

It has been that way since the first inchoate commits of annotation code
eleven years ago, but I don't think it was ever usably complete until
2013. That's since the most recent numbered PL/Java release (there
hasn't been one since 1.4.3 in 2011), so it is possible to say there
hasn't yet been a numbered release featuring the code annotations.

So I am thinking there may never be a better time to change those keywords,
if they are worth changing, so 'type' would be for the function return type,
and some other word for the volatility category. At this stage, a change
might inconvenience some people who have been building from the github head
the last couple of years, but it couldn't affect anyone else.

One obvious choice for that word would be 'volatility' - after all,
'volatility category' is the term used in the PostgreSQL manual.  But
it's long, and five syllables to say, and the goofy redundancy in an
annotation like (volatility=VOLATILE) makes me snicker the same way
I can't help when I type 'set log_error_verbosity to verbose'.

Argh, verbose verbosity AND volatile volatility.

But I've been trying to think of something short, clear, preferably
monosyllabic, less geeky than 'v8y', and I don't have a winner yet.
I've flirted with 'hoist', from thinking about the kind of optimization
the category is there to control ...

hoist=STABLE   hoist=IMMUTABLE   hoist=VOLATILE

meh ...

effect=STABLE   effect=IMMUTABLE   effect=VOLATILE

...

Is anyone thinking of an obvious, perfect short word for that thing
that is just eluding me at the moment? I'm starting to warm to 'effect'.
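To make the candidate concrete, the proposed split can be mimicked with a Python decorator (purely an analogy: this is not PL/Java's API, and 'effect' is just the candidate word under discussion):

```python
# Analogy only: a decorator standing in for PL/Java's @Function
# annotation, with the proposed keyword split -- 'type' naming the SQL
# return type and 'effect' (the candidate word) naming the volatility
# category.
def function(type=None, effect="VOLATILE"):
    def wrap(f):
        f.sql_type = type    # 'type' now means the data type...
        f.effect = effect    # ...and 'effect' the volatility category.
        return f
    return wrap

@function(type="pg_catalog.text", effect="IMMUTABLE")
def hello():
    return "Hello, world!"

print(hello.effect)    # -> IMMUTABLE
print(hello.sql_type)  # -> pg_catalog.text
```

With this split, 'type' matches what a reader would sooner guess, and the volatility keyword no longer collides with it.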

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Size of Path nodes

2015-12-05 Thread Robert Haas
On Sat, Dec 5, 2015 at 9:11 PM, Robert Haas  wrote:
> here.  For example, in the case of nodeSeqscan.c, almost two-thirds of
> the new lines of code are to add three additional methods that aren't
> called in parallel aware mode, and about half of the rest are due to a
> SeqScanState now having one integer field that's not part of
> ScanState, which require mechanical changes in a bunch of places ("ps"
> becomes "ss.ps").  The rest amounts to 15-20 lines of real code
> change.

Oops, bad proofreading.  It should say "are ONLY called in parallel
mode", not "aren't called in parallel aware mode".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Size of Path nodes

2015-12-05 Thread Robert Haas
On Sat, Dec 5, 2015 at 2:29 PM, Tom Lane  wrote:
> Meh.  Size of first patch is a pretty bad guide to whether a particular
> structural decision is going to be a good idea in the long run.  Do you
> need some examples of patches we've rejected because they were done in a
> way that minimized the size of the patch rather than a way that made sense
> from a system structural viewpoint?

You can provide some examples if you like, but I thought about the
issue of overall system structure and settled on this design after
coming to the conclusion that this design was better from that
viewpoint. If it becomes clear that I'm wrong, fine: we can change it.
I think I've done a pretty good job - I'm going to go out on a limb
and say an excellent job - modularizing the parallelism code so that,
if it turns out I've picked the wrong design for some part of it, we
can replace just that part without having a huge impact on all the
rest of it.  But at this point I am not by any means convinced that
I've got it wrong.  I've been working on this for several years and
have thought about it quite deeply, discussed it extensively on and
off the mailing list with anyone and everyone who was willing to
participate in that discussion, and read what relevant literature I
could find.  That doesn't guarantee that I've got it right, but if
your premise here is that I haven't thought about system structure,
I'm going to say that premise is wrong.  I've thought about it a lot.

> I would also point out that if we'd followed this approach in the past,
> nodeIndexscan.c, nodeBitmapIndexscan.c and nodeIndexonlyscan.c would be
> one file, and it would be nigh unintelligible.  Those files nonetheless
> manage to share code where it makes sense for them to.

True, but I don't think that's a good analogy.  I think the best
analogy to parallelism is parameterized plans.  And we DON'T have
separate nodes for parameterized index scans and unparameterized index
scans, nor do we have separate path nodes.  We have something called
an index scan and it can be parameterized or unparameterized, and we
keep track of that distinction during both planning and execution.  So
here.  For example, in the case of nodeSeqscan.c, almost two-thirds of
the new lines of code are to add three additional methods that aren't
called in parallel aware mode, and about half of the rest are due to a
SeqScanState now having one integer field that's not part of
ScanState, which require mechanical changes in a bunch of places ("ps"
becomes "ss.ps").  The rest amounts to 15-20 lines of real code
change.

It may be useful context - although you will already know this, if you
have been reading the relevant threads - to know that up until a
couple of months ago I actually had this exactly the way you are
describing, with parallel sequential scan as a separate node type,
mostly because I know that you've leaned in the direction of making
things separate node types whenever there's any doubt about the
matter.  The main reason I changed my mind, aside from the massive
code duplication that seemed to be resulting, was the realization that
every scan type needs a parallel-aware mode, and that the
parallel-aware mode needs to degenerate back to exactly the same thing
as the non-parallel-aware case when no workers are available.  It
makes no sense to me to say that we should double the number of
executor nodes, or even the number of path nodes, just because we have
parallelism.  I'm sure we'd never actually double it, as there are
some plan nodes like Result for which I can't foresee a parallel-aware
version.  But we'd add a heck of a lot of them, and in every case the
parallel-aware version would be a strict superset of what the
non-parallel-aware version can do.

Consider Append, for example.  A regular Append runs all of the child
plans one after another.  A parallel-aware Append, like any other
parallel-aware node, will be running in multiple copies, one per
worker.  Well, what it should do is try to spread out the workers
among the subplans so as to minimize contention, rather than throwing
all the workers at the first subplan,
then when that gets done throwing them all at the second subplan, and
so forth.  See the parallel append thread for further discussion of
these issues.  Now, if you are in a parallel plan, you always want
this behavior: there is never any reason (at least that I can see
currently) to prefer the regular Append behavior (though the
architecture I've chosen could support that if it turns out we need
it).  And if you are not in a parallel plan, the behavior of the
parallel-aware version degenerates to exactly what our current Append
node does anyway.  So if you separate that into two nodes, a regular
Append and a Parallel Append, it's just stupid.  You just have one
node that is a dumber version of another node, and you have code
somewhere to distinguish between them when you could just as well
always use the more capable one.
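The contrast Robert draws can be sketched as a toy scheduler: each worker asks which subplan to work on next, and the parallel-aware policy spreads workers out while the regular policy piles them onto the first subplan (names and structure are illustrative, not the actual executor code):

```python
# Toy model of the two Append behaviors contrasted above.  A regular
# Append always answers "the first unfinished subplan"; the
# parallel-aware version spreads workers to minimize contention.

def pick_subplan_regular(unfinished, workers_on):
    # Everyone piles onto the first unfinished subplan.
    return unfinished[0]

def pick_subplan_parallel(unfinished, workers_on):
    # Pick the unfinished subplan with the fewest workers on it.
    return min(unfinished, key=lambda sp: workers_on[sp])

def assign(pick, n_workers, subplans):
    workers_on = {sp: 0 for sp in subplans}
    for _ in range(n_workers):
        sp = pick(subplans, workers_on)
        workers_on[sp] += 1
    return workers_on

# With 4 workers and 4 unfinished subplans:
print(assign(pick_subplan_regular, 4, ["a", "b", "c", "d"]))
# -> {'a': 4, 'b': 0, 'c': 0, 'd': 0}
print(assign(pick_subplan_parallel, 4, ["a", "b", "c", "d"]))
# -> {'a': 1, 'b': 1, 'c': 1, 'd': 1}
```

With a single process, both policies pick the same subplan, which is the degeneration to the current Append behavior described above.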

I 

Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2015-12-05 Thread Bruce Momjian
On Fri, Dec  4, 2015 at 07:09:03PM -0500, Chapman Flack wrote:
> On 12/04/15 12:56, Bruce Momjian wrote:
> > 
> > OK, good logical reason to install dynloader.h on Windows.
> 
> Ah!  Thanks.  I was starting to wonder whether I had said something wrong
> and been sent to the bit bucket within my first two -hackers posts. :)

No, sometimes people just don't know how to respond, particularly related
to technology we don't use regularly.

> > What do we need to do to close this item?  What versions of Windows
> > installers are missing dynloader.h?  Mingw?  MSVC?  EDB's?  OpenSCG?
> > Postgres Pro (Russian)?
> 
> I am too new around here to be able to supply good answers myself to
> those questions.  I keep learning though.  For example, I have just
> learned that OpenSCG and Postgres Pro (Russian) are things, and (by
> inference) that they run on Windows. :)

Yes, there are several binary Postgres distributions.  However, in
researching, I think there is a central way to fix them all.

> In working on PL/Java, as a non-Windows dev myself, I have been
> blessed to find Ken Olson willing to build my WIP branches in MSVC
> and tell me what I've busted. I think he's building against an EDB
> installation of PostgreSQL, so that's the combination I am least
> ignorant about. I know that mingw has also been used, but I haven't
> yet made a connection with someone who can tell me what I'm breaking
> for that build.

OK, that sounds good.  I assume he is using MSVC to build PL/Java, and
then using the EDB server binaries, which should work fine.

> > Is this a change we need to make on the server end and then all the
> > installers will automatically install the file?  It is present in all
> > Unix-like installs?
> 
> AFAICS, it must at one time have been envisioned that sometimes
> extension code might like a nice dynloader; the whole way that
> backend/port/dynloader contains a version for every platform, and
> the configure script links the appropriate one into src/include with
> all the other .h files that would go into a -devel package, makes me
> _think_ that it'd be present in any install that used configure in
> the build. Anyone building a -devel package would have to go out
> of their way to exclude it but still package the other .h files.

Yes, it should.  Looking at the 'install' Makefile rule in
include/Makefile I see:

cp $(srcdir)/*.h '$(DESTDIR)$(includedir_server)'/ || exit; \
chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/*.h || exit; \
for dir in $(SUBDIRS); do \
  cp $(srcdir)/$$dir/*.h '$(DESTDIR)$(includedir_server)'/$$dir/ || exit; \
  chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$dir/*.h || exit; \
done

This copies all the *.h files in src/include during install.  If I look
at my Debian source install, I see the dynloader.h installed as
include/server/dynloader.h, which is what you want.  In fact, there are
many include files installed in include/server:

c.h            pg_config_ext.h     pgtime.h      postgres_ext.h
dynloader.h    pg_config.h         pg_trace.h    postgres_fe.h
fmgr.h         pg_config_manual.h  plperl.h      postgres.h
funcapi.h      pg_config_os.h      plpgsql.h     ppport.h
getaddrinfo.h  pg_getopt.h         plpython.h    rusagestub.h
getopt_long.h  pgstat.h            plpy_util.h   windowapi.h
miscadmin.h    pgtar.h             port.h

> So I'd guess that Windows builds that don't use configure are probably
> the odd players out. But I don't have any contacts or name recognition
> to approach { EDB, OpenSCG, Postgres Pro } and find out how their
> internal build processes work, or whether they all end up lacking
> the file, or whether there is any single change that could be made
> in the PG source tree so that their builds would then all include it.

In fact, there is a single file that affects all the MSVC-based Windows
installers.  All the MSVC build stuff happens in src/tools/msvc, and it
is mostly written in Perl.  This is the corresponding line in the MSVC Perl 
file Install.pm:

CopySetOfFiles(
'',
[ glob("src\\include\\*.h") ],
$target . '/include/server/');

So, for me, the big question is why dynloader.h isn't getting copied
into /include/server.  (There is a mention in the 'Makefile' about vpath
builds needing to copy dynloader.h manually --- is vpath being used for
the Windows installers somehow?)

Can you show me what files you have in /include/server, without your
copying the dynloader.h file manually?  Where did you get that Windows
installer?

> >  Also, I am very glad you are working on PL/Java.  :-)
> 
> Thanks!  It has been interesting trying to get up to speed, both on
> how it technically works, and also on the development history, why it
> lives out-of-tree, and so on. (That question had puzzled me for a long
> time, and when I finally found the long 2006 -hackers t

Re: [HACKERS] Size of Path nodes

2015-12-05 Thread Tom Lane
Robert Haas  writes:
> On Sat, Dec 5, 2015 at 12:35 PM, Tom Lane  wrote:
>> Amit Kapila  writes:
>>> To add to whatever has been said above, the intention of adding that flag
>>> was also to avoid adding new nodes for parallelism.  Basically modify
>>> the existing nodes (like SeqScan) to take care of whatever is needed
>>> for parallel execution.

>> TBH, I would say that that's a damn-fool idea.  I think you should instead
>> create a separate ParallelSeqScan path type and plan type, and the same
>> for every other thing that becomes parallel-aware.

> Maybe.  But if we go down that path, we're eventually going to have
> ParallelSeqScan, ParallelIndexScan, ParallelBitmapHeapScan,
> ParallelForeignScan, ParallelAppend, ParallelHashJoin, and probably a
> bunch of others.  That could lead to a lot of code duplication.  Even
> for ParallelSeqScan:

>  src/backend/executor/nodeSeqscan.c | 136 ++-
>  1 file changed, 103 insertions(+), 33 deletions(-)

> Now that file has 344 lines today, but only 33 of those lines are
> actual changes to pre-existing code, and most of those are mechanical.
> So the effect of making that a whole separate node type would have
> been to copy about 200 lines of code and comments.  That didn't seem
> like a good idea.

Meh.  Size of first patch is a pretty bad guide to whether a particular
structural decision is going to be a good idea in the long run.  Do you
need some examples of patches we've rejected because they were done in a
way that minimized the size of the patch rather than a way that made sense
from a system structural viewpoint?

I would also point out that if we'd followed this approach in the past,
nodeIndexscan.c, nodeBitmapIndexscan.c and nodeIndexonlyscan.c would be
one file, and it would be nigh unintelligible.  Those files nonetheless
manage to share code where it makes sense for them to.

In any case, even if there's an argument for keeping the two cases
together in the executor, I remain of the opinion that you'd be better off
with them being distinct Path types in the planner.

regards, tom lane



Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-05 Thread Tom Lane
Andrew Dunstan  writes:
> On 11/29/2015 04:28 PM, Noah Misch wrote:
>> Never mutate the filesystem in a BEGIN block, because "perl -c" runs BEGIN
>> blocks.  (Likewise for the BEGIN block this patch adds to TestLib.)

> Yeah, those two lines might belong in an INIT block. "perldoc perlmod" 
> for details.

BTW, if we consider that gospel, why has PostgresNode.pm got this in its
BEGIN block?

# PGHOST is set once and for all through a single series of tests when
# this module is loaded.
$test_pghost =
  $TestLib::windows_os ? "127.0.0.1" : TestLib::tempdir_short;
$ENV{PGHOST} = $test_pghost;

On non-Windows machines, the call of tempdir_short will result in
filesystem activity, ie creation of a directory.  Offhand it looks like
all of the activity in this BEGIN block could go to an INIT block instead.

I'm also bemused by why there was any question about this being wrong:

# XXX: Should this use PG_VERSION_NUM?
$last_port_assigned = 90600 % 16384 + 49152;

If that's not a hard-coded PG version number then I don't know
what it is.  Maybe it would be better to use random() instead,
but surely this isn't good as-is.
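Following the XXX comment's own suggestion, the quoted computation can be sketched with the constant replaced by a parameter (a toy illustration; 90600 is PG_VERSION_NUM for 9.6.0):

```python
# Sketch of the port-seed computation quoted above, with the version
# number passed in rather than hard-coded.  90600 is PG_VERSION_NUM
# for 9.6.0; 49152-65535 is the IANA dynamic/private port range, so
# the modulus folds any version number into that range.
def initial_port(pg_version_num):
    return pg_version_num % 16384 + 49152

print(initial_port(90600))  # -> 57832, within 49152..65535
```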

regards, tom lane



Re: [HACKERS] Size of Path nodes

2015-12-05 Thread Greg Stark
On Sat, Dec 5, 2015 at 5:35 PM, Tom Lane  wrote:
> The planner does not
> normally[1] use the same path type to represent two fundamentally different
> execution plans with enormously different cost estimates


Eh, this is a typical object modelling dilemma. There are lots of
different path types and many of them share a lot of properties but
depending on which way you look at things different sets of them seem
to be the same thing. BitmapScan is very like a SeqScan, and
MergeAppend is like Append, yet they have different node types; but
nodes with Filters attached didn't become FilteredSeqScan and
FilteredIndexScan etc... I'm not sure which of those Parallel is more
like, and it may be more convenient for the planner one way and for
other parts the other.

-- 
greg



Re: [HACKERS] Size of Path nodes

2015-12-05 Thread Robert Haas
On Sat, Dec 5, 2015 at 12:35 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> To add to whatever has been said above, the intention of adding that flag
>> was also to avoid adding new nodes for parallelism.  Basically modify
>> the existing nodes (like SeqScan) to take care of whatever is needed
>> for parallel execution.
>
> TBH, I would say that that's a damn-fool idea.  I think you should instead
> create a separate ParallelSeqScan path type and plan type, and the same
> for every other thing that becomes parallel-aware.  The planner does not
> normally[1] use the same path type to represent two fundamentally different
> execution plans with enormously different cost estimates, but that is the
> direction you want to push in for parallel query.  I think it will lead to
> a mess: lots of unreadable code that has to do things in a way unlike the
> code around it, and lots of bugs-of-omission in places that should have
> distinguished seq and parallel cases but didn't.

Maybe.  But if we go down that path, we're eventually going to have
ParallelSeqScan, ParallelIndexScan, ParallelBitmapHeapScan,
ParallelForeignScan, ParallelAppend, ParallelHashJoin, and probably a
bunch of others.  That could lead to a lot of code duplication.  Even
for ParallelSeqScan:

 src/backend/executor/nodeSeqscan.c | 136 ++-
 1 file changed, 103 insertions(+), 33 deletions(-)

Now that file has 344 lines today, but only 33 of those lines are
actual changes to pre-existing code, and most of those are mechanical.
So the effect of making that a whole separate node type would have
been to copy about 200 lines of code and comments.  That didn't seem
like a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Random crud left behind by aborted TAP tests

2015-12-05 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Dec 5, 2015 at 1:59 AM, Tom Lane  wrote:
>> Let's just put the cruft in tmp_check/ and call it good.

> +1. Having those folders with random names showing up when typing git
> status is annoying, so voilà.

Looks good to me, so pushed.

regards, tom lane



Re: [HACKERS] Size of Path nodes

2015-12-05 Thread Tom Lane
Amit Kapila  writes:
> To add to whatever has been said above, the intention of adding that flag
> was also to avoid adding new nodes for parallelism.  Basically modify
> the existing nodes (like SeqScan) to take care of whatever is needed
> for parallel execution.

TBH, I would say that that's a damn-fool idea.  I think you should instead
create a separate ParallelSeqScan path type and plan type, and the same
for every other thing that becomes parallel-aware.  The planner does not
normally[1] use the same path type to represent two fundamentally different
execution plans with enormously different cost estimates, but that is the
direction you want to push in for parallel query.  I think it will lead to
a mess: lots of unreadable code that has to do things in a way unlike the
code around it, and lots of bugs-of-omission in places that should have
distinguished seq and parallel cases but didn't.

regards, tom lane

[1] Yes, I'm aware that UniquePath is an exception.  It has reasons to
live though, in particular that if it were multiple path types there would
still need to be places that understood the common property of those path
types all delivering unique results.  I do not see any corresponding
benefit of fuzzing the distinction between SeqScan and ParallelSeqScan.



Re: [HACKERS] Size of Path nodes

2015-12-05 Thread Robert Haas
On Fri, Dec 4, 2015 at 4:00 PM, Robert Haas  wrote:
> If it's really true that the extra byte I added there has doubled
> planner memory use, then that's definitely cause for concern.
> However, I am skeptical that's really what has happened here.  Not
> every path has crossed a power-of-two threshold, and paths are not the
> only things the planner allocates.  What's a reasonable way to assess
> the effect of this on planner memory use in general?

So I did a really crude test of this.  I put
MemoryContextStats(MessageContext) - which seems to be where the
planner garbage is going - at the end of the message-processing loop,
and then ran the regression tests with and without parallel_aware in
the Path structure.  Then I ran a little grep and awk magic over the
postmaster logs and compared the sizes of contexts.  For reasons I
haven't tracked down, the number of instrumentation lines I got with
and without the flag differed.  But the overall pattern seems pretty
clear.  In the results below, the "without" number is the number of
times MessageContext had allocated the specified amount of storage
space without the parallel_aware flag; the "with" number is the number
of times it had allocated the specified amount of storage with the
parallel_aware flag.

size 8192 without 7589 with 7605
size 16384 without 6074 with 6078
size 16448 without 1 with 1
size 24576 without 26 with 27
size 24640 without 75 with 68
size 26448 without 3 with 3
size 32768 without 1747 with 1760
size 36512 without 0 with 1
size 42832 without 1 with 1
size 57344 without 7 with 9
size 57520 without 151 with 152
size 65536 without 1319 with 1349
size 66448 without 1 with 1
size 73728 without 4 with 5
size 73792 without 1 with 1
size 73904 without 2 with 2
size 116448 without 4 with 4
size 122880 without 4 with 4
size 131072 without 631 with 638
size 139264 without 12 with 12
size 216512 without 4 with 4
size 262144 without 496 with 504
size 270336 without 5 with 5
size 316448 without 1 with 1
size 516448 without 2 with 2
size 524288 without 73 with 74
size 532480 without 1 with 1
size 816512 without 1 with 1
size 1048576 without 19 with 19
size 1216448 without 1 with 0
size 2097152 without 4 with 5
queries_with=18337 queries_without=18259 total_with=612886960
total_without=605744096
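The "grep and awk magic" described above could equally be sketched in a few lines of Python (the log-line shape is an assumption, loosely modeled on MemoryContextStats output):

```python
# Sketch of the aggregation step: count how often MessageContext
# reached each total size in a postmaster log.  The line format is
# assumed, roughly "MessageContext: <N> total in <k> blocks; ...".
import re
from collections import Counter

def context_sizes(log_lines):
    sizes = Counter()
    pat = re.compile(r"MessageContext: (\d+) total in")
    for line in log_lines:
        m = pat.search(line)
        if m:
            sizes[int(m.group(1))] += 1
    return sizes

demo = [
    "LOG:  MessageContext: 8192 total in 1 blocks; 2048 free",
    "LOG:  MessageContext: 32768 total in 3 blocks; 4096 free",
    "LOG:  MessageContext: 8192 total in 1 blocks; 1024 free",
]
print(context_sizes(demo))  # -> Counter({8192: 2, 32768: 1})
```

Running this over the logs with and without the flag would yield the two "without"/"with" columns in the table above.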

What I think this is showing is that making Path bigger occasionally
pushes palloc over a boundary so that it allocates another chunk, but
most of the time it doesn't.  Also, it suggests to me that if we're
concerned about keeping memory utilization tight on these kinds of
queries, we could think about changing palloc's allocation pattern.
For example, if we did 8k, 16k, 32k, 64k, 64k, 64k, 64k, 128k, 128k,
128k, 128k, 256k, 256k, 256k, 256k ... a lot of these queries would
consume less memory.
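The suggested growth schedule can be compared with plain doubling in a toy model (not palloc itself, just the block-size sequences):

```python
# Toy model of two block-size growth schedules for an allocator that
# grabs a new block each time it runs out of space.  Not palloc itself;
# just an illustration of the schedule suggested above.
def doubling(start=8192):
    size = start
    while True:
        yield size
        size *= 2

def repeat_four(start=8192):
    # 8k, 16k, 32k, then each subsequent size repeated four times:
    # 64k x4, 128k x4, 256k x4, ...
    for size in (8192, 16384, 32768):
        yield size
    size = 65536
    while True:
        for _ in range(4):
            yield size
        size *= 2

def total_allocated(schedule, need):
    # Sum of block sizes grabbed until at least `need` bytes are covered.
    total = 0
    for size in schedule:
        total += size
        if total >= need:
            return total

# For a query needing ~300 kB of planner garbage:
print(total_allocated(doubling(), 300 * 1024))     # -> 516096 (504 kB)
print(total_allocated(repeat_four(), 300 * 1024))  # -> 319488 (312 kB)
```

The flatter schedule overshoots the actual need by much less for mid-sized queries, which is the point being made about keeping memory utilization tight.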

If there's a strong feeling that I should find a way to make this
testing more rigorous, I'm willing to do so, but I suspect that we're
not going to find anything very exciting here.  A more likely angle of
investigation here is to try to figure out what a worst case for
enlarging the Path structure might look like, and test that.  I don't
have a brilliant idea there right at the moment, but I'll mull it
over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-05 Thread Tom Lane
David Rowley  writes:
> As of today these Equivalence Classes only incorporate expressions which
> use the equality operator, but what if the above query had been written as:

> select * from t1 inner join t2 on t1.id = t2.id where t1.id <= 10;

> Should we not be able to assume that t2.id is also <= 10?

This sort of thing has been suggested multiple times before, and I've
rejected it every time on the grounds that it would likely add far more
planner cycles than it'd be worth, eg, time would be spent on searches for
matching subexpressions whether or not anything was learned (and often
nothing would be learned).  While I've only read your description of the
patch not the patch itself, the search methodology you propose seems
pretty brute-force and unlikely to solve that issue.  It's particularly
important to avoid O(N^2) behaviors when there are N expressions ...

Another issue that would need consideration is how to avoid skewing
planner selectivity estimates with derived clauses that are fully
redundant with other clauses.  The existing EC machinery is mostly
able to dodge that problem by generating just a minimal set of equality
clauses from an EC, but I don't see how we'd make that work here.

I'm also wondering why you want to limit it to comparisons to constants;
that seems rather arbitrary.

Lastly, in most cases knowing that t2.id <= 10 is just not worth all
that much; it's certainly far less useful than an equality condition.
It's not difficult to imagine that this would often be a net drag on
runtime performance (never mind planner performance) by doing nothing
except creating additional filter conditions the executor has to check.
Certainly it would be valuable to know this if it let us exclude some
partition of a table, but that's only going to happen in a small minority
of queries.

I'm not necessarily opposed to doing anything in this area, but so far
I've not seen how to do it in a way that is attractive when planner
complexity, performance, and end results are all considered.

regards, tom lane



Re: [HACKERS] libxml2 2.9.3 breaks xml test output

2015-12-05 Thread Christian Ullrich

* Michael Paquier wrote:


On Sat, Dec 5, 2015 at 4:38 PM, Christian Ullrich  wrote:

I have zero experience with libxml2, so no idea if the previous context
level can be turned on again. IMHO, the libxml2 change is a bug in itself;
PostgreSQL's error messages are more readable with the XML text in them.


See here for example you are not the only one to see this problem:
http://www.postgresql.org/message-id/cafj8pra4xjqfgnqcqmcygx-umgmr3stt3xfeuw7kbsoiovg...@mail.gmail.com
https://bugzilla.redhat.com/show_bug.cgi?id=1286692


Right. Sorry, I had checked the archives for earlier mentions of the 
issue, but had not found Pavel's post.


--
Christian






Re: [HACKERS] psql: add \pset true/false

2015-12-05 Thread Michael Paquier
On Fri, Dec 4, 2015 at 6:06 PM, Pavel Stehule  wrote:
> long time I am dream about integrating Lua to psql
>
> It is fast enough for these purpose and can be used for custom macros, ..

Things are perhaps digressing too much here... Regarding the patch, I
would tend to think that we should just reject it and try to craft
something that could be more pluggable if there is really a need.
Thoughts?
-- 
Michael



Re: [HACKERS] Random crud left behind by aborted TAP tests

2015-12-05 Thread Michael Paquier
On Sat, Dec 5, 2015 at 1:59 AM, Tom Lane  wrote:
> Let's just put the cruft in tmp_check/ and call it good.

+1. Having those folders with random names showing up when typing git
status is annoying, so voilà.
-- 
Michael


20151205_tap_tempdirs.patch
Description: binary/octet-stream


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-05 Thread Michael Paquier
On Sat, Dec 5, 2015 at 1:11 AM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>
>> By the way, if there are no objections, I am going to mark this patch
>> as committed in the CF app. Putting in the infrastructure is already a
>> good step forward, and I will add an entry in next CF to track the
>> patch to add tests for recovery itself. Alvaro, what do you think?
>
> Feel free to do that, but I'm likely to look into it before the next CF
> anyway.  The thread about this has been open since 2013, so that doesn't
> seem unfair.

Let's see then. For the time being I have marked this patch as
committed, and created a new entry for the set of tests regarding
standbys:
https://commitfest.postgresql.org/8/438/
-- 
Michael



[HACKERS] [PATCH] Equivalence Class Filters

2015-12-05 Thread David Rowley
Hi,

I'd like to gauge what the community interest might be in this patch.
Let me explain:

As of today, Equivalence Classes are used in the query planner to give the
planner more flexibility in the join order, by collecting information that
describes which expressions are equal to each other. These Equivalence
Classes, when they contain a constant value, also allow predicate
push-down. For example:

# explain select * from t1 inner join t2 on t1.id = t2.id where t1.id=1;
 QUERY PLAN
-
 Nested Loop  (cost=0.56..12.61 rows=1 width=12)
   ->  Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=8)
 Index Cond: (id = 1)
   ->  Index Only Scan using t2_pkey on t2  (cost=0.28..4.29 rows=1 width=4)
 Index Cond: (id = 1)
(5 rows)

We can see that a qual was added to filter t2.id=1.

As of today these Equivalence Classes only incorporate expressions which
use the equality operator, but what if the above query had been written as:

select * from t1 inner join t2 on t1.id = t2.id where t1.id <= 10;

Should we not be able to assume that t2.id is also <= 10? Currently we
don't, but the attached patch expands Equivalence Classes to add what
I've called Equivalence Filters.

This allows the above query to produce a plan like:

                                 QUERY PLAN
----------------------------------------------------------------------------
 Merge Join  (cost=0.56..5.68 rows=1 width=12)
   Merge Cond: (t1.id = t2.id)
   ->  Index Scan using t1_pkey on t1  (cost=0.29..8.44 rows=9 width=8)
         Index Cond: (id <= 10)
   ->  Index Only Scan using t2_pkey on t2  (cost=0.28..4.45 rows=10 width=4)
         Index Cond: (id <= 10)

Now, it may not be that common to perform range filters on an id column,
but it is fairly common for date values to be join conditions and to have
date range filters too. For example, in [1] Alexandre reported a 1100 to
2200 times performance improvement after manually pushing the date filter
down into the subquery. This patch makes that push-down automatic.

This all works by collecting, while the EquivalenceClasses are being
built, the OpExprs which were rejected from an eqclass because they don't
use an equality operator. OpExprs of the form "Expr op Const" and
"Const op Expr" are collected; once the EquivalenceClasses have been
built, each "Expr" is looked up among the members of the built classes,
and if it is found there, the qual becomes an EquivalenceFilter and gets
tagged onto the EquivalenceClass.
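
The bookkeeping described above can be sketched in a few lines. This is an
illustrative model of the idea only (the real patch works on PostgreSQL's
planner data structures in C); the function name, tuple shapes, and column
strings are invented for the sketch:

```python
def derive_equivalence_filters(eq_classes, rejected_quals):
    """eq_classes: list of sets of expressions known equal to each other.
    rejected_quals: quals rejected from an eqclass because they are not
    equality quals, as (expr, op, const) tuples, e.g. ('t1.id', '<=', 10).
    Returns the extra quals implied for the other members of the class."""
    derived = []
    for expr, op, const in rejected_quals:
        for ec in eq_classes:
            if expr in ec:
                # Every other member of the class inherits the filter.
                for member in ec:
                    if member != expr:
                        derived.append((member, op, const))
    return derived

# Mirroring the query above: t1.id = t2.id AND t1.id <= 10
classes = [{'t1.id', 't2.id'}]
quals = [('t1.id', '<=', 10)]
print(derive_equivalence_filters(classes, quals))  # [('t2.id', '<=', 10)]
```

The real implementation of course also has to worry about operator
semantics and opclasses, as discussed below.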

Patch Status - I'm a bit uncertain how far we can go with this, and if
we deem it a good idea, we'd need to be careful not to cause any
performance regression. For example, if the filter were "somestaticfunc(col)
< 1234", we might not want to push that down, as somestaticfunc() might
be so costly that it would be better to perform the join with all the rows
instead.  For this reason I've limited the current patch to operators
which are members of a btree opclass. Perhaps we could do more, but
there's likely less of a win to be had, since such a qual is less likely
to be useful for an index scan.

Writing this patch has been on my "interesting" list for a while now. I
finally got some time last weekend to write it, but since that happened at
the weekend rather than in work time, it's a low priority for me too.
However, there's no sense in it gathering dust, so I'm posting it today.

Is this something that we might want?


[1]
http://www.postgresql.org/message-id/cagewt-tbqrw5nlazkdcvp_zten_lmmygugq1ivvezb+p2xp...@mail.gmail.com

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


equivalenceclassfilters_2015-12-05_aa62f00b.patch
Description: Binary data



Re: [HACKERS] [patch] Proposal for \rotate in psql

2015-12-05 Thread Pavel Stehule
2015-11-30 16:34 GMT+01:00 Daniel Verite :

> Pavel Stehule wrote:
>
> > [ \rotate being a wrong name ]
>
> Here's an updated patch.
>

Today I had time to play with it. I am sorry for the delay.


>
> First it renames the command to \crosstabview, which hopefully may
> be more consensual, at least it's semantically much closer to crosstab .
>

Thank you very much - it is a good name.


>
> > The important question is sorting output. The vertical header is
> > sorted by first appearance in result. The horizontal header is
> > sorted in ascending or descending order. This is unfriendly for
> > often use case - month names. This can be solved by third parameter
> > - sort function.
>
> I've thought that sorting with an external function would be too
> complicated for this command, but sorting ascending by default
> was not the right choice either.
> So I've changed to sorting by first appearance in result (like the vertical
> header), and sorting ascending or descending only when specified
> (with +colH or -colH syntax).
>
> So the synopsis becomes: \crosstabview [ colV [+ | -]colH ]
>
> Example with a time series (daily mean temperatures in Paris,2014),
> month names across, day numbers down :
>
> select
>   to_char(w_date,'DD') as day ,
>   to_char(w_date,'Mon') as month,
>   w_temp from weather
>   where w_date between '2014-01-01' and '2014-12-31'
>   order by w_date
> \crosstabview
>
>  day | Jan | Feb | Mar | Apr | May | Jun | ...[cut]
> -+-+-+-+-+-+-+-
>  01  |   8 |   8 |   6 |  16 |  12 |  15 |
>  02  |  10 |   6 |   6 |  15 |  12 |  16 |
>  03  |  11 |   5 |   7 |  14 |  11 |  17 |
>  04  |  10 |   6 |   8 |  12 |  12 |  14 |
>  05  |   6 |   7 |   8 |  14 |  16 |  14 |
>  06  |  10 |   9 |   9 |  16 |  17 |  20 |
>  07  |  11 |  10 |  10 |  18 |  14 |  24 |
>  08  |  11 |   8 |  12 |  10 |  13 |  22 |
>  09  |  10 |   6 |  14 |  12 |  16 |  22 |
>  10  |   6 |   7 |  14 |  14 |  14 |  19 |
>  11  |   7 |   6 |  12 |  14 |  12 |  21 |
> ...cut..
>  28  |   4 |   7 |  10 |  12 |  14 |  16 |
>  29  |   4 | |  14 |  10 |  15 |  16 |
>  30  |   5 | |  14 |  14 |  17 |  18 |
>  31  |   5 | |  14 | |  16 | |
>
> The month names come out in the expected order here,
> contrary to what happened with the previous iteration of
> the patch which forced a sort in all cases.
> Here it plays out well because the single "ORDER BY w_date" is
> simultaneously OK for the vertical and horizontal headers,
> a common case for time series.
>
> For more complicated cases, when the horizontal and vertical
> headers should be ordered independently, and
> in addition the horizontal header should not be sorted
> by its values, I've toyed with the idea of sorting by another
> column which would supposedly be added in the query
> just for sorting, but it loses much in simplicity. For the more
> complex stuff, users can always turn to the server-side methods
> if needed.
>
>
Usually there is no natural order for both dimensions - I miss the
possibility to set a [+/-] order for the vertical dimension.

For my query

select sum(amount) as amount,
       to_char(date_trunc('month', closed), 'TMmon') as month,
       customer
  from data
 group by customer, to_char(date_trunc('month', closed), 'TMmon'),
          extract(month from closed)
 order by extract(month from closed);

I cannot push an ORDER BY customer into it, so I have to use


select sum(amount) as amount, extract(month from closed) as month, customer
  from data
 group by customer, extract(month from closed)
 order by customer;

and \crosstabview 3 +2

So the possibility to enforce an order for the vertical dimension while
using the data order for the horizontal dimension can be really useful.
Another way would be to use a special column for sorting, something like:

\crosstabview verticalcolumn horizontalcolumn sorthorizontalcolumn
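
For what it's worth, the pivot behaviour being discussed (vertical header
kept in first-appearance order, horizontal header either in first-appearance
order or explicitly sorted ascending/descending) can be modelled with a
short sketch. The function and argument names here are invented for
illustration; this is not psql's actual implementation:

```python
def crosstab(rows, sort_horizontal=None):
    """rows: list of (vertical, horizontal, value) tuples, in query order.
    sort_horizontal: None keeps first-appearance order (the new default),
    'asc' or 'desc' sorts explicitly (the +colH / -colH syntax)."""
    v_hdr, h_hdr = [], []
    for v, h, _ in rows:
        # Both headers start out in order of first appearance in the result.
        if v not in v_hdr:
            v_hdr.append(v)
        if h not in h_hdr:
            h_hdr.append(h)
    if sort_horizontal:
        h_hdr.sort(reverse=(sort_horizontal == 'desc'))
    cells = {(v, h): val for v, h, val in rows}
    grid = [[cells.get((v, h)) for h in h_hdr] for v in v_hdr]
    return grid, v_hdr, h_hdr

rows = [('01', 'Jan', 8), ('01', 'Feb', 8), ('02', 'Jan', 10), ('02', 'Feb', 6)]
grid, v, h = crosstab(rows)
# v == ['01', '02'], h == ['Jan', 'Feb'], grid == [[8, 8], [10, 6]]
```

An extra sort column, as suggested above, would amount to ordering h_hdr by
that column's values instead of by first appearance.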


Next - I use "fetch_count" > 0. Your new version works only with
"fetch_count" <= 0. It is a limitation, but I think it is acceptable. In
that case a warning should be displayed, something like "crosstabview
doesn't work with FETCH_COUNT > 0".

I also miss support for tab completion and \?.


Regards

Pavel






> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>