Re: [HACKERS] Parallel Plans and Cost of non-filter functions

2017-11-04 Thread Tels
Moin,

On Fri, November 3, 2017 7:13 pm, Tom Lane wrote:
> Paul Ramsey <pram...@cleverelephant.ca> writes:
>>> Whether I get a parallel aggregate seems entirely determined by the
>>> number
>>> of rows, not the cost of preparing those rows.
>
>> This is true, as far as I can tell and unfortunate. Feeding tables with
>> 100ks of rows, I get parallel plans, feeding 10ks of rows, never do, no
>> matter how costly the work going on within. That's true of changing
>> costs
>> on the subquery select list, and on the aggregate transfn.
>
> This sounds like it might be the same issue being discussed in
>
> https://www.postgresql.org/message-id/flat/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com

When looking at the web archive, the link is broken, even though in the
mail above it appears correct for me:

https://www.postgresql.org/message-id/28621.1509750807%40sss.pgh.pa.us

(shortened: http://bit.ly/2zetO5T)

Seems the email-obfuskation breaks such links?

Here is a short-link for people reading it via the archive on http:

http://bit.ly/2hF4lIt

Best regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Account for cost and selectivity of HAVING quals

2017-11-01 Thread Tels
Hello David,

On Tue, October 31, 2017 7:54 pm, David G. Johnston wrote:
> On Tue, Oct 31, 2017 at 4:31 PM, Tels <nospam-pg-ab...@bloodgate.com>
> wrote:
>
>>
>> ​​
>> That looks odd to me, it first uses output_tuples in a formula, then
>> overwrites the value with a new value. Should these lines be swapped?
>>
>
> ​IIUC it is correct: the additional total_cost comes from processing every
> output group to check whether it is qualified - since every group is
> checked the incoming output_tuples from the prior grouping is used.  The
> side-effect of the effort is that the number of output_tuples has now been
> reduced to only those matching the qual - and so it now must take on a new
> value to represent this.

Ah, makes sense. Learned something new today.

Maybe it's worth to add a comment, or would everybody else beside me
understand it easily by looking at the code? :)

Thank you,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Account for cost and selectivity of HAVING quals

2017-10-31 Thread Tels
Moin,

On Tue, October 31, 2017 5:45 pm, Tom Lane wrote:
> Pursuant to the discussion at
> https://www.postgresql.org/message-id/20171029112420.8920b5f...@mx.zeyos.com
> here's a patch to fix the planner so that eval costs and selectivity of
> HAVING quals are factored into the appropriate plan node numbers.
> Perhaps unsurprisingly, this doesn't change the results of any
> existing regression tests.
>
> It's slightly annoying that this approach will result in calculating
> the eval costs and selectivity several times, once for each aggregation
> plan type we consider.  I thought about inserting RestrictInfo nodes
> into the havingQual so that those numbers could be cached, but that turned
> out to break various code that doesn't expect to see such nodes there.
> I'm not sure it's worth going to the trouble of fixing that; in the big
> scheme of things, the redundant calculations likely don't cost much, since
> we aren't going to have relevant statistics.
>
> Comments?  If anyone wants to do a real review of this, I'm happy to stick
> it into the upcoming CF; but without an expression of interest, I'll just
> push it.  I don't think there's anything terribly controversial here.

Not a review, but the patch has this:


+ total_cost += qual_cost.startup + output_tuples *
qual_cost.per_tuple;
+
+ output_tuples = clamp_row_est(output_tuples *

...

That looks odd to me, it first uses output_tuples in a formula, then
overwrites the value with a new value. Should these lines be swapped?

Best regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-08-01 Thread Tels
On Sun, July 30, 2017 4:35 pm, Tom Lane wrote:
> "Tels" <nospam-pg-ab...@bloodgate.com> writes:
>> On Sun, July 30, 2017 12:22 pm, Tom Lane wrote:
>>> Yeah, I looked into that.  The closest candidate I can find is that
>>> perl 5.10.1 contains Test::More 0.92.  However, it's not real clear
>>> to me exactly which files I'd need to pull out of 5.10.1 and inject
>>> into
>>> an older tarball --- the layout seems a lot different from a standalone
>>> package.
>
>> So basically the two files:
>> http://search.cpan.org/src/EXODIST/Test-Simple-1.302086/lib/Test/More.pm
>> http://search.cpan.org/src/EXODIST/Test-Simple-1.302086/lib/Test/Builder/Module.pm
>> might do the trick.
>
> Thanks for the hint.  I transplanted these files out of a 5.10.1
> tarball into 5.8.3, then built as usual:
> lib/Test/Simple.pm
> lib/Test/More.pm
> lib/Test/Builder.pm
> lib/Test/Builder/Module.pm
> The result seems to work, although it fails a few of 5.8.3's tests,
> probably because I didn't copy over the relevant test scripts.
> It's good enough to run PG's tests though.

Ah, cool. One more question:

> 1. Can it run the build-related Perl scripts needed to build PG from
> a bare git checkout, on non-Windows platforms?
> 2. Can it run our MSVC build scripts?
> 3. Does pl/perl compile (and pass its regression tests) against it?
> 4. Can it run our TAP tests?

> 5.8.3 does appear to succeed at points 1,3,4.  Now, to get through
> the TAP tests you need to install IPC::Run, and you also need to
> update Test::More because the version shipped with 5.8.3 is too old.
> But at least the failures that you get from lacking these are pretty
> clear.

So, for point 2, one would need Perl 5.8.3 + a newer Test::More and
IPC::Run, or a newer Perl. I've toyed around with cpanminus:

 apt-get install cpanminus
 cpanm Test::More
 cpanm IPC::Run

and it successfully build (in a local path because I was not root) the
latest Test::More and IPC::Run. I'm not sure if this would be enough to
run the MSVC build scripts, tho.

So, is the goal you are trying to achive here to be able to say "You need
Perl 5.8.3; plus Module XYZ in vABC if you want point 2, otherwise skip
this step" instead of saying "You need Perl 5.10.1?"?

In that case it might be also possible to write instructions on how to
obtain and use these modules, or how to just use a newer Perl in a temp.
path.

Best wishes,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Tels
Moin,

On Sun, July 30, 2017 12:22 pm, Tom Lane wrote:
> "Tels" <nospam-pg-ab...@bloodgate.com> writes:
>> On Sun, July 30, 2017 1:21 am, Tom Lane wrote:
>>>> So the question is, does anyone care?  I wouldn't except that our
>>>> documentation appears to claim that we work with Perl "5.8 or later".
>
>> Not sure how often People use old Perl versions out in the field. I'd
>> venture this is either happens with "ancient" stuff, e.g. where people
>> just can't or want upgrade.
>> Otherwise, an up-to-date OS is just necessary for security, anyway, and
>> that would contain a Perl from this decade, wouldn't it?
>
> Well, that's not really the point, IMO.  The reason I'm interested in
> this is the same reason I run some buildfarm critters on ancient
> platforms: if we do something that breaks backwards compatibility
> with old software, we should know it and make a deliberate decision
> that it's okay.  (And update the relevant compatibility claims in our
> docs.)  Moving the compatibility goalposts without knowing it isn't
> good, especially if it happens in supposedly-stable release branches.

Sure, I was just pointing out that moving the goalpost forward knowingly,
as you put it, can be ok vs. trying to support ancient software at all
costs. Reads to me as we are in agreement here.

>>> I am unable to confirm our claim that we work with Test::More 0.82,
>>> because CPAN has only versions from a year or three back.  (Anyone
>>> know of a more, er, comprehensive archive than CPAN?)  It's also
>>> interesting to speculate about how old a version of IPC::Run is new
>>> enough, but I see no easy way to get much data on that either.
>
>> Test::More has been bundled with Perl since 5.6.2 (you can use
>> "corelist"
>> to check for these things), so if all fails, it might be possible to
>> extract a version from a Perl distribution [4].
>
> Yeah, I looked into that.  The closest candidate I can find is that
> perl 5.10.1 contains Test::More 0.92.  However, it's not real clear
> to me exactly which files I'd need to pull out of 5.10.1 and inject into
> an older tarball --- the layout seems a lot different from a standalone
> package.

Module distributions contain a MANIFEST like this:

http://search.cpan.org/~exodist/Test-Simple/MANIFEST

And while reconstructing a module for an old version can be quite a
hassle,, it looks like Test::More is just plain Perl and only uses
Test::Builder::Module.

So basically the two files:

http://search.cpan.org/src/EXODIST/Test-Simple-1.302086/lib/Test/More.pm
http://search.cpan.org/src/EXODIST/Test-Simple-1.302086/lib/Test/Builder/Module.pm

might do the trick. Unless PG uses also Test::Simple or other modules,
which I haven't check, then you just need to add these, too.

And basically, you put these files into a subdirectory, and use:

  use lib "mydir";

to tell Perl to load modules from there first.

Here is a sample archive with a modern Test::More and an old Test::More
from 5.10.1. There are two scripts to see how they are loaded and used.

  http://bloodgate.com/tmp/test-more-test.tar.gz

One thing to watch out is that this would use the old Test::More, but any
module it loads (or the script use) comes from the system Perl. So the
test might not be 100% complete accurate, f.i. if a newer IO::Scalar is
used with the old Test::More.

You could also compile and install an old Perl version into a local
subdirectory and test with that. That takes a bit more time, though.

Hope this helps,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-30 Thread Tels
Moin Tom,

On Sun, July 30, 2017 1:21 am, Tom Lane wrote:
> I wrote:
>> So the question is, does anyone care?  I wouldn't except that our
>> documentation appears to claim that we work with Perl "5.8 or later".
>> And the lack of field complaints suggests strongly that nobody else
>> cares.  So I'm inclined to think we just need to be more specific
>> about the minimum Perl version --- but what exactly?

My 0.02 cent on Perl versions:

Not sure how often People use old Perl versions out in the field. I'd
venture this is either happens with "ancient" stuff, e.g. where people
just can't or want upgrade.

Otherwise, an up-to-date OS is just necessary for security, anyway, and
that would contain a Perl from this decade, wouldn't it?

If someone is still running Perl 5.8.3, they also got glorius Unicode
circa Unicode 4.0 or in this area...

> I've done some more research on this.  It seems to me there are four
> distinct components to any claim about whether we work with a particular
> Perl version:
>
> 1. Can it run the build-related Perl scripts needed to build PG from
> a bare git checkout, on non-Windows platforms?
> 2. Can it run our MSVC build scripts?
> 3. Does pl/perl compile (and pass its regression tests) against it?
> 4. Can it run our TAP tests?
>
> I have no info to offer about point #2, but I did some tests with
> ancient 5.8.x Perl versions on my buildfarm hosts prairiedog and
> gaur.  (I would have liked to use something faster, but these Perl
> versions failed to build at all on more recent platforms, and
> I thought it rather pointless to try to hack them enough to build.)
>
> I find that perl 5.8.0 and later seem to succeed at point #1,
> but you need at least 5.8.1 to compile plperl.  Also, on prairiedog
> 5.8.1 fails plperl's regression tests because of some seemingly
> utf8-related bug.  That may be an ancient-macOS problem more than
> anything else, so I didn't poke at it too hard.
>
> The really surprising thing I found out is that you can't run the
> TAP tests on anything older than 5.8.3, because that's when they
> added "prove".  I'm bemused by the idea that such a fundamental
> facility would get added in a third-digit minor release, but there
> you have it.  (Now, in fairness, this amounted to importing a feature
> that already existed on CPAN, but still...)
>
> 5.8.3 does appear to succeed at points 1,3,4.  Now, to get through
> the TAP tests you need to install IPC::Run, and you also need to
> update Test::More because the version shipped with 5.8.3 is too old.
> But at least the failures that you get from lacking these are pretty
> clear.
>
> I am unable to confirm our claim that we work with Test::More 0.82,
> because CPAN has only versions from a year or three back.  (Anyone
> know of a more, er, comprehensive archive than CPAN?)  It's also
> interesting to speculate about how old a version of IPC::Run is new
> enough, but I see no easy way to get much data on that either.
>
> Anyway, pending some news about compatibility of the MSVC scripts,
> I think we ought to adjust our docs to state that 5.8.3 is the
> minimum supported Perl version.
>
> Also, I got seriously confused at one point during these tests because,
> while our configure script carefully sets PERL to an absolute path name,
> it's content to set PROVE to "prove", paying no attention to whether
> that version of "prove" matches "perl".  Is it really okay to run the
> TAP tests with a different perl version than we selected for other
> purposes?  I think it'd be a good idea to insist that "prove" be in
> the same directory we found "perl" in.

Thank you for the analysis. I agree about "prove".

As for Test::More:

Test::More has been bundled with Perl since 5.6.2 (you can use "corelist"
to check for these things), so if all fails, it might be possible to
extract a version from a Perl distribution [4].

CPAN authors are encouraged to clean out old versions due to the sheer
size of the archive. (Not all got the memo, tho...*cough*) and most
mirrors only carry the current files.

The original author is Michael G. Schwern [0]. But it seems he cleaned
house :)

You might have luck with an mirror [1] who didn't clean out, or with
archive.org.

But with Test::More, it seems a bit confusing, as it is part of
Test::Simple [2], which in turn is part of Test2, which is now on github
[3]. It's Test-Modules all the way down.

I'm not sure you'd find old Test::More versions ready-to-use in this.

My apologies if you knew that already.

However, I do so happen to have a large archive with Perl releases and
CPAN modules. It was first mirrored on mid-2015 - so anything that was
deleted before 2015 unfortunately I can't help you with that.

But if 

Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-06-01 Thread Tels
Moin,

On Wed, May 31, 2017 10:18 pm, Craig Ringer wrote:
> On 31 May 2017 at 08:43, Craig Ringer <cr...@2ndquadrant.com> wrote:
>> Hi all
>>
>> More and more I'm finding it useful to extend PostgresNode for project
>> specific helper classes. But PostgresNode::get_new_node is a factory
>> that doesn't provide any mechanism for overriding, so you have to
>> create a PostgresNode then re-bless it as your desired subclass. Ugly.
>>
>> The attached patch allows an optional second argument, a class name,
>> to be passed to PostgresNode::get_new_node . It's instantiated instead
>> of PostgresNode if supplied. Its 'new' constructor must take the same
>> arguments.
>
> Note that you can achieve the same effect w/o patching
> PostgresNode.pm, albeit in a somewhat ugly manner, by re-blessing the
> returned object.
>
> sub get_new_mywhatever_node {
> my $self = PostgresNode::get_new_node($name);
> $self = bless $self, 'MyWhateverNode';
> return $self;
> }
>
> so this would be cosmetically nice, but far from functionally vital.

It's good style in Perl to have constructors bless new objects with the
class that is passed in, tho.

(I'd even go so far as to say that any Perl OO code that uses fixed class
names is broken).

While technically you can rebless a returned object, that breaks thge
subclassing, sometimes subtle, and sometimes really bad.

For instances, any method calls the constructor does, are happening in the
"hardcoded" package, not in the subclass you are using, because the
reblessing happens later.

Consider for instance:

 package MyBase;

 sub new
{
my $self = bless {}, 'MyBase';
# it should be instead:
# my $self = bless {}, shift;
$self->_init();
}

 sub _init
{
my ($self) = @_;

$self->{foo} = 'bar';

# return the initialized object
$self;
}

If you do the this:

 package MySubclass;

 use MyBase;
 use vars qw/@ISA/;

 @ISA = qw/MyBase/;

 sub _init
{
my ($self) = @_;

# call the base's _init
$self->SUPER::_init();

# initialize our own stuff and override some
$self->{foo} = 'subclass';
$self->{baz} = 1;

# return the initialized object
$self;
}

and try to use it like this:

  package main;

  use MySubclass;

  my $thingy = MySubclass->new();
  print $thingy->{foo},"\n";

you get "bar", not "subclass" - even if you rebless $thingy into the
correct class.


And as someone who subclasses MyBase, you have no idea why or how and it
will break with the next update to MyBase's code. While technically you
can work around that by "peeking" into MyBase's code and maybe some
reblessing, the point is that you shouldn't do nor need to do this.

Please SEE: http://perldoc.perl.org/perlobj.html

Regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-19 Thread Tels

On Thu, May 18, 2017 10:24 pm, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael
>> Paquier
>> On Thu, May 18, 2017 at 11:30 PM, Robert Haas <robertmh...@gmail.com>
>> wrote:
>> > Because why?
>>
>> Because it is critical to let the user know as well *why* an error
>> happened.
>> Imagine that this feature is used with multiple nodes, all primaries.
>> If
>> a DB admin busted the credentials in one of them then all the load
>> would
>> be redirected on the other nodes, without knowing what is actually
>> causing
>> the error. Then the node where the credentials have been changed would
>> just
>> run idle, and the application would be unaware of that.
>
> In that case, the DBA can know the authentication errors in the server log
> of the idle instance.
>
> I'm sorry to repeat myself, but libpq connection failover is the feature
> for HA.  So I believe what to prioritize is HA.

I'm in agreement here, the feature for me sounds very useful for HA, but
HA means it needs to work as reliable as possible, not just "often enough"
:)

If one goes to the length to have multiple instances, there is surely also
monitoring in place to see if they are healthy or under load/stress.

The beaty of having libpq connecting to multiple hosts until one works is
that you can handle temporary unavailability (e.g. one instance is
restarted for patching) and general failure (one instance goes down to
whatever error) in one place and without having to implement this logic
into every app (database user connector).


Imagine f.i. that you have 3 hosts A, B and C and B.

There are two scenarioes here: Everyone tries "A,B,C", or everyone tries
random permutations like "A,C,B" or "B,C,A".

If In the first scenary, almost all connections would go to A, until it no
longer accepts no connections, then they spill over to B.

In the second one, each host gets 1/3 of all connections equally.

Now imagine  that B is down for either a brief period or permantently.

If libpq stops when it connects to B, then the scenarios play out like this:

1: Almost all connections run on A, but a random subset breaks when
spillover to B does not happen. Host C is idle.

2: 2/3 of all connections just work, 1/3 just breaks. Both A and C have a
higher load than usual.

If libpq skips B and continues, then we have instead:

1: Almost all connections run on A, but a random subset spills over to C
after skipping B.

2: All connections run on A or C, B is always skipped if it appears before
A or C.

The admin would see on the monitoring that B is offline (briefly or
permanent) and need to correct it.

>From the user's perspective, the second variant is smooth, the first is
breaking randomly. A "database user" would not really want to know that B
is down or why, it would just expect to get a working DB connection.

That's my 0.02 € anyway.

Tels



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-17 Thread Tels
Moin,

On Wed, May 17, 2017 12:34 pm, Robert Haas wrote:
> On Wed, May 17, 2017 at 3:06 AM, Tsunakawa, Takayuki
> <tsunakawa.ta...@jp.fujitsu.com> wrote:
>> What do you think of the following cases?  Don't you want to connect to
>> other servers?
>>
>> * The DBA shuts down the database.  The server takes a long time to do
>> checkpointing.  During the shutdown checkpoint, libpq tries to connect
>> to the server and receive an error "the database system is shutting
>> down."
>>
>> * The former primary failed and now is trying to start as a standby,
>> catching up by applying WAL.  During the recovery, libpq tries to
>> connect to the server and receive an error "the database system is
>> performing recovery."
>>
>> * The database server crashed due to a bug.  Unfortunately, the server
>> takes unexpectedly long time to shut down because it takes many seconds
>> to write the stats file (as you remember, Tom-san experienced 57 seconds
>> to write the stats file during regression tests.)  During the stats file
>> write, libpq tries to connect to the server and receive an error "the
>> database system is shutting down."
>>
>> These are equivalent to server failure.  I believe we should prioritize
>> rescuing errors during operation over detecting configuration errors.
>
> Yeah, you have a point.  I'm willing to admit that we may have defined
> the behavior of the feature incorrectly, provided that you're willing
> to admit that you're proposing a definition change, not just a bug
> fix.
>
> Anybody else want to weigh in with an opinion here?

Hm, to me the feature needs to be reliable (for certain values of
reliable) to be usefull.

Consider that you have X hosts (rendundancy), and a lot of applications
that want a stable connection to the one that (still) works, whichever
this is.

You can then either:

1. make one primary, the other standby(s) and play DNS tricks or similiar
to make it appear that there is only one working host, and have all apps
connect to the "one host" (and reconnect to it upon failure)

2. let each app try each host until it finds a working one, if the
connection breaks, retry with the next host

3. or use libpq and let it try the hosts for you.

However, if I understand it correctly, #3 only works reliable in certain
cases (e.g. host down), but not if it is "sort of down". In that case each
app would again need code to retry different hosts until it finds a
working one, instead of letting libpq do the work.

That sound hard to deploy #3 in praxis, as you might easily just code up
#1 or #2 and call it a day.

All the best,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG 10 release notes

2017-04-25 Thread Tels

On Tue, April 25, 2017 1:21 pm, Bruce Momjian wrote:
> On Tue, Apr 25, 2017 at 01:06:05PM -0400, Tels wrote:
>> Moin,
>>
>> On Mon, April 24, 2017 9:31 pm, Bruce Momjian wrote:
>> > I have committed the first draft of the Postgres 10 release notes.
>> They
>> > are current as of two days ago, and I will keep them current.  Please
>> > give me any feedback you have.
>>
>> Thank you! Here is one thing I noticed:
>>
>> "By default planning and execution is display by EXPLAIN ANALYZE and not
>> display in other cases. The new EXPLAIN option SUMMARY allows explicit
>> control of this."
>>
>> The grammar sounds a bit off, I'd suggest:
>>
>> "By default planning and execution time are both displayed by EXPLAIN
>> ANALYZE and not displayed in other cases. The new EXPLAIN option SUMMARY
>> allows explicit control of this feature."
>
> I think all that was missing was "time":
>
> By default planning and execution time is display by
> EXPLAIN ANALYZE and not display in other cases.
> The new EXPLAIN option SUMMARY allows
> explicit control of this.

Hm, no, I disagree, the the grammar is not right:

It should be: "X and Y _are_ display_ed_"

instead of:

"X and Y is display"

likewise "and not display_ed_ in other cases."

Best wishes,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG 10 release notes

2017-04-25 Thread Tels
Moin,

On Mon, April 24, 2017 9:31 pm, Bruce Momjian wrote:
> I have committed the first draft of the Postgres 10 release notes.  They
> are current as of two days ago, and I will keep them current.  Please
> give me any feedback you have.

Thank you! Here is one thing I noticed:

"By default planning and execution is display by EXPLAIN ANALYZE and not
display in other cases. The new EXPLAIN option SUMMARY allows explicit
control of this."

The grammar sounds a bit off, I'd suggest:

"By default planning and execution time are both displayed by EXPLAIN
ANALYZE and not displayed in other cases. The new EXPLAIN option SUMMARY
allows explicit control of this feature."

Regards,

Tels



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-04-21 Thread Tels
Moin,

On Fri, April 21, 2017 7:04 am, David Rowley wrote:
> On 21 April 2017 at 22:30, Tels <nospam-pg-ab...@bloodgate.com> wrote:
>> I'd rather see:
>>
>>  CREATE STATISTICS stats_name ON table(col);
>>
>> as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It
>> could also be extended to both more columns, expressions or other tables
>> like so:
>>
>>  CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b);
>>
>> and even:
>>
>>  CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options)
>> WHERE t2.a > 4;
>
> How would you express a join condition with that syntax?
>
>> This looks easy to remember, since it compares to:
>>
>>  CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4;
>>
>> Or am I'm missing something?
>
> Sadly yes, you are, and it's not the first time.
>
> I seem to remember mentioning this to you already in [1].
>
> Please, can you read over [2], it mentions exactly what you're
> proposing and why it's not any good.
>
> [1]
> https://www.postgresql.org/message-id/cakjs1f9hmet+7adiceau8heompob5pkkcvyzliezje3dvut...@mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com

Ah, ok, thank you, somehow I missed your answer the first time. So, just
ignore me :)

Best wishes,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-04-21 Thread Tels
Moin,

On Thu, April 20, 2017 8:21 pm, Tomas Vondra wrote:
> On 04/21/2017 12:13 AM, Tom Lane wrote:
>> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
>>> Simon just pointed out that having the WITH clause appear in the middle
>>> of the CREATE STATISTICS command looks odd; apparently somebody else
>>> already complained on list about the same.  Other commands put the WITH
>>> clause at the end, so perhaps we should do likewise in the new command.
>>
>>> Here's a patch to implement that.  I verified that if I change
>>> qualified_name to qualified_name_list, bison does not complain about
>>> conflicts, so this new syntax should support extension to multiple
>>> relations without a problem.
>>
>> Yeah, WITH is fully reserved, so as long as the clause looks like
>> WITH ( stuff... ) you're pretty much gonna be able to drop it
>> wherever you want.
>>
>>> Discuss.
>>
>> +1 for WITH at the end; the existing syntax looks weird to me too.
>>
>
> -1 from me
>
> I like the current syntax more, and  WHERE ... WITH seems a bit weird to
> me. But more importantly, one thing Dean probably considered when
> proposing the current syntax was that we may add support for partial
> statistics, pretty much like partial indexes. And we don't allow WITH at
> the end (after WHERE) for indexes:
>
> test=# create index on t (a) where a < 100 with (fillfactor=10);
> ERROR:  syntax error at or near "with"
> LINE 1: create index on t (a) where a < 100 with (fillfactor=10);
>  ^
> test=# create index on t (a) with (fillfactor=10) where a < 100;

While I'm not sure about where to put the WITH, so to speak, I do favour
consistency.

So I'm inclinded to keep the syntax like for the "create index".

More importantly however, I'd rather see the syntax on the "ON (column)
FROM relname" to be changed to match the existing examples. (Already wrote
this to Simon, not sure if my email made it to the list)

So instead:

 CREATE STATISTICS stats_name ON (columns) FROM relname

I'd rather see:

 CREATE STATISTICS stats_name ON table(col);

as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It
could also be extended to both more columns, expressions or other tables
like so:

 CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b);

and even:

 CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options)
WHERE t2.a > 4;

This looks easy to remember, since it compares to:

 CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4;

Or am I'm missing something?

Regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v25)

2017-04-05 Thread Tels
Moin,

On Wed, April 5, 2017 2:52 pm, Simon Riggs wrote:
> On 5 April 2017 at 10:47, David Rowley <david.row...@2ndquadrant.com>
> wrote:
>
>>> I have some other comments.
>
> Me too.
>
>
> CREATE STATISTICS should take ShareUpdateExclusiveLock like ANALYZE.
>
> This change is in line with other changes in this and earlier
> releases. Comments and docs included.
>
> Patch ready to be applied directly barring objections.

I know I'm a bit late, but isn't the syntax backwards?

"CREATE STATISTICS s1 WITH (dependencies) ON (col_a, col_b) FROM table;"

These do it the other way round:

CREATE INDEX idx ON table (col_a);

AND:

   CREATE TABLE t (
 id INT  REFERENCES table_2 (col_b);
   );

Won't this be confusing and make things hard to remember?

Sorry for not asking earlier, I somehow missed this.

Regard,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-03-12 Thread Tels
Moin,

On Sat, March 11, 2017 11:29 pm, Robert Haas wrote:
> On Fri, Mar 10, 2017 at 6:01 AM, Tels <nospam-pg-ab...@bloodgate.com>
> wrote:
>> Just a question for me to understand the implementation details vs. the
>> strategy:
>>
>> Have you considered how the scheduling decision might impact performance
>> due to "inter-plan parallelism vs. in-plan parallelism"?
>>
>> So what would be the scheduling strategy? And should there be a fixed
>> one
>> or user-influencable? And what could be good ones?
>>
>> A simple example:
>>
>> E.g. if we have 5 subplans, and each can have at most 5 workers and we
>> have 5 workers overall.
>>
>> So, do we:
>>
>>   Assign 5 workers to plan 1. Let it finish.
>>   Then assign 5 workers to plan 2. Let it finish.
>>   and so on
>>
>> or:
>>
>>   Assign 1 workers to each plan until no workers are left?
>
> Currently, we do the first of those, but I'm pretty sure the second is
> way better.  For example, suppose each subplan has a startup cost.  If
> you have all the workers pile on each plan in turn, every worker pays
> the startup cost for every subplan.  If you spread them out, then
> subplans can get finished without being visited by all workers, and
> then the other workers never pay those costs.  Moreover, you reduce
> contention for spinlocks, condition variables, etc.  It's not
> impossible to imagine a scenario where having all workers pile on one
> subplan at a time works out better: for example, suppose you have a
> table with lots of partitions all of which are on the same disk, and
> it's actually one physical spinning disk, not an SSD or a disk array
> or anything, and the query is completely I/O-bound.  Well, it could
> be, in that scenario, that spreading out the workers is going to turn
> sequential I/O into random I/O and that might be terrible.  In most
> cases, though, I think you're going to be better off.  If the
> partitions are on different spindles or if there's some slack I/O
> capacity for prefetching, you're going to come out ahead, maybe way
> ahead.  If you come out behind, then you're evidently totally I/O
> bound and have no capacity for I/O parallelism; in that scenario, you
> should probably just turn parallel query off altogether, because
> you're not going to benefit from it.

I agree with the proposition that both strategies can work well, or not,
depending on system-setup, the tables and data layout. I'd be a bit more
worried about turning it into the "random-io-case", but that's still just
a feeling and guesswork.

So which one will be better seems speculative, hence the question for
benchmarking different strategies.

So, I'd like to see the scheduler be out in a single place, maybe a
function that get's called with the number of currently running workers,
the max. number of workers to be expected, the new worker, the list of
plans still todo, and then schedules that single worker to one of these
plans by strategy X.

That would make it easier to swap out X for Y and see how it fares,
wouldn't it?


However, I don't think the patch needs to select the optimal strategy
right from the beginning (if that even exists, maybe it's a mixed
strategy), even "not so optimal" parallelism will be better than doing all
things sequentially.

Best regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-03-10 Thread Tels
 in-plan parallelism"?

So what would be the scheduling strategy? And should there be a fixed one
or user-influencable? And what could be good ones?

A simple example:

E.g. if we have 5 subplans, and each can have at most 5 workers and we
have 5 workers overall.

So, do we:

  Assign 5 workers to plan 1. Let it finish.
  Then assign 5 workers to plan 2. Let it finish.
  and so on

or:

  Assign 1 workers to each plan until no workers are left?

In the second case you would have 5 plans running in a quasy-sequential
manner, which might be slower than the other way. Or not, that probably
needs some benchmarks?

Likewise, if you have a mix of plans with max workers like:

  Plan A: 1 worker
  Plan B: 2 workers
  Plan C: 3 workers
  Plan D: 1 worker
  Plan E: 4 workers

Would the strategy be:

 * Serve them in first-come-first-served order? (A,B,C,D?) (Would order
here be random due to how the plan's emerge, i.e. could the user re-order
query to get a different order?)
 * Serve them in max-workers order? (A,D,B,C)
 * Serve first all with 1 worker, then fill the rest? (A,D,B,C | A,D,C,B)
 * Serve them by some other metric, e.g. index-only scans first, seq-scans
last? Or a mix of all these?

Excuse me if I just didn't see this from the thread so far. :)

Best regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Upgrading postmaster's log messages about bind/listen errors

2017-03-10 Thread Tels
Moin,

On Thu, March 9, 2017 11:43 pm, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Mar 9, 2017 at 4:01 PM, Joe Conway <m...@joeconway.com> wrote:
>>> On 03/09/2017 12:27 PM, Tom Lane wrote:
>>>> For good measure I also added a DEBUG1 log message reporting
>>>> successful
>>>> binding to a port.  I'm not sure if there's an argument for putting
>>>> this
>>>> out at LOG level (i.e. by default) --- any thoughts about that?
>
>>> +1 for making it LOG instead of DEBUG1
>
>> I would tend to vote against that, because startup is getting
>> gradually chattier and chattier, and I think this isn't likely to be
>> of interest to very many people most of the time.
>
> Yeah, my thought was that if we've gotten along without this for 20 years,
> it's probably not of interest to most people most of the time.
>
> However, if we're measuring this on a scale of usefulness to the average
> DBA, I would argue that it's of more interest than any of these messages
> that currently appear by default:

My 0.02$:

I'd argue that from a security standpoint it is important to log at
startup what addresses the service binds to, just so it is visible,
explicit and logged.

Especially on machines with multiple interfaces to multiple networks it
can be confusing, see ipv6 vs ipv4, or bound interfaces with multiple
hosts and switches.

Granted, there should be firewall rules preventing access, but
misconfigurations, or simple changes can happen and go unnoticed. If later
the postmaster bind address changes, maybe due to an update or human
error,  you got the stars aligned just right for an unauthorized access.

OTOH, that the "logical replication launcher started" isn't really useful
to know to me as a user, I'd rather know when it failed to launch.

Best regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]

2017-03-10 Thread Tels
Hello,

On Wed, March 1, 2017 7:21 pm, Kouhei Kaigai wrote:
>> I've looked at the patch, and as I'm not that familiar with the
>> pg-sourcecode,
>> customs and so on, this isn't a review, more like food for thought and
>> all
>> should be taken with a grain of salt. :)
>>
>> So here are a few questions and remarks:
>>
>> >+   double  limit_tuples = -1.0;
>>
>> Surely the limit cannot be fractional, and must be an integer. So
>> wouldn't
>> it be better the same type as say:
>>
>> >+   if (root->limit_tuples >= 0.0 &&
>>
>> Than you could also compare with ">= 0", not ">= 0.0".
>>
> The above variable represents the "estimated" number of rows at the
> planning stage, not execution time.
> You may be able to see Path structure has "rows" field declared as
> double type. It makes sense to consider stupid paths during planning,
> even if it is eventually rejected. For example, if a cross join with
> two large tables appear during planning, 64bit integer will make overflow
> easily.

Hm, ok. Not related to your patch, just curious: Is there a mechanism in
place that automatically rejects plans where the limit would overflow the
double to uint64 conversation? Or is this more of a "there would be
hopefully a plan with a better limit so we do not use the bad one"?

Would it possible to force a plan where such overflow would occur?

>> And this comment might be better "were we already called?"
>>
>> >+   boolrs_started; /* are we already
>> called? */
>>
> Other variables in ResultState uses present form, like:
>
> +   boolrs_started; /* are we already called? */
> boolrs_done;/* are we done? */
> boolrs_checkqual;   /* do we need to check the qual? */
>  } ResultState;

Yes, I noted that, but still "are" and "called" and "already" don't read
well together for me:

  are - present form
  called - past form like "were we called?", or "are we called bob?" an
ongoing process
  already - it has started

So "are we already called" reads like someone is waiting for being called.

Maybe to mirror the comment on "rs_done":

  /* have we started yet? */

Also, maybe it's easier for the comment to describe what is happening in
the code because of the flag, not just to the flag itself:

  /* To do things once when we are called */

Anyway, it is a minor point and don't let me distract you from your work,
I do like the feature and the patch :)

Best wishes,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning optimization for large amount of partitions

2017-03-09 Thread Tels
Hi Aleksander,

noticed this in your patch:

> * Add a corespinding entry to pgStatTabHash.

"corresponding"

Also a question: Some one-line comments are

 /* Comment. */

while others are

 /*
  * Comment.
  */

Why the difference?

Hope this helps,

Tels

PS: Sorry if this appears twice, I fumbled the From: ...



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning optimization for large amount of partitions

2017-03-09 Thread Tels
Hi Aleksander,

noticed this in your patch:

> * Add a corespinding entry to pgStatTabHash.

"corresponding"

Also a question: Some one-line comments are

 /* Comment. */

while others are

 /*
  * Comment.
  */

Why the difference?

Hope this helps,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-03-09 Thread Tels
Hello,

On Thu, March 9, 2017 9:04 am, Alexander Korotkov wrote:
> On Wed, Feb 1, 2017 at 2:43 PM, Emre Hasegeli <e...@hasegeli.com> wrote:
>
>> > I think this patch is already in a good shape.
>>
>> I am sorry for introducing this bug.  This fix looks good to me as well.
>
>
> I checked this patch too.  And it seems good to me as well.
> Should we mark it as "ready for committer"?

I can't comment on the code, but the grammar on the comments caught my eye:

> +/* Can any range from range_box does not extend higher than this
argument? */
>
>+static bool
>+overLower2D(RangeBox *range_box, Range *query)
>+{
>+  return FPle(range_box->left.low, query->high) &&
>+  FPle(range_box->right.low, query->high);
>+}

The sentence sounds quite garbled in English. I'm not entirely sure what
it should be, but given the comment below "/* Can any range from range_box
to be higher than this argument? */" maybe something like:

/* Does any range from range_box extend to the right side of the query? */

If used, an analog wording should be used for overHigher2D's comment like:

/* Does any range from range_box extend to the left side of the query? */

Also:

/* Can any range from range_box to be higher than this argument? */

should be:

/* Can any range from range_box be higher than this argument? */

Another question: Does it make sense to add the "minimal bad example for
the '&<' case" as test case, too? After all, it should pass the test after
the patch.

Bets regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]

2017-03-01 Thread Tels
Hello all,

as this is my first mail to pgsql-hackers, please be gentle :)

I've looked at the patch, and as I'm not that familiar with the
pg-sourcecode, customs and so on, this isn't a review, more like food for
thought and all should be taken with a grain of salt. :)

So here are a few questions and remarks:

>+  double  limit_tuples = -1.0;

Surely the limit cannot be fractional, and must be an integer. So wouldn't
it be better the same type as say:

>+  if (root->limit_tuples >= 0.0 &&

Than you could also compare with ">= 0", not ">= 0.0".

node->ss.ps.ps_numTuples is f.i. an uint64.

Or is there a specific reason the limit must be a double?

And finally:

>+  if (node->ss.ps.ps_numTuples > 0)

>+  appendStringInfo(, " LIMIT %ld", node->ss.ps.ps_numTuples);

vs.

>+  appendStringInfo(, "%s LIMIT %lu",
>+   sql, 
>node->ss.ps.ps_numTuples);

It seems odd to have two different format strings here for the same variable.

A few comments miss "." at the end, like these:

>+   * Also, pass down the required number of tuples

>+   * Pass down the number of required tuples by the upper node

And this comment might be better "were we already called?"

>+  bool    rs_started; /* are we already called? */

Hope this helps, and thank you for working on this issue.

Regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers