Re: [HACKERS] [bug fix] pg_ctl fails with config-only directory

2013-12-06 Thread Amit Kapila
On Thu, Dec 5, 2013 at 6:30 PM, MauMau  wrote:
> From: "Amit Kapila" 
>>
>> On Wed, Dec 4, 2013 at 7:57 PM, MauMau  wrote:
>>>
>>
>> Approach-2 has been discussed previously to resolve it and it doesn't seem
>> to be
>> a good way to handle it. Please refer link:
>> http://www.postgresql.org/message-id/1339601668-sup-4...@alvh.no-ip.org
>>
>> You can go through that mail chain and see if there can be a better
>> solution than Approach-2.
>
>
> Thanks for the info.  I understand your feeling, but we need to be
> practical.  I believe we should not leave a bug and inconvenience by
> worrying about theory too much.  In addition to the config-only directory,
> the DBA with admin privs will naturally want to run "postgres -C" and
> "postgres --describe-config", because they are useful and so described in
> the manual.  I don't see any (at least big) risk in allowing postgres
> -C/--describe-config to run with admin privs.

Today, I again went through all the discussion that happened at
that time related to this problem, and I found that later in the
discussion something along the lines of your Approach-2 was
discussed; please see the link:
http://www.postgresql.org/message-id/503a879c.6070...@dunslane.net

> In addition, recent Windows
> versions help to secure the system by revoking admin privs with UAC, don't
> they?  Disabling UAC is not recommended.
>
> I couldn't find a way to let postgres delete its token groups from its own
> primary access token.  There doesn't seem to be a reasonably clean and good
> way.

Wouldn't another way to resolve this problem be to reinvoke pg_ctl in
non-restricted mode for the case in question?

> So I had to choose approach 2.  Please find attached the patch.  This simple
> and not-complex change worked well.  I'd like to add this to 2014-1
> commitfest this weekend unless a better approach is proposed.

I think it is important to resolve this problem, so please go ahead and
add this patch to the next CF.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-06 Thread Fabien COELHO



http://en.wikipedia.org/wiki/Cluster_sampling
http://en.wikipedia.org/wiki/Multistage_sampling

I suspect the hard part will be characterising the nature of the
non-uniformity in the sample generated by taking a whole block. Some
of it may come from how the rows were loaded (e.g. older rows were
loaded by pg_restore but newer rows were inserted retail) or from the
way Postgres works (e.g. hotter rows are on blocks with fewer rows in
them and colder rows are more densely packed).


I would have thought that, as VACUUM reclaims space, it evens out that issue
in the long run and on average, so that it could simply be ignored?
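The cluster/multistage sampling idea under discussion can be sketched as follows. This is an illustrative Python toy, not ANALYZE's actual implementation: stage 1 takes a random sample of whole blocks, stage 2 samples rows within each chosen block; the uneven block fill mimics the hot/cold packing skew mentioned above.

```python
import random

def two_stage_sample(table, n_blocks, rows_per_block, rng=random):
    """table: list of blocks, each block a list of rows."""
    # Stage 1: simple random sample of blocks (the clusters).
    chosen = rng.sample(range(len(table)), n_blocks)
    sample = []
    for b in chosen:
        block = table[b]
        # Stage 2: sample rows within the block; a block may hold
        # fewer rows than requested.
        k = min(rows_per_block, len(block))
        sample.extend(rng.sample(block, k))
    return sample

# Toy table: 100 blocks with varying fill factors, so densely and
# sparsely packed blocks are both represented.
rng = random.Random(42)
table = [[(b, i) for i in range(rng.randint(20, 80))] for b in range(100)]
sample = two_stage_sample(table, n_blocks=10, rows_per_block=5, rng=rng)
print(len(sample))  # 50
```

The non-uniformity concern raised above shows up here directly: rows in sparsely filled blocks have a higher inclusion probability than rows in densely packed ones, which is exactly what a real estimator would have to correct for.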



I've felt for a long time that Postgres would make an excellent test
bed for some aspiring statistics research group.


I would say "applied statistics" rather than "research". Nevertheless I 
can ask my research statistician colleagues next door about their opinion 
on this sampling question.


--
Fabien.




Re: [HACKERS] RFC: programmable file format for postgresql.conf

2013-12-06 Thread Álvaro Hernández Tortosa


On 06/12/2013 22:59, Peter Eisentraut wrote:

On 12/6/13, 12:29 PM, Álvaro Hernández Tortosa wrote:

 What I've been trying to do is summarize what has already been
discussed here and propose a solution. You say that "you can already do
those things", but that's not what I have read here. Greg Smith (cc'ed
as I'm quoting you) was explaining this in [1]:

"Right now, writing such a tool in a generic way gets so bogged down
just in parsing/manipulating the postgresql.conf file that it's hard to
focus on actually doing the tuning part."

That was in 2008.  I don't think that stance is accurate anymore.


Just for me to learn about this: why is it not accurate anymore?

Thanks for your patience! :)

aht




Re: [HACKERS] RFC: programmable file format for postgresql.conf

2013-12-06 Thread Álvaro Hernández Tortosa


On 06/12/2013 19:11, David Johnston wrote:

Álvaro Hernández Tortosa wrote

   Note that you are not required to maintain your configuration data in a
postgresql.conf-formatted file.  You can keep it anywhere you like, GUI
around in it, and convert it back to the required format.  Most of the

I think it is not a very good idea to encourage GUI tools or tools that
auto-configure postgres to use a separate configuration file and then
convert it to postgresql.conf. That introduces duplication, with nasty
problems if either source of data is modified out-of-the-expected-way.

That's why I'm suggesting a config file that is, at the same time,
usable by both postgres and other external tools. That also enables
other features such as editing the config file persistently through a
SQL session.

For my money I'd rather have a single file and/or directory-structure where
raw configuration settings are saved in the current 'key = value' format
with simple comments allowed and ignored by PostgreSQL.  And being simple
key-value, the risk of "out-of-the-expected-way" changes would be minimal.
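For illustration, the current 'key = value' format really is mechanically simple to handle. A minimal sketch of a reader (simplifying assumptions: no include directives, no quote escaping, '#' never appears inside a quoted value):

```python
def parse_conf(text):
    """Parse postgresql.conf-style 'key = value' text into a dict.

    Simplified sketch: ignores include directives and quote escaping.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition('=')
        if not sep:
            # postgresql.conf also allows 'key value' with no '='
            key, sep, value = line.partition(' ')
        settings[key.strip()] = value.strip().strip("'")
    return settings

conf = """
# memory settings
shared_buffers = '128MB'   # tuned by hand
max_connections = 100
"""
print(parse_conf(conf))
# {'shared_buffers': '128MB', 'max_connections': '100'}
```

A tool that round-trips edits would of course also need to preserve comments and ordering, which is where most of the real complexity lives.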


What I meant by "out-of-the-expected-way" is that if you edit 
postgresql.conf directly rather than through a tool (assuming you're 
regularly using the tool), then those changes may get lost when you use 
the tool again. In other words, there's potentially "duplicated 
information", and we all know that it's not desirable.

If we want to put a separate "configuration meta-data" file out there to
basically provide a database from which third-party tools can pull out this
information then great.  I would not incorporate that same information into
the main PostgreSQL configuration file/directory-structure.  The biggest
advantage is that the meta-data database can be readily modified without any
concern regarding such changes impacting running systems upon update.  Then,
tools simply need to import "two" files instead of one, link together the
meta-data key with the configuration key, and do whatever they were going to
do anyway.
Although I think it's not ideal to have two separate, both editable,
files for configuring postgresql, if:


- Both would be included in the official distribution, one alongside the 
other one
- A tool for converting the new one into the current postgresql.conf is 
included also with the distribution, say bin/pgconfiguration or whatever


then I'd agree that it could be a great first step toward both adding
support for external tooling for configuring postgres and providing new
users with a lot more help if they don't use any other tool.


Of course, other tools could be completely external to the 
postgresql distribution, but not the "alternate" configuration file and 
the pgconfiguration program.


Would this be a good thing to do then?


If indeed that target audience is going to be novices then a static
text-based document is not going to be the most desirable interface to
present.  At worst we should simply include a comment-link at the top of the
document to a web-page where an interactive tool for configuration file
creation would exist.  That tool, at the end of the process, could provide
the user with text to copy-paste/save into a specified area on the server so
the customizations made would override the installed defaults.


I think both could be used a lot: editing a rich configuration
file directly, or using a GUI tool. I'm trying to suggest supporting
both.


Regards,

aht





Re: [HACKERS] Recovery to backup point

2013-12-06 Thread Michael Paquier
On Sat, Dec 7, 2013 at 9:06 AM, MauMau  wrote:
> It seems that everyone welcomed the following functionality, and I also want
> it to solve some problems.  But this doesn't appear to have been undertaken.
Indeed, nobody has really showed up to implement that.

>
> Recovery target 'immediate'
> http://www.postgresql.org/message-id/51703751.2020...@vmware.com
> Is there any technical difficulty?
As far as I recall, I don't think so. The problem and the way to solve
that are clear. The only trick is to be sure that recovery is done
just until a consistent point is reached, and to implement that
cleanly.
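For reference, a hypothetical sketch of what such a recovery target might look like in recovery.conf (names and syntax illustrative only; this thread predates any actual implementation):

```
# Hypothetical recovery.conf sketch: stop recovery as soon as a
# consistent state is reached, rather than replaying to end of WAL.
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target = 'immediate'
```

The key behavior being proposed is exactly the "trick" described above: recovery ends at the first consistent point rather than continuing through all available WAL.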

> May I implement this feature and submit a patch for the next commitfest if I 
> have time?
Please feel free. I might as well participate in the review.
Regards,
-- 
Michael




Re: [HACKERS] Performance optimization of btree binary search

2013-12-06 Thread Peter Geoghegan
On Thu, Dec 5, 2013 at 2:19 AM, Peter Geoghegan  wrote:
> I did a relatively short variant 'insert' pgbench-tools benchmark,
> with a serial primary index created on the pgbench_history table.
> Results:
>
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/insert/

I saw something today that made me lose confidence in the results I've
presented. I was unsuccessful in re-creating similar 'select' numbers
at scale 15 on the same server [1] (originally I wanted to see how
eliding the fmgr trampoline worked out with binary searching only).
Then, when re-testing master as of commit
8e18d04d4daf34b8a557e2dc553a7754b255cd9a (my feature branches branched
off from that master commit), I noticed that even after the last
pgbench had run, a single postgres process was stuck at 100% CPU usage
for upwards of a minute (the run referred to was for 32 clients, and I
only saw that one process in 'top' output).

To me this points to either 1) some kind of contamination or 2) a bug
in Postgres. My first guess is that it's the issue described here [2]
and elsewhere (maybe whether or not that behavior constitutes a bug is
controversial, but I think it does).

Consider the contrast between these two runs (against master, 2
clients) of the same, new benchmark:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/new-select/50/index.html

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/new-select/44/index.html

I had considered that something like Intel Speedstep technology had a
role here, but I'm pretty sure it steps up very aggressively when
things are CPU bound - I tested that against a Core 2 Duo desktop a
couple of years back, where it was easy to immediately provoke it by
moving around desktop windows or something. I know that Greg Smith
controls for that by disabling it, but I have not done so here - I
assumed that he only did so because he was being particularly
cautious. There are similar runs on my original 'results' benchmark
(these particular should-be-comparable-but-are-not runs are with 1
client against my patch this time):

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/results/297/index.html

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/results/303/index.html

References:

[1] http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/new-select/

[2] 
http://www.postgresql.org/message-id/CAFj8pRDDa40eiP4UTrCm=+bdt0xbwf7qc8t_3y0dfqyuzk2...@mail.gmail.com

-- 
Peter Geoghegan




[HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread David Johnston
MauMau wrote
> From: "Tom Lane" <

> tgl@.pa

> >
>> There is no enthusiasm for a quick-hack solution here, and most people
>> don't actually agree with your proposal that these errors should never
>> get logged.  So no, that is not happening.  You can hack your local
>> copy that way if you like of course, but it's not getting committed.
> 
> Oh, I may have misunderstood your previous comments.  I got the impression 
> that you and others regard those messages (except "too many clients") as 
> unnecessary in server log.
> 
> 1. FATAL:  the database system is starting up

How about altering the message to tone down the severity by a half-step...

FATAL: (probably) not! - the database system is starting up

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5782236.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




[HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread David Johnston
David Johnston wrote
> 
> MauMau wrote
>> From: "Tom Lane" <

>> tgl@.pa

>> >
>>> There is no enthusiasm for a quick-hack solution here, and most people
>>> don't actually agree with your proposal that these errors should never
>>> get logged.  So no, that is not happening.  You can hack your local
>>> copy that way if you like of course, but it's not getting committed.
>> 
>> Oh, I may have misunderstood your previous comments.  I got the
>> impression 
>> that you and others regard those messages (except "too many clients") as 
>> unnecessary in server log.
>> 
>> 1. FATAL:  the database system is starting up
>> 2. FATAL:  the database system is shutting down
>> 3. FATAL:  the database system is in recovery mode
>> 
>> 5. FATAL:  terminating walreceiver process due to administrator command
>> 6. FATAL:  terminating background worker \"%s\" due to administrator
>> command
>> 
>> Could you tell me why these are necessary in server log?  I guess like
>> this. 
>> Am I correct?
>> 
>> * #1 through #3 are necessary for the DBA to investigate and explain to
>> the 
>> end user why he cannot connect to the database.
>> 
>> * #5 and #6 are unnecessary for the DBA.  I can't find any reason why 
>> these are useful for the DBA.
> For me 1-3 are unusual events in production situations and so knowing when
> they occur, and confirming they occurred for a good reason, is a key job
> of the DBA.
> 
> 5 and 6: I don't fully understand when they would happen, but they likely
> fall under the same principle: "the DBA should know what is going on with
> their server and confirm any startup/shutdown activity it is involved with".
> 
> They might be better categorized at "NOTICE" level if they were in response
> to an administrator action, versus in response to a crashed process; but
> even for the user-initiated situation, making sure they hit the log, even
> using FATAL, is totally understandable and IMO desirable.
> 
> I'd ask in what situations are these messages occurring so frequently that
> they are becoming noise instead of useful data?  Sorry if I missed your
> use-case explanation up-thread.
> 
> David J.

Went and scanned the thread:

PITR/Failover is not generally that frequent an occurrence, but I will agree
that these events are likely common during it.

Maybe PITR/Failover mode can output something in the logs to alleviate user
angst about these frequent events?  I doubt that a totally separate
mechanism can be used for this "mode", but instead of looking for things to
remove, how about adding some additional coddling to the logs at the
beginning and end of the mode change?

Thought-provoking only, as I have not actually been a user of said feature.

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5782235.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




[HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread David Johnston
MauMau wrote
> From: "Tom Lane" <

> tgl@.pa

> >
>> There is no enthusiasm for a quick-hack solution here, and most people
>> don't actually agree with your proposal that these errors should never
>> get logged.  So no, that is not happening.  You can hack your local
>> copy that way if you like of course, but it's not getting committed.
> 
> Oh, I may have misunderstood your previous comments.  I got the impression 
> that you and others regard those messages (except "too many clients") as 
> unnecessary in server log.
> 
> 1. FATAL:  the database system is starting up
> 2. FATAL:  the database system is shutting down
> 3. FATAL:  the database system is in recovery mode
> 
> 5. FATAL:  terminating walreceiver process due to administrator command
> 6. FATAL:  terminating background worker \"%s\" due to administrator
> command
> 
> Could you tell me why these are necessary in server log?  I guess like
> this. 
> Am I correct?
> 
> * #1 through #3 are necessary for the DBA to investigate and explain to
> the 
> end user why he cannot connect to the database.
> 
> * #5 and #6 are unnecessary for the DBA.  I can't find any reason why 
> these are useful for the DBA.

For me 1-3 are unusual events in production situations and so knowing when
they occur, and confirming they occurred for a good reason, is a key job of
the DBA.

5 and 6: I don't fully understand when they would happen, but they likely
fall under the same principle: "the DBA should know what is going on with
their server and confirm any startup/shutdown activity it is involved with".

They might be better categorized at "NOTICE" level if they were in response
to an administrator action, versus in response to a crashed process; but even
for the user-initiated situation, making sure they hit the log, even using
FATAL, is totally understandable and IMO desirable.

I'd ask in what situations are these messages occurring so frequently that
they are becoming noise instead of useful data?  Sorry if I missed your
use-case explanation up-thread.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5782234.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




[HACKERS] Recovery to backup point

2013-12-06 Thread MauMau

Hello,

It seems that everyone welcomed the following functionality, and I also want 
it to solve some problems.  But this doesn't appear to have been undertaken.


Recovery target 'immediate'
http://www.postgresql.org/message-id/51703751.2020...@vmware.com

Is there any technical difficulty?  May I implement this feature and submit 
a patch for the next commitfest if I have time?


Regards
MauMau





Re: [HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread MauMau

From: "Tom Lane" 

There is no enthusiasm for a quick-hack solution here, and most people
don't actually agree with your proposal that these errors should never
get logged.  So no, that is not happening.  You can hack your local
copy that way if you like of course, but it's not getting committed.


Oh, I may have misunderstood your previous comments.  I got the impression 
that you and others regard those messages (except "too many clients") as 
unnecessary in server log.


1. FATAL:  the database system is starting up
2. FATAL:  the database system is shutting down
3. FATAL:  the database system is in recovery mode

5. FATAL:  terminating walreceiver process due to administrator command
6. FATAL:  terminating background worker \"%s\" due to administrator command

Could you tell me why these are necessary in the server log?  My guesses are 
as follows.  Am I correct?


* #1 through #3 are necessary for the DBA to investigate and explain to the 
end user why he cannot connect to the database.


* #5 and #6 are unnecessary for the DBA.  I can't find any reason why 
these are useful for the DBA.


Regards
MauMau





Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Not wanting to consider the sort args when there's more than one
 Tom> doesn't square with forcing them to be considered when there's
 Tom> just one.  It's the same aggregate after all,

This logic is only applied in the patch to aggregates that _aren't_
hypothetical.

(thinking out loud:) It might be more consistent to also add the
condition that the single sort column not be a variadic arg. And/or
the condition that it be the same type as the result. Or have a flag
in pg_aggregate to say "this agg returns one of its sorted input
values, so preserve the collation".

 >> Consider a construct like:

 >> select max(common_val)
 >> from (select mode() within group (order by textcol) as common_val
 >> from ... group by othercol) s;

 Tom> AFAICT none of the SQL-spec aggregates expose the kind of case
 Tom> I'm worried about, because none of the ones that can take
 Tom> multiple sort columns have a potentially collatable return type.

None of the spec's ordered-set functions expose any collation issue at
all, because they _all_ have non-collatable return types, period.

The problem only arises from the desire to make functions like
percentile_disc and mode applicable to collatable types.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] writable FDWs / update targets confusion

2013-12-06 Thread Tom Lane
Tomas Vondra  writes:
> I think that we should make the documentation more explicit about this
> limitation, because the current wording in fdw-callbacks documentation
> seems to suggest it's possible to add such hidden columns. At least
> that's how I read it before running into this.

You can add hidden columns if you've got 'em ;-).  What's missing
is the ability to create any hidden columns other than the ones in
standard PG tables.  What we most likely need is the ability for
an FDW to override the type assigned to the CTID column at foreign
table creation.  (We'd then also need to think about where such a
column could be shoehorned into a tuple, but the catalog support
has to come first.)  Alternatively, it might work to append "junk" columns
in the user column numbering domain, which would only exist in runtime
tuple descriptors and not in the catalogs.

regards, tom lane




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Meh.  I don't think you can have that and also have the behavior
>  Tom> that multiple ORDER BY items aren't constrained to have the same
>  Tom> collation; at least not without some rule that amounts to a
>  Tom> special case for percentile_disc, which I'd resist.

> What the submitted patch does (as discussed in the comment in
> parse_collate) is to treat the sort argument as contributing to the
> collation only if there is exactly one sort arg.

Blech :-(

Not wanting to consider the sort args when there's more than one doesn't
square with forcing them to be considered when there's just one.
It's the same aggregate after all, and in general the semantics ought
to be the same whether you give one sort col or three.

We might be able to make this work sanely by considering only argument
positions that were declared something other than ANY (anyelement being
other than that, so this isn't leaving polymorphics in the cold entirely).
This is a bit unlike the normal rules for collation assignment but
it doesn't result in weird semantics changes depending on how many sort
columns you supply.

> Consider a construct like:

> select max(common_val)
>   from (select mode() within group (order by textcol) as common_val
>   from ... group by othercol) s;

AFAICT none of the SQL-spec aggregates expose the kind of case I'm worried
about, because none of the ones that can take multiple sort columns have a
potentially collatable return type.  I don't think that justifies not
thinking ahead, though.

regards, tom lane




Re: [HACKERS] writable FDWs / update targets confusion

2013-12-06 Thread Tomas Vondra
On 18.11.2013 09:28, Albe Laurenz wrote:
> Tom Lane wrote:
>>> Tom, could you show us a rope if there is one?
>>
>> What is it you actually need to fetch?
>>
>> IIRC, the idea was that most FDWs would do the equivalent of fetching the
>> primary-key columns to use in an update.  If that's what you need, then
>> AddForeignUpdateTargets should identify those columns and generate Vars
>> for them.  postgres_fdw is probably not a good model since it's using
>> ctid (a non-portable thing) and piggybacking on the existence of a tuple
>> header field to put that in.
>>
>> If you're dealing with some sort of hidden tuple identity column that
>> works like CTID but doesn't fit in six bytes, there may not be any good
>> solution in the current state of the FDW support.  As I mentioned, we'd
>> batted around the idea of letting FDWs define a system column with some
>> other datatype besides TID, but we'd not figured out all the nitty
>> gritty details in time for 9.3.
> 
> I was hoping for the latter (a hidden column).
> 
> But I'll have to settle for primary keys, which is also ok.

I was hoping for the latter too, and I can't really switch to primary
key columns. I can live with 6B passed in the ctid field for now, but
it'd be nice to be able to use at least 8B.

I think that we should make the documentation more explicit about this
limitation, because the current wording in fdw-callbacks documentation
seems to suggest it's possible to add such hidden columns. At least
that's how I read it before running into this.

regards
Tomas




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> 2. For an ordered set function, n must equal aggnfixedargs.  We
 Tom> treat all n fixed arguments as contributing to the aggregate's
 Tom> result collation, but ignore the sort arguments.

 >> That doesn't work for getting a sensible collation out of
 >> percentile_disc applied to a collatable type. (Which admittedly is
 >> an extension to the spec, which allows only numeric and interval,
 >> but it seems to me to be worth having.)

 Tom> Meh.  I don't think you can have that and also have the behavior
 Tom> that multiple ORDER BY items aren't constrained to have the same
 Tom> collation; at least not without some rule that amounts to a
 Tom> special case for percentile_disc, which I'd resist.

What the submitted patch does (as discussed in the comment in
parse_collate) is to treat the sort argument as contributing to the
collation only if there is exactly one sort arg.

Consider a construct like:

select max(common_val)
  from (select mode() within group (order by textcol) as common_val
  from ... group by othercol) s;

(the same arguments for percentile_disc also apply to mode() and to
any other ordered set function that returns a value chosen from its
input sorted set)

Having this sort of thing not preserve the collation of textcol (or
fail) would be, IMO, surprising and undesirable.
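The property being relied on here, that these aggregates always return one of their sorted input values (so input attributes like collation naturally carry through), can be sketched in illustrative Python. This is not the server implementation, just the SQL-level semantics of mode() and percentile_disc:

```python
import math
from collections import Counter

def mode(values):
    """Most frequent input value, ties broken by taking the smallest
    in sort order -- the result is always one of the inputs."""
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)

def percentile_disc(fraction, values):
    """Discrete percentile: the first value in sort order whose
    cumulative position reaches the fraction -- again, always an
    actual input value, never an interpolation."""
    ordered = sorted(values)
    idx = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[idx]

print(mode(['b', 'a', 'b', 'c']))              # 'b'
print(percentile_disc(0.5, [10, 20, 30, 40]))  # 20
```

Because the return value is literally one of the inputs, discarding the sort column's collation would mean returning a text value stripped of the collation it arrived with, which is the surprising behavior argued against above.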

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth  writes:
> The patch as submitted answers those questions as follows:

> CREATE AGGREGATE func(integer) WITHIN GROUP (text) ...

You've glossed over a significant amount of complexity, as shown by your
example that prints WITHIN GROUP (*), a syntax that you've not defined
here.

In the long run we might think it worthwhile to actually store two
separate arglists for ordered-set aggregates; probably, pg_proc.proargs
would just describe the direct arguments and there'd be a second oidvector
in pg_aggregate that would describe the ORDER BY arguments.  This'd let
them be independently VARIADIC, or not.  I'm not proposing we do that
right now, because we don't have any use-cases that aren't sufficiently
handled by the hack of letting a single VARIADIC ANY entry cover both sets
of arguments.  I'd like though that the external syntax not be something
that prevents that from ever happening, and I'm afraid that this (*)
business is cheating enough to be a problem.

regards, tom lane




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
> "Josh" == Josh Berkus  writes:

 >> Since I don't particularly trust my own judgement on aesthetics, I
 >> used the feedback I got from others when deciding what to
 >> do. Frankly I think this one needs wider input than just you and
 >> me arguing over it.

 Josh> Can someone paste examples of the two syntax alternatives we're
 Josh> talking about here?  I've lost track.

OK. The starting point is that this is the calling syntax for ordered
set funcs, set by the spec:

SELECT func(foo) WITHIN GROUP (ORDER BY bar) FROM ...

where "foo" and "bar" might be fixed types, or polymorphic or variadic.

So we need to define (with no help from the spec) at least these:

  - what syntax is used in CREATE AGGREGATE to specify the number and
types of parameters for a newly defined "func"

  - what syntax is used to refer to the function in a
GRANT ... ON FUNCTION ...  statement, or ALTER ... OWNER TO ...

  - what should ::regprocedure casts accept as input and produce as
output

  - what output is shown in \df and \da when listing the function's
argument types

The patch as submitted answers those questions as follows:

CREATE AGGREGATE func(integer) WITHIN GROUP (text) ...

GRANT ... ON FUNCTION func(integer,text) ...
(there is no GRANT ... ON AGGREGATE ... (yet))

ALTER AGGREGATE func(integer) WITHIN GROUP (text) OWNER TO ...
ALTER AGGREGATE func(integer,text) OWNER TO ...
ALTER FUNCTION func(integer,text) OWNER TO ...
(all three of those are equivalent)

regprocedure outputs 'func(integer,text)' and accepts only that as
input

postgres=# \df func
 List of functions
 Schema | Name | Result data type | Argument data types   | Type 
+--+--+---+--
 public | func | text | (integer) WITHIN GROUP (text) | agg
(1 row)

If there's also a func() that isn't an ordered set function, you get
output like this (which provoked Tom's "schizophrenic" comment):

postgres=# \df ran[a-z]{1,5}
                                     List of functions
   Schema   |   Name   | Result data type |        Argument data types        |  Type  
------------+----------+------------------+-----------------------------------+--------
 pg_catalog | random   | double precision |                                   | normal
 pg_catalog | rangesel | double precision | internal, oid, internal, integer  | normal
 pg_catalog | rank     | bigint           |                                   | window
 pg_catalog | rank     | bigint           | (VARIADIC "any") WITHIN GROUP (*) | agg
(4 rows)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: programmable file format for postgresql.conf

2013-12-06 Thread Peter Eisentraut
On 12/6/13, 12:29 PM, Álvaro Hernández Tortosa wrote:
> What I've been trying to do is summarize what has already been
> discussed here and propose a solution. You say that "you can already do
> those things", but that's not what I have read here. Greg Smith (cc'ed
> as I'm quoting you) was explaining this in [1]:
> 
> "Right now, writing such a tool in a generic way gets so bogged down
> just in parsing/manipulating the postgresql.conf file that it's hard to
> focus on actually doing the tuning part."

That was in 2008.  I don't think that stance is accurate anymore.




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Josh Berkus  writes:
> Can someone paste examples of the two syntax alternatives we're talking
> about here?  I've lost track.

I'll leave it to Andrew to describe/defend exactly what his patch is
doing.  The alternative I'm thinking about is that in CREATE AGGREGATE
as well as \da output, the argument list of an ordered-set aggregate
would look like

  type1, type2, ... ORDER BY type3, type4, ...

if the aggregate only permits a fixed number of ordering columns
(as mode() does for example).  If it permits a variable number of
ordering columns, you could have

  type1, type2, ... ORDER BY [ type3, type4, ... ] VARIADIC something

99% of the time the right-hand part would just be "VARIADIC ANY"
but if an aggregate had need to lock down the ordering column types,
or the leading ordering column types, it could do that.  If it needs
a variable number of direct arguments as well (which in particular
hypothetical set functions do) then you would write

  [ type1, type2, ... ] VARIADIC something ORDER BY VARIADIC something

where we constrain the two "somethings" to be the same.  (Again, these
would *usually* be ANY, but I can envision that an aggregate might want to
constrain the argument types more than that.)  We have to constrain this
case because the underlying pg_proc representation and parser function
lookup code only allow the last argument to be declared VARIADIC.  So
under the hood, this last case would be represented in pg_proc with
proargs looking like just "[ type1, type2, ... ] VARIADIC something",
whereas in the first two cases the proargs representation would contain
all the same entries as above.
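Under this proposal, declarations might look roughly like the following (purely illustrative sketches of the proposed argument-list shape; the aggregate names are invented, the bodies are elided, and this syntax was only a proposal at this point):

```sql
-- Fixed direct argument, variable ordering columns:
CREATE AGGREGATE my_percentile(float8 ORDER BY VARIADIC "any") ( ... );

-- Hypothetical-set style: variadic direct arguments mirrored by the
-- ordering columns, with the two VARIADIC "somethings" constrained
-- to be the same:
CREATE AGGREGATE my_rank(VARIADIC "any" ORDER BY VARIADIC "any") ( ... );
```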

We could substitute something else for ORDER BY without doing a lot
of violence to this, if people are really down on that aspect.

regards, tom lane




Re: [HACKERS] Wildcard usage enhancements in .pgpass

2013-12-06 Thread Peter Eisentraut
On 11/17/13, 1:56 PM, Martijn van Oosterhout wrote:
> Looks interesting, though I wonder if you could use fnmatch(3) here. Or
> would that match more than you expect?  For example, it would allow
> 'foo*bar' to match 'foo.bar' which your code doesn't.

The question is whether you'd want that.

We had previously considered using fnmatch() for wildcard matching of
host names in certificates and ended up deciding against that.  It would
be worth checking that old thread.  It would also be good if the pgpass
wildcard matching were somehow consistent with certificates, since they
are both somewhat related security features.
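The behavior Martijn describes is easy to check with Python's fnmatch module, which follows the same glob rules as fnmatch(3) (a quick standalone illustration, not part of any patch):

```python
from fnmatch import fnmatchcase

# In fnmatch-style globbing, '*' matches any run of characters,
# including '.', so a pattern like 'foo*bar' also matches 'foo.bar' --
# broader than a matcher that gives '.' special treatment.
print(fnmatchcase("foo.bar", "foo*bar"))  # True
print(fnmatchcase("foosbar", "foo*bar"))  # True
print(fnmatchcase("foo.baz", "foo*bar"))  # False
```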




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Another thing to think about here is to wonder why the committee chose
 Tom> anything as verbose as "agg(...) WITHIN GROUP (ORDER BY ...)" in the
 Tom> first place.  The words ORDER BY certainly seem pretty unnecessary.

All of the ordered-set functions that the spec defines are intimately tied
to ordering of values, and they allow things like DESC ordering to affect
the results, so it seems obvious to me that they used (ORDER BY ...) because
what follows is both syntactically and semantically similar to any other use
of ORDER BY. (Similar logic seems to have been used for "FILTER (WHERE ...)".)
(The name "ordered set function" also seems quite specific.)

So I don't think there's any greater chance of the spec people adding
a WITHIN GROUP (something_other_than_order_by) construct than of them
adding any other awkward piece of syntax - and if they did, it would
represent an entirely new class of aggregate functions, separate from
ordered set functions and no doubt with its own headaches.

(I realize that expecting sanity from the spec writers is perhaps unwise)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Josh Berkus
On 12/06/2013 01:30 PM, Andrew Gierth wrote:
> Since I don't particularly trust my own judgement on aesthetics, I used
> the feedback I got from others when deciding what to do. Frankly I think
> this one needs wider input than just you and me arguing over it.

Can someone paste examples of the two syntax alternatives we're talking
about here?  I've lost track.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-06 Thread Peter Geoghegan
On Fri, Dec 6, 2013 at 12:24 PM, Tom Lane  wrote:
>> There seems to be no problem even if we use bigint as the type of
>> unsigned 32-bit integer like queryid. For example, txid_current()
>> returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
>> Could you tell me what the problem is when using bigint for queryid?
>
> We're talking about the output of some view, right, not internal storage?
> +1 for using bigint for that.  Using OID is definitely an abuse, because
> the value *isn't* an OID.  And besides, what if we someday decide we need
> 64-bit keys not 32-bit?

Fair enough. I was concerned about the cost of external storage of
64-bit integers (unlike query text, they might have to be stored many
times for many distinct intervals or something like that), but in
hindsight that was fairly miserly of me.

Attached revision displays signed 64-bit integers instead.

-- 
Peter Geoghegan
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
new file mode 100644
index e8aed61..95a2767
*** a/contrib/pg_stat_statements/Makefile
--- b/contrib/pg_stat_statements/Makefile
*** MODULE_big = pg_stat_statements
*** 4,11 
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
! 	pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
--- 4,11 
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
! 	pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
new file mode 100644
index ...7824e3e
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***
*** 0 
--- 1,43 
+ /* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+ 
+ /* First we have to remove them from the extension */
+ ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+ 
+ /* Then we can drop them */
+ DROP VIEW pg_stat_statements;
+ DROP FUNCTION pg_stat_statements();
+ 
+ /* Now redefine */
+ CREATE FUNCTION pg_stat_statements(
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
new file mode .
index 42e4d68..e69de29
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
***
*** 1,43 
- /* contrib/pg_stat_statements/pg_stat_statements--1.1.sql */
- 
- -- complain if script is sourced in psql, rather than via CREATE EXTENSION
- \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
- 
- -- Register functions.
- CREATE FUNCTION pg_stat_statements_reset()
- RETURNS void
- AS 'MODULE_PATHNAME'
- LANGUAGE C;
- 
- CREATE FUNCTION pg_stat_statements(
- OUT userid oid,
- OUT dbid oid,
- OUT query text,
- OUT calls int8,
- OUT total_time float8,
- OUT rows int8,
- OUT shared_blks_hit int8,
- OUT shared_blks_read int8,
- OUT shared_blks_dirtied int8,
- OUT shared_blks_written int8,
- OUT local_blks_hit int8,
- OUT local_blks_read int8,
- OUT local_blks_dirtied int8,
- OUT local_blks_written int8,
- OUT temp_blks_read int8,
- OUT temp_blks_written int8,
- OUT blk_read_time float8,
- OUT blk_write_time float8
- )
- RETURNS SETOF record
- AS 'MODULE_PATHNAME'
- LANGUAGE C;
- 
- -- Register a view on the function for ease of use.
- CREATE VIEW pg_stat_statements AS
-   SELECT * FROM pg_stat_statements();
- 
- GRANT SELECT ON pg_stat_statements TO PUBLIC;
- 
- -- Don't want this to be available to non-superusers.
- REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
--- 0 

Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> pg_get_function_arguments()'s interface assumes that the caller is
 >> providing the enclosing parens. Changing it would have meant
 >> returning a result like 'float8) WITHIN GROUP (float8' which I
 >> reckoned would have too much chance of breaking existing callers.

 Tom> Well, as it stands, existing callers are broken a fortiori;

Not in most cases, because using the output of
pg_get_function_arguments works in all contexts except for
constructing the CREATE AGGREGATE statement (for example, a function
that uses the pg_get_function_arguments output to construct GRANTs
or ALTER OWNERs would still work).

And since constructing a correct CREATE AGGREGATE statement for an
ordered set function requires that the caller know about the
additional options to supply in the body, this does not seem like a
restriction.

 Tom> Of course, if we define the right output as being just the
 Tom> arguments according to pg_proc, it's fine.

The patch accepts the 'just the arguments according to pg_proc' as
input everywhere except in CREATE AGGREGATE, in addition to the
syntax with explicit WITHIN GROUP.

(With the exception of GRANT, as it happens, where since there is no
existing GRANT ON AGGREGATE variant and the patch doesn't add one,
only the pg_proc arguments form is accepted.)

 Tom> But your point about the parens is a good one anyway, because
 Tom> now that I think about it, what \da has traditionally printed is

 Tom>  pg_catalog | sum  | bigint   | integer     | sum as bigint across all integer input values

 Tom> and I see the patch makes it do this

 Tom>  pg_catalog | sum  | bigint   | (integer)   | sum as bigint across all integer input values

 Tom> which I cannot find to be an improvement, especially since \df
 Tom> does not look like that.  (As patched, the output of "\df ran*"
 Tom> is positively schizophrenic.)

Since I don't particularly trust my own judgement on aesthetics, I used
the feedback I got from others when deciding what to do. Frankly I think
this one needs wider input than just you and me arguing over it.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Actually, now that I think of it, why not use this syntax for
>  Tom> declaration and display purposes:
>  Tom> type1, type2 ORDER BY type3, type4

> But unfortunately it looks exactly like the calling sequence for a
> normal aggregate with an order by clause - I really think that is
> potentially too much confusion.

I thought about that too, but really that ship sailed long ago, and it
went to sea under the SQL committee's captaincy, so it's not our fault.
There are already at least four different standards-blessed ways you can
use ORDER BY in a query, some of them quite nearby (eg window functions);
so the potential for confusion is there no matter what we do.  In this
case, if we describe ordered-set aggregates using WITHIN GROUP rather than
ORDER BY, we might avoid confusion with the normal-aggregate case, but
instead we will have confusion about what the arguments even do.  Is
semantic confusion better than syntactic confusion?

Another thing to think about here is to wonder why the committee chose
anything as verbose as "agg(...) WITHIN GROUP (ORDER BY ...)" in the
first place.  The words ORDER BY certainly seem pretty unnecessary.
I'm suspicious that they might've been leaving the door open to put other
things into the second set of parens later --- GROUP BY, maybe?  So down
the road, we might regret it if we key off WITHIN GROUP and not ORDER BY.

Having said that, I'm not so dead set on it that I won't take WITHIN GROUP
if that's what more people want.  But we gotta lose the extra parens; they
are just too strange for function declaration/documentation purposes.

regards, tom lane




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread David Johnston
Andrew Gierth wrote:
>> "Tom" == Tom Lane  writes:
> 
>  >> Please don't object that that doesn't look exactly like the syntax
>  >> for calling the function, because it doesn't anyway --- remember
>  >> you also need ORDER BY in the call.
> 
>  Tom> Actually, now that I think of it, why not use this syntax for
>  Tom> declaration and display purposes:
> 
>  Tom> type1, type2 ORDER BY type3, type4
> 
>  Tom> This has nearly as much relationship to the actual calling
>  Tom> syntax as the WITHIN GROUP proposal does,
> 
> But unfortunately it looks exactly like the calling sequence for a
> normal aggregate with an order by clause - I really think that is
> potentially too much confusion. (It's one thing not to look like
> the calling syntax, it's another to look exactly like a valid
> calling sequence for doing something _different_.)
> 
> -- 
> Andrew (irc:RhodiumToad)

How about:

type1, type2 GROUP ORDER type3, type4

OR

GROUP type1, type2 ORDER type3, type4

David J.










Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Please don't object that that doesn't look exactly like the syntax
 >> for calling the function, because it doesn't anyway --- remember
 >> you also need ORDER BY in the call.

 Tom> Actually, now that I think of it, why not use this syntax for
 Tom> declaration and display purposes:

 Tom>   type1, type2 ORDER BY type3, type4

 Tom> This has nearly as much relationship to the actual calling
 Tom> syntax as the WITHIN GROUP proposal does,

But unfortunately it looks exactly like the calling sequence for a
normal aggregate with an order by clause - I really think that is
potentially too much confusion. (It's one thing not to look like
the calling syntax, it's another to look exactly like a valid
calling sequence for doing something _different_.)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-06 Thread Kevin Grittner
Kevin Grittner  wrote:

> test=# SELECT *
> FROM    tab1 a
> LEFT JOIN
> tab2 b
> ON a.i = ANY (
> SELECT  k    
> FROM    tab3 c
> WHERE    k = a.i);
>  i | j
> ---+---
>  1 | 4
>  1 | 5
>  1 | 6
>  2 |
>  3 | 4
>  3 | 5
>  3 | 6
> (7 rows)
>
>> SELECT  *
>> FROM    tab1 a
>>  LEFT JOIN
>>  (
>>    SELECT *
>>    tab2 b
>>    SEMI JOIN
>>    (  SELECT  k
>>    FROM    tab3 c
>>    WHERE  k = a.i
>>    ) AS ANY_subquery
>>    ON a.i = ANY_subquery.k
>>  ) AS SJ_subquery
>>  ON true;
>
> It is hard to see what you intend here

Perhaps you were looking for a way to formulate it something like
this?:

test=# SELECT *
test-#   FROM tab1 a
test-#   LEFT JOIN LATERAL
test-# (
test(#   SELECT *
test(# FROM tab2 b
test(# WHERE EXISTS
test(# (
test(#   SELECT *
test(# FROM tab3 c
test(# WHERE c.k = a.i
test(# )
test(# ) AS SJ_subquery
test-# ON true;
 i | j 
---+---
 1 | 4
 1 | 5
 1 | 6
 2 |  
 3 | 4
 3 | 5
 3 | 6
(7 rows)

Without LATERAL you get an error:

ERROR:  invalid reference to FROM-clause entry for table "a"
LINE 11: WHERE c.k = a.i
 ^

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-06 Thread Tom Lane
Fujii Masao  writes:
> On Sun, Nov 24, 2013 at 10:58 AM, Peter Geoghegan  wrote:
>> I decided that queryid should be of type oid, not bigint. This is
>> arguably a slight abuse of notation, but since ultimately Oids are
>> just abstract object identifiers (so say the docs), but also because
>> there is no other convenient, minimal way of representing unsigned
>> 32-bit integers in the view that I'm aware of, I'm inclined to think
>> that it's appropriate.

> There seems to be no problem even if we use bigint as the type of
> unsigned 32-bit integer like queryid. For example, txid_current()
> returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
> Could you tell me what the problem is when using bigint for queryid?

We're talking about the output of some view, right, not internal storage?
+1 for using bigint for that.  Using OID is definitely an abuse, because
the value *isn't* an OID.  And besides, what if we someday decide we need
64-bit keys not 32-bit?
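The arithmetic behind that choice is easy to illustrate (a standalone sketch, not PostgreSQL code): any unsigned 32-bit value fits losslessly in a signed 64-bit integer, while reinterpreting it as signed 32-bit can flip it negative.

```python
import struct

v = 0xDEADBEEF  # an unsigned 32-bit value above INT32_MAX

# Reinterpreted as a signed 32-bit integer, the value wraps negative:
(as_int32,) = struct.unpack("<i", struct.pack("<I", v))
print(as_int32)  # -559038737

# Widened to a signed 64-bit integer (SQL bigint), it survives intact,
# which is why exposing a 32-bit key as bigint in a view is lossless:
(as_int64,) = struct.unpack("<q", struct.pack("<Q", v))
print(as_int64)  # 3735928559
assert as_int64 == v
```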

regards, tom lane




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
I wrote:
> One possibility is to forget all the parens and say that the display
> looks like
>   type1, type2 WITHIN GROUP type3, type4
> Please don't object that that doesn't look exactly like the syntax
> for calling the function, because it doesn't anyway --- remember you
> also need ORDER BY in the call.

Actually, now that I think of it, why not use this syntax for declaration
and display purposes:
type1, type2 ORDER BY type3, type4
This has nearly as much relationship to the actual calling syntax as the
WITHIN GROUP proposal does, and it's hugely saner from a semantic
standpoint, because after all the ordering columns are ordering columns,
not grouping columns.

regards, tom lane




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Regardless of that, though ... what is the reasoning for
>  Tom> inventing pg_get_aggregate_arguments() rather than just making
>  Tom> pg_get_function_arguments() do the right thing?

> pg_get_function_arguments()'s interface assumes that the caller is
> providing the enclosing parens. Changing it would have meant returning
> a result like 'float8) WITHIN GROUP (float8' which I reckoned would
> have too much chance of breaking existing callers.

Well, as it stands, existing callers are broken a fortiori; they can't
possibly produce the right output for an ordered-set aggregate, if we
define the "right output" as looking like that.  Of course, if we define
the right output as being just the arguments according to pg_proc, it's
fine.  But your point about the parens is a good one anyway, because now
that I think about it, what \da has traditionally printed is

 pg_catalog | sum  | bigint   | integer     | sum as bigint across all integer input values

and I see the patch makes it do this

 pg_catalog | sum  | bigint   | (integer)   | sum as bigint across all integer input values

which I cannot find to be an improvement, especially since \df does
not look like that.  (As patched, the output of "\df ran*" is positively
schizophrenic.)

One possibility is to forget all the parens and say that the display
looks like

  type1, type2 WITHIN GROUP type3, type4

Please don't object that that doesn't look exactly like the syntax
for calling the function, because it doesn't anyway --- remember you
also need ORDER BY in the call.

Or perhaps we could abbreviate that --- maybe just WITHIN?  Abbreviation
would be a good thing, especially if we're going to say 'VARIADIC "any"'
twice in common cases.  OTOH I'm not sure we can make that work as a
declaration syntax without reserving WITHIN.  I think WITHIN GROUP would
work, though I've not tried to see if bison likes it.

Anyway, the long and the short of this is that the SQL committee's bizarre
choice of syntax for calling these functions should not be followed too
slavishly when declaring them.

regards, tom lane




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Regardless of that, though ... what is the reasoning for
 Tom> inventing pg_get_aggregate_arguments() rather than just making
 Tom> pg_get_function_arguments() do the right thing?

pg_get_function_arguments()'s interface assumes that the caller is
providing the enclosing parens. Changing it would have meant returning
a result like 'float8) WITHIN GROUP (float8' which I reckoned would
have too much chance of breaking existing callers.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-06 Thread Fujii Masao
On Sun, Nov 24, 2013 at 10:58 AM, Peter Geoghegan  wrote:
> On Mon, Nov 18, 2013 at 1:54 AM, Sameer Thakur  wrote:
>> Please find v10 of patch attached. This patch addresses following
>> review comments
>
> I've cleaned this up - revision attached - and marked it "ready for 
> committer".
>
> I decided that queryid should be of type oid, not bigint. This is
> arguably a slight abuse of notation, but since ultimately Oids are
> just abstract object identifiers (so say the docs), but also because
> there is no other convenient, minimal way of representing unsigned
> 32-bit integers in the view that I'm aware of, I'm inclined to think
> that it's appropriate.

There seems to be no problem even if we use bigint as the type of
unsigned 32-bit integer like queryid. For example, txid_current()
returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
Could you tell me what the problem is when using bigint for queryid?

Regards,

-- 
Fujii Masao




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
I don't especially like the syntax you invented for listing arguments in
CREATE AGGREGATE, especially not the WITHIN GROUP (*) special case.
If we stick with that we're going to need to touch a lot more places than
you have, notably regprocedure.  I also fear that this syntax is not
appropriate for identifying aggregates reliably, ie, aggregate argument
lists that look different in this syntax could reduce to identical
pg_proc.proargs lists, and perhaps vice versa.

I think we should just have it list the arguments as they'd appear in
pg_proc, and rely on aggregate properties (to wit, aggkind and
aggndirectargs) to identify ordered-set and hypothetical aggregates.

A slightly different question is what \da ought to print.  I can't
say I think that (VARIADIC "any") WITHIN GROUP (*) is going to prove
very helpful to users, but probably just (VARIADIC "any") isn't
going to do either, at least not unless we add an aggregate-kind
column to the printout, and maybe not even then.  It might work to
cheat by duplicating the last item if it's variadic:
  (..., VARIADIC "any") WITHIN GROUP (VARIADIC "any")
while if it's not variadic, we'd have to work out which argument
positions correspond to the ordered-set arguments and put them
in the right places.

Regardless of that, though ... what is the reasoning for inventing
pg_get_aggregate_arguments() rather than just making
pg_get_function_arguments() do the right thing?  The separate function
seems to accomplish little except complicating life for clients, eg in
psql's describe.c.

regards, tom lane




Re: [HACKERS] RFC: programmable file format for postgresql.conf

2013-12-06 Thread David Johnston
Álvaro Hernández Tortosa wrote
>>   Note that you are not required to maintain your configuration data in a
>> postgresql.conf-formatted file.  You can keep it anywhere you like, GUI
>> around in it, and convert it back to the required format.  Most of the
> 
>   I think it is not a very good idea to encourage GUI tools or tools to 
> auto-configure postgres to use a separate configuration file and then 
> convert it to postgresql.conf. That introduces a duplication with evil 
> problems if either source of data is modified out-of-the-expected-way.
> 
>   That's why I'm suggesting a config file that is, at the same time, 
> usable by both postgres and other external tools. That also enables 
> other features such as editing the config file persistently through a 
> SQL session.

For my money I'd rather have a single file and/or directory-structure where
raw configuration settings are saved in the current 'key = value' format
with simple comments allowed and ignored by PostgreSQL.  And being simple
key-value the risk of "out-of-the-expected-way" changes would be minimal.
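A parser for that simple 'key = value' format with comments really is only a few lines in any language, e.g. this minimal Python sketch (illustrative only; the real GUC file parser also handles quoting rules, include directives, and units):

```python
def parse_conf(text):
    """Parse postgresql.conf-style 'key = value' lines, ignoring
    blank lines and '#' comments. Quoting and includes are omitted."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'")
    return settings

conf = """
# memory
shared_buffers = '128MB'
work_mem = 4MB        # per-sort
"""
print(parse_conf(conf))  # {'shared_buffers': '128MB', 'work_mem': '4MB'}
```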

If you want to put an example configuration file out there, one that will
not be considered the true configuration, with lots of comments and
meta-data, then great.  I'm hoping that someday there is a curses-based or
even full-fledged GUI that beginners can use to generate the desired
configuration.

If we want to put a separate "configuration meta-data" file out there to
basically provide a database from which third-party tools can pull out this
information then great.  I would not incorporate that same information into
the main PostgreSQL configuration file/directory-structure.  The biggest
advantage is that the meta-data database can be readily modified without any
concern regarding such changes impacting running systems upon update.  Then,
tools simply need to import "two" files instead of one, link together the
meta-data key with the configuration key, and do whatever they were going to
do anyway.

If indeed the target audience is going to be novices, then a static
text-based document is not going to be the most desirable interface to
present.  At worst we should simply include a comment-link at the top of the
document to a web page where an interactive tool for configuration-file
creation would exist.  That tool, at the end of the process, could provide
the user with text to copy-paste/save into a specified area on the server so
that the customizations made would override the installed defaults.

David J.









Re: [HACKERS] RFC: programmable file format for postgresql.conf

2013-12-06 Thread Álvaro Hernández Tortosa



On 06/12/13 04:47, Peter Eisentraut wrote:
> On Thu, 2013-12-05 at 00:51 +0100, Álvaro Hernández Tortosa wrote:
>> The tradeoff seems quite positive to me. I see no strong reasons why
>> not to do it... am I missing something?
>
> I don't buy your argument.  You say, if we make this change, those
> things will happen.  I don't believe it.  You can *already* do those
> things, but no one is doing it.

What I've been trying to do is summarize what has already been
discussed here and propose a solution. You say that "you can already do
those things", but that's not what I have read here. Greg Smith (cc'ed
as I'm quoting you) was explaining this in [1]:

"Right now, writing such a tool in a generic way gets so bogged down
just in parsing/manipulating the postgresql.conf file that it's hard to
focus on actually doing the tuning part."


And I completely agree. The alternative of having two separate sources
of metadata is a very bad solution IMHO, as changes done to the
postgresql.conf file directly would completely break the tool used
otherwise. And parsing the actual postgresql.conf is simply not enough.
First, because it's difficult to parse all the comments correctly; then,
because it lacks a lot of the information required for GUI tools and
auto-tuning tools.

I'm sure you have read the GUCS Overhaul wiki page [2], which already
points out many ideas related to this one.




But if we make this change, existing users will be inconvenienced,


	And I somehow agree. Adding some metainformation to the postgresql.conf 
file may be *a little* bit inconvenient for some users. But those users 
are probably pgsql-hackers or advanced DBAs. And I'm sure everybody 
here knows keyboard shortcuts and how to fiddle with larger, yet 
structured, files. We all know how to grep and sed and awk these files, 
right?


	On the other hand, this metainformation would be extremely useful for 
newbies, not-that-inexperienced DBAs and even users who move to other 
databases because postgres is hard to configure. Adding it would be 
extremely valuable for them because:


- they would have much more inlined information about the parameter, and
- they could use tools to help them with the configuration

	So the question is: which group of users are we trying to please? And 
even if the answer is the pgsql-hackers and not the rest of the 
world out there, is what I'm proposing really so inconvenient that it 
justifies giving up the advantages it may bring?


Thanks for your comments,

aht


[1] 
http://www.postgresql.org/message-id/pine.gso.4.64.0806020452220.26...@westnet.com

[2] http://wiki.postgresql.org/wiki/GUCS_Overhaul


--
Álvaro Hernández Tortosa


---
NOSYS
Networked Open SYStems




Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-06 Thread Robert Haas
On Thu, Dec 5, 2013 at 4:06 PM, Heikki Linnakangas
 wrote:
>> That's a very interesting idea.  I've been thinking that we needed to
>> preserve the property that new workers could attach to the shared
>> memory segment at any time, but that might not be necessary in all
>> cases.  We could introduce a new dsm operation that means "I promise no
>> one else needs to attach to this segment".  Further attachments would
>> be disallowed by dsm.c regardless of the implementation in use, and
>> dsm_impl.c would also be given a chance to perform
>> implementation-specific operations, like shm_unlink and
>> shmctl(IPC_RMID).  This new operation, when used, would help to reduce
>> the chance of leaks and perhaps catch other programming errors as
>> well.
>>
>> What should we call it?  dsm_finalize() is the first thing that comes
>> to mind, but I'm not sure I like that.
>
> dsm_unlink() would mirror the underlying POSIX shm_unlink() call, and would
> be familiar to anyone who understands how unlink() on a file works on Unix.

OK, let me work on that once this CommitFest is done.
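For readers less familiar with Heikki's unlink() analogy: on POSIX systems, removing a name does not invalidate descriptors that are already open; the object lives until the last descriptor is closed. A quick sketch (assuming a Unix-like system, and using an ordinary file in place of a shared memory segment):

```python
import os
import tempfile

def unlink_demo():
    """Mirror the proposed dsm_unlink() semantics with a plain file:
    after unlinking the name, no one new can "attach", but existing
    users keep working, and the storage goes away on the last close."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"shared state")
    os.unlink(path)                      # the name is gone now

    gone = not os.path.exists(path)      # new open()/attach attempts fail
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 12)               # the open fd still works fine
    os.close(fd)                         # storage reclaimed here
    return gone, data

assert unlink_demo() == (True, b"shared state")
```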

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2013-12-06 Thread Tom Lane
Andres Freund  writes:
> On 2013-12-06 11:02:48 -0500, Tom Lane wrote:
>> I think the special-purpose command line switches you mention can be
>> passed through PGOPTIONS, rather than inventing a new parameter -- do you
>> have an objection to that?

> I am not sure if they currently will get recognized early enough and
> whether permission checking will interfere, but if so, that's probably
> fixable.

Shouldn't be a problem --- the single-user mode will just concatenate
the options parameter onto the command line it builds.

> There's the question of what we're going to end up doing with the current
> single-user mode. There's some somewhat ugly code around for it...

Nothing, in the short term.  In a release or two we can get rid of it,
probably, but I'd hesitate to provide no overlap at all of these
usage modes.

regards, tom lane




Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2013-12-06 Thread Andres Freund
On 2013-12-06 11:02:48 -0500, Tom Lane wrote:
> Andres Freund  writes:
> My feeling is that we should just treat the executable name and data
> directory path as new connection parameters, which'd be ignored in
> normal-connection mode, just as some other parameters will be ignored in
> single-user mode.  Otherwise we'll find ourselves building parameter
> setting infrastructure that pretty much duplicates what's there for the
> existing connection parameters.

Right.

> I think the special-purpose command line switches you mention can be
> passed through PGOPTIONS, rather than inventing a new parameter -- do you
> have an objection to that?

I am not sure if they currently will get recognized early enough and
whether permission checking will interfere, but if so, that's probably
fixable.

> > Not sure if we need anything but the pid of the postmaster be returned?
> 
> The new PQconnect routine would certainly hand back a PGconn.  I think
> we'd need a new query function PQchildPid(PGconn *) or some such to
> provide access to the child process PID.

I was thinking of a pid_t* argument to the new routine, but it's likely
unnecessary as we're probably going to end up storing it in PGconn
anyway.

There's the question of what we're going to end up doing with the current
single-user mode. There's some somewhat ugly code around for it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] commit fest 2013-11 week 3 report

2013-12-06 Thread Peter Eisentraut
Last week's status:

Fri Nov 29

Status Summary. Needs Review: 29, Waiting on Author: 33, Ready for
Committer: 13, Committed: 24, Returned with Feedback: 5, Rejected: 5.
Total: 109.

Current status:

Fri Dec 6 10:55

Status Summary. Needs Review: 24, Waiting on Author: 21, Ready for
Committer: 14, Committed: 26, Returned with Feedback: 16, Rejected: 8.
Total: 109.


Although this looks like slow progress on paper, many larger patches
have actually made significant progress in their reviews this past week.
 A lot of those will probably be in very good shape for the next commit
fest.

We're also very close to achieving the primary goal of having given
every submission a decent review.  Although it looks like Heikki's
B-tree patches might end up being the loser ...

Now it's crunch time:

- If you are a committer, grab a patch from the Ready for Committer list
and commit it.

- If you are a patch author and you hope to get your patch finished next
week, make sure you have a current patch, your commit fest entry is up
to date, and your reviewers are aware of your intentions.

- If you are a reviewer, grab one of the remaining patches and give it a
look, so no one feels sad.

- If you are a commit fest manager, all patches not in the above
categories should be disposed of appropriately by the end of next week.




Re: [HACKERS] pg_archivecleanup bug

2013-12-06 Thread Tom Lane
Robert Haas  writes:
> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane  wrote:
>> In general, I think there is no excuse for code in the backend to use
>> readdir() directly; it should be using ReadDir(), which takes care of this
>> as well as error reporting.

> My understanding is that the fd.c infrastructure can't be used in the
> postmaster.

Say what?  See ParseConfigDirectory for code that certainly runs in the
postmaster, and uses ReadDir().

regards, tom lane




Re: [HACKERS] ANALYZE sampling is too good

2013-12-06 Thread Greg Stark
It looks like this is a fairly well understood problem, because in the
real world it's also often cheaper to speak to people in a small
geographic area or time interval. These Wikipedia pages sound
interesting and have some external references:

http://en.wikipedia.org/wiki/Cluster_sampling
http://en.wikipedia.org/wiki/Multistage_sampling

I suspect the hard part will be characterising the nature of the
non-uniformity in the sample generated by taking a whole block. Some
of it may come from how the rows were loaded (e.g. older rows were
loaded by pg_restore but newer rows were inserted retail) or from the
way Postgres works (e.g. hotter rows are on blocks with fewer rows in
them and colder rows are more densely packed).
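The non-uniformity is easy to demonstrate with a toy model (not Postgres's actual ANALYZE logic): if rows were loaded in key order, a row's value correlates perfectly with its block, and sampling a few whole blocks gives a far worse estimate than a row-level sample of the same size.

```python
BLOCKS, ROWS_PER_BLOCK = 100, 10

# Toy table: rows were loaded in key order, so a row's value is
# perfectly correlated with the block it lives in (value == block no.).
table = [[blk for _ in range(ROWS_PER_BLOCK)] for blk in range(BLOCKS)]
true_mean = sum(sum(b) for b in table) / (BLOCKS * ROWS_PER_BLOCK)  # 49.5

# Block (cluster) sample: read 3 whole blocks -> 30 rows, but only 3
# distinct values are ever observed; the estimate depends entirely on
# which blocks happened to be picked.
block_sample = [v for blk in (0, 1, 2) for v in table[blk]]
block_est = sum(block_sample) / len(block_sample)        # 1.0, wildly off

# Row-level sample of the same size: every 33rd row across the table.
flat = [v for b in table for v in b]
row_sample = flat[::33][:30]
row_est = sum(row_sample) / len(row_sample)              # close to 49.5

print(true_mean, block_est, row_est)
```

With a random table layout the bias disappears; the hard part, as noted above, is characterising how far a real table deviates from that ideal.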

I've felt for a long time that Postgres would make an excellent test
bed for some aspiring statistics research group.


-- 
greg




Re: [HACKERS] Time-Delayed Standbys

2013-12-06 Thread Fabrízio de Royes Mello
On Fri, Dec 6, 2013 at 1:36 PM, Robert Haas  wrote:

> On Thu, Dec 5, 2013 at 11:07 PM, Fabrízio de Royes Mello
>  wrote:
> > On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs 
> wrote:
> >>
> >> > - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
> >> > XLOG_XACT_COMMIT_COMPACT checks
> >>
> >> Why just those? Why not aborts and restore points also?
> >>
> >
> > I think it makes no sense to execute the delay after aborts and/or
> > restore points, because they do not change data in a standby server.
>
> I see no reason to pause for aborts.  Aside from the fact that it
> wouldn't be reliable in corner cases, as Fabrízio says, there's no
> user-visible effect, just as there's no user-visible effect from
> replaying a transaction up until just prior to the point where it
> commits (which we also do).
>
> Waiting for restore points seems like it potentially makes sense.  If
> the standby is delayed by an hour, and you create a restore point and
> > wait 55 minutes, you might expect that you can still kill the
> standby and recover it to that restore point.
>
>
Makes sense. Fixed.

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 9d80256..12aa917 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -142,6 +142,31 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   
  
 
+ 
+  min_standby_apply_delay (integer)
+  
+min_standby_apply_delay recovery parameter
+  
+  
+   
+Specifies the amount of time (in milliseconds, if no unit is specified)
+which recovery of transaction commits should lag the master.  This
+parameter allows creation of a time-delayed standby.  For example, if
+you set this parameter to 5min, the standby will
+replay each transaction commit only when the system time on the standby
+is at least five minutes past the commit time reported by the master.
+   
+   
+Note that if the master and standby system clocks are not synchronized,
+this might lead to unexpected results.
+   
+   
+This parameter works only for streaming replication deployments. Synchronous
+replicas and PITR are not affected.
+   
+  
+ 
+
 
 
   
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..e8617db 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -123,6 +123,17 @@
 #
 #trigger_file = ''
 #
+# min_standby_apply_delay
+#
+# By default, a standby server keeps restoring XLOG records from the
+# primary as soon as possible. If you want to delay the replay of
+# committed transactions from the master, specify a recovery time delay.
+# For example, if you set this parameter to 5min, the standby will replay
+# each transaction commit only when the system time on the standby is at least
+# five minutes past the commit time reported by the master.
+#
+#min_standby_apply_delay = 0
+#
 #---
 # HOT STANDBY PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b68230d..7ca2f9b 100755
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -218,6 +218,8 @@ static bool recoveryPauseAtTarget = true;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
+static int min_standby_apply_delay = 0;
+static TimestampTz recoveryDelayUntilTime;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -730,6 +732,7 @@ static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo);
 static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
 static void recoveryPausesHere(void);
+static void recoveryDelay(void);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
@@ -5474,6 +5477,19 @@ readRecoveryCommandFile(void)
 	(errmsg_internal("trigger_file = '%s'",
 	 TriggerFile)));
 		}
+		else if (strcmp(item->name, "min_standby_apply_delay") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &min_standby_apply_delay, GUC_UNIT_MS,
+	&hintmsg))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("parameter \"%s\" requires a temporal value", "min_standby_

Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2013-12-06 Thread Tom Lane
Andres Freund  writes:
> On 2013-12-05 23:01:28 +0200, Heikki Linnakangas wrote:
>> Right. Not all of the parameters will make sense for a stand-alone backend
>> though, like the hostname and port number. And I think you need a new
>> parameter to pass the path to the 'postgres' executable, unless we re-use
>> the host parameter for that.

> Hm. I'd guessed that we wouldn't use the connection string to pass down
> the executable name and the datadir now that we're inventing a separate
> function. But maybe that's unnecessary.

> What parameters do we require to be set for that mode:
> * path to postgres
> * data directory
> * database name (single mode after all)
> * port, because of the shmem key? I'd say that's not important enough

> I think we also need to be able to pass some additional parameters to
> postgres:
> - config_file, hba_file, ... might be required to start pg in some 
> environments
> - -P, -O, ... are sometimes required in cases where single-user mode is
>   necessary for data recovery.

Right, so by the time we're done, we'd still need a connection string or
the moral equivalent.

My feeling is that we should just treat the executable name and data
directory path as new connection parameters, which'd be ignored in
normal-connection mode, just as some other parameters will be ignored in
single-user mode.  Otherwise we'll find ourselves building parameter
setting infrastructure that pretty much duplicates what's there for the
existing connection parameters.

I think the special-purpose command line switches you mention can be
passed through PGOPTIONS, rather than inventing a new parameter -- do you
have an objection to that?

> Not sure if we need anything but the pid of the postmaster be returned?

The new PQconnect routine would certainly hand back a PGconn.  I think
we'd need a new query function PQchildPid(PGconn *) or some such to
provide access to the child process PID.

regards, tom lane




Re: [HACKERS] pg_archivecleanup bug

2013-12-06 Thread Robert Haas
On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane  wrote:
> In general, I think there is no excuse for code in the backend to use
> readdir() directly; it should be using ReadDir(), which takes care of this
> as well as error reporting.

My understanding is that the fd.c infrastructure can't be used in the
postmaster.

I agree that sucks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Time-Delayed Standbys

2013-12-06 Thread Robert Haas
On Thu, Dec 5, 2013 at 11:07 PM, Fabrízio de Royes Mello
 wrote:
> On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs  wrote:
>>
>> > - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
>> > XLOG_XACT_COMMIT_COMPACT checks
>>
>> Why just those? Why not aborts and restore points also?
>>
>
> I think it makes no sense to execute the delay after aborts and/or restore points,
> because they do not change data in a standby server.

I see no reason to pause for aborts.  Aside from the fact that it
wouldn't be reliable in corner cases, as Fabrízio says, there's no
user-visible effect, just as there's no user-visible effect from
replaying a transaction up until just prior to the point where it
commits (which we also do).

Waiting for restore points seems like it potentially makes sense.  If
the standby is delayed by an hour, and you create a restore point and
wait 55 minutes, you might expect that you can still kill the
standby and recover it to that restore point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread Tom Lane
"MauMau"  writes:
> That discussion sounds interesting, and I want to take more time to 
> consider.  But what do you think of my original suggestion to easily solve 
> the current issue?  I'd like to remove the current annoying problem first 
> before spending much time on more exciting infrastructure.

There is no enthusiasm for a quick-hack solution here, and most people
don't actually agree with your proposal that these errors should never
get logged.  So no, that is not happening.  You can hack your local
copy that way if you like of course, but it's not getting committed.

regards, tom lane




Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2013-12-06 Thread Andres Freund
On 2013-12-05 23:01:28 +0200, Heikki Linnakangas wrote:
> On 12/05/2013 10:37 PM, Robert Haas wrote:
> >On Thu, Dec 5, 2013 at 3:05 PM, Tom Lane  wrote:
> >>It might be unpleasant to use in some cases, though.
> >
> >Why would there be more than a few cases in the first place?  Who is
> >going to use this beyond psql, pg_dump(all), and pg_upgrade, and why?
> 
> Well, you might want to use pgAdmin, or your other favorite admin tool. I'm
> not sure how well it would work, and I think it's OK if we say "sorry, can't
> do that", but it's not a crazy thing to want.

Pgadmin wouldn't work; it uses multiple connections for anything but the
most trivial tasks. You can't even send a manual sql query using only
one connection.
I think that's true for most of the non-trivial tools.

> >>Another issue is that we have too many variants of PQconnect
> >>already; which of them are we prepared to clone for this
> >>hypothetical new connection method?
> >
> >PQconnectdbParams, I assume.  Isn't that the one to rule them all,
> >modulo async connect which I can't think is relevant here?

> Right. Not all of the parameters will make sense for a stand-alone backend
though, like the hostname and port number. And I think you need a new
> parameter to pass the path to the 'postgres' executable, unless we re-use
> the host parameter for that.

Hm. I'd guessed that we wouldn't use the connection string to pass down
the executable name and the datadir now that we're inventing a separate
function. But maybe that's unnecessary.

What parameters do we require to be set for that mode:
* path to postgres
* data directory
* database name (single mode after all)
* port, because of the shmem key? I'd say that's not important enough

I think we also need to be able to pass some additional parameters to
postgres:
- config_file, hba_file, ... might be required to start pg in some environments
- -P, -O, ... are sometimes required in cases where single-user mode is
  necessary for data recovery.

So I think we should just allow passing through arguments to postgres.

Not sure if we need anything but the pid of the postmaster be returned?

> >Or don't clone that one but instead have
> >PQnextConnectionShouldForkThisBinary('...') and let the psql/pg_dump
> >switch be --standalone=full-path-to-the-postgres-binary.

Yuck, that's ugly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes

2013-12-06 Thread Antonin Houska
Tested git apply and build again. No warnings.

The regression test also looks good to me now.

I'm done with this review.

(Not sure if I should move it to 'ready for committer' status or the CFM
should do that.)

// Antonin Houska (Tony)

On 12/06/2013 02:01 PM, Boszormenyi Zoltan wrote:
> 2013-12-04 14:51 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-12-03 16:48 keltezéssel, Antonin Houska írta:
>>
>>> Tests - 23.patch
>>> 
>>>
>>> src/interfaces/ecpg/test/sql/cursorsubxact.pgc
>>>
>>>
>>>  /*
>>>   * Test the implicit RELEASE SAVEPOINT if a SAVEPOINT
>>>   * is used with an already existing name.
>>>   */
>>>
>>> Shouldn't it be "... if a CURSOR is used with an already existing
>>> name?". Or just "... implicit RELEASE SAVEPOINT after an error"?
>>> I'd also appreciate a comment where exactly the savepoint is
>>> (implicitly) released.
>>
>> I have already answered this in my previous answer.
> 
> And I was wrong in that. The comments in the test were rearranged
> a little and the fact in the above comment is now actually tested.
> 
> Some harmless unused variables were also removed and an
> uninitialized variable usage was fixed. Because of these and the above
> changes a lot of patches need to be rebased.
> 
> All patches are attached again for completeness.
> 
> Best regards,
> Zoltán Böszörményi
> 





Re: [HACKERS] Add %z support to elog/ereport?

2013-12-06 Thread Peter Eisentraut
On 11/11/13, 12:01 PM, Tom Lane wrote:
> I do recall Peter saying that the infrastructure knows how to
> verify conversion specs in translated strings, so it would have to be
> aware of 'z' flags for this to work.

It just checks that the same conversion placeholders appear in the
translation.  It doesn't interpret the actual placeholders.  I don't
think this will be a problem.
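A toy version of the check Peter describes (the real work is done by gettext's format-string checking, e.g. msgfmt -c): extract the printf-style conversions, including any z length modifier, and require the translation to carry exactly the same ones, without ever interpreting them.

```python
import re

# A printf-style conversion: flags, width/precision, an optional length
# modifier (including 'z' for size_t), then the conversion character.
CONV = re.compile(r"%(?:[-+ #0]*[\d.*]*(?:hh|h|ll|l|z|j|t|L)?[diouxXeEfgGcspm%])")

def same_placeholders(msgid, msgstr):
    """True if the translation keeps exactly the original conversions,
    in order. The placeholders are compared, never interpreted, which
    matches the description of the translation infrastructure's check."""
    return CONV.findall(msgid) == CONV.findall(msgstr)

assert same_placeholders("could not read %zu bytes",
                         "konnte %zu Bytes nicht lesen")
assert not same_placeholders("could not read %zu bytes",
                             "konnte %lu Bytes nicht lesen")
```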





Re: [HACKERS] Trust intermediate CA for client certificates

2013-12-06 Thread Bruce Momjian
On Mon, Dec  2, 2013 at 12:44:08PM -0500, Bruce Momjian wrote:
> On Sat, Nov 30, 2013 at 12:10:19PM -0500, Bruce Momjian wrote:
> > > Drat, you're quite right. I've always included the full certificate
> > > chain in client certs but it's in no way required.
> > > 
> > > I guess that pretty much means maintaining the status quo and documenting
> > > it better.
> > 
> > I have developed the attached patch to document this behavior.  My goals
> > were:
> > 
> > * clarify that a cert can match a remote intermediate or root certificate
> > * clarify that the client cert must match a server root.crt
> > * clarify that the server cert must match a client root.crt
> > * clarify that the root certificate does not have to be specified
> >   in the client or server cert as long as the remote end has the chain
> >   to the root
> > 
> > Does it meet these goals?  Is it correct?
> 
> I have updated the patch, attached, to be clearer about the requirement
> that intermediate certificates need a chain to root certificates.

Patch applied to head.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> I believe that the spec requires that the "direct" arguments of
>  Tom> an inverse or hypothetical-set aggregate must not contain any
>  Tom> Vars of the current query level.

> In any event, going by the docs on the web, Oracle does not forbid
> grouped columns there (their wording is "This expr must be constant
> within each aggregation group.") MSSQL seems to require a literal
> constant, but that's obviously not per spec. IBM doesn't seem to
> have it in db2 for linux, but some of their other products have it
> and include examples of using grouped vars: see
> http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_hypothetical_set_family_syntax.html

OK, that's reasonably convincing.  I think we'll need a HINT or something
to clarify the error message, because it sure looks like those arguments
are "used in an aggregate function".

>  Tom> 2. For an ordered set function, n must equal aggnfixedargs.  We
>  Tom> treat all n fixed arguments as contributing to the aggregate's
>  Tom> result collation, but ignore the sort arguments.

> That doesn't work for getting a sensible collation out of
> percentile_disc applied to a collatable type. (Which admittedly is an
> extension to the spec, which allows only numeric and interval, but it
> seems to me to be worth having.)

Meh.  I don't think you can have that and also have the behavior that
multiple ORDER BY items aren't constrained to have the same collation;
at least not without some rule that amounts to a special case for
percentile_disc, which I'd resist.

>  Tom> 3. For a hypothetical-set function, n must equal aggnfixedargs
>  Tom> plus k, and we match up type and collation info of the last k
>  Tom> fixed arguments with the corresponding sort arguments.  The
>  Tom> first n-k fixed arguments contribute to the aggregate's result
>  Tom> collation, the rest not.

> The submitted patch does essentially this but taking the number of
> non-variadic args in place of the suggested aggnfixedargs. Presumably
> in your version the latter would be derived from the former?

I'm not on board with using variadic vs non variadic to determine this.
For example, imagine a hypothetical-set function that for some reason
supports only a single sort column; there would be no reason to use
VARIADIC in its definition, and indeed good reason not to.  In any
case, I don't think this behavior should be tied to implementation details
of the representation of the function signature, and IMV variadic is
just that --- particularly for VARIADIC ANY, which is nothing more than
a short-cut for overloading the function name with different numbers of
ANY arguments.  Once we've got a match that involves N direct arguments
and K ordering arguments, the behavior should be determinate without
respect to just how we got that match.

>  Tom> Reading back over this email, I see I've gone back and forth
>  Tom> between using the terms "direct args" and "fixed args" for the
>  Tom> evaluate-once stuff to the left of WITHIN GROUP.  I guess I'm
>  Tom> not really sold on either terminology, but we need something we
>  Tom> can use consistently in the code and docs.  The spec is no help,
>  Tom> it has no generic term at all for these args.  Does anybody else
>  Tom> have a preference, or maybe another suggestion entirely?

> We (Atri and I) have been using "direct args", but personally I'm not
> amazingly happy with it. Documentation for other dbs tends to just call
> them "arguments", and refer to the WITHIN GROUP expressions as "ordering
> expressions" or similar.

Well, given that I was mistaken to think there could be no Vars at all
in them, "fixed" may not be le mot juste.  Unless somebody's got an
alternative to "direct", let's go with that.

regards, tom lane




Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-06 Thread Kevin Grittner
Antonin Houska  wrote:

> SELECT *
> FROM    tab1 a
> LEFT JOIN
> tab2 b
> ON a.i = ANY (
> SELECT  k
> FROM    tab3 c
> WHERE    k = a.i);

This query works with k in any or all tables, but the semantics
certainly vary depending on where k happens to be.  It would help a
lot if you showed SQL statements to create and populate the tables
involved and/or if you qualified all referenced column names with
the table alias to avoid ambiguity.

If I assume that the k reference is supposed to be a column in
tab3, what you have is a query where you always get all rows from
tab1, and for each row from tab1 you either match it to all rows
from tab2 or no rows from tab2 depending on whether the tab1 row
has a match in tab3.

test=# create table tab1 (i int);
CREATE TABLE
test=# create table tab2 (j int);
CREATE TABLE
test=# create table tab3 (k int);
CREATE TABLE
test=# insert into tab1 values (1), (2), (3);
INSERT 0 3
test=# insert into tab2 values (4), (5), (6);
INSERT 0 3
test=# insert into tab3 values (1), (3);
INSERT 0 2
test=# SELECT *
FROM    tab1 a
    LEFT JOIN
    tab2 b
    ON a.i = ANY (
    SELECT  k    
    FROM    tab3 c
    WHERE    k = a.i);
 i | j
---+---
 1 | 4
 1 | 5
 1 | 6
 2 | 
 3 | 4
 3 | 5
 3 | 6
(7 rows)

> SELECT  *
> FROM    tab1 a
> LEFT JOIN
> (
>   SELECT *
>   tab2 b
>   SEMI JOIN
>   (  SELECT  k
>   FROM    tab3 c
>   WHERE  k = a.i
>   ) AS ANY_subquery
>   ON a.i = ANY_subquery.k
> ) AS SJ_subquery
> ON true;

It is hard to see what you intend here, since this is not valid
syntax.  I assume you want a FROM keyword before the tab2
reference, but it's less clear what you intend with the SEMI JOIN
syntax.  PostgreSQL supports semi-joins; but that is an
implementation detail for the EXISTS or IN syntax.  Could you
clarify your intent?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] [patch] Client-only installation on Windows

2013-12-06 Thread MauMau

Hello,

According to this page,

http://www.postgresql.org/docs/current/static/install-procedure.html

client-only installation is possible on UNIX/Linux like this:

   gmake -C src/bin install
   gmake -C src/include install
   gmake -C src/interfaces install
   gmake -C doc install

With the attached patch, you can do client-only installation on Windows like 
this:


   install.bat  client

This installs:

* client applications (both core and contrib)
* DLLs for libpq and ECPG
* header files
* import libraries
* pg_service.conf.sample and psqlrc.sample
* symbol files (*.pdb) for the above modules

If the second argument is omitted, all files are installed; if it is given 
as "client", only the client files are.  With 9.4, the whole installation 
takes up about 80 MB, while the client-only installation takes up only 24 MB.


Any comments would be appreciated.


Regards
MauMau


client_only_install_win.patch
Description: Binary data



Re: [HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread MauMau

From: "Alvaro Herrera" 

There was also the idea that this would be driven off SQLSTATE but this
seems pretty unwieldy to me.


You are referring to this long discussion, aren't you?

http://www.postgresql.org/message-id/19791.1335902...@sss.pgh.pa.us

I've read it before, and liked the SQLSTATE-based approach.  It seems like 
properly assigned SQLSTATEs can be used as message IDs, and pairs of a 
SQLSTATE and its user action might be used to provide a sophisticated 
database administration GUI.


That discussion sounds interesting, and I want to take more time to 
consider it.  But what do you think of my original suggestion to easily solve 
the current issue?  I'd like to remove the current annoying problem first 
before spending much time on more exciting infrastructure.


Regards
MauMau





Re: [HACKERS] Feature request: Logging SSL connections

2013-12-06 Thread Dr. Andreas Kunert

Anything else missing?


Functionally it's fine now, but I see a few style problems:

- "if (port->ssl > 0)" is wrong; ->ssl is a pointer, so use just
   "if (port->ssl)".
- Deeper indentation would look nicer with braces.
- There are some duplicated messages; could you restructure it so that
   each message exists only once?


New version is attached. I could add braces before and after the 
ereport() calls too, but then I would need two more #ifdefs to catch the 
closing braces.




--
Dr. Andreas Kunert
HU-Berlin, ZE Rechenzentrum (CMS)
www.hu-berlin.de/~kunert

--- postinit.c.orig	2013-12-06 14:42:14.827832432 +0100
+++ postinit.c	2013-12-06 14:42:25.171842695 +0100
@@ -221,13 +221,31 @@
 	if (Log_connections)
 	{
 		if (am_walsender)
-			ereport(LOG,
-	(errmsg("replication connection authorized: user=%s",
-			port->user_name)));
+		{
+#ifdef USE_SSL
+			if (port->ssl)
+ereport(LOG,
+		(errmsg("replication connection authorized: user=%s SSL(protocol: %s, cipher: %s)",
+				port->user_name, SSL_get_version(port->ssl), SSL_get_cipher(port->ssl))));
+			else
+#endif
+ereport(LOG,
+		(errmsg("replication connection authorized: user=%s",
+port->user_name)));
+		}
 		else
-			ereport(LOG,
-	(errmsg("connection authorized: user=%s database=%s",
-			port->user_name, port->database_name)));
+		{
+#ifdef USE_SSL
+			if (port->ssl)
+ereport(LOG,
+		(errmsg("connection authorized: user=%s database=%s SSL(protocol: %s, cipher: %s)",
+				port->user_name, port->database_name, SSL_get_version(port->ssl), SSL_get_cipher(port->ssl))));
+			else
+#endif
+ereport(LOG,
+		(errmsg("connection authorized: user=%s database=%s",
+port->user_name, port->database_name)));
+		}
 	}
 
 	set_ps_display("startup", false);



Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread Andres Freund
On 2013-12-06 22:35:21 +0900, MauMau wrote:
> From: "Tom Lane" 
> >No.  They are FATAL so far as the individual session is concerned.
> >Possibly some documentation effort is needed here, but I don't think
> >any change in the code behavior would be an improvement.
> 
> You are suggesting that we should add a note like "Don't worry about the
> following message.  This is a result of normal connectivity checking",
> aren't you?
> 
> FATAL:  the database system is starting up

Uh. An explanation of why you cannot connect to the database hardly seems
like a superfluous log message.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread MauMau

From: "Josh Berkus" 

Heck, I'd be happy just to have a class of messages which specifically
means "OMG, there's something wrong with the server", that is, a flag
for messages which only occur when PostgreSQL encounters a bug, data
corruption, or platform error.  Right now, I have to suss those out by
regex.


What are the issues with using SQLSTATE XXnnn as a filter?

Regards
MauMau





Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread MauMau

From: "Tom Lane" 

No.  They are FATAL so far as the individual session is concerned.
Possibly some documentation effort is needed here, but I don't think
any change in the code behavior would be an improvement.


You are suggesting that we should add a note like "Don't worry about the 
following message.  This is a result of normal connectivity checking", 
aren't you?


FATAL:  the database system is starting up

But I doubt most users would recognize such notes.  Anyway, lots of such 
messages certainly make monitoring and troubleshooting harder, because 
valuable messages are buried.




4. FATAL:  sorry, too many clients already
Report these as FATAL to the client because the client wants to know the
reason.  But don't output them to the server log because they are not
necessary for DBAs (4 is subtle.)


The notion that a DBA should not be allowed to find out how often #4 is
happening is insane.


I thought someone would point that out.  You are right, #4 is a strong hint 
for the DBA about the max_connections setting or connection pool configuration.



Regards
MauMau





Re: [HACKERS] Feature request: Logging SSL connections

2013-12-06 Thread Marko Kreen
On Fri, Dec 06, 2013 at 11:43:55AM +0100, Dr. Andreas Kunert wrote:
> >>That seems useful.  Do we need more information, like whether a client
> >>certificate was presented, or what ciphers were used?
> >
> >Yes, please show ciphersuite and TLS version too.  Andreas, you can use my
> >recent \conninfo patch as template:
> >
> >   
> > https://github.com/markokr/postgres/commit/7d1b27ac74643abd15007cc4ec0b56ba92b39d90
> >
> >Also, please show the SSL level for walsender connections.  It's
> >quite important to know whether they are using SSL or not.
> >
> >But I think the 'bits' output is unnecessary, as its cipher strength
> >is known from the ciphersuite.  Perhaps it can be removed from \conninfo too.
> 
> A new patch is attached. I added the ciphersuite and TLS version
> like shown in your template (minus the 'bits' output). I also added
> the SSL information for walsender connections, but due to a missing
> test setup I cannot test that part.
> 
> Anything else missing?

Functionally it's fine now, but I see a few style problems:

- "if (port->ssl > 0)" is wrong; ->ssl is a pointer, so use just
  "if (port->ssl)".

- Deeper indentation would look nicer with braces.

- There are some duplicated messages; could you restructure it so that
  each message exists only once?

Something like this perhaps:

#if USE_SSL
if (port->ssl)
{
if (walsender) ..
else ..
}
else
#endif
...

-- 
marko





Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?

2013-12-06 Thread MauMau

From: "Peter Eisentraut" 

Yeah, this is part of a more general problem, which you have
characterized correctly: What is fatal (or error, or warning, ...) to
the client isn't necessarily fatal (or error, or warning, ...) to the
server or DBA.


Thanks.  In addition, #5 and #6 in my previous mail are even unnecessary for 
both the client and the DBA, aren't they?




Fixing this would need a larger enhancement of the
logging infrastructure.  It's been discussed before, but it's a bit of 
work.


How about the easy fix I proposed?  The current logging infrastructure seems 
sufficient to solve the original problem with small effort and without 
complicating the code.  If you don't like "log_min_messages = PANIC", 
SetConfigOption() can be used instead.  I think we'd better take a step to 
eliminate the problem at hand, and consider a much richer infrastructure in 
the long run.  I'm also interested in the latter, and want to discuss it 
after solving the problem in front of me.


Regards
MauMau






Re: [HACKERS] shared memory message queues

2013-12-06 Thread Andres Freund
On 2013-12-05 14:07:39 -0500, Robert Haas wrote:
> On Thu, Dec 5, 2013 at 8:29 AM, Andres Freund  wrote:
> > Hm. The API change of on_shmem_exit() is going to cause a fair bit of
> > pain. There are a fair number of extensions out there using it and all
> > would need to be littered with version-dependent #ifdefs. What about
> > providing a version of on_shmem_exit() that allows to specify the exit
> > phase, and make on_shmem_exit() a backward compatible wrapper?
> 
> Or, we could have on_user_exit() or so and leave on_shmem_exit() alone
> altogether, which would probably be transparent for nearly everyone.
> I kind of like that better, except that I'm not sure whether
> on_user_exit() is any good as a name.

Leaving it as separate calls sounds good to me as well - but like you I
don't like on_user_exit() that much. Not sure if I can come up with
something really good...
on_shmem_exit_phase() or at_shmem_exit_phase() if we go for a function
that allows specifying phases, and just before_shmem_exit() for the "user
level" things?

> > FWIW, I find on_dsm_detach_cancel() to be a confusing name. I thought it
> > might be a variant that is called in different circumstances than
> > on_dsm_detach(). Maybe go with cancel_on_shmem_detach(), like with
> > cancel_shmem_exit()?
> 
> It could be cancel_on_dsm_detach() if you like that better.  Not
> cancel_on_shmem_detach(), though.

Yes, I like that better. The shm instead of dsm was just a thinko ;)

> > Hm. A couple of questions/comments:
> > * How do you imagine keys to be used/built?
> > * Won't the sequential search over toc entries become problematic?
> > * Some high level overview of how it's supposed to be used wouldn't
> >   hurt.
> > * the estimator stuff doesn't seem to belong in the public header?
> 
> The test-shm-mq patch is intended to illustrate these points.  In that
> case, for an N-worker configuration, there are N+1 keys; that is, N
> message queues and one fixed-size control area.  The estimator stuff
> is critically needed in the public header so that you can size your
> DSM to hold the stuff you intend to store in it; again, see
> test-shm-mq.

I still think shm_mq.c needs to explain more of that. That's where I
look for a brief high-level explanation, no?

> >> Patch #3, shm-mq-v1.patch, is the heart of this series.  It creates an
> >> infrastructure for sending and receiving messages of arbitrary length
> >> using ring buffers stored in shared memory (presumably dynamic shared
> >> memory, but hypothetically the main shared memory segment could be
> >> used).
> >
> > The API seems to assume it's in dsm tho?
> 
> The header file makes no reference to dsm anywhere, so I'm not sure
> why you draw this conclusion.

The reason I was asking was this reference to dsm:
+shm_mq_handle *
+shm_mq_attach(shm_mq *mq, dsm_segment *seg, BackgroundWorkerHandle
*handle)

I'd only looked at the header at that point, but looking at the
function's comment it's optional.


> > Hm. Too bad, I'd hoped for single-reader, multiple-writer :P
> 
> Sure, that might be useful, too, as might multiple-reader,
> multi-writer.  But those things would come with performance and
> complexity trade-offs of their own.  I think it's appropriate to leave
> the task of creating those things to the people that need them.  If it
> turns out that this can be enhanced to work like that without
> meaningful loss of performance, that's fine by me, too, but I think
> this has plenty of merit as-is.

Yea, it's perfectly fine not to implement what I wished for ;). I just
had hoped you would magically develop what I was dreaming about...

> It's symmetric.  The idea is that you try to read or write data;
> should that fail, you wait on your latch and try again when it's set.

Ok, good. That's what I thought.

> > Couple of questions:
> > * Some introductory comments about the concurrency approach would be
> >   good.
> 
> Not sure exactly what to write.

I had a very quick look at shm_mq_receive()/send() and
shm_mq_receive_bytes()/shm_mq_send_bytes() - while the low level steps
seem to be explained fairly well, the high level design of the queue
doesn't seem to be explained much. I was first thinking that you were
trying to implement a lockless queue (which is quite efficiently
possible for 1:1 queues), partly because I didn't see any locking in them
till I noticed it's just delegated to helper functions.

> > * shm_mq_attach() is talking about BackgroundWorkerHandles - is there a
> >   limitation that a bgworker has to be involved?

> [sensible stuff]
>
> Possibly there could be some similar mechanism to wait for a
> non-background-worker to stop, but I haven't thought much about what
> that would look like.

I wonder if we couldn't just generally store a "generation" number for
each PGPROC which is increased every time the slot gets
reused. Then one could simply check whether mq->mq_sender->generation ==
mq->mq_sender_generation.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2013-12-06 Thread Amit Kapila
On Fri, Dec 6, 2013 at 3:39 PM, Haribabu kommi
 wrote:
> On 06 December 2013 12:29 Amit Kapila wrote:
>> >> Note -
>> >> a. Performance is data is taken on my laptop, needs to be tested on
>> >> some better m/c b. Attached Patch is just a prototype of chunkwise
>> >> concept, code needs to be improved and decode
>> >> handling/test is pending.
>> >
>> > I ran the performance test on linux machine and attached the results
>> in the mail.
>>
>
> I ran the performance test on the above patches, including another patch
> which does snappy hash instead of the normal hash in the LZ algorithm. The
> performance readings, and the patch with snappy hash not including new data
> in compression, are attached in the mail.

   Thanks for taking the data.

> The chunk wise approach is giving good performance in most of the scenarios.

   Agreed; a summary of the data for LZ/chunkwise encoding, especially for
   non-compressible (hundred tiny fields, all changed/half changed) or less
   compressible data (hundred tiny fields, half nulled), w.r.t. CPU usage
   is as below:

   a. For hard disk, there is an overhead of 7~16% with LZ delta encoding
   and there is an overhead of 5~8% with Chunk wise encoding.

   b. For Tempfs (which means operating on RAM as disk), there is an
   overhead of 19~26% with LZ delta encoding and there is an overhead of
   9~18% with Chunk wise encoding.

There might be some variation in the data (in your last mail the overhead
of the chunkwise method for Tempfs was < 12%),
but in general the data suggests that chunkwise encoding has less
overhead than LZ encoding for non-compressible data,
and for other data it is better or equal.

Now, I think we have below options for this patch:
a. If the performance overhead for the worst case is acceptable (we can
try to reduce it some more, but I don't think the gain will be something
drastic), then this can be done without any flag/option.
b. Have it with a table-level option to enable/disable WAL compression.
c. Drop this patch, as there is some performance overhead for worst cases.
d. Go back and work more on it, if there are any further suggestions
for improvement.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] dblink performance regression

2013-12-06 Thread Fabrízio de Royes Mello
On Fri, Dec 6, 2013 at 2:50 AM, Joe Conway  wrote:
>
> On 12/05/2013 07:16 PM, Fabrízio de Royes Mello wrote:
> > Hi Joe, how are you?
>
> Hi Fabrizio -- great to hear from you! I'm well.
>

:-)


> > Well, when Tom sent this email I was reviewing your patch and the
> > main suggestion is about use of 'pg_encoding_to_char' too... ;-)
> >
> > The attached patch with my review!
>
> Awesome, thanks for the review!
>

You're welcome!

Greetings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] [PATCH] Add transforms feature

2013-12-06 Thread Hannu Krosing
On 12/06/2013 07:25 AM, Peter Eisentraut wrote:
> On Tue, 2013-11-26 at 18:10 +0100, Dimitri Fontaine wrote:
>> The problem is installing a set of extensions where some of them are
>> already using the new transform feature and some of them are not. We
>> need a way to cater with that, I think.
> Here is an idea.  Add a GUC that basically says something like
> use_transforms = on|off.  You can then attach that to individual
> functions, which is the right granularity, because only the function
> knows whether its code expects transforms or not.  But you can use the
> full power of GUC to configure it any way you want.
It would require the "old" extensions to be modified to have
(SET use_transforms = off) in all their definitions so that they
would not accidentally be called with use_transforms = on
from "new" functions, but otherwise it seems like a good way to get
it done without too much effort.

> The only thing this doesn't give you is per-argument granularity, but I
> think the use cases for that are slim, and we don't have a good existing
> mechanism to attach arbitrary attributes to function arguments.
Agreed. And we are quite unlikely to need multiple transforms for
the same type in the same language.
> Actually, I'd take this two steps further.
>
> First, make this parameter per-language, so something like
> plpython.use_transforms.  Then it's up to the language implementation
> how they want to deal with this.  A future new language could just
> ignore the whole issue and require transforms from the start.
I do not really see much need for this, as it will need to be set for
each individual function anyway.

Actually what we could do is just declare a new "language" for this
so functions declared with "LANGUAGE plpythonu" will not be using
transforms and those with "LANGUAGE plpythonuxf" will use it.

This would only need one extra function to be defined in the source
code, namely the compile function for the "new" language.


Some not-transforms-related wild ideas follow :)

Adding a new language would also be a good way to fix the bad syntax
choices in pl/python which require code manipulation before compiling.

I came up with this idea after seeing how pl/jsv8 supports multiple
JavaScript-based languages (standard JavaScript, CoffeeScript, LiveScript)
from the same codebase.

Taking the plv8 ideas further we could also create a JavaScript-based
"sandboxed python" using things like Skulpt and Pyjamas, which compile
Python source code to the JavaScript VM and inherit all the sandboxing of
v8.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Feature request: Logging SSL connections

2013-12-06 Thread Dr. Andreas Kunert

That seems useful.  Do we need more information, like whether a client
certificate was presented, or what ciphers were used?


Yes, please show ciphersuite and TLS version too.  Andreas, you can use my
recent \conninfo patch as template:

   
https://github.com/markokr/postgres/commit/7d1b27ac74643abd15007cc4ec0b56ba92b39d90

Also, please show the SSL level for walsender connections.  It's
quite important to know whether they are using SSL or not.

But I think the 'bits' output is unnecessary, as its cipher strength
is known from the ciphersuite.  Perhaps it can be removed from \conninfo too.


A new patch is attached. I added the ciphersuite and TLS version like 
shown in your template (minus the 'bits' output). I also added the SSL 
information for walsender connections, but due to a missing test setup I 
cannot test that part.


Anything else missing?

--
Andreas
--- postinit.c.orig	2013-12-06 10:26:47.773341120 +0100
+++ postinit.c	2013-12-06 10:37:30.185894650 +0100
@@ -220,6 +220,26 @@
 
 	if (Log_connections)
 	{
+#ifdef USE_SSL
+		if (am_walsender)
+			if (port->ssl > 0)
+ereport(LOG,
+		(errmsg("replication connection authorized: user=%s SSL(protocol: %s, cipher: %s)",
+				port->user_name, SSL_get_version(port->ssl), SSL_get_cipher(port->ssl))));
+			else
+ereport(LOG,
+		(errmsg("replication connection authorized: user=%s",
+port->user_name)));
+		else
+			if (port->ssl > 0)
+ereport(LOG,
+		(errmsg("connection authorized: user=%s database=%s SSL(protocol: %s, cipher: %s)",
+				port->user_name, port->database_name, SSL_get_version(port->ssl), SSL_get_cipher(port->ssl))));
+			else
+ereport(LOG,
+		(errmsg("connection authorized: user=%s database=%s",
+port->user_name, port->database_name)));
+#else
 		if (am_walsender)
 			ereport(LOG,
 	(errmsg("replication connection authorized: user=%s",
@@ -228,6 +248,7 @@
 			ereport(LOG,
 	(errmsg("connection authorized: user=%s database=%s",
 			port->user_name, port->database_name)));
+#endif
 	}
 
 	set_ps_display("startup", false);



Re: [HACKERS] [PATCH] Add transforms feature

2013-12-06 Thread Dimitri Fontaine
Peter Eisentraut  writes:
> Here is an idea.  Add a GUC that basically says something like
> use_transforms = on|off.  You can then attach that to individual
> functions, which is the right granularity, because only the function
> knows whether its code expects transforms or not.  But you can use the
> full power of GUC to configure it any way you want.

+1

> The only thing this doesn't give you is per-argument granularity, but I
> think the use cases for that are slim, and we don't have a good existing
> mechanism to attach arbitrary attributes to function arguments.

+1

> Actually, I'd take this two steps further.
>
> First, make this parameter per-language, so something like
> plpython.use_transforms.  Then it's up to the language implementation
> how they want to deal with this.  A future new language could just
> ignore the whole issue and require transforms from the start.

I'm not sure about this level of granularity, but why not.

> Second, depending on the choice of the language, this parameter could
> take three values: ignore | if available | require.  That would allow
> users to set various kinds of strictness, for example if they want to be
> alerted that a language cannot deal with a particular type.

My understanding is that it can always deal with any particular type if
you consider text-based input/output, right?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] spinlocks storm bug

2013-12-06 Thread Pavel Stehule
2013/12/6 Andres Freund 

> On 2013-12-06 07:22:27 +0100, Pavel Stehule wrote:
> > I have a report of a critical bug (the database is temporarily
> > unavailable; a restart is necessary).
>
> > PostgreSQL 9.2.4,
> > 24 CPU
> > 140G RAM
> > SSD disc for all
> >
> >
> > The database is under high load. There are a few databases with a very
> > high number of similar simple statements. When the application produces
> > higher load, the number of active connections increases to about 300-600.
> >
> > At some moment the described event starts - there is minimal IO, and
> > all CPUs are at 100%.
> >
> > Perf result shows:
> >354246.00 93.0% s_lock
> > /usr/lib/postgresql/9.2/bin/postgres
> > 10503.00  2.8% LWLockRelease
> >  /usr/lib/postgresql/9.2/bin/postgres
> >  8802.00  2.3% LWLockAcquire
>
> > We tried to limit connections to 300, but I am not sure whether this
> > issue is related to some Postgres bug.
>
> We've seen this issue repeatedly now. None of the times it turned out to
> be a bug, but just limitations in postgres' scalability. If you can, I'd
> strongly suggest trying to get postgres binaries compiled with
> -fno-omit-frame-pointer installed to check which locks are actually
> contended.
> My bet is BufMappingLock.
>
> There's a CF entry about changing our lwlock implementation to scale
> better...
>
>
One missing piece of info - the customer's staff reduced shared_buffers from
30G to 5G without success. The database is about 20G.

Regards

Pavel




> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-12-06 Thread Etsuro Fujita
I wrote:
> Amit Khandekar wrote:
> > Yes, I agree that rather than looking at the bitmap heap scan to track
> > the number of pages, we should look somewhere in the underlying index
> > scan. Yes, we should get a constant number of index pages regardless
> > of the actual parent table rows.

> I agree with you.  I'll modify the patch to show 1) the number of the
> exact/lossy pages in a TIDBitmap by examining the underlying index scan,
> not the number of these pages that have been fetched in the bitmap heap
> scan, and 2) the memory requirement.

Though at first I agreed on this, while working on it I started to think that 
the information about (2) is enough for tuning work_mem.  Here are examples using a 
version under development, where "Bitmap Memory Usage" means the (peak) memory 
space used by a TIDBitmap, and "Desired" means the memory required to guarantee 
non-lossy storage of a TID set, which is shown only when the TIDBitmap has been 
lossified.  (work_mem = 1MB.)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.0001 and 
0.0005 ;
 QUERY PLAN
-
 Bitmap Heap Scan on demo  (cost=77.14..12142.69 rows=3581 width=42) (actual 
time=1.748..53.203 rows=4112 loops=1)
   Recheck Cond: ((col2 >= 0.0001::double precision) AND (col2 <= 
0.0005::double precision))
   Bitmap Memory Usage: 315kB
   ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..76.25 rows=3581 width=0) 
(actual time=1.113..1.113 rows=4112 loops=1)
 Index Cond: ((col2 >= 0.0001::double precision) AND (col2 <= 
0.0005::double precision))
 Total runtime: 53.804 ms
(6 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.05 ;
 QUERY PLAN
-
 Bitmap Heap Scan on demo  (cost=8307.41..107635.14 rows=391315 width=42) 
(actual time=84.818..2709.015 rows=400172 loops=1)
   Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.05::double 
precision))
   Rows Removed by Index Recheck: 8815752
   Bitmap Memory Usage: 1025kB (desired 20573kB)
   ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..8209.58 rows=391315 
width=0) (actual time=83.664..83.664 rows=400172 loops=1)
 Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 
0.05::double precision))
 Total runtime: 2747.088 ms
(7 rows)

Should we look at (1) as well?  (Honestly, I don't know what to show about (1) 
when using a bitmap scan on the inside of a nestloop join.  For memory usage 
and desired memory I think the maximum values would be fine.)  I'd like to 
hear your opinion again.

Thanks,

Best regards,
Etsuro Fujita





Re: [HACKERS] spinlocks storm bug

2013-12-06 Thread Andres Freund
On 2013-12-06 07:22:27 +0100, Pavel Stehule wrote:
> I have a report of a critical bug (the database is temporarily
> unavailable; a restart is necessary).

> PostgreSQL 9.2.4,
> 24 CPU
> 140G RAM
> SSD disc for all
> 
> 
> The database is under high load. There are a few databases with a very high
> number of similar simple statements. When the application produces higher
> load, the number of active connections increases to about 300-600.
> 
> At some moment the described event starts - there is minimal IO, and all
> CPUs are at 100%.
> 
> Perf result shows:
>354246.00 93.0% s_lock
> /usr/lib/postgresql/9.2/bin/postgres
> 10503.00  2.8% LWLockRelease
>  /usr/lib/postgresql/9.2/bin/postgres
>  8802.00  2.3% LWLockAcquire

> We tried to limit connections to 300, but I am not sure whether this issue
> is related to some Postgres bug.

We've seen this issue repeatedly now. None of the times did it turn out to
be a bug - just limitations in postgres' scalability. If you can, I'd
strongly suggest trying to get postgres binaries compiled with
-fno-omit-frame-pointer installed to check which locks are actually
contended.
My bet is BufMappingLock.

There's a CF entry about changing our lwlock implementation to scale
better...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Backup throttling

2013-12-06 Thread Boszormenyi Zoltan

Hi,

On 2013-12-05 15:36, Antonin Houska wrote:

On 12/02/2013 02:23 PM, Boszormenyi Zoltan wrote:

Hi,

I am reviewing your patch.

Thanks. New version attached.


I have reviewed and tested it and marked it as ready for committer.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [HACKERS] ANALYZE sampling is too good

2013-12-06 Thread Andres Freund
On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote:
> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> other full-table scans? That doesn't really help Greg, because his
> complaint is mostly that a fresh ANALYZE is too expensive, but it
> could be an interesting, albeit risky approach.

What I've been thinking of is

a) making it piggyback on the scans vacuum is doing, instead of doing
separate ones all the time (where possible; analyze needs to run more
frequently). Currently, with quite some likelihood, the cache will be gone
again by the time we revisit.

b) make analyze incremental. In lots of bigger tables most of the table
is static - and we actually *do* know that, thanks to the visibility map.
So keep a rawer form of what ends up in the catalogs around somewhere,
chunked by the region of the table each statistic is from. Every time a
part of the table changes, re-sample only that part. Then recompute the
aggregate.
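A minimal sketch of that idea (illustrative Python only, not PostgreSQL
internals; the class, its methods, and the "most-common values" stand-in for
real catalog statistics are all hypothetical):

```python
from collections import Counter

class IncrementalStats:
    """Region-chunked statistics sketch: keep raw per-region samples,
    re-sample only regions the visibility map says have changed, then
    recompute the table-level aggregate from the per-region pieces."""

    def __init__(self):
        self.region_samples = {}  # region id -> list of sampled values

    def resample_region(self, region_id, rows):
        # Called only for regions whose all-visible bit was cleared;
        # untouched regions keep their previously retained sample.
        self.region_samples[region_id] = list(rows)

    def aggregate(self):
        # Recompute table-level stats (here: most-common values)
        # from the retained raw per-region samples.
        merged = Counter()
        for sample in self.region_samples.values():
            merged.update(sample)
        return merged.most_common(3)
```

The point of keeping the raw per-region form around is that a change to one
region only costs a re-sample of that region plus a cheap re-aggregation,
instead of a fresh scan over the whole table.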

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ANALYZE sampling is too good

2013-12-06 Thread Amit Kapila
On Fri, Dec 6, 2013 at 7:22 AM, Peter Geoghegan  wrote:
> On Thu, Dec 5, 2013 at 3:50 PM, Josh Berkus  wrote:
>> There are fairly well researched algorithms for block-based sampling
>> which estimate for the skew introduced by looking at consecutive rows in
>> a block.  In general, a minimum sample size of 5% is required, and the
>> error is no worse than our current system.  However, the idea was shot
>> down at the time, partly because I think other hackers didn't get the math.
>
> I think that this certainly warrants revisiting. The benefits would be
> considerable.
>
> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> other full-table scans? That doesn't really help Greg, because his
> complaint is mostly that a fresh ANALYZE is too expensive, but it
> could be an interesting, albeit risky approach.

Is only a fresh ANALYZE costly, or are consecutive ones equally costly?

Doing it in some background operation might not be a bad idea, but doing it
during backend query execution (seq scan) might add overhead to query
response time, especially if part or most of the table's data is in RAM.
In that case the overhead of the actual read might not be very high, but
the computation for analyze (like sorting) will make it costly.
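The block-based sampling quoted above can be sketched as follows
(illustrative Python only; `read_block` and the parameters are hypothetical
stand-ins, not anything in the ANALYZE code):

```python
import random

def block_sample(read_block, n_blocks, n_sample_blocks, seed=None):
    """Block-based sampling sketch: pick random blocks and keep every
    row in each chosen block. Reading whole blocks is what makes the
    sample cheap, and also what introduces the intra-block correlation
    ("skew") that the estimators discussed above must correct for."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n_blocks), n_sample_blocks)
    rows = []
    for b in sorted(chosen):  # visit blocks in sequential-ish I/O order
        rows.extend(read_block(b))
    return rows
```

Note that the per-sample cost here is I/O-shaped (blocks read), which is why
the follow-on computation (sorting, counting distincts) can dominate once
the data is already in RAM.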


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Further questions about WITHIN GROUP:

 Tom> I believe that the spec requires that the "direct" arguments of
 Tom> an inverse or hypothetical-set aggregate must not contain any
 Tom> Vars of the current query level.

False.

The spec requires that the direct arguments must not contain _ungrouped_
columns (see ), but nowhere are grouped
columns forbidden.

 Tom> They don't manage to say that in plain English, of course, but
 Tom> in the  case the behavior is defined
 Tom> in terms of this sub-select:

 Tom>   ( SELECT 0, SK1, ..., SKK
 Tom> FROM TNAME UNION ALL
 Tom> VALUES (1, VE1, ..., VEK) )

"TNAME" there is not the input table or an alias for it, but rather
the specific subset of rows to which the grouping operation is being
applied (after applying a FILTER if any).

We're in the General Rules here, not the Syntax Rules, so this is
describing _how to compute the result_ rather than a syntactic
transformation of the input.
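For an ascending single-column sort key, the General Rules' computation for
the hypothetical-set rank reduces to something very small (illustrative
Python sketch, not spec text or PostgreSQL code):

```python
def hypothetical_rank(group_values, hypothetical):
    """rank(x) WITHIN GROUP (ORDER BY v) for one group: conceptually
    UNION ALL the group's rows (tagged 0) with the hypothetical row
    (tagged 1) and take the RANK() of the hypothetical row in the
    combined set. For ascending order that is one plus the number of
    group values strictly less than x."""
    return 1 + sum(1 for v in group_values if v < hypothetical)
```

This also illustrates the point about TNAME: the computation runs over the
rows of one grouping group (after any FILTER), not over the whole input
table.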

In any event, going by the docs on the web, Oracle does not forbid
grouped columns there (their wording is "This expr must be constant
within each aggregation group.") MSSQL seems to require a literal
constant, but that's obviously not per spec. IBM doesn't seem to
have it in db2 for linux, but some of their other products have it
and include examples of using grouped vars: see

http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_hypothetical_set_family_syntax.html

So I'm going to say that the majority opinion is on my side here.

 Tom> So that's all fine, but the patch seems a bit confused about it.

The patch treats vars in the direct args exactly as it would treat
them anywhere else where they must be ungrouped.

[snip a bunch of stuff based on false assumptions]

 Tom> What I now think we should do about the added pg_aggregate
 Tom> columns is to have:

 Tom> aggnfixedargs int number of fixed arguments, excluding any
 Tom>   hypothetical ones (always 0 for normal aggregates)

 Tom> aggkind   char'n' for normal aggregate,
 Tom>   'o' for ordered set function,
 Tom>   'h' for hypothetical-set function

I don't see any obvious problem with this.

 Tom> with the parsing rules that given

 Tom>  agg( n fixed arguments ) WITHIN GROUP ( ORDER BY k sort specifications )

 Tom> 1. WITHIN GROUP is disallowed for normal aggregates.

(This is what the submitted patch does.)

 Tom> 2. For an ordered set function, n must equal aggnfixedargs.  We
 Tom> treat all n fixed arguments as contributing to the aggregate's
 Tom> result collation, but ignore the sort arguments.

That doesn't work for getting a sensible collation out of
percentile_disc applied to a collatable type. (Which admittedly is an
extension to the spec, which allows only numeric and interval, but it
seems to me to be worth having.)

 Tom> 3. For a hypothetical-set function, n must equal aggnfixedargs
 Tom> plus k, and we match up type and collation info of the last k
 Tom> fixed arguments with the corresponding sort arguments.  The
 Tom> first n-k fixed arguments contribute to the aggregate's result
 Tom> collation, the rest not.

The submitted patch does essentially this but taking the number of
non-variadic args in place of the suggested aggnfixedargs. Presumably
in your version the latter would be derived from the former?

 Tom> Reading back over this email, I see I've gone back and forth
 Tom> between using the terms "direct args" and "fixed args" for the
 Tom> evaluate-once stuff to the left of WITHIN GROUP.  I guess I'm
 Tom> not really sold on either terminology, but we need something we
 Tom> can use consistently in the code and docs.  The spec is no help,
 Tom> it has no generic term at all for these args.  Does anybody else
 Tom> have a preference, or maybe another suggestion entirely?

We (Atri and I) have been using "direct args", but personally I'm not
amazingly happy with it. Documentation for other dbs tends to just call
them "arguments", and refer to the WITHIN GROUP expressions as "ordering
expressions" or similar.

-- 
Andrew (irc:RhodiumToad)

