Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Mark Kirkwood

On 18/10/14 07:13, Josh Berkus wrote:

CK,

Before we go any further on this, how is Vitesse currently licensed?
Last time we talked it was still proprietary.  If it's not being
open-sourced, we likely need to take discussion off this list.



+1

Guys, you need to 'fess up on the licensing!

Regards

Mark




Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread CK Tan
Indeed! A big part of our implementation is based on the Neumann
paper. There are also a few other papers that influenced our
implementation:

A. Ailamaki, D. DeWitt, M. Hill, D. Wood. DBMSs On A Modern Processor:
Where Does Time Go?

Peter Boncz, Marcin Zukowski, Niels Nes. MonetDB/X100:
Hyper-Pipelining Query Execution

M. Zukowski et al. Super-Scalar RAM-CPU Cache Compression

Of course, we had to adapt much of the design to Postgres to build
something that stands harmoniously alongside the rest of the Postgres
system, and also to take care that we can merge easily with future
versions of Postgres -- the implementation needs to be as non-invasive
as possible.

Regards,
-cktan

On Fri, Oct 17, 2014 at 8:40 PM, David Gould  wrote:
> On Fri, 17 Oct 2014 13:12:27 -0400
> Tom Lane  wrote:
>
>> CK Tan  writes:
>> > The bigint sum,avg,count case in the example you tried has some 
>> > optimization. We use int128 to accumulate the bigint instead of numeric in 
>> > pg. Hence the big speed up. Try the same query on int4 for the improvement 
>> > where both pg and vitessedb are using int4 in the execution.
>>
>> Well, that's pretty much cheating: it's too hard to disentangle what's
>> coming from JIT vs what's coming from using a different accumulator
>> datatype.  If we wanted to depend on having int128 available we could
>> get that speedup with a couple hours' work.
>>
>> But what exactly are you "compiling" here?  I trust not the actual data
>> accesses; that seems far too complicated to try to inline.
>>
>>   regards, tom lane
>>
>>
>
> I don't have any inside knowledge, but from the presentation given at the
> recent SFPUG followed by a bit of google-fu I think these papers are
> relevant:
>
>   http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
>   http://sites.computer.org/debull/A14mar/p3.pdf
>
> -dg
>
> --
> David Gould  510 282 0869 da...@sonic.net
> If simplicity worked, the world would be overrun with insects.
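[Editor's note: to make the int128 trick quoted above concrete — summing
bigint values into a 128-bit integer cannot overflow for any realistic
input count, so the per-row work is a single wide add instead of a
numeric (arbitrary-precision) operation. A minimal standalone sketch,
assuming a compiler with __int128 such as gcc or clang; this is not
Vitesse's or Postgres's actual code:

#include <stdint.h>
#include <stdio.h>

typedef struct
{
    __int128    sum;        /* cannot overflow before ~2^64 bigint inputs */
    int64_t     count;
} Int128AggState;

/* transition step for sum/avg/count over bigint */
static void
int128_accum(Int128AggState *state, int64_t value)
{
    state->sum += value;    /* one wide-integer add; no numeric overhead */
    state->count++;
}

int
main(void)
{
    Int128AggState state = {0, 0};

    for (int64_t i = 1; i <= 1000000; i++)
        int128_accum(&state, i);

    /* the result still fits in int64 for this particular input */
    printf("sum=%lld count=%lld\n",
           (long long) state.sum, (long long) state.count);
    return 0;
}
]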




Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-17 Thread David E. Wheeler
On Oct 17, 2014, at 3:18 PM, Tom Lane  wrote:

> Yeah, exactly.  Personally I'm *not* for this, but if we do it we should
> do it consistently: every comma-separated list in the SQL syntax should
> work the same.

PL/pgSQL, too, I presume.

D



Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-17 Thread MauMau

Hello,

As I said in the previous mail, I looked into the latest PCI DSS 3.0 to find 
out whether and how pgaudit fulfills the requirements related to auditing. 
I believe that even the initial version of pgaudit needs to have enough 
functionality to meet the requirements of some well-known standard, in 
order to demonstrate that PostgreSQL provides a really useful auditing 
feature.  I chose PCI DSS because it seems popular worldwide.


Most requirement items are met, but some are not, or I'm not sure
whether and/or how they are.  I only listed the items on which I'd like
to ask for opinions.  I'm sorry I can't offer good suggestions, but I
hope this report will lead to good discussion and a better auditing
feature we can boast of.  Of course, I'll suggest ideas when I think of
something.



[requirement]
3.4 Render PAN unreadable anywhere it
is stored (including on portable digital
media, backup media, and in logs) by
using any of the following approaches:
...
3.4.d Examine a sample of audit logs to confirm that the PAN is
rendered unreadable or removed from the logs.

[my comment]
Do the audit log entries always hide the actual bind parameter values (with 
$1, $2, etc.) if the application uses parameterized SQL statements?  Should 
we advise users to use parameterized statements in the pgaudit 
documentation? (I think so)
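[Editor's note: for illustration, this is what "parameterized" means at
the client: with libpq, PQexecParams() sends the values out-of-line, so
the statement text the server sees contains only $1-style placeholders.
A sketch with hypothetical table/column names; whether bind values
additionally show up in a given log line still depends on the
logging/audit configuration:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* connection info from environment */
    const char *pan = "4111111111111111";   /* sensitive value, sent out-of-line */
    const char *paramValues[1] = {pan};
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* The SQL string itself never contains the PAN, only $1. */
    res = PQexecParams(conn,
                       "SELECT masked FROM cards WHERE pan = $1",
                       1,       /* number of parameters */
                       NULL,    /* let the server infer parameter types */
                       paramValues,
                       NULL, NULL,  /* text-format parameters */
                       0);      /* text-format results */

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}
]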




[requirement]
10.2.2 Verify all actions taken by any individual with root or
administrative privileges are logged.
10.2.6 Verify the following are logged:
. Initialization of audit logs
. Stopping or pausing of audit logs.

[my comment]
The database superuser can hide his activity with "SET pgaudit.log =
'';", but this SET is itself audit-logged.  Therefore, I think these
requirements are met, because the superuser's suspicious behavior
(hiding his activity) is recorded.  Do you think this interpretation is
valid?




[requirement]
10.2.3 Verify access to all audit trails is logged.

Malicious users often attempt to alter audit logs to
hide their actions, and a record of access allows
an organization to trace any inconsistencies or
potential tampering of the logs to an individual
account. Having access to logs identifying
changes, additions, and deletions can help retrace
steps made by unauthorized personnel.

[my comment]
I'm totally unsure how this can be fulfilled.



[requirement]
10.2.4 Verify invalid logical access attempts are logged.

Malicious individuals will often perform multiple
access attempts on targeted systems. Multiple
invalid login attempts may be an indication of an
unauthorized user’s attempts to “brute force” or
guess a password.

[my comment]
Login attempts also need to be audit-logged to meet this requirement.



[requirement]
10.2.5.a Verify use of identification and authentication
mechanisms is logged.

Without knowing who was logged on at the time of
an incident, it is impossible to identify the
accounts that may have been used. Additionally,
malicious users may attempt to manipulate the
authentication controls with the intent of
bypassing them or impersonating a valid account.

[my comment]
Can we consider this met because pgaudit records the session user name?



[requirement]
10.3 Record at least the following audit
trail entries for all system components for
each event:
10.3.4 Verify success or failure indication is included in log
entries.
10.3.5 Verify origination of event is included in log entries.

[my comment]
This doesn't seem to be fulfilled because the failure of SQL statements and 
the client address are not part of the audit log entry.




[requirement]
10.5 Secure audit trails so they cannot
be altered.
10.5 Interview system administrators and examine system
configurations and permissions to verify that audit trails are
secured so that they cannot be altered as follows:
10.5.1 Only individuals who have a job-related need can view
audit trail files.
Adequate protection of the audit logs includes
strong access control (limit access to logs based
on “need to know” only), and use of physical or
network segregation to make the logs harder to
find and modify.
Promptly backing up the logs to a centralized log
server or media that is difficult to alter keeps the
logs protected even if the system generating the
logs becomes compromised.
10.5.2 Protect audit trail files from
unauthorized modifications.

[my comment]
I don't know how to achieve these, because the DBA (who starts/stops the 
server) can modify and delete server log files without any record.




[requirement]
10.6 Review logs and security events for
all system components to identify
anomalies or suspicious activity.
Note: Log harvesting, parsing, and
alerting tools may be used to meet this
Requirement.
The log review process does not have to be
manual. The use of log harvesting, parsing, and
alerting tools can help facilitate the process by
identifying log events that need to be reviewed.

[my comment]
What commercial and open source products are well known as the "log
harvesting, parsing, and alerting tools" mentioned here?
Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread David Gould
On Fri, 17 Oct 2014 13:12:27 -0400
Tom Lane  wrote:

> CK Tan  writes:
> > The bigint sum,avg,count case in the example you tried has some 
> > optimization. We use int128 to accumulate the bigint instead of numeric in 
> > pg. Hence the big speed up. Try the same query on int4 for the improvement 
> > where both pg and vitessedb are using int4 in the execution.
> 
> Well, that's pretty much cheating: it's too hard to disentangle what's
> coming from JIT vs what's coming from using a different accumulator
> datatype.  If we wanted to depend on having int128 available we could
> get that speedup with a couple hours' work.
> 
> But what exactly are you "compiling" here?  I trust not the actual data
> accesses; that seems far too complicated to try to inline.
> 
>   regards, tom lane
> 
> 

I don't have any inside knowledge, but from the presentation given at the
recent SFPUG followed by a bit of google-fu I think these papers are
relevant:

  http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
  http://sites.computer.org/debull/A14mar/p3.pdf

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Tom Lane
I wrote:
> Because it needs up-to-date min/max values in order to avoid being
> seriously misled about selectivities of values near the endpoints.
> See commit 40608e7f949fb7e4025c0ddd5be01939adc79eec.

BTW, on re-reading that code I notice that it will happily seize upon
the first suitable index ("first" in OID order), regardless of how many
lower-order columns that index has got.  This doesn't make any difference
I think for get_actual_variable_range's own purposes, because it's only
expecting to touch the endmost index page regardless.  However, in light
of Marko's complaint maybe we should teach it to check all the indexes
and prefer the matching one with fewest columns?  It would only take a
couple extra lines of code, and probably not that many added cycles
considering we're going to do an index access of some sort.  But I'm
not sure if it's worth any extra effort --- I think in his example
case, there wasn't any narrower index anyway.
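[Editor's note: in outline, the preference logic is indeed trivial — a
runnable sketch with stand-in structures for illustration, not actual
planner code:

#include <stdio.h>
#include <string.h>

/* Stand-in for the planner's per-index data -- purely illustrative. */
typedef struct
{
    const char *name;
    const char *leading_column;
    int         ncolumns;
} IndexDesc;

/*
 * Among indexes whose leading column matches "var", prefer the one
 * with the fewest columns, rather than the first match found.
 */
static const IndexDesc *
pick_index(const IndexDesc *indexes, int n, const char *var)
{
    const IndexDesc *best = NULL;

    for (int i = 0; i < n; i++)
    {
        if (strcmp(indexes[i].leading_column, var) != 0)
            continue;
        if (best == NULL || indexes[i].ncolumns < best->ncolumns)
            best = &indexes[i];
    }
    return best;
}

int
main(void)
{
    /* foo_abc comes first, as if it had the lowest OID */
    IndexDesc   indexes[] = {
        {"foo_abc", "a", 3},
        {"foo_a", "a", 1},
        {"foo_bc", "b", 2},
    };
    const IndexDesc *best = pick_index(indexes, 3, "a");

    printf("chose %s\n", best ? best->name : "(none)");
    return 0;
}
]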

regards, tom lane




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, Oct 17, 2014 at 06:15:37PM -0400, Tom Lane wrote:
>> Those stats were perfectly valid: what the planner is looking for is
>> accurate minimum and maximum values for the index's leading column, and
>> that's what it got.  You're correct that a narrower index could have given
>> the same results with a smaller disk footprint, but the planner got the
>> results it needed from the index you provided for it to work with.

> Uh, why is the optimizer looking at the index on a,b,c and not just the
> stats on column a, for example?  I am missing something here.

Because it needs up-to-date min/max values in order to avoid being
seriously misled about selectivities of values near the endpoints.
See commit 40608e7f949fb7e4025c0ddd5be01939adc79eec.

regards, tom lane




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Bruce Momjian
On Fri, Oct 17, 2014 at 06:15:37PM -0400, Tom Lane wrote:
> Marko Tiikkaja  writes:
> > On 10/17/14, 11:59 PM, Tom Lane wrote:
> >> Well, the index might've been getting used in queries too in a way that
> >> really only involved the first column.  I think you're solving the wrong
> >> problem here.  The right problem is how to identify indexes that are
> >> being used in a way that doesn't exploit all the columns.
> 
> > I'm not sure I agree with that.  Even if there was some information the 
> > planner could have extracted out of the index by using all columns (thus 
> > appearing "fully used" in these hypothetical new statistics), I still 
> > would've wanted the index gone.  But in this particular case, an index 
> > on foo(a) alone was not selective enough and it would have been a bad 
> > choice for practically every query, so I'm not sure what good those 
> > statistics were in the first place.
> 
> Those stats were perfectly valid: what the planner is looking for is
> accurate minimum and maximum values for the index's leading column, and
> that's what it got.  You're correct that a narrower index could have given
> the same results with a smaller disk footprint, but the planner got the
> results it needed from the index you provided for it to work with.

Uh, why is the optimizer looking at the index on a,b,c and not just the
stats on column a, for example?  I am missing something here.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-17 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Oct 16, 2014 at 12:01:28PM -0400, Stephen Frost wrote:
> > This started out as a request for a non-superuser to be able to review
> > the log files without needing access to the server.  Now, things can
> > certainly be set up on the server to import *all* logs and then grant
> > access to a non-superuser, but generally it's "I need to review the log
> > from X to Y" and not *all* logs need to be stored or kept in PG.
> 
> Why is this patch showing up before being discussed?  You are having to
> back into the discussion because of this.

For my part, I didn't actually see it as being a questionable use-case
from the start..  That was obviously incorrect, though I didn't know
that previously.  The general idea has been discussed a couple of times
before, at least as far back as 2005:

http://www.postgresql.org/message-id/430f78e0.9020...@cs.concordia.ca

It's also a feature available in other databases (at least MySQL and
Oracle, but I'm pretty sure others also).

I can also recall chatting with folks about it a couple of times over
the years at various conferences.  Still, perhaps it would have been
better to post about the idea before the patch, but hindsight is often
20/20.

Thanks!

Stephen




Re: [HACKERS] Optimizer on sort aggregate

2014-10-17 Thread Peter Geoghegan
On Fri, Oct 17, 2014 at 6:25 PM, Feng Tian  wrote:
> I feel sorting string as if it is bytea is particularly interesting.  I am
> aware Peter G's patch and I think it is great, but for this sort agg case,
> first, I believe it is still slower than sorting bytea, and second, Peter
> G's patch depends on data.   A common long prefix will make the patch less
> effective, which is probably not so uncommon (for example, URL with a domain
> prefix).  I don't see any downside of sort bytea, other than lost the
> interest ordering.

FWIW, that's probably less true than you'd think. Using Robert's test program:

pg@hamster:~/code$ ./strxfrm-binary en_US.UTF-8 "http://www.something"
"http://www.something" ->
131f1f1b221e1a18101f131419120109090909090909090909090909090909010909090909090909090909090909090901053d014201420444
(59 bytes)
pg@hamster:~/code$ ./strxfrm-binary en_US.UTF-8 "http://www.another"
"http://www.another" ->
131f1f1b220c191a1f13101d01090909090909090909090909090901090909090909090909090909090901053d014201420444
(53 bytes)

So the first six bytes of the first string are 0x131F1F1B221E, and of
the second 0x131F1F1B220C: the last byte shown differs. That's because
of the way the Unicode collation algorithm [1] works; there is often a
significantly greater concentration of entropy in the first 8 bytes of
the transformed blob than in the first 8 bytes of the raw C string that
memcmp() would see - punctuation characters and so on are not actually
described at the primary weight
level. If we can get even a single byte to somewhat differentiate each
string, we can still win by a very significant amount - just not an
enormous amount. The break even point is hard to characterize exactly,
but I'm quite optimistic that a large majority of real-world text
sorts will see at least some benefit, while a smaller majority will be
much, much faster.

[1] http://www.unicode.org/reports/tr10/#Notation
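[Editor's note: for anyone wanting to reproduce this without Robert's
test program, a rough standalone equivalent — an assumption about what
that tool does; exact bytes depend on your libc version and locale:

#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Print the strxfrm() blob for a string as hex. */
static void
print_xfrm(const char *s)
{
    char        buf[1024];
    size_t      len = strxfrm(buf, s, sizeof(buf));

    if (len >= sizeof(buf))
    {
        fprintf(stderr, "buffer too small\n");
        return;
    }
    printf("\"%s\" -> ", s);
    for (size_t i = 0; i < len; i++)
        printf("%02x", (unsigned char) buf[i]);
    printf(" (%zu bytes)\n", len);
}

int
main(void)
{
    /* pick the collation the comparison will use */
    if (setlocale(LC_COLLATE, "en_US.UTF-8") == NULL)
    {
        fprintf(stderr, "locale en_US.UTF-8 not available\n");
        return 1;
    }
    print_xfrm("http://www.something");
    print_xfrm("http://www.another");
    return 0;
}
]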
-- 
Peter Geoghegan




Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-17 Thread Bruce Momjian
On Thu, Oct 16, 2014 at 12:01:28PM -0400, Stephen Frost wrote:
> This started out as a request for a non-superuser to be able to review
> the log files without needing access to the server.  Now, things can
> certainly be set up on the server to import *all* logs and then grant
> access to a non-superuser, but generally it's "I need to review the log
> from X to Y" and not *all* logs need to be stored or kept in PG.

Why is this patch showing up before being discussed?  You are having to
back into the discussion because of this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Optimizer on sort aggregate

2014-10-17 Thread Feng Tian
Hi, David,

Yes, switching the sort order would lose an interesting ordering, so if
the user dictates ORDER BY t, i, the planner needs to fall back on its
cost model.  Estimating the cardinality of a GROUP BY is a much bigger
topic than this thread.

I feel sorting strings as if they were bytea is particularly
interesting.  I am aware of Peter G's patch and I think it is great,
but for this sort-agg case, first, I believe it is still slower than
sorting bytea, and second, Peter G's patch depends on the data.  A
common long prefix will make the patch less effective, and such
prefixes are probably not so uncommon (for example, URLs with a common
domain prefix).  I don't see any downside to sorting as bytea, other
than losing the interesting ordering.

Thanks,
Feng



On Fri, Oct 17, 2014 at 4:36 PM, David Rowley  wrote:

> On Sat, Oct 18, 2014 at 5:10 AM, Feng Tian  wrote:
>
>> Hi,
>>
>> Consider the following queries.
>>
>> create table t(i int, j int, k int, t text);
>> insert into t select i, i % 100, i % 1000, 'AABBCCDD' || i from
>> generate_series(1, 1000000) i;
>>
>> ftian=# explain select count(distinct j) from t group by t, i;
>>QUERY PLAN
>> 
>>  GroupAggregate  (cost=158029.84..178029.84 rows=1000000 width=22)
>>    ->  Sort  (cost=158029.84..160529.84 rows=1000000 width=22)
>>  Sort Key: t, i
>>  ->  Seq Scan on t  (cost=0.00..17352.00 rows=1000000 width=22)
>> (4 rows)
>>
>>
>> The query,
>> select count(distinct j) from t group by t, i;
>>
>> runs for 35 seconds.  However, if I change the query to,
>> select count(distinct j) from t group by i, t;  -- note switching the
>> ordering
>> select count(distinct j) from t group by decode(t, 'escape'), i; --
>> convert t to bytea
>>
>> Run times are just about 5 and 6.5 seconds.  The reason is clear:
>> comparing a string with collation is slow, which is well understood by
>> pg hackers.  However, here the sort order is forced by the planner, not
>> the user.  The planner can do the following optimizations:
>>
>> 1. for the sort we generated for sort agg, planner can switch column
>> ordering, put int before string,
>> 2. for the sort we generated for sort agg, use bytea compare instead of
>> string compare.
>>
>> They would bring a big improvement to this common query.  Is this
>> something reasonable?
>>
>>
> It seems like it might be worth looking into, but I think it's likely more
> complex than simply sorting the GROUP BY columns narrowest-first.
>
> If the query was:
>
> select count(distinct j) from t group by t, i order by t, i;
>
> Then if that were rewritten to make the group by i, t, the planner
> would need to perform an additional sort on t, i to get the correct
> order by for the final result, which may or may not be faster; it would
> depend on how many groups there were to sort. Though I guess you could
> possibly just skip the optimisation if the next node up didn't need any
> specific ordering.
>
> I also wonder whether taking only the column's width into account is
> enough. For example if the 'i' column only had 1 distinct value, then
> performing the group by on 'i' first wouldn't help at all. Keep in mind
> that the columns could be much closer in width than in your example, e.g.
> int and bigint. Perhaps you'd need to include some sort of heuristics to
> look at the number of distinct values and the width, and form some sort of
> weighted estimates about which column to put first.
>
> Regards
>
> David Rowley
>
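[Editor's note: the collation cost Feng describes is easy to see in
isolation with a micro-benchmark comparing strcoll() against memcmp()
on strings sharing a long common prefix — standard C; absolute numbers
will vary with libc and locale:

#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int
main(void)
{
    const char *a = "AABBCCDD123456";
    const char *b = "AABBCCDD123457";
    const int   n = 10 * 1000 * 1000;
    volatile int sink = 0;      /* keeps the calls from being optimized away */
    clock_t     t0;

    setlocale(LC_COLLATE, "en_US.UTF-8");

    t0 = clock();
    for (int i = 0; i < n; i++)
        sink += strcoll(a, b);  /* locale-aware comparison */
    printf("strcoll: %.2fs\n", (double) (clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (int i = 0; i < n; i++)
        sink += memcmp(a, b, 15);   /* raw byte comparison, as for bytea */
    printf("memcmp:  %.2fs\n", (double) (clock() - t0) / CLOCKS_PER_SEC);

    (void) sink;
    return 0;
}
]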


Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-10-17 Thread Tom Lane
David Rowley  writes:
> On Fri, Oct 17, 2014 at 3:34 AM, Tom Lane  wrote:
>> I don't want to go there.  It would be a lot better to expend the effort
>> on a better regression testing infrastructure that wouldn't *need*
>> bitwise-identical output across platforms.  (mysql is ahead of us in that
>> department: they have some hacks for selective matching of the output.)

> Perhaps we could introduce some sort of wildcard matching in the regression
> tests, so that we could stick something like:
>  Execution time: * ms
> into the expected results.  Though we'd probably need to come up with some
> wildcard character which is a bit less common than *.

I was imagining that we might as well allow regexp matching, so you could
be as specific as

   Execution time: \d+\.\d+ ms

if you had a mind to.  But with or without that, it would be difficult to
pick a meta-character that never appears in expected-output files today.
What we'd probably want to do (again, I'm stealing ideas from what I
remember of the mysql regression tests) is add metasyntax to switch
between literal and wildcard/regexp matching.  So perhaps an expected
file could contain something like

-- !!match regexp
... expected output including regexps ...
-- !!match literal
... normal expected output here

Not sure how we get there without writing our own diff engine though :-(.
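[Editor's note: for a feel of what the matching side might look like,
here is a self-contained sketch using POSIX regexes (where \d+ would be
spelled [0-9]+); as the message says, the hard part is the diff engine
around it:

#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum { MATCH_LITERAL, MATCH_REGEXP } MatchMode;

/*
 * Compare one line of actual output against one line of expected
 * output, honoring the current "-- !!match" mode.
 */
static bool
line_matches(const char *expected, const char *actual, MatchMode mode)
{
    regex_t     re;
    char        pattern[1024];
    bool        ok;

    if (mode == MATCH_LITERAL)
        return strcmp(expected, actual) == 0;

    /* anchor the pattern so it must match the whole line */
    snprintf(pattern, sizeof(pattern), "^(%s)$", expected);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;           /* bad pattern in the expected file */
    ok = (regexec(&re, actual, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

int
main(void)
{
    printf("%d\n", line_matches(" Execution time: [0-9]+\\.[0-9]+ ms",
                                " Execution time: 0.123 ms",
                                MATCH_REGEXP));
    return 0;
}
]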

regards, tom lane




Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-10-17 Thread David Rowley
On Fri, Oct 17, 2014 at 3:34 AM, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2014-10-16 10:06:59 -0400, Tom Lane wrote:
> >> No, it wasn't.  I'm not convinced either that that patch will get in at
> >> all, or that it has to have regression tests of that particular form,
> >> or that such a switch would be sufficient to make such tests platform
> >> independent.
>
> > Why should the EXPLAIN ANALYZE output without timing information be less
> > consistent for sensibly selected cases than EXPLAIN itself?
>
> To take just one example, the performance numbers that get printed for
> a sort, such as memory consumption, are undoubtedly platform-dependent.
> Maybe your idea of "sensibly selected cases" excludes sorting, but
> I don't find such an argument terribly convincing.  I think if we go
> down this road, we are going to end up with an EXPLAIN that has one
> hundred parameters turning on and off tiny pieces of the output, none
> of which are of any great use for anything except the regression tests.
> I don't want to go there.  It would be a lot better to expend the effort
> on a better regression testing infrastructure that wouldn't *need*
> bitwise-identical output across platforms.  (mysql is ahead of us in that
> department: they have some hacks for selective matching of the output.)
>
>
I saw this, and I was about to ask the same question as Andres, but I
think I now see what you're worried about.  Next we'd need a flag to
disable external disk sort sizes too...

Perhaps we could introduce some sort of wildcard matching in the
regression tests, so that we could stick something like:

 Execution time: * ms

into the expected results.  Though we'd probably need to come up with
some wildcard character which is a bit less common than *.

It might be hard to generate a useful diff when a test fails, but maybe
it'd be good enough to run the two files through this wildcard-matching
programme and then do a plain diff if that fails.

Regards

David Rowley


Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-17 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
> On 10/16/14 9:45 AM, Stephen Frost wrote:
> > Alright, coming back to this, I have to ask- how are matviews different
> > from views from the SQL standard's perspective?  I tried looking through
> > the standard to figure it out (and I admit that I probably missed
> > something), but the only thing appears to be a statement in the standard
> > that (paraphrased) "functions are run with the view is queried" and that
> > strikes me as a relatively minor point..
> 
> To me, the main criterion is that you cannot DROP VIEW a materialized view.

That is an entirely correctable situation.  We don't require 'DROP
UNLOGGED TABLE'.

> Generally, if the information schema claims that a
> view/table/function/etc. named "foo" exists, then I should be able to
> operate on "foo" using the basic operations for a
view/table/function/etc. of that name.  I think DROP VIEW is a
> basic operation for a view.  Others might disagree.

This strikes me as a reason to allow DROP VIEW and perhaps other
operations against a matview, not as a reason why matviews aren't views.

> More subtly, if we claim that a materialized view is a view, then we
> cannot have asynchronously updated materialized views, because then we
> have different semantics.

This is, at least, a reason I can understand, though I'm not sure I see
it as sufficient to say matviews are so different from views that they
shouldn't be listed as such.

Thanks,

Stephen




Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-17 Thread Stephen Frost
* Nicolas Barbier (nicolas.barb...@gmail.com) wrote:
> 2014-10-16 Stephen Frost :
> 
> > Alright, coming back to this, I have to ask- how are matviews different
> > from views from the SQL standard's perspective?
> 
> Matviews that are always up to date when you access them are
> semantically exactly the same as normal views. Matviews that can get
> out of date, however, are not.

And when we have matviews which can be kept up to date...?

Thanks,

Stephen




Re: [HACKERS] Optimizer on sort aggregate

2014-10-17 Thread David Rowley
On Sat, Oct 18, 2014 at 12:35 PM, Tatsuo Ishii  wrote:

> > The query,
> > select count(distinct j) from t group by t, i;
> >
> > runs for 35 seconds.  However, if I change the query to,
> > select count(distinct j) from t group by i, t;  -- note switching the
> > ordering
> > select count(distinct j) from t group by decode(t, 'escape'), i; --
> > convert t to bytea
> >
> > Run times are just about 5 and 6.5 seconds.  The reason is clear:
> > comparing a string with collation is slow, which is well understood by
> > pg hackers.  However, here the sort order is forced by the planner, not
> > the user.  The planner can do the following optimizations,
>
> Interesting. I got the following result:
>
> test=# explain analyze select count(distinct j) from t group by t, i;
> QUERY PLAN
>
> --
>  GroupAggregate  (cost=137519.84..157519.84 rows=1000000 width=22) (actual
> time=1332.937..2431.238 rows=1000000 loops=1)
>    Group Key: t, i
>    ->  Sort  (cost=137519.84..140019.84 rows=1000000 width=22) (actual
> time=1332.922..1507.413 rows=1000000 loops=1)
>  Sort Key: t, i
>  Sort Method: external merge  Disk: 33232kB
>  ->  Seq Scan on t  (cost=0.00..17352.00 rows=1000000 width=22)
> (actual time=0.006..131.406 rows=1000000 loops=1)
>  Planning time: 0.031 ms
>  Execution time: 2484.271 ms
> (8 rows)
>
> Time: 2484.520 ms
>
> test=# explain analyze select count(distinct j) from t group by i, t;
> QUERY PLAN
>
> --
>  GroupAggregate  (cost=137519.84..157519.84 rows=1000000 width=22) (actual
> time=602.510..1632.087 rows=1000000 loops=1)
>    Group Key: i, t
>    ->  Sort  (cost=137519.84..140019.84 rows=1000000 width=22) (actual
> time=602.493..703.274 rows=1000000 loops=1)
>  Sort Key: i, t
>  Sort Method: external sort  Disk: 33240kB
>  ->  Seq Scan on t  (cost=0.00..17352.00 rows=1000000 width=22)
> (actual time=0.014..129.213 rows=1000000 loops=1)
>  Planning time: 0.176 ms
>  Execution time: 1685.575 ms
> (8 rows)
>
> Time: 1687.641 ms
>
> Not such a big difference here (maybe because I use an SSD), but there
> is still about a 50% difference in execution time.  Note that I
> disabled locale support.
>
>
I think this is more likely down to your locale settings; if I do:

 create table t(i int, j int, k int, t text collate "C");

the GROUP BY t, i query runs about 25% faster.

I've not looked at it yet, but Peter G's patch here
https://commitfest.postgresql.org/action/patch_view?id=1462 will quite
likely narrow the performance gap between the 2 queries.

Regards

David Rowley


Re: [HACKERS] Optimizer on sort aggregate

2014-10-17 Thread David Rowley
On Sat, Oct 18, 2014 at 5:10 AM, Feng Tian  wrote:

> Hi,
>
> Consider the following queries.
>
> create table t(i int, j int, k int, t text);
> insert into t select i, i % 100, i % 1000, 'AABBCCDD' || i from
> generate_series(1, 1000000) i;
>
> ftian=# explain select count(distinct j) from t group by t, i;
>QUERY PLAN
> 
>  GroupAggregate  (cost=158029.84..178029.84 rows=1000000 width=22)
>    ->  Sort  (cost=158029.84..160529.84 rows=1000000 width=22)
>  Sort Key: t, i
>  ->  Seq Scan on t  (cost=0.00..17352.00 rows=1000000 width=22)
> (4 rows)
>
>
> The query,
> select count(distinct j) from t group by t, i;
>
> runs for 35 seconds.  However, if I change the query to,
> select count(distinct j) from t group by i, t;  -- note switching the
> ordering
> select count(distinct j) from t group by decode(t, 'escape'), i; --
> convert t to bytea
>
> Run times are just about 5 and 6.5 seconds.  The reason is clear:
> comparing a string with collation is slow, which is well understood by
> pg hackers.  However, here the sort order is forced by the planner, not
> the user.  The planner can do the following optimizations:
>
> 1. for the sort we generated for sort agg, planner can switch column
> ordering, put int before string,
> 2. for the sort we generated for sort agg, use bytea compare instead of
> string compare.
>
> They would bring a big improvement to this common query.  Is this
> something reasonable?
>
>
It seems like it might be worth looking into, but I think it's likely more
complex than simply sorting the GROUP BY columns narrowest-first.

If the query was:

select count(distinct j) from t group by t, i order by t, i;

Then if that were rewritten to make the group by i, t, the planner would
need to perform an additional sort on t, i to get the correct order by
for the final result, which may or may not be faster; it would depend on
how many groups there were to sort. Though I guess you could possibly
just skip the optimisation if the next node up didn't need any specific
ordering.

I also wonder whether taking only the column's width into account is
enough. For example if the 'i' column only had 1 distinct value, then
performing the group by on 'i' first wouldn't help at all. Keep in mind
that the columns could be much closer in width than in your example,
e.g. int and bigint. Perhaps you'd need to include some sort of
heuristics to look at the number of distinct values and the width, and
form some sort of weighted estimates about which column to put first.

Regards

David Rowley


Re: [HACKERS] Optimizer on sort aggregate

2014-10-17 Thread Tatsuo Ishii
> The query,
> select count(distinct j) from t group by t, i;
> 
> runs for 35 seconds.  However, if I change the query to,
> select count(distinct j) from t group by i, t;  -- note switching the
> ordering
> select count(distinct j) from t group by decode(t, 'escape'), i; -- convert
> t to bytea
> 
> Run times are just about 5 and 6.5 seconds.  The reason is clear:
> comparing a string with collation is slow, which is well understood by
> pg hackers.  However, here the sort order is forced by the planner, not
> the user.  The planner can do the following optimizations,

Interesting. I got the following result:

test=# explain analyze select count(distinct j) from t group by t, i;
QUERY PLAN  
  
--
 GroupAggregate  (cost=137519.84..157519.84 rows=1000000 width=22) (actual
time=1332.937..2431.238 rows=1000000 loops=1)
   Group Key: t, i
   ->  Sort  (cost=137519.84..140019.84 rows=1000000 width=22) (actual
time=1332.922..1507.413 rows=1000000 loops=1)
 Sort Key: t, i
 Sort Method: external merge  Disk: 33232kB
 ->  Seq Scan on t  (cost=0.00..17352.00 rows=1000000 width=22) (actual
time=0.006..131.406 rows=1000000 loops=1)
 Planning time: 0.031 ms
 Execution time: 2484.271 ms
(8 rows)

Time: 2484.520 ms

test=# explain analyze select count(distinct j) from t group by i, t;
QUERY PLAN  
  
--
 GroupAggregate  (cost=137519.84..157519.84 rows=1000000 width=22) (actual
time=602.510..1632.087 rows=1000000 loops=1)
   Group Key: i, t
   ->  Sort  (cost=137519.84..140019.84 rows=1000000 width=22) (actual
time=602.493..703.274 rows=1000000 loops=1)
 Sort Key: i, t
 Sort Method: external sort  Disk: 33240kB
 ->  Seq Scan on t  (cost=0.00..17352.00 rows=1000000 width=22) (actual
time=0.014..129.213 rows=1000000 loops=1)
 Planning time: 0.176 ms
 Execution time: 1685.575 ms
(8 rows)

Time: 1687.641 ms

Not such a big difference here (maybe because I use an SSD), but there
is still about a 50% difference in execution time.  Note that I disabled
locale support.

> 1. for the sort we generated for sort agg, planner can switch column
> ordering, put int before string,
> 2. for the sort we generated for sort agg, use bytea compare instead of
> string compare.
> 
> They would bring a big improvement to this common query.  Is this
> something reasonable?

Not having looked at the sort and planner code closely, I guess the
planner does not recognize the cost difference between "external merge"
and "external sort", because the cost estimates for the sort step in
each plan are exactly the same (137519.84..140019.84).  However, I am
not sure whether we should fix the planner or our sort machinery, since
arguably the sort machinery should not expose such a big difference
between the two sort methods.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Marko Tiikkaja

On 10/18/14, 12:15 AM, Tom Lane wrote:

Marko Tiikkaja  writes:

I think there's a big difference between "this index was used to look up
stuff for planning" and "this index was used to answer queries quickly".


I think that's utter nonsense.


Well, you probably know a bit more about the optimizer than I do.  But I 
can't see a case where the stats provided by the index would be useful 
for choosing between two (or more) plans that don't use the index in the 
actual query.  If you're saying that there are such cases, then clearly 
there's something I don't know, and my thinking is wrong here.



.marko




Re: [HACKERS] Allow format 0000-0000-0000 in postgresql MAC parser

2014-10-17 Thread Ali Akbar
> It has been registered now
> (https://commitfest.postgresql.org/action/patch_view?id=1585). I've got
> an updated version of the patch with the documentation fix.
>

Looks like the patch is all good.  I'm marking it as ready for committer.

On a side note, I notice from http://en.wikipedia.org/wiki/MAC_address
that there are three numbering namespaces for MACs: MAC-48, EUI-48 and
EUI-64.  The last one is 64 bits (8 bytes) long.  Currently PostgreSQL's
macaddr is only 6 bytes long.  Should we change it to 8 bytes (not in
this patch, of course)?

Regards,
-- 
Ali Akbar


Re: [HACKERS] Superuser connect during smart shutdown

2014-10-17 Thread Jim Nasby

On 10/16/14, 11:46 PM, David G Johnston wrote:

Tom Lane-2 wrote

Something else mentioned was that once you start a smart shutdown you
have no good way (other than limited ps output) to see what the shutdown
is waiting on. I'd like to have some way to get back into the database
to see what's going on. Perhaps we could allow superusers to connect
while waiting for shutdown.


I think this idea is going to founder on the fact that the postmaster
has no way to tell whether an incoming connection is for a superuser.
You don't find that out until you've connected to a database and run
a transaction (so you can read pg_authid).  And by that point, you've
already had a catastrophic impact on any attempt to shut things down.


This quote from the documentation seems suspect in light of your comment...

"While backup mode is active, new connections will still be allowed, but
only to superusers (this exception allows a superuser to connect to
terminate online backup mode)."

http://www.postgresql.org/docs/9.3/interactive/server-shutdown.html


check_hba() does

	if (!check_role(port->user_name, roleid, hba->roles))
		continue;

And check_role(char **newval, void **extra, GucSource source) does

	is_superuser = ((Form_pg_authid) GETSTRUCT(roleTup))->rolsuper;
	...
	myextra->roleid = roleid;
	myextra->is_superuser = is_superuser;
	*extra = (void *) myextra;

So presumably with some changes to how we're calling check_role() we could 
determine if port->user_name is a superuser.

I also like the idea of specifying that a connection should be terminated by a 
smart shutdown; I agree that'd be useful for monitoring tools and what-not. If 
folks agree with that I can take a stab at implementing it.

Since I tend to be paranoid, I like smart being the default, but it seems
I'm in the minority there.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-17 Thread Tom Lane
Jim Nasby  writes:
> As I originally posted, if we're going to do this I think we should do it 
> *EVERYWHERE* commas are used as delimiters, save COPY input and output. Or we 
> should at least get close to doing it everywhere. I think the only way things 
> could get more annoying is if we accepted extra commas in SELECT but not in 
> CREATE TABLE (as one example).

> To me completeness is more important than whether we do it or not; that said, 
> I like the idea (as well as supporting leading extra commas).

Yeah, exactly.  Personally I'm *not* for this, but if we do it we should
do it consistently: every comma-separated list in the SQL syntax should
work the same.

regards, tom lane




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Tom Lane
Marko Tiikkaja  writes:
> On 10/17/14, 11:59 PM, Tom Lane wrote:
>> Well, the index might've been getting used in queries too in a way that
>> really only involved the first column.  I think you're solving the wrong
>> problem here.  The right problem is how to identify indexes that are
>> being used in a way that doesn't exploit all the columns.

> I'm not sure I agree with that.  Even if there was some information the 
> planner could have extracted out of the index by using all columns (thus 
> appearing "fully used" in these hypothetical new statistics), I still 
> would've wanted the index gone.  But in this particular case, an index 
> on foo(a) alone was not selective enough and it would have been a bad 
> choice for practically every query, so I'm not sure what good those 
> statistics were in the first place.

Those stats were perfectly valid: what the planner is looking for is
accurate minimum and maximum values for the index's leading column, and
that's what it got.  You're correct that a narrower index could have given
the same results with a smaller disk footprint, but the planner got the
results it needed from the index you provided for it to work with.

> I think there's a big difference between "this index was used to look up 
> stuff for planning" and "this index was used to answer queries quickly". 

I think that's utter nonsense.  Even if there were any validity to the
position, it wouldn't be enough to justify doubling the stats footprint
in order to track system-driven accesses separately from query-driven
ones.

regards, tom lane




Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-17 Thread Jim Nasby

On 10/16/14, 11:48 PM, David Johnston wrote:

We might as well allow a final trailing (or initial leading) comma on a
values list at the same time:




Do you know that this feature would be proprietary and not based on
ANSI SQL?  Any user that uses this feature and later wants to port to
another database will hate it.

I've got no complaint if "at the same time" means that neither behavior is
ever implemented...


As I originally posted, if we're going to do this I think we should do it 
*EVERYWHERE* commas are used as delimiters, save COPY input and output. Or we 
should at least get close to doing it everywhere. I think the only way things 
could get more annoying is if we accepted extra commas in SELECT but not in 
CREATE TABLE (as one example).

To me completeness is more important than whether we do it or not; that said, I 
like the idea (as well as supporting leading extra commas).
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Marko Tiikkaja

On 10/17/14, 11:59 PM, Tom Lane wrote:

Marko Tiikkaja  writes:

On 10/17/14, 11:47 PM, Tom Lane wrote:

Marko Tiikkaja  writes:

So what I'd like to have is a way to be able to distinguish between
indexes being used to answer queries, and ones being only used for stats
lookups during planning.



Why?  Used is used.



Because I don't need a 30GB index on foo(a,b,c) to look up statistics.
If I ever have a problem, I can replace it with a 5GB one on foo(a).


Well, the index might've been getting used in queries too in a way that
really only involved the first column.  I think you're solving the wrong
problem here.  The right problem is how to identify indexes that are
being used in a way that doesn't exploit all the columns.


I'm not sure I agree with that.  Even if there was some information the 
planner could have extracted out of the index by using all columns (thus 
appearing "fully used" in these hypothetical new statistics), I still 
would've wanted the index gone.  But in this particular case, an index 
on foo(a) alone was not selective enough and it would have been a bad 
choice for practically every query, so I'm not sure what good those 
statistics were in the first place.


I think there's a big difference between "this index was used to look up 
stuff for planning" and "this index was used to answer queries quickly". 
 In my mind the first one belongs to the category "this index was 
considered", and the latter is "this index was actually useful".  But 
maybe I'm not seeing the big picture here.



.marko




Re: [HACKERS] Issue with mkdtemp() in port.h

2014-10-17 Thread Tom Lane
Caleb Welton  writes:
> A little while back some users started complaining that the contrib module
> I develop (MADlib) was failing to build with the following error:
> /usr/include/postgresql/9.2/server/port.h:480:32: error: declaration of
> 'char* mkdtemp(char*)' has a different exception specifier
> /usr/include/stdlib.h:663:14: error: from previous declaration 'char*
> mkdtemp(char*) throw ()'
> After some research I've tracked this down to the following commit from ~4
> months ago:
> https://github.com/postgres/postgres/commit/a919937f112eb2f548d5f9bd1b3a7298375e6380

Hm.  Looks like the extern that commit added should have been protected by
 #ifndef HAVE_MKDTEMP
similarly to the other cases where we conditionally provide a substitute
function.
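
I.e., roughly (a sketch of the suggested fix, mirroring how the other
substitute functions are guarded):

/* in port.h: declare our substitute only when the platform lacks mkdtemp() */
#ifndef HAVE_MKDTEMP
extern char *mkdtemp(char *path);
#endif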

regards, tom lane




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Jim Nasby

On 10/17/14, 4:49 PM, Marko Tiikkaja wrote:

On 10/17/14, 11:47 PM, Tom Lane wrote:

Marko Tiikkaja  writes:

This week we had one of the most annoying problems I've ever encountered
with postgres.  We had a big index on multiple columns, say,  foo(a, b,
c).  According to pg_stat_all_indexes the index was being used *all the
time*.  However, after looking into our queries more closely, it turns
out that it was only being used to look up statistics for the foo.a
column to estimate merge scan viability during planning.  But this took
hours for two people to track down.



So what I'd like to have is a way to be able to distinguish between
indexes being used to answer queries, and ones being only used for stats
lookups during planning.


Why?  Used is used.


Because I don't need a 30GB index on foo(a,b,c) to look up statistics. If I 
ever have a problem, I can replace it with a 5GB one on foo(a).


That problem can exist with user queries too. Perhaps it would be better to 
find a way to count scans that didn't use all the fields in the index.

I do also see value in differentiating planning use from real query processing; 
not doing that can certainly cause confusion. What I don't know is if the added 
stats bloat is worth it. If we do go down that road, I think it'd be better to 
add an indicator to EState. Aside from allowing stats for all planning access, 
it should make it less likely that someone adds a new access path and forgets 
to mark it as internal (especially if the added field defaults to an invalid 
value).
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Tom Lane
Marko Tiikkaja  writes:
> On 10/17/14, 11:47 PM, Tom Lane wrote:
>> Marko Tiikkaja  writes:
>>> So what I'd like to have is a way to be able to distinguish between
>>> indexes being used to answer queries, and ones being only used for stats
>>> lookups during planning.

>> Why?  Used is used.

> Because I don't need a 30GB index on foo(a,b,c) to look up statistics. 
> If I ever have a problem, I can replace it with a 5GB one on foo(a).

Well, the index might've been getting used in queries too in a way that
really only involved the first column.  I think you're solving the wrong
problem here.  The right problem is how to identify indexes that are
being used in a way that doesn't exploit all the columns.  Which is not
necessarily wrong in itself --- what you'd want is to figure out when the
last column(s) are *never* used.  The existing stats aren't terribly
helpful for that, I agree.

regards, tom lane




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Marko Tiikkaja

On 10/17/14, 11:47 PM, Tom Lane wrote:

Marko Tiikkaja  writes:

This week we had one of the most annoying problems I've ever encountered
with postgres.  We had a big index on multiple columns, say,  foo(a, b,
c).  According to pg_stat_all_indexes the index was being used *all the
time*.  However, after looking into our queries more closely, it turns
out that it was only being used to look up statistics for the foo.a
column to estimate merge scan viability during planning.  But this took
hours for two people to track down.



So what I'd like to have is a way to be able to distinguish between
indexes being used to answer queries, and ones being only used for stats
lookups during planning.


Why?  Used is used.


Because I don't need a 30GB index on foo(a,b,c) to look up statistics. 
If I ever have a problem, I can replace it with a 5GB one on foo(a).



.marko




Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Tom Lane
Marko Tiikkaja  writes:
> This week we had one of the most annoying problems I've ever encountered 
> with postgres.  We had a big index on multiple columns, say,  foo(a, b, 
> c).  According to pg_stat_all_indexes the index was being used *all the 
> time*.  However, after looking into our queries more closely, it turns 
> out that it was only being used to look up statistics for the foo.a 
> column to estimate merge scan viability during planning.  But this took 
> hours for two people to track down.

> So what I'd like to have is a way to be able to distinguish between 
> indexes being used to answer queries, and ones being only used for stats 
> lookups during planning.

Why?  Used is used.

regards, tom lane




[HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-17 Thread Marko Tiikkaja

Hi,

This week we had one of the most annoying problems I've ever encountered 
with postgres.  We had a big index on multiple columns, say,  foo(a, b, 
c).  According to pg_stat_all_indexes the index was being used *all the 
time*.  However, after looking into our queries more closely, it turns 
out that it was only being used to look up statistics for the foo.a 
column to estimate merge scan viability during planning.  But this took 
hours for two people to track down.


So what I'd like to have is a way to be able to distinguish between 
indexes being used to answer queries, and ones being only used for stats 
lookups during planning.  Perhaps the easiest way would be adding a new 
column or two into pg_stat_all_indexes, which we would increment in 
get_actual_variable_range() when fetching data.


Any thoughts?


.marko




Re: [HACKERS] json function volatility

2014-10-17 Thread Tom Lane
Alvaro Herrera  writes:
> Merlin Moncure wrote:
>> On Fri, Oct 17, 2014 at 3:03 PM, Andrew Dunstan  wrote:
>>> Following up something Pavel wrote, I notice that json_agg() and
>>> json_object_agg() are both marked as immutable, even though they invoke IO
>>> functions, while json_object is marked stable, even though it does not, and
>>> can probably be marked as immutable. Mea maxima culpa.

>>> I'm not sure what we should do about these things now. Is it a tragedy if we
>>> let these escape into the 9.4 release that way?

>> Is it too late to change them?

> One thing to consider is the catversion bump, which we don't want this
> late in the cycle.  Still, you could change the catalogs but not the
> version, and advise those with the older definitions to tweak the
> catalogs by hand if they need it.  I think we did this once.

I'm fairly sure that the system doesn't actually pay attention to the
volatility marking of aggregates, so there's no huge harm done by the
incorrect markings of those.  The incorrect marking of json_object might
be a small optimization block.

+1 for changing these in 9.4 without bumping catversion.  I don't think
we need to give advice for manual corrections, either: the risk of doing
that wrong probably outweighs the value of fixing it.

regards, tom lane




Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-17 Thread Jim Nasby

On 10/17/14, 4:31 AM, David G Johnston wrote:

Since the standard doesn't distinguish between read and write aspects of the
object types there isn't a safe way to add matviews to the information
schema that doesn't violate the intent of the provided view.  If the
application/users want to support/use PostgreSQL-specific features, they
have to be ready and able to use the catalog.


+1. If we add matviews to information_schema while they're not part of that 
standard then we're going to regret it at some point.

Perhaps the answer to this problem is to restart the old pg_newsysviews project.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] json function volatility

2014-10-17 Thread Alvaro Herrera
Merlin Moncure wrote:
> On Fri, Oct 17, 2014 at 3:03 PM, Andrew Dunstan  wrote:
> > Following up something Pavel wrote, I notice that json_agg() and
> > json_object_agg() are both marked as immutable, even though they invoke IO
> > functions, while json_object is marked stable, even though it does not, and
> > can probably be marked as immutable. Mea maxima culpa.
> >
> > I'm not sure what we should do about these things now. Is it a tragedy if we
> > let these escape into the 9.4 release that way?
> 
> Is it too late to change them?

One thing to consider is the catversion bump, which we don't want this
late in the cycle.  Still, you could change the catalogs but not the
version, and advise those with the older definitions to tweak the
catalogs by hand if they need it.  I think we did this once.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] json function volatility

2014-10-17 Thread Peter Geoghegan
On Fri, Oct 17, 2014 at 1:44 PM, Merlin Moncure  wrote:
> Is it too late to change them?  Either way, it seems fairly
> implausible someone would come up with a case to stick json_agg(), or
> any aggregate function really, inside of an index. So it's not exactly
> the crime of the century.

Indexes reject aggregates outright as acceptable expressions for
indexing. That's not the only issue, of course.

-- 
Peter Geoghegan




[HACKERS] Optimizer on sort aggregate

2014-10-17 Thread Feng Tian
Hi,

Consider the following queries.

create table t(i int, j int, k int, t text);
insert into t select i, i % 100, i % 1000, 'AABBCCDD' || i from
generate_series(1, 1000000) i;

ftian=# explain select count(distinct j) from t group by t, i;
                              QUERY PLAN
----------------------------------------------------------------------
 GroupAggregate  (cost=158029.84..178029.84 rows=1000000 width=22)
   ->  Sort  (cost=158029.84..160529.84 rows=1000000 width=22)
         Sort Key: t, i
         ->  Seq Scan on t  (cost=0.00..17352.00 rows=1000000 width=22)
(4 rows)


The query,
select count(distinct j) from t group by t, i;

runs for 35 seconds.  However, if I change the query to,
select count(distinct j) from t group by i, t;  -- note switching the
ordering
select count(distinct j) from t group by decode(t, 'escape'), i; -- convert
t to bytea

Run times are just about 5 and 6.5 seconds.  The reason is clear: comparing
strings with collation is slow, which is well understood by pg hackers.
However, here the sort ordering is forced by the planner, not the user.
The planner could do the following optimizations:

1. for the sort generated for the sort agg, the planner can switch the
column ordering, putting the int before the string;
2. for the sort generated for the sort agg, it can use a bytea comparison
instead of a string comparison.

Both would bring a big improvement to this common kind of query.  Is this
something reasonable?
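
For what it's worth, a workaround available today, assuming byte-wise
ordering is acceptable for grouping purposes, is to force the C collation
so the sort comparator degenerates to a plain memcmp():

    -- same grouping, assuming "C" ordering groups identically here:
    select count(distinct j) from t group by t collate "C", i;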

Thanks,


[HACKERS] Issue with mkdtemp() in port.h

2014-10-17 Thread Caleb Welton
A little while back some users started complaining that the contrib module
I develop (MADlib) was failing to build with the following error:

--
/usr/include/postgresql/9.2/server/port.h:480:32: error: declaration of
'char* mkdtemp(char*)' has a different exception specifier
/usr/include/stdlib.h:663:14: error: from previous declaration 'char*
mkdtemp(char*) throw ()'
--

After some research I've tracked this down to the following commit from ~4
months ago:

--
https://github.com/postgres/postgres/commit/a919937f112eb2f548d5f9bd1b3a7298375e6380
--


Which added a definition of mkdtemp into port.h that conflicts with the
definition in the system header files.

The following is a simple program that demonstrates the issue:

--
bash$ cat /tmp/foo.cpp
#include "postgres.h"
int main() { return 0; }

bash$ gcc -o foo foo.cpp -I`pg_config --includedir-server` -pedantic
In file included from /usr/pgsql-9.2/include/server/c.h:860,
 from /usr/pgsql-9.2/include/server/postgres.h:47,
 from foo.cpp:1:
/usr/pgsql-9.2/include/server/port.h:479: error: declaration of ‘char*
mkdtemp(char*)’ throws different exceptions
/usr/include/stdlib.h:663: error: from previous declaration ‘char*
mkdtemp(char*) throw ()’
--

Reproducible on ubuntu 14.04, centos6, and likely others.
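
One possible direction (a sketch only, assuming configure grows a
HAVE_MKDTEMP test; this is not necessarily how it should be fixed) would
be to guard the declaration in port.h:

    /* hypothetical guard; assumes configure defines HAVE_MKDTEMP when
     * the platform's libc already provides mkdtemp() */
    #ifndef HAVE_MKDTEMP
    extern char *mkdtemp(char *path);
    #endif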

Regards,
  Caleb


Re: [HACKERS] json function volatility

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 3:03 PM, Andrew Dunstan  wrote:
> Following up something Pavel wrote, I notice that json_agg() and
> json_object_agg() are both marked as immutable, even though they invoke IO
> functions, while json_object is marked stable, even though it does not, and
> can probably be marked as immutable. Mea maxima culpa.
>
> I'm not sure what we should do about these things now. Is it a tragedy if we
> let these escape into the 9.4 release that way?

Is it too late to change them?  Either way, it seems fairly
implausible someone would come up with a case to stick json_agg(), or
any aggregate function really, inside of an index. So it's not exactly
the crime of the century.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-17 Thread Peter Eisentraut
On 10/16/14 9:45 AM, Stephen Frost wrote:
> Alright, coming back to this, I have to ask- how are matviews different
> from views from the SQL standard's perspective?  I tried looking through
> the standard to figure it out (and I admit that I probably missed
> something), but the only thing appears to be a statement in the standard
> that (paraphrased) "functions are run with the view is queried" and that
> strikes me as a relatively minor point..

To me, the main criterion is that you cannot DROP VIEW a materialized view.

Generally, if the information schema claims that a
view/table/function/etc. named "foo" exists, then I should be able to
operate on "foo" using the basic operations for a
view/table/function/etc. of that name.  I think DROP VIEW is a
basic operation for a view.  Others might disagree.

More subtly, if we claim that a materialized view is a view, then we
cannot have asynchronously updated materialized views, because then we
have different semantics.

All of this is a judgement call in corner cases.  But I don't think this
is a corner case at all.
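
For what it's worth, the asymmetry is easy to demonstrate:

    CREATE MATERIALIZED VIEW mv AS SELECT 1 AS x;
    DROP VIEW mv;               -- fails: "mv" is not a view
    DROP MATERIALIZED VIEW mv;  -- this is the operation that works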



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] json function volatility

2014-10-17 Thread Andrew Dunstan
Following up something Pavel wrote, I notice that json_agg() and 
json_object_agg() are both marked as immutable, even though they invoke 
IO functions, while json_object is marked stable, even though it does 
not, and can probably be marked as immutable. Mea maxima culpa.


I'm not sure what we should do about these things now. Is it a tragedy 
if we let these escape into the 9.4 release that way?


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgcrypto: PGP signatures

2014-10-17 Thread Jeff Janes
On Mon, Sep 15, 2014 at 4:37 AM, Marko Tiikkaja  wrote:

>
> I've changed the patch back to ignore signatures when not using the
> decrypt_verify() functions in the attached.



Hi Marko,

This patch needs a rebase now that the armor header patch has been
committed.

Thanks,

Jeff


Re: [HACKERS] 2014-10 CommitFest

2014-10-17 Thread Kevin Grittner
Kevin Grittner  wrote:

> Unless someone else wants to pick it up, I'll manage this one.

> Since there was no previous warning, I'll allow a grace day for the
> cut-off on submissions for this CF; if it isn't registered in the
> web application 24 hours after this email, I will move it to the
> next CF.  So look for those patches that "fell through the cracks"
> without getting registered, and post the ones you've got.

[ rings bell ]

So the 2014-10 CF is now In Progress and the 2014-12 CF is Open.

There are 86 patches listed at this time:

Needs Review: 69
Waiting on Author: 5
Ready for Committer: 10
Committed: 2

We've got four weeks if we're going to return or commit all of
these by the scheduled end of the CF.  Since many of these have
already had significant review, my hope is that we can "hit the
ground running" and wrap things up in a timely fashion so we have a
break before the start of the *next* CF.

Please remember, if you are submitting patches, you should be
reviewing patches, too.  Otherwise development will hit a logjam.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] documentation update for doc/src/sgml/func.sgml

2014-10-17 Thread Andreas 'ads' Scherbaum

On 09/14/2014 06:32 PM, Peter Eisentraut wrote:

On 9/12/14 3:13 PM, Andreas 'ads' Scherbaum wrote:

Of course a general rule how to link to WP would be nice ...


I think Wikipedia links should be avoided altogether.  We can assume
that readers are technically proficient to look up general technical
concepts on their own using a reference system of their choice.

In cases where a link is warranted, it is better to construct a proper
bibliographic citation to the primary source material, such as an IEEE
standard or an academic paper, in a way that will stand the test of time.


That's a clear statement, and makes sense. Should be written down 
somewhere, so it can be found again.




Independent of that, it is actually not correct that "we use the IEEE's
rules", because "we" don't use any rules, that is up to the operating
system/platform.  While most platforms indeed do use the IEEE
floating-point standard more or less, some don't.  Section 8.1.3 tries to
point that out.


New version attached, WP link removed, wording changed.
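
As a concrete illustration of the behavior the new paragraph describes
(the float8 result may vary with platform and rounding mode, which is
exactly the point):

    SELECT round(2.5::numeric);  -- 3: numeric rounds halves away from zero
    SELECT round(2.5::float8);   -- often 2: follows the platform, commonly
                                 -- round-half-to-even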


Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 13c71af..d54cf58 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -924,6 +924,25 @@
 

 
+   
+For functions like round(), log() and sqrt() which run against
+either fixed-precision (NUMERIC) or floating-point numbers (e.g. REAL),
+note that the results of these operations will differ according
+to the input type due to rounding. This is most observable with
+round(), which can end up rounding down as well as up for
+any #.5 value. PostgreSQL's handling
+of floating-point values depends on the operating system, which
+may or may not follow the IEEE floating-point standard.
+  
+
+  
+The bitwise operators work only on integral data types, whereas
+the others are available for all numeric data types. The bitwise
+operators are also available for the bit string types
+bit and bit varying, as
+shown in .
+   
+
   
  shows functions for
 generating random numbers.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-17 Thread Andres Freund
On 2014-10-16 20:22:24 -0400, Robert Haas wrote:
> On Thu, Oct 16, 2014 at 6:53 PM, Andres Freund  wrote:
> > When using shared_buffers = 96GB there's a performance benefit, but not
> > huge:
> > master (f630b0dd5ea2de52972d456f5978a012436115e):   153621.8
> > master + LW_SHARED + lockless StrategyGetBuffer():  477118.4
> > master + LW_SHARED + lockless StrategyGetBuffer() + chash:  496788.6
> > master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 499562.7
> >
> > But with shared_buffers = 16GB:
> > master (f630b0dd5ea2de52972d456f5978a012436115e):   177302.9
> > master + LW_SHARED + lockless StrategyGetBuffer():  206172.4
> > master + LW_SHARED + lockless StrategyGetBuffer() + chash:  413344.1
> > master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 426405.8
> 
> Very interesting.  This doesn't show that chash is the right solution,
> but it definitely shows that doing nothing is the wrong solution.

Absolutely.


The thing worrying me most (but not all that much on an absolute scale)
about chash is that lots of the solutions to memory management it builds
are specific to it... And generalizing afterwards will be hard because
we'll have to prove that that general solution is as performant as the
special case code...

> It shows that, even with the recent bump to 128 lock manager
> partitions, and LW_SHARED on top of that, workloads that actually
> update the buffer mapping table still produce a lot of contention
> there. 

FWIW, I couldn't see much of a benefit without LW_SHARED even though I
tried a *few* things. The bottleneck simply is completely elsewhere.

> This hasn't been obvious to me from profiling, but the numbers
> above make it pretty clear.

So I'm not super surprised about that. There very well might be cases
where it was a bad bottleneck before, but I've not seen them.

> It also seems to suggest that trying to get rid of the memory barriers
> isn't a very useful optimization project.

I'm not yet convinced of that. Yes, it's not showing up hugely in the
numbers here, but that's simply because the workload is again completely
dominated by the kernel copying data for the read()s, GetSnapshotData(),
the buffer mapping cache misses and a few other familiar friends.

> We might get a couple of
> percent out of it, but it's pretty small potatoes, so unless it can be
> done more easily than I suspect, it's probably not worth bothering
> with.

I still think that moving the barrier to the reading side would be
simple (implementation wise) and correct for x86. If you think about it,
otherwise our spinlock implementation for x86 would be broken. As a
unlock is just a compiler barrier the full barrier on acquire better be
a full synchronization point. Am I missing something?
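
A sketch of the idea using PostgreSQL's barrier primitives, with shared
and use() as stand-ins (on x86 both pg_write_barrier() and
pg_read_barrier() compile down to pure compiler barriers, which is the
point; whether this suffices on weaker memory models is exactly the open
question):

    /* writer, already under the lock (the acquire was a full barrier) */
    shared->data = v;
    pg_write_barrier();     /* compiler barrier only on x86 (TSO) */
    shared->flag = true;

    /* reader: keep the data read from being reordered before the flag read */
    if (shared->flag)
    {
        pg_read_barrier();  /* also just a compiler barrier on x86 */
        use(shared->data);
    }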

> An approach I think might have more promise is to have bufmgr.c
> call the CHash stuff directly instead of going through buf_table.c.

I don't see all that much point in buf_table.c currently - on the other
hand it has lead to it replacing the buffer mapping being slightly
easier... Maybe it should just go to a header...

> Right now, for example, BufferAlloc() creates and initializes a
> BufferTag and passes a pointer to that buffer tag to BufTableLookup,
> which copies it into a BufferLookupEnt.  But it would be just as easy
> for BufferAlloc() to put the BufferLookupEnt on its own stack, and
> then you wouldn't need to copy the data an extra time.  Now a 20-byte
> copy isn't a lot, but it's completely unnecessary and looks easy to
> get rid of.

Worthwile to try.

> > I had to play with setting max_connections+1 sometimes to get halfway
> > comparable results for master - unaligned data otherwise causes weird
> > results. Without doing that the performance gap between master
> > 96/16G was even bigger. We really need to fix that...
> >
> > This is pretty awesome.
> 
> Thanks.  I wasn't quite sure how to test this or where the workloads
> that it would benefit would be found, so I appreciate you putting time
> into it.  And I'm really glad to hear that it delivers good results.

I wasn't sure either ;). Lucky that it played out so impressively. After
the first results I was nearly ready to send out a 'Meh.' type of
message ;)


Btw, CHashTableGetNode isn't exactly cheap. It shows up noticeably in
profiles. It results in several non-pipelineable instructions in an
already penalized section of the code... Replacing arena_stride by a
constant provided measurable improvements (no imul is required anymore,
instead you get shr;lea or so). I'm not sure how to deal with that. If
it shows up even on my quite new laptop, it'll be more so noticeable on
older x86 platforms.
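
To illustrate the codegen difference (names here are hypothetical, not
the actual chash symbols):

    /* run-time stride: the node address computation needs an imul */
    #define NODE_AT(arena, stride, i) \
        ((void *) ((char *) (arena) + (size_t) (i) * (stride)))

    /* compile-time power-of-two stride: the compiler strength-reduces
     * the multiply into shift/lea sequences */
    #define CHASH_STRIDE 64
    #define NODE_AT_CONST(arena, i) \
        ((void *) ((char *) (arena) + (size_t) (i) * CHASH_STRIDE))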

> I think it would be useful to plumb the chash statistics into the
> stats collector or at least a debugging dump of some kind for testing.

I'm not sure it's solid enough at this point to be in the stats
collector. But I surely would like to access it somehow. I'm
e.g. absolutely not s

Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread CK Tan
Happy to contribute to that decision :-)


On Fri, Oct 17, 2014 at 11:35 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2014-10-17 13:12:27 -0400, Tom Lane wrote:
>>> Well, that's pretty much cheating: it's too hard to disentangle what's
>>> coming from JIT vs what's coming from using a different accumulator
>>> datatype.  If we wanted to depend on having int128 available we could
>>> get that speedup with a couple hours' work.
>
>> I think doing that when configure detects int128 would make a great deal
>> of sense.
>
> Yeah, I was wondering about that myself: use int128 if available,
> else fall back on existing code path.
>
> regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index creation warning

2014-10-17 Thread Bruce Momjian
On Fri, Oct 17, 2014 at 12:56:52PM -0400, Tom Lane wrote:
> David G Johnston  writes:
> > The question is whether we explain the implications of not being WAL-logged
> > in an error message or simply state the fact and let the documentation
> > explain the hazards - basically just output:
> > "hash indexes are not WAL-logged and their use is discouraged"
> 
> +1.  The warning message is not the place to be trying to explain all the
> details.

OK, updated patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index e469b17..43df32f
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*** Indexes:
*** 474,480 
  Also, changes to hash indexes are not replicated over streaming or
  file-based replication after the initial base backup, so they
  give wrong answers to queries that subsequently use them.
! For these reasons, hash index use is presently discouraged.
 

  
--- 474,481 
  Also, changes to hash indexes are not replicated over streaming or
  file-based replication after the initial base backup, so they
  give wrong answers to queries that subsequently use them.
! Hash indexes are also not properly restored during point-in-time
! recovery.  For these reasons, hash index use is presently discouraged.
 

  
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
new file mode 100644
index 8a1cb4b..3c1e90e
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
*** DefineIndex(Oid relationId,
*** 491,497 
  
  	if (strcmp(accessMethodName, "hash") == 0)
  		ereport(WARNING,
! (errmsg("hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers")));
  
  	if (stmt->unique && !accessMethodForm->amcanunique)
  		ereport(ERROR,
--- 491,497 
  
  	if (strcmp(accessMethodName, "hash") == 0)
  		ereport(WARNING,
! (errmsg("hash indexes are not WAL-logged and their use is discouraged")));
  
  	if (stmt->unique && !accessMethodForm->amcanunique)
  		ereport(ERROR,
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
new file mode 100644
index a2bef7a..8326e94
*** a/src/test/regress/expected/create_index.out
--- b/src/test/regress/expected/create_index.out
*** DROP TABLE array_gin_test;
*** 2238,2250 
  -- HASH
  --
  CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops);
  --
  -- Test functional index
--- 2238,2250 
  -- HASH
  --
  CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
! WARNING:  hash indexes are not WAL-logged and their use is discouraged
  CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops);
! WARNING:  hash indexes are not WAL-logged and their use is discouraged
  CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
! WARNING:  hash indexes are not WAL-logged and their use is discouraged
  CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
! WARNING:  hash indexes are not WAL-logged and their use is discouraged
  -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops);
  --
  -- Test functional index
diff --git a/src/test/regress/expected/enum.out b/src/test/regress/expected/enum.out
new file mode 100644
index fa23b52..1a61a5b
*** a/src/test/regress/expected/enum.out
--- b/src/test/regress/expected/enum.out
*** DROP INDEX enumtest_btree;
*** 383,389 
  -- Hash index / opclass with the = operator
  --
  CREATE INDEX enumtest_hash ON enumtest USING hash (col);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  SELECT * FROM enumtest WHERE col = 'orange';
col   
  
--- 383,389 
  -- Hash index / opclass with the = operator
  --
  CREATE INDEX enumtest_hash ON enumtest USING hash (col);
! WARNING:  hash indexes are not WAL-logged and their use is discouraged

Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Tom Lane
Andres Freund  writes:
> On 2014-10-17 13:12:27 -0400, Tom Lane wrote:
>> Well, that's pretty much cheating: it's too hard to disentangle what's
>> coming from JIT vs what's coming from using a different accumulator
>> datatype.  If we wanted to depend on having int128 available we could
>> get that speedup with a couple hours' work.

> I think doing that when configure detects int128 would make a great deal
> of sense.

Yeah, I was wondering about that myself: use int128 if available,
else fall back on existing code path.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 1:21 PM, Peter Geoghegan  wrote:
> On Fri, Oct 17, 2014 at 11:08 AM, Feng Tian  wrote:
>> I agree using that using int128 in stock postgres will speed up things too.
>> On the other hand, that is only one part of the equation.   For example, if
>> you look at TPCH Q1, the int128 "cheating" does not kick in at all, but we
>> are 8x faster.
>
> I'm curious about how the numbers look when stock Postgres is built
> with the same page size as your fork. You didn't mention whether or
> not your Postgres numbers came from a standard build.

I downloaded the 8kb variant.

vitesse (median of 3):
postgres=# select count(*), sum(i*i), avg(i) from t;
  count  │sum │ avg
─┼┼─
 100 │ 338350 │ 50.5000
(1 row)

Time: 39.197 ms

stock (median of 3):
postgres=# select count(*), sum(i*i), avg(i) from t;
  count  │sum │ avg
─┼┼─
 100 │ 338350 │ 50.5000
(1 row)

Time: 667.362 ms

(stock int4 ops)
postgres=# select sum(1::int4) from t;
   sum
─
 100
(1 row)

Time: 75.265 ms

What I'm wondering is how complex the hooks are that tie the
technology in.   Unless a bsd licensed patch materializes, the
conversation (beyond the initial wow! factor) should probably focus
on possible integration points and/or implementation of the technology
into core in a general way.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Andres Freund
On 2014-10-17 13:12:27 -0400, Tom Lane wrote:
> Well, that's pretty much cheating: it's too hard to disentangle what's
> coming from JIT vs what's coming from using a different accumulator
> datatype.  If we wanted to depend on having int128 available we could
> get that speedup with a couple hours' work.

I think doing that when configure detects int128 would make a great deal
of sense. It's not like we'd save a great deal of complicated code by
removing the existing accumulator... We'd still have to return a
numeric, but that's likely lost in the noise cost wise.
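
A minimal sketch of the idea (assuming configure defines HAVE_INT128
when the compiler supports __int128; the function and type names here
are hypothetical):

    #ifdef HAVE_INT128
    typedef __int128 int128;

    /* transition step: accumulate int8 values in a 128-bit integer
     * instead of numeric; summing 64-bit inputs cannot realistically
     * overflow 128 bits */
    static inline int128
    int8_accum_int128(int128 state, int64 newval)
    {
        return state + newval;
    }
    /* the final function would still convert the state to numeric */
    #endif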

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Peter Geoghegan
On Fri, Oct 17, 2014 at 11:08 AM, Feng Tian  wrote:
> I agree using that using int128 in stock postgres will speed up things too.
> On the other hand, that is only one part of the equation.   For example, if
> you look at TPCH Q1, the int128 "cheating" does not kick in at all, but we
> are 8x faster.

I'm curious about how the numbers look when stock Postgres is built
with the same page size as your fork. You didn't mention whether or
not your Postgres numbers came from a standard build.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-17 Thread Andres Freund
On 2014-10-17 17:14:16 +0530, Amit Kapila wrote:
> On Tue, Oct 14, 2014 at 11:34 AM, Amit Kapila 
> wrote:
> >
> >
> > I am not sure why we are seeing a difference even though running
> > on the same m/c with the same configuration.
> 
> I have tried many times, but I could not get the numbers you have
> posted above with HEAD, however now trying with the latest version
> [1] posted by you, everything seems to be fine at this workload.
> The data at higher client count is as below:

I'll try to reproduce it next week. But I don't think it matters all
that much. Personally so far the performance numbers don't seem to
indicate much reason to wait any further. We can surely improve further,
but I don't see much reason to wait because of that.

>   HEAD – commit 494affb
> 
>   Shared_buffers=8GB; Scale Factor = 3000
> 
>   Client Count/No. Of Runs (tps)    64       128
>   Run-1                             271799   24
>   Run-2                             274341   245207
>   Run-3                             275019   252258
> 
> 
>   HEAD – commit 494affb + wait free lw_shared_v2
> 
>   Shared_buffers=8GB; Scale Factor = 3000
> 
>   Client Count/No. Of Runs (tps)    64       128
>   Run-1                             286209   274922
>   Run-2                             289101   274495
>   Run-3                             289639   273633

So here the results with LW_SHARED were consistently better, right? You
saw performance degradations here earlier?

> So I am planning to proceed further with the review/test of your
> latest patch.

> According to me, below things are left from myside:
> a. do some basic tpc-b tests with patch
> b. re-review latest version posted by you

Cool!

> I know that you have posted optimization into StrategyGetBuffer() in
> this thread, however I feel we can evaluate it separately unless you
> are of opinion that both the patches should go together.
> 
> [1]
> http://www.postgresql.org/message-id/20141010111027.gc6...@alap3.anarazel.de

No, I don't think they should go together - I wrote that patch because
it was the bottleneck in the possibly regressing test and I wanted to
see the full effect. Although I do think we should apply it ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Josh Berkus
CK,

Before we go any further on this, how is Vitesse currently licensed?
last time we talked it was still proprietary.  If it's not being
open-sourced, we likely need to take discussion off this list.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Fwd: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Feng Tian
Hi, Tom,

Sorry for double post to you.

Feng

-- Forwarded message --
From: Feng Tian 
Date: Fri, Oct 17, 2014 at 10:29 AM
Subject: Re: [HACKERS] Vitesse DB call for testing
To: Tom Lane 


Hi, Tom,

I agree using that using int128 in stock postgres will speed up things
too.  On the other hand, that is only one part of the equation.   For
example, if you look at TPCH Q1, the int128 "cheating" does not kick in at
all, but we are 8x faster.

I am not sure what you mean by "actual data access".  Data is still in
stock postgres format on disk.  We indeed JIT-ed all data field accesses
(tuple deforming).  To put things in perspective, I just timed select
count(*) and select count(l_orderkey) from tpch1.lineitem; our code is
bottlenecked by memory bandwidth and the difference is pretty much invisible.

Thanks,
Feng


ftian=# set vdb_jit = 0;
SET
Time: 0.155 ms
ftian=# select count(*) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 688.658 ms
ftian=# select count(*) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 690.753 ms
ftian=# select count(l_orderkey) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 819.452 ms
ftian=# set vdb_jit = 1;
SET
Time: 0.167 ms
ftian=# select count(*) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 203.543 ms
ftian=# select count(l_orderkey) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 202.253 ms
ftian=#




On Fri, Oct 17, 2014 at 10:12 AM, Tom Lane  wrote:

> CK Tan  writes:
> > The bigint sum,avg,count case in the example you tried has some
> optimization. We use int128 to accumulate the bigint instead of numeric in
> pg. Hence the big speed up. Try the same query on int4 for the improvement
> where both pg and vitessedb are using int4 in the execution.
>
> Well, that's pretty much cheating: it's too hard to disentangle what's
> coming from JIT vs what's coming from using a different accumulator
> datatype.  If we wanted to depend on having int128 available we could
> get that speedup with a couple hours' work.
>
> But what exactly are you "compiling" here?  I trust not the actual data
> accesses; that seems far too complicated to try to inline.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Tom Lane
CK Tan  writes:
> The bigint sum,avg,count case in the example you tried has some optimization. 
> We use int128 to accumulate the bigint instead of numeric in pg. Hence the 
> big speed up. Try the same query on int4 for the improvement where both pg 
> and vitessedb are using int4 in the execution.

Well, that's pretty much cheating: it's too hard to disentangle what's
coming from JIT vs what's coming from using a different accumulator
datatype.  If we wanted to depend on having int128 available we could
get that speedup with a couple hours' work.

But what exactly are you "compiling" here?  I trust not the actual data
accesses; that seems far too complicated to try to inline.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-17 Thread Tom Lane
Alvaro Herrera  writes:
> Dag-Erling Smørgrav wrote:
>> I understand this policy.  However, this new feature a) has absolutely
>> no impact unless the admin makes a conscious decision to use it and b)
>> will make life much easier for everyone if a POODLE-like vulnerability
>> is discovered in TLS.

> I see this as vaguely related to 
> http://www.postgresql.org/message-id/20131114202733.gb7...@eldon.alvh.no-ip.org
> where we want to have SSL behavior configurable in the back branches due
> to renegotiation issues: there was talk in that thread about introducing
> new GUC variables in back branches to control the behavior.  The current
> patch really doesn't add much in the way of features (SSLv3 support and
> so on already exist in back branches) --- what it does add is a way to
> control whether these are used.

This looks to me like re-fighting the last war.  Such a GUC has zero value
*unless* some situation exactly like the POODLE bug comes up again, and
the odds of that are not high.

Moreover, the GUC could easily be misused to decrease rather than increase
one's security, if it's carelessly set.  Remember that we only recently
fixed bugs that prevented use of the latest TLS version.  If we have a
setting like this, I fully anticipate that people will set it to "only use
TLS 1.2" and then forget that they ever did that; years from now they'll
still be using 1.2 when it's deprecated.

So I think the argument that this is a useful security contribution is
pretty weak; and on the whole we don't need another marginally-useful GUC.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index creation warning

2014-10-17 Thread Tom Lane
David G Johnston  writes:
> The question is whether we explain the implications of not being WAL-logged
> in an error message or simply state the fact and let the documentation
> explain the hazards - basically just output:
> "hash indexes are not WAL-logged and their use is discouraged"

+1.  The warning message is not the place to be trying to explain all the
details.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Segmentation fault] pg_dump binary-upgrade fail for type without element

2014-10-17 Thread Tom Lane
Rushabh Lathia  writes:
> pg_dump binary-upgrade fail with segmentation fault for type without
> element.

Ooops.

> Looking further into code I found that rather then fetching typrelid, we can
> use the already stored typrelid from TypeInfo structure.

Agreed.  Patch committed, thanks!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index creation warning

2014-10-17 Thread David G Johnston
Bruce Momjian wrote
> Now that we have the create hash index warning in 9.5, I realized that
> we don't warn about hash indexes with PITR, only crash recovery and
> streaming.  This patch fixes that.
> 
> Is the wording "cannot be used" too vague?  The CREATE INDEX manual
> page has the words "give wrong answers to queries", which might be
> better, but is kind of long for an error message.  Suggestions?

Something like the following is more specific without being more wordy:

"hash indexes are not WAL-logged: they are corrupted during recovery and
changes do not replicate to standby servers."

The question is whether we explain the implications of not being WAL-logged
in an error message or simply state the fact and let the documentation
explain the hazards - basically just output:

"hash indexes are not WAL-logged and their use is discouraged"

David J.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-17 Thread Alvaro Herrera
Dag-Erling Smørgrav wrote:
> Michael Paquier  writes:
> > Please note that new features can only be added to the version
> > currently in development, aka 9.5.
> 
> I understand this policy.  However, this new feature a) has absolutely
> no impact unless the admin makes a conscious decision to use it and b)
> will make life much easier for everyone if a POODLE-like vulnerability
> is discovered in TLS.

I see this as vaguely related to 
http://www.postgresql.org/message-id/20131114202733.gb7...@eldon.alvh.no-ip.org
where we want to have SSL behavior configurable in the back branches due
to renegotiation issues: there was talk in that thread about introducing
new GUC variables in back branches to control the behavior.  The current
patch really doesn't add much in the way of features (SSLv3 support and
so on already exist in back branches) --- what it does add is a way to
control whether these are used.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hash index creation warning

2014-10-17 Thread Bruce Momjian
Now that we have the create hash index warning in 9.5, I realized that
we don't warn about hash indexes with PITR, only crash recovery and
streaming.  This patch fixes that.

Is the wording "cannot be used" too vague?  The CREATE INDEX manual
page has the words "give wrong answers to queries", which might be
better, but is kind of long for an error message.  Suggestions?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index e469b17..43df32f
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*** Indexes:
*** 474,480 
  Also, changes to hash indexes are not replicated over streaming or
  file-based replication after the initial base backup, so they
  give wrong answers to queries that subsequently use them.
! For these reasons, hash index use is presently discouraged.
 

  
--- 474,481 
  Also, changes to hash indexes are not replicated over streaming or
  file-based replication after the initial base backup, so they
  give wrong answers to queries that subsequently use them.
! Hash indexes are also not properly restored during point-in-time
! recovery.  For these reasons, hash index use is presently discouraged.
 

  
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
new file mode 100644
index 8a1cb4b..03833d7
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
*** DefineIndex(Oid relationId,
*** 491,497 
  
  	if (strcmp(accessMethodName, "hash") == 0)
  		ereport(WARNING,
! (errmsg("hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers")));
  
  	if (stmt->unique && !accessMethodForm->amcanunique)
  		ereport(ERROR,
--- 491,497 
  
  	if (strcmp(accessMethodName, "hash") == 0)
  		ereport(WARNING,
! (errmsg("hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers")));
  
  	if (stmt->unique && !accessMethodForm->amcanunique)
  		ereport(ERROR,
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
new file mode 100644
index a2bef7a..11325e4
*** a/src/test/regress/expected/create_index.out
--- b/src/test/regress/expected/create_index.out
*** DROP TABLE array_gin_test;
*** 2238,2250 
  -- HASH
  --
  CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
  -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops);
  --
  -- Test functional index
--- 2238,2250 
  -- HASH
  --
  CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers
  CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers
  CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers
  CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers
  -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops);
  --
  -- Test functional index
diff --git a/src/test/regress/expected/enum.out b/src/test/regress/expected/enum.out
new file mode 100644
index fa23b52..47ac5a6
*** a/src/test/regress/expected/enum.out
--- b/src/test/regress/expected/enum.out
*** DROP INDEX enumtest_btree;
*** 383,389 
  -- Hash index / opclass with the = operator
  --
  CREATE INDEX enumtest_hash ON enumtest USING hash (col);
! WARNING:  hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers
SELECT * FROM enumtest WHERE col = 'orange';

Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Marti Raudsepp
On Oct 17, 2014 6:16 PM, "Tom Lane"  wrote:
> A more realistic objection goes like this:
>
> create table foo(f int, g int);
> update foo x set x = (1,2);  -- works
> alter table foo add column x int;
> update foo x set x = (1,2,3);  -- no longer works
>
> It's not a real good thing if a column addition or renaming can
> so fundamentally change the nature of a query.

I think a significant use case for this feature is when you already have a
row-value and want to persist it in the database, like you can do with
INSERT:
insert into foo select * from populate_record_json(null::foo, '{...}');

In this case the opposite is true: requiring explicit column names would
break the query if you add columns to the table. The fact that you can't
reasonably use populate_record/_json with UPDATE is a significant omission.
IMO this really speaks for supporting shorthand whole-row assignment,
whatever the syntax.
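
For instance, something along these lines would become possible
(hypothetical usage, assuming the SET (*) form proposed in this thread):

    -- refresh a whole row from a JSON document in one statement:
    UPDATE foo
       SET (*) = (SELECT * FROM populate_record_json(null::foo, '{...}'));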

Regards,
Marti


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 10:47 AM, CK Tan  wrote:
> Merlin, glad you tried it.
>
> We take the query plan exactly as given by the planner, decide whether to JIT 
> or to punt depending on the cost. If we punt, it goes back to pg executor. If 
> we JIT, and if we could not proceed (usually because of some operators we haven't 
> implemented yet), we again punt. Once we were able to generate the code, 
> there is no going back; we call into LLVM to obtain the function entry point, 
> and run it to completion. The 3% improvement you see in OLTP tests is 
> definitely noise.
>
> The bigint sum,avg,count case in the example you tried has some optimization. 
> We use int128 to accumulate the bigint instead of numeric in pg. Hence the 
> big speed up. Try the same query on int4 for the improvement where both pg 
> and vitessedb are using int4 in the execution.
>
> The speed up is really noticeable when the data type is nonvarlena. In the 
> varlena cases, we still call into pg routines most of the times. Again, try 
> the sum,avg,count query on numeric, and you will see what I mean.
>
> Also, we don't support UDF at the moment. So all queries involving UDF get 
> sent to pg executor.
>
> On your question of 32k page size, the rationale is that some of our customers 
> could be interested in a data warehouse on pg. 32k page size is a big win 
> when all you do is seqscan all day long.
>
> We are looking for bug reports at this stage and some stress tests done 
> without our own prejudices. Some tests on real data in a non-prod setting on 
> queries that are highly CPU-bound would be ideal.

One thing that I noticed is that when slamming your benchmark query
via pgbench, resident memory consumption was really aggressive and
would have taken down the server had I not preemptively stopped the
test.  Memory consumption did return to baseline after that so I
figured some type of llvm memory management games were going on.  This
isn't really a problem for most OLAP workloads but it's something to
be aware of.

Via 'perf top' on stock postgres, you see the usual suspects: palloc,
hash_search, etc.   On your build though HeapTupleSatisfiesMVCC zooms
right to the top of the stack which is pretty interesting...the
executor you've built is very lean and mean for sure.  A drop-in
optimization engine with little to no schema/sql changes is pretty neat
-- your primary competitor here is going to be column-organized
table solutions to olap type problems.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 10:30 AM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane  wrote:
>>> I think it wouldn't; Merlin is proposing that f would be taken as the
>>> field name.  A more realistic objection goes like this:
>>>
>>> create table foo(f int, g int);
>>> update foo x set x = (1,2);  -- works
>>> alter table foo add column x int;
>>> update foo x set x = (1,2,3);  -- no longer works
>>>
>>> It's not a real good thing if a column addition or renaming can
>>> so fundamentally change the nature of a query.
>
>> That's exactly how SELECT works.  I also dispute that the user should
>> be surprised in such cases.
>
> Well, the reason we have a problem in SELECT is that we support an
> unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that
> "SELECT foo FROM foo" could represent a whole-row selection is nowhere
> to be found in the SQL standard, for good reason IMO.  But we've never
> had the courage to break cleanly with this Berkeley leftover and
> insist that people spell it SQL-style as "SELECT ROW(foo.*) FROM foo".
> I'd just as soon not introduce the same kind of ambiguity into UPDATE
> if we have a reasonable alternative.

Ah, interesting point (I happen to like the terse syntax and use it
often).  This is for posterity anyways since you guys seem to like
Atri's proposal, which surprised me.  However, I think you're
oversimplifying things here.  Syntax aside: I think
SELECT f FROM foo f;
and a hypothetical
SELECT row(f.*) FROM foo f;

give different semantics.  The former gives a value of the table's
composite type, while the latter gives an anonymous record.  To get
parity, you'd have to add an
extra cast which means you'd have to play tricky games to avoid extra
performance overhead besides being significantly more verbose.  I'm
aware some of the other QUELisms are pretty dodgy and have burned us
in the past (like that whole function as record reference thing) but
pulling a record as a field in select is pretty nice.  It's also
widely used and quite useful in json serialization.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread CK Tan
Merlin, glad you tried it. 

We take the query plan exactly as given by the planner, decide whether to JIT 
or to punt depending on the cost. If we punt, it goes back to pg executor. If 
we JIT, and if we could not proceed (usually because of some operators we haven't 
implemented yet), we again punt. Once we were able to generate the code, there 
is no going back; we call into LLVM to obtain the function entry point, and run 
it to completion. The 3% improvement you see in OLTP tests is definitely noise.

The bigint sum,avg,count case in the example you tried has some optimization. 
We use int128 to accumulate the bigint instead of numeric in pg. Hence the big 
speed up. Try the same query on int4 for the improvement where both pg and 
vitessedb are using int4 in the execution.

The speed up is really noticeable when the data type is nonvarlena. In the 
varlena cases, we still call into pg routines most of the times. Again, try the 
sum,avg,count query on numeric, and you will see what I mean.

Also, we don't support UDF at the moment. So all queries involving UDF get 
sent to pg executor.

On your question of 32k page size, the rationale is that some of our customers 
could be interested in a data warehouse on pg. 32k page size is a big win when 
all you do is seqscan all day long.

We are looking for bug reports at this stage and some stress tests done 
without our own prejudices. Some tests on real data in a non-prod setting on 
queries that are highly CPU-bound would be ideal.

Thanks,
-cktan

> On Oct 17, 2014, at 6:43 AM, Merlin Moncure  wrote:
> 
>> On Fri, Oct 17, 2014 at 8:14 AM, Merlin Moncure  wrote:
>>> On Fri, Oct 17, 2014 at 7:32 AM, CK Tan  wrote:
>>> Hi everyone,
>>> 
>>> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
>>> designed for compute intensive OLAP workload. We have gotten it to a
>>> reasonable state and would like to open it up to the pg hackers
>>> community for testing and suggestions.
>>> 
>>> Vitesse DB offers
>>> -- JIT Compilation for compute-intensive queries
>>> -- CSV parsing with SSE instructions
>>> -- 100% binary compatibility with PG9.3.5.
>>> 
>>> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X 
>>> faster.
>>> 
>>> Our TPCH 1GB benchmark results is also available at
>>> http://vitessedata.com/benchmark/ .
>>> 
>>> Please direct any questions by email to ck...@vitessedata.com .
>> 
>> You offer a binary with 32k block size...what's the rationale for that?
> 
> (sorry for the double post)
> 
> OK, I downloaded the ubuntu binary and ran your benchmarks (after
> making some minor .conf tweaks like disabling SSL).  I then ran your
> benchmark (after fixing the typo) of the count/sum/avg test -- *and
> noticed a 95% reduction in runtime* which is really quite
> amazing IMNSHO.  I also ran a select only test on small scale factor
> pgbench and didn't see any regression there -- in fact you beat stock
> by ~ 3% (although this could be measurement noise).   So now you've
> got my attention.  So, if you don't mind, quit being coy and explain
> how the software works and all the neat things it does and doesn't do.
> 
> merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Tom Lane
Merlin Moncure  writes:
> On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane  wrote:
>> I think it wouldn't; Merlin is proposing that f would be taken as the
>> field name.  A more realistic objection goes like this:
>> 
>> create table foo(f int, g int);
>> update foo x set x = (1,2);  -- works
>> alter table foo add column x int;
>> update foo x set x = (1,2,3);  -- no longer works
>> 
>> It's not a real good thing if a column addition or renaming can
>> so fundamentally change the nature of a query.

> That's exactly how SELECT works.  I also dispute that the user should
> be surprised in such cases.

Well, the reason we have a problem in SELECT is that we support an
unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that
"SELECT foo FROM foo" could represent a whole-row selection is nowhere
to be found in the SQL standard, for good reason IMO.  But we've never
had the courage to break cleanly with this Berkeley leftover and
insist that people spell it SQL-style as "SELECT ROW(foo.*) FROM foo".
I'd just as soon not introduce the same kind of ambiguity into UPDATE
if we have a reasonable alternative.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-17 Thread Kevin Grittner
Pavel Stehule  wrote:

> do you know that this feature is proprietary and not based
> on ANSI/SQL? Any user that uses this feature and wants to port to
> another database will hate it.

I remember that Sybase ASE allowed a trailing comma within the
parentheses of a table definition, which was handy.  I checked on
SQL Fiddle and found that MS SQL Server and MySQL both allow that,
too; although Oracle does not.  I'm not taking a position on
whether we should allow this in PostgreSQL, but not having it is
likely to annoy some users moving *to* PostgreSQL, while having it
is likely to annoy some users moving *away* from PostgreSQL.

None of the products I tried allowed a leading comma.

I didn't test, and have no knowledge regarding, how other products
treat extra commas elsewhere.
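
Concretely, the kind of statement tested above was:

    -- accepted by Sybase ASE, MS SQL Server, and MySQL per the above;
    -- rejected by PostgreSQL and Oracle:
    CREATE TABLE t (
        a int,
        b int,
    );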

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane  wrote:
> Marko Tiikkaja  writes:
>> local:marko=#* create table foo(f int);
>> CREATE TABLE
>> local:marko=#* update foo f set f=1;
>> UPDATE 0
>
>> This query would change meaning with your suggestion.
>
> I think it wouldn't; Merlin is proposing that f would be taken as the
> field name.  A more realistic objection goes like this:
>
> create table foo(f int, g int);
> update foo x set x = (1,2);  -- works
> alter table foo add column x int;
> update foo x set x = (1,2,3);  -- no longer works
>
> It's not a real good thing if a column addition or renaming can
> so fundamentally change the nature of a query.

That's exactly how SELECT works.  I also dispute that the user should
be surprised in such cases.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Tom Lane
Marko Tiikkaja  writes:
> local:marko=#* create table foo(f int);
> CREATE TABLE
> local:marko=#* update foo f set f=1;
> UPDATE 0

> This query would change meaning with your suggestion.

I think it wouldn't; Merlin is proposing that f would be taken as the
field name.  A more realistic objection goes like this:

create table foo(f int, g int);
update foo x set x = (1,2);  -- works
alter table foo add column x int;
update foo x set x = (1,2,3);  -- no longer works

It's not a real good thing if a column addition or renaming can
so fundamentally change the nature of a query.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 10:10 AM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja  wrote:
>>> I don't know about Tom, but I didn't like that:
>>> http://www.postgresql.org/message-id/5364c982.7060...@joh.to
>
>> Hm, I didn't understand your objection:
>
>> 
>> So e.g.:
>>UPDATE foo f SET f = ..;
>
>> would resolve to the table, despite there being a column called "f"?
>> That would break backwards compatibility.
>> 
>
>> That's not correct: it should work exactly as 'select' does; given a
>> conflict resolve the field name, so no backwards compatibility issue.
>
> The point is that it's fairly messy (and bug-prone) to have a syntax
> where we have to make an arbitrary choice between two reasonable
> interpretations.
>
> If you look back at the whole thread Marko's above-cited message is in,
> we'd considered a bunch of different possible syntaxes for this, and
> none of them had much support.  The "(*)" idea actually is starting to
> look pretty good to me.

Hm, I'll take it then.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Tom Lane
Merlin Moncure  writes:
> On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma  wrote:
>> Thanks for the links, but this patch only targets SET(*) case, which, if I
>> understand correctly, the patch you mentioned doesn't directly handle (If I
>> understand correctly, the target of the two patches is different).

> Yeah -- in fact, there was some discussion about this exact case.
> This patch solves a very important problem: when doing record
> operations to move data between databases with identical schema
> there's currently no way to 'update' in a generic way without building
> out the entire field list via complicated and nasty dynamic SQL.  I'm
> not sure about the proposed syntax though; it seems a little weird to
> me.  Any particular reason why you couldn't have just done:

> UPDATE table1 SET * = a,b,c, ...

> also,

> UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...);

> seems cleaner than the proposed syntax for row assignment.  Tom
> objected though IIRC.

That last proposal is no good because it would be ambiguous if the
table contains a column by that name.  The "(*)" idea actually seems
not bad, since it's shorthand for a parenthesized column list.
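
To make that concrete, a minimal sketch of the existing multi-assignment
form next to the proposed shorthand (the SET(*) semantics here are assumed
from the patch description, not final):

create table t(a int, b int, c int);
update t set (a, b, c) = (1, 2, 3);  -- existing parenthesized column list
update t set (*) = (1, 2, 3);        -- proposed: expands to all columns of t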

I'm not sure about the patch itself though --- in particular, it
doesn't seem to me that it should be touching transformTargetList,
since that doesn't have anything to do with expansion of multiassignments
today.  Probably this is a symptom of having chosen a poor representation
of the construct in the raw grammar output.  However, I've not exactly
wrapped my head around what that representation is ... the lack of any
comments explaining it doesn't help.

A larger question is whether it's appropriate to do the *-expansion
at parse time, or whether we'd need to postpone it to later in order
to handle reasonable use-cases.  Since we expand "SELECT *" at parse
time (and are mandated to do so by the SQL spec, I believe), it seems
consistent to do this at parse time as well; but perhaps there is a
counter argument.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Tom Lane
Merlin Moncure  writes:
> On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja  wrote:
>> I don't know about Tom, but I didn't like that:
>> http://www.postgresql.org/message-id/5364c982.7060...@joh.to

> Hm, I didn't understand your objection:

> 
> So e.g.:
>UPDATE foo f SET f = ..;

> would resolve to the table, despite there being a column called "f"?
> That would break backwards compatibility.
> 

> That's not correct: it should work exactly as 'select' does; given a
> conflict resolve the field name, so no backwards compatibility issue.

The point is that it's fairly messy (and bug-prone) to have a syntax
where we have to make an arbitrary choice between two reasonable
interpretations.

If you look back at the whole thread Marko's above-cited message is in,
we'd considered a bunch of different possible syntaxes for this, and
none of them had much support.  The "(*)" idea actually is starting to
look pretty good to me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-10-17 Thread Simon Riggs
On 17 October 2014 14:05, Alvaro Herrera  wrote:

> Of course, this is a task that requires much more thinking, design, and
> discussion than just adding multi-process capability to vacuumdb ...

Yes, please proceed with this patch as originally envisaged. No more
comments from me.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Marko Tiikkaja

On 10/17/14 5:03 PM, Merlin Moncure wrote:

Hm, I didn't understand your objection:


So e.g.:
UPDATE foo f SET f = ..;

would resolve to the table, despite there being a column called "f"?
That would break backwards compatibility.


That's not correct: it should work exactly as 'select' does; given a
conflict resolve the field name, so no backwards compatibility issue.


local:marko=# show server_version;
 server_version
----------------
 9.1.13
(1 row)

local:marko=#* create table foo(f int);
CREATE TABLE
local:marko=#* update foo f set f=1;
UPDATE 0

This query would change meaning with your suggestion.

I'm not saying it would be a massive problem in practice, but I think we 
should first consider options which don't break backwards compatibility, 
even if some consider them "less clean".



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja  wrote:
> On 10/17/14 4:15 PM, Merlin Moncure wrote:
>>
>> Any particular reason why you couldn't have just done:
>>
>> UPDATE table1 SET * = a,b,c, ...
>
>
> That just looks wrong to me.  I'd prefer  (*) = ..  over that any day.
>
>> UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...);
>>
>> seems cleaner than the proposed syntax for row assignment.  Tom
>> objected though IIRC.
>
> I don't know about Tom, but I didn't like that:
> http://www.postgresql.org/message-id/5364c982.7060...@joh.to

Hm, I didn't understand your objection:


So e.g.:
   UPDATE foo f SET f = ..;

would resolve to the table, despite there being a column called "f"?
That would break backwards compatibility.


That's not correct: it should work exactly as 'select' does; given a
conflict resolve the field name, so no backwards compatibility issue.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Marko Tiikkaja

On 10/17/14 4:15 PM, Merlin Moncure wrote:

Any particular reason why you couldn't have just done:

UPDATE table1 SET * = a,b,c, ...


That just looks wrong to me.  I'd prefer  (*) = ..  over that any day.


UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...);

seems cleaner than the proposed syntax for row assignment.  Tom
objected though IIRC.


I don't know about Tom, but I didn't like that: 
http://www.postgresql.org/message-id/5364c982.7060...@joh.to



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Atri Sharma
On Fri, Oct 17, 2014 at 7:45 PM, Merlin Moncure  wrote:

> On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma  wrote:
> >
> >
> > On Wednesday, October 15, 2014, Marti Raudsepp  wrote:
> >>
> >> Hi
> >>
> >> On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma  wrote:
> >> > Please find attached a patch which implements support for UPDATE table1
> >> > SET(*)=...
> >>
> >> I presume you haven't read Tom Lane's proposal and discussion about
> >> multiple column assignment in UPDATE:
> >> http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
> >> (Assigning all columns was also discussed there)
> >>
> >> And there's a WIP patch:
> >> http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us
> >
> > Thanks for the links, but this patch only targets SET(*) case, which, if I
> > understand correctly, the patch you mentioned doesn't directly handle (If I
> > understand correctly, the target of the two patches is different).
>
> Yeah -- in fact, there was some discussion about this exact case.
> This patch solves a very important problem: when doing record
> operations to move data between databases with identical schema
> there's currently no way to 'update' in a generic way without building
> out the entire field list via complicated and nasty dynamic SQL.


Thanks!


> I'm
> not sure about the proposed syntax though; it seems a little weird to
> me.  Any particular reason why you couldn't have just done:
>
> UPDATE table1 SET * = a,b,c, ...
>
> also,
>
> UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...);
>

I honestly have not spent a lot of time thinking about the exact syntax
that may be acceptable. If we have arguments for or against a specific
syntax, I will be glad to incorporate them.

>
>


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Merlin Moncure
On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma  wrote:
>
>
> On Wednesday, October 15, 2014, Marti Raudsepp  wrote:
>>
>> Hi
>>
>> On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma  wrote:
>> > Please find attached a patch which implements support for UPDATE table1
>> > SET(*)=...
>>
>> I presume you haven't read Tom Lane's proposal and discussion about
>> multiple column assignment in UPDATE:
>> http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
>> (Assigning all columns was also discussed there)
>>
>> And there's a WIP patch:
>> http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us
>
> Thanks for the links, but this patch only targets SET(*) case, which, if I
> understand correctly, the patch you mentioned doesn't directly handle (If I
> understand correctly, the target of the two patches is different).

Yeah -- in fact, there was some discussion about this exact case.
This patch solves a very important problem: when doing record
operations to move data between databases with identical schema
there's currently no way to 'update' in a generic way without building
out the entire field list via complicated and nasty dynamic SQL.  I'm
not sure about the proposed syntax though; it seems a little weird to
me.  Any particular reason why you couldn't have just done:

UPDATE table1 SET * = a,b,c, ...

also,

UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...);

seems cleaner than the proposed syntax for row assignment.  Tom
objected though IIRC.
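
To illustrate the pain point: without something like SET(*), a generic
whole-row update means assembling the column list by hand, roughly like
this (a sketch, assuming a target table named table1):

do $$
declare
  cols text;
begin
  select string_agg(quote_ident(attname), ', ') into cols
  from pg_attribute
  where attrelid = 'table1'::regclass
    and attnum > 0 and not attisdropped;
  -- cols now holds the field list that SET(*) would expand to
  raise notice 'update table1 set (%) = ...', cols;
end $$;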

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 8:14 AM, Merlin Moncure  wrote:
> On Fri, Oct 17, 2014 at 7:32 AM, CK Tan  wrote:
>> Hi everyone,
>>
>> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
>> designed for compute intensive OLAP workload. We have gotten it to a
>> reasonable state and would like to open it up to the pg hackers
>> community for testing and suggestions.
>>
>> Vitesse DB offers
>> -- JIT Compilation for compute-intensive queries
>> -- CSV parsing with SSE instructions
>> -- 100% binary compatibility with PG9.3.5.
>>
>> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.
>>
>> Our TPCH 1GB benchmark results are also available at
>> http://vitessedata.com/benchmark/ .
>>
>> Please direct any questions by email to ck...@vitessedata.com .
>
> You offer a binary with 32k block size...what's the rationale for that?

(sorry for the double post)

OK, I downloaded the ubuntu binary and ran your benchmarks (after
making some minor .conf tweaks like disabling SSL).  I then ran your
benchmark (after fixing the typo) of the count/sum/avg test -- *and
noticed a 95% reduction in runtime*, which is really quite
amazing IMNSHO.  I also ran a select-only test on a small scale factor
pgbench and didn't see any regression there -- in fact you beat stock
by ~ 3% (although this could be measurement noise).   So now you've
got my attention.  So, if you don't mind, quit being coy and explain
how the software works and all the neat things it does and doesn't do.
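
For reference, the count/sum/avg test in question has roughly this shape
(a sketch; the table definition is assumed rather than taken verbatim from
the published benchmark):

create table t as select generate_series(1, 10000000)::bigint as i;
select count(*), sum(i), avg(i) from t;  -- the compute-bound aggregate path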

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-17 Thread Michael Paquier
On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao  wrote:

> In this case, the patch seems to make the restartpoint recycle even WAL files
> which have .ready files and will have to be archived later. Thoughts?
>
The real problem currently is that a segment file can be left unmarked as
.done during recovery when the streaming connection is abruptly cut just as
the segment is switched: it then gets marked as .ready in archive_status
and simply sits in pg_xlog, where it will neither be recycled nor removed.
I have not been able to look much at this code these days, so I am not
sure how invasive a fix would be in the back-branches, but perhaps we
should improve the code so that when a segment file is switched and the
connection is cut, we guarantee that the file is completed and marked as
.done.
-- 
Michael


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 7:32 AM, CK Tan  wrote:
> Hi everyone,
>
> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
> designed for compute intensive OLAP workload. We have gotten it to a
> reasonable state and would like to open it up to the pg hackers
> community for testing and suggestions.
>
> Vitesse DB offers
> -- JIT Compilation for compute-intensive queries
> -- CSV parsing with SSE instructions
> -- 100% binary compatibility with PG9.3.5.
>
> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.
>
> Our TPCH 1GB benchmark results are also available at
> http://vitessedata.com/benchmark/ .
>
> Please direct any questions by email to ck...@vitessedata.com .

You offer a binary with 32k block size...what's the rationale for that?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-17 Thread Fujii Masao
On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao  wrote:
> On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
>  wrote:
>>
>>
>> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier 
>> wrote:
>>>
>>> The additional process at promotion sounds like a good idea, I'll try to
>>> get a patch done tomorrow. This would result as well in removing the
>>> XLogArchiveForceDone stuff. Either way, not that I have been able to
>>> reproduce the problem manually, things can be clearly solved.
>>
>> Please find attached two patches aimed to fix this issue and to improve the
>> situation:
>> - 0001 prevents the appearance of those phantom WAL segment files by ensuring
>> that when a node is in recovery it will remove it whatever its status in
>> archive_status. This patch is the real fix, and should be applied down to
>> 9.2.
>> - 0002 is a patch implementing Heikki's idea of enforcing all the segment
>> files present in pg_xlog to have their status set to .done, marking them for
>> removal. When looking at the code, I finally concluded that Fujii-san's
>> point, about marking in all cases as .done segment files that have been
>> fully streamed, actually makes more sense to not be backward. This patch
>> would actually not be mandatory for back-patching, but it makes the process
>> more robust IMO.
>
> Thanks for the patches!

While reviewing the patch, I found another bug related to WAL files in recovery
mode. The problem is that exitArchiveRecovery() always creates a .ready file for
the last WAL file of the old timeline, even when it was restored from the archive
and has a .done file. This causes the already-archived WAL file to be archived
again. The attached patch fixes this bug.

Regards,

-- 
Fujii Masao
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 5351,5368  exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
  	 * for the new timeline.
  	 *
  	 * Notify the archiver that the last WAL segment of the old timeline is
! 	 * ready to copy to archival storage. Otherwise, it is not archived for a
! 	 * while.
  	 */
  	if (endTLI != ThisTimeLineID)
  	{
  		XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
  
! 		if (XLogArchivingActive())
! 		{
! 			XLogFileName(xlogpath, endTLI, endLogSegNo);
! 			XLogArchiveNotify(xlogpath);
! 		}
  	}
  
  	/*
--- 5351,5367 
  	 * for the new timeline.
  	 *
  	 * Notify the archiver that the last WAL segment of the old timeline is
! 	 * ready to copy to archival storage if its .done file doesn't exist
! 	 * (e.g., if it's the restored WAL file, it's expected to have .done file).
! 	 * Otherwise, it is not archived for a while.
  	 */
  	if (endTLI != ThisTimeLineID)
  	{
  		XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
  
! 		/* Create .ready file only when neither .ready nor .done files exist */
! 		XLogFileName(xlogpath, endTLI, endLogSegNo);
! 		XLogArchiveCheckDone(xlogpath);
  	}
  
  	/*
*** a/src/backend/access/transam/xlogarchive.c
--- b/src/backend/access/transam/xlogarchive.c
***
*** 516,521  XLogArchiveNotify(const char *xlog)
--- 516,527 
  	char		archiveStatusPath[MAXPGPATH];
  	FILE	   *fd;
  
+ 	/*
+ 	 * We should not create .ready file for already archived WAL file to
+ 	 * prevent it from being archived again.
+ 	 */
+ 	Assert(XLogArchiveIsBusy(xlog));
+ 
  	/* insert an otherwise empty file called .ready */
  	StatusFilePath(archiveStatusPath, xlog, ".ready");
  	fd = AllocateFile(archiveStatusPath, "w");

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-10-17 Thread Alvaro Herrera
Amit Kapila wrote:
> On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs  wrote:
> >
> > On 16 October 2014 15:09, Amit Kapila  wrote:

> > c) seems like the only issue that needs any thought. I don't think its
> > going to be that hard.
> >
> > I don't see any problems with the other points. You can make a
> > function wait, if you wish.
> 
> It is quite possible, but still I think to accomplish such a function,
> we need to have some mechanism where it can inform auto vacuum
> and then some changes in auto vacuum to receive/read that information
> and reply back.  I don't think any such mechanism exists.

You're right, it doesn't.  I think we have plenty more infrastructure
for that than we had when autovacuum was initially developed.  It
shouldn't be that hard.

Of course, this is a task that requires much more thinking, design, and
discussion than just adding multi-process capability to vacuumdb ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-10-17 Thread Simon Riggs
On 17 October 2014 12:52, Amit Kapila  wrote:

> It is quite possible, but still I think to accomplish such a function,
> we need to have some mechanism where it can inform auto vacuum
> and then some changes in auto vacuum to receive/read that information
> and reply back.  I don't think any such mechanism exists.

That's exactly how the CHECKPOINT command works, in conjunction with
the checkpointer process.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread Andres Freund
Hi,


On 2014-10-17 05:32:13 -0700, CK Tan wrote:
> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
> designed for compute intensive OLAP workload. We have gotten it to a
> reasonable state and would like to open it up to the pg hackers
> community for testing and suggestions.
> 
> Vitesse DB offers
> -- JIT Compilation for compute-intensive queries
> -- CSV parsing with SSE instructions
> -- 100% binary compatibility with PG9.3.5.
> 
> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.

How are these modifications licensed?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-17 Thread Dag-Erling Smørgrav
Michael Paquier  writes:
> Please note that new features can only be added to the version
> currently in development, aka 9.5.

I understand this policy.  However, this new feature a) has absolutely
no impact unless the admin makes a conscious decision to use it and b)
will make life much easier for everyone if a POODLE-like vulnerability
is discovered in TLS.

> You should as well register your patch to the current commit fest, I
> think you are still in time:
> https://commitfest.postgresql.org/action/commitfest_view?id=24

Thanks for reminding me.

DES
-- 
Dag-Erling Smørgrav - d...@des.no


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-17 Thread Andres Freund
On 2014-10-17 14:57:03 +0800, Craig Ringer wrote:
> On 10/17/2014 02:49 AM, Robert Haas wrote:
> > I think you could probably make the DELETE policy control what can get
> > deleted, but then have the SELECT policy further filter what gets
> > returned.
> 
> That seems like the worst of both worlds to me.
> 
> Suddenly DELETE ... RETURNING might delete more rows than it reports a
> resultset for. As well as being potentially dangerous for people using
> it in wCTEs, etc, to me that's the most astonishing possible outcome of all.
> 
> I'd be much happier with even:
> 
>   ERROR: RETURNING not permitted with SELECT row-security policy

FWIW, that doesn't sound acceptable to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Vitesse DB call for testing

2014-10-17 Thread CK Tan
Hi everyone,

Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
designed for compute intensive OLAP workload. We have gotten it to a
reasonable state and would like to open it up to the pg hackers
community for testing and suggestions.

Vitesse DB offers
-- JIT Compilation for compute-intensive queries
-- CSV parsing with SSE instructions
-- 100% binary compatibility with PG9.3.5.

Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.

Our TPCH 1GB benchmark results are also available at
http://vitessedata.com/benchmark/ .

Please direct any questions by email to ck...@vitessedata.com .

Thank you for your help.

--
CK Tan
Vitesse Data, Inc.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-17 Thread Robert Haas
On Fri, Oct 17, 2014 at 7:46 AM, Stephen Frost  wrote:
> Thoughts on 'WITH RETURNING' / 'WITHOUT RETURNING' and what the default
> should be?

That sounds like a horrendous pile of nasty complication for no good purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-17 Thread Fujii Masao
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
 wrote:
>
>
> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier 
> wrote:
>>
>> The additional process at promotion sounds like a good idea, I'll try to
>> get a patch done tomorrow. This would result as well in removing the
>> XLogArchiveForceDone stuff. Either way, not that I have been able to
>> reproduce the problem manually, things can be clearly solved.
>
> Please find attached two patches aimed to fix this issue and to improve the
> situation:
> - 0001 prevents the appearance of those phantom WAL segment files by ensuring
> that when a node is in recovery it will remove it whatever its status in
> archive_status. This patch is the real fix, and should be applied down to
> 9.2.
> - 0002 is a patch implementing Heikki's idea of enforcing all the segment
> files present in pg_xlog to have their status set to .done, marking them for
> removal. When looking at the code, I finally concluded that Fujii-san's
> point, about marking in all cases as .done segment files that have been
> fully streamed, actually makes more sense to not be backward. This patch
> would actually not be mandatory for back-patching, but it makes the process
> more robust IMO.

Thanks for the patches!

I found one problem in the 0002 patch. The patch changes the recovery so that
it creates .done files for every WAL file which exists in the pg_xlog directory
at the end of recovery. But even WAL files which will have to be archived later
can exist in pg_xlog at that moment: for example, the latest, recycled, and
fully-written-but-not-archived-yet (i.e., maybe having .ready files) WAL files.
The patch wrongly prevents them from being archived at all.

ISTM that the 0001 patch has the similar problem. Please imagine the following
scenario.

1. There are many unarchived WAL files in pg_xlog because of, for example, the
continuous failure of archive_command, and then the server unfortunately
crashes because of corruption of the database itself.

2. DBA restores the backup onto the server and copies all the WAL files
 from the old pg_xlog to the new one. Then he or she prepares for archive recovery.

3. DBA starts the server and the archive recovery starts.

4. After all the archived WAL files are replayed, the server tries to replay
 the WAL files in pg_xlog. Since there are many WAL files in pg_xlog,
 more than one restartpoint happens while they are being replayed.

In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thoughts?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-17 Thread Stephen Frost
* David G Johnston (david.g.johns...@gmail.com) wrote:
> How about returning a placeholder row but with all the values replaced with
> NULL?

I don't think that would be a good approach..  A user actually looking
at those rows would be highly confused.

> In the absence of returning does the delete count show the total number of
> rows deleted or only the number of rows deleted that the user would be aware
> of if they issued a select with the same criteria?  Whatever the answer the
> number of rows returned with returning should match the row count normally
> noted.

Today, everything matches up, yes.  Having rows which are deleted but
which don't show up in RETURNING could certainly surprise people and
applications, which is why I tend to favor the 'all-or-error' approach
that others have also suggested.  Adding that wouldn't be difficult,
though we'd need to decide which should be the default.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-10-17 Thread Amit Kapila
On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs  wrote:
>
> On 16 October 2014 15:09, Amit Kapila  wrote:
>
> > I think doing anything on the server side can have higher complexity
> > like:
> > a. Does this function return immediately after sending request to
> > autovacuum, if yes then the behaviour of this new functionality
> > will be different as compare to vacuumdb which user might not
> > expect?
> > b. We need to have some way to handle errors that can occur in
> > autovacuum (may be need to find a way to pass back to user),
> > vacuumdb or Vacuum can report error to user.
> > c. How does nworkers input relates to autovacuum_max_workers
> > which is needed at start for shared memory initialization and in calc
> > of maxbackends.
> > d. How to handle database wide vacuum which is possible via vacuumdb
> > e. What is the best UI (a command like above or via config parameters)?
>
>
> c) seems like the only issue that needs any thought. I don't think its
> going to be that hard.
>
> I don't see any problems with the other points. You can make a
> function wait, if you wish.

It is quite possible, but still I think to accomplish such a function,
we need to have some mechanism where it can inform auto vacuum
and then some changes in auto vacuum to receive/read that information
and reply back.  I don't think any such mechanism exists.

> > I think we can find a way for above and may be if any other similar things
> > needs to be taken care, but still I think it is better that we have this
> > feature via vacuumdb rather than adding complexity in server code.  Also the
> > current algorithm used in patch is discussed and agreed upon in this
> > thread and if now we want to go via some other method (auto vacuum),
> > it might take much more time to build consensus on all the points.
>
> Well, I read Alvaro's point from earlier in the thread and agreed with
> it. All we really need is an instruction to autovacuum to say "be
> aggressive".
>
> Just because somebody added something to the TODO list doesn't make it
> a good idea. I apologise to Dilip for saying this, it is not anything
> against him, just the idea.
>
> Perhaps we just accept the patch and change AV in the future.

So shall we move ahead with review of this patch and make a note
of changing AV in the future (maybe a TODO)?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-17 Thread Stephen Frost
* Thom Brown (t...@linux.com) wrote:
> On 17 October 2014 07:57, Craig Ringer  wrote:
> > On 10/17/2014 02:49 AM, Robert Haas wrote:
> > > I think you could probably make the DELETE policy control what can get
> > > deleted, but then have the SELECT policy further filter what gets
> > > returned.
> >
> > That seems like the worst of both worlds to me.
> >
> > Suddenly DELETE ... RETURNING might delete more rows than it reports a
> > resultset for. As well as being potentially dangerous for people using
> > it in wCTEs, etc, to me that's the most astonishing possible outcome of
> > all.
> >
> > I'd be much happier with even:
> >
> >   ERROR: RETURNING not permitted with SELECT row-security policy
> >
> > than this.
> 
> +1
> 
> This suggestion is most in line with what I would expect to occur.

This was along the lines that I've been thinking for how to address this
also, and I think it's the least surprising -- but I want it controllable.

Thoughts on 'WITH RETURNING' / 'WITHOUT RETURNING' and what the default
should be?
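
Purely as illustration, the knob could look something like this
(hypothetical syntax, not implemented anywhere):

create policy p1 on accounts for delete
  using (owner = current_user)
  without returning;  -- DELETE ... RETURNING would error under this policy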

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-17 Thread Stephen Frost
* Fujii Masao (masao.fu...@gmail.com) wrote:
> Another minor problem that I found is that pg_dump always fails when
> there is a row-level policy for update. I think that the attached patch
> should be applied.

Urgh.  That was tested before but we switched the characters used and
evidently missed that. :(

Will fix, thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-17 Thread Amit Kapila
On Tue, Oct 14, 2014 at 11:34 AM, Amit Kapila 
wrote:
>
>
> I am not sure why we are seeing difference even though running
> on same m/c with same configuration.

I have tried many times, but I could not reproduce the numbers you
posted above with HEAD. However, now testing with the latest version
[1] posted by you, everything seems to be fine at this workload.
The data at higher client count is as below:

 HEAD – commit 494affb
 Shared_buffers = 8GB; Scale Factor = 3000

 Run (tps) |  64 clients | 128 clients
 Run-1     |      271799 |          24
 Run-2     |      274341 |      245207
 Run-3     |      275019 |      252258

 HEAD – commit 494affb + wait free lw_shared_v2
 Shared_buffers = 8GB; Scale Factor = 3000

 Run (tps) |  64 clients | 128 clients
 Run-1     |      286209 |      274922
 Run-2     |      289101 |      274495
 Run-3     |      289639 |      273633

So I am planning to proceed further with the review/test of your
latest patch.

According to me, below things are left from my side:
a. do some basic tpc-b tests with patch
b. re-review latest version posted by you

I know that you have posted an optimization for StrategyGetBuffer() in
this thread, however I feel we can evaluate it separately unless you
are of the opinion that both patches should go together.

[1]
http://www.postgresql.org/message-id/20141010111027.gc6...@alap3.anarazel.de



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-17 Thread Michael Paquier
On Fri, Oct 17, 2014 at 7:58 PM, Dag-Erling Smørgrav  wrote:

> The default is "ALL:-SSLv2" in 9.0-9.3 and "ALL:-SSL" in 9.4 and up.
> This corresponds to the current hardcoded values, so the default
> behavior is unchanged, but the admin now has the option to select a
> different settings, e.g. if a serious vulnerability is found in TLS 1.0.
>
Please note that new features can only be added to the version currently in
development, aka 9.5. You should as well register your patch to the current
commit fest, I think you are still in time:
https://commitfest.postgresql.org/action/commitfest_view?id=24
-- 
Michael


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-17 Thread Fujii Masao
On Fri, Oct 17, 2014 at 3:49 AM, Robert Haas  wrote:
>>> That's an argument in favour of only applying a read-filtering policy
>>> where a RETURNING clause is present, but that introduces the "surprise!
>>> the effects of your DELETE changed based on an unrelated clause!" issue.
>>
>> No- if we were going to do this, I wouldn't want to change the existing
>> structure but rather provide either:
>>
>> a) a way to simply disable RETURNING if the policy is in effect and the
>>policy creator doesn't wish to allow it
>> b) allow the user to define another clause which would be applied to the
>>rows in the RETURNING set
>
> I think you could probably make the DELETE policy control what can get
> deleted, but then have the SELECT policy further filter what gets
> returned.

+1

That's more intuitive to me because another security feature, privileges,
works in that way, i.e., the SELECT privilege, not DELETE, controls RETURNING.
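
The privilege analogy in a minimal sketch (today's behavior, independent of
row security):

create table accounts(id int, balance int);
create role app login;
grant delete on accounts to app;
-- connected as app:
delete from accounts;               -- allowed
delete from accounts returning id;  -- rejected: RETURNING requires SELECT privilege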

Another minor problem that I found is that pg_dump always fails when
there is a row-level policy for update. I think that the attached patch
should be applied.

Regards,

-- 
Fujii Masao
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c56a4cb..16ebc12 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2939,7 +2939,7 @@ dumpRowSecurity(Archive *fout, DumpOptions *dopt, RowSecurityInfo *rsinfo)
 		cmd = "SELECT";
 	else if (strcmp(rsinfo->rseccmd, "a") == 0)
 		cmd = "INSERT";
-	else if (strcmp(rsinfo->rseccmd, "u") == 0)
+	else if (strcmp(rsinfo->rseccmd, "w") == 0)
 		cmd = "UPDATE";
 	else if (strcmp(rsinfo->rseccmd, "d") == 0)
 		cmd = "DELETE";

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-10-17 Thread Pavel Stehule
2014-10-17 9:14 GMT+02:00 Petr Jelinek :

> On 16/10/14 13:29, Pavel Stehule wrote:
>
>> Hi
>>
>> 2014-10-14 22:57 GMT+02:00 Petr Jelinek >
>> Short review of the patch. Note that this has nothing to do with
>> actual assertions, at the moment it's just enhancement of RAISE
>> statement to make it optionally conditional. As I was one of the
>> people who voted for it I do think we want this and I like the syntax.
>>
>> Code applies cleanly, seems formatted according to project standards
>> - there is unnecessary whitespace added in variable declaration part
>> of exec_stmt_raise which should be removed.
>>
>>
>> fixed
>>
>>
>> Passes make check, I would prefer to have little more complex
>> expression than just "true" in the new regression test added for
>> this feature.
>>
>>
>> fixed
>>
>>
>> Did some manual testing, seems to work as advertised.
>>
>> There are no docs at the moment which is the only show-stopper that
>> I can see so far.
>>
>>
>> fixed
>>
>>
> Great, looks good to me, marking as ready for committer.


Thank you very much

Pavel


>
>
> --
>  Petr Jelinek  http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-17 Thread Simon Riggs
On 17 October 2014 09:55,   wrote:

>>A new parameter to send feedback should be called --feedback

>>A second parameter to decide whether to fsync should be called --fsync
>
> I think keeping "--reply-fsync" and "--fsync-interval" is better than making
> new options.
> Thoughts?

We already have hot_standby_feedback, so using the name "feedback" is the best idea.

I am suggesting that we send feedback even if we do not fsync, to
allow the master to track our progress. Hence the name of the second
parameter was just fsync.

So both names were suggested because of links to those terms already
being used for similar reasons elsewhere in Postgres.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-17 Thread Dag-Erling Smørgrav
The attached patches add an ssl_protocols configuration option which
control which versions of SSL or TLS the server will use.  The syntax is
similar to Apache's SSLProtocols directive, except that the list is
colon-separated instead of whitespace-separated, although that is easy
to change if it proves unpopular.

Summary of the patch:

 - In src/backend/libpq/be-secure.c:
   - Add an SSLProtocols variable for the option.
   - Add a function, parse_SSL_protocols(), that parses an ssl_protocols
 string and returns a bitmask suitable for SSL_CTX_set_options().
   - Change initialize_SSL() to call parse_SSL_protocols() and pass the
 result to SSL_CTX_set_options().
 - In src/backend/utils/misc/guc.c:
   - Add an extern declaration for SSLProtocols.
   - Add an entry in the ConfigureNamesString array for the
 ssl_protocols option.
 - In src/backend/utils/misc/postgresql.conf.sample:
   - Add a sample ssl_protocols line.
 - In doc/src/sgml/config.sgml:
   - Document the ssl_protocols option.

The file names are slightly different in 9.5, since be-secure.c was
split in two and the declaration was moved into libpq.h.

The default is "ALL:-SSLv2" in 9.0-9.3 and "ALL:-SSL" in 9.4 and up.
This corresponds to the current hardcoded values, so the default
behavior is unchanged, but the admin now has the option to select
different settings, e.g. if a serious vulnerability is found in TLS 1.0.
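
As a usage sketch (assuming the patch as posted), an admin reacting to a
hypothetical TLS 1.0 vulnerability could put this in postgresql.conf:

ssl_protocols = 'ALL:-SSL:-TLSv1'	# everything except SSLv2/v3 and TLS 1.0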

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6ee17d8..7233a73 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1027,6 +1027,34 @@ include_dir 'conf.d'
   
  
 
+ 
+  ssl_protocols (string)
+  
+   ssl_protocols configuration parameter
+  
+  
+   
+Specifies a colon-separated list of SSL protocols that are
+allowed to be used on secure connections. Each entry in the list can
+be prefixed by a + (add to the current list)
+or - (remove from the current list). If no prefix is used,
+the list is replaced with the specified protocol.
+   
+   
+The full list of supported protocols can be found in
+the openssl manual page.  In addition to the names of
+individual protocols, the following keywords can be
+used: ALL (all protocols supported by the underlying
+crypto library), SSL (all supported versions
+of SSL) and TLS (all supported versions
+of TLS).
+   
+   
+The default is ALL:-SSL.
+   
+  
+ 
+
  
   ssl_ciphers (string)
   
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index b05364c..f440b77 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -87,6 +87,7 @@ static int	verify_cb(int, X509_STORE_CTX *);
 static void info_cb(const SSL *ssl, int type, int args);
 static void initialize_ecdh(void);
 static const char *SSLerrmessage(void);
+static long parse_SSL_protocols(const char *str);
 
 /* are we in the middle of a renegotiation? */
 static bool in_ssl_renegotiation = false;
@@ -245,15 +246,16 @@ be_tls_init(void)
 			SSLerrmessage())));
 	}
 
-	/* set up ephemeral DH keys, and disallow SSL v2/v3 while at it */
+	/* set up ephemeral DH keys */
 	SSL_CTX_set_tmp_dh_callback(SSL_context, tmp_dh_cb);
-	SSL_CTX_set_options(SSL_context,
-		SSL_OP_SINGLE_DH_USE |
-		SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);
+	SSL_CTX_set_options(SSL_context, SSL_OP_SINGLE_DH_USE);
 
 	/* set up ephemeral ECDH keys */
 	initialize_ecdh();
 
+	/* set up the allowed protocol list */
+	SSL_CTX_set_options(SSL_context, parse_SSL_protocols(SSLProtocols));
+
 	/* set up the allowed cipher list */
 	if (SSL_CTX_set_cipher_list(SSL_context, SSLCipherSuites) != 1)
 		elog(FATAL, "could not set the cipher list (no valid ciphers available)");
@@ -1053,3 +1055,106 @@ SSLerrmessage(void)
 	snprintf(errbuf, sizeof(errbuf), _("SSL error code %lu"), errcode);
 	return errbuf;
 }
+
+
+/*
+ *	Parse the SSL allowed protocol list
+ *
+ *	The logic here is inverted.  OpenSSL does not take a list of
+ *	protocols to use, but a list of protocols to avoid.  We use the
+ *	same bits with the opposite meaning, then invert the result.
+ */
+
+#define SSL_PROTO_SSLv2		SSL_OP_NO_SSLv2
+#define SSL_PROTO_SSLv3		SSL_OP_NO_SSLv3
+#define SSL_PROTO_SSL		(SSL_PROTO_SSLv2 | SSL_PROTO_SSLv3)
+#define SSL_PROTO_TLSv1		SSL_OP_NO_TLSv1
+#ifdef SSL_OP_NO_TLSv1_1
+#define SSL_PROTO_TLSv1_1	SSL_OP_NO_TLSv1_1
+#else
+#define SSL_PROTO_TLSv1_1	0
+#endif
+#ifdef SSL_OP_NO_TLSv1_2
+#define SSL_PROTO_TLSv1_2	SSL_OP_NO_TLSv1_2
+#else
+#define SSL_PROTO_TLSv1_2	0
+#endif
+#define SSL_PROTO_TLS		(SSL_PROTO_TLSv1 | SSL_PROTO_TLSv1_1 | SSL_PROTO_TLSv1_2)
+#define SSL_PROTO_ALL		(SSL_PROTO_SSL | SSL_PROTO_TLS)
+#define SSL_PROTO_NONE		0
+
+#define str_is_token(str, tok, len) \
+	(len == sizeof(tok) - 1 && pg_strncasecmp(str, tok, len) == 0)
+
+static long
+parse_SSL_
