Re: [HACKERS] Vitesse DB call for testing

2014-10-18 Thread CK Tan
Hi Mark,

Vitesse DB won't be open-sourced; otherwise it would have been a contrib
module to postgres. We should take further discussions off this list.
People should contact me directly if there are any questions.

Thanks,
ck...@vitessedata.com



On Fri, Oct 17, 2014 at 10:55 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 On 18/10/14 07:13, Josh Berkus wrote:

 CK,

 Before we go any further on this, how is Vitesse currently licensed?
 Last time we talked it was still proprietary.  If it's not being
 open-sourced, we likely need to take discussion off this list.


 +1

 Guys, you need to 'fess up on the licensing!

 Regards

 Mark


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-18 Thread Petr Jelinek

On 18/10/14 06:13, MauMau wrote:


[requirement]
10.2.3 Verify access to all audit trails is logged.

Malicious users often attempt to alter audit logs to
hide their actions, and a record of access allows
an organization to trace any inconsistencies or
potential tampering of the logs to an individual
account. Having access to logs identifying
changes, additions, and deletions can help retrace
steps made by unauthorized personnel.

[my comment]
I'm totally unsure how this can be fulfilled.



This is more a matter of configuration of the whole system than something 
pg_audit itself needs to care about (see my answer to 10.5).




[requirement]
10.3 Record at least the following audit
trail entries for all system components for
each event:
10.3.4 Verify success or failure indication is included in log
entries.
10.3.5 Verify origination of event is included in log entries.

[my comment]
This doesn't seem to be fulfilled because the failure of SQL statements
and the client address are not part of the audit log entry.



You can add it to the log_line_prefix though.
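
For illustration, a minimal sketch of such a configuration (my own
example, not something pg_audit configures for you; ALTER SYSTEM needs
9.4, otherwise edit postgresql.conf directly). %r adds the client
host/port for 10.3.5, and %e the SQLSTATE for 10.3.4:

    -- %m = timestamp, %p = pid, %r = remote host:port, %e = SQLSTATE
    ALTER SYSTEM SET log_line_prefix = '%m [%p] %r %u@%d %e ';
    SELECT pg_reload_conf();  -- log_line_prefix is reloadable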



[requirement]
10.5 Secure audit trails so they cannot
be altered.
10.5 Interview system administrators and examine system
configurations and permissions to verify that audit trails are
secured so that they cannot be altered as follows:
10.5.1 Only individuals who have a job-related need can view
audit trail files.
Adequate protection of the audit logs includes
strong access control (limit access to logs based
on “need to know” only), and use of physical or
network segregation to make the logs harder to
find and modify.
Promptly backing up the logs to a centralized log
server or media that is difficult to alter keeps the
logs protected even if the system generating the
logs becomes compromised.
10.5.2 Protect audit trail files from
unauthorized modifications.

[my comment]
I don't know how to achieve these, because the DBA (who starts/stops the
server) can modify and delete server log files without any record.



Logging can be set up in a way that it's not even stored on a server 
the DBA has access to. The DBA can turn off logging (and the plugin) 
altogether, or change the logging config, but we'd get the shutdown log 
when that happens, so 10.2.2 and 10.2.6 will be fulfilled in that case, 
I think.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-18 Thread Simon Riggs
On 18 October 2014 05:13, MauMau maumau...@gmail.com wrote:

 [requirement]
 10.6 Review logs and security events for
 all system components to identify
 anomalies or suspicious activity.
 Note: Log harvesting, parsing, and
 alerting tools may be used to meet this
 Requirement.
 The log review process does not have to be
 manual. The use of log harvesting, parsing, and
 alerting tools can help facilitate the process by
 identifying log events that need to be reviewed.

 [my comment]
 What commercial and open source products are well known as log
 harvesting, parsing, and alerting tools?  Is it possible and reasonably easy
 to integrate pgaudit with those tools?  The purpose of the audit logging
 feature is not just recording facts, but enabling timely detection of
 malicious actions.  So, I think the ease of integration with those tools
 must be evaluated.  But I don't know about such tools.

 I feel the current output format of pgaudit is somewhat difficult to work with:

 * The audit log entries are mixed with other logs in the server log files,
 so the user has to extract the audit log lines from the server log files and
 save them elsewhere.  I think it is necessary to store audit logs in
 separate files.

 * Does the command text need "" around it in case it contains commas?

Audit entries are sent to the server log, yes.

The server log may be redirected to syslog, which allows various forms
of routing and manipulation that are outside of the reasonable domain
of pgaudit.

PostgreSQL also provides a logging hook that would allow you to filter
or redirect messages as desired.

Given those two ways of handling server log messages, the server log
is the obvious destination to provide for the recording/logging part of
the audit requirement. pgaudit is designed to allow generating useful
messages, not to be an out-of-the-box compliance tool.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizer on sort aggregate

2014-10-18 Thread Greg Stark
On Sat, Oct 18, 2014 at 3:10 AM, Peter Geoghegan p...@heroku.com wrote:
 So the first eight bytes of the first string is 0x131F1F1B221E,
 and the second 0x131F1F1B220C. The last byte is different.

That's interesting but I think it's mostly a quirk of your example.
Afaics the difference is only that the en_US locale ignores
punctuation like ':' and '/' (or at least treats them as less significant
than alphabetic characters). If you had strings that had less
punctuation, or differences that didn't happen to arrive shortly after
the 8-byte boundary, then it wouldn't make any difference.
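
A quick way to see the effect, assuming a glibc en_US locale is
installed (a sketch of my own, not from your test):

    -- C collation compares bytes: ':' (0x3A) sorts before 'b' (0x62)
    SELECT 'a:c' < 'ab' COLLATE "C";      -- true
    -- en_US primary weights skip the punctuation, i.e. 'ac' vs 'ab'
    SELECT 'a:c' < 'ab' COLLATE "en_US";  -- false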

And we still have to run strxfrm at least once, write out the whole
binary blob to memory somewhere, and if it spills to disk we still have
to write and read much more data. I think recognizing cases where
equality is the only thing we're interested in and locale-sensitive
sorting isn't necessary, and using a memcmp there, would be a clear win.

I'm not immediately clear on what the cleanest way to integrate it
would be. A btree opclass function like the cmp function, but one that
doesn't need to be consistent with < and >, only with =? Or perhaps a flag
on the btree opclass that indicates that the data type can safely be
compared with memcmp when equality is all that's needed? The latter is
pretty tempting since it would tell code something interesting about
the data type's internal storage that may lead to other optimizations.
On the other hand, the former is nice in that the operator could maybe
handle other cases like padding by doing memcmp on only the
significant bits.



-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-18 Thread Nicolas Barbier
2014-10-18 Stephen Frost sfr...@snowman.net:

 * Peter Eisentraut (pete...@gmx.net) wrote:

 More subtly, if we claim that a materialized view is a view, then we
 cannot have asynchronously updated materialized views, because then we
 have different semantics.

 This is, at least, a reason I can understand, though I'm not sure I see
 it as sufficient to say matviews are so different from views that they
 shouldn't be listed as such.

Maybe it's useful to try to imagine who the actual consumers of
INFORMATION_SCHEMA are? It's probably mostly generic SQL tools that
try to provide a list of objects with corresponding operations (drop,
alter, dump to a file, show contents, etc) or tools that provide
syntax completion.

I can only imagine two practical (though rather hypothetical) problems
caused by the semantic mismatch between normal views and
possibly-outdated matviews:

(1) Accidentally seeing outdated data: Something tries to replace part
of a query with a reference to a matview, because INFORMATION_SCHEMA
says that the definition of the view is such-and-so. This causes the
resulting query to possibly see outdated data.

(2) Accidentally seeing data that is too new: Something replaces a
reference to a matview in a query with the defining query of the
matview. This causes the resulting query to possibly see data that is
too new, assuming that the original query author was trying to rely
on the outdatedness of the matview in the vein of "I want to see
yesterday's data". I personally consider relying on the outdatedness
of a matview to be bad design; maybe that should be mentioned in the
documentation if I'm not the only one thinking that way.

(Note that (2) also happens when a generic SQL tool regenerates a
schema by recreating a matview as a normal view. The resulting normal
view seems to contain data that is too new.)

Those problems sound so far-fetched that I suggest putting matviews
(even though they may be out-of-date) in INFORMATION_SCHEMA.VIEWS as
if they were normal views, so that in all other use cases (i.e., the
abovementioned generic SQL tools), the right thing happens. It is
probably also useful to put them in INFORMATION_SCHEMA.TABLES with a
specialized type 'MATERIALIZED VIEW' (or some such), so that tools that
know about the existence of matviews can tell the difference.
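
To make that concrete, a hypothetical sketch of what a tool would then
see (this is the proposed behaviour, not what PostgreSQL does today):

    -- proposed: matviews appear among the tables, with their own type
    SELECT table_name, table_type
    FROM information_schema.tables
    WHERE table_name = 'my_matview';    -- 'MATERIALIZED VIEW'

    -- proposed: and among the views, definition included
    SELECT table_name, view_definition
    FROM information_schema.views
    WHERE table_name = 'my_matview';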

Does someone know what other DBMSs do in this regard? I.e., do they
put anything in INFORMATION_SCHEMA.VIEWS for matviews? What TABLE_TYPE
do they use in INFORMATION_SCHEMA.TABLES?

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Bruce Momjian
On Fri, Oct 17, 2014 at 11:03:04PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Fri, Oct 17, 2014 at 06:15:37PM -0400, Tom Lane wrote:
  Those stats were perfectly valid: what the planner is looking for is
  accurate minimum and maximum values for the index's leading column, and
  that's what it got.  You're correct that a narrower index could have given
  the same results with a smaller disk footprint, but the planner got the
  results it needed from the index you provided for it to work with.
 
  Uh, why is the optimizer looking at the index on a,b,c and not just the
  stats on column a, for example?  I am missing something here.
 
 Because it needs up-to-date min/max values in order to avoid being
 seriously misled about selectivities of values near the endpoints.
 See commit 40608e7f949fb7e4025c0ddd5be01939adc79eec.

Oh, I had forgotten we did that.  It is confusing that there is no way
via EXPLAIN to see the access, making the method of consulting pg_stat_*
and using EXPLAIN unreliable.  Should we document this somewhere?
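
As a sketch of the surprising part (illustrative names; note the stats
are reported asynchronously, so allow a moment before reading them):

    CREATE TABLE t AS
        SELECT g AS a, g AS b, g AS c FROM generate_series(1, 10000) g;
    CREATE INDEX t_a_b_c ON t (a, b, c);
    ANALYZE t;
    -- planning alone may probe the index for the extremal value of "a":
    EXPLAIN SELECT * FROM t WHERE a > 9990;
    SELECT idx_scan FROM pg_stat_user_indexes
    WHERE indexrelname = 't_a_b_c';     -- can be nonzero with no execution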

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Code bug or doc bug?

2014-10-18 Thread Bruce Momjian
On Mon, Oct 13, 2014 at 12:17:54PM -0400, Bruce Momjian wrote:
 On Wed, Aug 27, 2014 at 06:39:21AM -0700, David G Johnston wrote:
   Is there a doc patch to make here?
  
  1. Last sentence change suggestion: "The target tablespace must be empty."
  
  2. Based on Robert's comments it sounds like a "You cannot change the
  default tablespace of the current database." comment should be added as
  well.
  
  Side note: I have no clue what the "mapped relations" Robert refers to
  are...
 
 I have created the attached doc patch for this.  Should we backpatch
 this through 9.0, or just 9.4?

Applied.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Marko Tiikkaja

On 10/18/14, 5:16 AM, Tom Lane wrote:

BTW, on re-reading that code I notice that it will happily seize upon
the first suitable index (first in OID order), regardless of how many
lower-order columns that index has got.  This doesn't make any difference
I think for get_actual_variable_range's own purposes, because it's only
expecting to touch the endmost index page regardless.  However, in light
of Marko's complaint maybe we should teach it to check all the indexes
and prefer the matching one with fewest columns?  It would only take a
couple extra lines of code, and probably not that many added cycles
considering we're going to do an index access of some sort.  But I'm
not sure if it's worth any extra effort --- I think in his example
case, there wasn't any narrower index anyway.


Perhaps accidentally this would have helped in my case, actually, since 
I could have created a new, smaller index CONCURRENTLY and then seen 
that the usage of the other index stopped increasing.  With the "pick 
the smallest OID" behaviour that was not possible.  Another idea I had 
was some way to tell the optimizer not to use that particular index for 
stats lookups, but probably the use case for such a feature would be a 
bit narrow.


All that said, I don't think my struggles justify the change you 
described above.  Not sure if it's a good idea or not.



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] interval typmodout is broken

2014-10-18 Thread Bruce Momjian
On Mon, Oct 13, 2014 at 07:38:39PM -0400, Bruce Momjian wrote:
 I think the basic problem is that the original author had the idea of
 doing:
 
 	SELECT INTERVAL (2) '100. seconds';
 	   interval
 	----------
 	 00:01:41
 
 and using (2) in that location as a short-hand when the interval
 precision units were not specified, which seems logical.  However, they
 allowed it even when the units were specified:
 
 	SELECT INTERVAL (2) '100. seconds' HOUR to SECOND;
 	   interval
 	----------
 	 00:01:41
 
 and in cases where the precision made no sense:
 
 	SELECT INTERVAL (2) '100. seconds' HOUR to MINUTE;
 	   interval
 	----------
 	 00:01:00
 
 I have created the attached patch which only allows parentheses in the
 first case.  

Patch applied.
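
For anyone following along, a sketch of the intended behaviour after
the patch (my reading of the description above; the exact error wording
may differ):

    SELECT INTERVAL (2) '10.5 seconds';                 -- still accepted
    SELECT INTERVAL '10.5 seconds' HOUR TO SECOND(2);   -- standard spelling
    SELECT INTERVAL (2) '10.5 seconds' HOUR TO SECOND;  -- now rejected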

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Bruce Momjian
On Sat, Oct 18, 2014 at 04:29:41PM +0200, Marko Tiikkaja wrote:
 On 10/18/14, 5:16 AM, Tom Lane wrote:
 BTW, on re-reading that code I notice that it will happily seize upon
 the first suitable index (first in OID order), regardless of how many
 lower-order columns that index has got.  This doesn't make any difference
 I think for get_actual_variable_range's own purposes, because it's only
 expecting to touch the endmost index page regardless.  However, in light
 of Marko's complaint maybe we should teach it to check all the indexes
 and prefer the matching one with fewest columns?  It would only take a
 couple extra lines of code, and probably not that many added cycles
 considering we're going to do an index access of some sort.  But I'm
 not sure if it's worth any extra effort --- I think in his example
 case, there wasn't any narrower index anyway.
 
 Perhaps accidentally this would have helped in my case, actually,
 since I could have created a new, smaller index CONCURRENTLY and
 then seen that the usage of the other index stopped increasing.
 With the "pick the smallest OID" behaviour that was not possible.
 Another idea I had was some way to tell the optimizer not to use that
 particular index for stats lookups, but probably the use case for
 such a feature would be a bit narrow.

Well, if the index is there, why not use it?  I thought the problem was
just that you had no visibility into how those statistics were being
accessed.  Most people think EXPLAIN shows all accesses, but obviously
now it doesn't.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index creation warning

2014-10-18 Thread Bruce Momjian
On Fri, Oct 17, 2014 at 02:36:55PM -0400, Bruce Momjian wrote:
 On Fri, Oct 17, 2014 at 12:56:52PM -0400, Tom Lane wrote:
  David G Johnston david.g.johns...@gmail.com writes:
   The question is whether we explain the implications of not being 
   WAL-logged
   in an error message or simply state the fact and let the documentation
   explain the hazards - basically just output:
   "hash indexes are not WAL-logged and their use is discouraged"
  
  +1.  The warning message is not the place to be trying to explain all the
  details.
 
 OK, updated patch attached.

Patch applied.
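
For reference, what the committed behaviour looks like from psql (a
sketch; the table and index names are illustrative):

    CREATE TABLE t (x int);
    CREATE INDEX t_x_hash ON t USING hash (x);
    -- WARNING:  hash indexes are not WAL-logged and their use is discouraged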

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Marko Tiikkaja

On 10/18/14, 4:33 PM, Bruce Momjian wrote:

On Sat, Oct 18, 2014 at 04:29:41PM +0200, Marko Tiikkaja wrote:

Another idea I had was some way to tell the optimizer not to use that
particular index for stats lookups, but probably the use case for
such a feature would be a bit narrow.


Well, if the index is there, why not use it?  I thought the problem was
just that you had no visibility into how those statistics were being
accessed.


Yes, exactly; if I had had the option to disable the index from the 
optimizer's point of view, I'd have seen that it's not used for looking 
up any data by any queries, and thus I would have known that I can 
safely drop it without slowing down queries.  Which was the only thing I 
cared about, and where the stats we provide failed me.



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-18 Thread Thomas Kellerer
 Does someone know what other DBMSs do in this regard? I.e., do they
 put anything in INFORMATION_SCHEMA.VIEWS for matviews? What TABLE_TYPE
 do they use in INFORMATION_SCHEMA.TABLES? 

I can only speak for Oracle. 

Oracle doesn't have INFORMATION_SCHEMA, but their JDBC driver treats mviews
as tables. If you ask the driver for a list of tables (explicitly passing
"TABLE" as the only type to return), it will include mviews as well,
returning "MATERIALIZED VIEW" as the type of the object.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Materialized-views-don-t-show-up-in-information-schema-tp5822643p5823559.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Bruce Momjian
On Sat, Oct 18, 2014 at 04:38:37PM +0200, Marko Tiikkaja wrote:
 On 10/18/14, 4:33 PM, Bruce Momjian wrote:
 On Sat, Oct 18, 2014 at 04:29:41PM +0200, Marko Tiikkaja wrote:
 Another idea I had was some way to tell the optimizer not to use that
 particular index for stats lookups, but probably the use case for
 such a feature would be a bit narrow.
 
 Well, if the index is there, why not use it?  I thought the problem was
 just that you had no visibility into how those statistics were being
 accessed.
 
 Yes, exactly; if I had had the option to disable the index from the
 optimizer's point of view, I'd have seen that it's not used for
 looking up any data by any queries, and thus I would have known that
 I can safely drop it without slowing down queries.  Which was the
 only thing I cared about, and where the stats we provide failed me.

How many other cases do we have where the statistics are getting
incremented and there is no user visibility into the operation?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 How many other cases do we have where the statistics are getting
 incremented and there is no user visibility into the operation?

* system catalog accesses
* vacuum/analyze/cluster/etc

The fact that system-initiated accesses get counted in the statistics
is a feature, not a bug.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Tom Lane
Marko Tiikkaja ma...@joh.to writes:
 On 10/18/14, 4:33 PM, Bruce Momjian wrote:
 Well, if the index is there, why not use it?  I thought the problem was
 just that you had no visibility into how those statistics were being
 accessed.

 Yes, exactly; if I had had the option to disable the index from the 
 optimizer's point of view, I'd have seen that it's not used for looking 
 up any data by any queries, and thus I would have known that I can 
 safely drop it without slowing down queries.  Which was the only thing I 
 cared about, and where the stats we provide failed me.

This argument is *utterly* wrongheaded, because it assumes that the
planner's use of the index provided no benefit to your queries.  If the
planner was touching the index at all then it was planning queries in
which knowledge of the extremal value was relevant to accurate selectivity
estimation.  So it's quite likely that without the index you'd have gotten
different and inferior plans, whether or not those plans actually chose to
use the index.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Marko Tiikkaja

On 10/18/14, 5:46 PM, Tom Lane wrote:

Marko Tiikkaja ma...@joh.to writes:

Yes, exactly; if I had had the option to disable the index from the
optimizer's point of view, I'd have seen that it's not used for looking
up any data by any queries, and thus I would have known that I can
safely drop it without slowing down queries.  Which was the only thing I
cared about, and where the stats we provide failed me.


This argument is *utterly* wrongheaded, because it assumes that the
planner's use of the index provided no benefit to your queries.  If the
planner was touching the index at all then it was planning queries in
which knowledge of the extremal value was relevant to accurate selectivity
estimation.  So it's quite likely that without the index you'd have gotten
different and inferior plans, whether or not those plans actually chose to
use the index.


Maybe.  But at the same time that's a big problem: there's no way of 
knowing whether the index is actually useful or not when it's used only 
by the query planner.



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] FieldSelect optimization versus overall planner organization

2014-10-18 Thread Tom Lane
I looked into bug #11703,
http://www.postgresql.org/message-id/20141017162236.2523.74...@wrigleys.postgresql.org

What is happening here is that the FieldSelect optimization in
eval_const_expressions fails to play nice with the logic that determines
which Vars need to be available from a relation scan.  We start out with,
say,

select x from my_table, func2(my_table) f(x);

which inline_set_returning_functions() expands into

select x from my_table, lateral (select func1(my_table)) f(x);

At this point we decide that the sub-select can't be flattened into the
upper query (in this case, because it lacks a FROM clause, but any other
reason to suppress flattening would have the same result).  So we put the
sub-select on the back burner for the moment and start to plan the outer
query.  In particular, we determine that the only Var that the scan of
my_table has to produce is a whole-row Var (my_table.*) to satisfy the
lateral reference in the sub-select.

Eventually, we get around to recursively planning the sub-query, and at
that point we'll inline the func1() call, producing a FieldSelect applied
to the whole-row Var for mytable.  When we do const-simplification on
the resulting expression, that pair of nodes is reduced to a simple Var
referencing mytable.my_column.  Which ought to be fine --- except that we
already planned the scan of my_table on the assumption that it should
produce my_table.*.  So after all the dust settles and setrefs.c is
doing the janitorial work, it finds out that the lateral reference to
mytable.my_column can't be satisfied, and you get the dreaded variable
not found in subplan target list error.
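
For concreteness, a hypothetical sketch of the query shape involved
(illustrative names, not the reporter's actual schema):

    CREATE TABLE my_table (my_column int);
    CREATE FUNCTION func1(my_table) RETURNS int
        AS 'SELECT ($1).my_column' LANGUAGE sql STABLE;
    CREATE FUNCTION func2(my_table) RETURNS SETOF int
        AS 'SELECT func1($1)' LANGUAGE sql STABLE;
    -- reported failure: ERROR:  variable not found in subplan target list
    SELECT x FROM my_table, func2(my_table) f(x);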

AFAICS, the only way to fix this right would require major
rearrangements of the order in which the planner does stuff, for example
insisting on doing function inlining and const-simplification all the way
down in subqueries before we plan the outer query level.  That'd be quite
a lot of work to change, and the odds of introducing new bugs are high
enough that I think back-patching would be out of the question anyway.

I think the only practical fix is to lobotomize the FieldSelect-collapsing
optimization.  It's probably safe enough to keep allowing it for whole-row
Vars of the current query level, but not those with levelsup > 0.

Anybody have another idea?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of GetUserId() Usage

2014-10-18 Thread Brightwell, Adam
All,


 I'll break them into three pieces- superuser() cleanup, GetUserId() ->
 has_privs_of_role(), and the additional-role-attributes patch will just
 depend on the others.


Attached is a patch for the GetUserId() -> has_privs_of_role() cleanup for
review.

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 67539ec..42d9a1f
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***
*** 34,39 
--- 34,40 
  #include "storage/pmsignal.h"
  #include "storage/proc.h"
  #include "storage/procarray.h"
+ #include "utils/acl.h"
  #include "utils/lsyscache.h"
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
*** pg_signal_backend(int pid, int sig)
*** 113,119 
  		return SIGNAL_BACKEND_ERROR;
  	}
  
! 	if (!(superuser() || proc->roleId == GetUserId()))
  		return SIGNAL_BACKEND_NOPERMISSION;
  
  	/*
--- 114,120 
  		return SIGNAL_BACKEND_ERROR;
  	}
  
! 	if (!has_privs_of_role(GetUserId(), proc->roleId))
  		return SIGNAL_BACKEND_NOPERMISSION;
  
  	/*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
new file mode 100644
index 44ccd37..ea2cd1e
*** a/src/backend/utils/adt/pgstatfuncs.c
--- b/src/backend/utils/adt/pgstatfuncs.c
***
*** 20,25 
--- 20,26 
  #include "libpq/ip.h"
  #include "miscadmin.h"
  #include "pgstat.h"
+ #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/inet.h"
  #include "utils/timestamp.h"
*** pg_stat_get_activity(PG_FUNCTION_ARGS)
*** 675,681 
  			nulls[15] = true;
  
  		/* Values only available to same user or superuser */
! 		if (superuser() || beentry->st_userid == GetUserId())
  		{
  			SockAddr	zero_clientaddr;
  
--- 676,682 
  			nulls[15] = true;
  
  		/* Values only available to same user or superuser */
! 		if (has_privs_of_role(GetUserId(), beentry->st_userid))
  		{
  			SockAddr	zero_clientaddr;
  
*** pg_stat_get_backend_activity(PG_FUNCTION
*** 877,883 
  
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		activity = "<backend information not available>";
! 	else if (!superuser() && beentry->st_userid != GetUserId())
  		activity = "<insufficient privilege>";
  	else if (*(beentry->st_activity) == '\0')
  		activity = "<command string not enabled>";
--- 878,884 
  
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		activity = "<backend information not available>";
! 	else if (!has_privs_of_role(GetUserId(), beentry->st_userid))
  		activity = "<insufficient privilege>";
  	else if (*(beentry->st_activity) == '\0')
  		activity = "<command string not enabled>";
*** pg_stat_get_backend_waiting(PG_FUNCTION_
*** 898,904 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!superuser() && beentry->st_userid != GetUserId())
  		PG_RETURN_NULL();
  
  	result = beentry->st_waiting;
--- 899,905 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!has_privs_of_role(GetUserId(), beentry->st_userid))
  		PG_RETURN_NULL();
  
  	result = beentry->st_waiting;
*** pg_stat_get_backend_activity_start(PG_FU
*** 917,923 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!superuser() && beentry->st_userid != GetUserId())
  		PG_RETURN_NULL();
  
  	result = beentry->st_activity_start_timestamp;
--- 918,924 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!has_privs_of_role(GetUserId(), beentry->st_userid))
  		PG_RETURN_NULL();
  
  	result = beentry->st_activity_start_timestamp;
*** pg_stat_get_backend_xact_start(PG_FUNCTI
*** 943,949 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!superuser() && beentry->st_userid != GetUserId())
  		PG_RETURN_NULL();
  
  	result = beentry->st_xact_start_timestamp;
--- 944,950 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!has_privs_of_role(GetUserId(), beentry->st_userid))
  		PG_RETURN_NULL();
  
  	result = beentry->st_xact_start_timestamp;
*** pg_stat_get_backend_start(PG_FUNCTION_AR
*** 965,971 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!superuser() && beentry->st_userid != GetUserId())
  		PG_RETURN_NULL();
  
  	result = beentry->st_proc_start_timestamp;
--- 966,972 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!has_privs_of_role(GetUserId(), beentry->st_userid))
  		PG_RETURN_NULL();
  
  	result = beentry->st_proc_start_timestamp;
*** pg_stat_get_backend_client_addr(PG_FUNCT
*** 989,995 
  	if ((beentry = pgstat_fetch_stat_beentry(beid)) == NULL)
  		PG_RETURN_NULL();
  
! 	if (!superuser() && beentry->st_userid != 

[HACKERS] (auto-)analyze causing bloat/load

2014-10-18 Thread Andres Freund
Hi,

I've noticed an interesting phenomenon. Look at the following pgbench -c
32 -j 32 -P 1 output on a scale 200 database. Then guess where I've
started
SET vacuum_cost_delay = '20ms';
ANALYZE pgbench_accounts;
and when it finished. The cost_delay is to simulate an
autovacuum-triggered analyze.

progress: 18.0 s, 17018.1 tps, lat 1.879 ms stddev 1.018
progress: 19.0 s, 17039.6 tps, lat 1.876 ms stddev 0.806
progress: 20.0 s, 16892.1 tps, lat 1.893 ms stddev 0.732
progress: 21.0 s, 16636.9 tps, lat 1.922 ms stddev 0.933
progress: 22.0 s, 16198.4 tps, lat 1.974 ms stddev 1.180
progress: 23.0 s, 16362.1 tps, lat 1.954 ms stddev 0.858
progress: 24.0 s, 15926.0 tps, lat 2.007 ms stddev 0.967
progress: 25.0 s, 14550.5 tps, lat 2.197 ms stddev 1.870
progress: 26.0 s, 15042.2 tps, lat 2.125 ms stddev 1.135
progress: 27.0 s, 14579.1 tps, lat 2.196 ms stddev 1.016
progress: 28.0 s, 14145.0 tps, lat 2.261 ms stddev 1.160
progress: 29.0 s, 13734.1 tps, lat 2.328 ms stddev 0.992
progress: 30.0 s, 12127.6 tps, lat 2.565 ms stddev 1.930
progress: 31.0 s, 12375.2 tps, lat 1.807 ms stddev 1.556
progress: 32.0 s, 11939.9 tps, lat 3.551 ms stddev 45.457
progress: 33.0 s, 12466.0 tps, lat 2.566 ms stddev 1.517
progress: 34.0 s, 12166.1 tps, lat 2.634 ms stddev 1.309
progress: 35.0 s, 11743.7 tps, lat 2.723 ms stddev 1.576
progress: 36.0 s, 11623.3 tps, lat 2.752 ms stddev 1.237
progress: 37.0 s, 11341.4 tps, lat 2.817 ms stddev 1.413
progress: 38.0 s, 11130.2 tps, lat 2.875 ms stddev 1.359
progress: 39.0 s, 10988.0 tps, lat 2.910 ms stddev 1.132
progress: 40.0 s, 10681.9 tps, lat 2.993 ms stddev 1.355
progress: 41.0 s, 10573.1 tps, lat 3.027 ms stddev 1.738
progress: 42.0 s, 10174.9 tps, lat 3.143 ms stddev 2.125
progress: 43.0 s, 10190.1 tps, lat 3.136 ms stddev 1.892
progress: 44.0 s, 9943.2 tps, lat 3.219 ms stddev 1.740
progress: 45.0 s, 9792.5 tps, lat 3.265 ms stddev 2.229
progress: 46.0 s, 9701.6 tps, lat 3.299 ms stddev 1.834
progress: 47.0 s, 9376.6 tps, lat 3.410 ms stddev 2.149
progress: 48.0 s, 9217.2 tps, lat 3.471 ms stddev 1.992
progress: 49.0 s, 9265.9 tps, lat 3.449 ms stddev 1.819
progress: 50.0 s, 9090.9 tps, lat 3.521 ms stddev 1.916
progress: 51.0 s, 8933.9 tps, lat 3.578 ms stddev 1.813
progress: 52.0 s, 8849.6 tps, lat 3.616 ms stddev 1.913
progress: 53.0 s, 8767.5 tps, lat 3.649 ms stddev 1.952
progress: 54.0 s, 8521.5 tps, lat 3.753 ms stddev 2.158
progress: 55.0 s, 8382.2 tps, lat 3.816 ms stddev 2.383
progress: 56.0 s, 8298.1 tps, lat 3.848 ms stddev 2.727
progress: 57.0 s, 8369.5 tps, lat 3.825 ms stddev 2.897
progress: 58.0 s, 8205.0 tps, lat 3.902 ms stddev 2.669
progress: 59.0 s, 8048.1 tps, lat 3.974 ms stddev 2.473
progress: 60.0 s, 7886.5 tps, lat 4.050 ms stddev 2.456
progress: 61.0 s, 7539.0 tps, lat 4.251 ms stddev 2.992
progress: 62.0 s, 7815.4 tps, lat 4.090 ms stddev 2.845
progress: 63.0 s, 7863.4 tps, lat 4.062 ms stddev 3.032
progress: 64.0 s, 7750.6 tps, lat 4.135 ms stddev 2.677
progress: 65.0 s, 7580.7 tps, lat 4.217 ms stddev 2.210
progress: 66.0 s, 7164.8 tps, lat 4.466 ms stddev 2.511
progress: 67.0 s, 7504.4 tps, lat 4.265 ms stddev 1.343
progress: 68.0 s, 9530.0 tps, lat 3.360 ms stddev 1.742
progress: 69.0 s, 16751.3 tps, lat 1.909 ms stddev 0.652
progress: 70.0 s, 16123.7 tps, lat 1.983 ms stddev 0.751
progress: 71.0 s, 15122.7 tps, lat 2.115 ms stddev 1.065

Yes. That's right. ANALYZE caused a slowdown of nearly 2x. Executing it
on a bigger database where it takes longer makes it much worse. A
nonuniform access pattern also makes it *much* worse.

So, you might ask, why is that? It's because ANALYZE keeps a snapshot
around and thus blocks RecentGlobalXmin from advancing, which makes HOT
pruning impossible on popular pages.
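
(You can watch the horizon being held back from another session while
the ANALYZE runs; a sketch, using the backend_xmin column that's
available since 9.4:)

    SELECT pid, backend_xmin, query
    FROM pg_stat_activity
    WHERE query LIKE 'ANALYZE%';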

Now, the update rates in this example certainly are above what most
sites see, but in practical sites relations are also much larger. And
thus autoanalyze also takes longer... Much longer with a large sample
size.

Interestingly we already set PROC_IN_ANALYZE - but we never actually
look at it. I wonder what it'd take to ignore analyze in
GetSnapshotData()/GetOldestXmin()... At first sight it looks quite
doable to not have a snapshot around (or setup to be ignored) while
acquire_sample_rows() is running. As that's the expensive bit...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizer on sort aggregate

2014-10-18 Thread Peter Geoghegan
On Sat, Oct 18, 2014 at 5:27 AM, Greg Stark st...@mit.edu wrote:
 That's interesting but I think it's mostly a quirk of your example.
 Afaics the difference is only that the en_US locale ignores
 punctuation like : and /  (or at least treats them as less significant
 than alphabetic characters). If you had strings that had less
 punctuation or differences that didn't happen to arrive shortly after
 the 8-byte boundary then it wouldn't make any difference.

The en_US locale treats those kind of punctuation characters as less
significant than alphabetical characters, but is not at all unusual in
doing so - all locales I tested do the same. They also do the same for
spaces. And there is another property of the transformation that is
very important for those outside of the English speaking world -
diacritics are not represented at the primary weight level. So even
though representing certain characters with a diacritic will take 2
bytes in UTF-8, the corresponding primary weight only takes 1 byte. In
short, as I said, the concentration of entropy can easily be a lot
higher within the first n bytes of the primary weight level of the
transformed blob as compared to the first n bytes of the original
string, and frequently *will* be. This will make worst cases for
abbreviated keys significantly less common than they would otherwise
be, while more generally improving performance. Of course, that might
not always be as effective as we'd prefer, but that's something that
you more or less have to live with in order to have competitive sort
performance (even with the HyperLogLog amelioration, you still pay
something for considering the technique). You can always contrive a
case that puts things just out of reach, no matter how much entropy
you manage to concentrate in the abbreviated key. Fortunately, I don't
think those cases are all that representative of what people want from
sort performance.

 And we still have to run strfrm at least once, write out the whole
 binary blob to memory somewhere and if it spills to disk we still have
 to write and read much more data. I think recognizing cases where
 equality is the only thing we're interested in and locale-sensitive
 sorting isn't necessary and using a memcmp would be a clear win.

Yeah. I was making a point that strictly concerned abbreviated keys as
proposed in response to Feng's remarks. I am not 100% sure that the
benefits of abbreviated keys with locale support aren't worth it here,
and I say that in full agreement with Feng about locale-aware sorts
not actually being necessary. It's clear that cache performance is
essential to getting good sort performance, which strongly recommends
abbreviated keys. When we consider the concentration of entropy with
ordinary locale-aware abbreviated keys, as compared to abbreviated
keys that just use the C locale artificially for this case, it's not
clear that the concentration of entropy isn't reason enough to prefer
locale aware abbreviated keys. Now, it might well be that paying for n
strxfrm() operations, rather than doing n straight-up memcpy()
operations isn't worth it. But I think it might well be worth it -
particularly when you factor in the complexity avoided by not
special-casing this. I'm not really sure.

The general idea of abbreviated keys is almost old hat, to be honest.
It just happens to be essential for competitive sort performance.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Bruce Momjian
On Sat, Oct 18, 2014 at 06:15:03PM +0200, Marko Tiikkaja wrote:
 On 10/18/14, 5:46 PM, Tom Lane wrote:
 Marko Tiikkaja ma...@joh.to writes:
 Yes, exactly; if I had had the option to disable the index from the
 optimizer's point of view, I'd have seen that it's not used for looking
 up any data by any queries, and thus I would have known that I can
 safely drop it without slowing down queries.  Which was the only thing I
 cared about, and where the stats we provide failed me.
 
 This argument is *utterly* wrongheaded, because it assumes that the
 planner's use of the index provided no benefit to your queries.  If the
 planner was touching the index at all then it was planning queries in
 which knowledge of the extremal value was relevant to accurate selectivity
 estimation.  So it's quite likely that without the index you'd have gotten
 different and inferior plans, whether or not those plans actually chose to
 use the index.
 
 Maybe.  But at the same time that's a big problem: there's no way of
 knowing whether the index is actually useful or not when it's used
 only by the query planner.

That is a good point.  Without an index, the executor is going to do a
sequential scan, while a missing index to the optimizer just means worse
statistics.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] initdb failure on RH 5.10

2014-10-18 Thread BRUSSER Michael
Hi,

One of our sites ran into a problem with database installation:

initdb failed
FATAL: unexpected data beyond EOF in block 19 of relation base/1/2609
HINT: This has been seen to occur with buggy kernels; consider updating your 
system.
STATEMENT: COMMENT ON FUNCTION euc_jis_2004_to_shift_jis_2004
  (INTEGER, INTEGER, CSTRING, INTERNAL, INTEGER)
   IS 'internal conversion function for EUC_JIS_2004 to SHIFT_JIS_2004';


initdb is called like this:
initdb -D data-dir -L input-dir -E UTF8 --locale=C

This is Postgres 8.4.4, the installation piece has been stable and always 
worked, but this time they have a new Red Hat 5.10 server

# uname -a
Linux hostname 2.6.18-371.el5 #1 SMP Thu Sep 5 21:21:44 EDT 2013 x86_64 
x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.10 (Tikanga)

I am not sure if this is helpful, but just in case:
# ldd euc_jis_2004_and_shift_jis_2004.so
   linux-vdso.so.1 =>  (0x7fff259fd000)
   libc.so.6 => /lib64/libc.so.6 (0x2b3d5a756000)
   /lib64/ld-linux-x86-64.so.2 (0x00384840)

And these libs are softlinks:
/lib64/libc.so.6 -> libc-2.5.so
/lib64/ld-linux-x86-64.so.2 -> ld-2.5.so

The only thought I have at this point is to run it with strace, but maybe this 
is a known issue and someone has a better idea?

Thank you,
Michael.









Re: [HACKERS] (auto-)analyze causing bloat/load

2014-10-18 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Interestingly we already set PROC_IN_ANALYZE - but we never actually
 look at it. I wonder what it'd take to ignore analyze in
 GetSnapshotData()/GetOldestXmin()... At first sight it looks quite
 doable to not have a snapshot around (or setup to be ignored) while
 acquire_sample_rows() is running. As that's the expensive bit...

Say what?  How is ANALYZE supposed to know which rows are valid
without a snapshot?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] (auto-)analyze causing bloat/load

2014-10-18 Thread Andres Freund
On 2014-10-18 14:30:53 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Interestingly we already set PROC_IN_ANALYZE - but we never actually
  look at it. I wonder what it'd take to ignore analyze in
  GetSnapshotData()/GetOldestXmin()... At first sight it looks quite
  doable to not have a snapshot around (or setup to be ignored) while
  acquire_sample_rows() is running. As that's the expensive bit...
 
 Say what?  How is ANALYZE supposed to know which rows are valid
 without a snapshot?

I haven't fully investigated it yet. But it's already not using its own
snapshot to determine visibility. Instead it uses GetOldestXmin() +
HeapTupleSatisfiesVacuum().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] initdb failure on RH 5.10

2014-10-18 Thread Tom Lane
BRUSSER Michael michael.brus...@3ds.com writes:
 initdb failed
 FATAL: unexpected data beyond EOF in block 19 of relation base/1/2609
 HINT: This has been seen to occur with buggy kernels; consider updating your 
 system.
 STATEMENT: COMMENT ON FUNCTION euc_jis_2004_to_shift_jis_2004
   (INTEGER, INTEGER, CSTRING, INTERNAL, INTEGER)
IS 'internal conversion function for EUC_JIS_2004 to 
 SHIFT_JIS_2004';

That's ... surprising.  We've only ever seen that error message in cases
with heavy concurrent updates and wonky underlying storage; initdb is not
where anyone would expect it.

 initdb is called like this:
 initdb -D data-dir -L input-dir -E UTF8 --locale=C

It's not exactly customary to use -L in initdb calls.  Is it possible
you're pointing it to an incompatible library directory?  Not that I see
how that would lead to this behavior, but you're definitely dealing with
something pretty weird.

 This is Postgres 8.4.4, the installation piece has been stable and always 
 worked, but this time they have a new Red Hat 5.10 server

What in the world are they doing using 8.4.4?  The entire 8.4.x release
series is out of support anyway, but there is little if any excuse not
to be using the last minor release, 8.4.22.

I'd call your attention also to the fact that RHEL 5.10 is obsolete.
5.11 came out last month, and Red Hat are not known for updating back-rev
release series with inessential bug fixes.

If you can still reproduce this with 5.11 and 8.4.22, people might be
interested in looking more closely.  Otherwise, well, you're dealing
with five-year-old software with a very long list of known bugs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-10-18 Thread Tomas Vondra
Hi,

On 16.10.2014 03:26, Jeff Davis wrote:
 On Fri, 2014-08-29 at 10:12 -0400, Tom Lane wrote:
 What about removing the callback per se and just keeping the
 argument, as it were. That is, a MemoryContext has a field of type
 size_t * that is normally NULL, but if set to non-NULL, then we
 increment the pointed-to value for pallocs and decrement for
 pfrees.
 
 I like that idea; patch attached. Two differences:
   * At block level, not chunk level.
   * I use a uint64, because a size_t is only guaranteed to hold one
 allocation, and we need to sum up many allocations.

 When unused, it does still appear to have a little overhead in 
 Robert's test on his power machine. It seems to be between a 0.5-1.0%
 regression. There's not much extra code on that path, just a branch,
 pointer dereference, and an addition; so I don't think it will get
 much cheaper than it is.
 
 This regression seems harder to reproduce on my workstation, or
 perhaps it's just noisier.

I assume it's the reindex pgbench_accounts_pkey test, correct?

I did the same test on my workstation (with i7-4770R CPU), and ISTM
there's a measurable difference, although in surprising directions. For
binaries compiled with gcc 4.7.3 and clang 3.3 and 3.5, the results look
like this (5 runs for each combination):

gcc 4.7.3 / master
--
LOG:  internal sort ended, ... CPU 1.24s/4.32u sec elapsed 9.78 sec
LOG:  internal sort ended, ... CPU 1.29s/4.31u sec elapsed 9.80 sec
LOG:  internal sort ended, ... CPU 1.26s/4.31u sec elapsed 9.80 sec
LOG:  internal sort ended, ... CPU 1.26s/4.37u sec elapsed 9.78 sec
LOG:  internal sort ended, ... CPU 1.33s/4.29u sec elapsed 9.78 sec

gcc 4.7.3 / patched
---
LOG:  internal sort ended, ... CPU 1.35s/4.27u sec elapsed 9.78 sec
LOG:  internal sort ended, ... CPU 1.33s/4.30u sec elapsed 9.74 sec
LOG:  internal sort ended, ... CPU 1.34s/4.27u sec elapsed 9.75 sec
LOG:  internal sort ended, ... CPU 1.27s/4.26u sec elapsed 9.78 sec
LOG:  internal sort ended, ... CPU 1.35s/4.26u sec elapsed 9.78 sec

clang 3.3 / master
--
LOG:  internal sort ended, ... CPU 1.32s/4.61u sec elapsed 10.00 sec
LOG:  internal sort ended, ... CPU 1.27s/4.48u sec elapsed 9.95 sec
LOG:  internal sort ended, ... CPU 1.35s/4.48u sec elapsed 9.99 sec
LOG:  internal sort ended, ... CPU 1.32s/4.49u sec elapsed 9.97 sec
LOG:  internal sort ended, ... CPU 1.32s/4.47u sec elapsed 10.04 sec

clang 3.3 / patched
---
LOG:  internal sort ended, ... CPU 1.35s/4.59u sec elapsed 10.13 sec
LOG:  internal sort ended, ... CPU 1.31s/4.61u sec elapsed 10.06 sec
LOG:  internal sort ended, ... CPU 1.28s/4.63u sec elapsed 10.10 sec
LOG:  internal sort ended, ... CPU 1.27s/4.58u sec elapsed 10.01 sec
LOG:  internal sort ended, ... CPU 1.29s/4.60u sec elapsed 10.05 sec

clang 3.5 / master
--
LOG:  internal sort ended, ... CPU 1.30s/4.46u sec elapsed 9.96 sec
LOG:  internal sort ended, ... CPU 1.30s/4.49u sec elapsed 9.96 sec
LOG:  internal sort ended, ... CPU 1.30s/4.53u sec elapsed 9.95 sec
LOG:  internal sort ended, ... CPU 1.25s/4.51u sec elapsed 9.95 sec
LOG:  internal sort ended, ... CPU 1.30s/4.50u sec elapsed 9.95 sec

clang 3.5 / patched
---
LOG:  internal sort ended, ... CPU 1.32s/4.59u sec elapsed 9.97 sec
LOG:  internal sort ended, ... CPU 1.27s/4.49u sec elapsed 9.91 sec
LOG:  internal sort ended, ... CPU 1.29s/4.48u sec elapsed 9.88 sec
LOG:  internal sort ended, ... CPU 1.31s/4.49u sec elapsed 9.93 sec
LOG:  internal sort ended, ... CPU 1.23s/4.56u sec elapsed 9.90 sec

So while on clang 3.3 I really see about ~1% slowdown, on the other two
compilers it seems a tad faster (but it might easily be a noise).

It'd be interesting to know what compiler/version Robert used ...

 One thought in either case is that we don't have to touch the API
 for MemoryContextCreate: rather, the feature can be enabled in a
 separate call after making the context.
 
 That seems fairly awkward to me because the pointer needs to be 
 inherited from the parent context when not specified. I left the
 extra API call in.

I don't see a big difference between adding a new API "create" method
and adding a separate "enable tracking" method.

What however seems awkward is using the 'uint64*' directly as a
parameter, instead of using an 'enable_tracking' flag as in the previous
versions. Is there a reason for that?

This allows supplying a pointer to a uint64 variable, i.e. something
like this:

   {
       uint64 mem_used = 0;
       MemoryContext context
           = MemoryContextCreateTracked(..., &mem_used);

       ...

       if (mem_used > work_mem * 1024L)
       {
           ... do something ...
       }
   }

Thanks to exposing the total_mem counter like this, it's possible to
read the current value directly (without the API methods available in
the previous versions). It also makes it possible to detach the
accounting from the parent (i.e. not count it against the parent).

But is this 

Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch

2014-10-18 Thread Jim Nasby

On 10/17/14, 10:16 PM, Tom Lane wrote:

I wrote:

Because it needs up-to-date min/max values in order to avoid being
seriously misled about selectivities of values near the endpoints.
See commit 40608e7f949fb7e4025c0ddd5be01939adc79eec.


BTW, on re-reading that code I notice that it will happily seize upon
the first suitable index (first in OID order), regardless of how many
lower-order columns that index has got.  This doesn't make any difference
I think for get_actual_variable_range's own purposes, because it's only
expecting to touch the endmost index page regardless.  However, in light
of Marko's complaint maybe we should teach it to check all the indexes
and prefer the matching one with fewest columns?  It would only take a
couple extra lines of code, and probably not that many added cycles
considering we're going to do an index access of some sort.  But I'm
not sure if it's worth any extra effort --- I think in his example
case, there wasn't any narrower index anyway.


The real cost here isn't the number of columns, it's the size of the index, no? 
So shouldn't we look at relpages instead? For example, you'd certainly want to 
use an index on (field_we_care_about, smallint_field) over an index on 
(field_we_care_about, big_honking_text_field).
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-18 Thread Jim Nasby

On 10/17/14, 11:19 PM, David E. Wheeler wrote:

On Oct 17, 2014, at 3:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:


Yeah, exactly.  Personally I'm *not* for this, but if we do it we should
do it consistently: every comma-separated list in the SQL syntax should
work the same.


PL/pgSQL, too, I presume.


Yes.

The only case I can think of where we wouldn't want this is COPY.

BTW, this should also apply to delimiters other than commas; for example, some 
geometry types use ';' as a delimiter between points.

I do think such a change should be made in stages, and maybe not every last one 
makes it into 9.5, but the intention should certainly be that we support extra 
delimiters *everywhere*.
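
For clarity, a sketch of what the proposal would accept (not valid
syntax today):

    SELECT a,
           b,        -- trailing comma tolerated under the proposal
    FROM my_table;
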
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Superuser connect during smart shutdown

2014-10-18 Thread Robert Haas

 But TBH I suspect 95% of the problems here would vanish if smart
 shutdown weren't the default ...

But for your repeated objections, we would have changed the default to fast 
years ago. AFAICT everyone else is in favor of that.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-18 Thread Jim Nasby

On 10/9/14, 4:19 PM, Andres Freund wrote:

On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

 On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:

  Bruce Momjian wrote:
  

   I agree this is a serious problem.  We have discussed various options,
   but have not decided on anything.  The TODO list has:
   
https://wiki.postgresql.org/wiki/Todo
   
Improve setting of visibility map bits for read-only and insert-only
workloads


http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

  
  I hate to repeat myself, but I think autovacuum could be modified to run
  actions other than vacuum and analyze.  In this specific case we could
  be running a table scan that checks only pages that don't have the
  all-visible bit set, and see if it can be set.

 
 Isn't that *precisely* what a plain vacuum run does?


Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.

IIRC it doesn't do most of that if there's no need, and if it's an
insert-only table without rollbacks. I *do* think there are some
optimizations we could make in general.


No, it always attempts dead tuple removal. The weird part is that if it's not 
doing a freeze it will just punt on a page if it can't get the cleanup lock. I have to 
believe that could seriously screw up autovacuum scheduling.

Now that we have forks, I'm wondering if it would be best to come up with a 
per-page system that could be used to determine when a table needs background 
work to be done. The visibility map could serve a lot of this purpose, but I'm 
not sure if it would work for getting hint bits set in the background. I think 
it would also be a win if we had a way to advance relfrozenxid and relminmxid. 
Perhaps something that simply remembered the last XID that touched each page...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers