Re: [HACKERS] Allow SQL/plpgsql functions to accept record

2015-04-23 Thread Jim Nasby

On 4/23/15 8:33 AM, Pavel Stehule wrote:

I agree that that would be useful.  I think the problem with an
expression like rowvar.something is that PL/pgsql cannot infer the
type of the result, and nothing else works without that.  I doubt that
it's practical to lift that restriction.  PL/pgsql could introduce
dedicated syntax for this operation, like DYNAMIC_EXTRACT(rowvar,
colname, resulttype) or something, but that's going to be clunky at
best.  Whether we eventually do that or not, if we can allow rows to
be passed in and then let people use json or hstore operators on them,
that would be a helpful step forward, IMHO.  I'm not sure if that's
practical either, but maybe...


This needs significant changes in plpgsql - it would force expressions in
a loop to work with different types - so we would either have to abandon
plan-based expressions, or introduce alternative plans for different
input types.


Is this fundamentally an issue of not knowing what we're being handed 
when we compile the function? Perhaps a way around this limitation would 
be to recompile during execution if any record arguments get a different 
base type. In reality, I suspect that won't happen during a single query.


I'll take a look at at least allowing passing a record in so you can 
hand it to some other function. Any pointers on how to do that would be 
welcome; I've never hacked on plpgsql or SQL function code before.
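
For the json/hstore idea above, the kind of thing that would become
possible is roughly this (a sketch only, assuming the 9.3+ json
operators; dynamic_extract and some_table are made-up names):

  CREATE FUNCTION dynamic_extract(r anyelement, colname text)
  RETURNS text LANGUAGE plpgsql AS $$
  BEGIN
    -- Convert the row to json, then pull the column out by name.
    -- json's ->> operator returns text, which sidesteps the result
    -- type inference problem described above.
    RETURN row_to_json(r) ->> colname;
  END;
  $$;

  SELECT dynamic_extract(t, 'id') FROM some_table t;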

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Geoff Winkless
On 23 April 2015 at 14:50, Andres Freund and...@anarazel.de wrote:

  Maybe I'm misreading it, but isn't index_predicate meant to be inside
  the brackets?

  http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

 That has changed since.


Oh, helpful. :)

I'll shut up. I suspect my objection is really with the very idea of
unreserved keywords, and I have a feeling that rather more people will
shout me down if I go off on that particular rant; meanwhile, it's 20
years since I used yacc in earnest, and it's all too hazy for me to
argue about what it is or isn't capable of.

When I set out I was really only hoping to express a preference as a user;
on balance I would really rather not have DO IGNORE, if it were possible to
avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
about cope with (and means you don't need to add IGNORE as a keyword,
win!), although it still mildly pains me that there's an additional
unnecessary word.

But I certainly don't object enough to hold up you guys doing the actual
work for my benefit (among others, obviously!).

G


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 09:13:52AM -0400, Robert Haas wrote:
 On Wed, Apr 22, 2015 at 10:41 PM, Bruce Momjian br...@momjian.us wrote:
  What is a real problem is that we don't block creating tablespaces
  anywhere at all, including in obviously problematic places like the
  transaction log directory:
 
  josh=# create tablespace tbl2 location '/home/josh/pg94/data/pg_xlog/';
  CREATE TABLESPACE
 
  It really seems like we ought to block *THAT*.  Of course, if we block
  tablespace creation in PGDATA generally, then that's covered.
 
  I have developed the attached patch to warn about creating tablespaces
  inside the data directory.  The case this doesn't catch is referencing a
  symbolic link that points to the same directory.  We can't make it an
  error, so that people can still pg_upgrade these setups.  This would be
  for 9.5 only.
 
 I think this is a good thing to do, but I sure wish we could go
 further and block it completely.  That may require more thought than
 we have time to put in at this stage of the release cycle, though, so
 +1 for doing at least this much.

OK, good.  Thinking to 9.6, I am not sure how we could throw an error
because we have allowed this in the past and pg_dump is going to be
restored with a raw SQL CREATE TABLESPACE command.  

We have had this type of problem before, but never resolved it.  We
almost need pg_dump to set a GUC variable telling the backend it is
restoring a dump and issue a warning, but throw an error if the same
command was issued outside of a pg_dump restore.  FYI, pg_upgrade
already throws a warning related to the non-creation of a delete script.
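
As an aside, spotting tablespaces already created inside the data
directory is easy from SQL; a sketch (it assumes superuser, since
data_directory is superuser-only, and absolute tablespace paths):

  SELECT spcname, pg_tablespace_location(oid) AS location
    FROM pg_tablespace
   WHERE pg_tablespace_location(oid)
         LIKE current_setting('data_directory') || '/%';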

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-04-23 Thread Fabrízio de Royes Mello
On Wed, Apr 22, 2015 at 3:48 PM, Payal Singh pa...@omniti.com wrote:

 The following review has been posted through the commitfest application:
 make installcheck-world:  tested, failed
 Implements feature:   not tested
 Spec compliant:   not tested
 Documentation:not tested

 Seeing this when trying to apply the patch:

 Patching file src/backend/commands/tablecmds.c using Plan A...
 Hunk #1 FAILED at 328.
 Hunk #2 succeeded at 2294 (offset 11 lines).
 Hunk #3 FAILED at 3399.
 Hunk #4 FAILED at 3500.
 Hunk #5 succeeded at 4658 with fuzz 1 (offset 65 lines).
 Hunk #6 succeeded at 4753 (offset 66 lines).
 Hunk #7 succeeded at 4989 with fuzz 2 (offset 66 lines).
 Hunk #8 succeeded at 5003 (offset 69 lines).
 Hunk #9 succeeded at 5017 (offset 69 lines).
 Hunk #10 succeeded at 5033 (offset 69 lines).

 The new status of this patch is: Waiting on Author


The patch needs a rebase. Done!
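
For reference, the usage the patch enables is simply this (a sketch;
the table and column names are made up):

  ALTER TABLE mytable ADD COLUMN IF NOT EXISTS counter integer DEFAULT 0;
  -- running the same statement again is a no-op instead of an error
  ALTER TABLE mytable ADD COLUMN IF NOT EXISTS counter integer DEFAULT 0;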

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 6a82730..3041b09 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,7 @@ ALTER TABLE ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable>
 
 <phrase>where <replaceable class="PARAMETER">action</replaceable> is one of:</phrase>
 
-ADD [ COLUMN ] <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ COLLATE <replaceable class="PARAMETER">collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
+ADD [ COLUMN ] [ IF NOT EXISTS ] <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ COLLATE <replaceable class="PARAMETER">collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
 DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column_name</replaceable> [ RESTRICT | CASCADE ]
 ALTER [ COLUMN ] <replaceable class="PARAMETER">column_name</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">data_type</replaceable> [ COLLATE <replaceable class="PARAMETER">collation</replaceable> ] [ USING <replaceable class="PARAMETER">expression</replaceable> ]
 ALTER [ COLUMN ] <replaceable class="PARAMETER">column_name</replaceable> SET DEFAULT <replaceable class="PARAMETER">expression</replaceable>
@@ -96,11 +96,12 @@ ALTER TABLE ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable>
 
   <variablelist>
    <varlistentry>
-    <term><literal>ADD COLUMN</literal></term>
+    <term><literal>ADD COLUMN [ IF NOT EXISTS ]</literal></term>
     <listitem>
      <para>
       This form adds a new column to the table, using the same syntax as
-      <xref linkend="SQL-CREATETABLE">.
+      <xref linkend="SQL-CREATETABLE">. If <literal>IF NOT EXISTS</literal>
+      is specified and the column already exists, no error is thrown.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 06e4332..8cd436d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,8 +328,8 @@ static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recu
 				bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
 static ObjectAddress ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
 				Relation rel, ColumnDef *colDef, bool isOid,
-				bool recurse, bool recursing, LOCKMODE lockmode);
-static void check_for_column_name_collision(Relation rel, const char *colname);
+				bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode);
+static bool check_for_column_name_collision(Relation rel, const char *colname, bool if_not_exists);
 static void add_column_datatype_dependency(Oid relid, int32 attnum, Oid typid);
 static void add_column_collation_dependency(Oid relid, int32 attnum, Oid collid);
 static void ATPrepAddOids(List **wqueue, Relation rel, bool recurse,
@@ -2294,7 +2294,7 @@ renameatt_internal(Oid myrelid,
 						oldattname)));
 
 	/* new name should not already exist */
-	check_for_column_name_collision(targetrelation, newattname);
+	check_for_column_name_collision(targetrelation, newattname, false);
 
 	/* apply the update */
 	namestrcpy(&(attform->attname), newattname);
@@ -3443,11 +3443,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		case AT_AddColumnToView:		/* add column via CREATE OR REPLACE
 										 * VIEW */
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-									  false, false, false, lockmode);
+									  false, false, false, false, lockmode);
 			break;
 		case AT_AddColumnRecurse:
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-									  false, true, false, lockmode);
+									  false, true, 

Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Jim Nasby

On 4/23/15 8:42 AM, Robert Haas wrote:

On Thu, Apr 23, 2015 at 4:19 AM, Simon Riggs si...@2ndquadrant.com wrote:

We were talking about having an incremental backup map also. Which sounds a
lot like the freeze map.


Yeah, possibly.  I think we should try to set things up so that the
backup map can be updated asynchronously by a background worker, so
that we're not adding more work to the foreground path just for the
benefit of maintenance operations.  That might make the logic for
autovacuum to use it a little bit more complex, but it seems
manageable.


I'm not sure an actual map makes sense... for incremental backups you 
need some kind of stream that tells you not only what changed but when 
it changed. A simple freeze map won't work for that because the 
operation of freezing itself writes data (and the same can be true for 
VM). Though, if the backup utility was actually comparing live data to 
an actual backup maybe this would work...



We only need a freeze/backup map for larger relations. So if we map 1000
blocks per map page, we skip having a map at all when size < 1000.


Agreed.  We might also want to map multiple blocks per map slot - e.g.
one slot per 32 blocks.  That would keep the map quite small even for
very large relations, and would not compromise efficiency that much
since reading 256kB sequentially probably takes only a little longer
than reading 8kB.


The problem with mapping a range of pages per bit is dealing with 
locking when you set the bit. Currently that's easy because we're 
holding the cleanup lock on the page, but you can't do that if you have 
a range of pages. Though, if each 'slot' wasn't a simple binary value we 
could have a 3rd state that indicates we're in the process of marking 
that slot as all visible/frozen, but you still need to consider the bit 
as cleared.


Honestly though, I think concerns about the size of the map are a bit 
overblown. Even if we double its size, it's still 32,000 times smaller 
than the heap is with 8k pages. I suspect if you have tables large 
enough where you'll care, you'll also be using 32k pages, which means 
it'd be 128,000 times smaller than the heap. I have a hard time 
believing that's going to be even a faint blip on the performance radar.
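
As a back-of-envelope check of those ratios (a sketch; some_big_table is
a placeholder, and the two-bits-per-heap-page figure follows the
discussion above):

  SELECT pg_size_pretty(
           pg_relation_size('some_big_table') / 8192  -- heap pages
           * 2 / 8                                    -- 2 map bits per page, in bytes
         ) AS approx_map_size;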

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 05:05:14PM +0200, Andres Freund wrote:
 On 2015-04-23 11:00:43 -0400, Bruce Momjian wrote:
  On Thu, Apr 23, 2015 at 09:13:52AM -0400, Robert Haas wrote:
   I think this is a good thing to do, but I sure wish we could go
   further and block it completely.  That may require more thought than
   we have time to put in at this stage of the release cycle, though, so
   +1 for doing at least this much.
  
  OK, good.  Thinking to 9.6, I am not sure how we could throw an error
  because we have allowed this in the past and pg_dump is going to be
  restored with a raw SQL CREATE TABLESPACE command.  
 
 We could just document that you need to pre-create the tablespace and
 ignore the resulting error. This isn't going to affect too many people.

This approach is going to cause any object in that tablespace to not
restore --- are we sure that enough people check for restore errors that
we will not have people losing data on a restore?

Also, the error is going to cause pg_upgrade to fail.  We could have
pg_upgrade --check detect these cases and force people to fix their
setups before they run pg_upgrade --- at least that would be consistent
with the pg_dump behavior.  Jim Nasby suggested throwing an error unless
IsBinaryUpgrade is set, and that would work, but it means we are
allowing such tablespaces to be upgraded using pg_upgrade only, which
seems kind of odd.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 10:42:59AM +0300, Heikki Linnakangas wrote:
 On 04/22/2015 09:24 PM, Robert Haas wrote:
 I would feel safer if we added a completely new epoch counter to the page
 header, instead of reusing LSNs. But as we all know, changing the page
 format is a problem for in-place upgrade, and takes some space too.
 Yeah.  We have a serious need to reduce the size of our on-disk
 format.  On a TPC-C-like workload Jan Wieck recently tested, our data
 set was 34% larger than another database at the beginning of the test,
 and 80% larger by the end of the test.  And we did twice the disk
 writes.  See "The Elephants in the Room.pdf" at
 https://sites.google.com/site/robertmhaas/presentations
 
 Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
 disk size. No doubt it would be nice to reduce our disk footprint,
 but the page header is not the elephant in the room.

Agreed.  Are you saying we can't find a way to fit an 8-byte value into
the existing page in a backward-compatible way?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Tom Lane
I wrote:
 Given that anole is the only one reporting this, I'm not sure that we
 should immediately blame Postgres itself.  I have a vague recollection
 that we've seen this symptom before and traced it to a bug in some
 supporting library.  Is anole using any particularly out-of-date versions
 of openssl, kerberos, etc?

A bit of digging in the archives suggests that my hindbrain remembered this:
http://www.postgresql.org/message-id/flat/4f5a8404.8020...@dunslane.net

The specifics probably don't apply to anole, but the conclusion that
inconsistent openssl header and library files triggered the bug might.

Maybe it'd be worth trying to see if we could detect such inconsistencies
during configure?

regards, tom lane




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
  Right.  My point is that either you do X 2M times to maintain that fork
  and the overhead of the file existence, or you do one VACUUM FREEZE.  I
  am saying that 2M is a large number and adding all those X's might
  exceed the cost of a VACUUM FREEZE.
 
 I agree, but if we instead make this part of the visibility map
 instead of a separate fork, the cost is much less.  It won't be any
 more expensive to clear 2 consecutive bits any time a page is touched
 than it is to clear 1.  The VM fork will be twice as large, but still
 tiny.  And the fact that you'll have only half as many pages mapping
 to the same VM page may even improve performance in some cases by
 reducing contention.  Even when it reduces performance, I think the
 impact will be so tiny as not to be worth caring about.

Agreed, no extra file, and the same write volume as currently.  It would
also match pg_clog, which uses two bits per transaction --- maybe we can
reuse some of that code.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Alvaro Herrera
Robert Haas wrote:

 There are lots of machines failing in pg_upgradeCheck, but I don't see
 details of the failures in the logs.

Yeah, I think the buildfarm script is failing to save the error log.
Anyway AFAIR this is related to the move from contrib to src/bin;
machines that have updated to buildfarm 4.15 work fine, those that are
on older versions report failure.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Petr Jelinek

On 23/04/15 17:24, Heikki Linnakangas wrote:

On 04/23/2015 05:52 PM, Jim Nasby wrote:

On 4/23/15 2:42 AM, Heikki Linnakangas wrote:

On 04/22/2015 09:24 PM, Robert Haas wrote:

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See "The Elephants in the Room.pdf" at
https://sites.google.com/site/robertmhaas/presentations


Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
disk size. No doubt it would be nice to reduce our disk footprint, but
the page header is not the elephant in the room.


I've often wondered if there was some way we could consolidate XMIN/XMAX
from multiple tuples at the page level; that could be a big win for OLAP
environments where most of your tuples belong to a pretty small range of
XIDs. In many workloads you could have 80%+ of the tuples in a table
having a single inserting XID.


It would be doable for xmin - IIRC someone even posted a patch for that
years ago - but xmax (and ctid) is difficult. When a tuple is inserted,
Xmax is basically just a reservation for the value that will be put
there later. You have no idea what that value is, and you can't
influence it, and when it's time to delete/update the row, you *must*
have the space for that xmax. So we can't opportunistically use the
space for anything else, or compress them or anything like that.



That depends; if we are going to change the page format, we could move the 
xmax to be some map of ctid->xmax in the header (with no values for tuples 
with no xmax), or have a bitmap there of the tuples that have an xmax, etc. 
Basically, not saving xmax (and potentially other info) inline for each 
tuple, but having some info in the header only for the tuples that need 
it. That might have bad performance side effects of course, but there are 
definitely some potential ways of doing things differently which we 
could explore.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 06:39 PM, Petr Jelinek wrote:

On 23/04/15 17:24, Heikki Linnakangas wrote:

On 04/23/2015 05:52 PM, Jim Nasby wrote:

I've often wondered if there was some way we could consolidate XMIN/XMAX
from multiple tuples at the page level; that could be a big win for OLAP
environments where most of your tuples belong to a pretty small range of
XIDs. In many workloads you could have 80%+ of the tuples in a table
having a single inserting XID.


It would be doable for xmin - IIRC someone even posted a patch for that
years ago - but xmax (and ctid) is difficult. When a tuple is inserted,
Xmax is basically just a reservation for the value that will be put
there later. You have no idea what that value is, and you can't
influence it, and when it's time to delete/update the row, you *must*
have the space for that xmax. So we can't opportunistically use the
space for anything else, or compress them or anything like that.


That depends; if we are going to change the page format, we could move the
xmax to be some map of ctid->xmax in the header (with no values for tuples
with no xmax) ...


Stop right there. You need to reserve enough space on the page to store 
an xmax for *every* tuple on the page. Because if you don't, what are 
you going to do when every tuple on the page is deleted by a different 
transaction?


Even if you store the xmax somewhere else than the page header, you need 
to reserve the same amount of space for them, so it doesn't help at all.


- Heikki





Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Apr 23, 2015 at 6:13 AM, Sandeep Thakkar
 sandeep.thak...@enterprisedb.com wrote:
 The test case sha2 in the contrib pgcrypto module is failing with a diff on
 anole because the results file contains additional lines such as:
 + WARNING:  detected write past chunk end in ExprContext 6021cbb0

 That's clearly a bug, but it's a little hard to tell which commit
 introduced it because anole hasn't reported in so long.  :-(

Given that anole is the only one reporting this, I'm not sure that we
should immediately blame Postgres itself.  I have a vague recollection
that we've seen this symptom before and traced it to a bug in some
supporting library.  Is anole using any particularly out-of-date versions
of openssl, kerberos, etc?

regards, tom lane




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-23 15:52:40 +0100, Geoff Winkless wrote:
 When I set out I was really only hoping to express a preference as a user;
 on balance I would really rather not have DO IGNORE, if it were possible to
 avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
 about cope with (and means you don't need to add IGNORE as a keyword,
 win!), although it still mildly pains me that there's an additional
 unnecessary word.

Yea, DO NOTHING is a good alternative. And I do like that we're adding
one less keyword (which is also good for the parser's size/performance).

DO {UPDATE ... | NOTHING | LOCK} doesn't sound too bad to me (yes, LOCK
doesn't exist yet, except by writing UPDATE .. WHERE false ;)).
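
Spelled out, the variants under discussion look like this (a sketch of
the proposed syntax, which may still change; tab, id and val are made-up
names):

  INSERT INTO tab (id, val) VALUES (1, 'x')
      ON CONFLICT (id) DO NOTHING;

  INSERT INTO tab (id, val) VALUES (1, 'x')
      ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val;

  -- the hypothetical DO LOCK, approximated as mentioned above:
  INSERT INTO tab (id, val) VALUES (1, 'x')
      ON CONFLICT (id) DO UPDATE SET val = tab.val WHERE false;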

Greetings,

Andres Freund




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Jim Nasby

On 4/22/15 9:41 PM, Bruce Momjian wrote:

The case this doesn't catch is referencing a
symbolic link that points to the same directory.  We can't make it an
error, so that people can still pg_upgrade these setups.


Couldn't we make it an ERROR unless IsBinaryUpgrade?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 05:52 PM, Jim Nasby wrote:

On 4/23/15 2:42 AM, Heikki Linnakangas wrote:

On 04/22/2015 09:24 PM, Robert Haas wrote:

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See "The Elephants in the Room.pdf" at
https://sites.google.com/site/robertmhaas/presentations


Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
disk size. No doubt it would be nice to reduce our disk footprint, but
the page header is not the elephant in the room.


I've often wondered if there was some way we could consolidate XMIN/XMAX
from multiple tuples at the page level; that could be a big win for OLAP
environments where most of your tuples belong to a pretty small range of
XIDs. In many workloads you could have 80%+ of the tuples in a table
having a single inserting XID.


It would be doable for xmin - IIRC someone even posted a patch for that 
years ago - but xmax (and ctid) is difficult. When a tuple is inserted, 
Xmax is basically just a reservation for the value that will be put 
there later. You have no idea what that value is, and you can't 
influence it, and when it's time to delete/update the row, you *must* 
have the space for that xmax. So we can't opportunistically use the 
space for anything else, or compress them or anything like that.


- Heikki





Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Pavel Stehule
2015-04-23 16:12 GMT+02:00 Robert Haas robertmh...@gmail.com:

 On Thu, Apr 23, 2015 at 9:55 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
  On Thu, Apr 23, 2015 at 4:56 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
   I don't see a contradiction. There is clear agreement that the ERROR
   level should show the context. NOTICE and WARNING don't need it - and
   there are backward compatibility and usability reasons not to do it.
 
  Whether notices and warnings need it is a matter of opinion.  I don't
  think your idea is bad, and it might be a good rule of thumb in many
  cases, but I slightly prefer Marko's approach of adding a new option.
 
  I am not sure I understand you.
 
  Please, can you write more about your idea?

 Your idea, as I understand it, is that for logs at severity levels
 lower than ERROR, we can always omit the context, because it's not
 necessary.  But I'm not sure that's right: some people might find that
 context helpful.  If, as Marko proposes, we add an explicit option,
 then everyone can choose the behavior that is right for them.


I am not sure I explained it well. I would emit context for ERROR and
higher by default, and would not emit context for anything less than ERROR
by default (I am not sure about the WARNING level).

But it could be changed by some option in the RAISE statement, as Marko
proposes - and possibly globally by a GUC, because it doesn't change the
behavior of the application.

With the current behavior, my problem is the ERROR level in plpgsql, where
the context is missing now. On the other hand, I think the current
behavior is ok for the NOTICE level.

I am not against any new option in the RAISE statement.

If there is some disagreement between me and Marko, it is over what the
default behavior for the NOTICE level should be. I strongly prefer not to
show context there, but I can accept some global switch too.
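
To make the disagreement concrete, here is a sketch using existing RAISE
syntax (the context-suppression option itself is what is being proposed):

  DO $$
  BEGIN
    RAISE NOTICE 'step %', 1;       -- context is arguably just noise here
    RAISE EXCEPTION 'bad input'     -- context is wanted here
          USING HINT = 'check the arguments';
  END;
  $$;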

Regards

Pavel



 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] Split the 'Server Programming' chapter into two?

2015-04-23 Thread Alvaro Herrera
Robert Haas wrote:
 On Thu, Apr 23, 2015 at 8:21 AM, Andres Freund and...@anarazel.de wrote:

  To me at least 44 - 47 don't really fit well with the rest. I think we
  should either invent a new category for them, or move them to
  'Internals'. Maybe we could introduce an 'Extending the Server' category
  for those and a couple more? Candidates for that at least seem to be
  52. Writing A Procedural Language Handler
  53. Writing A Foreign Data Wrapper
  54. Writing A Custom Scan Provider
 
 I like the "extending the server" idea.  Maybe "Server Extensions".

+1 for "Extending the Server".  I don't like "Server Extensions" too much;
seems confusing with sql-level EXTENSION objects.

  Also, shouldn't there at least be a link to
  http://www.postgresql.org/docs/devel/static/xfunc-sql.html in
  http://www.postgresql.org/docs/devel/static/xplang.html ?
 
 Wouldn't hurt.

Agreed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Jim Nasby

On 4/23/15 2:42 AM, Heikki Linnakangas wrote:

On 04/22/2015 09:24 PM, Robert Haas wrote:

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See "The Elephants in the Room.pdf" at
https://sites.google.com/site/robertmhaas/presentations


Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
disk size. No doubt it would be nice to reduce our disk footprint, but
the page header is not the elephant in the room.


I've often wondered if there was some way we could consolidate XMIN/XMAX 
from multiple tuples at the page level; that could be a big win for OLAP 
environments where most of your tuples belong to a pretty small range of 
XIDs. In many workloads you could have 80%+ of the tuples in a table 
having a single inserting XID.


Dunno how much it would help for OLTP though... :/
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Split the 'Server Programming' chapter into two?

2015-04-23 Thread Jim Nasby

On 4/23/15 9:23 AM, Alvaro Herrera wrote:

Robert Haas wrote:

On Thu, Apr 23, 2015 at 8:21 AM, Andres Freund and...@anarazel.de wrote:



To me at least 44 - 47 don't really fit well with the rest. I think we
should either invent a new category for them, or move them to
'Internals'. Maybe we could introduce an 'Extending the Server' category
for those and a couple more? Candidates for that at least seem to be
52. Writing A Procedural Language Handler
53. Writing A Foreign Data Wrapper
54. Writing A Custom Scan Provider


I like the "extending the server" idea.  Maybe "Server Extensions".


+1 for "Extending the Server".  I don't like "Server Extensions" too much;
seems confusing with sql-level EXTENSION objects.


I definitely think it would be useful to split the C stuff out on its 
own. We could also stand to have more information there, especially 
around things like fcinfo, but that's another topic...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 05:02:19PM +0200, Andres Freund wrote:
 On 2015-04-23 15:52:40 +0100, Geoff Winkless wrote:
  When I set out I was really only hoping to express a preference as a user;
  on balance I would really rather not have DO IGNORE, if it were possible to
  avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
  about cope with (and means you don't need to add IGNORE as a keyword,
  win!), although it still mildly pains me that there's an additional
  unnecessary word.
 
 Yea, DO NOTHING is a good alternative. And I do like that we're adding
 one less keyword (which is also good for the parser's size/performance).

No question that DO IGNORE sounds awkward.  DO NOTHING also matches
CREATE RULE --- another plus.
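
For comparison, the existing rule syntax it echoes (a sketch with a
made-up table name):

  CREATE RULE ignore_ins AS ON INSERT TO mytable
      DO INSTEAD NOTHING;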

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 09:26:50AM -0400, Robert Haas wrote:
  What I meant was - I didn't see an attachment on that message.
 
  I didn't attach it as people have told me they can just as easily see
  the patch via git, and since it was so similar, I didn't repost it.
  Should I have?  I can easily do that.
 
 No, I just misread your email.  I thought you said you had attached
 the patch; rereading it, I see that you said you had applied the
 patch.  Silly me.

You were not the only one confused --- I got a private email on the same
topic.  My only guess is that I normally say "attached" in that case, so
"applied" just looked too similar.  I will try to mix it up in the
future.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Joshua D. Drake


On 04/23/2015 09:42 AM, Jim Nasby wrote:


On 4/23/15 11:24 AM, Andres Freund wrote:

I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...


Assuming you're talking about padding between fields...

Several years ago Enova paid Command Prompt to start work on logical
column ordering, and part of the motivation for that was to allow
re-ordering physical tuples into the most efficient on-disk format
possible. I think I did some tests re-arranging some tables into the
theoretically most efficient order and measuring heap size. I think
there was some modest size improvement, maybe 10-15%? This was several
years ago so it's all foggy. Maybe Josh can find some of this in CMD's
ticketing system?


Yeah I dug around. I don't see anything about size improvement but here 
are our notes:


Alvaro said:

I ended up not producing notes as regularly as I had initially hoped. To 
try and make up for it, here's an update covering everything I've done 
since I started working on this issue.


This patch turned out to be completely different than what we had 
initially thought. We had thought it was going to be a matter of finding 
the places that used attnum and replacing it with attnum, attlognum or 
attphysnum, depending on which order was needed in any given spot. This 
wasn't easy, because there are several hundred of those. So it promised 
to be amazingly time-consuming and rather boring work.


This has nothing to do with reality: anywhere from parser down to 
optimizer and executor, the way things work is that a list of attributes 
is built, processed, and referenced. Some places assume that the list is 
in a certain order that's always the same order for those three cases. 
So the way to develop this feature is to change those places so that 
instead of receiving the list in one of these orders, they instead 
receive it in a different order.


So what I had to do early on, was find a way to retrieve the sort order 
from catalogs, preserve it when TupleDescriptors are built, and ensure 
the attribute list is extracted from TupleDesc in the correct order. But 
it turned out that this is not enough, because down in the parser guts, 
a target list is constructed; and later, a TupleDescriptor is built from 
the target list. So it's necessary to preserve the sorting info from the 
original tuple descriptor into the target list (which means adding order 
info to Var and TargetEntry nodes), so that the new TupleDesc can also 
have it.


Today I'm finding that even more than that is necessary. It turns out 
that the RangeTableEntries (i.e. the entries in the FROM clause of a 
query) have an item dubbed "eref", which is a list of column names; due 
to my changes in the earlier parser stages, this list is sorted in 
logical column order; but the code that resolves things such as columns 
used in JOIN/ON clauses walks the list (which is in logical order) and 
then uses the number of elements it had to walk past to construct a Var 
(column reference) in attnum order -- so it finds a different column, 
and it all fails.


So what I'm doing now is modifying the RangeTableEntry node to keep a 
mapping list of logical to identity numbers. Then I'll have to search 
for the places using rte->eref->colnames and make sure that they 
correctly use attlognum as an index into it.


And then later:

First of all I should note that I discussed the approach mentioned above 
to pgsql-hackers and got a very interesting comment from Tom Lane that 
adding sorting info to Var and TargetEntry nodes was not a very good 
idea because it'd break stored rules whenever a table column changed. So 
I went back and studied that code and noticed that it was really the 
change in RangeTableEntry that's doing the good magic; those other 
changes are fortunately not necessary. (Though they were a necessary 
vehicle for me to understand how the other stuff works.)


I've been continuing to study the backend code looking for uses of 
attribute lists that assume a single ordering. As I get more into it, 
more complex cases appear. The number of cases is fortunately bounded, 
though. Most of the uses of straight attribute lists are in places that 
do not require modification, or require little work or thought to update 
correctly.


However, some other places are not like that. I have fixed SQL 
functions twice now; I believed the second fix (which seemed mostly 
correct) was to be the final one, but I found out just now that it's 
not, and the proper fix is going to require 
something a bit more low-level (namely, a projection step that reorders 
columns correctly after the fact). Fortunately, I believe that this 
extra projection step is going to fix a lot of other cases too, which I 
originally had no idea how to attack. Moreover, understanding that bit 
means I also figured out what Tom Lane meant 

Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Petr Jelinek

On 23/04/15 18:24, Andres Freund wrote:

Whether that's feasible complexity wise is debatable, but it's certainly
possible.


I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...



The logical ordering patch + auto optimizations of column layout on 
table creation/rewrite might help partially there.


But what seems clear is that we need a more in-depth analysis of what 
really contributes most to the tuple size in various use cases, and then 
we can debate what we can do about it.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Jeff Davis
On Thu, Apr 23, 2015 at 1:49 AM, David G. Johnston
david.g.johns...@gmail.com wrote:
 Reading and writing all this, I'm convinced you have gotten into your
 mind an expectation of equivalency and consistency where there really is
 little or none from an overall design perspective.  And none insofar as
 would merit trying to force the two example queries you provide to behave
 identically.  There are a number of things about SQL that one either
 simply lives with or goes through mind contortions to understand the,
 possibly imperfect, reasoning behind.  This is one of those things: and
 while it's been fun to do those contortions, in the end I am only a
 little bit better off than when I simply accepted the fact that unknown
 and untyped were similar but different (even if I hadn't considered
 giving them different names).

You make some good points, but I disagree for two reasons:

1. This is a postgres issue, not a general SQL issue, so we can't
simply say SQL is weird (like we can with a lot of the strange NULL
semantics).
2. Postgres has determined that the unknown column reference "b" in
the query "SELECT a=b FROM (SELECT ''::text, '  ') x(a,b)" can and
should be coerced to text. So postgres has already made that decision.
It's just that when it tries to coerce it, it fails. Even if you
believe that literals and column references are not equivalent, I
don't see any justification at all for this behavior -- postgres
should not decide to coerce it in the first place if it's going to
fail.
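
To see the asymmetry concretely (the second statement is the failing
case described above, as of the time of this thread):

  SELECT ''::text = '  ';                            -- literal: coerced to text, works
  SELECT a = b FROM (SELECT ''::text, '  ') x(a,b);  -- column reference: coercion fails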

Regards,
Jeff Davis




Re: [HACKERS] improving speed of make check-world

2015-04-23 Thread Jeff Janes
On Thu, Aug 14, 2014 at 10:45 PM, Peter Eisentraut pete...@gmx.net wrote:

 make check-world creates a temporary installation in every subdirectory
 it runs a test in, which is stupid: it's very slow and uses a lot of
 disk space.  It's enough to do this once per run.  That is the essence
 of what I have implemented.  It cuts the time for make check-world in
 half or less, and it saves gigabytes of disk space.


Something about this commit (dcae5faccab64776376d354d) broke "make check"
in parallel conditions when started from a clean directory.  It fails with
a different error each time, one example:

make -j4 check > /dev/null

In file included from gram.y:14515:
scan.c: In function 'yy_try_NUL_trans':
scan.c:10307: warning: unused variable 'yyg'
/usr/bin/ld: tab-complete.o: No such file: No such file or directory
collect2: ld returned 1 exit status
make[3]: *** [psql] Error 1
make[2]: *** [all-psql-recurse] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-bin-recurse] Error 2
make: *** [all-src-recurse] Error 2
make: *** Waiting for unfinished jobs

If I rerun it without cleaning the tree, it usually passes the second
time.  Or I can just separate the make and the check, like "make -j4 >
/dev/null && make check > /dev/null", but I've grown accustomed to being
able to combine them since this problem was first fixed a couple years ago
(in a commit I can't seem to find).

I have:
GNU Make 4.0
Built for x86_64-unknown-linux-gnu

I was using ccache, but I still get the problem without using it.

Cheers,

Jeff


Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Alvaro Herrera
Thanks for posting this.

Joshua D. Drake wrote:

 First of all I should note that I discussed the approach mentioned above to
 pgsql-hackers and got a very interesting comment from Tom Lane that adding
 sorting info to Var and TargetEntry nodes was not a very good idea because
 it'd break stored rules whenever a table column changed. So I went back and
 studied that code and noticed that it was really the change in
 RangeTableEntry that's doing the good magic; those other changes are
  fortunately not necessary. (Though they were a necessary vehicle for me to
 understand how the other stuff works.)

So in the logical columns thread I was saying that this change of eref
didn't work either; it seems that most of the time (if not always) the
list should continue to be in attnum order, and the
logical-ordering-info data should be obtained from the tuple descriptor
and whatever needs to be sorted should be sorted afterwards, i.e. in
setrefs.c, using data previously stored in RelOptInfo.  I tried doing
that and ran into some other mess elsewhere.

 I've been continuing to study the backend code looking for uses of attribute
 lists that assume a single ordering. As I get more into it, more complex
 cases appear. The number of cases is fortunately bounded, though.

he hopes, and eventually gives up

 However, some other places are not like that. I have fixed SQL functions
 twice now; I believed the second fix (which seemed mostly correct) was to
 be the final one, but I found out just now that it's not, and the proper
 fix is going to require something a bit more
 low-level (namely, a projection step that reorders columns correctly after
 the fact). Fortunately, I believe that this extra projection step is going
 to fix a lot of other cases too, which I originally had no idea how to
 attack. Moreover, understanding that bit means I also figured out what Tom
 Lane meant on the second half of his response to my original pgsql-hackers
 comment. So I think we're good on that front.

I had forgotten my intention to rework things using projection.

In any case, I have posted a lot of patches now which others could
study, if there's interest.  I would sure like to see this split, and
there are multiple benefits such as reduction of padding space.  I'm
sure there's a nonempty set of guys brighter than me that could figure
it out in not that much time.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-23 Thread Jim Nasby

On 4/23/15 8:25 AM, Robert Haas wrote:

Some users are partitioning tables just so that each
partition can be autovac'd separately.  That really shouldn't be
required.


Are they doing this for improved heap scan performance? Index scan 
performance? If the table wasn't partitioned, would they need more than 
one pass through the indexes due to exhausting maintenance_work_mem?


There's probably some fairly low-hanging fruit for parallelizing vacuum, 
but it really depends on what problems people are running into.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On 2015-04-23 11:00:43 -0400, Bruce Momjian wrote:
 On Thu, Apr 23, 2015 at 09:13:52AM -0400, Robert Haas wrote:
  I think this is a good thing to do, but I sure wish we could go
  further and block it completely.  That may require more thought than
  we have time to put in at this stage of the release cycle, though, so
  +1 for doing at least this much.
 
 OK, good.  Thinking to 9.6, I am not sure how we could throw an error
 because we have allowed this in the past and pg_dump is going to be
 restored with a raw SQL CREATE TABLESPACE command.  

We could just document that you need to pre-create the tablespace and
ignore the resulting error. This isn't going to affect too many people.

Greetings,

Andres Freund




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 06:24:00PM +0300, Heikki Linnakangas wrote:
 I've often wondered if there was some way we could consolidate XMIN/XMAX
 from multiple tuples at the page level; that could be a big win for OLAP
 environments where most of your tuples belong to a pretty small range of
 XIDs. In many workloads you could have 80%+ of the tuples in a table
 having a single inserting XID.
 
 It would be doable for xmin - IIRC someone even posted a patch for
 that years ago - but xmax (and ctid) is difficult. When a tuple is
 inserted, Xmax is basically just a reservation for the value that
 will be put there later. You have no idea what that value is, and
 you can't influence it, and when it's time to delete/update the row,
 you *must* have the space for that xmax. So we can't
 opportunistically use the space for anything else, or compress them
 or anything like that.

Also SELECT FOR UPDATE uses the per-row xmax too.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] adding more information about process(es) cpu and memory usage

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 08:00 PM, Radovan Jablonovsky wrote:

In current encounters with Amazon Web Services RDS, the DBA does not
have access to the OS/Linux shell of the underlying instance. That renders
some PostgreSQL monitoring techniques for process CPU and memory usage
useless. Even if AWS provides internal tools/programming interfaces for
monitoring, it could be very useful to have this information provided by
PostgreSQL system table(s)/views/functions/API - information about how
much CPU (similar to the output of the top command) and memory a
PostgreSQL background process is using. It could be something as simple as
adding cpu and memory information fields to pg_stat_activity.


You can write an extension to do that. Of course, Amazon won't let you 
run your own C extension either (otherwise you could use that to escape 
into a shell), but if you do it well, publish it, and get it included in 
standard distributions, they just might pick it up. Unless they don't 
want you to see that information. If they don't, then they wouldn't let 
you use the system views either.


In a nutshell, I don't think PostgreSQL should get involved in that...

- Heikki





[HACKERS] Moving ExecInsertIndexTuples and friends to new file

2015-04-23 Thread Heikki Linnakangas
While looking at Peter's INSERT ... ON CONFLICT patch, I started to feel 
that ExecInsertIndexTuples() and friends would deserve a file of their 
own, and not be buried in the middle of execUtils.c. I propose that we 
split execUtils.c into two, moving ExecOpenIndices(), ExecCloseIndices(), 
ExecInsertIndexTuples() and related functions into a new file called 
executor/execIndexing.c.


Moving functions makes backpatching harder, so it's not something to be 
done lightly, but I think it would be worth it in this case. There have 
been few changes to those functions in years, so I doubt they'll need 
much backpatching in the near future either.


For comparison, this is what the file sizes of executor/exec*.c will 
look like after the split:


-rw-r--r-- 1 heikki heikki  14710 Apr 22 09:07 execAmi.c
-rw-r--r-- 1 heikki heikki   9711 Apr 22 09:07 execCurrent.c
-rw-r--r-- 1 heikki heikki  16659 Apr 22 09:07 execGrouping.c
-rw-r--r-- 1 heikki heikki  16023 Apr 23 21:57 execIndexing.c
-rw-r--r-- 1 heikki heikki   8276 Apr 22 09:07 execJunk.c
-rw-r--r-- 1 heikki heikki  80102 Apr 22 09:07 execMain.c
-rw-r--r-- 1 heikki heikki  18694 Apr 22 09:07 execProcnode.c
-rw-r--r-- 1 heikki heikki 160700 Apr 22 09:07 execQual.c
-rw-r--r-- 1 heikki heikki   9957 Apr 22 09:07 execScan.c
-rw-r--r-- 1 heikki heikki  37796 Apr 22 09:07 execTuples.c
-rw-r--r-- 1 heikki heikki  28004 Apr 23 21:50 execUtils.c

Any objections? I'm planning to do this as a separate commit, before the 
INSERT ... ON CONFLICT patch goes in.


- Heikki




[HACKERS] pg_dump: largeobject behavior issues (possible bug)

2015-04-23 Thread Joshua D. Drake


Hello,

I have been working a problem with Andrew Gierth (sp?) in regards to 
pg_dump. Here is the basic breakdown:


FreeBSD 10.1
PostgreSQL 9.3.6
~64GB memory
500GB database
228G of largeobjects (106M objects)

The database dumps fine as long as we don't dump large objects. However, 
if we try to dump the large objects, FreeBSD will kill pg_dump as it 
will consume all free memory and swap. With Andrew's help we were able 
to determine the following:


There is a memory cost of about 160 bytes per largeobject. Based on the 
number of largeobjects we have, that would be about 16GB of memory. Also, 
when pg_dump is reading in the largeobject list, there is a point where 
pg_dump has a PGresult containing the entire contents of 
pg_largeobject_metadata and a malloc of an array where it is going to 
copy the data to. That could easily get above 40GB, thus causing FreeBSD 
to kill the process.
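
The arithmetic behind that estimate is easy to verify (a rough check,
assuming 106M objects at ~160 bytes each):

  SELECT pg_size_pretty(106000000::bigint * 160);  -- roughly 16 GB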


tl;dr

The memory issue comes down to the fact that in the prep stage, pg_dump 
creates a TOC entry for every individual large object.


It seems that pg_dump should be much more efficient about dumping these
objects.

Sincerely,

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Jim Nasby

On 4/23/15 5:07 AM, Kyotaro HORIGUCHI wrote:

This is because parsing of a UNION immediately converts constants
of unknown type in both of the UNION's arms to text, so the top-level
select won't be bothered by this problem. But the problematic
query has no appropriate point to do that until the
function I patched.


FWIW, I think that's more accidental than anything.

I'm no expert in our casting and type handling code but I spent a lot of 
time stuck in it while working on the variant type, and it seems very 
scattered. There's stuff in the actual casting code, there's some stuff 
in other parts of parse/plan, there's stuff in individual types (array 
and record at least).


Some stuff is handled by casting; some stuff is handled by mangling the 
parse tree.


Something else I noticed is we're not consistent with handling typmod 
either. I don't remember the exact example I found, but there are cases 
involving casting of constants where we ignore it (I don't think it was 
as simple as SELECT 1::int::variant(...), but it was something like that).


I don't know how much of this is just historical and how much is 
intentional, but it'd be nice if we could consolidate it more.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 10:53 PM, Andres Freund wrote:

On 2015-04-23 12:45:59 -0700, Peter Geoghegan wrote:

On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund and...@anarazel.de wrote:

I think you misread my statement: I'm saying we don't need the new
argument anymore, even if we still do the super-deletion in
heap_delete(). Now that the speculative insertion will not be visible
(as in seen on a tuple they could delete) to other backends we can just
do the super deletion if we see that the tuple is a promise one.


I disagree. I think it's appropriate that the one and only super
heap_delete() caller make a point of indicating that that's what it's
doing. The cost is only that the two other such callers must say that
they're not doing it. Super deletion is a special thing, that logical
decoding knows all about for example, and I think it's appropriate
that callers ask explicitly.


Unconvinced. Not breaking an API has its worth.


The heapam API is not that stable, we've added arguments to those 
functions every once in a while, and I don't recall any complaints. So 
I'm with Peter that super-deletion should be requested explicitly by the 
caller.


That said, I'd actually like to see a separate heap_super_delete() 
function for that, rather than piggybacking on heap_delete(). It's a 
quite different operation. There'll be some duplication, but seems 
better than a maze of if-else's in heap_delete.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On 2015-04-23 15:17:55 -0500, Jim Nasby wrote:
 Yes, but only after creating a brand new cluster from scratch, which would
 then disallow them from putting tablespaces in $PGDATA.

pg_dumpall output includes tablespaces.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread David Steele
On 4/23/15 4:30 PM, Andres Freund wrote:
 On 2015-04-23 16:26:09 -0400, Robert Haas wrote:
 But pg_upgrade automates all that, so you can't use pg_upgrade in that
 case.  If we add a GUC as I suggested, you can still use pg_upgrade.
 
 But we also have to live with data directories being in a shit state
 forever onward. We won't really be able to remove the option
 realistically.
 
 It's not that hard to just move the tablespace out of the data directory
 while the server is down. As long as you move it within the same
 partition, it's even fast.

I agree.  It wouldn't be that hard to do a bit of directory manipulation
before upgrading - and that's only for the people who have put
tablespaces in $PGDATA.  I've never seen it before, but I have no doubt
that it happens.  I can see how it might make a weird sort of sense
depending on the level of experience.

-- 
- David Steele
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Moving ExecInsertIndexTuples and friends to new file

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 12:05 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 While looking at Peter's INSERT ... ON CONFLICT patch, I started to feel
 that ExecInsertIndexTuples() and friends would deserve a file of their own,
 and not be buried in the middle of execUtils.c. I propose that we split
 execUtils.c into two, moving ExecOpenIndices(), ExecCloseIndices()
 ExecInsertIndexTuples() and related functions into a new file called
 executor/execIndexing.c.

That split makes a lot of sense to me.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 11:00 AM, Bruce Momjian br...@momjian.us wrote:
  I have developed the attached patch to warn about creating tablespaces
  inside the data directory.  The case this doesn't catch is referencing a
  symbolic link that points to the same directory.  We can't make it an
  error so people can use pg_upgrade these setups.  This would be for 9.5
  only.

 I think this is a good thing to do, but I sure wish we could go
 further and block it completely.  That may require more thought than
 we have time to put in at this stage of the release cycle, though, so
 +1 for doing at least this much.

 OK, good.  Thinking ahead to 9.6, I am not sure how we could throw an error
 because we have allowed this in the past and pg_dump is going to be
 restored with a raw SQL CREATE TABLESPACE command.

 We have had this type of problem before, but never resolved it.  We
 almost need pg_dump to set a GUC variable telling the backend it is
 restoring a dump and issue a warning, but throw an error if the same
 command was issued outside of a pg_dump restore.  FYI, pg_upgrade
 already throws a warning related to the non-creation of a delete script.

Well, we've made backward-incompatible changes before.  Not to this
specific thing, but in general.  I don't think there's anything
preventing us from doing so here, except that we don't want to annoy
too many users.

I don't think the right solution is to add a GUC so that pg_dump
ignores this, and otherwise deny it.  It's bad if you do it as part of
a restore, and it's bad if you do it some other time, too.

What I'd recommend is that we add a GUC stupid_tablespaces=off.  If
you have done this in the past, and you want to upgrade (whether via
pg_dump or pg_upgrade) to a new release, you'll have to configure the
new cluster for stupid_tablespaces=on.  If you don't, you'll get an
error.  If you do, you'll get a warning.  That way, people can still
upgrade, but they have to set the GUC to make it work, so they'll be
clearly aware that they're doing something that is not recommended.

(Of course we might want to call the GUC something other than
stupid_tablespaces, like allow_tablespaces_in_data_directory, but you
get the idea.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On 2015-04-23 15:46:20 -0400, Robert Haas wrote:
 Well, we've made backward-incompatible changes before.  Not to this
 specific thing, but in general.  I don't think there's anything
 preventing us from doing so here, except that we don't want to annoy
 too many users.

I think the number of users that have done this, and haven't yet
(knowingly or unknowingly) been bitten by it is pretty low. In that
scenario it seems much better to break compatibility given that it's
pretty easy to fix during restore (just precreate the tablespace).  It's
not something you have to retest a whole application for.

If you want to avoid that one error you can still do pg_dumpall
--globals, edit and run that script, and only then restore the
actual databases.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-23 12:45:59 -0700, Peter Geoghegan wrote:
 On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund and...@anarazel.de wrote:
  I think you misread my statement: I'm saying we don't need the new
  argument anymore, even if we still do the super-deletion in
  heap_delete(). Now that the speculative insertion will not be visible
  (as in seen on a tuple they could delete) to other backends we can just
  do the super deletion if we see that the tuple is a promise one.
 
 I disagree. I think it's appropriate that the one and only super
 heap_delete() caller make a point of indicating that that's what it's
 doing. The cost is only that the two other such callers must say that
 they're not doing it. Super deletion is a special thing, that logical
 decoding knows all about for example, and I think it's appropriate
 that callers ask explicitly.

Unconvinced. Not breaking an API has its worth.

 The second most significant open item is rebasing on top of the recent
 RLS changes, IMV.

Not sure I agree. That part seems pretty mechanical to me.

* The docs aren't suitable for endusers. I think this will take a fair
  amount of work.

* We're not sure about the syntax yet. In addition to the keyword
  issue I'm unconvinced about having two WHERE clauses in the same
  statement. I think that'll end up confusing users a fair bit. Might
  still be the best way forward.
* The executor integration still isn't pretty, although Heikki is making
  strides there
* I don't think anybody seriously has looked at the planner side yet.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Jim Nasby

On 4/23/15 11:01 AM, Andres Freund wrote:

On April 23, 2015 6:12:05 PM GMT+03:00, Jim Nasby jim.na...@bluetreble.com 
wrote:

On 4/22/15 9:41 PM, Bruce Momjian wrote:

The case this doesn't catch is referencing a
symbolic link that points to the same directory.  We can't make it an
error so people can use pg_upgrade these setups.


Couldn't we make it an ERROR unless IsBinaryUpgrade?


People still upgrade without pg_upgrade.


Yes, but only after creating a brand new cluster from scratch, which 
would then disallow them from putting tablespaces in $PGDATA.


Or are you saying people do binary upgrades without pg_upgrade? I don't 
think we have any obligation to support that...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-23 Thread Andres Freund
On 2015-04-23 15:40:36 -0400, Robert Haas wrote:
 The issue is that you have to vacuum a table frequently enough to
 avoid accumulating bloat.  The frequency with which you need to vacuum
 varies depending on the size of the table and how frequently it's
 updated.  However, a large, heavily-updated table can take long enough
 to vacuum that, by the time you get done, it's already overdue to be
 vacuumed again.  That's a problem.

Especially because the indexes are scanned fully. In many cases I've
observed the heap scans themselves being fast, but scanning hundreds
(yes, hundreds) of gigabytes of indexes taking ages.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund and...@anarazel.de wrote:
 I think you misread my statement: I'm saying we don't need the new
 argument anymore, even if we still do the super-deletion in
 heap_delete(). Now that the speculative insertion will not be visible
 (as in seen on a tuple they could delete) to other backends we can just
 do the super deletion if we see that the tuple is a promise one.

I disagree. I think it's appropriate that the one and only super
heap_delete() caller make a point of indicating that that's what it's
doing. The cost is only that the two other such callers must say that
they're not doing it. Super deletion is a special thing, that logical
decoding knows all about for example, and I think it's appropriate
that callers ask explicitly.

  * breinbaas on IRC just mentioned that it'd be nice to have upsert as a
link in the insert. Given that that's the pervasive term, that doesn't
seem absurd.

 Not sure what you mean. Where would the link appear?

 The index, i.e. it'd just be another indexterm. It seems good to give
 people a link.

Oh, okay. Sure. Done on Github.

 * We need to figure out the tuple lock strength details. I think this
 is doable, but it is the greatest challenge to committing ON CONFLICT
 UPDATE at this point. Andres feels that we should require no greater
 lock strength than an equivalent UPDATE. I suggest we get to this
 after committing the IGNORE variant. We probably need to discuss this
 some more.

 I want to see a clear way forward before we commit parts. It doesn't
 have to be completely iron-clad, but the way forward should be pretty
 clear. What's the problem you're seeing right now making this work? A
 couple of days ago on jabber you seemed to see a way of doing this?

I was really just identifying it as the biggest problem the patch now
faces, and I want to face those issues down ASAP. Of course, that's
good, because as you say it is a small problem in an absolute sense.
The second most significant open item is rebasing on top of the recent
RLS changes, IMV.

I can see you and Heikki continuing to change small stylistic
things, which I expect to continue for a little while. That's fine,
but naturally I want to be aggressive about closing off these open
issues that are not just general clean-up, but have some small level
of risk of becoming more significant blockers.

 The reason I think this has to use the appropriate lock level is that
 it'll otherwise re-introduce the deadlocks that fk locks resolved. Given
 how much pain we endured to get fk locks, that seems like a bad deal.

Right.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump: largeobject behavior issues (possible bug)

2015-04-23 Thread Andrew Gierth
 Joshua == Joshua D Drake j...@commandprompt.com writes:

 Joshua The database dumps fine as long as we don't dump large
 Joshua objects. However, if we try to dump the large objects, FreeBSD
 Joshua will kill pg_dump as it will consume all free memory and
 Joshua swap. With Andrew's help we were able to determine the
 Joshua following:

 Joshua There is a memory cost of about 160 bytes per largeobject.

I may have the exact number here wrong, it was just a quick eyeball of
the data structures (and depends on malloc overheads anyway).

The relevant code is getBlobs in pg_dump.c, which queries the whole of
pg_largeobject_metadata without using a cursor (so the PGresult is
already huge thanks to having 100 million rows), and then mallocs a
BlobInfo array and populates it from the PGresult, also using pg_strdup
for the oid string, owner name, and ACL if any.
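
For illustration, here is a minimal sketch of how that scan could be
batched through a cursor so that only a bounded slice of
pg_largeobject_metadata is ever resident. This is hypothetical code, not
what pg_dump.c does today; the function name, column list, and batch size
are all assumptions:

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Sketch: walk pg_largeobject_metadata in batches instead of one huge
 * PGresult.  Assumes a transaction is already open on 'conn', as it is
 * in pg_dump.
 */
static void
scan_blob_metadata(PGconn *conn)
{
	PGresult *res = PQexec(conn,
						   "DECLARE blobcur CURSOR FOR "
						   "SELECT oid, lomowner, lomacl "
						   "FROM pg_largeobject_metadata");

	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "DECLARE failed: %s", PQerrorMessage(conn));
		PQclear(res);
		return;
	}
	PQclear(res);

	for (;;)
	{
		res = PQexec(conn, "FETCH 10000 FROM blobcur");
		if (PQresultStatus(res) != PGRES_TUPLES_OK)
		{
			fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
			PQclear(res);
			return;
		}
		if (PQntuples(res) == 0)	/* cursor exhausted */
		{
			PQclear(res);
			break;
		}
		for (int i = 0; i < PQntuples(res); i++)
		{
			/* emit one TOC entry per blob, then forget the row */
			printf("blob %s owned by %s\n",
				   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
		}
		PQclear(res);	/* at most ~10000 rows resident, vs. 106M */
	}
}

Note that the per-blob BlobInfo/TOC array would still grow with the
number of objects; the cursor only removes the second full-size copy
held in the PGresult.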

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 12:53 PM, Andres Freund and...@anarazel.de wrote:
 Unconvinced. Not breaking an API has its worth.

Yeah, and I acknowledge that - but it's not something that's
appropriate to encapsulate IMV.

Let's just leave it to Heikki...I'd say he has the deciding vote,
especially as the reviewer who is more in charge of the executor
stuff than anything else.

 The second most significant open item is rebasing on top of the recent
 RLS changes, IMV.

 Not sure I agree. That part seems pretty mechanical to me.

Hopefully so.

 * The docs aren't suitable for endusers. I think this will take a fair
   amount of work.

It's hard to explain why something like the collation field in the
inference specification matters to users...because it's only supported
at all due to forwards compatibility concerns. It's hard to document
certain things like that in an accessible way. In general, much of the
effort of the last year was making the feature play nice, and
considering a bunch of usability edge cases that are unlikely to come
up, but still must be documented.

 * We're not sure about the syntax yet. In addition to the keyword
   issue I'm unconvinced about having two WHERE clauses in the same
   statement. I think that'll end up confusing users a fair bit. Might
   still be the best way forward.

My previous approach to allowing an index predicate did at least
clearly show that the predicate belonged to the inference
specification, since it appeared within parentheses. Maybe we should
do that after all? I don't think it much matters if the inference
specification is not identical to CREATE INDEX. I don't want to give
up inference of partial indexes, and I don't want to give up allowing
the UPDATE to have a limited WHERE clause, and we can't just spell
those differently here. So what alternative does that leave?

Anyone else have an opinion?

 * The executor integration still isn't pretty, although Heikki is making
   strides there

That's just clean-up, though. I'm not worried about the risk of Heikki
not succeeding at that.

 * I don't think anybody seriously has looked at the planner side yet.

Good point. That certainly needs to be looked at (and I include the
rewriter parts in that). It's really not that much code, but (ideally)
a subject matter expert should look into the whole ExcludedExpr dance
in particular.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-23 23:08:34 +0300, Heikki Linnakangas wrote:
 The heapam API is not that stable, we've added arguments to those functions
 every once in a while, and I don't recall any complaints.

I heard some, but not too many, that's true. I know that I've written
code that'd be broken or would have needed even more ifdefs ;)

 That said, I'd actually like to see a separate heap_super_delete() function
 for that, rather than piggybacking on heap_delete(). It's a quite different
 operation. There'll be some duplication, but seems better than a maze of
 if-else's in heap_delete.

+many. I've requested that change a couple of times now.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On 2015-04-23 16:26:09 -0400, Robert Haas wrote:
 But pg_upgrade automates all that, so you can't use pg_upgrade in that
 case.  If we add a GUC as I suggested, you can still use pg_upgrade.

But we also have to live with data directories being in a shit state
forever onward. We won't really be able to remove the option
realistically.

It's not that hard to just move the tablespace out of the data directory
while the server is down. As long as you move it within the same
partition, it's even fast.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump: largeobject behavior issues (possible bug)

2015-04-23 Thread Andrew Dunstan


On 04/23/2015 04:04 PM, Andrew Gierth wrote:

Joshua == Joshua D Drake j...@commandprompt.com writes:

  Joshua The database dumps fine as long as we don't dump large
  Joshua objects. However, if we try to dump the large objects, FreeBSD
  Joshua will kill pg_dump as it will consume all free memory and
  Joshua swap. With Andrew's help we were able to determine the
  Joshua following:

  Joshua There is a memory cost of about 160 bytes per largeobject.

I may have the exact number here wrong, it was just a quick eyeball of
the data structures (and depends on malloc overheads anyway).

The relevant code is getBlobs in pg_dump.c, which queries the whole of
pg_largeobject_metadata without using a cursor (so the PGresult is
already huge thanks to having 100 million rows), and then mallocs a
BlobInfo array and populates it from the PGresult, also using pg_strdup
for the oid string, owner name, and ACL if any.




I'm surprised this hasn't come up before. I have a client that I 
persuaded to convert all their LOs to bytea fields because of problems 
with pg_dump handling millions of LOs; they stayed on an older 
postgres version until they made that change.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] adding more information about process(es) cpu and memory usage

2015-04-23 Thread Radovan Jablonovsky
During recent encounters with Amazon Web Services RDS, the DBA does not
have access to the OS/Linux shell of the underlying instance. That renders
some PostgreSQL monitoring techniques for process CPU and memory usage
useless. Even if AWS provides internal tools or a programming interface for
monitoring, it could be very useful to have this information provided by
PostgreSQL system table(s)/views/functions/APIs: information about how much
CPU (similar to the output of the top command) and memory each PostgreSQL
backend or background process is using. It could be something as simple as
adding CPU and memory information fields to pg_stat_activity.

-- 

*Radovan Jablonovsky* | SaaS DBA | Phone 1-403-262-6519 (ext. 7256) | Fax
1-403-233-8046





Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread David G. Johnston
On Thursday, April 23, 2015, Jeff Davis pg...@j-davis.com wrote:

 On Thu, Apr 23, 2015 at 1:49 AM, David G. Johnston
 david.g.johns...@gmail.com wrote:
  Reading and writing all this I'm convinced you have gotten the idea in
 your
  mind an expectation of equivalency and consistency where there really is
  little or none from an overall design perspective.  And none insofar as
  would merit trying to force the two example queries you provide to behave
  identically.  There are a number of things about SQL that one either
 simply
  lives with or goes through mind contortions to understand the, possibly
  imperfect, reasoning behind.  This is one of those things: and while it's
  been fun to do those contortions in the end I am only a little bit better
  off than when I simply accepted the fact the unknown and untyped were
  similar but different (even if I hadn't considered giving them different
  names).

 You make some good points, but I disagree for two reasons:

 1. This is a postgres issue, not a general SQL issue, so we can't
 simply say SQL is weird (like we can with a lot of the strange NULL
 semantics).


But it is ok to admit that we are weird when we are.  Though yes, we are
being inefficient here even with the current behavior taken as desired.

2. Postgres has determined that the unknown column reference b in
 the query SELECT a=b FROM (SELECT ''::text, '  ') x(a,b) can and
 should be coerced to text.


So the error should be operator does not exist: text = unknown...instead
it tries and fails to cast the unknown to text.

Or allow for the coercion to proceed.

There would be fewer obvious errors, and simple cases that fail would begin
to work sensibly, but it still feels like implicit casting from text which
was recently largely removed from the system for cause.  Albeit to the
disgruntlement of some and accusations of being draconian by others.


 So postgres has already made that decision.
 It's just that when it tries to coerce it, it fails. Even if you
 believe that literals and column references are not equivalent, I
 don't see any justification at all for this behavior -- postgres
 should not decide to coerce it in the first place if it's going to
 fail.


This is true - but obviously one solution is to not attempt it in the first
place.

David J.


Re: [HACKERS] [committers] pgsql: RLS fixes, new hooks, and new test module

2015-04-23 Thread Stephen Frost
Christian,

* Christian Ullrich (ch...@chrullrich.net) wrote:
 * Stephen Frost wrote:
 
 RLS fixes, new hooks, and new test module
 
 The buildfarm says that with -DCLOBBER_CACHE_ALWAYS, the RLS
 violations get blamed on the wrong tables. Mostly, they are catalogs
 (I have seen pg_opclass, pg_am, and pg_amproc), but some also come
 up with binary garbage instead, e.g.
 
 - 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=markhordt=2015-04-23%2000%3A00%3A12
 - 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=friarbirddt=2015-04-23%2004%3A20%3A00
 - 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=tickdt=2015-04-22%2019%3A56%3A53

Yup, thanks, Robert pointed this out on another thread and I'm looking
into it.

Thanks again!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Moving ExecInsertIndexTuples and friends to new file

2015-04-23 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
 On Thu, Apr 23, 2015 at 12:05 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
  While looking at Peter's INSERT ... ON CONFLICT patch, I started to feel
  that ExecInsertIndexTuples() and friends would deserve a file of their own,
  and not be buried in the middle of execUtils.c. I propose that we split
  execUtils.c into two, moving ExecOpenIndices(), ExecCloseIndices()
  ExecInsertIndexTuples() and related functions into a new file called
  executor/execIndexing.c.
 
 That split makes a lot of sense to me.

No objections here.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Amit Kapila
On Fri, Apr 24, 2015 at 1:03 AM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/23/15 11:45 AM, Petr Jelinek wrote:

 On 23/04/15 18:24, Andres Freund wrote:

 Whether that's feasible complexity wise is debatable, but it's certainly
 possible.


 I do wonder what, in realistic cases, is actually the bigger contributor
 to the overhead. The tuple header or the padding we liberally add in
 many cases...


 The logical ordering patch + auto optimizations of column layout on
 table creation/rewrite might help partially there.

 But what seems to be clear is that we need more in depth analysis of
 what really contributes most to the tuple size in various use-cases and
 then we can debate what we can do about it.


 Also, what Robert posted was that while we started at something like
15%-30% larger, we ended the test at 80% larger. That makes me think that
the bigger win is not in reducing tuple size but tackling bloat.


I agree with you, and I think one of the major reasons for bloat is that
the index segment doesn't have visibility information, due to which
cleaning of the index needs to be tied to the heap.  Now if we can move
transaction information to the page level, then we can even think of
having it in the index segment as well, and then the index can
delete/prune its tuples on its own, which can reduce the bloat in indexes
significantly, and there is a benefit to vacuum as well.  This has some
downsides, like deletes needing to traverse the index segment to
delete-mark the tuples, but I think the upsides of reducing bloat can
certainly outweigh the downsides.

In short, moving transaction information to the page level can not only
reduce the tuple size but also help in reducing bloat.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 23 Apr 2015 14:49:29 -0500, Jim Nasby jim.na...@bluetreble.com wrote 
in 55394cc9.5050...@bluetreble.com
 On 4/23/15 5:07 AM, Kyotaro HORIGUCHI wrote:
  This is because parsing of UNION immediately converts constants
  of unknown type in both arms of the UNION to text, so the top-level
  select won't be bothered by this problem. But the problematic
  query has no appropriate point at which to do that until the
  function I patched.
 
 FWIW, I think that's more accidental than anything.

I guess so. This behavior doesn't look intentional at all.

 I'm no expert in our casting and type handling code but I spent a lot
 of time stuck in it while working on the variant type, and it seems
 very scattered. There's stuff in the actual casting code, there's some
 stuff in other parts of parse/plan, there's stuff in individual types
 (array and record at least).
 
 Some stuff is handled by casting; some stuff is handled by mangling
 the parse tree.

That's what makes me unconfident. But if coercion is always done
by coerce_type, coercion is properly considered at all the places
that need it, and the coercion steps are appropriate, we will see
nothing bad. I hope.

 Something else I noticed is we're not consistent with handling typmod
 either. I don't remember the exact example I found, but there are cases
 involving casting of constants where we ignore it (I don't think it
 was as simple as SELECT 1::int::variant(...), but it was something
 like that).

Mmm... it's a serious bug if explicit casts are ignored. If some
cast procedure does the wrong thing, it should be fixed.

 I don't know how much of this is just historical and how much is
 intentional, but it'd be nice if we could consolidate it more.

Yeah, but it seems tough to do it thoroughly.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] forward vs backward slashes in msvc build code

2015-04-23 Thread Michael Paquier
On Fri, Mar 13, 2015 at 6:04 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Mar 13, 2015 at 6:20 AM, Alvaro Herrera wrote:
 Peter Eisentraut wrote:
 This is contrib/chkpass not finding the crypt symbol, which is
 presumably in some library.  But I can't see how it would normally find
 it, without my patch.

 It seems crypt is provided by libpgport.  So chkpass should be mentioned
 in @contrib_uselibpgport, but isn't.  Maybe the fix is just to add it
 there?

 I had a closer look at this patch, and yes indeed, the problem was
 exactly that. Now honestly I cannot understand why this dependency
 with libpgport was not necessary before... In any case, attached is a
 patch rebased on HEAD that builds correctly with MSVC.

Now that the stuff related to the move from contrib/ to src/bin/,
modulescheck and tmp_install has been committed, shouldn't we give
this patch another shot? Attached is a rebased version.
-- 
Michael
From 3599bab7b0e37df2a47e497dc9049aa7d78a880b Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 23 Apr 2015 20:13:30 -0700
Subject: [PATCH] Replace backslashes by slash in MSVC build script

This makes it easier to run on Linux to check file paths.
---
 src/tools/msvc/MSBuildProject.pm |   2 +-
 src/tools/msvc/Mkvcbuild.pm  | 344 ---
 src/tools/msvc/Project.pm|  44 +++--
 src/tools/msvc/Solution.pm   | 138 
 4 files changed, 264 insertions(+), 264 deletions(-)

diff --git a/src/tools/msvc/MSBuildProject.pm b/src/tools/msvc/MSBuildProject.pm
index 37958f9..a16f9ac 100644
--- a/src/tools/msvc/MSBuildProject.pm
+++ b/src/tools/msvc/MSBuildProject.pm
@@ -127,7 +127,7 @@ EOF
 	foreach my $fileNameWithPath (sort keys %{ $self->{files} })
 	{
 		confess "Bad format filename '$fileNameWithPath'\n"
-		  unless ($fileNameWithPath =~ /^(.*)\\([^\\]+)\.(c|cpp|y|l|rc)$/);
+		  unless ($fileNameWithPath =~ m!^(.*)/([^/]+)\.(c|cpp|y|l|rc)$!);
 		my $dir  = $1;
 		my $fileName = $2;
 		if ($fileNameWithPath =~ /\.y$/ or $fileNameWithPath =~ /\.l$/)
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 8654bfe..2f0e837 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,10 +33,12 @@ my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
 my @contrib_uselibpq =
   ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
 my @contrib_uselibpgport = (
+	'chkpass',
 	'oid2name',
 	'pg_standby',
 	'vacuumlo');
 my @contrib_uselibpgcommon = (
+	'chkpass',
 	'oid2name',
 	'pg_standby',
 	'vacuumlo');
@@ -44,8 +46,8 @@ my $contrib_extralibs = undef;
 my $contrib_extraincludes =
   { 'tsearch2' => ['contrib/tsearch2'], 'dblink' => ['src/backend'] };
 my $contrib_extrasource = {
-	'cube' => [ 'contrib\cube\cubescan.l', 'contrib\cube\cubeparse.y' ],
-	'seg' => [ 'contrib\seg\segscan.l', 'contrib\seg\segparse.y' ], };
+	'cube' => [ 'contrib/cube/cubescan.l', 'contrib/cube/cubeparse.y' ],
+	'seg' => [ 'contrib/seg/segscan.l', 'contrib/seg/segparse.y' ], };
 my @contrib_excludes = ('pgcrypto', 'commit_ts', 'intagg', 'sepgsql');
 
 # Set of variables for frontend modules
@@ -59,12 +61,12 @@ my $frontend_extralibs = {
 	'pgbench' => ['ws2_32.lib'],
 	'psql'    => ['ws2_32.lib'] };
 my $frontend_extraincludes = {
-	'initdb' => ['src\timezone'],
-	'psql'   => [ 'src\bin\pg_dump', 'src\backend' ] };
+	'initdb' => ['src/timezone'],
+	'psql'   => [ 'src/bin/pg_dump', 'src/backend' ] };
 my $frontend_extrasource = {
-	'psql' => ['src\bin\psql\psqlscan.l'],
+	'psql' => ['src/bin/psql/psqlscan.l'],
 	'pgbench' =>
-		[ 'src\bin\pgbench\exprscan.l', 'src\bin\pgbench\exprparse.y' ],
+		[ 'src/bin/pgbench/exprscan.l', 'src/bin/pgbench/exprparse.y' ],
 };
 my @frontend_excludes =
   ('pgevent', 'pg_basebackup', 'pg_rewind', 'pg_dump', 'pg_xlogdump', 'scripts');
@@ -73,9 +75,9 @@ sub mkvcbuild
 {
 	our $config = shift;
 
-	chdir('..\..\..') if (-d '..\msvc' && -d '..\..\..\src');
+	chdir('../../..') if (-d '../msvc' && -d '../../../src');
 	die 'Must run from root or msvc directory'
-	  unless (-d 'src\tools\msvc' && -d 'src');
+	  unless (-d 'src/tools/msvc' && -d 'src');
 
 	my $vsVersion = DetermineVisualStudioVersion();
 
@@ -114,37 +116,37 @@ sub mkvcbuild
 
 	$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
 	$libpgport->AddDefine('FRONTEND');
-	$libpgport->AddFiles('src\port', @pgportfiles);
+	$libpgport->AddFiles('src/port', @pgportfiles);
 
 	$libpgcommon = $solution->AddProject('libpgcommon', 'lib', 'misc');
 	$libpgcommon->AddDefine('FRONTEND');
-	$libpgcommon->AddFiles('src\common', @pgcommonfrontendfiles);
+	$libpgcommon->AddFiles('src/common', @pgcommonfrontendfiles);
 
-	$postgres = $solution->AddProject('postgres', 'exe', '', 'src\backend');
-	$postgres->AddIncludeDir('src\backend');
-	$postgres->AddDir('src\backend\port\win32');
-	$postgres->AddFile('src\backend\utils\fmgrtab.c');
+	$postgres = $solution->AddProject('postgres', 'exe', '', 'src/backend');
+	

[HACKERS] Improving vacuum/VM/etc

2015-04-23 Thread Jim Nasby
I mentioned this idea in the other[1] vacuum thread [2], but I think 
it got lost.


Kevin Grittner pointed out that there's a potentially huge number of 
writes we incur over the life of a tuple [3]:


(1) WAL log the insert.
(2) Write the tuple.
(3) Hint and rewrite the tuple.
(4) WAL log the freeze of the tuple.
(5) Rewrite the frozen tuple.
(6) WAL-log the delete.
(7) Rewrite the deleted tuple.
(8) Prune and rewrite the page.
(9) Free line pointers and rewrite the page.

He mentioned that a lot of these writes could be combined if they 
happened close enough together. We can further add an all-visible state 
at 3.5.


Instead of simply adding all-frozen information to the VM, we could 
store 4 different page states and potentially improve a lot of 
different cleanup woes at one time.


Unfortunately, the states I came up with using existing semantics don't 
look hugely useful[4], but if we take Robert's idea and make all-visible 
mean all-frozen, we can do much better:


0: Newly inserted tuples
Tracking this state allows us to aggressively set hint bits.

1: Newly deleted
There are tuples that have been deleted but not pruned. There may also 
be newly inserted tuples that need hinting (state 0).


Similar to state 0, we'd want to be fairly aggressive with these pages, 
because as soon as the deleting XID is committed and older than all 
snapshots we can prune. Because we can prune without hitting indexes, 
this is still a fairly cheap operation, though not as cheap as 0.


2: Fully hinted, not frozen
This is the really painful state to clean up, because we have to deal 
with indexes. We must enter this state after being in 1.


3: All-visible-frozen
Every tuple on the page is visible and frozen. Pages in this state need 
no maintenance at all. We might be able to enter this state directly 
from state 0.
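
To make the bookkeeping concrete, here is one way the four states could
be packed at two bits per heap page. This is a sketch only: the names
are invented, and visibilitymap.c does not work this way today.

#include <stdint.h>

/* Hypothetical two-bit page states. */
#define VM_NEWLY_INSERTED	0	/* state 0: aggressively set hints */
#define VM_NEWLY_DELETED	1	/* state 1: prune once safe */
#define VM_HINTED_UNFROZEN	2	/* state 2: needs index cleanup */
#define VM_ALL_FROZEN		3	/* state 3: no maintenance needed */

#define VM_BITS_PER_HEAPPAGE	2
#define VM_HEAPPAGES_PER_BYTE	(8 / VM_BITS_PER_HEAPPAGE)	/* 4 */

/*
 * An 8192-byte map page then covers 8192 * 4 = 32768 heap pages, which
 * is where the "32,000 times smaller" figure under DOWNSIDES comes from.
 */
static inline int
vm_get_state(const uint8_t *map, uint32_t heapblk)
{
	uint32_t shift = (heapblk % VM_HEAPPAGES_PER_BYTE) * VM_BITS_PER_HEAPPAGE;

	return (map[heapblk / VM_HEAPPAGES_PER_BYTE] >> shift) & 3;
}

static inline void
vm_set_state(uint8_t *map, uint32_t heapblk, int state)
{
	uint32_t byteno = heapblk / VM_HEAPPAGES_PER_BYTE;
	uint32_t shift = (heapblk % VM_HEAPPAGES_PER_BYTE) * VM_BITS_PER_HEAPPAGE;

	map[byteno] = (uint8_t) ((map[byteno] & ~(3u << shift)) |
							 ((uint32_t) state << shift));
}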



BENEFITS
This tracking should help with at least 3 problems: the need to set hint bits 
after insert, SELECT queries doing pruning (Simon's recent complaint), 
and needing to scan an entire table for freezing.


The improvement in hinting and pruning is based on the idea that 
normally there would not be a lot of pages in state 0 or 1, and pages 
that were in those states are very likely to still be in disk cache (if 
not shared buffers). That means we can have a background process (or 2) 
that is very aggressive at targeting pages in these states.


Not needing to scan everything that's frozen is thanks to state 3. I 
think it's OK (at least for now) if only vacuum puts pages into this 
state, which means it can actually freeze the tuples when it does it 
(thanks to 37484ad we won't lose forensic data doing this). That means 
there's no extra work necessary by a foreground process that's dirtying 
a page.


Because of 37484ad, I think as part of this we should also deprecate 
vacuum_freeze_min_age, or at least change its behavior. AFAIK the only 
objection to aggressive freezing was loss of forensic data, and that's 
gone now. So vacuum (and presumably the bg process(es) than handle state 
0 and 1) should freeze tuples if it would allow the whole page to be 
frozen. Possibly it should just do it any time it's dirtying the page. 
(We could actually do this right now; it would let us eliminate the GUC, 
but I'm not sure there'd be other benefit without the rest of this.)


DOWNSIDES
This does mean doubling the size of the VM. It would still be 32,000 
times smaller than the heap with 8k pages (and 128,000 times smaller 
with the common warehouse 32k page size), so I suspect this is a 
non-issue, but it's worth mentioning. It might have some effect on an 
almost entirely read-only system; but I suspect in most other cases the 
other benefits will outweigh this.


This approach still does nothing to help the index related activity in 
vacuum. My gut says state 2 should be further split; but I'm not sure 
why. Perhaps if we had another state we could do something more 
intelligent with index cleanup...


This might put a lot more read pressure on the VMs. We might want some 
way to summarize per-table VMs (or ranges of VMs) so that we're not 
constantly scanning them.


We'd still have to freeze, as opposed to what might be possible with 
XID-LSN. OTOH, most of the changes to do this would be limited to 
current VM code and callers. I don't think vacuum itself would need a 
lot of changes, and I hope the BG code for state 0/1 wouldn't be that 
complicated; it shouldn't need the complexity of autovacuum or vacuum. 
So this should be much lower risk than something like XID-LSN.


So... what am I missing? :)

[1] 
http://postgresql.org/message-id/flat/20140912135413.gk4...@eldon.alvh.no-ip.org
[2] 
http://postgresql.org/message-id/flat/2011829201.2201963.1429726992897.javamail.ya...@mail.yahoo.com
[3] 
http://postgresql.org/message-id/771351984.2266772.1429728671811.javamail.ya...@mail.yahoo.com


[4]
1a: All-visible
What we have today. Page still needs to be visited for freeze, but has 
no newly 

Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Jim Nasby

On 4/23/15 11:45 AM, Petr Jelinek wrote:

On 23/04/15 18:24, Andres Freund wrote:

Whether that's feasible complexity wise is debatable, but it's certainly
possible.


I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...



The logical ordering patch + auto optimizations of column layout on
table creation/rewrite might help partially there.

But what seems to be clear is that we need more in depth analysis of
what really contributes most to the tuple size in various use-cases and
then we can debate what we can do about it.


Also, what Robert posted was that while we started at something like 
15%-30% larger, we ended the test at 80% larger. That makes me think 
that the bigger win is not in reducing tuple size but tackling bloat.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 10:44 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/23/15 8:25 AM, Robert Haas wrote:
 Some users are partitioning tables just so that each
 partition can be autovac'd separately.  That really shouldn't be
 required.

 Are they doing this for improved heap scan performance? Index scan
 performance? If the table wasn't partitioned, would they need more than one
 pass through the indexes due to exhausting maintenance_work_mem?

I don't know of anyone with a properly-configured system who needs
more than one pass through the indexes due to exhausting
maintenance_work_mem.  The issue is that you have to vacuum a table
frequently enough to avoid accumulating bloat.  The frequency with
which you need to vacuum varies depending on the size of the table and
how frequently it's updated.  However, a large, heavily-updated table
can take long enough to vacuum that, by the time you get done, it's
already overdue to be vacuumed again.  That's a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 3:57 PM, Andres Freund and...@anarazel.de wrote:
 On 2015-04-23 15:46:20 -0400, Robert Haas wrote:
 Well, we've made backward-incompatible changes before.  Not to this
 specific thing, but in general.  I don't think there's anything
 preventing us from doing so here, except that we don't want to annoy
 too many users.

 I think the number of users that have done this, and haven't yet
 (knowingly or unknowingly) been bitten by it is pretty low. In that
 scenario it seems much better to break compatibility given that it's
 pretty easy to fix during restore (just precreate the tablespace).  It's
 not something you have to retest a whole application for.

 If you want to avoid that one error you can still do pg_dumpall
 --globals, edit and run that script, and only then restore the
 actual databases.

But pg_upgrade automates all that, so you can't use pg_upgrade in that
case.  If we add a GUC as I suggested, you can still use pg_upgrade.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Michael Paquier
On Thu, Apr 23, 2015 at 10:42 PM, Robert Haas wrote:
 On Thu, Apr 23, 2015 at 4:19 AM, Simon Riggs  wrote:
 We only need a freeze/backup map for larger relations. So if we map 1000
 blocks per map page, we skip having a map at all when size < 1000.

 Agreed.  We might also want to map multiple blocks per map slot - e.g.
 one slot per 32 blocks.  That would keep the map quite small even for
 very large relations, and would not compromise efficiency that much
 since reading 256kB sequentially probably takes only a little longer
 than reading 8kB.

 I think the idea of integrating the freeze map into the VM fork is
 also worth considering.  Then, the incremental backup map could be
 optional; if you don't want incremental backup, you can shut it off
 and have less overhead.

When I read that, I think about something configurable at relation
level. There are cases where you may want more granularity of this
information at the block level, by having the VM slots track fewer
blocks than 32, and vice versa.
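
As a concrete sketch of the slot math being discussed, assuming one map
bit per 32 heap blocks (the names are made up, and a relation-level
setting could replace the constant):

#include <stdint.h>

/*
 * Hypothetical freeze/backup map: one bit covers 32 heap blocks, so
 * with 8kB blocks each set bit marks a 32 * 8kB = 256kB heap range.
 */
#define HEAPBLOCKS_PER_SLOT	32

#define HEAPBLK_TO_SLOT(blk)	((blk) / HEAPBLOCKS_PER_SLOT)

static inline void
map_mark_dirty(uint8_t *map, uint32_t blk)
{
	uint32_t slot = HEAPBLK_TO_SLOT(blk);

	map[slot / 8] |= (uint8_t) (1u << (slot % 8));
}
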
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [committers] pgsql: RLS fixes, new hooks, and new test module

2015-04-23 Thread Christian Ullrich

* Stephen Frost wrote:


RLS fixes, new hooks, and new test module


The buildfarm says that with -DCLOBBER_CACHE_ALWAYS, the RLS violations 
get blamed on the wrong tables. Mostly, they are catalogs (I have seen 
pg_opclass, pg_am, and pg_amproc), but some also come up with binary 
garbage instead, e.g.


- 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=markhordt=2015-04-23%2000%3A00%3A12
- 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=friarbirddt=2015-04-23%2004%3A20%3A00
- 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=tickdt=2015-04-22%2019%3A56%3A53


--
Christian




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding more information about process(es) cpu and memory usage

2015-04-23 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Thu, Apr 23, 2015 at 10:17 AM, Heikki Linnakangas hlinn...@iki.fi
 wrote:
 In a nutshell, I don't think PostgreSQL should get involved in that...

 I have often wanted an SQL function which would expose the back-end's
 rusage statistics to the front-end. This could support a \timing feature
 variant to psql that reports more than just wall-clock time.

 I don't use RDS, and use shell access and top (and strace and gdb)
 quite enthusiastically, but still it is a pain to correlate any given
 front-end to any given back-end.

 Would such an addition to core be welcome?

The reason nobody's gotten around to that in the last fifteen years is
that per-process rusage isn't actually all that interesting; there's
too much that happens in background daemons, for instance.

regards, tom lane
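
For reference, the kind of SQL-callable function Jeff describes might
look like the following C extension sketch. Everything here (the
function name, returning only user CPU time) is an assumption for
illustration, not an existing API:

#include "postgres.h"
#include "fmgr.h"
#include <sys/resource.h>

PG_MODULE_MAGIC;

/* SQL side, e.g.: CREATE FUNCTION backend_user_cpu_usec() RETURNS bigint
 *                 AS 'MODULE_PATHNAME' LANGUAGE C STRICT;               */
PG_FUNCTION_INFO_V1(backend_user_cpu_usec);

Datum
backend_user_cpu_usec(PG_FUNCTION_ARGS)
{
	struct rusage ru;

	/*
	 * RUSAGE_SELF covers this backend only; work done by background
	 * daemons is invisible here, which is the limitation noted above.
	 */
	if (getrusage(RUSAGE_SELF, &ru) != 0)
		ereport(ERROR, (errmsg("getrusage failed: %m")));

	PG_RETURN_INT64((int64) ru.ru_utime.tv_sec * 1000000 +
					ru.ru_utime.tv_usec);
}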


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 12:45 PM, Peter Geoghegan p...@heroku.com wrote:
 * We need to figure out the tuple lock strength details. I think this
 is doable, but it is the greatest challenge to committing ON CONFLICT
 UPDATE at this point. Andres feels that we should require no greater
 lock strength than an equivalent UPDATE. I suggest we get to this
 after committing the IGNORE variant. We probably need to discuss this
 some more.

 I want to see a clear way forward before we commit parts. It doesn't
 have to be completely iron-clad, but the way forward should be pretty
 clear. What's the problem you're seeing right now making this work? A
 couple of days ago on jabber you seemed to see a way of doing this?

 I was really just identifying it as the biggest problem the patch now
 faces, and I want to face those issues down ASAP. Of course, that's
 good, because as you say it is a small problem in an absolute sense.
 The second most significant open item is rebasing on top of the recent
 RLS changes, IMV.


OK, I pushed out code to do this a while ago. I must admit that I
probably significantly over-estimated the difficulty of doing this.

With Heikki's problematic commit reverted this works fine (I have not
actually reverted it myself... I'll leave it to Heikki to clean that up
when he gets around to it). The usual jjanes_upsert stress tests
show no problems.

Curious about what you think about my proposal to go back to putting
the inference specification WHERE clause within the parenthesis, since
this solves several problems, including clarifying to users that the
predicate is part of the inference clause.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Pavel Stehule
2015-04-23 15:47 GMT+02:00 Robert Haas robertmh...@gmail.com:

 On Thu, Apr 23, 2015 at 4:56 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  I don't see a contradiction. There is clear agreement, so the ERROR
  level should show the context. NOTICE and WARNING don't need it, and
  there are backward-compatibility and usability reasons not to do it.

 Whether notices and warnings need it is a matter of opinion.  I don't
 think your idea is bad, and it might be a good rule of thumb in many
 cases, but I slightly prefer Marko's approach of adding a new option.


I am not sure I understand you.

Please, can you write more about your idea?





 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-23 14:34:02 +0100, Geoff Winkless wrote:
  A syntax error. DO is a reserved keyword. Update is just unreserved (and
  thus can be used as a column label). Ignore is unreserved with the patch
  and was unreserved before.  We obviously can make both reserved, but if so
  we have to do it for real, not by hiding the conflicts.
 
 
 Sorry, I misunderstood: so it's not the fact that it can't be used as a
 column label (because it can) but the fact that it can't then be referenced
 within a WHERE clause without quoting.

Meh. You can use any keyword in quotes - because then they're not
keywords anymore.


 INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
 update UPDATE update=1
 
 but I would have to do
 
 INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
 do UPDATE do=1

Yes.

 Maybe I'm misreading it, but isn't index_predicate meant to be inside the
 brackets?
 
 http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

That has changed since. And for good reason: It's pretty odd to have the
WHERE clause inside the brackets when that's not the case for CREATE
INDEX. But more importantly with multiple columns for stuff like ON
CONFLICT (a, b WHERE foo) it's unclear where the WHERE is actually
attached to. We have that problem with aggregates and it repeatedly
caused confusion.

 As I said, it's my personal belief that anyone using keywords (of any
 kind) unquoted deserves what they get, but I see your point.

Given that IGNORE isn't even a keyword right now (9.5 master) that
policy isn't particularly meaningful anyway.

 I think I object to the fact that you're talking about adding extra
 syntactic sugar to work around a parser-implementation problem, not an
 actual syntax problem (since UPDATE SET is unambiguous, isn't it?).

I fail to see the point of such an objection. We have an LALR parser
(generated by bison). That implies a certain expressiveness. You're
suggesting that we change to a different kind of parser?

I don't think it's necessarily unambiguous. I'm not particularly
motivated to prove it though - the point is that we rely on bison to
prevent ambiguities. That only works if we're listening. And not if
we're silencing warnings about ambiguities over the whole grammar.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Code paths where LWLock should be released on failure

2015-04-23 Thread Kevin Grittner
Michael Paquier michael.paqu...@gmail.com wrote:

 I have also been surprised by the inconsistencies particularly in
 predicate.c or other places regarding LWLock releases. Sometimes
 they are released on failure, sometimes not.

Those are not needed for correctness; they are a micro-optimization
for SerializableXactHashLock, which is a particularly hot LWLock
when serializable transactions are used.  There are a few places
where this is not done for this lock, which look like the places
where changes were made late in the development of the feature.
Most probably we went through at one point and added them before
throwing errors when that lock was held (as an optimization), but
neglected to do that with subsequent changes.  Removing the ones
that are there, or adding to the places we could, would not affect
correctness; it might make a very small difference in how quickly a
transaction which is going to cancel due to a serialization failure
gets out of the way of other transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 6:13 AM, Sandeep Thakkar
sandeep.thak...@enterprisedb.com wrote:
 The test case sha2 in the contrib pgcrypto module is failing with a diff on
 anole because the results file contains additional lines such as:
 --

 + WARNING:  detected write past chunk end in ExprContext 6021cbb0

That's clearly a bug, but it's a little hard to tell which commit
introduced it because anole hasn't reported in so long.  :-(

I notice that friarbird, leech, markhor, and jaguarundi are failing
with changes that appear to be clearly RLS-related:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=friarbirddt=2015-04-23%2004%3A20%3A00
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leechdt=2015-04-23%2003%3A47%3A14
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=markhordt=2015-04-23%2000%3A00%3A12
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=jaguarundidt=2015-04-22%2020%3A03%3A09

There are lots of machines failing in pg_upgradeCheck, but I don't see
details of the failures in the logs.

I suppose whatever's going wrong on anole must be the result of a
commit that was back-patched all the way to 9.0, post November 30th,
and probably touching contrib/pgcrypto.  That seems to include the
following four commits:

Author: Noah Misch n...@leadboat.com
Branch: master [8b59672d8] 2015-02-02 10:00:45 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [258e294db] 2015-02-02 10:00:49 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [a558ad3a7] 2015-02-02 10:00:50 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [d1972da8c] 2015-02-02 10:00:51 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [8d412e02e] 2015-02-02 10:00:52 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [0a3ee8a5f] 2015-02-02 10:00:52 -0500

Cherry-pick security-relevant fixes from upstream imath library.

This covers alterations to buffer sizing and zeroing made between imath
1.3 and imath 1.20.  Valgrind Memcheck identified the buffer overruns
and reliance on uninitialized data; their exploit potential is unknown.
Builds specifying --with-openssl are unaffected, because they use the
OpenSSL BIGNUM facility instead of imath.  Back-patch to 9.0 (all
supported versions).

Security: CVE-2015-0243

Author: Noah Misch n...@leadboat.com
Branch: master [1dc755158] 2015-02-02 10:00:45 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [82806cf4e] 2015-02-02 10:00:49 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [6994f0790] 2015-02-02 10:00:50 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [d95ebe0ac] 2015-02-02 10:00:51 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [11f738a8a] 2015-02-02 10:00:51 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [ce6f261cd] 2015-02-02 10:00:52 -0500

Fix buffer overrun after incomplete read in pullf_read_max().

Most callers pass a stack buffer.  The ensuing stack smash can crash the
server, and we have not ruled out the viability of attacks that lead to
privilege escalation.  Back-patch to 9.0 (all supported versions).

Marko Tiikkaja

Security: CVE-2015-0243

Author: Tom Lane t...@sss.pgh.pa.us
Branch: master [a59ee8819] 2015-01-30 13:05:30 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [70da7aeba] 2015-01-30 13:04:59 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [f08cf8ad9] 2015-01-30 13:05:01 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [a97dfdfd9] 2015-01-30 13:05:04 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [8f51c432c] 2015-01-30 13:05:07 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [7c41a32b3] 2015-01-30 13:05:09 -0500

Fix Coverity warning about contrib/pgcrypto's mdc_finish().

Coverity points out that mdc_finish returns a pointer to a local buffer
(which of course is gone as soon as the function returns), leaving open
a risk of misbehaviors possibly as bad as a stack overwrite.

In reality, the only possible call site is in process_data_packets()
which does not examine the returned pointer at all.  So there's no
live bug, but nonetheless the code is confusing and risky.  Refactor
to avoid the issue by letting process_data_packets() call mdc_finish()
directly instead of going through the pullf_read() API.

Although this is only cosmetic, it seems good to back-patch so that
the logic in pgp-decrypt.c stays in sync across all branches.

Marko Kreen

Author: Tom Lane t...@sss.pgh.pa.us
Branch: master [586dd5d6a] 2015-01-24 13:05:42 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [d51d4ff31] 2015-01-24 13:05:45 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [7240f9200] 2015-01-24 13:05:49 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [502e5f9c3] 2015-01-24 13:05:53 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [b00a08859] 2015-01-24 13:05:56 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [3a3ee655c] 2015-01-24 13:05:58 -0500

Replace a bunch more uses of strncpy() with safer coding.

strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.

 

Re: [HACKERS] Split the 'Server Programming' chapter into two?

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 8:21 AM, Andres Freund and...@anarazel.de wrote:
 While playing around with where exactly to put the replication
 origin/progress docs I once more noticed that the 'Server Programming'
 book is a mix of different topics.

 It currently contains:
 35. Extending SQL
 36. Triggers
 37. Event Triggers
 38. The Rule System
 39. Procedural Languages
 40. PL/pgSQL - SQL Procedural Language
 41. PL/Tcl - Tcl Procedural Language
 42. PL/Perl - Perl Procedural Language
 43. PL/Python - Python Procedural Language
 44. Server Programming Interface
 45. Background Worker Processes
 46. Logical Decoding
 47. Replication Progress Tracking

 To me at least 44 - 47 don't really fit well with the rest. I think we
 either should invent a new category for them, or move them to
 'Internals'. Maybe we could introduce 'Extending the Server' category
 for those and a couple more? Candidates for that, at least, seem to be
 52. Writing A Procedural Language Handler
 53. Writing A Foreign Data Wrapper
 54. Writing A Custom Scan Provider

I like the extending the server idea.  Maybe Server Extensions.

 Also, shouldn't there at least be a link to
 http://www.postgresql.org/docs/devel/static/xfunc-sql.html in
 http://www.postgresql.org/docs/devel/static/xplang.html ?

Wouldn't hurt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 9:55 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 On Thu, Apr 23, 2015 at 4:56 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  I don't see a contradiction. There is clear agreement that the ERROR
  level should show the context. NOTICE and WARNING don't need it - and
  there are backward compatibility and usability reasons not to do it.

 Whether notices and warnings need it is a matter of opinion.  I don't
 think your idea is bad, and it might be a good rule of thumb in many
 cases, but I slightly prefer Marko's approach of adding a new option.

 I am not sure I understand you.

 Please, can you write more about your idea?

Your idea, as I understand it, is that for logs at severity levels
lower than ERROR, we can always omit the context, because it's not
necessary.  But I'm not sure that's right: some people might find that
context helpful.  If, as Marko proposes, we add an explicit option,
then everyone can choose the behavior that is right for them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Code paths where LWLock should be released on failure

2015-04-23 Thread Michael Paquier
On Thu, Apr 23, 2015 at 2:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 After looking at bug #13128, I have been looking at the code around
 LWLockAcquire/Release to see if there are similar issues elsewhere.
 Here are my findings:

 IIRC, we automatically release all LWLocks at the start of error recovery,
 so I rather doubt that any of this is necessary.

Perhaps this is purely cosmetic for most of this code, but not all... if
you have a look at #13128, not releasing TwoPhaseStateLock causes a
deadlock on startup when max_prepared_transactions does not have
enough slots. I have also been surprised by the inconsistencies
particularly in predicate.c or other places regarding LWLock releases.
Sometimes they are released on failure, sometimes not.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Sawada Masahiko
On Thu, Apr 23, 2015 at 3:24 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 22, 2015 at 12:39 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 The thing that made me nervous about that approach is that it made the LSN
 of each page critical information. If you somehow zeroed out the LSN, you
 could no longer tell which pages are frozen and which are not. I'm sure it
 could be made to work - and I got it working to some degree anyway - but
 it's a bit scary. It's similar to the multixid changes in 9.3: multixids
 also used to be data that you can just zap at restart, and when we changed
 the rules so that you lose data if you lose multixids, we got trouble. Now,
 LSNs are much simpler, and there wouldn't be anything like the
 multioffset/member SLRUs that you'd have to keep around forever or vacuum,
 but still..

 LSNs are already pretty critical.  If they're in the future, you can't
 flush those pages.  Ever.  And if they're wrong in either direction,
 crash recovery is broken.  But it's still worth thinking about ways
 that we could make this more robust.

 I keep coming back to the idea of treating any page that is marked as
 all-visible as frozen, and deferring freezing until the page is again
 modified.  The big downside of this is that if the page is set as
 all-visible and then immediately thereafter modified, it sucks to have
 to freeze when the XIDs in the page are still present in CLOG.  But if
 we could determine from the LSN that the XIDs in the page are new
 enough to still be considered valid, then we could skip freezing in
 those cases and only do it when the page is old.  That way, if
 somebody zeroed out the LSN (why, oh why?) the worst that would happen
 is that we'd do some extra freezing when the page was next modified.

In your idea, if we have a WORM (write-once read-many) table, then the
tuples in its pages would never be frozen unless we do VACUUM FREEZE.
In that situation, from the second time on, VACUUM FREEZE would need
to scan only the pages added since the last freezing, so we could
reduce I/O, but we would still need to freeze explicitly for
anti-wraparound as in the past. A WORM table generally holds huge data,
and that data would increase rapidly, so it would also be expensive.


 I would feel safer if we added a completely new epoch counter to the page
 header, instead of reusing LSNs. But as we all know, changing the page
 format is a problem for in-place upgrade, and takes some space too.

 Yeah.  We have a serious need to reduce the size of our on-disk
 format.  On a TPC-C-like workload Jan Wieck recently tested, our data
 set was 34% larger than another database at the beginning of the test,
 and 80% larger by the end of the test.  And we did twice the disk
 writes.  See "The Elephants in the Room.pdf" at
 https://sites.google.com/site/robertmhaas/presentations


Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Jeff Davis
On Wed, 2015-04-22 at 20:35 -0700, David G. Johnston wrote:

 But the fact that column b has the data type unknown is only a
 warning - not an error.
 
I get an error:

postgres=# SELECT '  '::text = 'a';
 ?column? 
----------
 f
(1 row)

postgres=# SELECT a=b FROM (SELECT ''::text, '  ') x(a,b);
ERROR:  failed to find conversion function from unknown to text

So that means the column reference b is treated differently than the
literal. Here I don't mean a reference to an actual column of a real
table, just an identifier (b) that parses as a columnref.

Creating the table gives you a warning (not an error), but I think that
was a poor example for me to choose, and not important to my point.
 
 This seems to be a case of the common problem (or, at least recently
 mentioned) where type conversion only deals with data and not context.
 
 
 http://www.postgresql.org/message-id/CADx9qBmVPQvSH3
 +2cH4cwwPmphW1mE18e=wumlfuc-qz-t7...@mail.gmail.com
 
 
I think that is a different problem. That's a runtime type conversion
error (execution time), and I'm talking about something happening at
parse analysis time.

 
 but this too works - which is why the implicit cast concept above
 fails (I'm leaving it since the thought process may help in
 understanding):
 
 
 SELECT 1 = '1';
 
 
 From which I infer that an unknown literal is allowed to be fed
 directly into a type's input function to facilitate a direct coercion.

Yes, I believe that's what's happening. When we use an unknown literal,
it's acting more like a value constructor and will pass it to the type
input function. When it's a columnref, even if unknown, it tries to cast
it and fails.

But that is very confusing. In the example at the top of this email, it
seems like the second query should be equivalent to the first, or even
that postgres should be able to rewrite the second into the first. But
the second query fails where the first succeeds.


 At this point...backward compatibility?

Backwards compatibility of what queries? I guess the ones that return
unknowns to the client or create tables with unknown columns?

 create table a(u) as select '1';
 
 
 WARNING: column u has type unknown​
 DETAIL:  Proceeding with relation creation anyway.
 
 
 Related question: was there ever a time when the above failed instead
 of just supplying a warning?

Not that I recall.



 ​My gut reaction is if you feel strongly enough to add some additional
 documentation or warnings/hints/details related to this topic they
 probably would get put in; but disallowing unknown as first-class
 type is likely to fail to pass a cost-benefit evaluation.

I'm not proposing that we eliminate unknown. I just think columnrefs and
literals should behave consistently. If we really don't want unknown
columnrefs, it seems like we could at least throw a better error.

If we were starting from scratch, I'd also not return unknown to the
client, but we have to worry about the backwards compatibility.

 Distinguishing between untyped literals and unknown type literals
 seems promising concept to aid in understanding the difference in the
 face of not being able (or wanting) to actually change the behavior.

Not sure I understand that proposal, can you elaborate?

Regards,
Jeff Davis





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Robert Haas
On Wed, Apr 22, 2015 at 10:41 PM, Bruce Momjian br...@momjian.us wrote:
 What is a real problem is that we don't block creating tablespaces
 anywhere at all, including in obviously problematic places like the
 transaction log directory:

 josh=# create tablespace tbl2 location '/home/josh/pg94/data/pg_xlog/';
 CREATE TABLESPACE

 It really seems like we ought to block *THAT*.  Of course, if we block
 tablespace creation in PGDATA generally, then that's covered.

 I have developed the attached patch to warn about creating tablespaces
 inside the data directory.  The case this doesn't catch is referencing a
 symbolic link that points to the same directory.  We can't make it an
 error so people can use pg_upgrade these setups.  This would be for 9.5
 only.

I think this is a good thing to do, but I sure wish we could go
further and block it completely.  That may require more thought than
we have time to put in at this stage of the release cycle, though, so
+1 for doing at least this much.
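
For illustration, the warned-about case would then presumably look
something like this (the exact warning wording depends on what finally
gets committed; this is a sketch, not committed behavior):

josh=# CREATE TABLESPACE tbl3 LOCATION '/home/josh/pg94/data/mytblspc';
WARNING:  tablespace location should not be inside the data directory
CREATE TABLESPACE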

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Kyotaro HORIGUCHI
Sorry, the patch had an obvious bug.

-+			  Int32GetDatum(inputTypeMod),
++			  Int32GetDatum(targetTypeMod),

regards,


 Hello, I think this is a bug.
 
 The core of this problem is that coerce_type() fails for Var of
 type UNKNOWNOID.
 
 The comment for the function says that,
 
  * The caller should already have determined that the coercion is possible;
  * see can_coerce_type.
 
 But can_coerce_type() should say it's possible to convert from
 unknown to any type as it doesn't see the target node type. I
 think this as an inconsistency between can_coerce_type and
 coerce_type. So making this consistent would be right way.
 
 Concerning only this issue, putting on-the-fly conversion for
 unkown nonconstant as attached patch worked for me. I'm not so
 confident on this, though..
 
 regards,
 
 At Wed, 22 Apr 2015 23:26:43 -0700, Jeff Davis pg...@j-davis.com wrote in 
 1429770403.4604.22.camel@jeff-desktop
  On Wed, 2015-04-22 at 20:35 -0700, David G. Johnston wrote:
  
   But the fact that column b has the data type unknown is only a
   warning - not an error.
   
  I get an error:
  
  postgres=# SELECT '  '::text = 'a';
   ?column? 
  ----------
   f
  (1 row)
  
  postgres=# SELECT a=b FROM (SELECT ''::text, '  ') x(a,b);
  ERROR:  failed to find conversion function from unknown to text

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index a4e494b..2e3a43c 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -221,7 +221,7 @@ coerce_type(ParseState *pstate, Node *node,
 			return node;
 		}
 	}
-	if (inputTypeId == UNKNOWNOID && IsA(node, Const))
+	if (inputTypeId == UNKNOWNOID)
 	{
 		/*
 		 * Input is a string constant with previously undetermined type. Apply
@@ -275,6 +275,29 @@ coerce_type(ParseState *pstate, Node *node,
 
 		targetType = typeidType(baseTypeId);
 
+		/* Perform on the fly conversion for non-constants */
+		if (!IsA(node, Const))
+		{
+			Form_pg_type typform = (Form_pg_type) GETSTRUCT(targetType);
+			Node	   *result =
+				(Node *) makeFuncExpr(typform->typinput,
+									  targetTypeId,
+									  list_make3(node,
+												 makeConst(OIDOID, -1, InvalidOid,
+														   sizeof(Oid),
+														   ObjectIdGetDatum(InvalidOid),
+														   false, true),
+												 makeConst(INT4OID, -1, InvalidOid,
+														   sizeof(uint32),
+														   Int32GetDatum(targetTypeMod),
+														   false, true)),
+									  InvalidOid, InvalidOid,
+									  COERCE_IMPLICIT_CAST);
+			ReleaseSysCache(targetType);
+
+			return result;
+		}
+
 		newcon->consttype = baseTypeId;
 		newcon->consttypmod = inputTypeMod;
 		newcon->constcollid = typeTypeCollation(targetType);
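
With this change, the failing query from upthread should behave like the
constant case. Expected output (a sketch of what the on-the-fly
conversion should produce, based on the constant-case output shown
upthread):

postgres=# SELECT a=b FROM (SELECT ''::text, '  ') x(a,b);
 ?column? 
----------
 f
(1 row)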

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Simon Riggs
On 21 April 2015 at 22:21, Robert Haas robertmh...@gmail.com wrote:


 I'm not saying those ideas don't have problems, because they do.  But
 I think they are worth further exploring.  The main reason I gave up
 on that is because Heikki was working on the XID-to-LSN mapping stuff.
 That seemed like a better approach than either of the above, so as
 long as Heikki was working on that, there wasn't much reason to pursue
 more lowbrow approaches.  Clearly, though, we need to do something
 about this.  Freezing is a big problem for lots of users.

 All that having been said, I don't think adding a new fork is a good
 approach.  We already have problems pretty commonly where our
 customers complain about running out of inodes.  Adding another fork
 for every table would exacerbate that problem considerably.


We were talking about having an incremental backup map also. Which sounds a
lot like the freeze map.

XID-to-LSN sounded cool but was complex. If we need the map for backup
purposes, we may as well do it the simple way and hit both birds at once.

We only need a freeze/backup map for larger relations. So if we map 1000
blocks per map page, we skip having a map at all when size < 1000 blocks.
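
For scale: with the default 8 kB block size, 1000 heap blocks is about
8 MB, so relations under roughly 8 MB would carry no map at all, and a
larger relation would pay only one extra map page per ~8 MB of heap
(taking the 1000-blocks-per-map-page figure above at face value; the
real constant depends on how much state we keep per block).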

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread David G. Johnston
On Thu, Apr 23, 2015 at 1:35 AM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Very sorry for the trash..

 ===
 Now I found a comment at just where I patched,

   * XXX if the typinput function is not immutable, we really ought to
   * postpone evaluation of the function call until runtime. But there
   * is no way to represent a typinput function call as an expression
   * tree, because C-string values are not Datums. (XXX This *is*
   * possible as of 7.3, do we want to do it?)

 - Is it OK to *now* we can do this?
 + Is it OK to regard that we can do this *now*?


In this patch or a different one?  Does this comment have anything to do
with the concern of this thread?

David J.


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread David G. Johnston
On Thu, Apr 23, 2015 at 1:07 AM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hello, I think this is a bug.

 The core of this problem is that coerce_type() fails for Var of
 type UNKNOWNOID.

 The comment for the function says that,

  * The caller should already have determined that the coercion is
 possible;
  * see can_coerce_type.

 But can_coerce_type() should say it's possible to convert from
 unknown to any type as it doesn't see the target node type. I
 think this as an inconsistency between can_coerce_type and
 coerce_type. So making this consistent would be right way.


You have two pieces of contradictory knowledge - how are you picking which
one to fix?

 Concerning only this issue, putting on-the-fly conversion for
 unkown nonconstant as attached patch worked for me. I'm not so
 confident on this, though..


Confident about what aspect - the safety of the patch itself or whether
the conversion is even a good idea?

David J.


Re: [HACKERS] Code paths where LWLock should be released on failure

2015-04-23 Thread Andres Freund
Hi,

On 2015-04-23 13:51:57 +0900, Michael Paquier wrote:
 After looking at bug #13128, I have been looking at the code around
 LWLockAcquire/Release to see if there are similar issues elsewhere.
 Here are my findings:

Afaics all of these should actually be handled by the paths that release
locks upon error.

 5) In ReplicationSlotCreate@slot.c, I would think that
 ReplicationSlotAllocationLock should be released when all the locks
 are in use. Similarly, in  ReplicationSlotDropAcquired,
 ReplicationSlotAllocationLock should be released when !fail_softly.
 SaveSlotToPath could also be made safer when a file could not be
 created.

When called via SQL these are released by normal error processing, in
walsender they're released by WalSndErrorCleanup().

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Kyotaro HORIGUCHI
Hello, I think this is a bug.

The core of this problem is that coerce_type() fails for Var of
type UNKNOWNOID.

The comment for the function says that,

 * The caller should already have determined that the coercion is possible;
 * see can_coerce_type.

But can_coerce_type() says it's possible to convert from
unknown to any type, as it doesn't look at the target node type. I
see this as an inconsistency between can_coerce_type and
coerce_type, so making them consistent would be the right way.

Concerning only this issue, adding an on-the-fly conversion for
unknown nonconstants, as in the attached patch, worked for me. I'm not
so confident in this, though..

regards,

At Wed, 22 Apr 2015 23:26:43 -0700, Jeff Davis pg...@j-davis.com wrote in 
1429770403.4604.22.camel@jeff-desktop
 On Wed, 2015-04-22 at 20:35 -0700, David G. Johnston wrote:
 
  But the fact that column b has the data type unknown is only a
  warning - not an error.
  
 I get an error:
 
 postgres=# SELECT '  '::text = 'a';
  ?column? 
 ----------
  f
 (1 row)
 
 postgres=# SELECT a=b FROM (SELECT ''::text, '  ') x(a,b);
 ERROR:  failed to find conversion function from unknown to text
 
 So that means the column reference b is treated differently than the
 literal. Here I don't mean a reference to an actual column of a real
 table, just an identifier (b) that parses as a columnref.
 
 Creating the table gives you a warning (not an error), but I think that
 was a poor example for me to choose, and not important to my point.
  
  This seems to be a case of the common problem (or, at least recently
  mentioned) where type conversion only deals with data and not context.
  
  
  http://www.postgresql.org/message-id/CADx9qBmVPQvSH3
  +2cH4cwwPmphW1mE18e=wumlfuc-qz-t7...@mail.gmail.com
  
  
 I think that is a different problem. That's a runtime type conversion
 error (execution time), and I'm talking about something happening at
 parse analysis time.
 
  
  but this too works - which is why the implicit cast concept above
  fails (I'm leaving it since the thought process may help in
  understanding):
  
  
  SELECT 1 = '1';
  
  
  From which I infer that an unknown literal is allowed to be fed
  directly into a type's input function to facilitate a direct coercion.
 
 Yes, I believe that's what's happening. When we use an unknown literal,
 it's acting more like a value constructor and will pass it to the type
 input function. When it's a columnref, even if unknown, it tries to cast
 it and fails.
 
 But that is very confusing. In the example at the top of this email, it
 seems like the second query should be equivalent to the first, or even
 that postgres should be able to rewrite the second into the first. But
 the second query fails where the first succeeds.
 
 
  At this point...backward compatibility?
 
 Backwards compatibility of what queries? I guess the ones that return
 unknowns to the client or create tables with unknown columns?
 
  create table a(u) as select '1';
  
  
  WARNING: column u has type unknown​
  DETAIL:  Proceeding with relation creation anyway.
  
  
  Related question: was there ever a time when the above failed instead
  of just supplying a warning?
 
 Not that I recall.
 
 
 
  ​My gut reaction is if you feel strongly enough to add some additional
  documentation or warnings/hints/details related to this topic they
  probably would get put in; but disallowing unknown as first-class
  type is likely to fail to pass a cost-benefit evaluation.
 
 I'm not proposing that we eliminate unknown. I just think columnrefs and
 literals should behave consistently. If we really don't want unknown
 columnrefs, it seems like we could at least throw a better error.
 
 If we were starting from scratch, I'd also not return unknown to the
 client, but we have to worry about the backwards compatibility.
 
  Distinguishing between untyped literals and unknown type literals
  seems promising concept to aid in understanding the difference in the
  face of not being able (or wanting) to actually change the behavior.
 
 Not sure I understand that proposal, can you elaborate?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index a4e494b..b64d40b 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -221,7 +221,7 @@ coerce_type(ParseState *pstate, Node *node,
 			return node;
 		}
 	}
-	if (inputTypeId == UNKNOWNOID && IsA(node, Const))
+	if (inputTypeId == UNKNOWNOID)
 	{
 		/*
 		 * Input is a string constant with previously undetermined type. Apply
@@ -275,6 +275,29 @@ coerce_type(ParseState *pstate, Node *node,
 
 		targetType = typeidType(baseTypeId);
 
+		/* Perform on the fly conversion for non-constants */
+		if(!IsA(node, Const))
+		{
+			Form_pg_type typform = (Form_pg_type) GETSTRUCT(targetType);
+			Node	   *result =
+				(Node *) makeFuncExpr(typform->typinput,
+									  targetTypeId,
+									  list_make3(node,
+

Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread David G. Johnston
On Wednesday, April 22, 2015, Jeff Davis pg...@j-davis.com wrote:

 On Wed, 2015-04-22 at 20:35 -0700, David G. Johnston wrote:

  But the fact that column b has the data type unknown is only a
  warning - not an error.
 
 I get an error:

 postgres=# SELECT '  '::text = 'a';
  ?column?
 ----------
  f
 (1 row)

 postgres=# SELECT a=b FROM (SELECT ''::text, '  ') x(a,b);
 ERROR:  failed to find conversion function from unknown to text

 So that means the column reference b is treated differently than the
 literal. Here I don't mean a reference to an actual column of a real
 table, just an identifier (b) that parses as a columnref.

 Creating the table gives you a warning (not an error), but I think that
 was a poor example for me to choose, and not important to my point.


I get the point but the warning stems from converting from untyped to
unknown.  Then you get the error looking for an implicit cast from
unknown.  The error you had stated referred to the first situation, the
conversion of untyped to unknown.


  This seems to be a case of the common problem (or, at least recently
  mentioned) where type conversion only deals with data and not context.
 
 
  http://www.postgresql.org/message-id/CADx9qBmVPQvSH3
  +2cH4cwwPmphW1mE18e=wumlfuc-qz-t7...@mail.gmail.com
 
 
 I think that is a different problem. That's a runtime type conversion
 error (execution time), and I'm talking about something happening at
 parse analysis time.


Again, I am referring here to why your proposed error seems unlikely in the
face of similar errors not currently providing sufficient context either.  I
don't know enough to posit why this is the case.


 
  but this too works - which is why the implicit cast concept above
  fails (I'm leaving it since the thought process may help in
  understanding):
 
 
  SELECT 1 = '1';
 
 
  From which I infer that an unknown literal is allowed to be fed
  directly into a type's input function to facilitate a direct coercion.

 Yes, I believe that's what's happening. When we use an unknown literal,
 it's acting more like a value constructor and will pass it to the type
 input function. When it's a columnref, even if unknown, it tries to cast
 it and fails.

 But that is very confusing. In the example at the top of this email, it
 seems like the second query should be equivalent to the first, or even
 that postgres should be able to rewrite the second into the first. But
 the second query fails where the first succeeds.


Agreed (sorta, I can understand your PoV) - but it is consistently
confusing...and quite obvious when you've changed from one to the other.

Is there something concrete to dig teeth into here?



  At this point...backward compatibility?

 Backwards compatibility of what queries? I guess the ones that return
 unknowns to the client or create tables with unknown columns?


Yes, disallowing unknown and requiring everything to be untyped or error.


  create table a(u) as select '1';
 
 
  WARNING: column u has type unknown​
  DETAIL:  Proceeding with relation creation anyway.
 
 
  Related question: was there ever a time when the above failed instead
  of just supplying a warning?

 Not that I recall.



  ​My gut reaction is if you feel strongly enough to add some additional
  documentation or warnings/hints/details related to this topic they
  probably would get put in; but disallowing unknown as first-class
  type is likely to fail to pass a cost-benefit evaluation.

 I'm not proposing that we eliminate unknown. I just think columnrefs and
 literals should behave consistently. If we really don't want unknown
 columnrefs, it seems like we could at least throw a better error.


We do allow unknown column refs.  We don't allow you to do much with them
though - given the lack of casts, implicit and otherwise.  The error that
result from that situation are where the complaint lies.  Since we cannot
disallow unknown column refs the question is can the resultant errors be
improved.  I really don't see value in expending effort solely trying to
improve this limited situation.  If the same effort also improves a wider
swath of the code base then great.

The only other option is to allow unknowns to be implicitly cast to text
and then fed into the input type just like an untyped literal would.  But
those are not the same thing - no matter how similar your two mock queries
make them seem - and extrapolation from those two alone doesn't seem
justified. And this is the crux of where your similarity falls apart.  If you
can justify the above behavior then maybe...



 If we were starting from scratch, I'd also not return unknown to the
 client, but we have to worry about the backwards compatibility.

  Distinguishing between untyped literals and unknown type literals
  seems promising concept to aid in understanding the difference in the
  face of not being able (or wanting) to actually change the behavior.

 Not sure I understand that proposal, can you elaborate?


Purely documentation 

Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Heikki Linnakangas

On 04/22/2015 09:24 PM, Robert Haas wrote:

I would feel safer if we added a completely new epoch counter to the page
header, instead of reusing LSNs. But as we all know, changing the page
format is a problem for in-place upgrade, and takes some space too.

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See The Elephants in the Room.pdf at
https://sites.google.com/site/robertmhaas/presentations


Meh. Adding an 8-byte header to every 8k block would add 0.1% to the 
disk size. No doubt it would be nice to reduce our disk footprint, but 
the page header is not the elephant in the room.


- Heikki




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-23 Thread Heikki Linnakangas

On 04/22/2015 11:58 PM, Robert Haas wrote:

On Wed, Apr 22, 2015 at 3:34 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

On 04/22/2015 10:21 PM, Robert Haas wrote:

On Wed, Apr 22, 2015 at 3:01 PM, Heikki Linnakangas hlinn...@iki.fi
wrote:

 For example, imagine that we perform point-in-time recovery to WAL position
0/1237E568, on timeline 1. That falls within segment
00010012. Then we end recovery, and switch to timeline 2.
After the switch, and some more WAL-logged actions, we'll have these
files
in pg_xlog:

00010011
00010012
00020012
00020013
00020014



Is the 00010012 file a partial segment of the sort
 you're proposing to no longer archive?


If you did pure archive recovery, with no streaming replication involved,
then no. If it was created by streaming replication, and the replication had
not filled the whole segment yet, then yes, it would be a partial segment.


Why the difference?


Because we don't archive partial segments, except for the last one at a 
timeline switch, and there was no timeline switch to timeline 1 within 
that segment.


It doesn't really matter, though. The behaviour at the switch from 
timeline 1 to 2 works the same, whether the 00010012 
segment is complete or not.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Kyotaro HORIGUCHI
Very sorry for the trash..

===
Now I found a comment at just where I patched,

  * XXX if the typinput function is not immutable, we really ought to
  * postpone evaluation of the function call until runtime. But there
  * is no way to represent a typinput function call as an expression
  * tree, because C-string values are not Datums. (XXX This *is*
  * possible as of 7.3, do we want to do it?)

- Is it OK to *now* we can do this?
+ Is it OK to regard that we can do this *now*?

regards,

 Hello, I think this is a bug.
 
 The core of this problem is that coerce_type() fails for Var of
 type UNKNOWNOID.
 
 The comment for the function says that,
 
  * The caller should already have determined that the coercion is possible;
  * see can_coerce_type.
 
 But can_coerce_type() should say it's possible to convert from
 unknown to any type as it doesn't see the target node type. I
 think this as an inconsistency between can_coerce_type and
 coerce_type. So making this consistent would be right way.
 
 Concerning only this issue, putting on-the-fly conversion for
 unkown nonconstant as attached patch worked for me. I'm not so
 confident on this, though..
 
 regards,
 
 At Wed, 22 Apr 2015 23:26:43 -0700, Jeff Davis pg...@j-davis.com wrote in 
 1429770403.4604.22.camel@jeff-desktop
  On Wed, 2015-04-22 at 20:35 -0700, David G. Johnston wrote:
  
   But the fact that column b has the data type unknown is only a
   warning - not an error.
   
  I get an error:
  
  postgres=# SELECT '  '::text = 'a';
   ?column? 
  ----------
   f
  (1 row)
  
  postgres=# SELECT a=b FROM (SELECT ''::text, '  ') x(a,b);
  ERROR:  failed to find conversion function from unknown to text



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread David G. Johnston
On Thu, Apr 23, 2015 at 1:49 AM, David G. Johnston 
david.g.johns...@gmail.com wrote:

 On Wednesday, April 22, 2015, Jeff Davis pg...@j-davis.com wrote:

 On Wed, 2015-04-22 at 20:35 -0700, David G. Johnston wrote:


  ​My gut reaction is if you feel strongly enough to add some additional
  documentation or warnings/hints/details related to this topic they
  probably would get put in; but disallowing unknown as first-class
  type is likely to fail to pass a cost-benefit evaluation.

 I'm not proposing that we eliminate unknown. I just think columnrefs and
 literals should behave consistently. If we really don't want unknown
 columnrefs, it seems like we could at least throw a better error.


 We do allow unknown column refs.  We don't allow you to do much with them
  though - given the lack of casts, implicit and otherwise.  The errors that
  result from that situation are where the complaint lies.  Since we cannot
 disallow unknown column refs the question is can the resultant errors be
 improved.  I really don't see value in expending effort solely trying to
 improve this limited situation.  If the same effort also improves a wider
 swath of the code base then great.


Slightly tangential, but a pair of recent threads

http://www.postgresql.org/message-id/55244654.7020...@bluetreble.com
http://www.postgresql.org/message-id/flat/cakfquwazck37j7fa4pzoz8m9jskmqqopaftxry0ca_n4r2x...@mail.gmail.com#cakfquwazck37j7fa4pzoz8m9jskmqqopaftxry0ca_n4r2x...@mail.gmail.com

where I pondered about polymorphic functions got me thinking about the
following function definition:

​create function poly(inarg anyelement, testarg unknown) returns anyelement
AS $$
BEGIN
   RETURN inarg;
END;
$$
LANGUAGE plpgsql
;

Which indeed works...
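
For example (a hypothetical call, just to show the unknown parameter
accepting a quoted literal):

SELECT poly(42, 'anything');  -- returns 42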

Jim's variant type largely mirrors the behavior desired in this thread.

The lack of casts to and from unknown makes it an unsuitable alternative
for variant, though maybe if we indeed allow type coercion to work it
would become viable.  But the same concerns apply as well.

David J.


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-22 15:23:16 -0700, Peter Geoghegan wrote:
 On Tue, Apr 21, 2015 at 7:57 AM, Andres Freund and...@anarazel.de wrote:
  * Iff we're going to have the XLOG_HEAP_AFFIRM record, I'd rather have
that guide the logical decoding code. Seems slightly cleaner.
 
 I thought that you didn't think that would always work out? That in
 some possible scenario it could break?

I don't think there's a real problem. You obviously have to do it right
(i.e. only abort insertion if there's an insert/update/delete or commit).

Speaking of commits: Without having rechecked, I think you're missing
cleanup of the pending insertion on commit.

  * Gram.y needs a bit more discussion:
* Can anybody see a problem with changing the precedence of DISTINCT 
  ON to nonassoc? Right now I don't see a problem given both are
  reserved keywords already.
 
 Why did you push code that solved this in a roundabout way, but
 without actually reverting my original nonassoc changes (which would
 now not result in shift/reduce conflicts?).

I pushed the changes to a separate repo so you could polish them ;)

 What should we do about that?

I'm pretty sure we should not do the %nonassoc stuff.

 I don't like that you seem to have regressed
 diagnostic output [1].

Meh^5. This is the same type of error we get in literally hundreds of
other syntax errors. And in contrast to the old error, it'll actually
have a correct error position.

 Surely it's simpler to use the nonassoc approach?

I think it's much harder to foresee all consequences of that.

  * SPEC_IGNORE,  /* INSERT of ON CONFLICT IGNORE */ looks like
a wrongly copied comment.
 
 Not sure what you mean here. Please clarify.

I'm not sure either ;)

  * I wonder if we now couldn't avoid changing heap_delete's API - we can
always super delete if we find a speculative insertion now. It'd be
nice not to break out of core callers if not necessary.
 
 Maybe, but if there is a useful way to break out only a small subset
 of heap_delete(), I'm not seeing it.

I think you misread my statement: I'm saying we don't need the new
argument anymore, even if we still do the super-deletion in
heap_delete(). Now that the speculative insertion will not be visible
(as in seen on a tuple they could delete) to other backends we can just
do the super deletion if we see that the tuple is a promise one.

  * breinbaas on IRC just mentioned that it'd be nice to have upsert as a
link in the insert. Given that that's the pervasive term that doesn't
seem absurd.
 
 Not sure what you mean. Where would the link appear?

The index, i.e. it'd just be another indexterm. It seems good to give
people a link.

 * We need to figure out the tuple lock strength details. I think this
 is doable, but it is the greatest challenge to committing ON CONFLICT
 UPDATE at this point. Andres feels that we should require no greater
 lock strength than an equivalent UPDATE. I suggest we get to this
 after committing the IGNORE variant. We probably need to discuss this
 some more.

I want to see a clear way forward before we commit parts. It doesn't
have to be completely iron-clad, but the way forward should be pretty
clear. What's the problem you're seeing right now in making this work? A
couple of days ago on jabber you seemed to see a way of doing this?

The reason I think this has to use the appropriate lock level is that
it'll otherwise re-introduce the deadlocks that fk locks resolved. Given
how much pain we endured to get fk locks, that seems like a bad deal.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Marko Tiikkaja

On 4/2/15 9:37 AM, Pavel Stehule wrote:

estate->err_text = stmt->elog_level == ERROR ? estate->err_text :
raise_skip_msg;


Can we do this simple change? It will produce a stack info for exceptions
and it will not make developers mad with a lot of useless content.


I'm not sure everyone agrees with this to be honest, myself included.

I think the best way to do this would be to have an option for RAISE to 
suppress the context *regardless of nesting depth*, but show the full 
context by default for ERRORs.  For NOTICEs and WARNINGs I don't care 
much what the default will be; perhaps just full backwards compatibility 
could work there.



.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Kyotaro HORIGUCHI
Now I found a comment at just where I patched,

  * XXX if the typinput function is not immutable, we really ought to
  * postpone evaluation of the function call until runtime. But there
  * is no way to represent a typinput function call as an expression
  * tree, because C-string values are not Datums. (XXX This *is*
  * possible as of 7.3, do we want to do it?)

Is it OK to *now* we can do this?

regards,

 Hello, I think this is a bug.
 
 The core of this problem is that coerce_type() fails for Var of
 type UNKNOWNOID.
 
 The comment for the function says that,
 
  * The caller should already have determined that the coercion is possible;
  * see can_coerce_type.
 
 But can_coerce_type() should say it's possible to convert from
 unknown to any type as it doesn't see the target node type. I
 think this as an inconsistency between can_coerce_type and
 coerce_type. So making this consistent would be right way.
 
 Concerning only this issue, putting on-the-fly conversion for
 unkown nonconstant as attached patch worked for me. I'm not so
 confident on this, though..
 
 regards,
 
 At Wed, 22 Apr 2015 23:26:43 -0700, Jeff Davis pg...@j-davis.com wrote in 
 1429770403.4604.22.camel@jeff-desktop
  On Wed, 2015-04-22 at 20:35 -0700, David G. Johnston wrote:
  
   But the fact that column b has the data type unknown is only a
   warning - not an error.
   
  I get an error:
  
  postgres=# SELECT '  '::text = 'a';
   ?column? 
  ----------
   f
  (1 row)
  
  postgres=# SELECT a=b FROM (SELECT ''::text, '  ') x(a,b);
  ERROR:  failed to find conversion function from unknown to text

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Pavel Stehule
2015-04-23 9:53 GMT+02:00 Marko Tiikkaja ma...@joh.to:

 On 4/2/15 9:37 AM, Pavel Stehule wrote:

 estate->err_text = stmt->elog_level == ERROR ? estate->err_text :
 raise_skip_msg;


 Can we do this simple change? It will produce a stack info for exceptions
 and it will not make developers mad with a lot of useless content.


 I'm not sure everyone agrees with this to be honest, myself included.

 I think the best way to do this would be to have an option for RAISE to
 suppress the context *regardless of nesting depth*, but show the full
 context by default for ERRORs.  For NOTICEs and WARNINGs I don't care much
 what the default will be; perhaps just full backwards compatibility could
 work there.


I don't see a contradiction. There is clear agreement that the ERROR
level should show the context. NOTICE and WARNING don't need it - and
there are backward compatibility and usability reasons not to do it.

I am not against any special option to the RAISE statement. Do you have
some idea?

What about enhancing the USING clause?

example:

RAISE NOTICE USING message = '', with_context = true
RAISE EXCEPTION USING message = '', with_context = false
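
For a function author that could look like this (a sketch only -
with_context is just the proposed option name, not existing syntax):

DO $$
BEGIN
  RAISE NOTICE USING message = 'progress note', with_context = false;
  RAISE EXCEPTION USING message = 'fatal problem', with_context = true;
END;
$$;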


Regards

Pavel





 .m



[HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Sandeep Thakkar
Hi,

The sha2 test case in the contrib pgcrypto module is failing with a diff on
anole because the results file contains additional lines like:
--

+ WARNING:  detected write past chunk end in ExprContext 6021cbb0

--

For example, the log for REL9_2_STABLE can be seen at
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=anole&dt=2015-04-23%2009%3A00%3A25&stg=contrib-install-check-C


-- 
Sandeep Thakkar


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-23 Thread Sawada Masahiko
On Mon, Apr 20, 2015 at 10:17 PM, David Steele da...@pgmasters.net wrote:
 On 4/20/15 4:40 AM, Sawada Masahiko wrote:

 Thank you for updating the patch.

 One question about regarding since v7 (or later) patch is;
 What is the different between OBJECT logging and SESSION logging?

 In brief, session logging can log anything that happens in a user
 session while object logging only targets DML and SELECT on selected
 relations.

 The pg_audit.log_relation setting is intended to mimic the prior
 behavior of pg_audit before object logging was added.

 In general, I would not expect pg_audit.log = 'read, write' to be used
 with pg_audit.role.  In other words, session logging of DML and SELECT
 should probably not be turned on at the same time as object logging is
 in use.  Object logging is intended to be a fine-grained version of
 pg_audit.log = 'read, write' (one or both).

Thank you for your explanation!
I now understand how to use the two logging styles.
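
If I understand correctly, a minimal object-logging setup would then be
something like the following (pg_audit.role comes from the patch; the
grants are just my illustration, not tested):

CREATE ROLE auditor;
SET pg_audit.role = 'auditor';
GRANT SELECT, DELETE ON bar TO auditor;
-- only statements touching relations with grants to "auditor"
-- should produce OBJECT log entries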

 I used v9 patch with pg_audit.log_relation = on, and got quite
 similar but different log as follows.

 =# select * from hoge, bar where hoge.col = bar.col;
 NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.hoge,select *
 from hoge, bar where hoge.col = bar.col;
 NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.hoge,select *
 from hoge, bar where hoge.col = bar.col;
 NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.bar,select * from
 hoge, bar where hoge.col = bar.col;
 NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.bar,select *
 from hoge, bar where hoge.col = bar.col;

 The behaviour of SESSION log is similar to OBJECT log now, and SESSION
 log per session (i.e., pg_audit.log_relation = off) is also similar to
 'log_statement = all' (enhancing log_statement might be enough).
 So I couldn't understand the need for SESSION log.

 Session logging is quite different from 'log_statement = all'.  See:

 http://www.postgresql.org/message-id/552323b2.8060...@pgmasters.net

 and/or the Why pg_audit? section of the pg_audit documentation.

 I agree that it may make sense in the future to merge session logging
 into log_statements, but for now it does provide important additional
 functionality for creating audit logs.


I'm concerned that the behaviour of pg_audit has been changed a few
times, as far as I remember. Did we reach consensus on this design?

And one question: OBJECT logging of a whole-table deletion (e.g. DELETE
FROM hoge) does not seem to work, as shown below.


=# grant all on bar TO masahiko;

(1) Delete all tuple
=# insert into bar values(1);
=# delete from bar ;
NOTICE:  AUDIT: SESSION,47,1,WRITE,DELETE,TABLE,public.bar,delete from bar ;
DELETE 1

(2) Delete specified tuple (but same result as (1))
=# insert into bar values(1);
=# delete from bar where col = 1;
NOTICE:  AUDIT: OBJECT,48,1,WRITE,DELETE,TABLE,public.bar,delete from
bar where col = 1;
NOTICE:  AUDIT: SESSION,48,1,WRITE,DELETE,TABLE,public.bar,delete from
bar where col = 1;
DELETE 1
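
(For case (1) I would have expected an OBJECT line as well, i.e.
something like

NOTICE:  AUDIT: OBJECT,47,1,WRITE,DELETE,TABLE,public.bar,delete from bar ;

analogous to case (2), but it does not appear.)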

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Kyotaro HORIGUCHI
Hello,

  Very sorry for the trash..
 
  ===
  Now I found a comment at just where I patched,
 
* XXX if the typinput function is not immutable, we really ought to
* postpone evaluation of the function call until runtime. But there
* is no way to represent a typinput function call as an expression
* tree, because C-string values are not Datums. (XXX This *is*
* possible as of 7.3, do we want to do it?)
 
  - Is it OK to *now* we can do this?
  + Is it OK to regard that we can do this *now*?
 
 
 In this patch or a different one?  Does this comment have anything to do
 with the concern of this thread?


The comment cited above is from the PostgreSQL source. The reason why
an implicit cast (coercion) doesn't work for a subquery's results of
unknown type, but works for constants, is explained there.

For select ''::text = ' ', the internal intermediate
representation of the expression looks something like this,

 Equal(text_const(''), unknown_const(' '))

and the second parameter can be immediately converted into
text_const(' ') during expression transformation in the parse phase.

On the other hand, the problematic query is represented as
follows,

 Equal(text_const(a), unknown_const(b))
for select ''::text as a, ' ' as b

But as described in the comment, PostgreSQL knows what to do, but it
couldn't be implemented at the time of 7.3(?). It seems to be doable
now.


By the way, the following query is similar to the problematic one
but doesn't fail.

SELECT a = b FROM (SELECT ''::text, ' ' UNION ALL SELECT ''::text,
' ') AS x(a, b);

This is because parsing of UNION immediately converts constants
of unknown type in both arms of the UNION to text, so the top-level
select won't be bothered by this problem. But the problematic
query doesn't reach an appropriate point to do that until the
function I patched.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Kyotaro HORIGUCHI
Hello,

  Hello, I think this is a bug.
 
  The core of this problem is that coerce_type() fails for Var of
  type UNKNOWNOID.
 
  The comment for the function says that,
 
   * The caller should already have determined that the coercion is
  possible;
   * see can_coerce_type.
 
  But can_coerce_type() should say it's possible to convert from
  unknown to any type as it doesn't see the target node type. I
  think this as an inconsistency between can_coerce_type and
  coerce_type. So making this consistent would be right way.
 
 
 You have two pieces of contradictory knowledge - how are you picking which
 one to fix?

What do you think is contradictory? The patch fixes the problem
that unknown nonconstants cannot get a proper conversion, so the
query fails.

| SELECT a=b FROM (SELECT ''::text, '  ') x(a,b);
| ERROR:  failed to find conversion function from unknown to text

  Concerning only this issue, putting on-the-fly conversion for
  unkown nonconstant as attached patch worked for me. I'm not so
  confident on this, though..
 
 
 Confident about what aspect - the safety of the patch itself or whether
 the conversion is even a good idea?​

Mainly the former, and how far it covers other similar unknown-type
troubles - its generality, in other words.

The fix is not so significant, but it adds no processing cost or
complexity, and it would reduce astonishment if it works safely.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-22 16:40:07 -0700, Peter Geoghegan wrote:
 On Wed, Apr 22, 2015 at 3:23 PM, Peter Geoghegan p...@heroku.com wrote:
  * We need to sort out those issues with the grammar, since that only
  really applies to the inference specification. Maybe the WHERE clause
  that the inference specification accepts can be broken out. No ON
  CONFLICT UPDATE specific issues left there, AFAICT though.

 I pushed some code that deals with the predicate being within parenthesis:

 https://github.com/petergeoghegan/postgres/commit/358854645279523310f998dfc9cb3fe3e165ce1e

And the way you've used nonassoc here doesn't look correct. You're
hiding legitimate ambiguities in the grammar. UPDATE is an unreserved
keyword, so for

... ON CONFLICT '(' index_params ')' where_clause OnConflictUpdateStmt

it won't be able to discern whether an UPDATE in the WHERE clause is
part of the where_clause or OnConflictUpdate.

This is legal:
SELECT * FROM (SELECT true as update) f WHERE update;
i.e. 'update' can be the last part of a WHERE clause.

Essentially what you're trying to do with the nonassoc is hiding that
UPDATE and IGNORE need to be reserved keywords with the syntax you're
proposing. We can either make them reserved or change the syntax.

One way to avoid making them reserved keywords - which would be somewhat
painful - is to add a 'DO' before the IGNORE/UPDATE. I.e. something like

  ON CONFLICT opt_conflict_expr DO OnConflictUpdateStmt
| ON CONFLICT opt_conflict_expr DO IGNORE
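
Spelled out, queries would then look something like this (illustrative
only, since this syntax is exactly what's under discussion):

INSERT INTO t (id, val) VALUES (1, 'x')
    ON CONFLICT (id) DO IGNORE;

INSERT INTO t (id, val) VALUES (1, 'x')
    ON CONFLICT (id) DO UPDATE SET val = 'x';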

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Split the 'Server Programming' chapter into two?

2015-04-23 Thread Andres Freund
Hi,

While playing around with where exactly to put the replication
origin/progress docs I once more noticed that the 'Server Programming'
book is a mix of different topics.

It currently contains:
35. Extending SQL
36. Triggers
37. Event Triggers
38. The Rule System
39. Procedural Languages
40. PL/pgSQL - SQL Procedural Language
41. PL/Tcl - Tcl Procedural Language
42. PL/Perl - Perl Procedural Language
43. PL/Python - Python Procedural Language
44. Server Programming Interface
45. Background Worker Processes
46. Logical Decoding
47. Replication Progress Tracking

To me, at least 44 - 47 don't really fit well with the rest. I think
we should either invent a new category for them or move them to
'Internals'. Maybe we could introduce an 'Extending the Server'
category for those and a couple more? Candidates for that at least
seem to be
52. Writing A Procedural Language Handler
53. Writing A Foreign Data Wrapper
54. Writing A Custom Scan Provider
and arguably
57. GiST Indexes
58. SP-GiST Indexes
59. GIN Indexes
60. BRIN Indexes
as well.

Alternatively, we could just move the above 'Internals' chapters into
'Server Programming'? But I think it's good to keep the split between
C-level and normal stuff there.

Also, shouldn't there at least be a link to
http://www.postgresql.org/docs/devel/static/xfunc-sql.html in
http://www.postgresql.org/docs/devel/static/xplang.html ?

Greetings,

Andres Freund




Re: [HACKERS] Replication identifiers, take 4

2015-04-23 Thread Andres Freund
On 2015-03-24 22:22:29 +0100, Petr Jelinek wrote:
 Perhaps we should have some Logical replication developer documentation
 section and put all those three as subsections of that?

So I just played around with this and didn't find it worthwhile,
primarily because there are lots of uses of logical decoding besides
building a logical replication solution. I've reverted to putting it
into a separate chapter 'beside' logical decoding.

Greetings,

Andres Freund




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Geoff Winkless
Apologies for butting in, but can I (as a user) express a preference
against DO?

Firstly, it looks horrible. And what's to stop me having SELECT true AS
do in the where clause (as per your UPDATE objection)?

Shouldn't UPDATE be a reserved keyword anyway? AIUI ANSI suggests so.

http://developer.mimer.se/validator/sql-reserved-words.tml

I had always assumed it was; anyone who produced a query for me that
contained update in an unusual context would get slapped heavily.

Geoff




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Petr Jelinek

On 23/04/15 14:34, Geoff Winkless wrote:

Apologies for butting in, but can I (as a user) express a preference
against DO?

Firstly, it looks horrible. And what's to stop me having SELECT true AS
do in the where clause (as per your UPDATE objection)?



DO is already a reserved keyword. There is also some precedent for this
in CREATE RULE. But I agree that it's not ideal syntax.
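
For instance, a minimal illustration of that precedent (table t is
made up):

CREATE RULE ignore_inserts AS
    ON INSERT TO t DO INSTEAD NOTHING;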



Shouldn't UPDATE be a reserved keyword anyway? AIUI ANSI suggests so.

http://developer.mimer.se/validator/sql-reserved-words.tml

I had always assumed it was; anyone who produced a query for me that
contained update in an unusual context would get slapped heavily.


Postgres currently has UPDATE as an unreserved keyword and, more
importantly, IGNORE is not a keyword at all, so making it a new
reserved keyword is not nice at all.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On April 23, 2015 3:34:07 PM GMT+03:00, Geoff Winkless pgsqlad...@geoff.dj 
wrote:
Apologies for butting in, but can I (as a user) express a preference
against DO?

Sure. If you propose an alternative ;)

Firstly, it looks horrible. And what's to stop me having SELECT true AS
do in the where clause (as per your UPDATE objection)?

A syntax error. DO is a reserved keyword; UPDATE is just unreserved (and
thus can be used as a column label). IGNORE is unreserved with the patch
and was not a keyword before. We can obviously make both reserved, but
if so we have to do it for real, not by hiding the conflicts.
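
Concretely (a quick sketch; the subselect and its column are made up):

SELECT * FROM (SELECT true AS update) f WHERE update;  -- legal today
SELECT * FROM (SELECT true AS do) f WHERE do;          -- syntax error:
                                                       -- DO cannot be a
                                                       -- column reference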

Shouldn't UPDATE be a reserved keyword anyway? AIUI ANSI suggests so.

http://developer.mimer.se/validator/sql-reserved-words.tml

It's not one right now. And ignore isn't a keyword at all atm.

(Please don't top post)

Andres


--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-23 Thread Robert Haas
On Wed, Apr 22, 2015 at 5:17 PM, Bruce Momjian br...@momjian.us wrote:
 On Wed, Apr 22, 2015 at 06:07:00PM -0300, Alvaro Herrera wrote:
  Good point, but doesn't vacuum remove the need for pruning as it removes
  all the old rows?

 Sure.  The point, I think, is to make autovacuum runs of some sort that
 don't actually vacuum but only do HOT-pruning.  Maybe this is a
 reasonable solution to the problem that queries don't prune anymore
 after Simon's patch.  If we made autovac HOT-prune periodically, we
 could have read-only queries prune only already-dirty pages.  Of course,
 that would need further adjustments to default number of autovac
 workers, I/O allocation, etc.

 Do we really want to make vacuum more complex for this?  Vacuum does
 have the delay settings we would need, though.

I think it's abundantly clear that, as wonderful as autovacuum is
compared with what we had before autovacuum, it's not good enough.
This is one area where I think improvement is definitely needed, and
I've suggested it before.  Discussion began here:

http://www.postgresql.org/message-id/AANLkTimd3ieGCm9pXV39ci6-owy3rX0mzz_N1tL=0...@mail.gmail.com

Some of the things I suggested then seem dumb in hindsight, but I
think the basic concept is still valid: if we scan the heap and find
only a few dead tuples, the expense of scanning all of the indexes may
not be justified.  Also, the fact that a relation can currently only
be vacuumed by one process at a time is coming to seem like a major
limitation.  Some users are partitioning tables just so that each
partition can be autovac'd separately.  That really shouldn't be
required.
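
(As a sketch of that workaround, with made-up names: inheritance-based
partitioning splits the data across child tables, and each child is a
separate relation that autovacuum workers can process independently.)

CREATE TABLE events (id bigint, ts timestamptz);
CREATE TABLE events_2015_04 (CHECK (ts >= '2015-04-01' AND ts < '2015-05-01'))
    INHERITS (events);
CREATE TABLE events_2015_05 (CHECK (ts >= '2015-05-01' AND ts < '2015-06-01'))
    INHERITS (events);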

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




  1   2   >