Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-01 Thread Shigeru Hanada
(2012/03/01 0:33), Tom Lane wrote:
 I don't think that creating such a dependency is acceptable.
 Even if we didn't mind the dependency, you said yourself that
 contrib/postgresql_fdw's validator will accept stuff that's not
 appropriate for dblink.

Agreed.  I think that these two contrib modules (and all FDW modules)
should each have their own validator, to avoid undesirable dependencies
and naming conflicts, and such validator functions should live inside
each module, not in core.

How about moving postgresql_fdw_validator into dblink and renaming it to
dblink_fdw_validator?  The attached patch does that.  I've left a
postgresql_fdw_validator in the foreign_data regression test section, so
that the foreign_data section can still check whether FDW DDL invokes the
validator function.  I kept the name postgresql_fdw_validator for the test
validator to keep the change as small as possible.

This change requires dblink to have a new function, so its version should
be bumped to 1.1.
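
For illustration (not part of the patch text itself), existing dblink 1.0
installations would then pick up the new validator through the usual
extension upgrade path:

ALTER EXTENSION dblink UPDATE TO '1.1';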

These changes have no direct relation to the PostgreSQL FDW, so this patch
can be applied by itself.  Once it has been applied, I'll rename
pgsql_fdw to postgresql_fdw, which spells out the product name in full.

-- 
Shigeru Hanada
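
For context, dblink's use of a validated foreign server looks roughly like
this (a sketch modelled on the existing postgresql_fdw_validator usage;
object names are invented):

CREATE FOREIGN DATA WRAPPER postgresql VALIDATOR postgresql_fdw_validator;
CREATE SERVER fdtest FOREIGN DATA WRAPPER postgresql
  OPTIONS (host 'localhost', dbname 'contrib_regression');
CREATE USER MAPPING FOR public SERVER fdtest;
SELECT dblink_connect('myconn', 'fdtest');  -- connect by foreign server name

This is why dblink needs some option validator of its own even though it is
not itself an FDW.
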
diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile
index ac63748..a27db88 100644
*** a/contrib/dblink/Makefile
--- b/contrib/dblink/Makefile
*** SHLIB_LINK = $(libpq)
*** 7,13 ****
  SHLIB_PREREQS = submake-libpq
  
  EXTENSION = dblink
! DATA = dblink--1.0.sql dblink--unpackaged--1.0.sql
  
  REGRESS = dblink
  
--- 7,13 ----
  SHLIB_PREREQS = submake-libpq
  
  EXTENSION = dblink
! DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql
  
  REGRESS = dblink
  
diff --git a/contrib/dblink/dblink--1.0--1.1.sql 
b/contrib/dblink/dblink--1.0--1.1.sql
index ...3dd02a0 .
*** a/contrib/dblink/dblink--1.0--1.1.sql
--- b/contrib/dblink/dblink--1.0--1.1.sql
***************
*** 0 ****
--- 1,17 ----
+ /* contrib/dblink/dblink--1.0--1.1.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION dblink UPDATE" to load this file. \quit
+ 
+ /* First we have to create validator function */
+ CREATE FUNCTION dblink_fdw_validator(
+ options text[],
+ catalog oid
+ )
+ RETURNS boolean
+ AS 'MODULE_PATHNAME', 'dblink_fdw_validator'
+ LANGUAGE C STRICT;
+ 
+ /* Then we add the validator */
+ ALTER EXTENSION dblink ADD FUNCTION dblink_fdw_validator(text[], oid);
+ 
diff --git a/contrib/dblink/dblink--1.1.sql b/contrib/dblink/dblink--1.1.sql
index ...ec02498 .
*** a/contrib/dblink/dblink--1.1.sql
--- b/contrib/dblink/dblink--1.1.sql
***************
*** 0 ****
--- 1,231 ----
+ /* contrib/dblink/dblink--1.1.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION dblink" to load this file. \quit
+ 
+ -- dblink_connect now restricts non-superusers to password
+ -- authenticated connections
+ CREATE FUNCTION dblink_connect (text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_connect'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_connect (text, text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_connect'
+ LANGUAGE C STRICT;
+ 
+ -- dblink_connect_u allows non-superusers to use
+ -- non-password authenticated connections, but initially
+ -- privileges are revoked from public
+ CREATE FUNCTION dblink_connect_u (text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_connect'
+ LANGUAGE C STRICT SECURITY DEFINER;
+ 
+ CREATE FUNCTION dblink_connect_u (text, text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_connect'
+ LANGUAGE C STRICT SECURITY DEFINER;
+ 
+ REVOKE ALL ON FUNCTION dblink_connect_u (text) FROM public;
+ REVOKE ALL ON FUNCTION dblink_connect_u (text, text) FROM public;
+ 
+ CREATE FUNCTION dblink_disconnect ()
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_disconnect'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_disconnect (text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_disconnect'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_open (text, text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_open'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_open (text, text, boolean)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_open'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_open (text, text, text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_open'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_open (text, text, text, boolean)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_open'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_fetch (text, int)
+ RETURNS setof record
+ AS 'MODULE_PATHNAME','dblink_fetch'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_fetch (text, int, boolean)
+ RETURNS setof record
+ AS 'MODULE_PATHNAME','dblink_fetch'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_fetch (text, text, int)
+ RETURNS setof record
+ AS 'MODULE_PATHNAME','dblink_fetch'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_fetch (text, text, int, boolean)
+ 

Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 5:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Feb 29, 2012 at 4:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Easier for who?  I don't care for the idea of code that has to cope with
 two page formats, or before long N page formats, because if we don't
 have some mechanism like this then we will never be able to decide that
 an old data format is safely dead.

 Huh?  You can drop support for an old page format any time you like.
 You just decree that version X+1 no longer supports it, and you can't
 pg_upgrade to it until all traces of the old page format are gone.

 And how would a DBA know that?

We'd add a column to pg_class that tracks which page version is in use
for each relation.
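
A sketch of what that could look like from the DBA's side (the column name
relpageversion is hypothetical; nothing like it exists today):

-- find relations that still contain pages in the older on-disk format
SELECT relname, relpageversion
  FROM pg_class
 WHERE relpageversion < 5;  -- i.e. older than the server's newest page layout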

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 6:02 PM, Bruce Momjian br...@momjian.us wrote:
 Any ideas about improving the error reporting more generally, so that
 when reloading the dump fails, the user can easily see what went
 belly-up, even if they didn't use -l?

 The only idea I have is to write the psql log to a temporary file and
 report the last X lines from the file in case of failure.  Does that
 help?

Why not just redirect stdout but not stderr?  If there are error
messages, surely we want the user to just see those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 6:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, my evidence is that a parameterized path should pretty much
 always include a parameterized path somewhere in there - otherwise,
 what is parameterization doing for us?

 Well, yes, we know that much.

I didn't write what I meant to write there.  I meant to say: a
parameterized path is presumably going to contain a parameterized
*index scan* somewhere within.  So somewhere we're going to have
something of the form

->  Index Scan blah on blah
      Index Cond: someattr = $1

And if that path weren't parameterized, we'd have to read the whole
relation, either with a full index scan, or a sequential scan.  Or, I
mean, maybe there's a filter condition, so that no path needs to
retrieve the *whole* relation, but even there the index cond is on top
of that, and it's probably doing something, though I suppose you're
right that there might be cases where it doesn't.
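
For concreteness, a sketch with invented table names of the shape being
described (the actual plan of course depends on the statistics):

CREATE INDEX ON big (someattr);
EXPLAIN SELECT * FROM small s JOIN big b ON b.someattr = s.id;
-- can produce a nested loop whose inner side is a parameterized index scan:
--   ->  Nested Loop
--         ->  Seq Scan on small s
--         ->  Index Scan using big_someattr_idx on big b
--               Index Cond: (someattr = s.id)
-- whereas an unparameterized inner path would have to read all of "big".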

 And that's going to reduce the
 row count.  I may be missing something, but I'm confused as to why
 this isn't nearly tautological.

 We don't know that --- I will agree it's likely, but that doesn't make
 it so certain that we can assume it without checking.  A join condition
 won't necessarily eliminate any rows.

 (... thinks about that for awhile ...)  One thing we could possibly do
 is have indxpath.c arbitrarily reject parameterizations that don't
 produce a smaller estimated number of rows than an unparameterized scan.
 Admittedly, this still doesn't *prove* the assumption for join
 relations, but maybe it brings the odds to where it's okay for add_path
 to make such an assumption.

That seems to make sense.

 (... thinks some more ...)  No, that doesn't get us there, because that
 doesn't establish that a more-parameterized path produces fewer rows
 than some path that requires less parameterization, yet not none at
 all.  You really want add_path carrying out those comparisons.  In your
 previous example, it's entirely possible that path D is dominated by B
 or C because of poor choices of join quals.

I'm not following this part.  Can you explain further?  It seems to me
at any rate that we could get pretty far if we could just separate
parameterized paths and unparameterized paths into separate buckets.
Even if we have to do some extra work when comparing parameterized
paths *to each other*, we'd gain a fair amount by avoiding comparing
any of them with the unparameterized paths.  Or at least, I hope so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] performance results on IBM POWER7

2012-03-01 Thread Robert Haas
IBM has provided the PostgreSQL community with access to a couple of
IBM POWER7 machines through OSUOSL.  Simon has access to one, carved
up into a couple of LPARs, for replication work, and there's a
buildfarm animal on there as well, I think; I have access to the
other, for performance testing.  I imagine we can get access for a few
other people as well, though at the moment the performance-testing
machine is inaccessible and I'm not having very much luck getting help
from the very busy OSUOSL folks.  Anyway, before it bit the dust, I
was able to do some basic pgbench tests at various scale factors and
client counts.  I used my usual settings:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms

I did three five-minute runs at each of scale factors 100, 300, 1000,
3000, and 10000, with varying client counts: 1, 2, and all multiples
of 4 up to 80.  I stopped and restarted the database after each run
(but did not flush the OS cache, so this is a warm-start test) and
took the median of the three results for each run.  Full results are
attached herewith; pretty graphs are on my blog at
http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html

When I get the machine back, my plan is to next run some read-write
pgbench tests.  Those will need to be longer, though.  Read
performance doesn't seem to be very sensitive to the length of the
tests, but write performance is, so I'll probably need at least
30-minute runs if not more to get an accurate sense of what the
performance is like.  After that I think maybe some testing of the
remaining CommitFest patches might be in order (though personally I'd
like to wrap this CommitFest up fairly soon) to see if any of those
improve things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


f.100
Description: Binary data


f.300
Description: Binary data


f.1000
Description: Binary data


f.3000
Description: Binary data


f.10000
Description: Binary data



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 1:19 AM, Alexander Korotkov aekorot...@gmail.com wrote:

 On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 That seems like a pretty narrow, uncommon use-case.  Also, to get
 accurate stats for such queries that way, you'd need really enormous
 histograms.  I doubt that the existing parameters for histogram size
 will permit meaningful estimation of more than the first array entry
 (since we don't make the histogram any larger than we do for a scalar
 column).

 The real point here is that the fact that we're storing btree-style
 stats for arrays is an accident, backed into by having added btree
 comparators for arrays plus analyze.c's habit of applying default
 scalar-oriented analysis functions to any type without an explicit
 typanalyze entry.  I don't recall that we ever thought hard about
 it or showed that those stats were worth anything.


 OK. I don't object to removing btree stats from arrays.
 What do you think about the pg_stats view in this case? Should it combine
 the values histogram and the array length histogram in a single column, as
 is done for MCV and MCELEM?


Btree statistics for arrays and the additional statistics slot are removed in
the attached version of the patch. The pg_stats view is untouched for now.

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.13.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Nathan Boley npbo...@gmail.com writes:
 On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I am starting to look at this patch now.  I'm wondering exactly why the
 decision was made to continue storing btree-style statistics for arrays,
 in addition to the new stuff.

 If I understand your suggestion, queries of the form

 SELECT * FROM rel
 WHERE ARRAY[ 1,2,3,4 ] <= x
      AND x <= ARRAY[ 1, 2, 3, 1000];

 would no longer use an index. Is that correct?

 No, just that we'd no longer have statistics relevant to that, and would
 have to fall back on default selectivity assumptions.  Do you think that
 such applications are so common as to justify bloating pg_statistic for
 everybody that uses arrays?

I confess I am nervous about ripping this out.  I am pretty sure we
will get complaints about it.  Performance optimizations that benefit
group A at the expense of group B are always iffy, and I'm not sure
the case of using an array as a path indicator is as uncommon as you
seem to think.
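
For reference, the path-indicator pattern in question looks roughly like this
(hypothetical schema); it is exactly the kind of range predicate that the
btree-style array statistics currently feed:

CREATE TABLE nodes (id int, path int[]);  -- materialized path stored as an array
CREATE INDEX ON nodes (path);
-- the subtree rooted at {1,2}:
SELECT * FROM nodes
 WHERE path >= ARRAY[1,2] AND path < ARRAY[1,3];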

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
 On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:

  No, just that we'd no longer have statistics relevant to that, and would
  have to fall back on default selectivity assumptions.  Do you think that
  such applications are so common as to justify bloating pg_statistic for
  everybody that uses arrays?
 
 I confess I am nervous about ripping this out.  I am pretty sure we
 will get complaints about it.  Performance optimizations that benefit
 group A at the expense of group B are always iffy, and I'm not sure
 the case of using an array as a path indicator is as uncommon as you
 seem to think.

Maybe we should keep it as an option.  I do think it's quite uncommon,
but for those rare users, it'd be good to provide the capability while
not bloating everyone else's stat catalog.  The thing is, people using
arrays as path indicators and such are likely using relatively small
arrays; people storing real data are likely to store much bigger arrays.
Just a hunch though.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-03-01 Thread Marko Kreen
On Tue, Feb 28, 2012 at 05:04:44PM +0900, Kyotaro HORIGUCHI wrote:
  There is still one EOF in v3 getAnotherTuple() -
  pqGetInt(tupnfields), please turn that one also to
  protocolerror.
 
 pqGetInt() returns EOF only when it wants additional data from the
 network, provided the parameter `bytes' is appropriate. A non-zero return
 from it should, it seems, be handled as EOF, not as a protocol error. The
 one point I had modified in a buggy way is also restored, so the so-called
 'protocol error' has eventually vanished.

But that's broken in the V3 protocol - getAnotherTuple() will be called
only if the packet has been fully read.  If the packet contents do not
agree with the packet header, it's a protocol error.  The only valid EOF
return in V3 getAnotherTuple() is when the row processor asks
for an early exit.

 Is there something left undone?

* Convert old EOFs to protocol errors in V3 getAnotherTuple()

* V2 getAnotherTuple() can leak PGresult when handling custom
  error from row processor.

* remove pqIsnonblocking(conn) check when row processor returned 2.
  I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
  on sync connection.

* It seems the return codes from the callback should be remapped;
  (0, 1, 2) is an unusual pattern.  Better would be:

   -1 - error
    0 - stop parsing / early exit (I'm not done yet)
    1 - OK (I'm done with the row)

* Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().
  Main problem is that it needs to be synced with error handling
  in rest of libpq, which is unlike the rest of row processor patch,
  which consists only of local changes.  All solutions here
  are either ugly hacks or too complex to be part of this patch.

Also considering that we have working exceptions and PQgetRow,
I don't see much need for custom error messages.  If really needed,
it should be introduced as separate patch, as the area of code it
affects is completely different.

Currently the custom error messaging seems to be the blocker for
this patch, because of raised complexity when implementing it and
when reviewing it.  Considering how unimportant the provided
functionality is, compared to rest of the patch, I think we should
simply drop it.

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then.  This allows internal pqAddRow
to set regular out of memory error.  Otherwise give generic
row processor error.

 By the way, I noticed that dblink always says that the current
 connection is 'unnamed' in the error messages from
 dblink_record_internal@dblink.  I could see that
 dblink_record_internal defines the local variable conname = NULL
 and passes it to dblink_res_error to display the error message, but
 there is no assignment to it in the function.

 The connection name was shown properly when I added code to set conname
 from PG_GETARG_TEXT_PP(0) if available, in other words doing that
 just after the DBLINK_GET_CONN/DBLINK_GET_NAMED_CONN calls, which seems
 to be dblink's usual manner...  This is not included in this patch.

 Furthermore, dblink_res_error looks only into the returned PGresult to
 display the error, and always says only `Error occurred on dblink
 connection..: could not execute query'.

 Is it right to understand this as follows?

  - dblink is wrong in its error handling. A client of libpq should
    look at the PGconn via PQerrorMessage() if (or regardless of whether?)
    the PGresult says nothing about the error.

Yes, it seems like a bug.

-- 
marko




Re: [HACKERS] performance results on IBM POWER7

2012-03-01 Thread Ants Aasma
On Thu, Mar 1, 2012 at 4:54 PM, Robert Haas robertmh...@gmail.com wrote:
 ... After that I think maybe some testing of the
 remaining CommitFest patches might be in order (though personally I'd
 like to wrap this CommitFest up fairly soon) to see if any of those
 improve things.

Besides performance testing, could you check how clocksources behave
on this kind of machine?
You can find pg_test_timing tool attached here:
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00937.php

To see which clocksources are available, you can do:
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
To switch the clocksource, just write the desired clocksource like this:
# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

Thanks,
Ants Aasma



[HACKERS] Allowing multi -t and adding -n to vacuumdb ?

2012-03-01 Thread Jehan-Guillaume (ioguix) de Rorthais
Hi,

One of our customers sent us a patch he wrote for his needs (on
src/bin/scripts/vacuumdb.c; no docs were included).

He's using one schema per application and would like to be able to run
vacuumdb on each of them independently, so he added the --schema|-n
option and sent us the patch.

Reviewing his patch, I thought it would be more useful to allow multiple
--table|-t options on the command line first. It might be possible to
pass an array of tables to the vacuum_one_database function instead of
just one.

Then, we could add the --schema|-n option which would fetch and build
the table list and call vacuum_one_database.
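
In terms of catalog queries, the --schema option would boil down to something
like this (a sketch; the schema name is invented):

SELECT c.relname
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'app1' AND c.relkind = 'r';
-- ... then run VACUUM (optionally with ANALYZE) on each table returned.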

But before I start writing this patch, I would like some opinions on the
pros and cons. Do you think such a feature could be accepted into official
PostgreSQL code, or should we keep this as an external script?

Thank you,
-- 
Jehan-Guillaume (ioguix) de Rorthais
http://www.dalibo.com



Re: [HACKERS] performance results on IBM POWER7

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 11:23 AM, Ants Aasma ants.aa...@eesti.ee wrote:
 On Thu, Mar 1, 2012 at 4:54 PM, Robert Haas robertmh...@gmail.com wrote:
 ... After that I think maybe some testing of the
 remaining CommitFest patches might be in order (though personally I'd
 like to wrap this CommitFest up fairly soon) to see if any of those
 improve things.

 Besides performance testing, could you check how clocksources behave
 on this kind of machine?
 You can find pg_test_timing tool attached here:
 http://archives.postgresql.org/pgsql-hackers/2012-01/msg00937.php

 To see which clocksources are available, you can do:
 # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
 To switch the clocksource, just write the desired clocksource like this:
 # echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

Sure, I'll check that as soon as it's back up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] COPY with hints, rebirth

2012-03-01 Thread Simon Riggs
On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 24.02.2012 22:55, Simon Riggs wrote:

 A long time ago, in a galaxy far away, we discussed ways to speed up
 data loads/COPY.
 http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php

 In particular, the idea that we could mark tuples as committed while
 we are still loading them, to avoid negative behaviour for the first
 reader.

 Simple patch to implement this is attached, together with test case.

  ...


 What exactly does it do? Previously, we optimised COPY when it was
 loading data into a newly created table or a freshly truncated table.
 This patch extends that and actually sets the tuple header flag as
 HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
 code. The patch also adds some tests for corner cases that would make
 that action break MVCC - though those cases are minor and typical data
 loads will benefit fully from this.


 This doesn't work with subtransactions:
...
 The query should return the row copied in the same subtransaction.

Thanks for pointing that out.

New patch with corrected logic and test case attached.
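
To recap the case being optimised, the target workload is roughly this
(a sketch; the file path is invented):

BEGIN;
TRUNCATE TABLE t;            -- or CREATE TABLE t (...) in the same transaction
COPY t FROM '/tmp/t.dat';    -- rows can be written with HEAP_XMIN_COMMITTED pre-set
COMMIT;
-- so the first reader of t is spared setting and writing back all those hint bits.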

 In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
 and adding current xid to snapshots. That is an invasive change that I
 would wish to avoid at any time and explains the long delay in
 tackling this. The way I've implemented it is just as a short test
 during XidInMVCCSnapshot() so that we trap the case when the xid ==
 xmax and so would appear to be running. This is much less invasive and
 just as performant as Tom's original suggestion.


 TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a
 lot of subtransactions open...

I've put in something to avoid that cost for the common case - just a boolean.

This seems like the best plan rather than the explicit FREEZE option.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..3899282 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2056,6 +2056,17 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options & HEAP_INSERT_HINT_XMIN)
+		tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e22bdac..de7504c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -146,6 +146,7 @@ typedef struct TransactionStateData
 	int			prevSecContext; /* previous SecurityRestrictionContext */
 	bool		prevXactReadOnly;		/* entry-time xact r/o state */
 	bool		startedInRecovery;		/* did we start in recovery? */
+	bool		maySeePreHintedTuples;	/* did subtrans write pre-hinted rows? */
 	struct TransactionStateData *parent;		/* back link to parent */
 } TransactionStateData;
 
@@ -175,6 +176,7 @@ static TransactionStateData TopTransactionStateData = {
 	0,			/* previous SecurityRestrictionContext */
 	false,		/* entry-time xact r/o state */
 	false,		/* startedInRecovery */
+	false,		/* maySeePreHintedTuples */
 	NULL		/* link to parent state block */
 };
 
@@ -704,6 +706,18 @@ TransactionStartedDuringRecovery(void)
 	return CurrentTransactionState->startedInRecovery;
 }
 
+bool
+TransactionMaySeePreHintedTuples(void)
+{
+	return CurrentTransactionState->maySeePreHintedTuples;
+}
+
+void
+TransactionMayWritePreHintedTuples(void)
+{
+	CurrentTransactionState->maySeePreHintedTuples = true;
+}
+
 /*
  *	CommandCounterIncrement
  */
@@ -1689,6 +1703,7 @@ StartTransaction(void)
 		s->startedInRecovery = false;
 		XactReadOnly = DefaultXactReadOnly;
 	}
+	s->maySeePreHintedTuples = false;
 	XactDeferrable = DefaultXactDeferrable;
 	XactIsoLevel = DefaultXactIsoLevel;
 	forceSyncCommit = false;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..1419e33 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+ #include "utils/portal.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -1922,6 +1923,13 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if 

Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-01 Thread Peter Eisentraut
On tor, 2012-03-01 at 20:56 +0900, Shigeru Hanada wrote:
 How about moving postgresql_fdw_validator into dblink,

That's probably a good move.  If this were C++, we might try to subclass
this whole thing a bit, to avoid code duplication, but I don't see an
easy way to do that here.

  with renaming to dblink_fdw_validator? 

Well, it's not the validator of the dblink_fdw, so maybe something like
basic_postgresql_fdw_validator.




Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-01 Thread Peter Eisentraut
On fre, 2012-02-24 at 11:10 -0500, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On tor, 2012-02-23 at 10:17 -0500, Tom Lane wrote:
  void * seems entirely reasonable given the two different usages, but
  I would be happier if the patch added explicit casts wherever FH is
  set to or used as one of these two types.
 
  That would add about 70 casts all over the place.  I don't think that
  will make things clearer or more robust.  I think we either leave it as
  my first patch for now or find a more robust solution with a union or
  something.
 
 Hm.  Could we just create two separate fields?  It's not real clear to
 me that forcing both these usages into a generic pointer buys much.

It's used in confusing ways.

In pg_backup_files.c _PrintFileData(), it's a compile-time decision
whether FH is a FILE or a gzFile:

#ifdef HAVE_LIBZ
AH->FH = gzopen(filename, "rb");
#else
AH->FH = fopen(filename, PG_BINARY_R);
#endif

But if we changed FH to be gzFile under HAVE_LIBZ, then tons of other
places that use fread(), fwrite(), fileno(), etc. directly on FH will
complain.

Considering how much code would complain one way versus the other, I
think _PrintFileData() is the one at fault.

I think the best fix would be to rearrange _PrintFileData() so that it
doesn't use FH at all.  Instead, we could define a separate
ArchiveHandle field IF that works more like OF, and then change
ahwrite() to use that.





Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Josh Berkus

 And how would a DBA know that?
 
 We'd add a column to pg_class that tracks which page version is in use
 for each relation.

So a relation can't have some pages in Version 9.2, and other pages in
version 9.3?  How will this work for 2TB tables?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 12:42 PM, Josh Berkus j...@agliodbs.com wrote:
 And how would a DBA know that?

 We'd add a column to pg_class that tracks which page version is in use
 for each relation.

 So a relation can't have some pages in Version 9.2, and other pages in
 version 9.3?  How will this work for 2TB tables?

Not very well, but better than Tom's proposal to require upgrading the
entire cluster in a single off-line operation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Josh Berkus

 So a relation can't have some pages in Version 9.2, and other pages in
 version 9.3?  How will this work for 2TB tables?
 
 Not very well, but better than Tom's proposal to require upgrading the
 entire cluster in a single off-line operation.

Yes, but the result will be that anyone with a 2TB table will *never*
convert it to the new format.  Which means we can never deprecate that
format, because lots of people will still be using it.

I continue to assert that all of this sounds like 9.3 work to me.  I'm
really not keen on pushing through a hack which will make pushing in a
long-term solution harder.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] review: CHECK FUNCTION statement

2012-03-01 Thread Alvaro Herrera


Why does CollectCheckedFunctions skip trigger functions?  My only guess
is that at one point the checker was not supposed to know how to check
them, and a later version learned about it and this bit wasn't updated;
but maybe there's another reason?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-03-01 Thread Peter Eisentraut
On ons, 2012-02-29 at 22:47 -0500, Bruce Momjian wrote:
 Hey, that's a good idea.  I would always write the pg_dump output to a
 log file.  If the dump succeeds, I remove the file, if not, I tell
 users to read the log file for details about the failure --- good
 idea.

But we also need the server log output somewhere.  So I think this temp
file would need to cover everything that -l covers.





Re: [HACKERS] COPY with hints, rebirth

2012-03-01 Thread Heikki Linnakangas

On 01.03.2012 18:40, Simon Riggs wrote:

On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 24.02.2012 22:55, Simon Riggs wrote:


What exactly does it do? Previously, we optimised COPY when it was
loading data into a newly created table or a freshly truncated table.
This patch extends that and actually sets the tuple header flag as
HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
code. The patch also adds some tests for corner cases that would make
that action break MVCC - though those cases are minor and typical data
loads will benefit fully from this.


This doesn't work with subtransactions:

...

The query should return the row copied in the same subtransaction.


Thanks for pointing that out.

New patch with corrected logic and test case attached.


It's still broken:

-- create test table and file
create table a as select 1 as id;
copy a to '/tmp/a';

-- start test
postgres=# begin;
BEGIN
postgres=# truncate a;
TRUNCATE TABLE
postgres=# savepoint sp1;
SAVEPOINT
postgres=# copy a from '/tmp/a';
COPY 1
postgres=# select * from a;
 id
----
  1
(1 row)

postgres=# rollback to savepoint sp1;
ROLLBACK
postgres=# select * from a;
 id
----
  1
(1 row)

That last select should not have seen the tuple.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 So a relation can't have some pages in Version 9.2, and other pages in
 version 9.3?  How will this work for 2TB tables?

 Not very well, but better than Tom's proposal to require upgrading the
 entire cluster in a single off-line operation.

WTF?  That was most certainly not what *I* was proposing; it's obviously
unworkable.  We need a process that can incrementally up-version a live
database and keep track of the minimum version present, at some
granularity smaller than whole database.

All of this was discussed and hashed out about two years ago, IIRC.
We just haven't made any progress towards actually implementing those
concepts.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Peter Geoghegan
On 1 March 2012 00:48, Alvaro Herrera alvhe...@commandprompt.com wrote:
 I'm curious about the LeafNode stuff.  Is this something that could be
 done by expression_tree_walker?  I'm not completely familiar with it so
 maybe there's some showstopper such as some node tags not being
 supported, or maybe it just doesn't help.  But I'm curious.

Good question. The LeafNode function (which is a bit of a misnomer, as
noted in a comment) looks rather like a walker function, as appears in
the example in a comment in nodeFuncs.c:

* expression_tree_walker() is designed to support routines that traverse
 * a tree in a read-only fashion (although it will also work for routines
 * that modify nodes in-place but never add/delete/replace nodes).
 * A walker routine should look like this:
 *
 * bool my_walker (Node *node, my_struct *context)
 * {
 *  if (node == NULL)
 *  return false;
 *  // check for nodes that special work is required for, eg:
 *  if (IsA(node, Var))
 *  {
 *  ... do special actions for Var nodes
 *  }
 *  else if (IsA(node, ...))
 *  {
 *  ... do special actions for other node types
 *  }
 *  // for any node type not specially processed, do:
 *  return expression_tree_walker(node, my_walker, (void *) 
context);
 * }

My understanding is that the expression-tree walking support is mostly
useful for the majority of walker code, which only cares about a small
subset of nodes, and hopes to avoid including boilerplate code just to
walk those other nodes that it's actually disinterested in.

This code, unlike most clients of expression_tree_walker(), is pretty
much interested in everything, since its express purpose is to
fingerprint all possible query trees.
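
The user-visible goal, for anyone skimming the thread, is that statements
differing only in their constants collapse into a single pg_stat_statements
entry, along these lines (a sketch):

SELECT * FROM foo WHERE id = 1;
SELECT * FROM foo WHERE id = 2;
-- both tracked under one normalised entry, roughly:
--   SELECT * FROM foo WHERE id = ?;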

Another benefit of expression_tree_walker is that if you miss a
certain node being added (say a FuncExpr-like node), you get to
automatically have that node walked over to reach the nodes that you
do in fact care about (such as those within this new node's args List).
That makes perfect sense in the majority of cases, but here you've
already missed the fields within this new node that FuncExpr itself
lacks, so you're already finger-printing inaccurately. I suppose you
could still at least get the nodetag and still have a warning about
the fingerprinting being inadequate by going down the
expression_tree_walker path, but I'm inclined to wonder if you
aren't just better off directly walking the tree, if only to encourage
the idea that this code needs to be maintained over time, and to cut
down on the little extra bit of indirection that that imposes.

It's not going to be any sort of burden to maintain it - it currently
stands at a relatively meagre 800 lines of code for everything to do
with tree walking - and the code that will have to be added with new
nodes or refactored along with the existing tree structure is going to
be totally trivial.

All of that said, I wouldn't mind making LeafNode into a walker, if
that approach is judged to be better, and you don't mind documenting
the order in which the tree is walked as deterministic, because the
order now matters in a way apparently not really anticipated by
expression_tree_walker, though that's probably not a problem.

My real concern now is that it's March 1st, and the last commitfest
may end soon. Even though this patch has extensive regression tests,
has been floating around for months, and, I believe, will end up being
a timely and important feature, a committer has yet to step forward to
work towards this patch getting committed. Can someone volunteer,
please? My expectation is that this feature will make life a lot
easier for a lot of Postgres users.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
 On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I confess I am nervous about ripping this out.  I am pretty sure we
 will get complaints about it.  Performance optimizations that benefit
 group A at the expense of group B are always iffy, and I'm not sure
 the case of using an array as a path indicator is as uncommon as you
 seem to think.

 Maybe we should keep it as an option.

How would we make it optional?  There's noplace I can think of to stick
such a knob ...

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Daniel Farina
On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 My expectation is that this feature will make life a lot
 easier for a lot of Postgres users.

Yes.  It's hard to overstate the apparent utility of this feature in
the general category of visibility and profiling.

-- 
fdr



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Josh Berkus
On 3/1/12 1:57 PM, Daniel Farina wrote:
 On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 My expectation is that this feature will make life a lot
 easier for a lot of Postgres users.
 
 Yes.  It's hard to overstate the apparent utility of this feature in
 the general category of visibility and profiling.

More importantly, this is what pg_stat_statements *should* have been in
8.4, and wasn't.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alvaro Herrera

Excerpts from Tom Lane's message of jue mar 01 18:51:38 -0300 2012:
 
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
  On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I confess I am nervous about ripping this out.  I am pretty sure we
  will get complaints about it.  Performance optimizations that benefit
  group A at the expense of group B are always iffy, and I'm not sure
  the case of using an array as a path indicator is as uncommon as you
  seem to think.
 
  Maybe we should keep it as an option.
 
 How would we make it optional?  There's noplace I can think of to stick
 such a knob ...

Uhm, attoptions?

alter table foo alter column bar set extended_array_stats to on
or something like that?
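
For comparison, attoptions already carry a per-column statistics knob, so the
syntax could plausibly look like this (the array-stats option name here is
invented):

ALTER TABLE foo ALTER COLUMN bar SET (n_distinct = -0.5);        -- existing attoption
ALTER TABLE foo ALTER COLUMN bar SET (array_btree_stats = true); -- hypothetical knob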

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Allowing multi -t and adding -n to vacuumdb ?

2012-03-01 Thread Tom Lane
Jehan-Guillaume (ioguix) de Rorthais j...@dalibo.com writes:
 One of our customer send us a patch he wrote for his needs (on
 src/bin/scripts/vacuumdb.c, no doc were included).

 He's using one schema per application and would like to be able to run
 vacuumdb on each of them independently so he added the --schema|-n
 option and send us the patch.

 Reviewing his patch, I thought it would be more useful to allow multi
 --table|-t options on the command line first. It might be possible to
 pass an array of tables to vacuum_one_database function instead of
 just one.

 Then, we could add the --schema|-n option which would fetch and build
 the table list and call vacuum_one_database.

 But before I start writing this patch, I would like some opinion, pros /
 cons. Do you think such a feature could be accepted in official
 PostgreSQL code or should we keep this as an external script ?

I think most of us see vacuumdb as a historical leftover.  We keep it
around in case anyone is still relying on it, but improving it seems
like misdirected effort.

Why isn't your customer using autovacuum?  If there are concrete
reasons why that doesn't get the job done for him, it would be more
useful in the long run to work on fixing that.

regards, tom lane



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Nathan Boley
 What about MCV's? Will those be removed as well?

 Sure.  Those seem even less useful.

Ya, this will destroy the performance of several queries without some
heavy tweaking.

Maybe this is bad design, but I've gotten in the habit of storing
sequences as arrays and I commonly join on them. I looked through my
code this morning, and I only have one 'range' query ( of the form
described up-thread ), but there are tons of the form

SELECT att1, attb2 FROM rela, relb where rela.seq_array_1 = relb.seq_array;

I can provide some examples if that would make my argument more compelling.

Sorry to be difficult,
Nathan



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Nathan Boley npbo...@gmail.com writes:
 Maybe this is bad design, but I've gotten in the habit of storing
 sequences as arrays and I commonly join on them. I looked through my
 code this morning, and I only have one 'range' query ( of the form
 described up-thread ), but there are tons of the form

 SELECT att1, attb2 FROM rela, relb where rela.seq_array_1 = relb.seq_array;

What do you mean by storing sequences as arrays?  Can you demonstrate
that the existing stats are relevant at all to the query you're worried
about?

regards, tom lane



Re: [HACKERS] Caching for stable expressions with constant arguments v6

2012-03-01 Thread Jaime Casanova
On Sat, Feb 4, 2012 at 5:40 AM, Marti Raudsepp ma...@juffo.org wrote:
 On Sat, Feb 4, 2012 at 09:49, Jaime Casanova ja...@2ndquadrant.com wrote:
 a little review...

 Thanks! By the way, you should update to the v7 patch.


just tried it and it fails when initializing during make check

creating information schema ... TRAP: FailedAssertion("!(*cachable ==
((bool) 1))", File: "clauses.c", Line: 2295)
Aborted (core dumped)
child process exited with exit code 134


-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] Allowing multi -t and adding -n to vacuumdb ?

2012-03-01 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Why isn't your customer using autovacuum?  If there are concrete
 reasons why that doesn't get the job done for him, it would be
 more useful in the long run to work on fixing that.
 
FWIW, we're using autovacuum here, at slightly more aggressive
settings than the default, and we still rely on manual vacuums for
two main reasons:
 
(1)  VACUUM FREEZE ANALYZE after a bulk load to avoid the hint bit
rewrite costs for end users and the unpredictable anti-wraparound
autovacuum when all the loaded data suddenly hits the freeze
threshold.
 
(2)  For base backups of databases across a slow WAN, we do a diff
rsync against a copy of the hot standby here.  (Well, actually, to
save space we do it against a hard-link copy of the previous base
backup against which we have run a diff rsync from the hot
standby.)  If we don't do a VACUUM FREEZE ANALYZE before each base
backup, it at least doubles the size of base backups, due to the
hint bit and xmin freeze changes that occur after the initial copy
of a tuple is backed up.
 
Simon's recent work, if it makes it in, will deal with (1), and it
may be possible to deal with (2) using much more aggressive
configurations for autovacuum, although I suspect it might take
another tweak or two to the back end to really cover that without
manual vacuums.
 
-Kevin



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of jue mar 01 18:51:38 -0300 2012:
 How would we make it optional?  There's noplace I can think of to stick
 such a knob ...

 Uhm, attoptions?

Oh, I had forgotten we had that mechanism already.  Yeah, that might
work.  I'm a bit tempted to design the option setting so that you can
select whether to keep the btree stats, the new stats, or both or
neither --- after all, there are probably plenty of databases where
nobody cares about the array-containment operators either.

That leaves the question of which setting should be the default ...

regards, tom lane



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Nathan Boley
[ sorry Tom, reply all this time... ]

 What do you mean by storing sequences as arrays?

So, a simple example is, for transcripts ( sequences of DNA that are
turned into proteins ), we store each of the connected components as
an array of the form:

exon_type in [1,6]
splice_type = [1,3]

and then the array elements are

[ exon_type, splice_type, exon_type ]

~ 99% of the elements are of the form [ [1,3], 1, [1,3] ],

so I almost always get a hash or merge join ( correctly ) but for the
rare junction types ( which are usually more interesting as well ) I
correctly get nest loops with an index scan.

 Can you demonstrate
 that the existing stats are relevant at all to the query you're worried
 about?

Well, if we didn't have mcv's and just relied on ndistinct to estimate
the '=' selectivities, either my low selectivity quals would use the
index, or my high selectivity quals would use a table scan, either of
which would be wrong.

I guess I could wipe out the stats and get some real numbers tonight,
but I can't see how the planner would be able to distinguish *without*
mcv's...

Best,
Nathan



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 4:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 So a relation can't have some pages in Version 9.2, and other pages in
 version 9.3?  How will this work for 2TB tables?

 Not very well, but better than Tom's proposal to require upgrading the
 entire cluster in a single off-line operation.

 WTF?  That was most certainly not what *I* was proposing; it's obviously
 unworkable.  We need a process that can incrementally up-version a live
 database and keep track of the minimum version present, at some
 granularity smaller than whole database.

 All of this was discussed and hashed out about two years ago, IIRC.
 We just haven't made any progress towards actually implementing those
 concepts.

I am now officially confused.  I thought you and Heikki were arguing
about 24 hours ago that we needed to shut down the old database, run a
pre-upgrade utility to convert all the pages, run pg_upgrade, and then
bring the new database on-line; and that, further, you were arguing
that we should not support multiple page versions.  Now you seem to be
arguing the exact opposite - namely, that we should support multiple
page versions, and that the conversion should be done on-line.

So, to reiterate, I'm lost.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue mar 01 21:23:06 -0300 2012:
 On Thu, Mar 1, 2012 at 4:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  So a relation can't have some pages in Version 9.2, and other pages in
  version 9.3?  How will this work for 2TB tables?
 
  Not very well, but better than Tom's proposal to require upgrading the
  entire cluster in a single off-line operation.
 
  WTF?  That was most certainly not what *I* was proposing; it's obviously
  unworkable.  We need a process that can incrementally up-version a live
  database and keep track of the minimum version present, at some
  granularity smaller than whole database.
 
  All of this was discussed and hashed out about two years ago, IIRC.
  We just haven't made any progress towards actually implementing those
  concepts.
 
 I am now officially confused.  I thought you and Heikki were arguing
 about 24 hours ago that we needed to shut down the old database, run a
 pre-upgrade utility to convert all the pages, run pg_upgrade, and then
 bring the new database on-line;

Actually, nobody mentioned shutting the database down.  This utility
does not necessarily have to be something that runs independently from a
running postmaster; in fact what I envisioned was that we would have it
be run in a backend -- I've always imagined it's just a function to
which you give a table OID, and it converts pages of that table into the
new format.  Maybe a user-friendly utility would connect to each
database and call this function for each table.

 and that, further, you were arguing that we should not support
 multiple page versions.

I don't think we need to support multiple page versions, if multiple
means > 2.  Obviously we need the server to support reading *two* page
versions; though we can probably get away with having support for
*writing* only the newest page version.

(The pg_class column would mean the oldest page version to be found in
this table, so the table can actually contain a mixture of old pages
and new pages.)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] review of: collation for (expr)

2012-03-01 Thread Jaime Casanova
On Thu, Jan 19, 2012 at 1:50 PM, Peter Eisentraut pete...@gmx.net wrote:
 On tor, 2012-01-12 at 21:25 -0800, probabble wrote:
 Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from
 2012-01-12 checkout.

 Bison upgraded to v2.5, and downgraded to v2.4.1

 Make process for both versions resulted in the following errors:

 make[3]: Leaving directory `/usr/local/src/pgbuild/src/backend/catalog'
 make -C parser gram.h
 make[3]: Entering directory `/usr/local/src/pgbuild/src/backend/parser'
 /usr/local/bin/bison -d  -o gram.c gram.y
 gram.y: conflicts: 370 reduce/reduce
 gram.y: expected 0 reduce/reduce conflicts
 gram.y:10482.27-10494.33: warning: rule useless in parser due to conflicts:
 func_expr: COLLATION FOR '(' a_expr ')'
 make[3]: *** [gram.c] Error 1

 I can't reproduce that.


me neither

 In the meantime, attached is a re-merged version of the patch; the old
 version doesn't apply cleanly anymore.


Besides a clash in the OID and the value of leakproof missing from the
pg_proc entry, everything works fine.

The only thing is that I don't see a reason for these includes in
src/backend/utils/adt/misc.c:

+ #include "nodes/nodeFuncs.h"
+ #include "utils/lsyscache.h"


-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Robert Haas's message of jue mar 01 21:23:06 -0300 2012:
 and that, further, you were arguing that we should not support
 multiple page versions.

 I don't think we need to support multiple page versions, if multiple
 means > 2.

That's exactly the point here.  We clearly cannot support on-line
upgrade unless, somewhere along the line, we are willing to cope with
two page formats more or less concurrently.  What I don't want is for
that requirement to balloon to supporting N formats forever.  If we
do not have a mechanism that allows certifying that you have no
remaining pages of format N-1 before you upgrade to a server that
supports (only) versions N and N+1, then we're going to be in the
business of indefinite backwards compatibility instead.

I'm not entirely sure, but I think we may all be in violent agreement
about where this needs to end up.

regards, tom lane



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-01 Thread Shigeru Hanada
2012/3/2 Peter Eisentraut pete...@gmx.net:
  with renaming to dblink_fdw_validator?

 Well, it's not the validator of the dblink_fdw, so maybe something like
 basic_postgresql_fdw_validator.

-1 for the same reason.  It's not the validator of basic_postgresql_fdw.

Using "fdw" in the name of a validator which doesn't have an actual FDW
behind it might confuse users.  Would dblink_validator or
libpq_option_validator be better?

Another possible idea is creating a dblink_fdw which uses the validator,
during CREATE EXTENSION dblink, for users who store connection
information in FDW objects.

-- 
Shigeru Hanada




Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 8:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Robert Haas's message of jue mar 01 21:23:06 -0300 2012:
 and that, further, you were arguing that we should not support
 multiple page versions.

 I don't think we need to support multiple page versions, if multiple
 means > 2.

 That's exactly the point here.  We clearly cannot support on-line
 upgrade unless, somewhere along the line, we are willing to cope with
 two page formats more or less concurrently.  What I don't want is for
 that requirement to balloon to supporting N formats forever.  If we
 do not have a mechanism that allows certifying that you have no
 remaining pages of format N-1 before you upgrade to a server that
 supports (only) versions N and N+1, then we're going to be in the
 business of indefinite backwards compatibility instead.

 I'm not entirely sure, but I think we may all be in violent agreement
 about where this needs to end up.

I was using multiple to mean N > 1, so yeah, it's sounding like we're
more or less on the same page here.

One thing I'm not too sure about is how to extend the page format to
handle optional features.  For example, suppose we want to add 2 bytes
to the page header for a checksum (or 4 bytes, or any other number).
Ideally, I'd like to not use up those 2 bytes on pages that aren't
checksummed.  What do we do about that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-03-01 Thread Bruce Momjian
On Thu, Mar 01, 2012 at 08:45:26AM -0500, Robert Haas wrote:
 On Wed, Feb 29, 2012 at 6:02 PM, Bruce Momjian br...@momjian.us wrote:
  Any ideas about improving the error reporting more generally, so that
  when reloading the dump fails, the user can easily see what went
  belly-up, even if they didn't use -l?
 
  The only idea I have is to write the psql log to a temporary file and
  report the last X lines from the file in case of failure.  Does that
  help?
 
 Why not just redirect stdout but not stderr?  If there are error
 messages, surely we want the user to just see those.

Well, I think sending the error messages to the user but stdout to a
file will leave users confused because it will be unclear which SQL
statement generated the error.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-03-01 Thread Bruce Momjian
On Thu, Mar 01, 2012 at 10:17:04PM +0200, Peter Eisentraut wrote:
 On ons, 2012-02-29 at 22:47 -0500, Bruce Momjian wrote:
  Hey, that's a good idea.  I would always write the pg_dump output to a
  log file.  If the dump succeeds, I remove the file, if not, I tell
  users to read the log file for details about the failure --- good
  idea.
 
 But we also need the server log output somewhere.  So I think this temp
 file would need to cover everything that -l covers.

OK, combining your and Robert's ideas, how about I have pg_upgrade write
the server log to one file and the pg_dump output (with its stderr) to
another, and if pg_upgrade fails, report the failure and mention those
files?  If pg_upgrade succeeds, I remove the files.  pg_upgrade already
creates temporary files that it removes on completion.
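In outline (not pg_upgrade's actual code, just a hedged sketch of the
behaviour described above), each child command would have its stdout and
stderr appended to a named log file, the user would be pointed at that file
on failure, and the files would be removed once everything succeeds:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Run a command with stdout and stderr captured in logfile; on failure,
 * tell the user where to look and leave the file in place. */
static void
run_logged(const char *cmd, const char *logfile)
{
    char        full_cmd[4096];

    snprintf(full_cmd, sizeof(full_cmd), "%s >> \"%s\" 2>&1", cmd, logfile);
    if (system(full_cmd) != 0)
    {
        fprintf(stderr, "command failed; consult \"%s\" for details\n", logfile);
        exit(1);
    }
}

int
main(void)
{
    /* the command strings and file names are made up for illustration */
    run_logged("pg_dumpall --schema-only", "pg_upgrade_dump.log");
    run_logged("psql -f pg_upgrade_dump_all.sql postgres", "pg_upgrade_restore.log");

    /* everything succeeded: the logs are no longer interesting */
    unlink("pg_upgrade_dump.log");
    unlink("pg_upgrade_restore.log");
    return 0;
}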

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 One thing I'm not too sure about is how to extend the page format to
 handle optional features.  For example, suppose we want to add 2 bytes
 to the page header for a checksum (or 4 bytes, or any other number).
 Ideally, I'd like to not use up those 2 bytes on pages that aren't
 checksummed.  What do we do about that?

I'm having a hard time getting excited about adding that requirement on
top of all the others that we have for this feature.  In particular, if
we insist on that, there is exactly zero chance of turning checksums on
on-the-fly, because you won't be able to do it if the page is full.

The scheme that seems to me to make the most sense for checksums,
if we grant Simon's postulate that a 2-byte checksum is enough, is
(1) repurpose the top N bits of pd_flags as a page version number,
for some N;
(2) repurpose pd_pagesize_version as the checksum, with the
understanding that you can't have a checksum in version-0 pages.

(Simon's current patch seems to be an overengineered version of that.
Possibly we should also ask ourselves how much we really need pd_tli
and whether that couldn't be commandeered as the checksum field.)
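For illustration, here is one hedged way to write down (1) and (2) against
the existing PageHeaderData fields; the macro names and the choice of four
bits are hypothetical, not something taken from Simon's patch or from
committed code:

#include "storage/bufpage.h"

/* (1) top four bits of pd_flags carry the page-format version;
 * the low bits (PD_VALID_FLAG_BITS = 0x0007) stay as status flags. */
#define PD_VERSION_SHIFT    12
#define PD_VERSION_MASK     0xF000

#define PageGetFormatVersion(page) \
    ((((PageHeader) (page))->pd_flags & PD_VERSION_MASK) >> PD_VERSION_SHIFT)

/* (2) pd_pagesize_version is reused as a 16-bit checksum; this is only
 * meaningful when PageGetFormatVersion(page) > 0, because version-0
 * pages still keep the page size and layout version in that field. */
#define PageGetChecksum(page) \
    (((PageHeader) (page))->pd_pagesize_version)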

I see the page versioning stuff as mainly being of interest for changes
that are more invasive than this, for instance tuple-header or data
changes.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Peter Geoghegan
On 1 March 2012 22:09, Josh Berkus j...@agliodbs.com wrote:
 On 3/1/12 1:57 PM, Daniel Farina wrote:
 On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 My expectation is that this feature will make life a lot
 easier for a lot of Postgres users.

 Yes.  It's hard to overstate the apparent utility of this feature in
 the general category of visibility and profiling.

 More importantly, this is what pg_stat_statements *should* have been in
 8.4, and wasn't.

It would probably be prudent to concentrate on getting the core
infrastructure committed first. That way, we at least know that if this
doesn't get into 9.2, we can work on getting it into 9.3, knowing that
once it is committed people won't have to wait at least another year to
be able to use the feature. The footprint of such a change is quite
small:

 src/backend/nodes/copyfuncs.c  |2 +
 src/backend/nodes/equalfuncs.c |4 +
 src/backend/nodes/outfuncs.c   |6 +
 src/backend/nodes/readfuncs.c  |5 +
 src/backend/optimizer/plan/planner.c   |1 +
 src/backend/parser/analyze.c   |   37 +-
 src/backend/parser/parse_coerce.c  |   12 +-
 src/backend/parser/parse_param.c   |   11 +-
 src/include/nodes/parsenodes.h |3 +
 src/include/nodes/plannodes.h  |2 +
 src/include/parser/analyze.h   |   12 +
 src/include/parser/parse_node.h|3 +-

That said, I believe that the patch is pretty close to a committable
state as things stand, and that all that is really needed is for a
committer familiar with the problem space to conclude the work started
by Daniel and others, adding:

contrib/pg_stat_statements/pg_stat_statements.c| 1420 ++-

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 9:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 One thing I'm not too sure about is how to extend the page format to
 handle optional features.  For example, suppose we want to add 2 bytes
 to the page header for a checksum (or 4 bytes, or any other number).
 Ideally, I'd like to not use up those 2 bytes on pages that aren't
 checksummed.  What do we do about that?

 I'm having a hard time getting excited about adding that requirement on
 top of all the others that we have for this feature.  In particular, if
 we insist on that, there is exactly zero chance of turning checksums on
 on-the-fly, because you won't be able to do it if the page is full.

 The scheme that seems to me to make the most sense for checksums,
 if we grant Simon's postulate that a 2-byte checksum is enough, is
 (1) repurpose the top N bits of pd_flags as a page version number,
    for some N;
 (2) repurpose pd_pagesize_version as the checksum, with the
    understanding that you can't have a checksum in version-0 pages.

 (Simon's current patch seems to be an overengineered version of that.
 Possibly we should also ask ourselves how much we really need pd_tli
 and whether that couldn't be commandeered as the checksum field.)

 I see the page versioning stuff as mainly being of interest for changes
 that are more invasive than this, for instance tuple-header or data
 changes.

Well, OK, so...  it wouldn't bother me a bit to steal pd_tli for this,
although Simon previously objected to stealing even half of it, when I
suggested that upthread.

But I don't see the point of your first proposal: keeping the page
version right where it is is a good idea, and we should do it.  We
could steal some *high*-order bits from that field without breaking
anything, but moving it around seems like it will complicate life to
no good purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Canceling ROLLBACK statement

2012-03-01 Thread Pavan Deolasee
Hi All,

While working on something in XC, I hit upon an assertion failure. While
this is in XC code, I believe there can be a way of reproducing this on
vanilla PostgreSQL as well. I could not do so even after several tries
unless I added some code or ran it through a debugger. Here is the theory
anyway.

If a client is running a ROLLBACK statement and sends a statement-cancel
signal to the server, and the cancel signal gets processed after
AbortTransaction() has completed but before CleanupTransaction() is
called, I think the server may try to abort the transaction again,
especially if it raises an ereport(ERROR) reporting "canceling statement
due to user request".  That results in the server throwing a warning
"AbortTransaction while in ABORT state" and subsequently an assertion
failure in ProcArrayEndTransaction() because proc->xid is not valid.

The reason I believe I am not able to reproduce this is that interrupts
are usually not processed between AbortTransaction() and
CleanupTransaction().  And when the server does get a chance to process
interrupts, DoingCommandRead is set and hence the statement-cancel event
is ignored.  But it's not entirely unlikely that the server may get a
chance to process the interrupt earlier, for example if
CleanupTransaction() calls elog(); there are a bunch of possible code
paths where that can happen.

I am not sure whether this is really an issue or whether it's worth
fixing, but I thought it would be good to share at the least.

Thanks,
Pavan
-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Canceling ROLLBACK statement

2012-03-01 Thread Tom Lane
Pavan Deolasee pavan.deola...@gmail.com writes:
 If client is running a ROLLBACK statement and sends a statement cancel
 signal to the server and if the cancel signal gets processed after the
 transaction AbortTransaction() is completed, but before
 CleanupTransaction() is called, I think the server may try to ABORT the
 transaction again,

This should be okay.  If you can demonstrate a case where it causes
problems, that would be a bug to fix, but I don't think it's likely
to be anything fundamental.

regards, tom lane
