Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-01 Thread Shigeru Hanada
(2012/03/01 0:33), Tom Lane wrote:
> I don't think that creating such a dependency is acceptable.
> Even if we didn't mind the dependency, you said yourself that
> contrib/postgresql_fdw's validator will accept stuff that's not
> appropriate for dblink.

Agreed.  I think that these two contrib modules (and all FDW modules)
should each have their own validator, to avoid undesirable dependencies
and naming conflicts, and such validator functions should live inside
each module, not in core.

How about moving postgresql_fdw_validator into dblink, renaming it to
dblink_fdw_validator?  The attached patch does that.  I've left
postgresql_fdw_validator in the foreign_data regression test section, so
that the foreign_data section can still check whether FDW DDLs invoke
the validator function.  I kept the name "postgresql_fdw_validator" for
the test validator to keep the change as small as possible.

This change requires dblink to have a new function, so its version should
be bumped to 1.1.

These changes have no direct relation to the PostgreSQL FDW, so this patch
can be applied by itself.  Once this patch has been applied, I'll rename
pgsql_fdw to postgresql_fdw, with the product name fully spelled out.
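
For illustration only, here is a rough sketch (not part of the attached
patch; the wrapper, server, and connection names are invented) of how the
moved validator might eventually be used by people who keep connection
options in FDW objects:

CREATE FOREIGN DATA WRAPPER dblink_fdw VALIDATOR dblink_fdw_validator;

CREATE SERVER remote_pg FOREIGN DATA WRAPPER dblink_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'postgres');
-- the validator would reject options that are not libpq connection options

-- assuming a suitable user mapping exists, dblink can connect by server name
SELECT dblink_connect('myconn', 'remote_pg');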

-- 
Shigeru Hanada
diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile
index ac63748..a27db88 100644
*** a/contrib/dblink/Makefile
--- b/contrib/dblink/Makefile
*** SHLIB_LINK = $(libpq)
*** 7,13 
  SHLIB_PREREQS = submake-libpq
  
  EXTENSION = dblink
! DATA = dblink--1.0.sql dblink--unpackaged--1.0.sql
  
  REGRESS = dblink
  
--- 7,13 
  SHLIB_PREREQS = submake-libpq
  
  EXTENSION = dblink
! DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql
  
  REGRESS = dblink
  
diff --git a/contrib/dblink/dblink--1.0--1.1.sql 
b/contrib/dblink/dblink--1.0--1.1.sql
index ...3dd02a0 .
*** a/contrib/dblink/dblink--1.0--1.1.sql
--- b/contrib/dblink/dblink--1.0--1.1.sql
***
*** 0 
--- 1,17 
+ /* contrib/dblink/dblink--1.0--1.1.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION dblink UPDATE" to load this file. \quit
+ 
+ /* First we have to create validator function */
+ CREATE FUNCTION dblink_fdw_validator(
+ options text[],
+ catalog oid
+ )
+ RETURNS boolean
+ AS 'MODULE_PATHNAME', 'dblink_fdw_validator'
+ LANGUAGE C STRICT;
+ 
+ /* Then we add the validator */
+ ALTER EXTENSION dblink ADD FUNCTION dblink_fdw_validator(text[], oid);
+ 
diff --git a/contrib/dblink/dblink--1.1.sql b/contrib/dblink/dblink--1.1.sql
index ...ec02498 .
*** a/contrib/dblink/dblink--1.1.sql
--- b/contrib/dblink/dblink--1.1.sql
***
*** 0 
--- 1,231 
+ /* contrib/dblink/dblink--1.1.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION dblink" to load this file. \quit
+ 
+ -- dblink_connect now restricts non-superusers to password
+ -- authenticated connections
+ CREATE FUNCTION dblink_connect (text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_connect'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_connect (text, text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_connect'
+ LANGUAGE C STRICT;
+ 
+ -- dblink_connect_u allows non-superusers to use
+ -- non-password authenticated connections, but initially
+ -- privileges are revoked from public
+ CREATE FUNCTION dblink_connect_u (text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_connect'
+ LANGUAGE C STRICT SECURITY DEFINER;
+ 
+ CREATE FUNCTION dblink_connect_u (text, text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_connect'
+ LANGUAGE C STRICT SECURITY DEFINER;
+ 
+ REVOKE ALL ON FUNCTION dblink_connect_u (text) FROM public;
+ REVOKE ALL ON FUNCTION dblink_connect_u (text, text) FROM public;
+ 
+ CREATE FUNCTION dblink_disconnect ()
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_disconnect'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_disconnect (text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_disconnect'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_open (text, text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_open'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_open (text, text, boolean)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_open'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_open (text, text, text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_open'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_open (text, text, text, boolean)
+ RETURNS text
+ AS 'MODULE_PATHNAME','dblink_open'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_fetch (text, int)
+ RETURNS setof record
+ AS 'MODULE_PATHNAME','dblink_fetch'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_fetch (text, int, boolean)
+ RETURNS setof record
+ AS 'MODULE_PATHNAME','dblink_fetch'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_fetch (text, text, int)
+ RETURNS setof record
+ AS 'MODULE_PATHNAME','dblink_fetch'
+ LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION dblink_fetch (text, text, int, boolea

Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 5:52 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Feb 29, 2012 at 4:34 PM, Tom Lane  wrote:
>>> Easier for who?  I don't care for the idea of code that has to cope with
>>> two page formats, or before long N page formats, because if we don't
>>> have some mechanism like this then we will never be able to decide that
>>> an old data format is safely dead.
>
>> Huh?  You can drop support for a new page format any time you like.
>> You just decree that version X+1 no longer supports it, and you can't
>> pg_upgrade to it until all traces of the old page format are gone.
>
> And how would a DBA know that?

We'd add a column to pg_class that tracks which page version is in use
for each relation.
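
Purely as a hypothetical sketch (no such column exists today, and the
column name is invented), the check a DBA might run could look like:

SELECT relname
FROM pg_class
WHERE relpageversion < 5    -- hypothetical: whatever number the new format gets
ORDER BY relname;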

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 6:02 PM, Bruce Momjian  wrote:
>> Any ideas about improving the error reporting more generally, so that
>> when reloading the dump fails, the user can easily see what went
>> belly-up, even if they didn't use -l?
>
> The only idea I have is to write the psql log to a temporary file and
> report the last X lines from the file in case of failure.  Does that
> help?

Why not just redirect stdout but not stderr?  If there are error
messages, surely we want the user to just see those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 6:01 PM, Tom Lane  wrote:
>> Well, my "evidence" is that a parameterized path should pretty much
>> always include a paramaterized path somewhere in there - otherwise,
>> what is parameterization doing for us?
>
> Well, yes, we know that much.

I didn't write what I meant to write there.  I meant to say: a
parameterized path is presumably going to contain a parameterized
*index scan* somewhere within.  So somewhere we're going to have
something of the form

-> Index Scan blah on blah
Index Cond: someattr = $1

And if that path weren't parameterized, we'd have to read the whole
relation, either with a full index scan, or a sequential scan.  Or, I
mean, maybe there's a filter condition, so that no path needs to
retrieve the *whole* relation, but even there the index cond is on top
of that, and it's probably doing something, though I suppose you're
right that there might be cases where it doesn't.
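
As a hypothetical illustration of that shape (table, index, and plan text
are invented and only approximate):

EXPLAIN SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.signup_date > '2012-01-01';

 Nested Loop
   ->  Seq Scan on customers c
         Filter: (signup_date > '2012-01-01'::date)
   ->  Index Scan using orders_customer_id_idx on orders o
         Index Cond: (customer_id = c.id)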

>> And that's going to reduce the
>> row count.  I may be missing something, but I'm confused as to why
>> this isn't nearly tautological.
>
> We don't know that --- I will agree it's likely, but that doesn't make
> it so certain that we can assume it without checking.  A join condition
> won't necessarily eliminate any rows.
>
> (... thinks about that for awhile ...)  One thing we could possibly do
> is have indxpath.c arbitrarily reject parameterizations that don't
> produce a smaller estimated number of rows than an unparameterized scan.
> Admittedly, this still doesn't *prove* the assumption for join
> relations, but maybe it brings the odds to where it's okay for add_path
> to make such an assumption.

That seems to make sense.

> (... thinks some more ...)  No, that doesn't get us there, because that
> doesn't establish that a more-parameterized path produces fewer rows
> than some path that requires less parameterization, yet not none at
> all.  You really want add_path carrying out those comparisons.  In your
> previous example, it's entirely possible that path D is dominated by B
> or C because of poor choices of join quals.

I'm not following this part.  Can you explain further?  It seems to me
at any rate that we could get pretty far if we could just separate
parameterized paths and unparameterized paths into separate buckets.
Even if we have to do some extra work when comparing parameterized
paths *to each other*, we'd gain a fair amount by avoiding comparing
any of them with the unparameterized paths.  Or at least, I hope so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] performance results on IBM POWER7

2012-03-01 Thread Robert Haas
IBM has provided the PostgreSQL community with access to a couple of
IBM POWER7 machines through OSUOSL.  Simon has access to one, carved
up into a couple of LPARs, for replication work, and there's a
buildfarm animal on there as well, I think; I have access to the
other, for performance testing.  I imagine we can get access for a few
other people as well, though at the moment the performance-testing
machine is inaccessible and I'm not having very much luck getting help
from the very busy OSUOSL folks.  Anyway, before it bit the dust, I
was able to do some basic pgbench tests at various scale factors and
client counts.  I used my usual settings:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms

I did three five-minute runs at each of scale factors 100, 300, 1000,
3000, and 10000, with varying client counts: 1, 2, and all multiples
of 4 up to 80.  I stopped and restarted the database after each run
(but did not flush the OS cache, so this is a warm-start test) and
took the median of the three results for each run.  Full results are
attached herewith; pretty graphs are on my blog at
http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html

When I get the machine back, my plan is to next run some read-write
pgbench tests.  Those will need to be longer, though.  Read
performance doesn't seem to be very sensitive to the length of the
tests, but write performance is, so I'll probably need at least
30-minute runs if not more to get an accurate sense of what the
performance is like.  After that I think maybe some testing of the
remaining CommitFest patches might be in order (though personally I'd
like to wrap this CommitFest up fairly soon) to see if any of those
improve things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


f.100
Description: Binary data


f.300
Description: Binary data


f.1000
Description: Binary data


f.3000
Description: Binary data


f.1
Description: Binary data



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 1:19 AM, Alexander Korotkov wrote:

> On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane  wrote:
>
>> That seems like a pretty narrow, uncommon use-case.  Also, to get
>> accurate stats for such queries that way, you'd need really enormous
>> histograms.  I doubt that the existing parameters for histogram size
>> will permit meaningful estimation of more than the first array entry
>> (since we don't make the histogram any larger than we do for a scalar
>> column).
>>
>> The real point here is that the fact that we're storing btree-style
>> stats for arrays is an accident, backed into by having added btree
>> comparators for arrays plus analyze.c's habit of applying default
>> scalar-oriented analysis functions to any type without an explicit
>> typanalyze entry.  I don't recall that we ever thought hard about
>> it or showed that those stats were worth anything.
>>
>
> OK. I don't object to removing btree stats from arrays.
> What do you think about the pg_stats view in this case?  Should it combine
> the values histogram and the array length histogram in a single column, as
> is done for MCV and MCELEM?
>

Btree statistics for arrays and the additional statistics slot are removed
in the attached version of the patch.  The pg_stats view is untouched for now.

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.13.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane  wrote:
> Nathan Boley  writes:
>> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane  wrote:
>>> I am starting to look at this patch now.  I'm wondering exactly why the
>>> decision was made to continue storing btree-style statistics for arrays,
>>> in addition to the new stuff.
>
>> If I understand your suggestion, queries of the form
>
>> SELECT * FROM rel
>> WHERE ARRAY[ 1,2,3,4 ] <= x
>>      AND x <=ARRAY[ 1, 2, 3, 1000];
>
>> would no longer use an index. Is that correct?
>
> No, just that we'd no longer have statistics relevant to that, and would
> have to fall back on default selectivity assumptions.  Do you think that
> such applications are so common as to justify bloating pg_statistic for
> everybody that uses arrays?

I confess I am nervous about ripping this out.  I am pretty sure we
will get complaints about it.  Performance optimizations that benefit
group A at the expense of group B are always iffy, and I'm not sure
the case of using an array as a path indicator is as uncommon as you
seem to think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane  wrote:

> > No, just that we'd no longer have statistics relevant to that, and would
> > have to fall back on default selectivity assumptions.  Do you think that
> > such applications are so common as to justify bloating pg_statistic for
> > everybody that uses arrays?
> 
> I confess I am nervous about ripping this out.  I am pretty sure we
> will get complaints about it.  Performance optimizations that benefit
> group A at the expense of group B are always iffy, and I'm not sure
> the case of using an array as a path indicator is as uncommon as you
> seem to think.

Maybe we should keep it as an option.  I do think it's quite uncommon,
but for those rare users, it'd be good to provide the capability while
not bloating everyone else's stat catalog.  The thing is, people using
arrays as path indicators and such are likely using relatively small
arrays; people storing real data are likely to store much bigger arrays.
Just a hunch though.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-03-01 Thread Marko Kreen
On Tue, Feb 28, 2012 at 05:04:44PM +0900, Kyotaro HORIGUCHI wrote:
> > There is still one EOF in v3 getAnotherTuple() -
> > pqGetInt(tupnfields), please turn that one also to
> > protocolerror.
> 
> pqGetInt() returns EOF only when it wants additional reading from the
> network, if the parameter `bytes' is appropriate.  A non-zero return
> from it seems like it should be handled as EOF, not a protocol error.  The
> one point I had modified in a buggy way is also restored.  The so-called
> 'protocol error' has vanished entirely.

But that's broken in the V3 protocol - getAnotherTuple() will be called
only if the packet is fully read.  If the packet contents do not
agree with the packet header, it's a protocol error.  The only valid EOF
return in V3 getAnotherTuple() is when the row processor asks
for early exit.

> Is there something left undone?

* Convert old EOFs to protocol errors in V3 getAnotherTuple()

* V2 getAnotherTuple() can leak PGresult when handling custom
  error from row processor.

* Remove the pqIsnonblocking(conn) check when the row processor returns 2.
  I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
  on a sync connection.

* It seems the return codes from the callback should be remapped;
  (0, 1, 2) is an unusual pattern.  Better would be:

  -1 - error
   0 - stop parsing / early exit ("I'm not done yet")
   1 - OK ("I'm done with the row")

* Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().
  The main problem is that it needs to be synced with the error handling
  in the rest of libpq, unlike the rest of the row processor patch,
  which consists only of local changes.  All solutions here
  are either ugly hacks or too complex to be part of this patch.

Also, considering that we have working exceptions and PQgetRow,
I don't see much need for custom error messages.  If really needed,
it should be introduced as a separate patch, as the area of code it
affects is completely different.

Currently the custom error messaging seems to be the blocker for
this patch, because of the complexity it adds both to implementing it
and to reviewing it.  Considering how unimportant the provided
functionality is compared to the rest of the patch, I think we should
simply drop it.

My suggestion - check in getAnotherTuple whether resultStatus is
already an error and, if so, do nothing.  This allows the internal
pqAddRow to set the regular "out of memory" error.  Otherwise give a
generic "row processor error".

> By the way, I noticed that dblink always says that the current
> connection is 'unnamed' in the error messages from
> dblink_record_internal.  I could see that
> dblink_record_internal defines the local variable conname = NULL
> and passes it to dblink_res_error to display the error message, but
> never assigns to it in the function.
> 
> It was shown properly when I added code to set conname
> from PG_GETARG_TEXT_PP(0) if available, in other words doing that
> just after DBLINK_GET_CONN/DBLINK_GET_NAMED_CONN, which seems to be
> dblink's usual manner.  This is not included in this patch.
> 
> Furthermore, dblink_res_error looks only at the returned PGresult to
> display the error, and always says only `Error occurred on dblink
> connection..: could not execute query'.
> 
> Is it right to put it as follows?
> 
>  - dblink's error handling is wrong.  A client of libpq should
>    look at the PGconn via PQerrorMessage() if (or regardless of whether?)
>    the PGresult says nothing about the error.

Yes, it seems like a bug.

-- 
marko




Re: [HACKERS] performance results on IBM POWER7

2012-03-01 Thread Ants Aasma
On Thu, Mar 1, 2012 at 4:54 PM, Robert Haas  wrote:
> ... After that I think maybe some testing of the
> remaining CommitFest patches might be in order (though personally I'd
> like to wrap this CommitFest up fairly soon) to see if any of those
> improve things.

Besides performance testing, could you check how clocksources behave
on this kind of machine?
You can find pg_test_timing tool attached here:
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00937.php

To see which clocksources are available, you can do:
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
To switch the clocksource, just write the desired clocksource like this:
# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

Thanks,
Ants Aasma



[HACKERS] Allowing multi "-t" and adding "-n" to vacuumdb ?

2012-03-01 Thread Jehan-Guillaume (ioguix) de Rorthais
Hi,

One of our customers sent us a patch he wrote for his needs (against
"src/bin/scripts/vacuumdb.c"; no docs were included).

He's using one schema per application and would like to be able to run
vacuumdb on each of them independently, so he added a "--schema|-n"
option and sent us the patch.

Reviewing his patch, I thought it would be more useful to allow multiple
"--table|-t" options on the command line first.  It might be possible to
pass an array of tables to the "vacuum_one_database" function instead of
just one.

Then, we could add the "--schema|-n" option which would fetch and build
the table list and call "vacuum_one_database".
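
For what it's worth, here is a sketch of the kind of catalog query such a
"--schema|-n" option might run to build that table list (an assumption on
my part, not taken from the customer's patch; the schema name is just an
example):

SELECT quote_ident(n.nspname) || '.' || quote_ident(c.relname)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'              -- ordinary tables only
  AND n.nspname = 'app_schema'     -- value passed to --schema|-n
ORDER BY 1;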

But before I start writing this patch, I would like some opinions, pros /
cons.  Do you think such a feature could be accepted into official
PostgreSQL code, or should we keep this as an external script?

Thank you,
-- 
Jehan-Guillaume (ioguix) de Rorthais
http://www.dalibo.com



Re: [HACKERS] performance results on IBM POWER7

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 11:23 AM, Ants Aasma  wrote:
> On Thu, Mar 1, 2012 at 4:54 PM, Robert Haas  wrote:
>> ... After that I think maybe some testing of the
>> remaining CommitFest patches might be in order (though personally I'd
>> like to wrap this CommitFest up fairly soon) to see if any of those
>> improve things.
>
> Besides performance testing, could you check how clocksources behave
> on this kind of machine?
> You can find pg_test_timing tool attached here:
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg00937.php
>
> To see which clocksources are available, you can do:
> # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
> To switch the clocksource, just write the desired clocksource like this:
> # echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

Sure, I'll check that as soon as it's back up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] COPY with hints, rebirth

2012-03-01 Thread Simon Riggs
On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
 wrote:
> On 24.02.2012 22:55, Simon Riggs wrote:
>>
>> A long time ago, in a galaxy far away, we discussed ways to speed up
>> data loads/COPY.
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php
>>
>> In particular, the idea that we could mark tuples as committed while
>> we are still loading them, to avoid negative behaviour for the first
>> reader.
>>
>> Simple patch to implement this is attached, together with test case.
>>
>>  ...
>>
>>
>> What exactly does it do? Previously, we optimised COPY when it was
>> loading data into a newly created table or a freshly truncated table.
>> This patch extends that and actually sets the tuple header flag as
>> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
>> code. The patch also adds some tests for corner cases that would make
>> that action break MVCC - though those cases are minor and typical data
>> loads will benefit fully from this.
>
>
> This doesn't work with subtransactions:
...
> The query should return the row copied in the same subtransaction.

Thanks for pointing that out.

New patch with corrected logic and test case attached.

>> In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
>> and adding current xid to snapshots. That is an invasive change that I
>> would wish to avoid at any time and explains the long delay in
>> tackling this. The way I've implemented it, is just as a short test
>> during XidInMVCCSnapshot() so that we trap the case when the xid ==
>> xmax and so would appear to be running. This is much less invasive and
>> just as performant as Tom's original suggestion.
>
>
> TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a
> lot of subtransactions open...

I've put in something to avoid that cost for the common case - just a boolean.

This seems like a better plan than the explicit FREEZE option.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..3899282 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2056,6 +2056,17 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options & HEAP_INSERT_HINT_XMIN)
+		tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e22bdac..de7504c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -146,6 +146,7 @@ typedef struct TransactionStateData
 	int			prevSecContext; /* previous SecurityRestrictionContext */
 	bool		prevXactReadOnly;		/* entry-time xact r/o state */
 	bool		startedInRecovery;		/* did we start in recovery? */
+	bool		maySeePreHintedTuples;	/* did subtrans write pre-hinted rows? */
 	struct TransactionStateData *parent;		/* back link to parent */
 } TransactionStateData;
 
@@ -175,6 +176,7 @@ static TransactionStateData TopTransactionStateData = {
 	0,			/* previous SecurityRestrictionContext */
 	false,		/* entry-time xact r/o state */
 	false,		/* startedInRecovery */
+	false,		/* maySeePreHintedTuples */
 	NULL		/* link to parent state block */
 };
 
@@ -704,6 +706,18 @@ TransactionStartedDuringRecovery(void)
 	return CurrentTransactionState->startedInRecovery;
 }
 
+bool
+TransactionMaySeePreHintedTuples(void)
+{
+	return CurrentTransactionState->maySeePreHintedTuples;
+}
+
+void
+TransactionMayWritePreHintedTuples(void)
+{
+	CurrentTransactionState->maySeePreHintedTuples = true;
+}
+
 /*
  *	CommandCounterIncrement
  */
@@ -1689,6 +1703,7 @@ StartTransaction(void)
 		s->startedInRecovery = false;
 		XactReadOnly = DefaultXactReadOnly;
 	}
+	s->maySeePreHintedTuples = false;
 	XactDeferrable = DefaultXactDeferrable;
 	XactIsoLevel = DefaultXactIsoLevel;
 	forceSyncCommit = false;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..1419e33 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/portal.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -1922,6 +1923,13 @@ CopyFrom(CopyState cstate)
 		hi_opt

Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-01 Thread Peter Eisentraut
On tor, 2012-03-01 at 20:56 +0900, Shigeru Hanada wrote:
> How about moving postgresql_fdw_validator into dblink,

That's probably a good move.  If this were C++, we might try to subclass
this whole thing a bit, to avoid code duplication, but I don't see an
easy way to do that here.

>  with renaming to dblink_fdw_validator? 

Well, it's not the validator of the dblink_fdw, so maybe something like
basic_postgresql_fdw_validator.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-01 Thread Peter Eisentraut
On fre, 2012-02-24 at 11:10 -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On tor, 2012-02-23 at 10:17 -0500, Tom Lane wrote:
> >> void * seems entirely reasonable given the two different usages, but
> >> I would be happier if the patch added explicit casts whereever FH is
> >> set to or used as one of these two types.
> 
> > That would add about 70 casts all over the place.  I don't think that
> > will make things clearer or more robust.  I think we either leave it as
> > my first patch for now or find a more robust solution with a union or
> > something.
> 
> Hm.  Could we just create two separate fields?  It's not real clear to
> me that forcing both these usages into a generic pointer buys much.

It's used in confusing ways.

In pg_backup_files.c _PrintFileData(), it's a compile-time decision
whether FH is a FILE or a gzFile:

#ifdef HAVE_LIBZ
AH->FH = gzopen(filename, "rb");
#else
AH->FH = fopen(filename, PG_BINARY_R);
#endif

But if we changed FH to be a gzFile under HAVE_LIBZ, then tons of other
places that use fread(), fwrite(), fileno(), etc. directly on FH would
complain.

Considering how much code complains one way versus the other, I
think _PrintFileData() is at fault.

I think the best fix would be to rearrange _PrintFileData() so that it
doesn't use FH at all.  Instead, we could define a separate
ArchiveHandle field IF that works more like OF, and then change
ahwrite() to use that.





Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Josh Berkus

>> And how would a DBA know that?
> 
> We'd add a column to pg_class that tracks which page version is in use
> for each relation.

So a relation can't have some pages in Version 9.2, and other pages in
version 9.3?  How will this work for 2TB tables?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 12:42 PM, Josh Berkus  wrote:
>>> And how would a DBA know that?
>>
>> We'd add a column to pg_class that tracks which page version is in use
>> for each relation.
>
> So a relation can't have some pages in Version 9.2, and other pages in
> version 9.3?  How will this work for 2TB tables?

Not very well, but better than Tom's proposal to require upgrading the
entire cluster in a single off-line operation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Josh Berkus

>> So a relation can't have some pages in Version 9.2, and other pages in
>> version 9.3?  How will this work for 2TB tables?
> 
> Not very well, but better than Tom's proposal to require upgrading the
> entire cluster in a single off-line operation.

Yes, but the result will be that anyone with a 2TB table will *never*
convert it to the new format.  Which means we can never deprecate that
format, because lots of people will still be using it.

I continue to assert that all of this sounds like 9.3 work to me.  I'm
really not keen on pushing through a hack which will make pushing in a
long-term solution harder.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] review: CHECK FUNCTION statement

2012-03-01 Thread Alvaro Herrera


Why does CollectCheckedFunctions skip trigger functions?  My only guess
is that at one point the checker was not supposed to know how to check
them, and a later version learned about it and this bit wasn't updated;
but maybe there's another reason?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-03-01 Thread Peter Eisentraut
On ons, 2012-02-29 at 22:47 -0500, Bruce Momjian wrote:
> Hey, that's a good idea.  I would always write the pg_dump output to a
> log file.  If the dump succeeds, I remove the file, if not, I tell
> users to read the log file for details about the failure --- good
> idea.

But we also need the server log output somewhere.  So I think this temp
file would need to cover everything that -l covers.





Re: [HACKERS] COPY with hints, rebirth

2012-03-01 Thread Heikki Linnakangas

On 01.03.2012 18:40, Simon Riggs wrote:

On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
  wrote:

On 24.02.2012 22:55, Simon Riggs wrote:


What exactly does it do? Previously, we optimised COPY when it was
loading data into a newly created table or a freshly truncated table.
This patch extends that and actually sets the tuple header flag as
HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
code. The patch also adds some tests for corner cases that would make
that action break MVCC - though those cases are minor and typical data
loads will benefit fully from this.


This doesn't work with subtransactions:

...

The query should return the row copied in the same subtransaction.


Thanks for pointing that out.

New patch with corrected logic and test case attached.


It's still broken:

-- create test table and file
create table a as select 1 as id;
copy a to '/tmp/a';

-- start test
postgres=# begin;
BEGIN
postgres=# truncate a;
TRUNCATE TABLE
postgres=# savepoint sp1;
SAVEPOINT
postgres=# copy a from '/tmp/a';
COPY 1
postgres=# select * from a;
 id
----
  1
(1 row)

postgres=# rollback to savepoint sp1;
ROLLBACK
postgres=# select * from a;
 id
----
  1
(1 row)

That last select should not have seen the tuple.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Tom Lane
Robert Haas  writes:
>> So a relation can't have some pages in Version 9.2, and other pages in
>> version 9.3?  How will this work for 2TB tables?

> Not very well, but better than Tom's proposal to require upgrading the
> entire cluster in a single off-line operation.

WTF?  That was most certainly not what *I* was proposing; it's obviously
unworkable.  We need a process that can incrementally up-version a live
database and keep track of the minimum version present, at some
granularity smaller than "whole database".

All of this was discussed and hashed out about two years ago, IIRC.
We just haven't made any progress towards actually implementing those
concepts.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Peter Geoghegan
On 1 March 2012 00:48, Alvaro Herrera  wrote:
> I'm curious about the LeafNode stuff.  Is this something that could be
> done by expression_tree_walker?  I'm not completely familiar with it so
> maybe there's some showstopper such as some node tags not being
> supported, or maybe it just doesn't help.  But I'm curious.

Good question. The LeafNode function (which is a bit of a misnomer, as
noted in a comment) looks rather like a walker function, as appears in
the example in a comment in nodeFuncs.c:

* expression_tree_walker() is designed to support routines that traverse
 * a tree in a read-only fashion (although it will also work for routines
 * that modify nodes in-place but never add/delete/replace nodes).
 * A walker routine should look like this:
 *
 * bool my_walker (Node *node, my_struct *context)
 * {
 *  if (node == NULL)
 *  return false;
 *  // check for nodes that special work is required for, eg:
 *  if (IsA(node, Var))
 *  {
 *  ... do special actions for Var nodes
 *  }
 *  else if (IsA(node, ...))
 *  {
 *  ... do special actions for other node types
 *  }
 *  // for any node type not specially processed, do:
 *  return expression_tree_walker(node, my_walker, (void *) context);
 * }

My understanding is that the expression-tree walking support is mostly
useful for the majority of walker code, which only cares about a small
subset of nodes and hopes to avoid including boilerplate code just to
walk the other nodes that it isn't actually interested in.

This code, unlike most clients of expression_tree_walker(), is pretty
much interested in everything, since its express purpose is to
fingerprint all possible query trees.

Another benefit of expression_tree_walker is that if you miss a
certain node being added (say, a FuncExpr-like node), you automatically
get that node walked over so as to reach the nodes that you
do in fact care about (such as those within the new node's args List).
That makes perfect sense in the majority of cases, but here you've
already missed the fields within this new node that FuncExpr itself
lacks, so you're already fingerprinting inaccurately.  I suppose you
could still at least get the nodetag, and still have a warning about
the fingerprinting being inadequate, by going down the
expression_tree_walker path, but I'm inclined to wonder if you
aren't just better off directly walking the tree, if only to encourage
the idea that this code needs to be maintained over time, and to cut
down on the little extra bit of indirection that that imposes.

It's not going to be any sort of burden to maintain it - it currently
stands at a relatively meagre 800 lines of code for everything to do
with tree walking - and the code that will have to be added with new
nodes or refactored along with the existing tree structure is going to
be totally trivial.

All of that said, I wouldn't mind making LeafNode into a walker, if
that approach is judged to be better, and you don't mind documenting
the order in which the tree is walked as deterministic, because the
order now matters in a way apparently not really anticipated by
expression_tree_walker, though that's probably not a problem.

My real concern now is that it's March 1st, and the last commitfest
may end soon. Even though this patch has extensive regression tests,
has been floating around for months, and, I believe, will end up being
a timely and important feature, a committer has yet to step forward to
work towards this patch getting committed. Can someone volunteer,
please? My expectation is that this feature will make life a lot
easier for a lot of Postgres users.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
>> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane  wrote:
>> I confess I am nervous about ripping this out.  I am pretty sure we
>> will get complaints about it.  Performance optimizations that benefit
>> group A at the expense of group B are always iffy, and I'm not sure
>> the case of using an array as a path indicator is as uncommon as you
>> seem to think.

> Maybe we should keep it as an option.

How would we make it optional?  There's noplace I can think of to stick
such a knob ...

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Daniel Farina
On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan  wrote:
> My expectation is that this feature will make life a lot
> easier for a lot of Postgres users.

Yes.  It's hard to overstate the apparent utility of this feature in
the general category of visibility and profiling.

-- 
fdr



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Josh Berkus
On 3/1/12 1:57 PM, Daniel Farina wrote:
> On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan  wrote:
>> My expectation is that this feature will make life a lot
>> easier for a lot of Postgres users.
> 
> Yes.  It's hard to overstate the apparent utility of this feature in
> the general category of visibility and profiling.

More importantly, this is what pg_stat_statements *should* have been in
8.4, and wasn't.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alvaro Herrera

Excerpts from Tom Lane's message of jue mar 01 18:51:38 -0300 2012:
> 
> Alvaro Herrera  writes:
> > Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
> >> On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane  wrote:
> >> I confess I am nervous about ripping this out.  I am pretty sure we
> >> will get complaints about it.  Performance optimizations that benefit
> >> group A at the expense of group B are always iffy, and I'm not sure
> >> the case of using an array as a path indicator is as uncommon as you
> >> seem to think.
> 
> > Maybe we should keep it as an option.
> 
> How would we make it optional?  There's noplace I can think of to stick
> such a knob ...

Uhm, attoptions?

"alter table foo alter column bar set extended_array_stats to on"
or something like that?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Allowing multi "-t" and adding "-n" to vacuumdb ?

2012-03-01 Thread Tom Lane
"Jehan-Guillaume (ioguix) de Rorthais"  writes:
> One of our customer send us a patch he wrote for his needs (on
> "src/bin/scripts/vacuumdb.c", no doc were included).

> He's using one schema per application and would like to be able to run
> vacuumdb on each of them independently so he added the "--schema|-n"
> option and send us the patch.

> Reviewing his patch, I thought it would be more useful to allow multi
> "--table|-t" options on the command line first. It might be possible to
> pass an array of tables to "vacuum_one_database" function instead of
> just one.

> Then, we could add the "--schema|-n" option which would fetch and build
> the table list and call "vacuum_one_database".

> But before I start writing this patch, I would like some opinion, pros /
> cons. Do you think such a feature could be accepted in official
> PostgreSQL code or should we keep this as an external script ?

I think most of us see vacuumdb as a historical leftover.  We keep it
around in case anyone is still relying on it, but improving it seems
like misdirected effort.

Why isn't your customer using autovacuum?  If there are concrete
reasons why that doesn't get the job done for him, it would be more
useful in the long run to work on fixing that.

regards, tom lane



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Nathan Boley
>> What about MCV's? Will those be removed as well?
>
> Sure.  Those seem even less useful.

Ya, this will destroy the performance of several queries without some
heavy tweaking.

Maybe this is bad design, but I've gotten in the habit of storing
sequences as arrays and I commonly join on them. I looked through my
code this morning, and I only have one 'range' query ( of the form
described up-thread ), but there are tons of the form

SELECT att1, attb2 FROM rela, relb where rela.seq_array_1 = relb.seq_array;

I can provide some examples if that would make my argument more compelling.

Sorry to be difficult,
Nathan



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Nathan Boley  writes:
> Maybe this is bad design, but I've gotten in the habit of storing
> sequences as arrays and I commonly join on them. I looked through my
> code this morning, and I only have one 'range' query ( of the form
> described up-thread ), but there are tons of the form

> SELECT att1, attb2 FROM rela, relb where rela.seq_array_1 = relb.seq_array;

What do you mean by "storing sequences as arrays"?  Can you demonstrate
that the existing stats are relevant at all to the query you're worried
about?

regards, tom lane



Re: [HACKERS] Caching for stable expressions with constant arguments v6

2012-03-01 Thread Jaime Casanova
On Sat, Feb 4, 2012 at 5:40 AM, Marti Raudsepp  wrote:
> On Sat, Feb 4, 2012 at 09:49, Jaime Casanova  wrote:
>> i little review...
>
> Thanks! By the way, you should update to the v7 patch.
>

just tried it and it fails while initializing during make check
"""
creating information schema ... TRAP: FailedAssertion("!(*cachable ==
((bool) 1))", File: "clauses.c", Line: 2295)
Aborted (core dumped)
child process exited with exit code 134
"""

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] Allowing multi "-t" and adding "-n" to vacuumdb ?

2012-03-01 Thread Kevin Grittner
Tom Lane  wrote:
 
> Why isn't your customer using autovacuum?  If there are concrete
> reasons why that doesn't get the job done for him, it would be
> more useful in the long run to work on fixing that.
 
FWIW, we're using autovacuum here, at slightly more aggressive
settings than the default, and we still rely on manual vacuums for
two main reasons:
 
(1)  VACUUM FREEZE ANALYZE after a bulk load to avoid the hint bit
rewrite costs for end users and the unpredictable anti-wraparound
autovacuum when all the loaded data suddenly hits the freeze
threshold.
 
(2)  For base backups of databases across a slow WAN, we do a "diff"
rsync against a copy of the hot standby here.  (Well, actually, to
save space we do it against a hard-link copy of the previous base
backup against which we have run a "diff" rsync from the hot
standby.)  If we don't do a VACUUM FREEZE ANALYZE before each base
backup, it at least doubles the size of base backups, due to the
hint bit and xmin freeze changes that occur after the initial copy
of a tuple is backed up.
 
Simon's recent work, if it makes it in, will deal with (1), and it
may be possible to deal with (2) using much more aggressive
configurations for autovacuum, although I suspect it might take
another tweak or two to the back end to really cover that without
manual vacuums.
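
A minimal sketch of the pattern described in (1), with invented table and
file names (the parenthesized VACUUM option syntax is available since 9.0):

COPY measurements FROM '/data/measurements.csv' WITH (FORMAT csv);
VACUUM (FREEZE, ANALYZE) measurements;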
 
-Kevin



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Tom Lane's message of jue mar 01 18:51:38 -0300 2012:
>> How would we make it optional?  There's noplace I can think of to stick
>> such a knob ...

> Uhm, attoptions?

Oh, I had forgotten we had that mechanism already.  Yeah, that might
work.  I'm a bit tempted to design the option setting so that you can
select whether to keep the btree stats, the new stats, or both or
neither --- after all, there are probably plenty of databases where
nobody cares about the array-containment operators either.

That leaves the question of which setting should be the default ...
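
To make that concrete, a purely hypothetical sketch along the lines Alvaro
suggested (the option name and its values don't exist today):

-- keep both btree-style and element-wise stats for this column
ALTER TABLE flights ALTER COLUMN legs SET (array_stats = 'both');
-- keep only btree-style stats, for people who range-compare whole arrays
ALTER TABLE paths ALTER COLUMN node_ids SET (array_stats = 'scalar');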

regards, tom lane



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Nathan Boley
[ sorry Tom, reply all this time... ]

> What do you mean by "storing sequences as arrays"?

So, a simple example is, for transcripts ( sequences of DNA that are
turned into proteins ), we store each of the connected components as
an array of the form:

exon_type in [1,6]
splice_type = [1,3]

and then the array elements are

[ exon_type, splice_type, exon_type ]

~ 99% of the elements are of the form [ [1,3], 1, [1,3] ],

so I almost always get a hash or merge join ( correctly ) but for the
rare junction types ( which are usually more interesting as well ) I
correctly get nest loops with an index scan.
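
To make the shape of the data concrete, here is a rough reconstruction of
that kind of schema (table and column names are invented):

CREATE TABLE transcript_junctions (
    id       serial PRIMARY KEY,
    junction int[] NOT NULL    -- [ exon_type, splice_type, exon_type ]
);

-- ~99% of rows look like '{1,1,3}' and join fine via hash/merge joins;
-- the rare junction types are where the MCV stats matter, e.g.:
SELECT a.id, b.id
FROM transcript_junctions a
JOIN transcript_junctions b ON a.junction = b.junction
WHERE a.junction = '{5,2,6}';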

> Can you demonstrate
> that the existing stats are relevant at all to the query you're worried
> about?

Well, if we didn't have mcv's and just relied on ndistinct to estimate
the '=' selectivities, either my low selectivity quals would use the
index, or my high selectivity quals would use a table scan, either of
which would be wrong.

I guess I could wipe out the stats and get some real numbers tonight,
but I can't see how the planner would be able to distinguish *without*
mcv's...

Best,
Nathan



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 4:08 PM, Tom Lane  wrote:
> Robert Haas  writes:
>>> So a relation can't have some pages in Version 9.2, and other pages in
>>> version 9.3?  How will this work for 2TB tables?
>
>> Not very well, but better than Tom's proposal to require upgrading the
>> entire cluster in a single off-line operation.
>
> WTF?  That was most certainly not what *I* was proposing; it's obviously
> unworkable.  We need a process that can incrementally up-version a live
> database and keep track of the minimum version present, at some
> granularity smaller than "whole database".
>
> All of this was discussed and hashed out about two years ago, IIRC.
> We just haven't made any progress towards actually implementing those
> concepts.

I am now officially confused.  I thought you and Heikki were arguing
about 24 hours ago that we needed to shut down the old database, run a
pre-upgrade utility to convert all the pages, run pg_upgrade, and then
bring the new database on-line; and that, further, you were arguing
that we should not support multiple page versions.  Now you seem to be
arguing the exact opposite - namely, that we should support multiple
page versions, and that the conversion should be done on-line.

So, to reiterate, I'm lost.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue mar 01 21:23:06 -0300 2012:
> On Thu, Mar 1, 2012 at 4:08 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >>> So a relation can't have some pages in Version 9.2, and other pages in
> >>> version 9.3?  How will this work for 2TB tables?
> >
> >> Not very well, but better than Tom's proposal to require upgrading the
> >> entire cluster in a single off-line operation.
> >
> > WTF?  That was most certainly not what *I* was proposing; it's obviously
> > unworkable.  We need a process that can incrementally up-version a live
> > database and keep track of the minimum version present, at some
> > granularity smaller than "whole database".
> >
> > All of this was discussed and hashed out about two years ago, IIRC.
> > We just haven't made any progress towards actually implementing those
> > concepts.
> 
> I am now officially confused.  I thought you and Heikki were arguing
> about 24 hours ago that we needed to shut down the old database, run a
> pre-upgrade utility to convert all the pages, run pg_upgrade, and then
> bring the new database on-line;

Actually, nobody mentioned shutting the database down.  This utility
does not necessarily have to be something that runs independently from a
running postmaster; in fact what I envisioned was that we would have it
be run in a backend -- I've always imagined it's just a function to
which you give a table OID, and it converts pages of that table into the
new format.  Maybe a user-friendly utility would connect to each
database and call this function for each table.

> and that, further, you were arguing that we should not support
> multiple page versions.

I don't think we need to support multiple page versions, if multiple
means > 2.  Obviously we need the server to support reading *two* page
versions; though we can probably get away with having support for
*writing* only the newest page version.

(The pg_class column would mean "the oldest page version to be found in
this table", so the table can actually contain a mixture of old pages
and new pages.)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] review of: collation for (expr)

2012-03-01 Thread Jaime Casanova
On Thu, Jan 19, 2012 at 1:50 PM, Peter Eisentraut  wrote:
> On tor, 2012-01-12 at 21:25 -0800, probabble wrote:
>> Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from
>> 2012-01-12 checkout.
>>
>> Bison upgraded to v2.5, and downgraded to v2.4.1
>>
>> Make process for both versions resulted in the following errors:
>>
>> make[3]: Leaving directory `/usr/local/src/pgbuild/src/backend/catalog'
>> make -C parser gram.h
>> make[3]: Entering directory `/usr/local/src/pgbuild/src/backend/parser'
>> /usr/local/bin/bison -d  -o gram.c gram.y
>> gram.y: conflicts: 370 reduce/reduce
>> gram.y: expected 0 reduce/reduce conflicts
>> gram.y:10482.27-10494.33: warning: rule useless in parser due to conflicts:
>> func_expr: COLLATION FOR '(' a_expr ')'
>> make[3]: *** [gram.c] Error 1
>
> I can't reproduce that.
>

me neither

> In the meantime, attached is a re-merged version of the patch; the old
> version doesn't apply cleanly anymore.
>

Besides a clash in the OID and the value of leakproof missing from the
pg_proc entry, everything works fine.

The only thing is that I don't see a reason for these includes in
src/backend/utils/adt/misc.c:

+ #include "nodes/nodeFuncs.h"
+ #include "utils/lsyscache.h"


-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Robert Haas's message of jue mar 01 21:23:06 -0300 2012:
>> and that, further, you were arguing that we should not support
>> multiple page versions.

> I don't think we need to support multiple page versions, if multiple
> means > 2.

That's exactly the point here.  We clearly cannot support on-line
upgrade unless, somewhere along the line, we are willing to cope with
two page formats more or less concurrently.  What I don't want is for
that requirement to balloon to supporting N formats forever.  If we
do not have a mechanism that allows certifying that you have no
remaining pages of format N-1 before you upgrade to a server that
supports (only) versions N and N+1, then we're going to be in the
business of indefinite backwards compatibility instead.
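
To make that certification step concrete, a hedged sketch (relminversion
and OLDEST_READABLE_PAGE_VERSION are invented names; nothing like this
exists today):

    #include "postgres.h"

    /*
     * Hypothetical gate, run e.g. by pg_upgrade or at first access to a
     * relation: refuse to proceed while the relation still records a
     * minimum page layout older than the oldest one this server can read.
     */
    static void
    check_relation_layout(const char *relname, int relminversion)
    {
        if (relminversion < OLDEST_READABLE_PAGE_VERSION)
            ereport(ERROR,
                    (errmsg("relation \"%s\" may still contain pages of layout version %d",
                            relname, relminversion),
                     errhint("Run the page conversion utility on this relation before upgrading.")));
    }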

I'm not entirely sure, but I think we may all be in violent agreement
about where this needs to end up.

regards, tom lane



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-01 Thread Shigeru Hanada
2012/3/2 Peter Eisentraut :
>>  with renaming to dblink_fdw_validator?
>
> Well, it's not the validator of the dblink_fdw, so maybe something like
> basic_postgresql_fdw_validator.

-1, for the same reason: it's not the validator of a basic_postgresql_fdw
either.

Using "fdw" in the name of a validator which doesn't have an actual FDW might
confuse users.  Would dblink_validator or libpq_option_validator be better?

Another possible idea is creating a dblink_fdw which uses the validator
during "CREATE EXTENSION dblink", for users who store connection
information in FDW objects.

-- 
Shigeru Hanada




Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 8:32 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Excerpts from Robert Haas's message of jue mar 01 21:23:06 -0300 2012:
>>> and that, further, you were arguing that we should not support
>>> multiple page versions.
>
>> I don't think we need to support multiple page versions, if multiple
>> means > 2.
>
> That's exactly the point here.  We clearly cannot support on-line
> upgrade unless, somewhere along the line, we are willing to cope with
> two page formats more or less concurrently.  What I don't want is for
> that requirement to balloon to supporting N formats forever.  If we
> do not have a mechanism that allows certifying that you have no
> remaining pages of format N-1 before you upgrade to a server that
> supports (only) versions N and N+1, then we're going to be in the
> business of indefinite backwards compatibility instead.
>
> I'm not entirely sure, but I think we may all be in violent agreement
> about where this needs to end up.

I was using multiple to mean N>1, so yeah, it's sounding like we're
more or less on the same page here.

One thing I'm not too sure about is how to extend the page format to
handle optional features.  For example, suppose we want to add 2 bytes
to the page header for a checksum (or 4 bytes, or any other number).
Ideally, I'd like to not use up those 2 bytes on pages that aren't
checksummed.  What do we do about that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-03-01 Thread Bruce Momjian
On Thu, Mar 01, 2012 at 08:45:26AM -0500, Robert Haas wrote:
> On Wed, Feb 29, 2012 at 6:02 PM, Bruce Momjian  wrote:
> >> Any ideas about improving the error reporting more generally, so that
> >> when reloading the dump fails, the user can easily see what went
> >> belly-up, even if they didn't use -l?
> >
> > The only idea I have is to write the psql log to a temporary file and
> > report the last X lines from the file in case of failure.  Does that
> > help?
> 
> Why not just redirect stdout but not stderr?  If there are error
> messages, surely we want the user to just see those.

Well, I think sending the error messages to the user but stdout to a
file will leave users confused because it will be unclear which SQL
statement generated the error.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-03-01 Thread Bruce Momjian
On Thu, Mar 01, 2012 at 10:17:04PM +0200, Peter Eisentraut wrote:
> On ons, 2012-02-29 at 22:47 -0500, Bruce Momjian wrote:
> > Hey, that's a good idea.  I would always write the pg_dump output to a
> > log file.  If the dump succeeds, I remove the file, if not, I tell
> > users to read the log file for details about the failure --- good
> > idea.
> 
> But we also need the server log output somewhere.  So I think this temp
> file would need to cover everything that -l covers.

OK, combining your and Robert's ideas, how about this: I have pg_upgrade write
the server log to one file and the pg_dump output (with its stderr) to
another; if pg_upgrade fails, I report the failure and mention those files,
and if it succeeds, I remove them.  pg_upgrade already creates temporary
files that it removes on completion.
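
Roughly the pattern I am thinking of, sketched with made-up file names and
placeholder commands (this is not pg_upgrade's actual code):

    #include <stdio.h>
    #include <stdlib.h>

    #define SERVER_LOG  "pg_upgrade_server.log"
    #define DUMP_LOG    "pg_upgrade_dump.log"

    /* Run one shell step; on failure, leave the logs behind and point at them. */
    static void
    run_step(const char *cmd)
    {
        if (system(cmd) != 0)
        {
            fprintf(stderr,
                    "step failed: %s\n"
                    "Consult \"%s\" and \"%s\" for details.\n",
                    cmd, SERVER_LOG, DUMP_LOG);
            exit(1);
        }
    }

    int
    main(void)
    {
        /* placeholder commands; each step's stdout and stderr go to its log */
        run_step("pg_ctl start -w -l " SERVER_LOG " -D newcluster");
        run_step("psql -f dump.sql postgres > " DUMP_LOG " 2>&1");

        /* success: the logs hold nothing of interest, so remove them */
        remove(SERVER_LOG);
        remove(DUMP_LOG);
        return 0;
    }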

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Tom Lane
Robert Haas  writes:
> One thing I'm not too sure about is how to extend the page format to
> handle optional features.  For example, suppose we want to add 2 bytes
> to the page header for a checksum (or 4 bytes, or any other number).
> Ideally, I'd like to not use up those 2 bytes on pages that aren't
> checksummed.  What do we do about that?

I'm having a hard time getting excited about adding that requirement on
top of all the others that we have for this feature.  In particular, if
we insist on that, there is exactly zero chance of turning checksums on
on-the-fly, because you won't be able to do it if the page is full.

The scheme that seems to me to make the most sense for checksums,
if we grant Simon's postulate that a 2-byte checksum is enough, is
(1) repurpose the top N bits of pd_flags as a page version number,
for some N;
(2) repurpose pd_pagesize_version as the checksum, with the
understanding that you can't have a checksum in version-0 pages.

(Simon's current patch seems to be an overengineered version of that.
Possibly we should also ask ourselves how much we really need pd_tli
and whether that couldn't be commandeered as the checksum field.)
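
For concreteness, one way that layout could be spelled -- assuming, purely
for illustration, N = 3 and invented macro names; this is not from Simon's
patch:

    #include "postgres.h"
    #include "storage/bufpage.h"

    /*
     * Invented names: the top three bits of pd_flags carry the page layout
     * version, and pd_pagesize_version is reused as the checksum.  Layout
     * version 0 means an old-style page, in which case pd_pagesize_version
     * still holds size|version and there is no checksum.
     */
    #define PD_VERSION_MASK     0xE000
    #define PD_VERSION_SHIFT    13

    #define PageGetLayoutVersion(page) \
        ((((PageHeader) (page))->pd_flags & PD_VERSION_MASK) >> PD_VERSION_SHIFT)

    #define PageSetLayoutVersion(page, v) \
        (((PageHeader) (page))->pd_flags = \
         (((PageHeader) (page))->pd_flags & ~PD_VERSION_MASK) | \
         ((uint16) (v) << PD_VERSION_SHIFT))

    /* Only meaningful when PageGetLayoutVersion(page) > 0 */
    #define PageGetChecksum(page) \
        (((PageHeader) (page))->pd_pagesize_version)
    #define PageSetChecksum(page, sum) \
        (((PageHeader) (page))->pd_pagesize_version = (uint16) (sum))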

I see the page versioning stuff as mainly being of interest for changes
that are more invasive than this, for instance tuple-header or data
changes.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Peter Geoghegan
On 1 March 2012 22:09, Josh Berkus  wrote:
> On 3/1/12 1:57 PM, Daniel Farina wrote:
>> On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan  
>> wrote:
>>> My expectation is that this feature will make life a lot
>>> easier for a lot of Postgres users.
>>
>> Yes.  It's hard to overstate the apparent utility of this feature in
>> the general category of visibility and profiling.
>
> More importantly, this is what pg_stat_statements *should* have been in
> 8.4, and wasn't.

It would probably be prudent to concentrate on getting the core
infrastructure committed first.  That way, we at least know that if
this doesn't get into 9.2, we can work on getting it into 9.3, knowing
that, once it is committed, people won't have to wait over a year at
the very least to be able to use the feature.  The footprint of such a
change is quite small:

 src/backend/nodes/copyfuncs.c  |2 +
 src/backend/nodes/equalfuncs.c |4 +
 src/backend/nodes/outfuncs.c   |6 +
 src/backend/nodes/readfuncs.c  |5 +
 src/backend/optimizer/plan/planner.c   |1 +
 src/backend/parser/analyze.c   |   37 +-
 src/backend/parser/parse_coerce.c  |   12 +-
 src/backend/parser/parse_param.c   |   11 +-
 src/include/nodes/parsenodes.h |3 +
 src/include/nodes/plannodes.h  |2 +
 src/include/parser/analyze.h   |   12 +
 src/include/parser/parse_node.h|3 +-

That said, I believe that the patch is pretty close to a committable
state as things stand, and that all that is really needed is for a
committer familiar with the problem space to conclude the work started
by Daniel and others, adding:

contrib/pg_stat_statements/pg_stat_statements.c| 1420 ++-

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-03-01 Thread Robert Haas
On Thu, Mar 1, 2012 at 9:11 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> One thing I'm not too sure about is how to extend the page format to
>> handle optional features.  For example, suppose we want to add 2 bytes
>> to the page header for a checksum (or 4 bytes, or any other number).
>> Ideally, I'd like to not use up those 2 bytes on pages that aren't
>> checksummed.  What do we do about that?
>
> I'm having a hard time getting excited about adding that requirement on
> top of all the others that we have for this feature.  In particular, if
> we insist on that, there is exactly zero chance of turning checksums on
> on-the-fly, because you won't be able to do it if the page is full.
>
> The scheme that seems to me to make the most sense for checksums,
> if we grant Simon's postulate that a 2-byte checksum is enough, is
> (1) repurpose the top N bits of pd_flags as a page version number,
>    for some N;
> (2) repurpose pd_pagesize_version as the checksum, with the
>    understanding that you can't have a checksum in version-0 pages.
>
> (Simon's current patch seems to be an overengineered version of that.
> Possibly we should also ask ourselves how much we really need pd_tli
> and whether that couldn't be commandeered as the checksum field.)
>
> I see the page versioning stuff as mainly being of interest for changes
> that are more invasive than this, for instance tuple-header or data
> changes.

Well, OK, so...  it wouldn't bother me a bit to steal pd_tli for this,
although Simon previously objected to stealing even half of it, when I
suggested that upthread.

But I don't see the point of your first proposal: keeping the page
version right where it is is a good idea, and we should do it.  We
could steal some *high*-order bits from that field without breaking
anything, but moving it around seems like it will complicate life to
no good purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Canceling ROLLBACK statement

2012-03-01 Thread Pavan Deolasee
Hi All,

While working on something in XC, I hit upon an assertion failure.  While
this is in XC code, I believe there may be a way of reproducing it on
vanilla PostgreSQL as well.  I could not do so even after several tries,
unless I added some code or ran it through a debugger.  Here is the theory
anyway.

If a client is running a ROLLBACK statement and sends a statement cancel
signal to the server, and the cancel signal gets processed after
AbortTransaction() has completed but before CleanupTransaction() is called,
I think the server may try to abort the transaction again, especially if it
raises an ereport(ERROR) reporting "canceling statement due to user
request".  That results in the server throwing a warning "AbortTransaction
while in ABORT state" and subsequently an assertion failure in
ProcArrayEndTransaction because proc->xid is not valid.

The reason I believe I am not able to reproduce this is that interrupts are
usually not processed between AbortTransaction and CleanupTransaction.  And
when the server does get a chance to process the interrupts, it is in
DoingCommandRead, and hence the cancel statement event is ignored.  But it's
not entirely unlikely that the server may get a chance to process the
interrupt earlier -- for example, if CleanupTransaction() calls elog(), and
there are a bunch of possible code paths where that can happen.
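
Paraphrasing the relevant cancel logic from memory (this is not a verbatim
quote of postgres.c, just the shape of the path I am worried about):

    /* simplified paraphrase of ProcessInterrupts(), for illustration only */
    if (QueryCancelPending)
    {
        QueryCancelPending = false;
        if (!DoingCommandRead)
            ereport(ERROR,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling statement due to user request")));
        /* while DoingCommandRead is set, the cancel is simply ignored */
    }

    /*
     * The window: if that ereport(ERROR) fires after AbortTransaction() has
     * finished but before CleanupTransaction() runs, error recovery enters
     * AbortTransaction() a second time while the state is still TRANS_ABORT,
     * producing the "AbortTransaction while in ABORT state" warning and the
     * assertion failure in ProcArrayEndTransaction on the invalid proc->xid.
     */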

I am not sure if this is really an issue or whether it's worth fixing, but I
thought it would be good to share it at the least.

Thanks,
Pavan
-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Canceling ROLLBACK statement

2012-03-01 Thread Tom Lane
Pavan Deolasee  writes:
> If client is running a ROLLBACK statement and sends a statement cancel
> signal to the server and if the cancel signal gets processed after the
> transaction AbortTransaction() is completed, but before
> CleanupTransaction() is called, I think the server may try to ABORT the
> transaction again,

This should be okay.  If you can demonstrate a case where it causes
problems, that would be a bug to fix, but I don't think it's likely
to be anything fundamental.

regards, tom lane
