Re: [HACKERS] pg_upgrade --logfile option documentation

2012-02-29 Thread Bruce Momjian
On Wed, Feb 29, 2012 at 06:22:27PM -0500, A.M. wrote:
> 
> On Feb 29, 2012, at 6:02 PM, Bruce Momjian wrote:
> 
> > On Wed, Feb 29, 2012 at 04:34:24PM -0500, Robert Haas wrote:
> >> On Tue, Feb 28, 2012 at 9:45 PM, Bruce Momjian  wrote:
> >>> OK, I have implemented both Robert's and Álvaro's ideas in my patch.
> >>> I only add the .old suffix to pg_controldata when link mode is used, and
> >>> I now do it after the schema has been created (the most common failure
> >>> case for pg_upgrade), and just before we actually link files --- both
> >>> very good ideas.
> >> 
> >> Thanks for working on this.  I think this will be a significant
> >> usability improvement.
> > 
> > Glad I got such good feedback and ideas.
> > 
> >> Any ideas about improving the error reporting more generally, so that
> >> when reloading the dump fails, the user can easily see what went
> >> belly-up, even if they didn't use -l?
> > 
> > The only idea I have is to write the psql log to a temporary file and
> > report the last X lines from the file in case of failure.  Does that
> > help?
> > 
> 
> 
> Perhaps pg_upgrade can print the path to the temp file containing the
> log and instruct the user to look there for more detail.

Hey, that's a good idea.  I would always write the pg_dump output to a
log file.  If the dump succeeds, I remove the file; if not, I tell users
to read the log file for details about the failure --- good idea.
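A minimal sketch of that keep-the-log-only-on-failure pattern (a hypothetical
helper, not pg_upgrade's actual code; the shell redirection and log path
handling are illustrative):

/*
 * Hedged sketch of the idea above: run a command with its output
 * captured in a log file, keep the file only if the command fails,
 * and tell the user where to look.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void
run_logged(const char *cmd, const char *logfile)
{
    char        full_cmd[1024];

    /* capture both stdout and stderr of the command in the log file */
    snprintf(full_cmd, sizeof(full_cmd), "%s >> \"%s\" 2>&1", cmd, logfile);

    if (system(full_cmd) != 0)
    {
        fprintf(stderr, "command failed; see \"%s\" for details\n", logfile);
        exit(1);
    }

    unlink(logfile);    /* success: the log is no longer interesting */
}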

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Client Messages

2012-02-29 Thread Fujii Masao
On Thu, Mar 1, 2012 at 4:13 AM, Alvaro Herrera
 wrote:
>
> Excerpts from Heikki Linnakangas's message of Thu Jan 26 15:58:58 -0300 2012:
>> On 26.01.2012 17:31, Tom Lane wrote:
>
>> > The idea that occurs to me is to have the code that uses the GUC do a
>> > verify_mbstr(noerror) on it, and silently ignore it if it doesn't pass
>> > (maybe with a LOG message).  This would have to be documented of course,
>> > but it seems better than the potential consequences of trying to send a
>> > wrongly-encoded string.
>>
>> Hmm, fine with me. It would be nice to plug the hole that these bogus
>> characters can leak elsewhere into the system through current_setting,
>> though. Perhaps we could put the verify_mbstr() call somewhere in guc.c,
>> to forbid incorrectly encoded characters from being stored in the guc
>> variable in the first place.
>
> This patch is listed as "Needs review" but that seems to be wrong --
> it's "waiting on author", I think.

Yes. I marked the patch as "waiting on author".

>  Do we have an updated patch?  Fujii?

No. I believe that the author Jim will submit the updated version.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Tom Lane
Jim Nasby  writes:
> On 2/29/12 3:53 PM, Alvaro Herrera wrote:
>> .. in fact this is precisely what killed Zdenek Kotala's idea of
>> upgrading.

> This is also why Simon has avoided the whole upgrade thing with his 16-bit
> checksum idea (otherwise presumably we'd be looking at bigger checksums
> anyway).

> I get that fussing around with the version field is ugly. If there were
> another way to do this without breaking pg_upgrade then it would be silly to
> mess with the version field. Unfortunately, there is no other way.

Fundamentally, what is going on here is that several of us think that we
need a page format upgrade infrastructure first, and then we can think
about adding checksums.  Simon would like to cram checksums in without
building such infrastructure, regardless of the consequences in code
ugliness or future maintainability.  Personally, I think that is a bad
tradeoff.  Eventually we are going to have to build that infrastructure,
and the more we've kluged the page header format in the meanwhile, the
more unpleasant it's going to be.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-29 Thread Alvaro Herrera


I'm curious about the LeafNode stuff.  Is this something that could be
done by expression_tree_walker?  I'm not completely familiar with it so
maybe there's some showstopper such as some node tags not being
supported, or maybe it just doesn't help.  But I'm curious.
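For readers who haven't used it, the expression_tree_walker() idiom looks
roughly like this (a generic sketch; the callback and context struct here are
invented for illustration, not taken from the patch under discussion):

#include "postgres.h"
#include "nodes/nodeFuncs.h"
#include "nodes/primnodes.h"

typedef struct WalkState
{
    int         n_funcexprs;    /* whatever the walk is collecting */
} WalkState;

static bool
count_funcexprs_walker(Node *node, WalkState *state)
{
    if (node == NULL)
        return false;
    if (IsA(node, FuncExpr))
        state->n_funcexprs++;
    /* recurse into all children; a "true" result aborts the walk */
    return expression_tree_walker(node, count_funcexprs_walker,
                                  (void *) state);
}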

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Jim Nasby

On 2/29/12 3:53 PM, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of Wed Feb 29 18:34:27 -0300 2012:
>> Robert Haas  writes:
>>> On Wed, Feb 29, 2012 at 2:33 PM, Heikki Linnakangas
>>>  wrote:
>>>> The utility would run in the old cluster before upgrading, so the flag
>>>> would have to be present in the old version. pg_upgrade would check that
>>>> the flag is set, refusing to upgrade if it isn't, with an error like
>>>> "please run pre-upgrade utility first".
>>>
>>> I find that a pretty unappealing design; it seems to me it'd be much
>>> easier to make the new cluster cope with everything.
>>
>> Easier for who?  I don't care for the idea of code that has to cope with
>> two page formats, or before long N page formats, because if we don't
>> have some mechanism like this then we will never be able to decide that
>> an old data format is safely dead.
>
> .. in fact this is precisely what killed Zdenek Kotala's idea of
> upgrading.


This is also why Simon has avoided the whole upgrade thing with his 16-bit
checksum idea (otherwise presumably we'd be looking at bigger checksums anyway).

I get that fussing around with the version field is ugly. If there were another
way to do this without breaking pg_upgrade then it would be silly to mess with
the version field. Unfortunately, there is no other way.

Page checksumming is something a lot of people (myself included) want to see.
Being able to get it in 9.2 would be a big win over crossing our fingers that
at some point in the future (who knows when) we'll maybe figure out the page
format upgrade issue. While we should definitely be looking into that issue,
it's not going to get fixed in 9.2. ISTM that checksums are actually ready to
go if people can just swallow the bitter pill of a screwed-up page version
field until we can actually get an upgrade utility in place (and until we get
that utility in place, I don't see that the version field would do us any good
anyway...)
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-02-29 Thread A.M.

On Feb 29, 2012, at 6:02 PM, Bruce Momjian wrote:

> On Wed, Feb 29, 2012 at 04:34:24PM -0500, Robert Haas wrote:
>> On Tue, Feb 28, 2012 at 9:45 PM, Bruce Momjian  wrote:
> >>> OK, I have implemented both Robert's and Álvaro's ideas in my patch.
>>> I only add the .old suffix to pg_controldata when link mode is used, and
>>> I now do it after the schema has been created (the most common failure
>>> case for pg_upgrade), and just before we actually link files --- both
>>> very good ideas.
>> 
>> Thanks for working on this.  I think this will be a significant
>> usability improvement.
> 
> Glad I got such good feedback and ideas.
> 
>> Any ideas about improving the error reporting more generally, so that
>> when reloading the dump fails, the user can easily see what went
>> belly-up, even if they didn't use -l?
> 
> The only idea I have is to write the psql log to a temporary file and
> report the last X lines from the file in case of failure.  Does that
> help?
> 


Perhaps pg_upgrade can print the path to the temp file containing the log and 
instruct the user to look there for more detail.

Cheers,
M






Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Nathan Boley  writes:
> On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane  wrote:
>> Nathan Boley  writes:
>>> If I understand your suggestion, queries of the form
>>> SELECT * FROM rel
>>> WHERE ARRAY[ 1,2,3,4 ] <= x
>>>      AND x <= ARRAY[ 1, 2, 3, 1000];
>>> would no longer use an index. Is that correct?

>> No, just that we'd no longer have statistics relevant to that, and would
>> have to fall back on default selectivity assumptions.

> Which, currently, would mean queries of that form would typically use
> a table scan, right?

No, it doesn't.

> What about MCV's? Will those be removed as well?

Sure.  Those seem even less useful.

regards, tom lane



Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Nathan Boley
On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane  wrote:
> Nathan Boley  writes:
>> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane  wrote:
>>> I am starting to look at this patch now.  I'm wondering exactly why the
>>> decision was made to continue storing btree-style statistics for arrays,
>>> in addition to the new stuff.
>
>> If I understand your suggestion, queries of the form
>
>> SELECT * FROM rel
>> WHERE ARRAY[ 1,2,3,4 ] <= x
>>      AND x <= ARRAY[ 1, 2, 3, 1000];
>
>> would no longer use an index. Is that correct?
>
> No, just that we'd no longer have statistics relevant to that, and would
> have to fall back on default selectivity assumptions.

Which, currently, would mean queries of that form would typically use
a table scan, right?

> Do you think that
> such applications are so common as to justify bloating pg_statistic for
> everybody that uses arrays?

I have no idea, but it seems like it will be a substantial regression
for the people who are.

What about MCV's? Will those be removed as well?

Best,
Nathan



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-02-29 Thread Bruce Momjian
On Wed, Feb 29, 2012 at 04:34:24PM -0500, Robert Haas wrote:
> On Tue, Feb 28, 2012 at 9:45 PM, Bruce Momjian  wrote:
> > OK, I have implemented both Robert's and Álvaro's ideas in my patch.
> > I only add the .old suffix to pg_controldata when link mode is used, and
> > I now do it after the schema has been created (the most common failure
> > case for pg_upgrade), and just before we actually link files --- both
> > very good ideas.
> 
> Thanks for working on this.  I think this will be a significant
> usability improvement.

Glad I got such good feedback and ideas.

> Any ideas about improving the error reporting more generally, so that
> when reloading the dump fails, the user can easily see what went
> belly-up, even if they didn't use -l?

The only idea I have is to write the psql log to a temporary file and
report the last X lines from the file in case of failure.  Does that
help?
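A sketch of that last-X-lines idea (hypothetical, not pg_upgrade code;
TAIL_LINES and the ring-buffer approach are illustrative):

/*
 * Hedged sketch of "show the last X lines of the log on failure".
 * Keeps a ring buffer of the most recent lines while reading.
 */
#include <stdio.h>

#define TAIL_LINES  20
#define MAX_LINE    1024

static void
print_log_tail(const char *logfile)
{
    static char ring[TAIL_LINES][MAX_LINE];
    long        count = 0;
    FILE       *fp = fopen(logfile, "r");

    if (fp == NULL)
        return;
    while (fgets(ring[count % TAIL_LINES], MAX_LINE, fp) != NULL)
        count++;
    fclose(fp);

    for (long i = (count > TAIL_LINES) ? count - TAIL_LINES : 0; i < count; i++)
        fputs(ring[i % TAIL_LINES], stdout);
}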

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-02-29 Thread Tom Lane
Robert Haas  writes:
> On Wed, Feb 29, 2012 at 2:08 PM, Tom Lane  wrote:
>> I think you're just assuming that without any solid evidence.  My point
>> is precisely that if the more-parameterized path *fails* to generate
>> fewer rows, we want add_path to notice that and throw it out (or at
>> least be able to throw it out, if there's not another reason to keep it).

> Well, my "evidence" is that a parameterized path should pretty much
> always include a parameterized path somewhere in there - otherwise,
> what is parameterization doing for us?

Well, yes, we know that much.

> And that's going to reduce the
> row count.  I may be missing something, but I'm confused as to why
> this isn't nearly tautological.

We don't know that --- I will agree it's likely, but that doesn't make
it so certain that we can assume it without checking.  A join condition
won't necessarily eliminate any rows.

(... thinks about that for awhile ...)  One thing we could possibly do
is have indxpath.c arbitrarily reject parameterizations that don't
produce a smaller estimated number of rows than an unparameterized scan.
Admittedly, this still doesn't *prove* the assumption for join
relations, but maybe it brings the odds to where it's okay for add_path
to make such an assumption.

(... thinks some more ...)  No, that doesn't get us there, because that
doesn't establish that a more-parameterized path produces fewer rows
than some path that requires less parameterization, yet not none at
all.  You really want add_path carrying out those comparisons.  In your
previous example, it's entirely possible that path D is dominated by B
or C because of poor choices of join quals.

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Tom Lane
Robert Haas  writes:
> On Wed, Feb 29, 2012 at 4:34 PM, Tom Lane  wrote:
>> Easier for who?  I don't care for the idea of code that has to cope with
>> two page formats, or before long N page formats, because if we don't
>> have some mechanism like this then we will never be able to decide that
>> an old data format is safely dead.

> Huh?  You can drop support for an old page format any time you like.
> You just decree that version X+1 no longer supports it, and you can't
> pg_upgrade to it until all traces of the old page format are gone.

And how would a DBA know that?

> If you're going to require an offline rewrite of the whole old cluster
> before doing the upgrade, how much better is it than just breaking the
> page format and telling pg_upgrade users they're out of luck?

The difference is that people aren't playing Russian roulette with their
data when they upgrade.  The point of the mechanisms being discussed
here is to know, for certain, that a large database no longer contains
pages of the old format.

regards, tom lane



Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Nathan Boley  writes:
> On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane  wrote:
>> I am starting to look at this patch now.  I'm wondering exactly why the
>> decision was made to continue storing btree-style statistics for arrays,
>> in addition to the new stuff.

> If I understand your suggestion, queries of the form

> SELECT * FROM rel
> WHERE ARRAY[ 1,2,3,4 ] <= x
>  AND x <= ARRAY[ 1, 2, 3, 1000];

> would no longer use an index. Is that correct?

No, just that we'd no longer have statistics relevant to that, and would
have to fall back on default selectivity assumptions.  Do you think that
such applications are so common as to justify bloating pg_statistic for
everybody that uses arrays?

regards, tom lane



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 2:08 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Feb 29, 2012 at 1:40 PM, Tom Lane  wrote:
>>> Indeed, and the code already knows that.  However, in this example, path
>>> A is capable of dominating the other three (being strictly less
>>> parameterized than them), and B and C are each capable of dominating D.
>>> The problem is just that I'd neglected to consider that rowcount now
>>> also becomes a figure of merit.
>
>> In theory, yes, but in practice, won't it nearly always be the case
>> that a less parameterized path generates more rows, and a more
>> parameterized path generates less rows, so that neither dominates the
>> other?
>
> I think you're just assuming that without any solid evidence.  My point
> is precisely that if the more-parameterized path *fails* to generate
> fewer rows, we want add_path to notice that and throw it out (or at
> least be able to throw it out, if there's not another reason to keep it).

Well, my "evidence" is that a parameterized path should pretty much
always include a parameterized path somewhere in there - otherwise,
what is parameterization doing for us?  And that's going to reduce the
row count.  I may be missing something, but I'm confused as to why
this isn't nearly tautological.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Nathan Boley
On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane  wrote:
> Alexander Korotkov  writes:
>> On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch  wrote:
>>> I've attached a new version that includes the UINT64_FMT fix, some edits of
>>> your newest comments, and a rerun of pgindent on the new files.  I see no
>>> other issues precluding commit, so I am marking the patch Ready for
>>> Committer.
>>> If I made any of the comments worse, please post another update.
>
>> Changes looks reasonable for me. Thanks!
>
> I am starting to look at this patch now.  I'm wondering exactly why the
> decision was made to continue storing btree-style statistics for arrays,
> in addition to the new stuff.

If I understand your suggestion, queries of the form

SELECT * FROM rel
WHERE ARRAY[ 1,2,3,4 ] <= x
 AND x <= ARRAY[ 1, 2, 3, 1000];

would no longer use an index. Is that correct?

Are you suggesting removing MCV's in favor of MCE's as well?

-Nathan



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-02-29 Thread Amit Kapila
>The most obvious thing to do about this is to consider that one path can
>dominate another on cost only if it is both cheaper *and* produces no
>more rows.  But I'm concerned about the cost of inserting yet another
>test condition into add_path, which is slow enough already.  Has anyone
>got an idea for a different formulation that would avoid that?

Will it discard the seqscan path if its row count is higher but its cost is
lower, or will it keep both paths and discard one of them later, based on the
cost estimates for the join?
Could it work so that, if the seqscan has lower cost and its row count is
greater only by a certain threshold, the seqscan is kept; otherwise the index
scan would dominate?

-Original Message-
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Wednesday, February 29, 2012 4:05 AM
To: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] Parameterized-path cost comparisons need some work

I was experimenting today with a test case sent to me by somebody at Red
Hat.  The details aren't too important, except that it involves an inner
join on the inside of a left join (where it cannot commute with the left
join).  I can reproduce the behavior with a standard regression test
table, if I crank random_page_cost up a bit:

regression=# set random_page_cost TO 5;
SET
regression=# explain select * from tenk1 t1 left join (tenk1 t2 join tenk1
t3 on t2.thousand = t3.unique2) on t1.hundred = t2.hundred where t1.unique1
= 1;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..507.16 rows=100 width=732)
   ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..10.28 rows=1 width=244)
         Index Cond: (unique1 = 1)
   ->  Nested Loop  (cost=0.00..495.63 rows=100 width=488)
         ->  Index Scan using tenk1_hundred on tenk1 t2  (cost=0.00..446.98 rows=100 width=244)
               Index Cond: (t1.hundred = hundred)
         ->  Index Scan using tenk1_unique2 on tenk1 t3  (cost=0.00..0.48 rows=1 width=244)
               Index Cond: (unique2 = t2.thousand)
(8 rows)

regression=# set random_page_cost TO 6;
SET
regression=# explain select * from tenk1 t1 left join (tenk1 t2 join tenk1
t3 on t2.thousand = t3.unique2) on t1.hundred = t2.hundred where t1.unique1
= 1;
                                         QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Right Join  (cost=928.30..2573.80 rows=100 width=732)
   Hash Cond: (t2.hundred = t1.hundred)
   ->  Hash Join  (cost=916.00..2523.00 rows=1 width=488)
         Hash Cond: (t2.thousand = t3.unique2)
         ->  Seq Scan on tenk1 t2  (cost=0.00..458.00 rows=1 width=244)
         ->  Hash  (cost=458.00..458.00 rows=1 width=244)
               ->  Seq Scan on tenk1 t3  (cost=0.00..458.00 rows=1 width=244)
   ->  Hash  (cost=12.29..12.29 rows=1 width=244)
         ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..12.29 rows=1 width=244)
               Index Cond: (unique1 = 1)
(10 rows)

WTF?  How did it suddenly fail to find the double-nested-loop plan, and
have to settle for a much worse plan?

The reason seems to be that for large enough random_page_cost, the
seqscan on t2 is costed as cheaper than an indexscan that selects a
significant fraction of the table.  (Notice the t2 scans are nearly the
same cost in these two examples.)  That causes add_path to decide that
the unparameterized seqscan is unconditionally better than the
parameterized indexscan, and throw out the latter.  Now it is impossible
to form the parameterized nestloop subplan.

The flaw in this logic, of course, is that the seqscan might be cheaper
than the parameterized indexscan, but *it produces a whole lot more
rows*, meaning that any subsequent join will be a lot more expensive.
Previously add_path didn't have to worry about that, because all
ordinary paths for a given relation produce the same number of rows
(and we studiously kept things like inner indexscan paths out of
add_path's view of the world).

The most obvious thing to do about this is to consider that one path can
dominate another on cost only if it is both cheaper *and* produces no
more rows.  But I'm concerned about the cost of inserting yet another
test condition into add_path, which is slow enough already.  Has anyone
got an idea for a different formulation that would avoid that?
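For concreteness, the extra test would amount to something like this (a hedged
simplification, not the real add_path(), which also weighs startup vs. total
cost tradeoffs and pathkeys; a per-Path "rows" field is assumed here):

/*
 * Sketch only: does path a dominate path b once row count becomes a
 * figure of merit?  A path must now be no more expensive AND produce
 * no more rows to win on this axis.
 */
static bool
path_dominates(Path *a, Path *b)
{
    return a->total_cost <= b->total_cost &&
           a->startup_cost <= b->startup_cost &&
           a->rows <= b->rows;          /* the proposed extra criterion */
}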

regards, tom lane



[HACKERS] Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)

2012-02-29 Thread Michael Tautschnig
Hi all,

[Bcc'ed Tom Lane as he had done the initial investigation on this.]

Following up on the earlier discussions in

[1] http://archives.postgresql.org/pgsql-hackers/2010-11/msg01575.php
and
[2] http://archives.postgresql.org/pgsql-hackers/2011-08/msg00330.php
with an initial fix in
[3] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4e15a4db5e65e43271f8d20750d6500ab12632d0

we (a group of researchers from Oxford and London) did a more formal analysis
using a chain of automated verification tools. This suite of tools enables us to
automatically check for the presence or absence of bugs in (smaller) pieces of
software when run under weak memory model semantics, as is the case on x86 or
PowerPC.  [In this context we are always interested in further source code
examples that make significant use of shared-memory concurrency on such
platforms.]

Using our tool chain, we were eager to (1) confirm the bug and (2) validate the
proposed fix. See the very end for our analysis of the proposed fix.

The example that Tom Lane initially posted in [2] can be further simplified to
the following bit of self-contained (C) code (with line numbers), where
WaitLatch is a simple busy wait as "while(!latch[.]);" and ResetLatch is just
"latch[.]=0;"  Then running two of these worker functions in parallel suffices
to reproduce the problem that was initially reported.

 1  #define WORKERS 2
 2  volatile _Bool latch[WORKERS];
 3  volatile _Bool flag[WORKERS];
 4
 5  void worker(int i)
 6  {
 7    while(!latch[i]);
 8    for(;;)
 9    {
10      assert(!latch[i] || flag[i]);
11      latch[i] = 0;                    /* ResetLatch */
12      if(flag[i])
13      {
14        flag[i] = 0;
15        flag[(i+1) % WORKERS] = 1;     /* note: (i+1) % WORKERS, not i+1 % WORKERS */
16        latch[(i+1) % WORKERS] = 1;    /* SetLatch on the next worker */
17      }
18
19      while(!latch[i]);                /* WaitLatch */
20    }
21  }

We developed a tool that is able to analyse a piece of concurrent C code and
determine whether it contains bugs.  Our tool confirms the presence of at
least two bugs in the piece of code under discussion.


The first problem corresponds to a message passing idiom (view with fixed-width
font; the info in front of each statement states the line number each
worker/thread is executing):

       Worker 0      |        Worker 1
=====================+===========================
(0:15) flag[1]=1;    | (1:7)  while (!latch[1]);
(0:16) latch[1]=1;   | (1:12) if (flag[1])

where we can observe latch[1] holding 1 and flag[1] holding 0 in the end.

This behaviour can happen on the PowerPC architecture for three reasons.  First,
write-write pairs can be reordered, for example the write to flag and the write
to latch by Worker 0.  Second, read-read pairs can be reordered, for example the
read of the while condition and the read of the if condition by Worker 1.
Finally, the store to latch by Worker 0 might not be atomic.

This corresponds to a scenario where we first execute Worker 0 up to line 17.
All the reads and writes go directly to memory, except the assignments at lines
14 and 15, where the values 0 and 1 are stored respectively into the
processor-local buffers of flag[0] and flag[1].  Then the second worker starts,
reading the freshly updated value 1 of latch[1]. It then exits the blocking
while (line 7).  But latch[1] still holds 1, and flag[1] is still holding 0, as
Worker 0 has not yet flushed the write of 1 waiting in its buffer. This means
that the condition of the if statement would not be true, the critical section
would be skipped, and the program would arrive at line 19, without having
authorised the next worker to enter the critical section, and would loop forever.

This seems to correspond to the scenario discussed in [2] above, where the wait
in line 19 times out.  This is confirmed by the fact that this behaviour is
commonly observed on several generations of Power machines (e.g. 1.7G/167G on
Power 7).


Let us now focus on the second bug; it corresponds to a load buffering idiom

       Worker 0       |       Worker 1
======================+======================
(0:12) if (flag[0])   | (1:12) if (flag[1])
(0:15) flag[1]=1;     | (1:15) flag[0]=1;

where we can observe both flag[0] and flag[1] holding 1 in the end.

This behaviour is valid on the PowerPC architecture for two reasons. First,
PowerPC can reorder read-write pairs, hence the read of the if condition on
either thread can be done after setting the flag to 1.  Second, the fact
that PowerPC relaxes the atomicity of stores is another reason for this to
happen.  Our understanding is that this behaviour is not yet exhibited by
real-world Power machines.  Yet, since it is documented to be permitted by
the architecture, this could lead to actual bugs on future generations of
Power machines if not synchronised correctly.


In [3] it was suggested to fix the problem by placing a barrier in ResetLatch,
which corresponds to placing it between lines 11 and 12 in the code above.  This
amounts to placing a barrier between the two reads (lines 7/19 and 12, i.e.,
between WaitLatch and the if(flag[1]) ) of Worker 1.
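In code, the proposed placement looks roughly like this (a sketch against the
simplified array-based worker above, using the pg_memory_barrier() primitive
from the 9.2 development tree; the real ResetLatch() operates on a Latch
struct rather than this example's array):

latch[i] = 0;           /* line 11: ResetLatch clears the latch */
pg_memory_barrier();    /* full barrier: the store above cannot be
                         * reordered past the load below */
if (flag[i])            /* line 12: now guaranteed to see a flag write
                         * that preceded the latch being set */
{
    /* hand work to the next worker, as before */
}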

Placing a sync (i.e., the stro

Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 4:34 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Feb 29, 2012 at 2:33 PM, Heikki Linnakangas
>>  wrote:
>>> The utility would run in the old cluster before upgrading, so the flag
>>> would have to be present in the old version. pg_upgrade would check that the
>>> flag is set, refusing to upgrade if it isn't, with an error like "please run
>>> pre-upgrade utility first".
>
>> I find that a pretty unappealing design; it seems to me it'd be much
>> easier to make the new cluster cope with everything.
>
> Easier for who?  I don't care for the idea of code that has to cope with
> two page formats, or before long N page formats, because if we don't
> have some mechanism like this then we will never be able to decide that
> an old data format is safely dead.

Huh?  You can drop support for an old page format any time you like.
You just decree that version X+1 no longer supports it, and you can't
pg_upgrade to it until all traces of the old page format are gone.

If you're going to require an offline rewrite of the whole old cluster
before doing the upgrade, how much better is it than just breaking the
page format and telling pg_upgrade users they're out of luck?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Wed Feb 29 18:34:27 -0300 2012:
> 
> Robert Haas  writes:
> > On Wed, Feb 29, 2012 at 2:33 PM, Heikki Linnakangas
> >  wrote:
> >> The utility would run in the old cluster before upgrading, so the flag
> >> would have to be present in the old version. pg_upgrade would check that
> >> the flag is set, refusing to upgrade if it isn't, with an error like
> >> "please run pre-upgrade utility first".
> 
> > I find that a pretty unappealing design; it seems to me it'd be much
> > easier to make the new cluster cope with everything.
> 
> Easier for who?  I don't care for the idea of code that has to cope with
> two page formats, or before long N page formats, because if we don't
> have some mechanism like this then we will never be able to decide that
> an old data format is safely dead.

.. in fact this is precisely what killed Zdenek Kotala's idea of
upgrading.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Tom Lane
Robert Haas  writes:
> On Wed, Feb 29, 2012 at 2:33 PM, Heikki Linnakangas
>  wrote:
>> The utility would run in the old cluster before upgrading, so the flag
>> would have to be present in the old version. pg_upgrade would check that the
>> flag is set, refusing to upgrade if it isn't, with an error like "please run
>> pre-upgrade utility first".

> I find that a pretty unappealing design; it seems to me it'd be much
> easier to make the new cluster cope with everything.

Easier for who?  I don't care for the idea of code that has to cope with
two page formats, or before long N page formats, because if we don't
have some mechanism like this then we will never be able to decide that
an old data format is safely dead.

regards, tom lane
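To make the cost being debated here concrete, "coping with two page formats"
would mean version dispatch on every page read, along these lines (a
hypothetical sketch: PageGetPageLayoutVersion() is a real bufpage.h macro, but
the new layout version number and the conversion helper are invented):

/*
 * Hypothetical sketch of on-read conversion between page layouts;
 * constants and the helper are for illustration only.
 */
static void
convert_page_if_needed(Page page)
{
    if (PageGetPageLayoutVersion(page) == 4)        /* pre-checksum layout */
        convert_page_layout_v4_to_v5(page);         /* invented helper */
    else if (PageGetPageLayoutVersion(page) != 5)
        elog(ERROR, "unexpected page layout version %u",
             PageGetPageLayoutVersion(page));
}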



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-02-29 Thread Robert Haas
On Tue, Feb 28, 2012 at 9:45 PM, Bruce Momjian  wrote:
> OK, I have implemented both Robert's and Álvaro's ideas in my patch.
> I only add the .old suffix to pg_controldata when link mode is used, and
> I now do it after the schema has been created (the most common failure
> case for pg_upgrade), and just before we actually link files --- both
> very good ideas.

Thanks for working on this.  I think this will be a significant
usability improvement.

Any ideas about improving the error reporting more generally, so that
when reloading the dump fails, the user can easily see what went
belly-up, even if they didn't use -l?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 2:33 PM, Heikki Linnakangas
 wrote:
> On 29.02.2012 21:30, Robert Haas wrote:
>>
>> On Wed, Feb 29, 2012 at 2:18 PM, Alvaro Herrera
>>   wrote:
>>>
>>> Note that if we want such a utility to walk and transform pages, we
>>> probably need a marker in the catalogs somewhere so that pg_upgrade can
>>> make sure that it was done in all candidate tables -- which is something
>>> that we should get in 9.2 so that it can be used in 9.3.  Such a marker
>>> would also allow us to get rid of HEAP_MOVED_IN and HEAP_MOVED_OUT.
>>
>>
>> Getting rid of HEAP_MOVED_IN and HEAP_MOVED_OUT would be really nice,
>> but I don't see why we need to squeeze anything into 9.2 for any of
>> this.  pg_upgrade can certainly handle the addition of a new pg_class
>> column, and can arrange for in-place upgrades from pre-9.3 versions to
>> 9.3 to set the flag to the appropriate value.
>
> The utility would run in the old cluster before upgrading, so the flag
> would have to be present in the old version. pg_upgrade would check that the
> flag is set, refusing to upgrade if it isn't, with an error like "please run
> pre-upgrade utility first".

I find that a pretty unappealing design; it seems to me it'd be much
easier to make the new cluster cope with everything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] LIST OWNED BY...

2012-02-29 Thread Alvaro Herrera

Excerpts from Thom Brown's message of Wed Feb 29 17:50:14 -0300 2012:
> On 29 February 2012 20:33, Euler Taveira de Oliveira  wrote:
> > On 29-02-2012 15:23, Thom Brown wrote:
> >> Or just change it to output a verbose notice without changing the syntax?
> >>
> > I can't see why we would do it only for DROP OWNED. Chatty messages are
> > annoying unless the user asks for them (that's why I suggested VERBOSE).
> 
> Possibly, but I think if you're going ahead and dropping objects owned
> by a role, you'll tend to be interested in what those things were
> rather than blindly removing things from the database without actually
> knowing what they are.

Sure.

Initially DROP OWNED was supposed to be used only after REASSIGN OWNED.
So you give your objects to someone else, then DROP OWNED to revoke
whatever privileges you have (which is the only thing that REASSIGN
OWNED wouldn't have touched), then commit suicide.  There is no danger
of dropping useful objects in this scenario because you already gave
them away.  (Unless, of course, some other poor soul is busy reassigning
objects to you).

I agree that it might be useful to list objects owned by a given user.
Maybe this belongs in a view or a function, however, not a new
command.  After all, the mechanism that those commands use to display
objects is not very user-friendly, nor does it even claim to be complete
(the lists are truncated after 200 objects or something like that).

More generally, maybe the thing to do is have a SRF that reports objects
that depend on some other object.  If you specify a role as the
referenced object, you get what this thread is about.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] COPY with hints, rebirth

2012-02-29 Thread Simon Riggs
On Wed, Feb 29, 2012 at 6:14 PM, Christopher Browne  wrote:
> On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs  wrote:
>> But it is very effective at avoiding 4 out of the 5 writes you mention.
>
> For the "common case," would we not want to have (1) [WAL] and (2)
> [Writing the pre-frozen tuple]?
>
> If we only write the tuple (2), and don't capture WAL, then the COPY
> wouldn't be replicable, right?

Well, my answer is a question: how would you like it to work?

The way I coded it is that it will still write WAL if wal_level is
set, so it would be replicable. So it only works when writing to a
newly created table but is otherwise separate from whether WAL is
skipped.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane  wrote:
> >> I am starting to look at this patch now.  I'm wondering exactly why the
> >> decision was made to continue storing btree-style statistics for arrays,
>
> > Probably btree statistics really do matter for some sorts of arrays? For
> > example, arrays representing paths in a tree. We could request a subtree
> > in a range query on such arrays.
>
> That seems like a pretty narrow, uncommon use-case.  Also, to get
> accurate stats for such queries that way, you'd need really enormous
> histograms.  I doubt that the existing parameters for histogram size
> will permit meaningful estimation of more than the first array entry
> (since we don't make the histogram any larger than we do for a scalar
> column).
>
> The real point here is that the fact that we're storing btree-style
> stats for arrays is an accident, backed into by having added btree
> comparators for arrays plus analyze.c's habit of applying default
> scalar-oriented analysis functions to any type without an explicit
> typanalyze entry.  I don't recall that we ever thought hard about
> it or showed that those stats were worth anything.
>

OK. I don't object to removing btree stats from arrays.
What do you think about the pg_stats view in this case? Should it combine the
values histogram and the array length histogram in a single column, as is done
for MCV and MCELEM?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Alexander Korotkov  writes:
> On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane  wrote:
>> I am starting to look at this patch now.  I'm wondering exactly why the
>> decision was made to continue storing btree-style statistics for arrays,

> Probably btree statistics really do matter for some sorts of arrays? For
> example, arrays representing paths in a tree. We could request a subtree
> in a range query on such arrays.

That seems like a pretty narrow, uncommon use-case.  Also, to get
accurate stats for such queries that way, you'd need really enormous
histograms.  I doubt that the existing parameters for histogram size
will permit meaningful estimation of more than the first array entry
(since we don't make the histogram any larger than we do for a scalar
column).

The real point here is that the fact that we're storing btree-style
stats for arrays is an accident, backed into by having added btree
comparators for arrays plus analyze.c's habit of applying default
scalar-oriented analysis functions to any type without an explicit
typanalyze entry.  I don't recall that we ever thought hard about
it or showed that those stats were worth anything.

regards, tom lane



Re: [HACKERS] controlling the location of server-side SSL files

2012-02-29 Thread Tom Lane
Peter Eisentraut  writes:
> On Wed, 2012-02-29 at 14:27 -0500, Tom Lane wrote:
>> Hm?  Obviously I misunderstood what changes you were proposing to make,
>> so would you mind spelling it out?

> The details are to be determined, but a possible change would likely be
> that instead of looking for a file and using it if and only if found,
> there would be some kind of connection parameter saying "use this file
> for this functionality", and otherwise it's not used.  The particular
> example would be the CRL file.

Mph.  That seems unlikely to be a net win to me.  The scenario I'm
imagining is that you ("you" being DBA for some group of people) didn't
have a CRL file before, and now you need one.  Your administration
problem is to get that CRL file into place for all your users.
If we change as above, then you still have that admin problem, plus now
you have another: getting all your users to use the new connection
parameter.  Which, as a rule, is going to be tough (for example, psql
has no easy way to make that happen).  The new admin problem offers you
no leverage at all on the old one, either, since a user who's not
acquired the CRL file more than likely hasn't changed his connection
habits either.

There may or may not be some value in a connection parameter that allows
specifying a location besides ~/.postgresql/ for the SSL support files.
But I don't find any attraction in changing the default behavior.

regards, tom lane



Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane  wrote:

> I am starting to look at this patch now.  I'm wondering exactly why the
> decision was made to continue storing btree-style statistics for arrays,
> in addition to the new stuff.  The pg_statistic rows for array columns
> tend to be unreasonably wide already, and as-is this patch will make
> them quite a lot wider.  I think it requires more than a little bit of
> evidence to continue storing stats that seem to have only small
> probability of usefulness.
>
> In particular, if we didn't store that stuff, we'd not need to widen the
> number of columns in pg_statistic, which would noticeably reduce both
> the footprint of the patch and the probability of breaking external
> code.
>

Initially, I used existing slots for new statistics, but I've changed this
after the first review:
http://archives.postgresql.org/pgsql-hackers/2011-07/msg00780.php

Probably btree statistics really do matter for some sorts of arrays? For
example, arrays representing paths in a tree. We could request a subtree
in a range query on such arrays.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] LIST OWNED BY...

2012-02-29 Thread Thom Brown
On 29 February 2012 20:33, Euler Taveira de Oliveira  wrote:
> On 29-02-2012 15:23, Thom Brown wrote:
>> Or just change it to output a verbose notice without changing the syntax?
>>
> I can't see why we would do it only for DROP OWNED. Chatty messages are
> annoying unless the user asks for them (that's why I suggested VERBOSE).

Possibly, but I think if you're going ahead and dropping objects owned
by a role, you'll tend to be interested in what those things were
rather than blindly removing things from the database without actually
knowing what they are.

-- 
Thom



Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Alexander Korotkov  writes:
> On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch  wrote:
>> I've attached a new version that includes the UINT64_FMT fix, some edits of
>> your newest comments, and a rerun of pgindent on the new files.  I see no
>> other issues precluding commit, so I am marking the patch Ready for
>> Committer.
>> If I made any of the comments worse, please post another update.

> Changes looks reasonable for me. Thanks!

I am starting to look at this patch now.  I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff.  The pg_statistic rows for array columns
tend to be unreasonably wide already, and as-is this patch will make
them quite a lot wider.  I think it requires more than a little bit of
evidence to continue storing stats that seem to have only small
probability of usefulness.

In particular, if we didn't store that stuff, we'd not need to widen the
number of columns in pg_statistic, which would noticeably reduce both
the footprint of the patch and the probability of breaking external
code.

regards, tom lane



Re: [HACKERS] LIST OWNED BY...

2012-02-29 Thread Euler Taveira de Oliveira
On 29-02-2012 15:23, Thom Brown wrote:
> Or just change it to output a verbose notice without changing the syntax?
> 
I can't see why we would do it only for DROP OWNED. Chatty messages are
annoying unless the user asks for them (that's why I suggested VERBOSE).


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] review: CHECK FUNCTION statement

2012-02-29 Thread Pavel Stehule
Hello

2012/2/29 Alvaro Herrera :
>
> I think the way we're passing down the options to the checker is a bit
> of a mess.  The way it is formulated, it seems to me that we'd need to
> add support code in the core CheckFunction for each option we might want
> to accept in the PL-specific checkers -- including what type of value
> the option receives.  As an example, right now we have all but one
> option taking a string argument (judging from usage of defGetString());
> however, option fatal_errors takes a boolean value, and it converts to
> string "on" or "off" which is supposed to be passed down to the checker.
> This doesn't seem very future-proof.

I don't agree - we don't have any datatype that can hold a "key/value" list
and be used via the SQL interface. A two-dimensional text array is a simple
and generic data type. Theoretically we could use the JSON type now, but I
think an array is more generic and better checked than JSON at this point
(and it requires less code - JSON is still just a string too). We don't plan
to modify the parser for better JSON support, so a text array is more user
friendly. I think a PL checker function can be called simply with the concept
as used. But I am not against your proposals. Actually, the "concept" of
generic options was requested in the initial discussion and is implemented
here, though it is not used; it cannot be removed, because we probably don't
want to change the API in the next version.

>
> (Also, the patch seems to be passing the fatal_errors value twice: first
> in the options array, where it is ignored by the plpgsql checker, and a
> second time as a separate boolean option.  This needs a cleanup).
>

This is by design. One request for the checker function (and the CHECK
FUNCTION statement) was generic support for some optional data. This can make
sense for plperl or plpython, though it is not used now. Fatal_errors is only
a proof of concept and can be removed. I plan to use these options in 9.3 for
checking "inline blocks".

> I don't see any good way to pass down generic options in a generic way.
> Maybe we can just state that all option values are going to be passed as
> strings -- is that good enough?  The other option would be to pass them
> using something like pg_node_tree, but then it wouldn't be possible to
> call the checker directly instead of through CHECK FUNCTION, which I
> think was a requirement.  And it'd be a stronger argument against usage
> of SPI to call the checker function from CHECK FUNCTION, but that's an
> unsolved problem.

Using just a string requires more complex parsing, and I don't like something
like pg_node_tree either, because it cannot be simply created by hand for a
direct call of the checker function. Please accept the fact that we want to
call the PL checker function directly - so all parameters of this function
should be simple to construct - and a two-dimensional array is really simple:
ARRAY[['source',$$$$]].

I don't understand why you don't like passing generic options in a generic
way. Why?

Regards

Pavel Stehule

>
> --
> Álvaro Herrera 
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of Wed Feb 29 16:29:26 -0300 2012:
> On 29.02.2012 21:18, Alvaro Herrera wrote:
> > Excerpts from Heikki Linnakangas's message of Wed Feb 29 16:09:02 -0300 2012:
> >> I thought my view on how this should be done was already clear, but just
> >> in case it isn't, let me restate: Enlarge the page header to make room
> >> for the checksum. To handle upgrades, put code in the backend to change
> >> the page format from old version to new one on-the-fly, as pages are
> >> read in. Because we're making the header larger, we need to ensure that
> >> there's room on every page. To do that, write a utility that you run on
> >> the cluster before running pg_upgrade, which moves tuples to ensure
> >> that. To ensure that the space doesn't get used again before upgrading,
> >> change the old version so that it reserves those N bytes in all new
> >> insertions and updates (I believe that approach has been discussed
> >> before and everyone is comfortable with backpatching such a change). All
> >> of this in 9.3.
> >
> > Note that if we want such a utility to walk and transform pages, we
> > probably need a marker in the catalogs somewhere so that pg_upgrade can
> > make sure that it was done in all candidate tables -- which is something
> > that we should get in 9.2 so that it can be used in 9.3.
> 
> In the simplest form, the utility could just create a magic file in the 
> data directory to indicate that it has run. All we need is a boolean 
> flag, unless you want to be fancy and make the utility restartable. 
> Implemented that way, there's no need to have anything in the catalogs.

Well, I find it likely that people with huge and/or high-velocity
databases would want to get fancy about it, so that they can schedule it
as they see fit.

What I wouldn't like is a utility that is optional, so that we end up,
after upgrade, with some clusters that have the new page format and
others that don't.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] "make check" in src/test/isolation is unworkable

2012-02-29 Thread Andrew Dunstan



On 02/29/2012 02:33 PM, Peter Eisentraut wrote:

> On Sun, 2011-05-08 at 19:35 -0400, Tom Lane wrote:
>> I believe that the "make check" target in src/test/isolation is
>> fundamentally unportable, as is illustrated by the fact that buildfarm
>> member coypu is currently choking on it.  The reason is that the
>> pg_isolation_regress program depends on libpq, and in particular it
>> depends on having an *installed* libpq.  Anyplace where it appears to
>> work, it's because you already installed Postgres, or at least libpq.
>
> I came across this old issue.  Unless I'm missing something, there is no
> reason why pg_isolation_regress needs to be linked with libpq at all,
> and it works fine without it.  If we removed the libpq link, then it
> would work just like pg_regress and could support "make check".
>
> Apparently, -Wl,--as-needed isn't working too well here.

I believe we can't rely on it working. If we really don't need libpq 
then why not just filter it out?


cheers

andrew



Re: [HACKERS] controlling the location of server-side SSL files

2012-02-29 Thread Peter Eisentraut
On Wed, 2012-02-29 at 14:27 -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On Wed, 2012-02-29 at 14:20 -0500, Tom Lane wrote:
> >> In particular, I observe that we get pushback anytime we break something
> >> in a way that makes SSL config files be required on the client side;
> >> see bug #6302 for most recent example.
> 
> > *If* we were to make a change in libpq analogous to the server side, the
> > effect would be to make the files less required, which could actually
> > help the case of bug #6302.
> 
> Hm?  Obviously I misunderstood what changes you were proposing to make,
> so would you mind spelling it out?

The details are to be determined, but a possible change would likely be
that instead of looking for a file and using it if and only if found,
there would be some kind of connection parameter saying "use this file
for this functionality", and otherwise it's not used.  The particular
example would be the CRL file.
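libpq's existing client-side sslcrl parameter already has that
use-only-if-the-caller-asks shape; a sketch of what it looks like from C
(the host and file paths here are made up):

/*
 * Illustration only: sslcrl is an existing libpq connection parameter.
 * With a parameter like this, the CRL is used if and only if the
 * caller asks for it, rather than being picked up implicitly.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("host=db.example.com sslmode=verify-full "
                               "sslcrl=/etc/ssl/certs/root.crl");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
    PQfinish(conn);
    return 0;
}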





Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Heikki Linnakangas

On 29.02.2012 21:30, Robert Haas wrote:
> On Wed, Feb 29, 2012 at 2:18 PM, Alvaro Herrera
>  wrote:
>> Note that if we want such a utility to walk and transform pages, we
>> probably need a marker in the catalogs somewhere so that pg_upgrade can
>> make sure that it was done in all candidate tables -- which is something
>> that we should get in 9.2 so that it can be used in 9.3.  Such a marker
>> would also allow us to get rid of HEAP_MOVED_IN and HEAP_MOVED_OUT.
>
> Getting rid of HEAP_MOVED_IN and HEAP_MOVED_OUT would be really nice,
> but I don't see why we need to squeeze anything into 9.2 for any of
> this.  pg_upgrade can certainly handle the addition of a new pg_class
> column, and can arrange for in-place upgrades from pre-9.3 versions to
> 9.3 to set the flag to the appropriate value.

The utility would run in the old cluster before upgrading, so the flag
would have to be present in the old version. pg_upgrade would check
that the flag is set, refusing to upgrade if it isn't, with an error
like "please run pre-upgrade utility first".


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] "make check" in src/test/isolation is unworkable

2012-02-29 Thread Peter Eisentraut
On Sun, 2011-05-08 at 19:35 -0400, Tom Lane wrote:
> I believe that the "make check" target in src/test/isolation is
> fundamentally unportable, as is illustrated by the fact that buildfarm
> member coypu is currently choking on it.  The reason is that the
> pg_isolation_regress program depends on libpq, and in particular it
> depends on having an *installed* libpq.  Anyplace where it appears to
> work, it's because you already installed Postgres, or at least libpq. 

I came across this old issue.  Unless I'm missing something, there is no
reason why pg_isolation_regress needs to be linked with libpq at all,
and it works fine without it.  If we removed the libpq link, then it
would work just like pg_regress and could support "make check".

Apparently, -Wl,--as-needed isn't working too well here.





Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 2:18 PM, Alvaro Herrera
 wrote:
> Note that if we want such a utility to walk and transform pages, we
> probably need a marker in the catalogs somewhere so that pg_upgrade can
> make sure that it was done in all candidate tables -- which is something
> that we should get in 9.2 so that it can be used in 9.3.  Such a marker
> would also allow us to get rid of HEAP_MOVED_IN and HEAP_MOVED_OUT.

Getting rid of HEAP_MOVED_IN and HEAP_MOVED_OUT would be really nice,
but I don't see why we need to squeeze anything into 9.2 for any of
this.  pg_upgrade can certainly handle the addition of a new pg_class
column, and can arrange for in-place upgrades from pre-9.3 versions to
9.3 to set the flag to the appropriate value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Heikki Linnakangas

On 29.02.2012 21:18, Alvaro Herrera wrote:

Excerpts from Heikki Linnakangas's message of mié feb 29 16:09:02 -0300 2012:

I thought my view on how this should be done was already clear, but just
in case it isn't, let me restate: Enlarge the page header to make room
for the checksum. To handle upgrades, put code in the backend to change
the page format from old version to new one on-the-fly, as pages are
read in. Because we're making the header larger, we need to ensure that
there's room on every page. To do that, write a utility that you run on
the cluster before running pg_upgrade, which moves tuples to ensure
that. To ensure that the space doesn't get used again before upgrading,
change the old version so that it reserves those N bytes in all new
insertions and updates (I believe that approach has been discussed
before and everyone is comfortable with backpatching such a change). All
of this in 9.3.


Note that if we want such a utility to walk and transform pages, we
probably need a marker in the catalogs somewhere so that pg_upgrade can
make sure that it was done in all candidate tables -- which is something
that we should get in 9.2 so that it can be used in 9.3.


In the simplest form, the utility could just create a magic file in the 
data directory to indicate that it has run. All we need is a boolean 
flag, unless you want to be fancy and make the utility restartable. 
Implemented that way, there's no need to have anything in the catalogs.



Such a marker would also allow us to get rid of HEAP_MOVED_IN and
HEAP_MOVED_OUT.


True.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 2:09 PM, Heikki Linnakangas
 wrote:
> On 29.02.2012 19:54, Simon Riggs wrote:
>> I'm beginning to lose faith that objections are being raised at a
>> rational level. It's not a panel game with points for clever answers,
it's an engineering debate about how to add features real users want.
>> And they do want, so let me solve the problems by agreeing something
>> early enough to allow it to be implemented, rather than just
>> discussing it until we run out of time.
>
> I thought my view on how this should be done was already clear, but just in
> case it isn't, let me restate: Enlarge the page header to make room for the
> checksum. To handle upgrades, put code in the backend to change the page
> format from old version to new one on-the-fly, as pages are read in. Because
> we're making the header larger, we need to ensure that there's room on every
> page. To do that, write a utility that you run on the cluster before running
> pg_upgrade, which moves tuples to ensure that. To ensure that the space
> doesn't get used again before upgrading, change the old version so that it
> reserves those N bytes in all new insertions and updates (I believe that
> approach has been discussed before and everyone is comfortable with
> backpatching such a change). All of this in 9.3.

This could all be done on-line if we had a per-table flag indicating
which page version is in use.  You can use some variant of CLUSTER or
VACUUM or ALTER TABLE to rewrite the table using the page version of
your choice.  This is also more granular, in that it allows checksums
(or other features) to be turned on and off on a per-table basis.
When you read the table, the same flag tells you which page version to
expect, and you can error out if you haven't got that (or if the CRCs
don't match).
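
To illustrate (the syntax here is entirely hypothetical; nothing like it
exists today):

    -- hypothetical reloption: rewrite the table into page version 5 and
    -- record that in pg_class; readers then error out on a version
    -- mismatch or a failed CRC check
    ALTER TABLE accounts SET (page_version = 5);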

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] controlling the location of server-side SSL files

2012-02-29 Thread Tom Lane
Peter Eisentraut  writes:
> On ons, 2012-02-29 at 14:20 -0500, Tom Lane wrote:
>> In particular, I observe that we get pushback anytime we break something
>> in a way that makes SSL config files be required on the client side;
>> see bug #6302 for most recent example.

> *If* we were to make a change in libpq analogous to the server side, the
> effect would be to make the files less required, which could actually
> help the case of bug #6302.

Hm?  Obviously I misunderstood what changes you were proposing to make,
so would you mind spelling it out?

regards, tom lane



Re: [HACKERS] controlling the location of server-side SSL files

2012-02-29 Thread Peter Eisentraut
On ons, 2012-02-29 at 14:20 -0500, Tom Lane wrote:
> In particular, I observe that we get pushback anytime we break something
> in a way that makes SSL config files be required on the client side;
> see bug #6302 for most recent example.

*If* we were to make a change in libpq analogous to the server side, the
effect would be to make the files less required, which could actually
help the case of bug #6302.




Re: [HACKERS] controlling the location of server-side SSL files

2012-02-29 Thread Tom Lane
Peter Eisentraut  writes:
> On ons, 2012-02-08 at 09:16 +0100, Magnus Hagander wrote:
>> Yes, ignoring a missing file in a security context is definitely not good.
>> It should throw an error.
>> 
>> We have a few bad defaults from the old days around SSL for this, but if it
>> requires breaking backwards compatibility to get it right, I think we
>> should still do it. 

> Btw., should we also consider making similar changes on the libpq side?

I think that breaking compatibility of libpq's behavior is a whole lot
harder sell than changing things in a way that only affects what people
have to put into postgresql.conf.  We've always treated the latter as
something that can change across major versions.

In particular, I observe that we get pushback anytime we break something
in a way that makes SSL config files be required on the client side;
see bug #6302 for most recent example.

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of mié feb 29 16:09:02 -0300 2012:
> On 29.02.2012 19:54, Simon Riggs wrote:
> > I'm beginning to lose faith that objections are being raised at a
> > rational level. It's not a panel game with points for clever answers,
> > it's an engineering debate about how to add features real users want.
> > And they do want, so let me solve the problems by agreeing something
> > early enough to allow it to be implemented, rather than just
> > discussing it until we run out of time.
> 
> I thought my view on how this should be done was already clear, but just 
> in case it isn't, let me restate: Enlarge the page header to make room 
> for the checksum. To handle upgrades, put code in the backend to change 
> the page format from old version to new one on-the-fly, as pages are 
> read in. Because we're making the header larger, we need to ensure that 
> there's room on every page. To do that, write a utility that you run on 
> the cluster before running pg_upgrade, which moves tuples to ensure 
> that. To ensure that the space doesn't get used again before upgrading, 
> change the old version so that it reserves those N bytes in all new 
> insertions and updates (I believe that approach has been discussed 
> before and everyone is comfortable with backpatching such a change). All 
> of this in 9.3.

Note that if we want such a utility to walk and transform pages, we
probably need a marker in the catalogs somewhere so that pg_upgrade can
make sure that it was done in all candidate tables -- which is something
that we should get in 9.2 so that it can be used in 9.3.  Such a marker
would also allow us to get rid of HEAP_MOVED_IN and HEAP_MOVED_OUT.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Client Messages

2012-02-29 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of jue ene 26 15:58:58 -0300 2012:
> On 26.01.2012 17:31, Tom Lane wrote:

> > The idea that occurs to me is to have the code that uses the GUC do a
> > verify_mbstr(noerror) on it, and silently ignore it if it doesn't pass
> > (maybe with a LOG message).  This would have to be documented of course,
> > but it seems better than the potential consequences of trying to send a
> > wrongly-encoded string.
> 
> Hmm, fine with me. It would be nice to plug the hole that these bogus 
> characters can leak elsewhere into the system through current_setting, 
> though. Perhaps we could put the verify_mbstr() call somewhere in guc.c, 
> to forbid incorrectly encoded characters from being stored in the guc 
> variable in the first place.

This patch is listed as "Needs review" but that seems to be wrong --
it's "waiting on author", I think.  Do we have an updated patch?  Fujii?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-02-29 Thread Tom Lane
Robert Haas  writes:
> On Wed, Feb 29, 2012 at 1:40 PM, Tom Lane  wrote:
>> Indeed, and the code already knows that.  However, in this example, path
>> A is capable of dominating the other three (being strictly less
>> parameterized than them), and B and C are each capable of dominating D.
>> The problem is just that I'd neglected to consider that rowcount now
>> also becomes a figure of merit.

> In theory, yes, but in practice, won't it nearly always be the case
> that a less parameterized path generates more rows, and a more
> parameterized path generates less rows, so that neither dominates the
> other?

I think you're just assuming that without any solid evidence.  My point
is precisely that if the more-parameterized path *fails* to generate
fewer rows, we want add_path to notice that and throw it out (or at
least be able to throw it out, if there's not another reason to keep it).

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Heikki Linnakangas

On 29.02.2012 19:54, Simon Riggs wrote:

I'm beginning to lose faith that objections are being raised at a
rational level. It's not a panel game with points for clever answers,
it's an engineering debate about how to add features real users want.
And they do want, so let me solve the problems by agreeing something
early enough to allow it to be implemented, rather than just
discussing it until we run out of time.


I thought my view on how this should be done was already clear, but just 
in case it isn't, let me restate: Enlarge the page header to make room 
for the checksum. To handle upgrades, put code in the backend to change 
the page format from old version to new one on-the-fly, as pages are 
read in. Because we're making the header larger, we need to ensure that 
there's room on every page. To do that, write a utility that you run on 
the cluster before running pg_upgrade, which moves tuples to ensure 
that. To ensure that the space doesn't get used again before upgrading, 
change the old version so that it reserves those N bytes in all new 
insertions and updates (I believe that approach has been discussed 
before and everyone is comfortable with backpatching such a change). All 
of this in 9.3.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 1:40 PM, Tom Lane  wrote:
>> I am having trouble
>> constructing an example, but I feel like there might be cases where
>> it's possible to have path A, not parameterized, path B, parameterized
>> by R, and path C, parameterized by S, and maybe also path D,
>> parameterized by both R and S.  In that case, I feel like paths B and
>> C are incomparable.
>
> Indeed, and the code already knows that.  However, in this example, path
> A is capable of dominating the other three (being strictly less
> parameterized than them), and B and C are each capable of dominating D.
> The problem is just that I'd neglected to consider that rowcount now
> also becomes a figure of merit.

In theory, yes, but in practice, won't it nearly always be the case
that a less parameterized path generates more rows, and a more
parameterized path generates less rows, so that neither dominates the
other?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] review: CHECK FUNCTION statement

2012-02-29 Thread Alvaro Herrera

I think the way we're passing down the options to the checker is a bit
of a mess.  The way it is formulated, it seems to me that we'd need to
add support code in the core CheckFunction for each option we might want
to accept in the PL-specific checkers -- including what type of value
the option receives.  As an example, right now we have all but one
option taking a string argument (judging from usage of defGetString());
however, the fatal_errors option takes a boolean value, which is converted
to the string "on" or "off" before being passed down to the checker.

This doesn't seem very future-proof.

(Also, the patch seems to be passing the fatal_errors value twice: first
in the options array, where it is ignored by the plpgsql checker, and a
second time as a separate boolean option.  This needs a cleanup.)

I don't see any good way to pass down generic options in a generic way.
Maybe we can just state that all option values are going to be passed as
strings -- is that good enough?  The other option would be to pass them
using something like pg_node_tree, but then it wouldn't be possible to
call the checker directly instead of through CHECK FUNCTION, which I
think was a requirement.  And it'd be a stronger argument against usage
of SPI to call the checker function from CHECK FUNCTION, but that's an
unsolved problem.
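
To make the "all strings" idea concrete, a call might look like this
(the option list syntax and names here are illustrative, not what the
patch currently implements):

    CHECK FUNCTION my_func(integer) WITH (fatal_errors = off, quiet = on);
    -- every value, boolean or not, would reach the PL-specific checker
    -- as a string such as 'off', and the checker does its own parsing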

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-02-29 Thread Tom Lane
Robert Haas  writes:
> On Tue, Feb 28, 2012 at 5:35 PM, Tom Lane  wrote:
>> The most obvious thing to do about this is to consider that one path can
>> dominate another on cost only if it is both cheaper *and* produces no
>> more rows.

> Hmm.  Are you sure that's the right rule?

On further reflection it had seemed like we might have to treat the
rowcount as another independent metric of value.

> I am having trouble
> constructing an example, but I feel like there might be cases where
> it's possible to have path A, not parameterized, path B, parameterized
> by R, and path C, parameterized by S, and maybe also path D,
> parameterized by both R and S.  In that case, I feel like paths B and
> C are incomparable.

Indeed, and the code already knows that.  However, in this example, path
A is capable of dominating the other three (being strictly less
parameterized than them), and B and C are each capable of dominating D.
The problem is just that I'd neglected to consider that rowcount now
also becomes a figure of merit.

> Even if one is cheaper than the other and also
> produces fewer rows, they're not the same rows; so the expense of a
> subsequent join might be different with one vs. the other.

Yeah, if the quals being used are different, that's possible in
principle; but I don't foresee being able to estimate such a thing.
I think just using the number of rows is as good as we're likely to
be able to do.

> Even in the simple case where there's only one possible
> parameterization, it seems very difficult to compare the cost of a
> parameterized path A with the cost of an unparameterized path B.

I don't believe this.  We can certainly compare costs, and we can
certainly compare rowcount.  The possibility that the specific set
of rows selected might affect subsequent join costs seems to me to
be too second-order to justify ignoring these first-order differences.

The variant of the argument that had occurred to me is that if you
consider that rowcount is an independent metric, then it might be that a
less-parameterized path can never dominate a more-parameterized path,
because even if the former is cheaper it should always generate more
rows.  However, I'm not convinced that I believe that --- in particular,
with a poor choice of parameterizing quals it might not be true.
If we did have a more-parameterized path that wasn't any more selective,
we'd really want add_path to notice that and throw it out.

> I almost wonder if the bottom-up approach is the wrong way to tackle
> this entirely.

I don't find this idea too attractive ... and in any case we're not
doing rm -rf src/backend/optimizer/ and start over for 9.2.

regards, tom lane



Re: [HACKERS] controlling the location of server-side SSL files

2012-02-29 Thread Peter Eisentraut
On ons, 2012-02-08 at 09:16 +0100, Magnus Hagander wrote:
> > I'm still worried about this.  If we ignore a missing root.crt, then the
> > effect is that authentication and certificate verification might fail,
> > which would be annoying, but you'd notice it soon enough.  But if we
> > ignore a missing root.crl, we are creating a security hole.
> >
> 
> Yes, ignoring a missing file in a security context is definitely not good.
> It should throw an error.
> 
> We have a few bad defaults from the old days around SSL for this, but if it
> requires breaking backwards compatibility to get it right, I think we
> should still do it. 

Btw., should we also consider making similar changes on the libpq side?




Re: [HACKERS] LIST OWNED BY...

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 12:20 PM, Thom Brown  wrote:
> On 29 February 2012 17:16, Tom Lane  wrote:
>> Thom Brown  writes:
>>> So could we introduce either a command to show which objects are owned
>>> by a particular role, or allow a dry-run of DROP OWNED BY?
>>
>> It's always been possible to do that:
>>
>>        begin;
>>        drop owned by joe;
>>        rollback;
>>
>> I believe this is already the recommended approach if you're concerned
>> about what DROP CASCADE will do.
>
> No, the cascade part is fine.  It's the objects which won't cause a
> cascade that are an issue.  Putting it in a transaction for rolling
> back doesn't help find out what it intends to drop.
>
> How can the user tell what the statement would drop (ignoring cascades)?

It's certainly possible to write a query for this, but I think this
gets back to the old argument about whether every client (and every
end-user) should be required to reimplement this, or whether maybe we
ought to provide some server functionality around it.
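
For reference, the query a client has to write today runs along these
lines (a sketch only: it lists ownership dependencies recorded in
pg_shdepend for the current database plus shared objects, and misses
objects owned by the bootstrap superuser, which pg_shdepend doesn't
track):

    SELECT d.classid::regclass AS catalog,
           pg_describe_object(d.classid, d.objid, d.objsubid) AS object
    FROM pg_shdepend d
    WHERE d.refclassid = 'pg_authid'::regclass
      AND d.refobjid = (SELECT oid FROM pg_roles WHERE rolname = 'joe')
      AND d.deptype = 'o'
      AND d.dbid IN (0, (SELECT oid FROM pg_database
                         WHERE datname = current_database()));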

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] LIST OWNED BY...

2012-02-29 Thread Thom Brown
On 29 February 2012 18:15, Euler Taveira de Oliveira  wrote:
> On 29-02-2012 14:20, Thom Brown wrote:
>> No, the cascade part is fine.  It's the objects which won't cause a
>> cascade that are an issue.  Putting it in a transaction for rolling
>> back doesn't help find out what it intends to drop.
>>
> DROP OWNED BY foo VERBOSE?

Or just change it to output a verbose notice without changing the syntax?

-- 
Thom



Re: [HACKERS] LIST OWNED BY...

2012-02-29 Thread Euler Taveira de Oliveira
On 29-02-2012 14:20, Thom Brown wrote:
> No, the cascade part is fine.  It's the objects which won't cause a
> cascade that are an issue.  Putting it in a transaction for rolling
> back doesn't help find out what it intends to drop.
> 
DROP OWNED BY foo VERBOSE?


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] COPY with hints, rebirth

2012-02-29 Thread Christopher Browne
On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs  wrote:
> But it is very effective at avoiding 4 out of the 5 writes you mention.

For the "common case," would we not want to have (1) [WAL] and (2)
[Writing the pre-frozen tuple]?

If we only write the tuple (2), and don't capture WAL, then the COPY
wouldn't be replicable, right?
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-02-29 Thread Robert Haas
On Tue, Feb 28, 2012 at 5:35 PM, Tom Lane  wrote:
> The most obvious thing to do about this is to consider that one path can
> dominate another on cost only if it is both cheaper *and* produces no
> more rows.

Hmm.  Are you sure that's the right rule?  I am having trouble
constructing an example, but I feel like there might be cases where
it's possible to have path A, not parameterized, path B, parameterized
by R, and path C, parameterized by S, and maybe also path D,
parameterized by both R and S.  In that case, I feel like paths B and
C are incomparable.  Even if one is cheaper than the other and also
produces fewer rows, they're not the same rows; so the expense of a
subsequent join might be different with one vs. the other.

Even in the simple case where there's only one possible
parameterization, it seems very difficult to compare the cost of a
parameterized path A with the cost of an unparameterized path B.  No
matter how much cheaper A is than B, the eventual nested loop join
might be so inefficient as to completely wipe A out.  On the flip
side, even if A is more expensive than B, it's nearly always going to
produce at least somewhat fewer rows.  There are degenerate cases
where that might not be true, like a parameterized join where the
table we're index-scanning has only one value in the parameterized
column, but presumably that's rare.  So it may be worth keeping A
around just because the subsequent join could turn out to be much
cheaper with the smaller row count.  Thus it seems unlikely that we'll
be able to conclude that either A or B is categorically superior.

If you accept that reasoning, then it seems like there's little if any
point in ever comparing a parameterized path to a non-parameterized
path; or of comparing two differently-parameterized paths against each
other.  You basically need a separate bucket for each group of paths,
where they compete against each other but not against
differently-parameterized paths; and then after you form the nested
loop, all the paths that are now parameterized the same way (but
weren't before) can finally go head-to-head against each other.

I almost wonder if the bottom-up approach is the wrong way to tackle
this entirely.  Suppose we're trying to build paths for {A B}.  We
first build unparameterized paths for {A} and {B} and compute join
paths for {A B}.  Now we know the cheapest way to build {A B} without
using parameterization, so we can make some deductions about a plan of
the form:

Nested Loop
-> A
-> B (parameterized by A)

In particular, if we take the best cost either overall or for a path
with the pathkeys that will be produced by this join, and divide by
the number of rows for A, a parameterized path for B only makes sense
if the total cost is less than that value.  Now we have a meaningful
bound that we can use to limit consideration of paths for B: anything
that's more expensive than that bound should be chucked.  If B is just
a single rel that's not that interesting, but if B is a joinrel, then
the bound applies individually to any subset of its members.  So we
start replanning B as a separate join problem, but any path for any
baserel or joinrel that exceeds our cutoff cost gets chucked; and if
any rel ends up with no workable paths, then we just forget the whole
thing.

Of course, the problem with this approach (aside from complexity) is
that you might end up planning the same subproblem multiple times with
different cost bounds.  You could cache the results of previous
computations so that you only replan it when the cost bound goes up,
but that's still pretty awful.  So maybe this is unworkable.  But the
current approach seems problematic, too, because you don't really know
enough to throw very much away at the time that you need to make that
decision.
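
(For concreteness, the plan shape discussed above is the ordinary
inner-indexscan nestloop; with a hypothetical schema where b.a_id is
indexed, it comes out as:

    EXPLAIN SELECT * FROM a JOIN b ON b.a_id = a.id WHERE a.x = 42;
    --  Nested Loop
    --    ->  Seq Scan on a
    --          Filter: (x = 42)
    --    ->  Index Scan using b_a_id_idx on b
    --          Index Cond: (a_id = a.id)

where the inner index scan is the path "parameterized by A".)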

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Simon Riggs
On Wed, Feb 29, 2012 at 4:41 PM, Robert Haas  wrote:
> On Wed, Feb 29, 2012 at 11:24 AM, Simon Riggs  wrote:
>> On Wed, Feb 29, 2012 at 3:46 PM, Heikki Linnakangas
>>  wrote:
>>
>>>> Are you saying you would accept the patch if we had this?
>>
>>> I think I would still be uncomfortable with the hacks in the page header.
>>
>> There are no "hacks". There are some carefully designed changes with
>> input from multiple people, including yourself, and it copes as
>> gracefully as it can with backwards compatibility requirements.
>
> You have comments from three different people, all experienced
> hackers, disagreeing with this position;

Who is the third person you speak of? Perhaps they will speak again if
they wish to be heard.

> Heikki and I have both
> proposed alternate approaches.

No, you've proposed rejecting the patch in favour of some future
dream. We all agree on the future dream but it clearly happens in the
future and that could easily be more than 1 release ahead.

If you have comments that would allow a patch in this release, I am
happy to hear them. I hear only two people seeking to reject a patch
that adds value for users. Quite frankly, the comments about flag bits
are ridiculous and bogus.


> I'm not sure that we're at a point
> where we can say that we know what the best solution is, but I think
> it is clear that there's enough concern about this that you ought not
> to be denying that there is a problem.

There are some weird cases that can cause problems. My wish is to
resolve those so everyone is happy. If those weird cases are simply
used as an excuse to reject, then I don't accept that and nor will our
users. Of course, if there was a significant issue, it would be
immediate rejection but there isn't.

I'm trying to commit a useful patch in this release. If you'd like to
work with me to find a solution acceptable to all, I'd be happy to
include suggestions, just as I have already included comments from
Heikki, Bruce, Noah and others. I do accept that Heikki now says that
the code I added at his request is just a hack.

Assuming I'm going to commit something in this release, what should it
look like?

At Heikki's request/idea I will be working on a 2-phase database level
parameter that will give us checksumming on the whole database after a
scan to enable it. That sounds like it will resolve the corner cases
relatively cleanly.
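
As a sketch, the two-phase switch might be driven like this (the
intermediate state name is hypothetical; page_checksums is the GUC in
the current patch):

    page_checksums = writing   # phase 1: stamp a checksum on every page
                               # write, but don't verify on reads yet
    # ... scan the database so that every page gets rewritten ...
    page_checksums = on        # phase 2: verify checksums on every read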

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Simon Riggs
On Wed, Feb 29, 2012 at 5:44 PM, Robert Haas  wrote:

>>> You have comments from three different people, all experienced
>>> hackers, disagreeing with this position;
>>
>> Who is the third person you speak of? Perhaps they will speak again if
>> they wish to be heard.
>
> Tom Lane.  It was the very first email posted in response to the very
> first version of this patch you ever posted.

Tom objected to not being able to tell which page version a data block has.
At length, we have discussed this on list and there is no issue. It is
clear what page format a block has.

Please ping him if you believe he has other rational objections to
committing something and he isn't listening.

I'm beginning to lose faith that objections are being raised at a
rational level. It's not a panel game with points for clever answers,
it's an engineering debate about how to add features real users want.
And they do want, so let me solve the problems by agreeing something
early enough to allow it to be implemented, rather than just
discussing it until we run out of time.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 12:26 PM, Simon Riggs  wrote:
> On Wed, Feb 29, 2012 at 4:41 PM, Robert Haas  wrote:
>> On Wed, Feb 29, 2012 at 11:24 AM, Simon Riggs  wrote:
>>> On Wed, Feb 29, 2012 at 3:46 PM, Heikki Linnakangas
>>>  wrote:
>>>
>>>>> Are you saying you would accept the patch if we had this?
>>>
>>>> I think I would still be uncomfortable with the hacks in the page header.
>>>
>>> There are no "hacks". There are some carefully designed changes with
>>> input from multiple people, including yourself, and it copes as
>>> gracefully as it can with backwards compatibility requirements.
>>
>> You have comments from three different people, all experienced
>> hackers, disagreeing with this position;
>
> Who is the third person you speak of? Perhaps they will speak again if
> they wish to be heard.

Tom Lane.  It was the very first email posted in response to the very
first version of this patch you ever posted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] LIST OWNED BY...

2012-02-29 Thread Thom Brown
On 29 February 2012 17:16, Tom Lane  wrote:
> Thom Brown  writes:
>> So could we introduce either a command to show which objects are owned
>> by a particular role, or allow a dry-run of DROP OWNED BY?
>
> It's always been possible to do that:
>
>        begin;
>        drop owned by joe;
>        rollback;
>
> I believe this is already the recommended approach if you're concerned
> about what DROP CASCADE will do.

No, the cascade part is fine.  It's the objects which won't cause a
cascade that are an issue.  Putting it in a transaction for rolling
back doesn't help find out what it intends to drop.

How can the user tell what the statement would drop (ignoring cascades)?

-- 
Thom



Re: [HACKERS] LIST OWNED BY...

2012-02-29 Thread Tom Lane
Thom Brown  writes:
> So could we introduce either a command to show which objects are owned
> by a particular role, or allow a dry-run of DROP OWNED BY?

It's always been possible to do that:

begin;
drop owned by joe;
rollback;

I believe this is already the recommended approach if you're concerned
about what DROP CASCADE will do.

regards, tom lane



[HACKERS] LIST OWNED BY...

2012-02-29 Thread Thom Brown
Hi all,

If someone wants to drop objects owned by a particular role, they'll
use DROP OWNED BY role.  However, the implications of this statement
aren't easily known, and once you've run it, it's not communicated
which objects were dropped.

So could we introduce either a command to show which objects are owned
by a particular role, or allow a dry-run of DROP OWNED BY?  Also when
DROP OWNED BY has been executed, shouldn't we get feedback about what
it did?  At the moment the only feedback provided is what objects the
statement cascaded to.

-- 
Thom



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Robert Haas
On Wed, Feb 29, 2012 at 11:24 AM, Simon Riggs  wrote:
> On Wed, Feb 29, 2012 at 3:46 PM, Heikki Linnakangas
>  wrote:
>
>>> Are you saying you would accept the patch if we had this?
>
>> I think I would still be uncomfortable with the hacks in the page header.
>
> There are no "hacks". There are some carefully designed changes with
> input from multiple people, including yourself, and it copes as
> gracefully as it can with backwards compatibility requirements.

You have comments from three different people, all experienced
hackers, disagreeing with this position; Heikki and I have both
proposed alternate approaches.  I'm not sure that we're at a point
where we can say that we know what the best solution is, but I think
it is clear that there's enough concern about this that you ought not
to be denying that there is a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Simon Riggs
On Wed, Feb 29, 2012 at 3:46 PM, Heikki Linnakangas
 wrote:

>> Are you saying you would accept the patch if we had this?

> I think I would still be uncomfortable with the hacks in the page header.

There are no "hacks". There are some carefully designed changes with
input from multiple people, including yourself, and it copes as
gracefully as it can with backwards compatibility requirements.

> Less so than in the current form - you wouldn't need a flag to indicate
> whether the page has a valid checksum or not, which would clean it up quite
> a bit - but still.

I agree this would remove the reliance on a bit in the page header and
so make it even more robust.

I'll add the 2 phase enabling feature, making it happen at database level.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] COPY with hints, rebirth

2012-02-29 Thread Simon Riggs
On Sat, Feb 25, 2012 at 6:58 PM, Simon Riggs  wrote:
> On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner
>  wrote:
>> Simon Riggs  wrote:
>>
>>> This patch extends that and actually sets the tuple header flag as
>>> HEAP_XMIN_COMMITTED during the load.
>>
>> Fantastic!

> I think we could add that as an option on COPY, since "breaking MVCC"
> in that way is only a bad thing if it happens accidentally without the
> user's permission and knowledge - which they may wish to give in many
> cases, such as reloading a database from a dump.

I've coded up COPY FREEZE to do just that.

This version doesn't require any changes to transaction machinery.

But it is very effective at avoiding 4 out of the 5 writes you mention.
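
With the patch, a bulk load that qualifies for the optimisation looks
something like this (table and file names invented; the table must be
created in the same subtransaction as the COPY, with no open cursors and
no older snapshots held):

    BEGIN;
    CREATE TABLE sales_2012 (LIKE sales INCLUDING ALL);
    COPY sales_2012 FROM '/tmp/sales_2012.dat' (FREEZE);
    COMMIT;

The rows go in with xmin = FrozenTransactionId and HEAP_XMIN_COMMITTED
already set, so other sessions can see them immediately - outside the
normal MVCC rules - and they never need later hint-bit or freeze
rewrites.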

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a73b022..4912449 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORMAT <replaceable class="parameter">format_name</replaceable>
     OIDS [ <replaceable class="parameter">boolean</replaceable> ]
+    FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
     HEADER [ <replaceable class="parameter">boolean</replaceable> ]
@@ -181,6 +182,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
 
    <varlistentry>
+    <term><literal>FREEZE</literal></term>
+    <listitem>
+     <para>
+      Specifies copying the data with rows already frozen, just as they
+      would be after running the <command>VACUUM FREEZE</command> command.
+      This only occurs if the table being loaded has been created in this
+      subtransaction, there are no cursors open and there are no older
+      snapshots held by this transaction.
+      Note that all sessions will immediately be able to see the data
+      if it has been successfully loaded, which specifically operates
+      outside of the normal definitions of MVCC. This is a performance
+      option intended for use only during initial data loads.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
    <term><literal>DELIMITER</literal></term>
    <listitem>
     <para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..68534bf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2050,7 +2050,13 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
 	tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
 	tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
-	HeapTupleHeaderSetXmin(tup->t_data, xid);
+	if (options & HEAP_INSERT_FREEZE)
+	{
+		HeapTupleHeaderSetXmin(tup->t_data, FrozenTransactionId);
+		tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+	}
+	else
+		HeapTupleHeaderSetXmin(tup->t_data, xid);
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0);		/* for cleanliness */
 	tup->t_tableOid = RelationGetRelid(relation);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..ec0bed8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/portal.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -108,6 +109,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		binary;			/* binary format? */
 	bool		oids;			/* include OIDs? */
+	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
 	bool		header_line;	/* CSV header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
@@ -889,6 +891,14 @@ ProcessCopyOptions(CopyState cstate,
 		 errmsg("conflicting or redundant options")));
 			cstate->oids = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "freeze") == 0)
+		{
+			if (cstate->freeze)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
+			cstate->freeze = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (cstate->delim)
@@ -1922,6 +1932,11 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (cstate->freeze &&
+			ThereAreNoPriorRegisteredSnapshots() &&
+			ThereAreNoReadyPortals())
+			hi_options |= HEAP_INSERT_FREEZE;
 	}
 
 	/*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d1ce2ab..cd2b243 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2315,6 +2315,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("oids", (Node *)makeInteger(TRUE));
 				}
+			| FREEZE
+				{
+					$$ = makeDefElem("freeze", (Node *)makeInteger(TRUE));
+				}
 			| DELIMITER opt_as Sconst
 {
 	$$ = makeDefElem("delimiter", (Node *)makeString($3));
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTIO

Re: [HACKERS] swapcache-style cache?

2012-02-29 Thread Rob Wultsch
On Mon, Feb 27, 2012 at 11:54 PM, Andrea Suisani  wrote:
> On 02/28/2012 04:52 AM, Rob Wultsch wrote:
>>
>> On Wed, Feb 22, 2012 at 2:31 PM, james
>>  wrote:
>>>
>>> Has anyone considered managing a system like the DragonFLY swapcache for
>>> a
>>> DBMS like PostgreSQL?
>>>
>>
>> https://www.facebook.com/note.php?note_id=388112370932
>>
>
> in the same vein:
>
> http://bcache.evilpiepirate.org/
>
> from the main page:
>
> "Bcache is a patch for the Linux kernel to use SSDs to cache other block
> devices. It's analogous to L2Arc for ZFS,
> but Bcache also does writeback caching, and it's filesystem agnostic. It's
> designed to be switched on with a minimum
> of effort, and to work well without configuration on any setup. By default
> it won't cache sequential IO, just the random
> reads and writes that SSDs excel at. It's meant to be suitable for desktops,
> servers, high end storage arrays, and perhaps
> even embedded."
>
> it was submitted to linux kernel mailing list a bunch of time, the last one:
>
> https://lkml.org/lkml/2011/9/10/13
>
>
> Andrea


I am pretty sure I won't get fired (or screw up the IPO) by saying
that I have a high opinion of Flashcache (at least within the fb
environment).

Is anyone using bcache at scale?

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Heikki Linnakangas

On 29.02.2012 17:42, Simon Riggs wrote:

On Wed, Feb 29, 2012 at 3:30 PM, Heikki Linnakangas
  wrote:


Surely it can be done online. You'll just need a third state between off and
on, where checksums are written but not verified, while the cluster is
scanned.


Are you saying you would accept the patch if we had this?


I think I would still be uncomfortable with the hacks in the page 
header. Less so than in the current form - you wouldn't need a flag to 
indicate whether the page has a valid checksum or not, which would clean 
it up quite a bit - but still.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Simon Riggs
On Wed, Feb 29, 2012 at 3:30 PM, Heikki Linnakangas
 wrote:

> Surely it can be done online. You'll just need a third state between off and
> on, where checksums are written but not verified, while the cluster is
> scanned.

Are you saying you would accept the patch if we had this?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-02-29 Thread Tom Lane
Shigeru Hanada  writes:
> How about removing postgresql_fdw_validator from backend binary, and
> changing dblink to use contrib/postgresql_fdw's validator?  It breaks
> some backward compatibility and requires contrib/postgresql_fdw to be
> installed before using contrib/dblink with foreign servers, but ISTM
> that it doesn't become so serious.

I don't think that creating such a dependency is acceptable.
Even if we didn't mind the dependency, you said yourself that
contrib/postgresql_fdw's validator will accept stuff that's not
appropriate for dblink.

If we don't think postgresql_fdw_validator belongs in core after all,
we should just move it to dblink.

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Heikki Linnakangas

On 29.02.2012 17:01, Simon Riggs wrote:

On Wed, Feb 29, 2012 at 2:40 PM, Heikki Linnakangas
  wrote:

On 22.02.2012 14:30, Simon Riggs wrote:


Agreed. No reason to change a checksum unless we rewrite the block, no
matter whether page_checksums is on or off.


This can happen:

1. checksums are initially enabled. A page is written, with a correct
checksum.
2. checksums turned off.
3. A hint bit is set on the page.
4. While the page is being written out, someone pulls the power cord, and
you get a torn write. The hint bit change made it to disk, but the clearing
of the checksum in the page header did not.
5. Sometime after restart, checksums are turned back on.

The page now has an incorrect checksum on it. The next time it's read, you
get a checksum error.


Yes, you will. And you'll get a checksum error because the block no
longer passes. So an error should be reported.

We can and should document that turning this on/off/on can cause
problems. Hopefully crashing isn't that common a situation.


Hopefully not, but then again, you get interested in fiddling with this 
setting when you do experience crashes. This feature needs to be
trustworthy in the face of crashes.



I'm pretty uncomfortable with this idea of having a flag on the page itself
to indicate whether it has a checksum or not. No matter how many bits we use
for that flag. You can never be quite sure that all your data is covered by
the checksum, and there's a lot of room for subtle bugs like the above,
where a page is reported as corrupt when it isn't, or vice versa.


That is necessary to allow upgrade. It's not there for any other reason.


Understood. I'm uncomfortable with it regardless of why it's there.


This thing needs to be reliable and robust. The purpose of a checksum is to
have an extra sanity check, to detect faulty hardware. If it's complicated,
whenever you get a checksum mismatch, you'll be wondering if you have broken
hardware or if you just bumped on a PostgreSQL bug. I think you need a flag
in pg_control or somewhere to indicate whether checksums are currently
enabled or disabled, and a mechanism to scan and rewrite all the pages with
checksums, before they are verified.


That would require massive downtime, so again, it has been ruled out
for practicality.


Surely it can be done online. You'll just need a third state between off 
and on, where checksums are written but not verified, while the cluster 
is scanned.



I've said this before, but I still don't like the hacks with the version
number in the page header. Even if it works, I would much prefer the
straightforward option of extending the page header for the new field. Yes,
it means you have to deal with pg_upgrade, but it's a hurdle we'll have to
jump at some point anyway.


What you suggest might happen in the next release, or maybe longer.
There may be things that block it completely, so it might never
happen. My personal opinion is that it is not possible to make further
block format changes until we have a fully online upgrade process,
otherwise we block people from upgrading - not everybody can take
their site down to run pg_upgrade. I plan to work on that, but it may
not happen for 9.3;


Yep, we should bite the bullet and work on that.


perhaps you will object to that also when it comes.


Heh, quite possible :-). But only if there's a reason. I do want to see 
a solution to this; I have a few page-format changing ideas myself that
I'd like to implement at some point.



This patch is very specifically something that makes the best of the
situation, now, for those that want and need it. If you don't want it,
you don't have to use it.


Whether I use it or not, I'll have to live with it in the source tree.


But that shouldn't stop us giving it to the people that do want it.

I'm hearing general interest and support for this feature from people
that run their business on PostgreSQL.


If you ask someone "would you like to have checksums in PostgreSQL?", 
he'll say "sure". If you ask him "would you like to keep the PostgreSQL 
source tree as simple as possible, to make it easy for new developers to 
join the effort?", he'll say "yes". It's all about how you frame the 
question. Even if you want to have checksums on data pages, it doesn't 
necessarily mean you want them so badly you can't wait another release 
or two for a cleaner solution, or that you'd be satisfied with the 
implementation proposed here.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Simon Riggs
On Wed, Feb 29, 2012 at 2:40 PM, Heikki Linnakangas
 wrote:
> On 22.02.2012 14:30, Simon Riggs wrote:
>>
>> On Wed, Feb 22, 2012 at 7:06 AM, Noah Misch  wrote:
>>>
>>> On Sun, Feb 19, 2012 at 05:04:06PM -0500, Robert Haas wrote:

 Another disadvantage of the current scheme is that there's no
 particularly easy way to know that your whole cluster has checksums.
 No matter how we implement checksums, you'll have to rewrite every
 table in the cluster in order to get them fully turned on.  But with
 the current design, there's no easy way to know how much of the
 cluster is actually checksummed.  If you shut checksums off, they'll
 linger until those pages are rewritten, and there's no easy way to
 find the relations from which they need to be removed, either.
>>>
>>>
>>> I'm not seeing value in rewriting pages to remove checksums, as opposed
>>> to
>>> just ignoring those checksums going forward.  Did you have a particular
>>> scenario in mind?
>>
>>
>> Agreed. No reason to change a checksum unless we rewrite the block, no
>> matter whether page_checksums is on or off.
>
>
> This can happen:
>
> 1. checksums are initially enabled. A page is written, with a correct
> checksum.
> 2. checksums turned off.
> 3. A hint bit is set on the page.
> 4. While the page is being written out, someone pulls the power cord, and
> you get a torn write. The hint bit change made it to disk, but the clearing
> of the checksum in the page header did not.
> 5. Sometime after restart, checksums are turned back on.
>
> The page now has an incorrect checksum on it. The next time it's read, you
> get a checksum error.

Yes, you will. And you'll get a checksum error because the block no
longer passes. So an error should be reported.

We can and should document that turning this on/off/on can cause
problems. Hopefully crashing isn't that common a situation.

The production default would be "off". The default in the patch is
"on" only for testing.

> I'm pretty uncomfortable with this idea of having a flag on the page itself
> to indicate whether it has a checksum or not. No matter how many bits we use
> for that flag. You can never be quite sure that all your data is covered by
> the checksum, and there's a lot of room for subtle bugs like the above,
> where a page is reported as corrupt when it isn't, or vice versa.

That is necessary to allow upgrade. It's not there for any other reason.

> This thing needs to be reliable and robust. The purpose of a checksum is to
> have an extra sanity check, to detect faulty hardware. If it's complicated,
> whenever you get a checksum mismatch, you'll be wondering if you have broken
> hardware or if you just bumped on a PostgreSQL bug. I think you need a flag
> in pg_control or somewhere to indicate whether checksums are currently
> enabled or disabled, and a mechanism to scan and rewrite all the pages with
> checksums, before they are verified.

That would require massive downtime, so again, it has been ruled out
for practicality.

> I've said this before, but I still don't like the hacks with the version
> number in the page header. Even if it works, I would much prefer the
> straightforward option of extending the page header for the new field. Yes,
> it means you have to deal with pg_upgrade, but it's a hurdle we'll have to
> jump at some point anyway.

What you suggest might happen in the next release, or maybe longer.
There may be things that block it completely, so it might never
happen. My personal opinion is that it is not possible to make further
block format changes until we have a fully online upgrade process,
otherwise we block people from upgrading - not everybody can take
their site down to run pg_upgrade. I plan to work on that, but it may
not happen for 9.3; perhaps you will object to that also when it
comes.

So we simply cannot rely on this "jam tomorrow" vision.

This patch is very specifically something that makes the best of the
situation, now, for those that want and need it. If you don't want it,
you don't have to use it. But that shouldn't stop us giving it to the
people that do want it.

I'm hearing general interest and support for this feature from people
that run their business on PostgreSQL.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] SSI rw-conflicts and 2PC

2012-02-29 Thread Kevin Grittner
Heikki Linnakangas  wrote:
> On 23.02.2012 01:36, Jeff Davis wrote:
>> On Tue, 2012-02-14 at 19:32 -0500, Dan Ports wrote:
 
>>> Hmm, it occurs to me that if we have to abort a transaction due to
>>> serialization failure involving a prepared transaction, we might
>>> want to include the prepared transaction's gid in the errdetail.
>>
>> I like this idea.
> 
> +1. Anyone want to put together a patch?
 
Unless Dan claims it before I start the work, I'll do it.
 
-Kevin



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-29 Thread Heikki Linnakangas

On 22.02.2012 14:30, Simon Riggs wrote:

On Wed, Feb 22, 2012 at 7:06 AM, Noah Misch  wrote:

On Sun, Feb 19, 2012 at 05:04:06PM -0500, Robert Haas wrote:

Another disadvantage of the current scheme is that there's no
particularly easy way to know that your whole cluster has checksums.
No matter how we implement checksums, you'll have to rewrite every
table in the cluster in order to get them fully turned on.  But with
the current design, there's no easy way to know how much of the
cluster is actually checksummed.  If you shut checksums off, they'll
linger until those pages are rewritten, and there's no easy way to
find the relations from which they need to be removed, either.


I'm not seeing value in rewriting pages to remove checksums, as opposed to
just ignoring those checksums going forward.  Did you have a particular
scenario in mind?


Agreed. No reason to change a checksum unless we rewrite the block, no
matter whether page_checksums is on or off.


This can happen:

1. checksums are initially enabled. A page is written, with a correct 
checksum.

2. checksums turned off.
3. A hint bit is set on the page.
4. While the page is being written out, someone pulls the power cord, 
and you get a torn write. The hint bit change made it to disk, but the 
clearing of the checksum in the page header did not.

5. Sometime after restart, checksums are turned back on.

The page now has an incorrect checksum on it. The next time it's read, 
you get a checksum error.
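
To make the hazard concrete, here is a rough sketch (untested; every
name here is invented for illustration, and none of this is the actual
patch code) of the read-path check that would misfire in the scenario
above:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PD_HAS_CHECKSUM 0x0008      /* invented flag bit */

    typedef struct PageStub
    {
        uint16_t    pd_checksum;
        uint16_t    pd_flags;
        char        body[8192 - 2 * sizeof(uint16_t)];
    } PageStub;

    /* stand-in for whatever 16-bit checksum the patch computes */
    static uint16_t
    fletcher16(const unsigned char *data, size_t len)
    {
        uint32_t    sum1 = 0, sum2 = 0;

        for (size_t i = 0; i < len; i++)
        {
            sum1 = (sum1 + data[i]) % 255;
            sum2 = (sum2 + sum1) % 255;
        }
        return (uint16_t) ((sum2 << 8) | sum1);
    }

    static bool
    page_passes_verification(const PageStub *page)
    {
        if ((page->pd_flags & PD_HAS_CHECKSUM) == 0)
            return true;        /* page claims to carry no checksum */

        /*
         * The hazard: a torn write can persist a hint-bit change in the
         * body while losing the header update that cleared
         * PD_HAS_CHECKSUM, so we end up comparing a modified body
         * against a stale checksum and report corruption on a page that
         * is actually fine.
         */
        return fletcher16((const unsigned char *) page->body,
                          sizeof(page->body)) == page->pd_checksum;
    }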


I'm pretty uncomfortable with this idea of having a flag on the page 
itself to indicate whether it has a checksum or not. No matter how many 
bits we use for that flag. You can never be quite sure that all your 
data is covered by the checksum, and there's a lot of room for subtle 
bugs like the above, where a page is reported as corrupt when it isn't, 
or vice versa.


This thing needs to be reliable and robust. The purpose of a checksum is 
to have an extra sanity check, to detect faulty hardware. If it's 
complicated, whenever you get a checksum mismatch, you'll be wondering 
if you have broken hardware or if you just bumped on a PostgreSQL bug. I 
think you need a flag in pg_control or somewhere to indicate whether 
checksums are currently enabled or disabled, and a mechanism to scan and 
rewrite all the pages with checksums, before they are verified.
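
That alternative could be modelled roughly like this (a sketch only;
the enum and function names are invented, and the real state would live
in pg_control):

    typedef enum ClusterChecksumState
    {
        CHECKSUMS_OFF,          /* no page is expected to carry a checksum */
        CHECKSUMS_IN_PROGRESS,  /* rewrite pass still stamping old pages */
        CHECKSUMS_ON            /* whole cluster stamped; verify on read */
    } ClusterChecksumState;

    /*
     * Verify only once the rewrite pass has covered the whole cluster;
     * until then, reads trust the page and writes stamp fresh checksums.
     */
    static int
    should_verify_on_read(ClusterChecksumState state)
    {
        return state == CHECKSUMS_ON;
    }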


I've said this before, but I still don't like the hacks with the version 
number in the page header. Even if it works, I would much prefer the 
straightforward option of extending the page header for the new field. 
Yes, it means you have to deal with pg_upgrade, but it's a hurdle we'll 
have to jump at some point anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] SSI rw-conflicts and 2PC

2012-02-29 Thread Heikki Linnakangas

On 23.02.2012 01:36, Jeff Davis wrote:

On Tue, 2012-02-14 at 19:32 -0500, Dan Ports wrote:

On Tue, Feb 14, 2012 at 09:27:58AM -0600, Kevin Grittner wrote:

Heikki Linnakangas  wrote:

On 14.02.2012 04:57, Dan Ports wrote:

The easiest answer would be to just treat every prepared
transaction found during recovery as though it had a conflict in
and out. This is roughly a one-line change, and it's certainly
safe.


+1.

I don't even see this as much of a problem. Prepared transactions
hanging around for arbitrary periods of time cause all kinds of problems
already. Those using them need to be careful to resolve them quickly --
and if there's a crash involved, I think it's reasonable to say they
should be resolved before continuing normal online operations.


Committed this now. (sorry for the delay)


Hmm, it occurs to me that if we have to abort a transaction due to
serialization failure involving a prepared transaction, we might want
to include the prepared transaction's gid in the errdetail.


I like this idea.


+1. Anyone want to put together a patch?
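
For whoever picks it up, the report could look something like this (the
wording, error code, and placement are just a guess at what the patch
would do, not a patch itself):

    #include "postgres.h"

    /*
     * Sketch only: raise an SSI serialization failure that names the
     * prepared transaction involved, via its gid, in the errdetail.
     */
    static void
    report_conflict_with_prepared_xact(const char *gid)
    {
        ereport(ERROR,
                (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                 errmsg("could not serialize access due to read/write dependencies among transactions"),
                 errdetail("The conflict involves the prepared transaction with gid \"%s\".", gid),
                 errhint("The transaction might succeed if retried.")));
    }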

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] review: CHECK FUNCTION statement

2012-02-29 Thread Pavel Stehule
Hello

2012/2/28 Alvaro Herrera :
>
>
> In gram.y we have a new check_option_list nonterminal.  This is mostly
> identical to explain_option_list, except that the option args do not
> take a NumericOnly (only opt_boolean_or_string and empty).  I wonder if
> it's really worthwhile having a bunch of separate productions for this;
> how about we just use the existing explain_option_list instead and get
> rid of those extra productions?
>
> elog() is used in many user-facing messages (errors and notices).  Full
> ereport() calls should be used there, so that messages are marked for
> translations and so on.

I replaced elog with ereport for all non-internal errors.
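
For example, the general shape of the conversion (a generic
illustration of the pattern, not a hunk from the patch):

    /* before: not translatable, and no useful SQLSTATE */
    elog(ERROR, "function \"%s\" does not exist", funcname);

    /* after: message marked for translation, with a proper error code */
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_FUNCTION),
             errmsg("function \"%s\" does not exist", funcname)));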

>
> Does the patched pg_dump work with older servers?
>

it should

> I don't like CheckFunction being declared in defrem.h.  It seems
> completely out of place there.  I don't see any better place though, so
> I'm thinking maybe we should have a new header file for it (say
> commands/functions.h; but we already have executor/functions.h so
> perhaps it's better to find another name).  This addition means that
> there's a distressingly large number of .c files that are now getting
> dest.h, which was previously pretty confined.

please fix it however you wish

Regards

Pavel

>
> --
> Álvaro Herrera 
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support


check_function-2012-02-29-1.diff.gz
Description: GNU Zip compressed data



[HACKERS] Unnecessary WAL archiving after failover

2012-02-29 Thread Fujii Masao
Hi,

In streaming replication, after failover, the new master might have lots
of unapplied WAL files with the old timeline ID. These are WAL files
which were recycled as future segments while the server was running as a
standby. Since they will never be used, they don't need to be archived
after failover. But because they have neither a .ready nor a .done file
in archive_status, checkpoints after failover newly create .ready files
for them, and they eventually get archived, which can cause a disk I/O
spike in both the WAL and archive storage.

To avoid this problem, I think that unapplied WAL files with the old
timeline ID should be marked as already archived and recycled immediately
at the end of recovery. Thoughts?
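
Roughly, I imagine something like this running once at the end of
recovery (a sketch only; the helper name and path handling are made up,
not actual xlog.c code):

    #include <stdio.h>

    /*
     * Sketch: drop a zero-length ".done" status file for a recycled
     * segment carrying the old timeline ID, so later checkpoints never
     * create a ".ready" file for it and the archiver skips it.
     */
    static void
    mark_old_timeline_segment_done(const char *xlogdir, const char *segname)
    {
        char        path[1024];
        FILE       *f;

        snprintf(path, sizeof(path), "%s/archive_status/%s.done",
                 xlogdir, segname);
        f = fopen(path, "w");
        if (f != NULL)
            fclose(f);
    }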

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-02-29 Thread Pavel Stehule
2012/2/29 Shigeru Hanada :
> (2012/02/29 4:07), Peter Eisentraut wrote:
>> Let's at least be clear about the reasons here.  The fact that
>> postgresql_fdw_validator exists means (a) there is a possible naming
>> conflict that has not been discussed yet, and/or (b) the name is already
>> settled and we need to think of a way to make postgresql_fdw_validator
>> work with the new actual FDW.
>
> We can avoid the name conflict by using postgres_fdw or pgsql_fdw, but
> that doesn't solve the fundamental issue.  ISTM that maintaining two
> similar validators is wasteful and confusing, and there should be just
> one FDW for PostgreSQL, at least in the context of the core distribution.
>
> The current pgsql_fdw_validator accepts every FDW option which is
> accepted by postgresql_fdw_validator, and additionally accepts
> FDW-specific options such as fetch_count.  So, if dblink can ignore
> unknown FDW options, pgsql_fdw_validator can be used to create foreign
> servers for dblink connections.
>
> How about removing postgresql_fdw_validator from the backend binary, and
> changing dblink to use contrib/postgresql_fdw's validator?  It breaks
> some backward compatibility and requires contrib/postgresql_fdw to be
> installed before using contrib/dblink with foreign servers, but ISTM
> that it isn't so serious.
>

+1

pavel stehule

> Of course dblink is still available by itself if the user specifies
> connection information with a "key = value" string rather than a server
> name.
>
> One concern is how to avoid a duplicated list of valid libpq options.
> Adding a new libpq function, like the one below, which returns 1 when
> the given name is a valid libpq option, would help.
>
>    int PQisValidOption(const char *keyword);
>
> --
> Shigeru Hanada
>



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-02-29 Thread Shigeru Hanada
(2012/02/29 4:07), Peter Eisentraut wrote:
> Let's at least be clear about the reasons here.  The fact that
> postgresql_fdw_validator exists means (a) there is a possible naming
> conflict that has not been discussed yet, and/or (b) the name is already
> settled and we need to think of a way to make postgresql_fdw_validator
> work with the new actual FDW.

We can avoid the name conflict by using postgres_fdw or pgsql_fdw, but
that doesn't solve the fundamental issue.  ISTM that maintaining two
similar validators is wasteful and confusing, and there should be just
one FDW for PostgreSQL, at least in the context of the core distribution.

The current pgsql_fdw_validator accepts every FDW option which is
accepted by postgresql_fdw_validator, and additionally accepts
FDW-specific options such as fetch_count.  So, if dblink can ignore
unknown FDW options, pgsql_fdw_validator can be used to create foreign
servers for dblink connections.

How about removing postgresql_fdw_validator from the backend binary, and
changing dblink to use contrib/postgresql_fdw's validator?  It breaks
some backward compatibility and requires contrib/postgresql_fdw to be
installed before using contrib/dblink with foreign servers, but ISTM
that it isn't so serious.

Of course dblink is still available by itself if the user specifies
connection information with a "key = value" string rather than a server
name.

One concern is how to avoid a duplicated list of valid libpq options.
Adding a new libpq function, like the one below, which returns 1 when
the given name is a valid libpq option, would help.

int PQisValidOption(const char *keyword);
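
A possible implementation in terms of the existing public
PQconndefaults() API (an untested sketch; inside libpq it would more
likely consult the option table directly):

    #include <string.h>
    #include "libpq-fe.h"

    int
    PQisValidOption(const char *keyword)
    {
        PQconninfoOption *opts = PQconndefaults();
        int         found = 0;

        if (opts == NULL)
            return 0;
        /* the array is terminated by an entry with a NULL keyword */
        for (PQconninfoOption *o = opts; o->keyword != NULL; o++)
        {
            if (strcmp(o->keyword, keyword) == 0)
            {
                found = 1;
                break;
            }
        }
        PQconninfoFree(opts);
        return found;
    }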

-- 
Shigeru Hanada



Re: [HACKERS] swapcache-style cache?

2012-02-29 Thread Andrea Suisani

On 02/28/2012 08:54 AM, Andrea Suisani wrote:

On 02/28/2012 04:52 AM, Rob Wultsch wrote:

On Wed, Feb 22, 2012 at 2:31 PM, james wrote:

Has anyone considered managing a system like the DragonFLY swapcache for a
DBMS like PostgreSQL?



https://www.facebook.com/note.php?note_id=388112370932



in the same vein:

http://bcache.evilpiepirate.org/


[cut]


it was submitted to the Linux kernel mailing list a number of times; the most recent:

https://lkml.org/lkml/2011/9/10/13


I forgot to mention another good write-up:

https://lwn.net/Articles/394672/

Andrea



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-29 Thread Daniel Farina
On Mon, Feb 27, 2012 at 4:26 AM, Peter Geoghegan  wrote:
> This does not appear to have any user-visible effect on caret position
> for all variations in coercion syntax, while giving me everything that
> I need. I had assumed that we were relying on things being this way,
> but apparently this is not the case. The system is correctly blaming
> the coercion token when it finds the coercion is at fault, and the
> const token when it finds the Const node at fault, just as it did
> before. So this looks like a case of removing what amounts to dead
> code.

To shed some light on that hypothesis, attached is a patch whereby I
use 'semantic analysis by compiler error' to show the extent of the
reach of the changes by renaming (codebase-wide) the Const node's
location symbol.  The scope whereby the error token will change
position is small and amenable to analysis.  I don't see a problem,
nor wide-reaching consequences.  As Peter says: probably dead code.
Note that the cancellation of the error position happens very soon,
after an invocation of stringTypeDatum (on two sides of a branch).
Both pre- and post-patch, the caret is put at the beginning of the
constant string, even in the event of a failure to parse it properly to
the destined type.

-- 
fdr


Straw-man-to-show-the-effects-of-the-change-to-const.patch
Description: Binary data



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-02-29 Thread Simon Riggs
On Tue, Feb 28, 2012 at 10:35 PM, Tom Lane  wrote:

> The flaw in this logic, of course, is that the seqscan might be cheaper
> than the parameterized indexscan, but *it produces a whole lot more
> rows*, meaning that any subsequent join will be a lot more expensive.
> Previously add_path didn't have to worry about that, because all
> ordinary paths for a given relation produce the same number of rows
> (and we studiously kept things like inner indexscan paths out of
> add_path's view of the world).
>
> The most obvious thing to do about this is to consider that one path can
> dominate another on cost only if it is both cheaper *and* produces no
> more rows.  But I'm concerned about the cost of inserting yet another
> test condition into add_path, which is slow enough already.  Has anyone
> got an idea for a different formulation that would avoid that?

It seems clear that we shouldn't be making that decision at that point.

It would be better to default towards processing fewer rows initially
and then swoop in later to improve decision making on larger plans.
Can't we save the SeqScan costs at every node, then re-add SeqScan
plans as a post-processing step iff the index/nested-loop plans appear
costly? So have an additional post-processing step that only cuts in
with larger plans.

Seqscan plans are bad for many reasons, such as pushing data out of
cache, making the result more sensitive to growing data volumes or
selectivity mistakes, as well as producing confusing stats for people
trying to add the right indexes.
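
For reference, the dominance rule Tom describes upthread amounts to
something like this (illustrative types only, not add_path() itself):

    typedef struct PathSketch
    {
        double      total_cost;
        double      rows;
    } PathSketch;

    /*
     * With parameterized paths in the mix, "cheaper" alone no longer
     * implies "better": a cheap path may emit many more rows and make
     * every join above it more expensive.  So a path dominates another
     * only if it is no more expensive AND yields no more rows.
     */
    static int
    path_dominates(const PathSketch *a, const PathSketch *b)
    {
        return a->total_cost <= b->total_cost && a->rows <= b->rows;
    }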

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
