Re: [HACKERS] wCTE behaviour

2010-11-13 Thread Yeb Havinga

On 2010-11-12 16:51, David Fetter wrote:

On Fri, Nov 12, 2010 at 10:25:51AM -0500, Tom Lane wrote:


Yeah, that's another interesting question: should we somehow force
unreferenced CTEs to be evaluated anyhow?

Yes.

After a night's sleep I'm still thinking no. Arguments:
1) the name Common Table Expression suggests that t must be regarded 
as an expression, hence syntactically / proof theoretic and not as a 
table, set of rows / model theoretic. I.e. it is not a Common Table.
2) The expressions can be referenced zero, one or more times. To me it
therefore makes the most sense that a DML expression that is defined
but not referenced has no effect. Referenced once: run the plan once.
Referenced again: run the plan again.


What should the result be of
WITH t AS (INSERT INTO foo SELECT nextval('seq') RETURNING *)
SELECT * FROM t
UNION
SELECT * FROM t;

1 or 1,2 ?

regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] duplicate connection failure messages

2010-11-13 Thread Magnus Hagander
On Fri, Nov 12, 2010 at 15:02, Bruce Momjian br...@momjian.us wrote:
 Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  I have developed the attached patch to report whether IPv4 or IPv6 are
  being used.

 What's the use of that exactly?  It doesn't really respond to Peter's
 concern, I think.

 Peter liked:

 And I agree it's not very friendly in this specific case - I
 wonder if we should log it as localhost (127.0.0.1) and localhost
 (::1) (and similar for any other case that returns more than one
 address).

 What this will show is:

        localhost (IPv4)
        localhost (IPv6)

 Is that good?  I can't figure out how to do ::1 because when you supply
 a host _name_, there is no reverse mapping done.  Looking at the code,
 we test for a host name, then a host ip, and don't assume they are both
 set.

The address is in conn->raddr, no? When you've put in a host name, we
do a forward lookup, so conn->raddr should contain ::1 already? You
only need the reverse mapping to get the "localhost" part, if I read
the code correctly?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] wCTE behaviour

2010-11-13 Thread David Fetter
On Sat, Nov 13, 2010 at 02:28:35PM +0100, Yeb Havinga wrote:
 On 2010-11-12 16:51, David Fetter wrote:
 On Fri, Nov 12, 2010 at 10:25:51AM -0500, Tom Lane wrote:
 
 Yeah, that's another interesting question: should we somehow force
 unreferenced CTEs to be evaluated anyhow?
 Yes.
 After a night's sleep I'm still thinking no. Arguments:
 1) the name Common Table Expression suggests that t must be
 regarded as an expression, hence syntactically / proof theoretic and
 not as a table, set of rows / model theoretic. I.e. it is not a
 Common Table.

Disagree.  A table never referred to in a query still exists.
Similarly, if a normal CTE called a data-changing function but was
nevertheless not referred to, it would still run.

 2) The expressions can be referenced zero, one or more times. To me
 it therefore makes the most sense that a DML expression that is
 defined but not referenced has no effect. Referenced once: run the
 plan once. Referenced again: run the plan again.

No.  When I designed this feature, it was precisely to take advantage
of the run exactly once behavior of CTEs.  Under no circumstances
should we break this.

 
 What should the result be of
 WITH t AS (INSERT INTO foo SELECT nextval('seq') RETURNING *)
 SELECT * FROM t
 UNION
 SELECT * FROM t;
 
 1 or 1,2 ?

1.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] max_wal_senders must die

2010-11-13 Thread Robert Haas
On Fri, Nov 12, 2010 at 11:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Right.  I propose that we set max_wal_senders to unlimited when
 wal_level = hot_standby.

 It's a memory allocation parameter ... you can't just set it to
 unlimited, at least not without a nontrivial amount of work.

Yes.  This thread would benefit from less uneducated speculation and
more examination of what the parameter actually does.  I've looked at
how it's set up in the hopes of finding an easy fix and haven't come
up with anything all that wonderful yet.  In particular, there is this
code, that gets run on every commit (unless max_wal_senders is zero):

/* Wake up all walsenders */
void
WalSndWakeup(void)
{
    int         i;

    for (i = 0; i < max_wal_senders; i++)
        SetLatch(&WalSndCtl->walsnds[i].latch);
}

Notice that this code is able to iterate over all of the WAL senders
that might exist without taking any sort of lock.  You do NOT want to
unnecessarily iterate over the entire procarray here.  The obvious fix
is to keep an array that contains only the latches for the WAL senders
that actually exist at the moment, but then you have to insert a lock
acquisition and release in here, or risk backends failing to see an
update to the shared variable that identifies the end of the list,
which seems like a pretty bad idea too.

One idea I've had is that we might want to think about defining an
operation that is effectively store, with a memory barrier.  For
example, on x86, this could be implemented using xchg.  I think if you
have a single-word variable in shared memory that is always updated
using a locked store, then individual backends should be able to read
it unlocked without risk of seeing a stale value.  So in the above
example you could have an array in shared memory and a variable
updated in the manner just described indicating how many slots of the
array are in use, and backends could iterate up to the end of the
array without needing a lock on the number of slots.  Of course, you
still have to figure out how to compact the array when a WAL sender
exits, but maybe that could be done with an extra layer of
indirection.

Come to think of it, I'm not really sure I understand what protects
SetLatch() against memory ordering hazards.  Is that actually safe?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wCTE behaviour

2010-11-13 Thread Yeb Havinga

On 2010-11-13 14:41, David Fetter wrote:

On Sat, Nov 13, 2010 at 02:28:35PM +0100, Yeb Havinga wrote:

1) the name Common Table Expression suggests that t must be
regarded as an expression, hence syntactically / proof theoretic and
not as a table, set of rows / model theoretic. I.e. it is not a
Common Table.

Disagree.  A table never referred to in a query still exists.
Similarly, if a normal CTE called a data-changing function but was
nevertheless not referred to, it would still run.

with t as (select nextval('seq'))
select 1;

does not update the sequence.


2) The expressions can be referenced zero, one or more times. To me
it therefore makes the most sense that a DML expression that is
defined but not referenced has no effect. Referenced once: run the
plan once. Referenced again: run the plan again.

No.  When I designed this feature, it was precisely to take advantage
of the run exactly once behavior of CTEs.  Under no circumstances
should we break this.
I found the pgday2009 presentation 
http://wiki.postgresql.org/images/c/c0/PGDay2009-EN-Writeable_CTEs_The_Next_Big_Thing.pdf 
- the IO minimization example is cool, and I now understand that it 
would be artificial if the CTE had to be referenced, for it to be 
executed. Makes sense.


regards,
Yeb Havinga




Re: [HACKERS] wCTE behaviour

2010-11-13 Thread Marko Tiikkaja

On 13 Nov 2010, at 15:41, David Fetter da...@fetter.org wrote:


On Sat, Nov 13, 2010 at 02:28:35PM +0100, Yeb Havinga wrote:

On 2010-11-12 16:51, David Fetter wrote:

On Fri, Nov 12, 2010 at 10:25:51AM -0500, Tom Lane wrote:


Yeah, that's another interesting question: should we somehow force
unreferenced CTEs to be evaluated anyhow?

Yes.

After a night's sleep I'm still thinking no. Arguments:
1) the name Common Table Expression suggests that t must be
regarded as an expression, hence syntactically / proof theoretic and
not as a table, set of rows / model theoretic. I.e. it is not a
Common Table.


Disagree.  A table never referred to in a query still exists.
Similarly, if a normal CTE called a data-changing function but was
nevertheless not referred to, it would still run.


Actually, it wouldn't.

But if we make the behaviour of wCTEs hard(er) to predict, we are  
going to have a pretty bad feature in our hands.  Let's not repeat our  
mistakes, please.



Regards,
Marko Tiikkaja



Re: [HACKERS] max_wal_senders must die

2010-11-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 One idea I've had is that we might want to think about defining an
 operation that is effectively store, with a memory barrier.  For
 example, on x86, this could be implemented using xchg.  I think if you
 have a single-word variable in shared memory that is always updated
 using a locked store, then individual backends should be able to read
 it unlocked without risk of seeing a stale value.

You're still guilty of fuzzy thinking here.  What does "stale" mean?
To do anything useful, you need to be able to fetch the value, execute
some sequence of operations that depends on the value, and be assured
that the value you fetched remains relevant throughout that sequence.
A memory barrier by itself doesn't help.

I have seen one or two places where we could use a memory barrier
primitive that doesn't include a lock, but they had to do with ensuring
that different writes would be seen to have occurred in a particular
order.  It is not obvious how we could use one here.

Now, one could also argue that commit is already a sufficiently
heavyweight operation that taking/releasing one more LWLock won't
matter too much.  But it would be nice to back that argument with
some evidence.

 Come to think of it, I'm not really sure I understand what protects
 SetLatch() against memory ordering hazards.  Is that actually safe?

Hmm ... that's a good question.  It certainly *looks* like it could
malfunction on machines with weak memory ordering.

regards, tom lane



Re: [HACKERS] wCTE behaviour

2010-11-13 Thread Tom Lane
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 13 Nov 2010, at 15:41, David Fetter da...@fetter.org wrote:
 Similarly, if a normal CTE called a data-changing function but was
 nevertheless not referred to, it would still run.

 Actually, it wouldn't.

Indeed, and that was considered a feature when we did it.  I think
that having wCTEs behave arbitrarily differently on this point
might be a bad idea.

regards, tom lane



Re: [HACKERS] wCTE behaviour

2010-11-13 Thread Marko Tiikkaja

On 2010-11-13 5:08 PM +0200, Tom Lane wrote:

Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:

On 13 Nov 2010, at 15:41, David Fetter da...@fetter.org wrote:

Similarly, if a normal CTE called a data-changing function but was
nevertheless not referred to, it would still run.



Actually, it wouldn't.


Indeed, and that was considered a feature when we did it.  I think
that having wCTEs behave arbitrarily differently on this point
might be a bad idea.


So these queries would behave differently?

WITH t AS (DELETE FROM foo RETURNING *)
SELECT 1 WHERE false;

WITH t AS (DELETE FROM foo RETURNING *)
SELECT 1 FROM t LIMIT 0;


Regards,
Marko Tiikkaja



Re: [HACKERS] wCTE behaviour

2010-11-13 Thread Clark C. Evans
On Sat, 13 Nov 2010 17:23 +0200, Marko Tiikkaja wrote:
 So these queries would behave differently?
 
 WITH t AS (DELETE FROM foo RETURNING *)
 SELECT 1 WHERE false;

 WITH t AS (DELETE FROM foo RETURNING *)
 SELECT 1 FROM t LIMIT 0;

I'm still trying to wrap my head around this
new mechanism.  What would this return?

UPDATE foo SET access = 0;

WITH t AS (UPDATE foo SET access = access + 1 RETURNING *)
SELECT x.access, y.access
 FROM t CROSS JOIN t;




Re: [HACKERS] wCTE behaviour

2010-11-13 Thread Marko Tiikkaja

On 2010-11-13 5:36 PM +0200, Clark C. Evans wrote:

On Sat, 13 Nov 2010 17:23 +0200, Marko Tiikkaja wrote:

So these queries would behave differently?

WITH t AS (DELETE FROM foo RETURNING *)
SELECT 1 WHERE false;

WITH t AS (DELETE FROM foo RETURNING *)
SELECT 1 FROM t LIMIT 0;


I'm still trying to wrap my head around this
new mechanism.  What would this return?

UPDATE foo SET access = 0;

WITH t AS (UPDATE foo SET access = access + 1 RETURNING *)
SELECT x.access, y.access
  FROM t CROSS JOIN t;


I'm assuming you forgot to give the tables aliases:

WITH t AS (UPDATE foo SET access = access + 1 RETURNING *)
SELECT x.access, y.access
  FROM t x CROSS JOIN t y;

This would return n * n rows with values (1,1) where n is the number of 
rows in foo when the snapshot was taken.  I.e. every row in foo would 
now have access=1.  I'm also ignoring the possibility that someone 
modified the table between those two queries.



Regards,
Marko Tiikkaja



[HACKERS] HOT updates in index-less tables

2010-11-13 Thread Tom Lane
If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns.  But is there any benefit to doing so?  I don't
see one offhand, and it has a downside: we're very likely to
encounter broken HOT chains if an index is created later.  That leads
to the sort of unexpected behavior exhibited here:
http://archives.postgresql.org/pgsql-performance/2010-11/msg00216.php

I'm thinking maybe HeapSatisfiesHOTUpdate should be changed so that it
always returns false if the relation has no indexes, which could be
checked cheaply via relation->rd_rel->relhasindex.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Andrew Dunstan



On 11/12/2010 11:25 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 11/12/2010 03:16 PM, Peter Eisentraut wrote:

Improved parallel make support

Looks like this patch has pretty comprehensively broken the MSVC build
system. I'll see what I can recover from the wreckage.

There are also at least three non-Windows buildfarm members failing like
so:

gmake -C src all
gmake[1]: Entering directory `/home/pgbuild/pgbuildfarm/HEAD/pgsql.6736/src'
gmake[1]: *** virtual memory exhausted.  Stop.
gmake[1]: Leaving directory `/home/pgbuild/pgbuildfarm/HEAD/pgsql.6736/src'
gmake: *** [all-src-recursive] Error 2

I think we may have pushed too far in terms of what actually works
reliably across different make versions.


Yeah, possibly. And now it looks like this has broken the Solaris 
buildfarm members too.


I'm curious to know how much all this buys us. One reason I haven't 
enabled parallel make in the buildfarm is that it interleaves the 
output, which can be a pain. And build speed isn't really the 
buildfarm's foremost concern anyway. I know waiting for a build can be 
mildly annoying (ccache can be a big help if you're building 
repeatedly). But I don't feel we need to squeeze every last pip out of 
the build system.


cheers

andrew



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I'm curious to know how much all this buys us.

It *would* be nice if make -k worked better.  I frequently run into
the fact that (with the pre-existing setup) a compile error in the
backend prevented make from proceeding with builds of interfaces/,
bin/, etc, meaning that that work still remains to be done after I've
finished fixing the backend error.

But having said that, I won't shed many tears if we have to revert this.

It looks like all the unhappy critters are getting the same virtual
memory exhausted error.  I wonder whether they are all using make 3.80
...

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Tom Lane
BTW, there's another problem here: make -j2 on my Mac blows up with
this on stderr:

ld: file not found: ../../../../../../src/backend/postgres
collect2: ld returned 1 exit status
make[3]: *** [ascii_and_mic.so] Error 1
make[2]: *** [all-ascii_and_mic-recursive] Error 2
make[1]: *** [all-backend/utils/mb/conversion_procs-recursive] Error 2
make[1]: *** Waiting for unfinished jobs
In file included from gram.y:12101:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16242: warning: unused variable 'yyg'
make: *** [all-src-recursive] Error 2

Consulting stdout shows that indeed it's launched this series of jobs:

make -C backend/utils/mb/conversion_procs all
make -C ascii_and_mic all
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g  
-I../../../../../../src/include   -c -o ascii_and_mic.o ascii_and_mic.c
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
-bundle -multiply_defined suppress -o ascii_and_mic.so ascii_and_mic.o
-L../../../../../../src/port -Wl,-dead_strip_dylibs
-bundle_loader ../../../../../../src/backend/postgres

immediately after completing the src/timezone build, before the backend
build is even well begun let alone finished.  So the parallel build
dependency interlocks are basically not working.  This machine has gmake
3.81.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Andrew Dunstan



On 11/13/2010 11:12 AM, Tom Lane wrote:

It looks like all the unhappy critters are getting the same virtual
memory exhausted error.  I wonder whether they are all using make 3.80
...


Maybe we need to put back make version logging. Interestingly, narwhal, 
the mingw machine that has reported, didn't complain. It's running 3.81.


cheers

andrew



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Dave Page
On Sat, Nov 13, 2010 at 4:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 It looks like all the unhappy critters are getting the same virtual
 memory exhausted error.  I wonder whether they are all using make 3.80

Both my Sparc and Intel Solaris critters have 3.80.


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] HOT updates in index-less tables

2010-11-13 Thread Hannu Krosing
On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:
 If a table has no indexes, we will always decide that any same-page
 update operation is a HOT update, since obviously it isn't modifying
 any indexed columns.  But is there any benefit to doing so? 

If we do the in-page mini vacuum even without HOT, then there should
be no benefit from index-less HOT updates. 

If we don't try the mini vacuum in this case, then some current
behaviours could become much worse, say a table with one row and a load
of updates.

 I don't
 see one offhand, and it has a downside: we're very likely to
 encounter broken HOT chains if an index is created later.  That leads
 to the sort of unexpected behavior exhibited here:
 http://archives.postgresql.org/pgsql-performance/2010-11/msg00216.php
 
 I'm thinking maybe HeapSatisfiesHOTUpdate should be changed so that it
 always returns false if the relation has no indexes, which could be
 checked cheaply via relation->rd_rel->relhasindex.
 
   regards, tom lane
 

-- 
---
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] HOT updates in index-less tables

2010-11-13 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
 On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:
 If a table has no indexes, we will always decide that any same-page
 update operation is a HOT update, since obviously it isn't modifying
 any indexed columns.  But is there any benefit to doing so? 

 If we do the in-page mini vacuum even without HOT, then there should
 be no benefit from index-less HOT updates. 

AFAICS we do: heap_update marks the page as prunable whether it's a HOT
update or not.  The only difference between treating the update as HOT vs
not-HOT is that if there was more than one HOT update, the intermediate
tuples could be completely reclaimed by page pruning (ie, their line
pointers go away too).  With not-HOT updates, the intermediate line
pointers would have to remain in DEAD state until vacuum, since page
pruning wouldn't know if there were index entries pointing at them.
But that seems like a pretty tiny penalty.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Peter Eisentraut
On Sat, 2010-11-13 at 11:06 -0500, Andrew Dunstan wrote:
 But I don't feel we need to squeeze every last pip out of 
 the build system.

Probably not on the buildfarm, but when you are developing, saving 20
seconds or 2 minutes per cycle can lead to hours saved.




Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Peter Eisentraut
On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
 It looks like all the unhappy critters are getting the same virtual
 memory exhausted error.  I wonder whether they are all using make
 3.80 ...

It turns out that there is an unrelated bug in 3.80 that some Linux
distributions have patched around.  3.81 or 3.82 are OK.




Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Peter Eisentraut
On Sat, 2010-11-13 at 11:23 -0500, Tom Lane wrote:
 Consulting stdout shows that indeed it's launched this series of jobs:
 
 make -C backend/utils/mb/conversion_procs all
 make -C ascii_and_mic all
 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
 -fwrapv -g  -I../../../../../../src/include   -c -o ascii_and_mic.o
 ascii_and_mic.c
 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
 -fwrapv -g  -bundle -multiply_defined suppress -o ascii_and_mic.so
 ascii_and_mic.o -L../../../../../../src/port -Wl,-d\
 ead_strip_dylibs
 -bundle_loader ../../../../../../src/backend/postgres
 
 immediately after completing the src/timezone build, before the
 backend build is even well begun let alone finished.  So the parallel
 build dependency interlocks are basically not working.

On some platforms, you need to have backend/postgres built before any
dynamically loadable modules.  For those platforms, additional
dependencies will be necessary, I suppose.




Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
 It looks like all the unhappy critters are getting the same virtual
 memory exhausted error.  I wonder whether they are all using make
 3.80 ...

 It turns out that there is an unrelated bug in 3.80 that some Linux
 distributions have patched around.  3.81 or 3.82 are OK.

So what do you mean by unrelated bug?  Can we work around it?

regards, tom lane



Re: [HACKERS] WIP: extensible enums

2010-11-13 Thread Peter Eisentraut
On Fri, 2010-11-12 at 17:19 -0500, Robert Haas wrote:
 If we allow users to name objects, we ought to make every effort to
 also allow renaming them.  In my mind, the only way renaming is too
 marginal to be useful is if the feature itself is too marginal to be
 useful.

The bottom line is, any kind of database object needs to be changeable
and removable, otherwise there will always be hesitations about its use.
And when there are hesitations about the use, it's often easiest not to
bother.

I remember ten years ago or so we used to send people away who requested
the ability to drop columns, claiming they didn't plan their database
properly, or they should load it from scratch.  Nowadays that is
ludicrous; databases live forever, development is agile, everything
needs to be changeable.




Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Erik Rijkers
On Sat, November 13, 2010 18:15, Peter Eisentraut wrote:
 On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
 It looks like all the unhappy critters are getting the same virtual
 memory exhausted error.  I wonder whether they are all using make
 3.80 ...

 It turns out that there is an unrelated bug in 3.80 that some Linux
 distributions have patched around.  3.81 or 3.82 are OK.


Just to mention another effect of the recent changes:

make 3.81, Centos 5.5

On a dual quad-core system where I used to build with -j 16, it now only
succeeds with -j 8.

(I seem to remember that 16 as opposed to 8 shaved a couple of seconds
off, although I'm not quite sure anymore.)

make -j 16 gives:

cc1: error: thread.c: No such file or directory
make[4]: *** [thread.o] Error 1
make[3]: *** [submake-libpq] Error 2
make[2]: *** [all-pg_ctl-recursive] Error 2
make[1]: *** [all-bin-recursive] Error 2
make[1]: *** Waiting for unfinished jobs
Use of assignment to $[ is deprecated at ./parse.pl line 21.
In file included from gram.y:12101:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16242: warning: unused variable 'yyg'
Use of assignment to $[ is deprecated at ./check_rules.pl line 18.
make: *** [all-src-recursive] Error 2


(I see a similar effect on a dual-core Fedora system
(2.6.27.5-117.fc10.i686), where -j 16 always ran but it now needs -j 4 or
less; it also has make 3.81.)


Erik Rijkers






Re: [HACKERS] 8.4-vintage problem in postmaster.c

2010-11-13 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 Stefan Kaltenbrunner reported a problem in postmaster via IM to me.  I
 thought I had nailed down the bug, but after more careful reading of the
 code, turns out I was wrong.

 The reported problem is that postmaster shuts itself down with this
 error message:

 2010-11-12 10:19:05 CET FATAL:  no free slots in PMChildFlags array

Just looking at it, I think that the logic in canAcceptConnections got
broken by somebody in 8.4, and then broken some more in 9.0: in some
cases it will return an okay to proceed status without having checked
for TOOMANY children.  Was this system possibly in PM_WAIT_BACKUP or
PM_HOT_STANDBY state?  What version was actually running?

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Peter Eisentraut
On Sat, 2010-11-13 at 12:20 -0500, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
  It looks like all the unhappy critters are getting the same virtual
  memory exhausted error.  I wonder whether they are all using make
  3.80 ...
 
  It turns out that there is an unrelated bug in 3.80 that some Linux
  distributions have patched around.  3.81 or 3.82 are OK.
 
 So what do you mean by unrelated bug?  Can we work around it?

The information is fuzzy, but the problem has been reported around the
internet, and it appears to be related to the foreach function.  I think
I have an idea how to work around it, but I'll need some time.





Re: [HACKERS] HOT updates in index-less tables

2010-11-13 Thread Hannu Krosing
On Sat, 2010-11-13 at 12:13 -0500, Tom Lane wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
  On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:
  If a table has no indexes, we will always decide that any same-page
  update operation is a HOT update, since obviously it isn't modifying
  any indexed columns.  But is there any benefit to doing so? 
 
  If we do the in-page mini vacuum even without HOT, then there should
  be no benefit from index-less HOT updates. 
 
 AFAICS we do: heap_update marks the page as prunable whether it's a HOT
 update or not.  The only difference between treating the update as HOT vs
 not-HOT is that if there was more than one HOT update, the intermediate
 tuples could be completely reclaimed by page pruning (ie, their line
 pointers go away too).  With not-HOT updates, the intermediate line
 pointers would have to remain in DEAD state until vacuum, 

How hard would it be to make the pruning logic aware of there being
no indexes, and thus no possibility of index entries pointing at any
tuple?

 since page
 pruning wouldn't know if there were index entries pointing at them.
 But that seems like a pretty tiny penalty.
 
   regards, tom lane

-- 
---
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] HOT updates in index-less tables

2010-11-13 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
 On Sat, 2010-11-13 at 12:13 -0500, Tom Lane wrote:
 AFAICS we do: heap_update marks the page as prunable whether it's a HOT
 update or not.  The only difference between treating the update as HOT vs
 not-HOT is that if there was more than one HOT update, the intermediate
 tuples could be completely reclaimed by page pruning (ie, their line
 pointers go away too).  With not-HOT updates, the intermediate line
 pointers would have to remain in DEAD state until vacuum, 

 How hard would it be to make the pruning logic aware of there being
 no indexes, and thus no possibility of index entries pointing at any
 tuple?

I think it's problematic, because heap_page_prune can be executed with
only AccessShareLock on the table, which means there's a race condition
against concurrent CREATE INDEX.  You could look at relhasindex easily
enough, but that doesn't prove there's not a CREATE INDEX in progress.

regards, tom lane



Re: [HACKERS] wCTE behaviour

2010-11-13 Thread David Fetter
On Sat, Nov 13, 2010 at 05:23:34PM +0200, Marko Tiikkaja wrote:
 On 2010-11-13 5:08 PM +0200, Tom Lane wrote:
 Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 On 13 Nov 2010, at 15:41, David Fetter da...@fetter.org wrote:
 Similarly, if a normal CTE called a data-changing function but
 was nevertheless not referred to, it would still run.
 
 Actually, it wouldn't.
 
 Indeed, and that was considered a feature when we did it.  I think
 that having wCTEs behave arbitrarily differently on this point
 might be a bad idea.
 
 So these queries would behave differently?
 
 WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 WHERE false;
 
 WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t LIMIT 0;

No.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] wCTE behaviour

2010-11-13 Thread David Fetter
On Sat, Nov 13, 2010 at 03:23:42PM +0100, Yeb Havinga wrote:
 On 2010-11-13 14:41, David Fetter wrote:
 On Sat, Nov 13, 2010 at 02:28:35PM +0100, Yeb Havinga wrote:
 1) the name Common Table Expression suggests that t must be
 regarded as an expression, hence syntactically / proof theoretic and
 not as a table, set of rows / model theoretic. I.e. it is not a
 Common Table.
 Disagree.  A table never referred to in a query still exists.
 Similarly, if a normal CTE called a data-changing function but was
 nevertheless not referred to, it would still run.
 with t as (select nextval('seq'))
 select 1;
 
 does not update the sequence.

I think you've found a bug in the form of an over-aggressive
optimization for the data-changing case.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] wCTE behaviour

2010-11-13 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 It's not that straightforward though, in that the producer could stop a
 bit ahead of what the consumer reads, due to there being a buffer in the
 middle.  Witness this simple example

Yeah, another example where the analogy fails for us.
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Peter Eisentraut
On lör, 2010-11-13 at 20:07 +0200, Peter Eisentraut wrote:
 On lör, 2010-11-13 at 12:20 -0500, Tom Lane wrote:
  Peter Eisentraut pete...@gmx.net writes:
   On lör, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
   It looks like all the unhappy critters are getting the same virtual
   memory exhausted error.  I wonder whether they are all using make
   3.80 ...
  
   It turns out that there is an unrelated bug in 3.80 that some Linux
   distributions have patched around.  3.81 or 3.82 are OK.
  
  So what do you mean by unrelated bug?  Can we work around it?
 
 The information is fuzzy, but the problem has been reported around the
 internet, and it appears to be related to the foreach function.  I think
 I have an idea how to work around it, but I'll need some time.

Well, it looks like $(eval) is pretty broken in 3.80, so either we
require 3.81 or we abandon this line of thought.




[HACKERS] SSI update

2010-11-13 Thread Kevin Grittner
Circumstances have conspired to leave me with very little time to
work on the SSI patch during the last few weeks.  I'm still convinced
that the work mentioned in this post is necessary to have a
commit-quality patch:
 
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01754.php
 
I also think it's going to be very desirable to convert the conflict
pointers used in the papers (and in the patch so far) to lists, to
eliminate one source of false positives and allow more aggressive
clean-up of transactions.  If we implement the logic from the above
post and *then* convert the pointers to lists, it seems like more
work than implementing the lists first.  Therefore, I'm planning on
doing the lists first.
 
Anyone who thinks that's a bad idea, please speak up soon, as I'm
starting coding on that today.
 
There's no way there will be a patch implementing this in time for
the 2010-11 CF.  A couple months ago there were a couple people who
said they'd be willing to look at this between CFs, so I'm hoping
that their schedules still permit them to do that after this
unfortunate delay.
 
-Kevin





Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-13 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Well, it looks like $(eval) is pretty broken in 3.80, so either we
 require 3.81 or we abandon this line of thought.

[ emerges from some grubbing about in the gmake sources... ]
It looks to me like the bug in 3.80 is only triggered when eval
expands to a string long enough to force reallocation of the variable
buffer.  (Ergo, the reason they didn't find it sooner was they only
tested on relatively short strings.)

I wonder whether the bug could be worked around if you did the iteration
on SUBDIRS in a foreach surrounding the eval call, so that each eval
dealt with only one subdir target.  This would result in a bit of
redundancy in the generated rules, but that seems tolerable.

regards, tom lane



Re: [HACKERS] 8.4-vintage problem in postmaster.c

2010-11-13 Thread Stefan Kaltenbrunner

On 11/13/2010 06:58 PM, Tom Lane wrote:

Alvaro Herrera alvhe...@alvh.no-ip.org writes:

Stefan Kaltenbrunner reported a problem in postmaster via IM to me.  I
thought I had nailed down the bug, but after more careful reading of the
code, turns out I was wrong.



The reported problem is that postmaster shuts itself down with this
error message:



2010-11-12 10:19:05 CET FATAL:  no free slots in PMChildFlags array


Just looking at it, I think that the logic in canAcceptConnections got
broken by somebody in 8.4, and then broken some more in 9.0: in some
cases it will return an "okay to proceed" status without having checked
for TOOMANY children.  Was this system possibly in PM_WAIT_BACKUP or
PM_HOT_STANDBY state?  What version was actually running?


I don't have too many details on the actual setup (working on that) but 
the box in question is running 8.4.2 and had no issues before the 
upgrade to 8.4 (ie 8.3 was reported to work fine, so an 8.4+ breakage 
looks plausible).



Stefan



Re: [HACKERS] 8.4-vintage problem in postmaster.c

2010-11-13 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 11/13/2010 06:58 PM, Tom Lane wrote:
 Just looking at it, I think that the logic in canAcceptConnections got
 broken by somebody in 8.4, and then broken some more in 9.0: in some
 cases it will return an "okay to proceed" status without having checked
 for TOOMANY children.  Was this system possibly in PM_WAIT_BACKUP or
 PM_HOT_STANDBY state?  What version was actually running?

 I don't have too many details on the actual setup (working on that) but 
 the box in question is running 8.4.2 and had no issues before the 
 upgrade to 8.4 (ie 8.3 was reported to work fine, so an 8.4+ breakage 
 looks plausible).

Well, this failure would certainly involve a flood of connection
attempts, so it's possible it's a pre-existing bug that they just did
not happen to trip over before.  But the sequence of events that I'm
thinking about is a smart shutdown attempt (SIGTERM to postmaster)
while an online backup is in progress, followed by a flood of
near-simultaneous connection attempts while the backup is still active.

regards, tom lane



Re: [HACKERS] max_wal_senders must die

2010-11-13 Thread Robert Haas
On Sat, Nov 13, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 One idea I've had is that we might want to think about defining an
 operation that is effectively store, with a memory barrier.  For
 example, on x86, this could be implemented using xchg.  I think if you
 have a single-word variable in shared memory that is always updated
 using a locked store, then individual backends should be able to read
 it unlocked without risk of seeing a stale value.

 You're still guilty of fuzzy thinking here.  What does "stale" mean?

Well, that obviously depends on what algorithm you use to add and
remove items from the array.  I know my thinking is fuzzy; I was
trying to say "I'm mulling over whether we could do something like
this" rather than "I know exactly how to make this work".

*thinks it over*

Here's an algorithm that might work.  Let's assume that we're still
going to allocate an array of size max_wal_senders, but we want to
make max_wal_senders relatively large but avoid iterating over the
entire array on each commit.  Store a variable in shared memory called
FirstWalSenderOffset which is always updated using a memory-barrier
operation.  This value is -1 if there are no currently connected WAL
senders, or the array slot of a currently connected walsender if there
is one.  Each array slot also has a structure member called
NextWalSenderOffset, so that the list of connected WAL senders is
effectively stored as a linked list, but with array slot numbers
rather than pointers.

To add a WAL sender, we initialize the new array slot, setting
NextWalSenderOffset to FirstWalSenderOffset, and then set
FirstWalSenderOffset to the slot number of the newly allocated slot.
To remove a WAL sender, we loop through the notional linked list and,
when we find the pointer to the slot we want to remove, we overwrite
it with a pointer to the slot to which the removed entry points.  All
of these stores are done using the store-with-memory-barrier, so that
readers can iterate over the linked list without a lock.  However, we
can't let two processes try to MODIFY the list at the same time (at
least, not without a more powerful memory ordering primitive; COMPXCHG
might be enough) so just protect list modification with a global
spinlock; updates won't be frequent.

This algorithm makes the assumption that it's OK for a scan to miss
WAL senders that are added mid-scan.  But the current code makes that
assumption, too: a new WAL sender could grab an array slot we've
already passed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-11-13 Thread Robert Haas
On Fri, Nov 12, 2010 at 10:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 FYI, I always wondered if the rare use of mergejoins justified the extra
 planning time of carrying around all those joinpaths.

 They're hardly rare.

They're fairly rare in the sorts of queries I normally issue, but I'd
quibble with the statement on other grounds: IME, we generate far more
nest loops paths than anything else.  The comment in
match_unsorted_outer() says it all:

 * We always generate a nestloop path for each available outer path.
 * In fact we may generate as many as five: one on the cheapest-total-cost
 * inner path, one on the same with materialization, one on the
 * cheapest-startup-cost inner path (if different), one on the
 * cheapest-total inner-indexscan path (if any), and one on the
 * cheapest-startup inner-indexscan path (if different).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] HOT updates in index-less tables

2010-11-13 Thread Robert Haas
On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
 On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:
 If a table has no indexes, we will always decide that any same-page
 update operation is a HOT update, since obviously it isn't modifying
 any indexed columns.  But is there any benefit to doing so?

 If we do the in-page mini vacuum even without HOT, then there should
 be no benefit from index-less HOT updates.

 AFAICS we do: heap_update marks the page as prunable whether it's a HOT
 update or not.  The only difference between treating the update as HOT vs
 not-HOT is that if there was more than one HOT update, the intermediate
 tuples could be completely reclaimed by page pruning (ie, their line
 pointers go away too).  With not-HOT updates, the intermediate line
 pointers would have to remain in DEAD state until vacuum, since page
 pruning wouldn't know if there were index entries pointing at them.
 But that seems like a pretty tiny penalty.

I'm not at all convinced that's a tiny penalty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: extensible enums

2010-11-13 Thread Robert Haas
On Sat, Nov 13, 2010 at 12:30 PM, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-11-12 at 17:19 -0500, Robert Haas wrote:
 If we allow users to name objects, we ought to make every effort to
 also allow renaming them.  In my mind, the only way renaming is too
 marginal to be useful is if the feature itself is too marginal to be
 useful.

 The bottom line is, any kind of database object needs to be changeable
 and removable, otherwise there will always be hesitations about its use.
 And when there are hesitations about the use, it's often easiest not to
 bother.

 I remember ten years ago or so we used to send people away who requested
 the ability to drop columns, claiming they didn't plan their database
 properly, or they should load it from scratch.  Nowadays that is
 ludicrous; databases live forever, development is agile, everything
 needs to be changeable.

It was ludicrous then, too.  I picked MySQL for several projects early
on for precisely the lack of the ability to drop columns in
PostgreSQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 8.4-vintage problem in postmaster.c

2010-11-13 Thread Stefan Kaltenbrunner

On 11/13/2010 11:07 PM, Tom Lane wrote:

Stefan Kaltenbrunner ste...@kaltenbrunner.cc  writes:

On 11/13/2010 06:58 PM, Tom Lane wrote:

Just looking at it, I think that the logic in canAcceptConnections got
broken by somebody in 8.4, and then broken some more in 9.0: in some
cases it will return an "okay to proceed" status without having checked
for TOOMANY children.  Was this system possibly in PM_WAIT_BACKUP or
PM_HOT_STANDBY state?  What version was actually running?



I don't have too many details on the actual setup (working on that) but
the box in question is running 8.4.2 and had no issues before the
upgrade to 8.4 (ie 8.3 was reported to work fine, so an 8.4+ breakage
looks plausible).


Well, this failure would certainly involve a flood of connection
attempts, so it's possible it's a pre-existing bug that they just did
not happen to trip over before.  But the sequence of events that I'm


afaik this seems to be fairly reproducible on the current box so 
something in 8.4 seems to trigger that issue more often now.




thinking about is a smart shutdown attempt (SIGTERM to postmaster)
while an online backup is in progress, followed by a flood of
near-simultaneous connection attempts while the backup is still active.


Interesting, but I don't think that the setup in question actually uses 
online backups at all...




Stefan



Re: [HACKERS] unlogged tables

2010-11-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 2. The second one (unlogged-tables-v1) adds support for unlogged
 tables by adding a new supported value for relpersistence. I made this
 work by having the backend that creates an unlogged relation write out an
 init fork for that relation.  The main fork is nuked and replaced by
 the contents of the init fork during startup.  But I haven't made this
 code work yet for index types other than btree, so attempting to
 define a non-btree index on an unlogged relation will currently result
 in an error.  I don't think that's probably too hard to fix, but I
 haven't done it yet.

That seems pretty gross.  If you're going to have to take a special
action at startup anyway, why wouldn't it take the form of truncate,
then if it's an index, call the appropriate ambuild function?  Maybe
that's a bit ugly, but at least the ugliness is localized rather than
scribbled all over the filesystem.  I'm also concerned about possible
failure modes having to do with the init fork being missing or
corrupted.

regards, tom lane



[HACKERS] POSIX shared memory redux

2010-11-13 Thread A . M .
The goal of this work is to address all of the shortcomings of previous POSIX 
shared memory patches as pointed out mostly by Tom Lane.

Branch: 
http://git.postgresql.org/gitweb?p=users/agentm/postgresql.git;a=shortlog;h=refs/heads/posix_shmem
Main file: 
http://git.postgresql.org/gitweb?p=users/agentm/postgresql.git;a=blob;f=src/backend/port/posix_shmem.c;h=da93848d14eeadb182d8bf1fe576d741ae5792c3;hb=refs/heads/posix_shmem

Design goals:
1) ensure that shared memory creation collisions are impossible 
2) ensure that shared memory access collisions are impossible
3) ensure proper shared memory cleanup after backend and postmaster close
4) minimize API changes
http://archives.postgresql.org/pgsql-patches/2007-02/msg00527.php
http://archives.postgresql.org/pgsql-patches/2007-02/msg00558.php

This patch addresses the above goals and offers some benefits over SysV shared 
memory:

1) no kern.sysv management (one documentation page with platform-specific help 
can disappear)
2) shared memory allocation limited only by mmap usage
3) shared memory regions are completely cleaned up when the postmaster and all 
of its children have exited or been killed for any reason (including SIGKILL)
4) shared memory creation race conditions or collisions between postmasters or 
backends are impossible
5) after postmaster startup, the postmaster becomes the sole arbiter of which 
other processes are granted access to the shared memory region 
6) mmap and munmap can be used on the shared memory region; this may be useful 
for offering the option to expand the memory region dynamically

The design goals are accomplished by a simple change in shared memory creation: 
after shm_open, the region name is immediately shm_unlink'd. Because POSIX 
shared memory relies on file descriptors, the shared memory is not deallocated 
in the kernel until the last referencing file descriptor is closed (in this 
case, on process exit). The postmaster then becomes the sole arbiter of passing 
the shared memory file descriptor (either through children or through file 
descriptor passing, if necessary).

The patch is a reworked version of Chris Marcellino cmarcell...@apple.com's 
patch.

Details:

1) the shared memory name is based on getpid(); this ensures that no two 
starting postmasters (or other processes) will attempt to acquire the same 
shared memory segment.
2) the shared memory segment is created and immediately unlinked, preventing 
outside access to the shared memory region
3) the shared memory file descriptor is passed to backends via static int file 
descriptor (normal file descriptor inheritance)
* perhaps there is a better location to store the file descriptor; 
advice welcomed.
4) shared memory segment detach occurs when the process exits (kernel-based 
cleanup instead of scheduled in-process cleanup)

Additional notes:
The feature whereby arbitrary postgres user processes could connect to the 
shared memory segment has been removed with this patch. If this is a desirable 
feature (perhaps for debugging or performance tools), this could be added by 
implementing a file descriptor passing server in the postmaster which would use 
SCM_RIGHTS control message passing to a) verify that the remote process is 
running as the same user as the postmaster b) pass the shared memory file 
descriptor to the process. I am happy to implement this, if required.

I am happy to continue work on this patch if the pg-hackers deem it worthwhile. 
Thanks!

Cheers,
M





Re: [HACKERS] unlogged tables

2010-11-13 Thread Robert Haas
On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 2. The second one (unlogged-tables-v1) adds support for unlogged
 tables by adding a new supported value for relpersistence. I made this
 work by having the backend that creates an unlogged relation write out an
 init fork for that relation.  The main fork is nuked and replaced by
 the contents of the init fork during startup.  But I haven't made this
 code work yet for index types other than btree, so attempting to
 define a non-btree index on an unlogged relation will currently result
 in an error.  I don't think that's probably too hard to fix, but I
 haven't done it yet.

 That seems pretty gross.  If you're going to have to take a special
 action at startup anyway, why wouldn't it take the form of truncate,
 then if it's an index, call the appropriate ambuild function?

We've been over this ground before.  You can't read from non-shared
catalogs without binding to a database, and you must reinitialize all
unlogged relations before opening the database for a connection.  So
what you're proposing would involve launching a worker process for
each database in the cluster, having it do its thing and then exit,
and only after all that's done opening the database for connections.
That seems vastly more complex and less performant than what I've done
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlogged tables

2010-11-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Here is a series of three patches related to unlogged tables.
 1. The first one (relpersistence-v1) is a mostly mechanical patch that
 replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
 (a character), so that we can support more than two values.  BE SURE
 YOU INITDB, since the old catalog format will not work with this patch
 applied.

While I'm griping ... is there a really good reason to do it that way,
rather than adding a new column?  This will break clients that are
looking at relistemp.  Maybe there aren't any, but I wouldn't bet on
that, and it doesn't seem like you're buying a lot by creating this
incompatibility.  I would also argue that temp-ness is a distinct
concept from logged-ness.

regards, tom lane



Re: [HACKERS] POSIX shared memory redux

2010-11-13 Thread Tom Lane
A.M. age...@themactionfaction.com writes:
 The goal of this work is to address all of the shortcomings of previous POSIX 
 shared memory patches as pointed out mostly by Tom Lane.

It seems like you've failed to understand the main shortcoming of this
whole idea, which is the loss of ability to detect pre-existing backends
still running in a cluster whose postmaster has crashed.  The nattch
variable of SysV shmem segments is really pretty critical to us, and
AFAIK there simply is no substitute for it in POSIX-land.

regards, tom lane



Re: [HACKERS] unlogged tables

2010-11-13 Thread Andrew Dunstan



On 11/13/2010 07:59 PM, Tom Lane wrote:

  I would also argue that temp-ness is a distinct
concept from logged-ness.


I agree.

cheers

andrew



Re: [HACKERS] unlogged tables

2010-11-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That seems pretty gross.  If you're going to have to take a special
 action at startup anyway, why wouldn't it take the form of truncate,
 then if it's an index, call the appropriate ambuild function?

 We've been over this ground before.  You can't read from non-shared
 catalogs without binding to a database, and you must reinitialize all
 unlogged relations before opening the database for a connection.  So
 what you're proposing would involving launching a worker process for
 each database in the cluster, having it do its thing and then exit,
 and only after all that's done opening the database for connections.
 That seems vastly more complex and less performant than what I've done
 here.

The fact that it's easy doesn't make it workable.  I would point out for
starters that AMs might (do) put WAL locations and/or XIDs into indexes.
Occasionally copying very old LSNs or XIDs back into active files seems
pretty dangerous.

Cleanup at first connection is something we've been avoiding for years,
but maybe it's time to bite the bullet and do that?

BTW, how will all of this activity look to a hot-standby slave?

regards, tom lane



Re: [HACKERS] duplicate connection failure messages

2010-11-13 Thread Bruce Momjian
Magnus Hagander wrote:
 On Fri, Nov 12, 2010 at 15:02, Bruce Momjian br...@momjian.us wrote:
  Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   I have developed the attached patch to report whether IPv4 or IPv6 are
   being used.
 
  What's the use of that exactly?  It doesn't really respond to Peter's
  concern, I think.
 
  Peter liked:
 
  And I agree it's not very friendly in this specific case - I
  wonder if we should log it as localhost (127.0.0.1) and localhost
  (::1) (and similar for any other case that returns more than one
  address).
 
  What this will show is:
 
         localhost (IPv4)
         localhost (IPv6)
 
  Is that good?  I can't figure out how to do ::1 because when you supply
  a host _name_, there is no reverse mapping done.  Looking at the code,
  we test for a host name, then a host ip, and don't assume they are both
  set.
 
 The address is in conn->raddr, no? When you've put in a host name, we
 do a forward lookup, so conn->raddr should contain ::1 already? You
 only need the reverse mapping to get the localhost part, if I read
 the code correctly?

OK, I found out how to get the IP address with the attached patch.  The
problem is that only pghost is set, never pghostaddr.  I am not even
sure how that would get set for this code because my tests show it is
not:

$ psql -h localhost test
pghost = localhost
-- pghostaddr = (null)
psql: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?

$ psql -h 127.0.0.1 test
pghost = 127.0.0.1
pghostaddr = (null)
psql: could not connect to server: Connection refused
	Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5432?

To get this to work, I compared pghost with the raddr value, and printed
the IP address if it was not already printed.  There is still a problem
about threading that I can fix.

Is this what we want?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 8f318a1..f3307f3 100644
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*** connectFailureMessage(PGconn *conn, int 
*** 960,968 ****
  	else
  #endif   /* HAVE_UNIX_SOCKETS */
  	{
  		appendPQExpBuffer(&conn->errorMessage,
  						  libpq_gettext("could not connect to server: %s\n"
! 					 "\tIs the server running on host \"%s\" and accepting\n"
  					"\tTCP/IP connections on port %s?\n"),
  						  SOCK_STRERROR(errorno, sebuf, sizeof(sebuf)),
  						  conn->pghostaddr
--- 960,976 ----
  	else
  #endif   /* HAVE_UNIX_SOCKETS */
  	{
+ 		struct sockaddr_in *remote_addr = (struct sockaddr_in *) &conn->raddr.addr;
+ 		/* not thread safe */
+ 		char *remote_ip = inet_ntoa(remote_addr->sin_addr);
+ 		bool host_ip_match = strcmp(conn->pghost, remote_ip) == 0;
+ 
+ 		fprintf(stderr, "pghost = %s\n", conn->pghost);
+ 		fprintf(stderr, "pghostaddr = %s\n", conn->pghostaddr);
+ 
  		appendPQExpBuffer(&conn->errorMessage,
  						  libpq_gettext("could not connect to server: %s\n"
! 					 "\tIs the server running on host \"%s\" %s%s%sand accepting\n"
  					"\tTCP/IP connections on port %s?\n"),
  						  SOCK_STRERROR(errorno, sebuf, sizeof(sebuf)),
  						  conn->pghostaddr
*** connectFailureMessage(PGconn *conn, int 
*** 970,975 ****
--- 978,987 ----
  						  : (conn->pghost
  							 ? conn->pghost
  							 : "???"),
+ 						  /* display the IP address only if not already output */
+ 						  !host_ip_match ? "(" : "",
+ 						  !host_ip_match ? remote_ip : "",
+ 						  !host_ip_match ? ") " : "",
  						  conn->pgport);
  	}
  }



Re: [HACKERS] unlogged tables

2010-11-13 Thread Robert Haas
On Sat, Nov 13, 2010 at 7:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Here is a series of three patches related to unlogged tables.
 1. The first one (relpersistence-v1) is a mostly mechanical patch that
 replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
 (a character), so that we can support more than two values.  BE SURE
 YOU INITDB, since the old catalog format will not work with this patch
 applied.

 While I'm griping ... is there a really good reason to do it that way,
 rather than adding a new column?  This will break clients that are
 looking at relistemp.  Maybe there aren't any, but I wouldn't bet on
 that, and it doesn't seem like you're buying a lot by creating this
 incompatibility.  I would also argue that temp-ness is a distinct
 concept from logged-ness.

I think that would be a recipe for bugs.  Look at the three new macros
I introduced.  If you keep relistemp around, then any code which
relies on it is likely testing for one of those three things, or maybe
even something subtly different from any of them, as in the cases
where I needed to add a switch statement.  The way I see it, this is
ultimately a four-level hierarchy: permanent tables (write WAL, shared
buffers, ordinary namespace), unlogged tables (don't write WAL, shared
buffers, ordinary namespace), global temporary tables (don't write
WAL, local buffers, ordinary namespace), local temporary tables (don't
write WAL, local buffers, private namespace).  Even if we don't end up
implementing global temporary tables in the way I'm envisioning (I
know you have an alternate proposal), it seems inevitable that a
boolean for relistemp isn't going to be sufficient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlogged tables

2010-11-13 Thread Robert Haas
On Sat, Nov 13, 2010 at 8:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That seems pretty gross.  If you're going to have to take a special
 action at startup anyway, why wouldn't it take the form of truncate,
 then if it's an index, call the appropriate ambuild function?

 We've been over this ground before.  You can't read from non-shared
 catalogs without binding to a database, and you must reinitialize all
 unlogged relations before opening the database for a connection.  So
 what you're proposing would involve launching a worker process for
 each database in the cluster, having it do its thing and then exit,
 and only after all that's done opening the database for connections.
 That seems vastly more complex and less performant than what I've done
 here.

 The fact that it's easy doesn't make it workable.  I would point out for
 starters that AMs might (do) put WAL locations and/or XIDs into indexes.
 Occasionally copying very old LSNs or XIDs back into active files seems
 pretty dangerous.

I haven't examined the GIST, GIN, or hash index code in detail so I am
not sure whether there are any hazards there; the btree case does not
seem to have any issues of this type.  Certainly, if an index AM puts
an XID into an empty index, that's gonna break.  I would consider that
a pretty odd thing to do, though.  An LSN seems less problematic since
the LSN space does not wrap; it should just look like an index that
was created a long time ago and never updated (which, in effect, it
is).

 Cleanup at first connection is something we've been avoiding for years,
 but maybe it's time to bite the bullet and do that?

There would certainly be some advantage to doing cleanup at first
connection even if we stick with the overall approach I've adopted
here, because you could avoid the overhead of cleaning up databases
that are never actually accessed.  There are a few downsides, though.
If you happened to leave a large amount of unlogged data on disk after
a crash, and then for some reason never connected to that database
again, you'd never reclaim that disk space; although perhaps you could
somehow arrange for autovacuum to clean up in that case.  Also, the
first connection to the offending database would need to lock out all
other connections until cleanup was completed, although I suppose
that's still better than doing the cleanup in the startup process as
is presently the case.  I guess the main problem is you'd need a
reliable and *inexpensive* way of identifying the first connection to
each database.  Paying something extra at startup time is better than
paying even a small penalty on each individual connection; goodness
knows our connections are expensive enough already.

 BTW, how will all of this activity look to a hot-standby slave?

The table will appear to exist but you'll get an error if you try to
access it.  I think at present it'll complain about the underlying
files being missing; that could probably be fine-tuned if we're so
inclined.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlogged tables

2010-11-13 Thread Greg Stark
On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Cleanup at first connection is something we've been avoiding for years,
 but maybe it's time to bite the bullet and do that?


Another alternative is to initialize the unlogged tables when you
first access them. If you try to open a table and there are no files
attached, then go ahead and initialize it by creating an empty table
and building any indexes.

Hm, I had been assuming recovery would be responsible for cleaning up
the tables even if the first access is responsible for rebuilding
them. But there's a chance there have been no modifications to them
since the last checkpoint. But in that case the data in them is fine.
It would be a weird interface if it only cleared them out sometimes
based on unpredictable timing though. Avoiding that does require some
kind of alternate storage scheme other than the WAL to indicate what
needs to be cleared out. .init files are as good a mechanism as any, even
if they just mean "unlink this file on startup".

-- 
greg



Re: [HACKERS] Label switcher function

2010-11-13 Thread Robert Haas
2010/11/12 KaiGai Kohei kai...@kaigai.gr.jp:
 The attached patch allows the security label provider to switch
 the security label of the client during execution of certain functions.
 I named it a label switcher function; it is also called a
 trusted procedure in the SELinux community.

 This feature is quite similar in idea to a security definer function,
 or a set-uid program on an operating system. It allows a label provider
 to switch its internal state that holds the security label of the
 client, then restore it.
 If and when a label provider says the function being invoked is
 a label switcher, fmgr_security_definer() traps the invocation
 and sets some state just before the actual invocation.

 We added three new hooks for the security label provider.
 The get_client_label and set_client_label hooks allow the PG core to
 save and restore the security label of the client, which is mostly
 just an internal state of the plugin module.
 And the get_switched_label hook shall return NULL or a valid label
 if the supplied function is a label switcher. It also informs
 the PG core whether the function is a switcher or not.

I don't see why the plugin needs to expose the label stack to core PG.
 If the plugin needs a label stack, it can do that all on its own.  I
see that we need the hooks to allow the plugin to selectively disable
inlining and to gain control when function execution starts and ends
(or aborts) but I don't think the exact manipulations that the plugin
chooses to do at that point need to be visible to core PG.

For SE-Linux, how do you intend to determine whether or not the
function is a trusted procedure?  Will that be a function of the
security label applied to it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlogged tables

2010-11-13 Thread Robert Haas
On Sat, Nov 13, 2010 at 9:17 PM, Greg Stark gsst...@mit.edu wrote:
 On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Cleanup at first connection is something we've been avoiding for years,
 but maybe it's time to bite the bullet and do that?

 Another alternative is to initialize the unlogged tables when you
 first access them. If you try to open a table and there are no files
 attached, then go ahead and initialize it by creating an empty table
 and building any indexes.

I thought about that (I've thought about a lot of things in regards to
this feature...).  One problem is that you presumably will need to
open the relation before you can decide whether this is the first
access since restart.  But by the time you've opened them, you've
already taken an AccessShareLock, and you'll presumably need something
a whole lot stronger than that to do the rebuild.  Lock upgrades are
usually a good thing to avoid when possible, although maybe it would
be OK in this case, not sure.  Another problem is that it's not too
clear to me where you'd hook in the logic to do the cleanup.  The
relcache code seems like an awfully low-level place to be trying to
perpetrate this sort of monkey business.

 Hm, I had been assuming recovery would be responsible for cleaning up
 the tables even if the first access is responsible for rebuilding
 them. But there's a chance there have been no modifications to them
 since the last checkpoint. But in that case the data in them is fine.
 It would be a weird interface if it only cleared them out sometimes
 based on unpredictable timing though. Avoiding that does require some
 kind of alternate storage scheme other than the WAL to indicate what
 needs to be cleared out. .init files are as good a mechanism even if
 they just mean unlink this file on startup.

One idea I had was to trigger the rebuild when we notice that the main
relation fork is missing.  Then the startup code can just notice the
init fork, annihilate everything else, and call it good. However, this
appears to require modifying some fairly fundamental assumptions of
the current system.  smgr.c/md.c believe that nobody should ever try
to read a nonexistent block, and unconditionally throw an error if the
caller tries to do so.  You could provide a mode where they don't do
that, and instead return an error indication to the caller.  Then you
could add an additional ReadBuffer mode, say RBM_FAIL, to let the
error percolate back up through that layer to the index AM or heap
code, which could then try to upgrade its lock and recreate the main
fork.  However, I really couldn't work up much enthusiasm for
implementing this feature in a way that requires drilling a hole in
the abstraction stack from top to bottom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Refactoring the Type System

2010-11-13 Thread Darren Duncan

David Fetter wrote:

For the past couple of years, I've been hearing from the PostGIS
people among others that our type system just isn't flexible enough
for their needs.  It's really starting to show its age, or possibly
design compromises that seemed reasonable a decade or more ago, but
are less so now.

To that end, I've put up a page on the wiki that includes a list of
issues to be addressed.  It's intended to be changed, possibly
completely.

http://wiki.postgresql.org/wiki/Refactor_Type_System

What might the next version of the type system look like?


Are you talking about changes to the type system as users see it, or just changes 
to how the existing behavior is implemented internally?  If you're talking about 
the type system as users see it, which the other replies to this thread seem to 
be, though not necessarily the URL you pointed to, which looks more like internals ...


As a statement which may surprise no one who's heard me talk about it before ...

I've mostly completed a type system specification that would be useable by 
Postgres, as the most fundamental part of my Muldis D language.


The type system is arguably the most central piece of any DBMS, around which 
everything else is defined and built.


You have data, which is structured in some way, and has operators for it.

If you look at a DBMS from the perspective of being a programming language 
implementation, you find that a database is just a variable that holds a value 
of a structured type.  In the case of a relational database, said database is a 
tuple whose attribute values are relations; or in the case of 
namespaces/schemas, the database tuple has tuple attributes having relation 
attributes.


If a database is a variable, then all database constraints are type constraints 
on the declared type of that variable, and you can make said constraints 
arbitrarily complicated.


From basic structures like nestable tuples and relations, plus a complement of 
basic types like numbers and strings, and arbitrary constraints, you can define 
data types of any shape or form.


A key component of a good type system is that users can define data types, and 
moreover where possible, system-defined types are defined in the same ways as 
users define types.  For example, stuff like temporal types or geospatial types 
are prime candidates for being defined like user-defined types.


If you define all structures using tuples and relations, you can easily flatten 
this out on the implementation end and basically do everything as associated 
flat relation variables as you do now.


So what I propose is both very flexible and easy to implement, scale, and 
optimize, relatively speaking.


You don't have to kludge things by implementing arrays as blobs for example; you 
can implement them as relations instead.  Geospatial types can just be tuples. 
Arrays of structured types can just be relations with an attribute per type 
attribute.  Arrays of simple types can just be unary relations.


You can also emulate all of the existing Pg features and syntax that you have 
now over the type system I've defined, maintaining compatibility too.


I also want to emphasize that, while I drew inspiration from many sources when 
defining Muldis D, and there was, and is, a lot I didn't know about Postgres, 
I frequently find, as I use and learn Postgres, that how Postgres does things 
is similar and compatible to how I independently came up with Muldis D's 
design; I'm finding more similarities all the time.


-- Darren Duncan



Re: [HACKERS] Label switcher function

2010-11-13 Thread KaiGai Kohei

(2010/11/14 11:19), Robert Haas wrote:

2010/11/12 KaiGai Koheikai...@kaigai.gr.jp:

The attached patch allows the security label provider to switch
the security label of the client during execution of certain functions.
I named it a label switcher function; it is also called a
trusted procedure in the SELinux community.

This feature is quite similar in idea to a security definer function,
or a set-uid program on an operating system. It allows a label provider
to switch its internal state that holds the security label of the
client, then restore it.
If and when a label provider says the function being invoked is
a label switcher, fmgr_security_definer() traps the invocation
and sets some state just before the actual invocation.

We added three new hooks for the security label provider.
The get_client_label and set_client_label hooks allow the PG core to
save and restore the security label of the client, which is mostly
just an internal state of the plugin module.
And the get_switched_label hook shall return NULL or a valid label
if the supplied function is a label switcher. It also informs
the PG core whether the function is a switcher or not.


I don't see why the plugin needs to expose the label stack to core PG.
  If the plugin needs a label stack, it can do that all on its own.  I
see that we need the hooks to allow the plugin to selectively disable
inlining and to gain control when function execution starts and ends
(or aborts) but I don't think the exact manipulations that the plugin
chooses to do at that point need to be visible to core PG.


Hmm. I designed this patch according to the implementation of the existing
security definer function, but that is not the only possible design.

By "the label stack", do you mean the part of this patch that touches xact.c?
Yes, if we have the above three hooks around function calls, the core PG
does not need to manage a label stack.

However, I want fmgr_security_definer_cache to have a field to save
private opaque data, because it is not a lightweight step to ask SELinux
whether the function is a trusted procedure and to allocate the string to
be applied during execution, although the switching itself is lightweight.
So I want to compute this on the first call of the function, just as a
security definer function checks the syscache only once.

Of course, since it is private opaque data, it will be open to other usage.


For SE-Linux, how do you intend to determine whether or not the
function is a trusted procedure?  Will that be a function of the
security label applied to it?


When the function being invoked has a special security label with
a type_transition rule on the current client's label in the
security policy, SELinux decides the function is a trusted procedure.

In other words, we can know whether or not the function is a trusted
procedure by asking the security policy. That is a task for the plugin.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
