Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Fujii Masao
On Wed, Sep 10, 2008 at 12:26 AM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:
 If a slave falls behind, how does it catch up? I guess you're saying that it
 can't fall behind, because the master will block before that happens. Also
 in asynchronous replication? And what about when the slave is first set up,
 and needs to catch up with the master?

The mechanism for the slave to catch up with the master should be
provided on the outside of postgres. I think that postgres should provide
only WAL streaming, i.e. the master always sends *current* WAL data
to the slave.

Of course, the master also has to send the current WAL *file* in the
initial transfer just after the slave starts and connects to it,
because at that time the current WAL position might be in the middle of
a WAL file. Even if the master sends only current WAL data, a slave
which doesn't have the corresponding WAL file cannot handle it.
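
(To illustrate the arithmetic: a WAL position maps to a segment file
plus an offset inside it, and a non-zero offset is exactly the case
where the whole current file must be sent first. A minimal sketch,
assuming 16MB segments as in a default build; these are illustrative
names, not the actual backend macros:)

    /* Illustrative only: split a WAL position into segment number and
     * offset, assuming 16MB segments. */
    #include <stdint.h>
    #include <stdio.h>

    #define SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

    int main(void)
    {
        uint64_t lsn = 3 * SEG_SIZE + 12345;   /* a position mid-file */
        uint64_t segno = lsn / SEG_SIZE;       /* which WAL file */
        uint64_t offset = lsn % SEG_SIZE;      /* position inside it */

        /* offset != 0: the slave needs the whole current file, not
         * just the bytes from lsn onwards. */
        printf("segment %llu, offset %llu\n",
               (unsigned long long) segno, (unsigned long long) offset);
        return 0;
    }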

regards

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Keeping creation time of objects

2008-09-10 Thread Volkan YAZICI
On Tue, 9 Sep 2008, David Fetter [EMAIL PROTECTED] writes:
 AFAICS, PostgreSQL is not keeping info about when a table, database,
 sequence, etc was created.  We cannot get that info even from OS,
 since CLUSTER or VACUUM FULL may change the metadata of
 corresponding relfilenode.

 When people aren't keeping track of their DDL, that is very strictly a
 process problem on their end.  When people are shooting themselves in
 the foot, it's a great disservice to market Kevlar shoes to them.

Word. In the company I'm currently working at, we store the database
schema in a VCS repository with minor and major version tags, and there
is a current_foo_soft_version() function that returns the revision of
the related database schema. If a company's working practices give no
control over database schema changes, even the most logging-feature-rich
PostgreSQL release will provide insignificant benefit compared to the
mess that needs to get fixed.


Regards.



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Hannu Krosing
On Wed, 2008-09-10 at 15:15 +0900, Fujii Masao wrote:
 On Wed, Sep 10, 2008 at 12:26 AM, Heikki Linnakangas
 [EMAIL PROTECTED] wrote:
  If a slave falls behind, how does it catch up? I guess you're saying that it
  can't fall behind, because the master will block before that happens. Also
  in asynchronous replication? And what about when the slave is first set up,
  and needs to catch up with the master?
 
 The mechanism for the slave to catch up with the master should be
 provided on the outside of postgres. 

So you mean that we still need to do the initial setup (copy backup
files, then ship and replay WAL segments generated during the copy) by
external WAL-shipping tools, like walmgr.py, and then at some point
switch over to internal WAL-shipping, when we are sure that we are
within the same WAL file on both master and slave?

 I think that postgres should provide
 only WAL streaming, i.e. the master always sends *current* WAL data
 to the slave.

 Of course, the master has to send also the current WAL *file* in the
 initial sending just after the slave starts and connects with it.

I think that it needs to send all WAL files which the slave does not
yet have, as otherwise the slave will have gaps. On a busy system you
will generate several new WAL files in the time it takes to make the
master copy, transfer it to the slave, and apply the WAL files
generated during the initial setup.

 Because at that time the current WAL position might be in the middle of
 a WAL file. Even if the master sends only current WAL data, a slave
 which doesn't have the corresponding WAL file cannot handle it.

I agree that making the initial copy may be outside the scope of
Synchronous Log Shipping Replication, but the slave catching up, by
requesting all missing WAL files and applying them up to the point when
it can switch to sync mode, should be in. Otherwise we gain very little
from this patch.

---
Hannu





Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Pavan Deolasee
On Wed, Sep 10, 2008 at 12:05 PM, Hannu Krosing [EMAIL PROTECTED] wrote:


 Because at that time the current WAL position might be in the middle of
 a WAL file. Even if the master sends only current WAL data, a slave
 which doesn't have the corresponding WAL file cannot handle it.

 I agree that making the initial copy may be outside the scope of
 Synchronous Log Shipping Replication, but the slave catching up, by
 requesting all missing WAL files and applying them up to the point when
 it can switch to sync mode, should be in. Otherwise we gain very little
 from this patch.


I agree. We should leave the initial backup acquisition out of scope,
at least for the first phase, but provide a mechanism to do the initial
catch-up, as it may get messy to do it completely outside of the core.

The slave will need to be able to buffer the *current* WAL until it
gets the missing WAL files, and then continue. Also, we may not want
the master to be stuck while the slave is doing the catch-up.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 13:28 +0900, Fujii Masao wrote:
 On Tue, Sep 9, 2008 at 8:38 PM, Heikki Linnakangas
 [EMAIL PROTECTED] wrote:
  There's one thing I haven't figured out in this discussion. Does the write
  to the disk happen before or after the write to the slave? Can you guarantee
  that if a transaction is committed in the master, it's also committed in the
  slave, or vice versa?
 

The write happens concurrently and independently on both.

Yes, you wait for the write *and* send pointers to be flushed before
you allow a synch commit with synch replication. (The definition of
"flushed" is changeable by parameters.)
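
(As a rough sketch of the commit rule, with LSNs simplified to plain
integers and made-up names - not the actual backend code:)

    /* Sketch: a synch commit may only be acknowledged once both the
     * local flush pointer and the send pointer have passed the commit
     * record's position. */
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogPtr;

    static int commit_may_complete(XLogPtr commit_lsn,
                                   XLogPtr local_flush_lsn,
                                   XLogPtr sent_lsn)
    {
        /* both must hold; neither write waits for the other */
        return local_flush_lsn >= commit_lsn && sent_lsn >= commit_lsn;
    }

    int main(void)
    {
        printf("%d\n", commit_may_complete(100, 120, 90));  /* 0: send lags */
        printf("%d\n", commit_may_complete(100, 120, 110)); /* 1: both done */
        return 0;
    }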

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Keeping creation time of objects

2008-09-10 Thread Tino Wildenhain

Hi,

Devrim GÜNDÜZ wrote:

On Tue, 2008-09-09 at 15:36 -0400, Tom Lane wrote:

why would creation time (as opposed to any other time, eg last schema
modification, last data modification, yadda yadda) be especially
significant? 

Hmm, those would be cool, too.


Maybe except last data modification.

But for audit reasons it's really helpful to see if someone has had
their hands on objects since they were created. So if it would not cost
us an arm and a leg, I'm all for having created/changed timestamps for
all objects.

Regards
Tino





Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 12:24 +0530, Pavan Deolasee wrote:
 Also, we may not want the master to be stuck while the slave is doing the catch-up.

No, since it may take hours, not seconds.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Markus Wanner

Hi,

Simon Riggs wrote:

1. Standby contacts primary and says it would like to catch up, but is
currently at point X (which is a point at, or after the first consistent
stopping point in WAL after standby has performed its own crash
recovery, if any was required).
2. primary initiates data transfer of old data to standby, starting at
point X
3. standby tells primary where it has got to periodically
4. at some point primary decides primary and standby are close enough
that it can now begin streaming current WAL (which is always the WAL
up to wal_buffers behind the current WAL insertion point).


Hm.. wouldn't it be simpler to start streaming right away and cache it
on the standby until it can be applied? I.e., a protocol like:


1. - same as above -
2. primary starts streaming 'live' or 'hot' data from its current
position Y in the WAL stream, which is certainly after (or possibly
equal to) X.
3. standby receives the hot stream from point Y on. It now knows it is
missing the 'cold' portion of the WAL from X to Y and requests that.
4. primary serves the remaining 'cold' WAL chunks from its xlog /
archive from between X and Y.
5. standby applies the 'cold' WAL until done, then proceeds with the
cached WAL segments from the 'hot' streaming (a rough sketch of this
follows below).
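
(The sketch promised above - fake data, with ints standing in for WAL
chunks, and made-up names; just to show the shape of the standby-side
buffering:)

    /* Steps 3-5: cache the 'hot' stream while the missing 'cold' WAL
     * is fetched and applied, then drain the cache. */
    #include <stdio.h>

    #define X   3    /* standby's position */
    #define Y   7    /* primary's position when streaming began */
    #define END 10   /* how far the live stream gets in this demo */

    int main(void)
    {
        int cache[16], n = 0, hot = Y;

        /* cold phase: X..Y-1 arrive from the primary's xlog/archive
         * while the live stream (Y..) is cached, not applied */
        for (int cold = X; cold < Y; cold++)
        {
            if (hot < END)
                cache[n++] = hot++;          /* step 3: cache hot WAL */
            printf("apply cold %d\n", cold); /* steps 4-5: apply cold */
        }

        /* step 5: drain the cached hot records, then continue live */
        for (int i = 0; i < n; i++)
            printf("apply cached hot %d\n", cache[i]);
        return 0;
    }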



Bear in mind that unless wal_buffers > 16MB the final catchup will
*always* be less than one WAL file, so external file based mechanisms
alone could never be enough.


Agreed.


This also probably means that receipt of WAL data on the standby cannot
be achieved by placing it in wal_buffers. So we probably need to write
it directly to the WAL files, then rely on the filesystem cache on the
standby to buffer the data for use by ReadRecord.


Makes sense, especially in case of cached WAL as outlined above. Is this 
a problem in any way?


Regards

Markus Wanner




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Heikki Linnakangas

Simon Riggs wrote:

On Wed, 2008-09-10 at 13:28 +0900, Fujii Masao wrote:

On Tue, Sep 9, 2008 at 8:38 PM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:

There's one thing I haven't figured out in this discussion. Does the write
to the disk happen before or after the write to the slave? Can you guarantee
that if a transaction is committed in the master, it's also committed in the
slave, or vice versa?


The write happens concurrently and independently on both.

Yes, you wait for the write *and* send pointers to be flushed before
you allow a synch commit with synch replication. (The definition of
"flushed" is changeable by parameters.)


The thing that bothers me is the behavior when the synchronous slave 
doesn't respond. A timeout has been discussed, after which the master 
just gives up on sending, and starts acting as if there's no slave. 
How's that different from asynchronous mode where WAL is sent to the 
server concurrently when it's flushed to disk, but we don't wait for the 
send to finish? ISTM that in both cases the only guarantee we can give 
is that when a transaction is acknowledged as committed, it's committed 
in the master but not necessarily in the slave.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Pavan Deolasee
On Wed, Sep 10, 2008 at 1:40 PM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:


 The thing that bothers me is the behavior when the synchronous slave doesn't
 respond. A timeout has been discussed, after which the master just gives up
 on sending, and starts acting as if there's no slave. How's that different
 from asynchronous mode where WAL is sent to the server concurrently when
 it's flushed to disk, but we don't wait for the send to finish? ISTM that in
 both cases the only guarantee we can give is that when a transaction is
 acknowledged as committed, it's committed in the master but not necessarily
 in the slave.


I think there is one difference. Assuming that the timeouts happen
infrequently, most of the time the slave is in sync with the master
and that can be reported to the user. Whereas in async mode, the slave
will *always* be out of sync.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 11:10 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Wed, 2008-09-10 at 13:28 +0900, Fujii Masao wrote:
  On Tue, Sep 9, 2008 at 8:38 PM, Heikki Linnakangas
  [EMAIL PROTECTED] wrote:
  There's one thing I haven't figured out in this discussion. Does the
  write to the disk happen before or after the write to the slave? Can
  you guarantee that if a transaction is committed in the master, it's
  also committed in the slave, or vice versa?
  
  The write happens concurrently and independently on both.
  
  Yes, you wait for the write *and* send pointers to be flushed before
  you allow a synch commit with synch replication. (The definition of
  "flushed" is changeable by parameters.)
 
 The thing that bothers me is the behavior when the synchronous slave 
 doesn't respond. A timeout has been discussed, after which the master 
 just gives up on sending, and starts acting as if there's no slave. 
 How's that different from asynchronous mode where WAL is sent to the 
 server concurrently when it's flushed to disk, but we don't wait for the 
 send to finish? ISTM that in both cases the only guarantee we can give 
 is that when a transaction is acknowledged as committed, it's committed 
 in the master but not necessarily in the slave.

We should differentiate between what the WALsender does and what the
user does in response to a network timeout.

Saying "I want to wait for a synchronous commit and I am willing to
wait forever to ensure it" leads to long hangs in some cases.

I was suggesting that some users may wish to wait only up to time X
before responding to the commit. The WALSender may keep retrying long
after that point, but that doesn't mean all current users need to do so
as well. The user would need to say whether the response to the timeout
is an error, or just to accept it and get on with it.
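
(A sketch of what "wait up to time X" could look like at the commit
point, using pthreads to stand in for the backend's own machinery -
illustrative only, and in this stripped-down form the wait always
times out because no sender thread is shown:)

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t acked = PTHREAD_COND_INITIALIZER;
    static int slave_acked = 0;   /* would be set by the WAL sender */

    /* returns 0 on ack, ETIMEDOUT if we gave up waiting */
    static int wait_for_ack(int timeout_sec)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_sec;

        pthread_mutex_lock(&lock);
        int rc = 0;
        while (!slave_acked && rc == 0)
            rc = pthread_cond_timedwait(&acked, &lock, &deadline);
        pthread_mutex_unlock(&lock);
        return rc;
    }

    int main(void)
    {
        if (wait_for_ack(2) == ETIMEDOUT)
            printf("timeout: raise an ERROR, or accept and move on\n");
        else
            printf("slave acknowledged the commit\n");
        return 0;
    }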

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] WIP patch: Collation support

2008-09-10 Thread Heikki Linnakangas

Radek Strnad wrote:

Progress so far:
- created catalogs pg_collation and pg_charset, which are filled with
three standard collations
- initdb changes the rows called DEFAULT in both catalogs during the
bki bootstrap phase with the current system LC_COLLATE and LC_CTYPE, or
those set on the command line.
- new collations can be defined with the command CREATE COLLATION
<collation name> FOR <character set specification> FROM <existing
collation name> [STRCOLFN <fn name>] [<pad characteristic>]
[<case sensitive>] [LCCOLLATE <lc_collate>] [LCCTYPE <lc_ctype>]
- because pg_collation and pg_charset are per-database catalogs, if you
want to create a database with a collation other than those specified,
create it in template1 and then create the database


I have to wonder, is all that really necessary? The feature you're 
trying to implement is to support database-level collation at first, and 
perhaps column-level collation later. We don't need support for 
user-defined collations and charsets for that.


If we leave all that out of the patch for now, we'll have a much
slimmer, and just as useful, patch implementing database-level
collation. We can add those catalogs later if we need them, but I don't
think there's much point in adding all that infrastructure if they just
reflect the locales installed in the operating system.



- when connecting to a database, it retrieves the locales from
pg_database and sets them


This is the real gist of this patch.


Design & functionality changes left:
- move retrieving collation from pg_database to pg_type


I don't understand this item. What will you move?


- get case sensitivity and pad characteristic working


I feel we should leave this to the collation implementation.


- when creating a database with a different collation than the database
cluster's, the database has to be reindexed. Any idea how to do it? The
function ReindexDatabase works only when the database is open.


That's a tricky one. One idea is to prohibit choosing a different
collation than the one in the template database, unless we know it's
safe to do so without reindexing. The problem is that we don't know
whether it's safe. A simple but limiting solution would be to require
that the template database has the same collation as the database
that's being created, except that template0 can always be used as a
template. template0 is safe, because there are no indexes on text
columns there.


Note that we already have the same problem with encodings. If you create 
a database with LATIN1 encoding, load it with data, and then use that as 
a template for a database with UTF-8 encoding, the text data will be 
incorrectly encoded. We should probably fix that too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Heikki Linnakangas

Simon Riggs wrote:

Saying "I want to wait for a synchronous commit and I am willing to
wait forever to ensure it" leads to long hangs in some cases.


Sure. That's the fundamental problem with synchronous replication. 
That's why many people choose asynchronous replication instead. Clearly 
at some point you'll want to give up and continue without the slave, or 
kill the master and fail over to the slave. I'm wondering how that's 
different than the lag between master and server in asynchronous 
replication from the client's point of view.



I was suggesting that some users may wish to wait only up to time X
before responding to the commit. The WALSender may keep retrying long
after that point, but that doesn't mean all current users need to do so
as well. The user would need to say whether the response to the timeout
is an error, or just to accept it and get on with it.


I'm not sure I understand that paragraph. Who's the user? Do we need to 
expose some new information to the client so that it can do something?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Csaba Nagy
On Tue, 2008-09-09 at 20:59 +0200, Zeugswetter Andreas OSB sIT wrote:
 All in all a useful streamer seems like a lot of work.

I mentioned some time ago an alternative idea of having the slave
connect through a normal SQL connection and call a function which
streams the WAL file from the point requested by the slave... wouldn't
that be feasible? All the connection handling would already be there;
only the streaming function would have to be implemented. It could even
use SSL connections if needed. Then you would have one normal backend
per slave, and they would access either the files directly or possibly
some shared area where the WAL is buffered for this purpose... the
streaming function could also take care of signaling the up-to-dateness
of the slaves in the case of synchronous replication.

There could also be some system table infrastructure to track the
slaves, and some functions to stream the files of the DB through normal
backends, so a slave could be bootstrapped all the way from copying the
files over a simple postgres backend connection... that would make for
the easiest possible setup of a slave: configure a connection to the
master, and hit run... and last but not least, the same interface could
be used by a PITR repository client for archiving the WAL stream and
occasional file system snapshots.
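
(From the slave side this could be as small as the following loop - a
sketch using the ordinary libpq API; pg_read_wal_chunk() is a
hypothetical server-side function that does not exist today:)

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("host=master dbname=postgres");
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        for (;;)
        {
            /* ask for the next chunk of WAL after a given position */
            PGresult *res = PQexec(conn,
                "SELECT pg_read_wal_chunk('0/12AB3400')"); /* hypothetical */
            if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
            {
                PQclear(res);
                break;  /* or sleep and retry */
            }
            /* write PQgetvalue(res, 0, 0) to the local WAL file, then
             * report the new position back to the master */
            PQclear(res);
        }
        PQfinish(conn);
        return 0;
    }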

Cheers,
Csaba.





Re: [HACKERS] WIP patch: Collation support

2008-09-10 Thread Martijn van Oosterhout
On Wed, Sep 10, 2008 at 11:29:14AM +0300, Heikki Linnakangas wrote:
 Radek Strnad wrote:
 - because pg_collation and pg_charset are per-database catalogs, if you
 want to create a database with a collation other than those specified,
 create it in template1 and then create the database
 
 I have to wonder, is all that really necessary? The feature you're 
 trying to implement is to support database-level collation at first, and 
 perhaps column-level collation later. We don't need support for 
 user-defined collations and charsets for that.

Since the set of collations isn't exactly denumerable, we need some way
to allow the user to specify the collation they want. The only
collation PostgreSQL knows about is the C collation. Anything else is
user-defined.

 Design & functionality changes left:
 - move retrieving collation from pg_database to pg_type
 
 I don't understand this item. What will you move?

Long term, the collation is a property of the type, but I agree, I'm not
sure why this patch needs it.

 That's a tricky one. One idea is to prohibit choosing a different 
 collation than the one in the template database, unless we know it's 
 safe to do so without reindexing.

But that puts us back where we started: every database having the same
collation. We're trying to move away from that. Just reindex everything
and be done with it.

 Note that we already have the same problem with encodings. If you create 
 a database with LATIN1 encoding, load it with data, and then use that as 
 a template for a database with UTF-8 encoding, the text data will be 
 incorrectly encoded. We should probably fix that too.

I'd say forbid more than one encoding in a cluster, but that's just my
opinion :)

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] Keeping creation time of objects

2008-09-10 Thread Hannu Krosing
On Wed, 2008-09-10 at 09:27 +0300, Volkan YAZICI wrote:
 On Tue, 9 Sep 2008, David Fetter [EMAIL PROTECTED] writes:
  AFAICS, PostgreSQL is not keeping info about when a table, database,
  sequence, etc was created.  We cannot get that info even from OS,
  since CLUSTER or VACUUM FULL may change the metadata of
  corresponding relfilenode.
 
  When people aren't keeping track of their DDL, that is very strictly a
  process problem on their end.  When people are shooting themselves in
  the foot, it's a great disservice to market Kevlar shoes to them.
 
 Word. In the company I'm currently working at, we store the database
 schema in a VCS repository with minor and major version tags, and there
 is a current_foo_soft_version() function that returns the revision of
 the related database schema. If a company's working practices give no
 control over database schema changes, even the most logging-feature-rich
 PostgreSQL release will provide insignificant benefit compared to the
 mess that needs to get fixed.

Timestamps should rather be considered a forensic tool.

You may have the best VCS system, but if somebody bypasses it, you may
still need to find out when it was done.

Until we have some enforceable audit facilities for DDL in place
_inside_ the database, having at least timestamps often helps.

-
Hannu





Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Hannu Krosing
On Wed, 2008-09-10 at 10:06 +0200, Markus Wanner wrote:
 Hi,
 
 Simon Riggs wrote:
  1. Standby contacts primary and says it would like to catch up, but is
  currently at point X (which is a point at, or after the first consistent
  stopping point in WAL after standby has performed its own crash
  recovery, if any was required).
  2. primary initiates data transfer of old data to standby, starting at
  point X
  3. standby tells primary where it has got to periodically
  4. at some point primary decides primary and standby are close enough
  that it can now begin streaming current WAL (which is always the WAL
  up to wal_buffers behind the current WAL insertion point).
 
 Hm.. wouldn't it be simpler to start streaming right away and cache
 it on the standby until it can be applied? I.e., a protocol like:

Good idea!

This makes everything simpler, as the user has to do only 4 things:

1. start the slave in "receive WAL, don't apply" mode
2. start WAL shipping on the master
3. copy the files from master to slave
4. restart the slave in "receive WAL" mode

All else will happen automatically.

---
Hannu







Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Hannu Krosing
On Wed, 2008-09-10 at 08:15 +0100, Simon Riggs wrote:

 Any working solution needs to work for all required phases. If you did
 it this way, you'd never catch up at all.
 
 When you first make the copy, it will be made at time X. The point of
 consistency will be sometime later and requires WAL data to make it
 consistent. So you would need to do a PITR to get it to the point of
 consistency. While you've been doing that, the primary server has moved
 on and now there is a gap between primary and standby. You *must*
 provide a facility to allow the standby to catch up with the primary.
 Only sending *current* WAL is not a solution, and not acceptable.
 
 So there must be mechanisms for sending past *and* current WAL data to
 the standby, and an exact and careful mechanism for switching between
 the two modes when the time is right. Replication is only synchronous
 *after* the change in mode.
 
 So the protocol needs to be something like:
 
 1. Standby contacts primary and says it would like to catch up, but is
 currently at point X (which is a point at, or after the first consistent
 stopping point in WAL after standby has performed its own crash
 recovery, if any was required).
 2. primary initiates data transfer of old data to standby, starting at
 point X
 3. standby tells primary where it has got to periodically
 4. at some point primary decides primary and standby are close enough
 that it can now begin streaming current WAL (which is always the WAL
 up to wal_buffers behind the current WAL insertion point).
 
 Bear in mind that unless wal_buffers > 16MB the final catchup will
 *always* be less than one WAL file, so external file based mechanisms
 alone could never be enough. So you would need wal_buffers >= 2000 to
 make an external catch up facility even work at all.
 
 This also probably means that receipt of WAL data on the standby cannot
 be achieved by placing it in wal_buffers. So we probably need to write
 it directly to the WAL files, then rely on the filesystem cache on the
 standby to buffer the data for use by ReadRecord.

And this catch-up may need to be done repeatedly, in case of network
failure.

I don't think that the slave automatically becoming a master when it
detects network failure (as suggested elsewhere in this thread) is an
acceptable solution, as it will more often than not result in two
masters.

A better solution would be:

1. Slave just keeps waiting for new WAL records and confirming their
receipt, storage to disk, and application.

2. Master is in one of at least two states:

 2.1 - Catchup - async mode where it is sending old logs and WAL
records to the slave

 2.2 - Sync Replication - sync mode, where COMMIT does not return
before confirmation from the WALSender.

The initial mode is Catchup, which is promoted to Sync Replication when
the delay of WAL application is reasonably small.

When the master detects a network outage (== delay bigger than
acceptable) it will either just send a NOTICE to all clients and fall
back to Catchup, or raise an ERROR (and still fall back to Catchup).

This is the point where external HA / heartbeat etc. solutions would
intervene and decide what to do.

-
Hannu






Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Fujii Masao
On Wed, Sep 10, 2008 at 4:15 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 On Wed, 2008-09-10 at 09:35 +0300, Hannu Krosing wrote:
 On Wed, 2008-09-10 at 15:15 +0900, Fujii Masao wrote:
  On Wed, Sep 10, 2008 at 12:26 AM, Heikki Linnakangas
  [EMAIL PROTECTED] wrote:
   If a slave falls behind, how does it catch up? I guess you're saying
   that it can't fall behind, because the master will block before that
   happens. Also in asynchronous replication? And what about when the
   slave is first set up, and needs to catch up with the master?
 
  The mechanism for the slave to catch up with the master should be
  provided on the outside of postgres.

 So you mean that we still need to do the initial setup (copy backup
 files, then ship and replay WAL segments generated during the copy) by
 external WAL-shipping tools, like walmgr.py, and then at some point
 switch over to internal WAL-shipping, when we are sure that we are
 within the same WAL file on both master and slave?

  I think that postgres should provide
  only WAL streaming, i.e. the master always sends *current* WAL data
  to the slave.
 
  Of course, the master also has to send the current WAL *file* in the
  initial transfer just after the slave starts and connects to it.

 I think that it needs to send all WAL files which the slave does not
 yet have, as otherwise the slave will have gaps. On a busy system you
 will generate several new WAL files in the time it takes to make the
 master copy, transfer it to the slave, and apply the WAL files
 generated during the initial setup.

  Because at that time the current WAL position might be in the middle of
  a WAL file. Even if the master sends only current WAL data, a slave
  which doesn't have the corresponding WAL file cannot handle it.

 I agree that making the initial copy may be outside the scope of
 Synchronous Log Shipping Replication, but the slave catching up, by
 requesting all missing WAL files and applying them up to the point when
 it can switch to sync mode, should be in. Otherwise we gain very little
 from this patch.

 I agree with Hannu.

 Any working solution needs to work for all required phases. If you did
 it this way, you'd never catch up at all.

 When you first make the copy, it will be made at time X. The point of
 consistency will be sometime later and requires WAL data to make it
 consistent. So you would need to do a PITR to get it to the point of
 consistency. While you've been doing that, the primary server has moved
 on and now there is a gap between primary and standby. You *must*
 provide a facility to allow the standby to catch up with the primary.
 Only sending *current* WAL is not a solution, and not acceptable.

 So there must be mechanisms for sending past *and* current WAL data to
 the standby, and an exact and careful mechanism for switching between
 the two modes when the time is right. Replication is only synchronous
 *after* the change in mode.

 So the protocol needs to be something like:

 1. Standby contacts primary and says it would like to catch up, but is
 currently at point X (which is a point at, or after the first consistent
 stopping point in WAL after standby has performed its own crash
 recovery, if any was required).
 2. primary initiates data transfer of old data to standby, starting at
 point X
 3. standby tells primary where it has got to periodically
 4. at some point primary decides primary and standby are close enough
 that it can now begin streaming current WAL (which is always the WAL
 up to wal_buffers behind the current WAL insertion point).

 Bear in mind that unless wal_buffers > 16MB the final catchup will
 *always* be less than one WAL file, so external file based mechanisms
 alone could never be enough. So you would need wal_buffers >= 2000 to
 make an external catch up facility even work at all.

 This also probably means that receipt of WAL data on the standby cannot
 be achieved by placing it in wal_buffers. So we probably need to write
 it directly to the WAL files, then rely on the filesystem cache on the
 standby to buffer the data for use by ReadRecord.

 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support



Umm.. I disagree with you ;)

Here is my initial setup sequence.

1) Start WAL receiver.
 The current WAL file and subsequent ones will be transmitted by the
 WAL sender and WAL receiver. This transmission will not block the
 following initial-setup operations, and vice versa. That is, the
 slave can catch up with the master without blocking the master.
 I cannot accept that the WAL sender is blocked during initial setup.

2) Copy the missing history files from the master to the slave.

3) Prepare recovery.conf on the slave.
 You have to configure pg_standby and set recovery_target_timeline to
 'latest' or the current TLI of the master.

4) Start postgres.
 The startup process and pg_standby start archive recovery. If there
 are missing WAL files, pg_standby waits for them and WAL replay is
 suspended until they arrive.
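
(For reference, the recovery.conf for step 3 might look something like
this - a sketch, with a placeholder archive path:)

    restore_command = 'pg_standby /mnt/archive %f %p %r'
    recovery_target_timeline = 'latest'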
 

Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 10:06 +0200, Markus Wanner wrote:
 Hi,
 
 Simon Riggs wrote:
  1. Standby contacts primary and says it would like to catch up, but is
  currently at point X (which is a point at, or after the first consistent
  stopping point in WAL after standby has performed its own crash
  recovery, if any was required).
  2. primary initiates data transfer of old data to standby, starting at
  point X
  3. standby tells primary where it has got to periodically
  4. at some point primary decides primary and standby are close enough
  that it can now begin streaming current WAL (which is always the WAL
  up to wal_buffers behind the current WAL insertion point).
 
 Hm.. wouldn't it be simpler to start streaming right away and cache

The standby server won't come up until you have:
* copied the base backup
* sent it to the standby server
* brought up the standby, had it realise it is a replication partner,
and begun requesting WAL from the primary (in some way)

There will be a gap (probably) between the initial WAL files and the
current tail of wal_buffers by the time all of the above has happened.
We will then need to copy more WAL across until we get to a point where
the most recent WAL record available on standby is ahead of the tail of
wal_buffers on primary so that streaming can start.

If we start caching WAL right away we would need to have two receivers:
one to receive the missing WAL data and one to receive the current WAL
data. We can't apply the WAL until we have the earlier missing WAL data,
so caching it seems difficult. On a large server this might be GBs of
data. It seems easier not to cache current WAL and to have just a single
WALReceiver process that performs a mode change once it has caught up.
(And I should say *if* it catches up, since it is possible that it never
actually will catch up, in practical terms, since this depends upon the
relative power of the servers involved.) So there's no need to store
more WAL on the standby than is required to restart recovery from the
last restartpoint, i.e. we stream WAL at all times, not just in normal
running mode.

Seems easiest to have:
* Startup process only reads the next WAL record when ReceivedLogPtr >
ReadRecPtr, so it knows nothing of how WAL is received. Startup process
reads directly from WAL files in *all* cases. ReceivedLogPtr is in
shared memory and accessed via spinlock. Startup process only ever reads
this pointer. (Notice that the Startup process is modeless.)
* WALReceiver reads data from the primary and writes it to WAL files,
fsyncing (if ever requested to do so). WALReceiver updates
ReceivedLogPtr.

That is much simpler and more modular. Buffering of the WAL files is
handled by filesystem buffering.

If the standby crashes, all data is safely written to WAL files and we
restart from the correct place.
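
(A sketch of that single shared pointer, with pthreads standing in for
the backend's spinlocks and shared memory - illustrative only:)

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogPtr;

    static pthread_spinlock_t ptr_lock;
    static XLogPtr received_log_ptr = 0;  /* lives in shared memory */

    /* WALReceiver side: WAL is already written to the WAL file (and
     * fsynced if requested) before the pointer becomes visible */
    static void walreceiver_advance(XLogPtr new_ptr)
    {
        pthread_spin_lock(&ptr_lock);
        received_log_ptr = new_ptr;
        pthread_spin_unlock(&ptr_lock);
    }

    /* Startup side: only ever reads the pointer; it reads the next
     * record from the WAL files while ReadRecPtr < this value */
    static XLogPtr startup_get_received(void)
    {
        pthread_spin_lock(&ptr_lock);
        XLogPtr p = received_log_ptr;
        pthread_spin_unlock(&ptr_lock);
        return p;
    }

    int main(void)
    {
        pthread_spin_init(&ptr_lock, PTHREAD_PROCESS_PRIVATE);
        walreceiver_advance(4096);
        printf("received up to %llu\n",
               (unsigned long long) startup_get_received());
        return 0;
    }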

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 17:57 +0900, Fujii Masao wrote:

  I cannot accept that the WAL sender is blocked during initial setup.

Yes, very important point. We definitely agree on that. The primary must
be able to continue working while all this setup is happening. No tables
are locked, all commits are allowed, etc.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Dimitri Fontaine
Hi,

On Wednesday 10 September 2008, Heikki Linnakangas wrote:
 Sure. That's the fundamental problem with synchronous replication.
 That's why many people choose asynchronous replication instead. Clearly
 at some point you'll want to give up and continue without the slave, or
 kill the master and fail over to the slave. I'm wondering how that's
 different than the lag between master and server in asynchronous
 replication from the client's point of view.

As a future user of these new facilities, the difference from the
client's POV is simple: in normal operation, we want a strong guarantee
that any COMMIT has made it to both the master and the slave at commit
time. No lag whatsoever.

You're considering lag as an option in case of failure, but I don't see
this as acceptable when you need sync commit. In case of a network
timeout, the cluster is down. So you want to either continue servicing
in degraded mode or take the service down while you repair the cluster,
but neither of those choices can be transparent to the admins, I'd
argue.

Of course, the main use case is high availability, which tends to mean
you do not have the option to stop service, and seems to dictate
continuing to service in degraded mode: the slave can't keep up
(whatever the error domain), the master is alone, so advertise that to
the monitoring solutions and continue servicing. And provide some way
for the slave to rejoin, maybe, too.

 I'm not sure I understand that paragraph. Who's the user? Do we need to
 expose some new information to the client so that it can do something?

Maybe with some GUCs to set the acceptable timeout for the WAL sync
process, and whether reaching the timeout is a warning or an error. With
a userset GUC we could even have replication-error-level transactions
concurrent with non-critical ones...

Now what to do exactly in case of error remains to be decided...

HTH, Regards,
-- 
dim




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 17:57 +0900, Fujii Masao wrote:

 Umm.. I disagree with you ;)

That's no problem and I respect your knowledge.

If we disagree, it is very likely because we have misunderstood each
other. Much has been written, so I will wait for it to all be read and
understood by you and others, and for me to read other posts and replies
also. I feel sure that after some thought a clear consensus will emerge,
and I feel hopeful that the feature can be done in the time available
with simple code changes.

So I will stop replying for a few hours to give everyone time (incl me).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Markus Wanner

Hi,

Simon Riggs wrote:

The standby server won't come up until you have:
* copied the base backup
* sent it to the standby server
* brought up the standby, had it realise it is a replication partner,
and begun requesting WAL from the primary (in some way)


Right. That was your assumption as well. Required before step 1 in both 
cases.



If we start caching WAL right away we would need to have two receivers:
one to receive the missing WAL data and one to receive the current WAL
data. We can't apply the WAL until we have the earlier missing WAL data,
so caching it seems difficult.


You could use the same receiver process and just handle different 
packets differently. I see no need for two separate receiver processes here.



On a large server this might be GBs of
data.


..if served from a log archive, correct. Without archiving, we are 
limited to xlog anyway.



It seems easier not to cache current WAL and to have just a single
WALReceiver process that performs a mode change once it has caught up.
(And I should say *if* it catches up, since it is possible that it never
actually will catch up, in practical terms, since this depends upon the
relative power of the servers involved.) So there's no need to store
more WAL on the standby than is required to restart recovery from the
last restartpoint, i.e. we stream WAL at all times, not just in normal
running mode.


Switching between streaming from files and 'live' streaming on the 
active node seems difficult to me, because you need to make sure there's 
no gap. That problem could be circumvented by handling this on the 
standby. If you think switching on the active is simple enough, that's fine.



Seems easiest to have:
* Startup process only reads the next WAL record when ReceivedLogPtr >
ReadRecPtr, so it knows nothing of how WAL is received. Startup process
reads directly from WAL files in *all* cases. ReceivedLogPtr is in
shared memory and accessed via spinlock. Startup process only ever reads
this pointer. (Notice that the Startup process is modeless.)


Well, that's certainly easier for the standby, but requires mode 
switching on the active.


Regards

Markus Wanner




Re: [HACKERS] WIP patch: Collation support

2008-09-10 Thread Heikki Linnakangas

Martijn van Oosterhout wrote:

On Wed, Sep 10, 2008 at 11:29:14AM +0300, Heikki Linnakangas wrote:

Radek Strnad wrote:

- because pg_collation and pg_charset are per-database catalogs, if you
want to create a database with a collation other than those specified,
create it in template1 and then create the database
I have to wonder, is all that really necessary? The feature you're 
trying to implement is to support database-level collation at first, and 
perhaps column-level collation later. We don't need support for 
user-defined collations and charsets for that.


Since the set of collations isn't exactly denumerable, we need some way
to allow the user to specify the collation they want. The only
collation PostgreSQL knows about is the C collation. Anything else is
user-defined.


Let's just use the name of the OS locale, like we do now. Having a 
pg_collation catalog just moves the problem elsewhere: we'd still need 
something in pg_collation to tie the collation to the OS locale.



Design & functionality changes left:
- move retrieving collation from pg_database to pg_type

I don't understand this item. What will you move?


Long term, the collation is a property of the type, ...


You might want to provide a default collation for a type as well, but
the very finest granularity is that you can specify a collation for
every (text) comparison operator in your query. Of course you don't want
to do that for every query, which is why we should provide defaults at
different levels: columns, tables, database. And perhaps types as well,
but that's not the most interesting case.


I'm not sure what the SQL spec says about that, but I believe it 
provides syntax and rules for all that.


That's a tricky one. One idea is to prohibit choosing a different 
collation than the one in the template database, unless we know it's 
safe to do so without reindexing.


But that puts us back where we started: every database having the same
collation. We're trying to move away from that. Just reindex everything
and be done with it.


That's easier said than done, unfortunately.

Note that we already have the same problem with encodings. If you create 
a database with LATIN1 encoding, load it with data, and then use that as 
a template for a database with UTF-8 encoding, the text data will be 
incorrectly encoded. We should probably fix that too.


I'd say forbid more than one encoding in a cluster, but that's just my
opinion :)


Yeah, that's pretty useless, at least without support for different 
locales on different databases. But might as well keep it unless there's 
a pressing reason to drop it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] WIP patch: Collation support

2008-09-10 Thread Zdenek Kotala

Heikki Linnakangas wrote:


Design & functionality changes left:
- move retrieving collation from pg_database to pg_type




The problem there is that pg_collation is a local catalog, but
pg_database is a global catalog. IIRC, it was discussed during the last
commitfest. I think it is a bad idea to create this kind of dependency.
It seems to me better to implement something we will actually use later,
to avoid a useless and confusing dependency.


Zdenek




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 11:07 +0200, Dimitri Fontaine wrote:
 Hi,
 
 On Wednesday 10 September 2008, Heikki Linnakangas wrote:
  Sure. That's the fundamental problem with synchronous replication.
  That's why many people choose asynchronous replication instead. Clearly
  at some point you'll want to give up and continue without the slave, or
  kill the master and fail over to the slave. I'm wondering how that's
  different than the lag between master and server in asynchronous
  replication from the client's point of view.
 
 As a future user of these new facilities, the difference from the
 client's POV is simple: in normal operation, we want a strong guarantee
 that any COMMIT has made it to both the master and the slave at commit
 time. No lag whatsoever.

Agreed.

 You're considering lag as an option in case of failure, but I don't see
 this as acceptable when you need sync commit. In case of a network
 timeout, the cluster is down. So you want to either continue servicing
 in degraded mode or take the service down while you repair the cluster,
 but neither of those choices can be transparent to the admins, I'd
 argue.
 
 Of course, the main use case is high availability, which tends to mean
 you do not have the option to stop service,

We have a number of choices at the point of failure:
* Does the whole primary server stay up? (probably)
* Do we continue to allow new transactions in degraded mode? (which
increases the risk of transaction loss if we continue at that time).
(The answer sounds like it will be "of course, stupid", but this
cluster may be part of an even higher level HA mechanism, so the answer
isn't always clear).
* For each transaction that is trying to commit: do we want to wait
forever? If not, how long? If we stop waiting, do we throw an ERROR, or
do we say, let's get on with another transaction.

If the server is up, yet all connections in a session pool are stuck
waiting for their last commits to complete, then most sysadmins would
agree that the server is actually down, since no useful work is
happening or can be initiated - even read-only. We don't need to address
that issue in the same way for all transactions, is all I'm saying.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 09:35 +0300, Hannu Krosing wrote:
 On Wed, 2008-09-10 at 15:15 +0900, Fujii Masao wrote:
  On Wed, Sep 10, 2008 at 12:26 AM, Heikki Linnakangas
  [EMAIL PROTECTED] wrote:
   If a slave falls behind, how does it catch up? I guess you're saying
   that it can't fall behind, because the master will block before that
   happens. Also in asynchronous replication? And what about when the
   slave is first set up, and needs to catch up with the master?
  
  The mechanism for the slave to catch up with the master should be
  provided on the outside of postgres. 
 
 So you mean that we still need to do the initial setup (copy backup
 files, then ship and replay WAL segments generated during the copy) by
 external WAL-shipping tools, like walmgr.py, and then at some point
 switch over to internal WAL-shipping, when we are sure that we are
 within the same WAL file on both master and slave?
 
  I think that postgres should provide
  only WAL streaming, i.e. the master always sends *current* WAL data
  to the slave.
 
  Of course, the master also has to send the current WAL *file* in the
  initial transfer just after the slave starts and connects to it.
 
 I think that it needs to send all WAL files which the slave does not
 yet have, as otherwise the slave will have gaps. On a busy system you
 will generate several new WAL files in the time it takes to make the
 master copy, transfer it to the slave, and apply the WAL files
 generated during the initial setup.
 
  Because at that time the current WAL position might be in the middle of
  a WAL file. Even if the master sends only current WAL data, a slave
  which doesn't have the corresponding WAL file cannot handle it.
 
 I agree that making the initial copy may be outside the scope of
 Synchronous Log Shipping Replication, but the slave catching up, by
 requesting all missing WAL files and applying them up to the point when
 it can switch to sync mode, should be in. Otherwise we gain very little
 from this patch.

I agree with Hannu.

Any working solution needs to work for all required phases. If you did
it this way, you'd never catch up at all.

When you first make the copy, it will be made at time X. The point of
consistency will be sometime later and requires WAL data to make it
consistent. So you would need to do a PITR to get it to the point of
consistency. While you've been doing that, the primary server has moved
on and now there is a gap between primary and standby. You *must*
provide a facility to allow the standby to catch up with the primary.
Only sending *current* WAL is not a solution, and not acceptable.

So there must be mechanisms for sending past *and* current WAL data to
the standby, and an exact and careful mechanism for switching between
the two modes when the time is right. Replication is only synchronous
*after* the change in mode.

So the protocol needs to be something like:

1. Standby contacts primary and says it would like to catch up, but is
currently at point X (which is a point at, or after the first consistent
stopping point in WAL after standby has performed its own crash
recovery, if any was required).
2. primary initiates data transfer of old data to standby, starting at
point X
3. standby tells primary where it has got to periodically
4. at some point primary decides primary and standby are close enough
that it can now begin streaming current WAL (which is always the WAL
up to wal_buffers behind the current WAL insertion point).

Bear in mind that unless wal_buffers > 16MB the final catchup will
*always* be less than one WAL file, so external file based mechanisms
alone could never be enough. So you would need wal_buffers >= 2000 to
make an external catch up facility even work at all.

This also probably means that receipt of WAL data on the standby cannot
be achieved by placing it in wal_buffers. So we probably need to write
it directly to the WAL files, then rely on the filesystem cache on the
standby to buffer the data for use by ReadRecord.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] WIP patch: Collation support

2008-09-10 Thread Martijn van Oosterhout
On Wed, Sep 10, 2008 at 12:51:02PM +0300, Heikki Linnakangas wrote:
 Since the set of collations isn't exactly denumerable, we need some way
 to allow the user to specify the collation they want. The only
 collation PostgreSQL knows about is the C collation. Anything else is
 user-defined.
 
 Let's just use the name of the OS locale, like we do now. Having a 
 pg_collation catalog just moves the problem elsewhere: we'd still need 
 something in pg_collation to tie the collation to the OS locale.

There's not a one-to-one mapping between collation and locale name. A
locale name includes information about the charset, and a collation may
have parameters like case-sensitivity and the pad attribute which are
not present in the locale name. You need a mapping anyway, which is what
this table is for.

The difference in collation between CHAR() and VARCHAR() is the usual
example here.

 Long term, the collation is a property of the type, ...
 
 I'm not sure what the SQL spec says about that, but I believe it 
 provides syntax and rules for all that.

The spec is quite detailed about it, and I posted code to do it years
ago. The point is that we don't need to go that far with this patch.

 But that puts us back where we started: every database having the same
 collation. We're trying to move away from that. Just reindex everything
 and be done with it.
 
 That's easier said than done, unfortunately.

I don't see an alternative.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.




[HACKERS] using hash index when BETWEEN is specified

2008-09-10 Thread Zdenek Kotala
I have been playing with the new hash index implementation and I tried
the following command:


postgres=# select * from test where id between 1 and 5;
Time: 9651,033 ms
postgres=# explain select * from test where id between 1 and 5;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on test  (cost=0.00..141681.00 rows=1 width=4)
   Filter: ((id >= 1) AND (id <= 5))
(2 rows)


A hash index is created on the id column. However, when I use

postgres=# explain select * from test where id in (1,2,3,4,5);
                               QUERY PLAN
-------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=22.24..332.53 rows=83 width=4)
   Recheck Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
   ->  Bitmap Index Scan on test_idx  (cost=0.00..22.22 rows=83 width=0)
         Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
(4 rows)

Time: 1,352 ms

I'm not a planner guru, but it seems to me that a BETWEEN clause could
be rewritten as an IN clause for integer data types and small
intervals.
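
(Roughly this rule - the threshold is made up, and a real planner
would compare estimated costs rather than hard-code one:)

    #include <stdio.h>

    #define MAX_EXPAND 100  /* arbitrary "small interval" cutoff */

    /* Expand "id BETWEEN lo AND hi" into an IN list only when the
     * range is small enough; each IN value is one hash-index probe. */
    static void rewrite_between(long lo, long hi)
    {
        if (hi < lo || hi - lo + 1 > MAX_EXPAND)
        {
            printf("keep the range predicate as-is\n");
            return;
        }
        printf("id IN (");
        for (long v = lo; v <= hi; v++)
            printf(v == lo ? "%ld" : ",%ld", v);
        printf(")\n");
    }

    int main(void)
    {
        rewrite_between(1, 5);     /* small: expand */
        rewrite_between(1, 1000);  /* too wide: leave alone */
        return 0;
    }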



Zdenek




Re: [HACKERS] using hash index when BETWEEN is specified

2008-09-10 Thread Asko Oja
On Wed, Sep 10, 2008 at 1:39 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:

 I have been playing with the new hash index implementation and I tried
 the following command:

 postgres=# select * from test where id between 1 and 5;
 Time: 9651,033 ms
 postgres=# explain select * from test where id between 1 and 5;
   QUERY PLAN
 -
  Seq Scan on test  (cost=0.00..141681.00 rows=1 width=4)
    Filter: ((id >= 1) AND (id <= 5))
 (2 rows)


 Hash index is created on id column. However when I use

 postgres=# explain select * from test where id in (1,2,3,4,5);
   QUERY PLAN
 -
  Bitmap Heap Scan on test  (cost=22.24..332.53 rows=83 width=4)
   Recheck Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
    ->  Bitmap Index Scan on test_idx  (cost=0.00..22.22 rows=83 width=0)
          Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
 (4 rows)

 Time: 1,352 ms

  I'm not a planner guru, but it seems to me that a BETWEEN clause could be
  rewritten as an IN clause for integer data types and small intervals.


Where should the line be drawn?
Define small :)





Zdenek


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] using hash index when BETWEEN is specified

2008-09-10 Thread Robert Haas
 I'm not a planner guru, but it seems to me that a BETWEEN clause could be
 rewritten as an IN clause for integer data types and small intervals.

 Where should the line be drawn?
 Define small :)

When the estimated cost is lower?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Base64 decode/encode performance

2008-09-10 Thread Gaetano Mendola
Hi,
I have been experimenting with some base64 encoding/decoding implementations.

I found that the one at http://code.google.com/p/stringencoders is the best,
obtaining a 1.3x speedup vs. the postgres one.

Do you think it is worth submitting a patch that replaces the postgres
base64 implementation with this one?


Regards
Gaetano Mendola

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] using hash index when BETWEEN is specified

2008-09-10 Thread Hannu Krosing
On Wed, 2008-09-10 at 07:13 -0400, Robert Haas wrote:
   I'm not a planner guru, but it seems to me that a BETWEEN clause could be
   rewritten as an IN clause for integer data types and small intervals.
 
   Where should the line be drawn?
   Define small :)
 
 When the estimated cost is lower?

You still need to draw a line for when to even try estimating the cost.

Will this be an interval of 10? Or 100? Or 1?


Hannu 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Aidan Van Dyk
* Simon Riggs [EMAIL PROTECTED] [080910 06:18]:

 We have a number of choices, at the point of failure:
 * Does the whole primary server stay up (probably)?

The only sane choice is the one the admin makes.  Any predetermined choice
PG makes can (and will) be wrong in some situations.

 * Do we continue to allow new transactions in degraded mode? (which
 increases the risk of transaction loss if we continue at that time).
 (The answer sounds like it will be of course, stupid but this cluster
 may be part of an even higher level HA mechanism, so the answer isn't
 always clear).

The only sane choice is the one the admin makes.  Any predetermined choice
PG makes can (and will) be wrong in some situations.

 * For each transaction that is trying to commit: do we want to wait
 forever? If not, how long? If we stop waiting, do we throw ERROR, or do
 we say, lets get on with another transaction.

The only sane choice is the one the admin makes.  Any predetermined choice
PG makes can (and will) be wrong in some situations.

 If the server is up, yet all connections in a session pool are stuck
 waiting for their last commits to complete then most sysadmins would
 agree that the server is actually down. Since no useful work is
 happening, or can be initiated - even read only. We don't need to
 address that issue in the same way for all transactions, is all I'm
 saying.

Sorry to sound like a broken record here, but the whole point is to
guarantee data safety.  You can only start trading ACID for HA if you
have the ACID guarantees in the first place (and for replication, this
means across the cluster, including slaves)

So in that light, I think it's pretty obvious that if a slave is
considered part of an active synchronous replication cluster, in the
face of network lag, or even network failure, the master *must* pretty
much halt all new commits in their tracks until that slave acknowledges
the commit.  Yes that's going to cause a backup.  That's the cost of a
synchronous replication.

But that means the admin has to be able to control whether a slave is
part of an active synchronous replication cluster or not.  I hope that
control eventually is a lot more than a GUC that says "when a slave is X
seconds behind, abandon him".

I'd dream of a replication interface where I could add new slaves on
the fly (and a nice tool that does pg_start_backup()/sync/apply WAL to
sync, then subscribes), get slave status (maybe syncing/active/abandoned)
and some average latency (i.e. something like svctm of iostat on your WAL
disk), and some way to control the slave degradation from active to
abandoned (like the above GUC, or maybe a callout/hook/script that runs
when latency > X, etc, or both).

And for async replication, you just have a proxy slave which does
nothing but subscribe to your master, always acknowledge WAL right away
so the master doesn't wait, and keep a local backlog of WAL it's
sending out to many clients.   This proxy slave doesn't slow down the
master, but can feed clients accross slow WAN links (that may not have
the burst bandwidth to keep up with bursty master writes, but have agregate
bandwidth to keep pretty close to the master), or networks that drop out
for a period, etc.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature


Re: [HACKERS] Keeping creation time of objects

2008-09-10 Thread Alvaro Herrera
Hannu Krosing wrote:

 Timestamps should rather be considered a forensic tool.
 
 You may have the best VCS system, but if somebody bypasses it, you may
 still need to find out, when it was done.

So you're arguing for modification time, which is not what Devrim is
proposing -- he's proposing creation time.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 09:36 -0400, Aidan Van Dyk wrote:
 * Simon Riggs [EMAIL PROTECTED] [080910 06:18]:
 
  We have a number of choices, at the point of failure:
  * Does the whole primary server stay up (probably)?
 
 The only sane choice is the one the admin makes.  Any predetermined choice
 PG makes can (and will) be wrong in some situations.

We are in agreement then. Those questions were listed as arguments in
favour of a parameter to let the sysadmin choose. More than that, I was
saying this can be selected for individual transactions, not just for
the server as a whole (as other vendors do).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-10 Thread Devrim GÜNDÜZ
Hi,

On Tue, 2008-09-09 at 23:14 -0400, Andrew Dunstan wrote:
 Nothing is cost free. Every feature adds to code complexity, and has
 to be maintained.

With full respect to you: I'm only talking about creation time. How much
overhead and complexity are you expecting?

Cheers,
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Fujii Masao
On Tue, Sep 9, 2008 at 10:55 PM, Markus Wanner [EMAIL PROTECTED] wrote:
 Hi,

 ITAGAKI Takahiro wrote:

 Signals and locking, borrewed from Postgres-R, are now studied
 for the purpose in the log shipping,

 Cool. Let me know if you have any questions WRT this imessages stuff.

If you're sure it's all right, I have a trivial question.

Which signal should we use for the notification to the backend from
the WAL sender? The notable signals are already used.

Or, since a backend doesn't need to wait on select(), unlike the WAL sender,
ISTM that it's not so inconvenient to use a semaphore for that notification.

Your thoughts?

regards

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Markus Wanner

Hi,

Fujii Masao wrote:

On Tue, Sep 9, 2008 at 10:55 PM, Markus Wanner [EMAIL PROTECTED] wrote:

Hi,

ITAGAKI Takahiro wrote:

Signals and locking, borrewed from Postgres-R, are now studied
for the purpose in the log shipping,

Cool. Let me know if you have any questions WRT this imessages stuff.


If you're sure it's all right, I have a trivial question.


Well, I know it works for me and I think it could work for you, too. 
That's all I'm saying.



Which signal should we use for the notification to the backend from
the WAL sender? The notable signals are already used.


I'm using SIGUSR1, see src/backend/storage/ipc/imsg.c from Postgres-R, 
line 232. That isn't in use for backends or the postmaster, AFAIK.



Or, since a backend doesn't need to wait on select(), unlike the WAL sender,
ISTM that it's not so inconvenient to use a semaphore for that notification.


They probably could, but not the WAL sender.

What's the benefit of semaphores? It seems pretty ugly to set up a 
semaphore, lock that on the WAL sender, then claim it on the backend to 
wait for it, and then release it on the WAL sender to notify the backend.


If all you want to do is to signal the backend, why not use signals ;-) 
 But maybe I'm missing something?
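
(For illustration: a bare-bones sketch of such a signal-based wakeup,
with hypothetical names; this is not the actual imsg.c code. The handler
only sets a flag, and the real work happens in the normal control flow.)

#include <signal.h>
#include <string.h>
#include <sys/types.h>

static volatile sig_atomic_t wakeup_pending = 0;

static void
wakeup_handler(int signo)
{
	wakeup_pending = 1;		/* only set a flag inside the handler */
}

/* receiver (the backend) installs the handler once at startup */
static void
receiver_setup(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = wakeup_handler;
	sigaction(SIGUSR1, &sa, NULL);
}

/* sender (the WAL sender) wakes the target backend */
static void
notify_backend(pid_t backend_pid)
{
	kill(backend_pid, SIGUSR1);
}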


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] using hash index when BETWEEN is specified

2008-09-10 Thread Zdenek Kotala

Hannu Krosing napsal(a):

On Wed, 2008-09-10 at 07:13 -0400, Robert Haas wrote:

I'm not a planner guru, but it seems to me that a BETWEEN clause could be
rewritten as an IN clause for integer data types and small intervals.

Where should the line be drawn?
Define small :)

When the estimated cost is lower?


You still need to draw a line for when to even try estimating the cost .

Will this be interval of 10 ? or 100 ? or 1 ?


I think it depends on the ratio of unique integer values in the table to 
the number of values in the requested interval, on the value distribution, 
and on the total number of rows.


For example, if you have 10 distinct values and each occurs 100 times, 
then a full scan is better (for between 1 and 5). But if each value 
occurs only 10x, then using the hash index should be effective.


Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Base64 decode/encode performance

2008-09-10 Thread Marko Kreen
On 9/10/08, Gaetano Mendola [EMAIL PROTECTED] wrote:
  I have been experimenting with some base64 encoding/decoding implementations.

  I found that the one at http://code.google.com/p/stringencoders is the best,
  obtaining a 1.3x speedup vs. the postgres one.

  Do you think it is worth submitting a patch that replaces the postgres
  base64 implementation with this one?

(Note: the b64encode there reads 3 chars at a time, b64decode int32
at a time.)

There are 2 killer problems:

- decode does not seem to handle architectures that segfault
  on unaligned int32 accesses.
- decode does not allow whitespace in input string.

If those are fixed, it's a question of whether the 1.3x speedup is worth
more complex code with big lookup tables.

If you want to optimize, it seems more worthwhile to add an additional
loop to the current code that reads 3 or 4 chars at a time, before the
current single-char loop.  The decode loop may thus even optimize
to int32 fetching on x86/64 with a reasonable compiler.  Handling
whitespace with such code is doable, but will the code be clear enough?
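
(For illustration, such a fast path might look like the sketch below.
The names are hypothetical: b64lookup maps a character to its 6-bit
value or -1, src/dst/srclen belong to the surrounding decode routine,
and uint32 is the usual PostgreSQL typedef. Whitespace, '=' padding and
junk simply make the loop fall through to the existing per-character
code.)

	while (srclen >= 4)
	{
		int			a = b64lookup[(unsigned char) src[0]];
		int			b = b64lookup[(unsigned char) src[1]];
		int			c = b64lookup[(unsigned char) src[2]];
		int			d = b64lookup[(unsigned char) src[3]];
		uint32		v;

		if ((a | b | c | d) < 0)
			break;				/* not 4 plain data chars: use slow path */

		/* pack four 6-bit values and emit three output bytes */
		v = (a << 18) | (b << 12) | (c << 6) | d;
		*dst++ = (v >> 16) & 0xff;
		*dst++ = (v >> 8) & 0xff;
		*dst++ = v & 0xff;
		src += 4;
		srclen -= 4;
	}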

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-10 Thread Ron Mayer

Robert Haas wrote:

bits...bytes...blocks...m...M

I can't imagine that taking away the B is somehow going to
be more clear.


If clarity is the goal, I'd want the following:

a) Verbosely spelling out the units in the default config file

   temp_buffers = 16 megabytes
   or
   temp_buffers = 16 milliblocks :-)

   Naive users who favor cut&paste will use the verbose words
   that should leave little room for confusion.  Power-users
   who know the short forms from the docs will presumably have
   read the descriptions.

b) having SHOW show verbosely spelled-out units.
   db=# show temp_buffers;
   temp_buffers
   --
   1600 bytes
   (1 row)

c) having set show a NOTICE with the verbose word for the units
  db=# set temp_buffers = '16mb';
  NOTICE: setting temp_buffers to 1600 bytes


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Base64 decode/encode performance

2008-09-10 Thread Mark Mielke

Marko Kreen wrote:

(Note: the b64encode there reads 3 chars at a time, b64decode int32
at a time.)

There are 2 killer problems:

- decode does not seem to handle architectures that segfault
  on unaligned int32 accesses.


Out of curiosity - does this problem exist on any platform for which 
PostgreSQL is currently ported and supported?


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-10 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Here's an updated FSM patch. Changes since last patch:



Yesterday, I started reviewing your patch. To begin with, I have some 
general questions:


1) If I understand correctly, the main goal is to improve the FSM to 
cover all pages in a file, which is useful for huge databases.


2) Did you perform any benchmark? Is there any performance improvement 
or penalty?


3) How does it work when the database has many active parallel connections?


Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Base64 decode/encode performance

2008-09-10 Thread tomas

On Wed, Sep 10, 2008 at 10:44:00AM -0400, Mark Mielke wrote:
 Marko Kreen wrote:
[...]
 - decode does not seem to handle architectures that segfault
   on unaligned int32 accesses.

 Out of curiosity - does this problem exist on any platform for which 
 PostgreSQL is currently ported and supported?

HP PA is one of them. Besides, some others hide that behind a painful
(more than a factor of 1.3 -- we are talking software emulation here)
performance hit. More modern architectures (Alpha, Itanium, IA-64) take
this route.

Regards
-- 
tomás

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Base64 decode/encode performance

2008-09-10 Thread Martijn van Oosterhout
On Wed, Sep 10, 2008 at 10:44:00AM -0400, Mark Mielke wrote:
 There are 2 killer problems:
 
 - decode does not seem to handle architectures that segfault
   on unaligned int32 accesses.
 
 Out of curiosity - does this problem exist on any platform for which 
 PostgreSQL is currently ported and supported?

It exists on most CPUs actually, like Alpha/Sparc/MIPS, just not on
Intel chips, which is why you don't see them very often. Unaligned
accesses do take twice as long to execute though, even on Intel chips.
On some OSes the unaligned access is trapped and emulated by the OS,
which doesn't do much for performance.
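
(For reference, the usual portable workaround is to fetch through
memcpy, which a reasonable compiler turns into a single load on machines
that permit unaligned access and into byte loads on strict-alignment
machines. A generic sketch, not code from the patch under discussion:)

#include <stdint.h>
#include <string.h>

static inline uint32_t
read_uint32_unaligned(const void *p)
{
	uint32_t	v;

	memcpy(&v, p, sizeof(v));	/* safe regardless of alignment */
	return v;
}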

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] New FSM patch

2008-09-10 Thread Heikki Linnakangas

Zdenek Kotala wrote:
Yesterday, I started to reviewing your patch. 


Thanks!

1) If I understand correctly the main goal is to improve FSM to cover 
all pages in file which is useful for huge database.


That's not a goal per se, though it's true that the new FSM does cover 
all pages. The goals are to:
- eliminate max_fsm_pages and max_fsm_relations GUC variables, so that 
there's one thing less to configure
- make the FSM immediately available and useful after recovery (eg. warm 
standby)
- make it possible to retail update the FSM, which will be needed for 
partial vacuum


2) Did you perform any benchmark? Is there any performance improvement 
or penalty?


Working on it... I've benchmarked some bulk-insertion scenarios, and the 
new FSM is now comparable to the current implementation on those tests.


I've also been working on a low level benchmark using a C user-defined 
function that exercises just the FSM, showing the very raw CPU 
performance vs. current implementation. More on that later, but ATM it 
looks like the new implementation can be faster or slower than the 
current one, depending on the table size.


The biggest potential performance issue, however, is the fact that the 
new FSM implementation is WAL-logged. That shows up dramatically in the 
raw test where there's no other activity than FSM lookups and updates, 
but will be much less interesting in real life where FSM lookups are 
always related to some other updates which are WAL-logged anyway.


I also ran some DBT-2 tests without think times, with a small number of 
warehouses. But the results of that had such a high variability from 
test to test, that any difference in FSM speed would've been lost in the 
noise.


Do you still have the iGen setup available? Want to give it a shot?


3) How it works when database has many active parallel connections?


The new FSM should in principle scale better than the old one. However, 
Simon raised a worry about the WAL-logging: WALInsertLock can already 
become the bottleneck in OLTP-scenarios with very high load and many 
CPUs. The FSM isn't any worse than other actions that generate WAL, but 
naturally if you're bottlenecked by the WAL lock or bandwidth, any 
increase in WAL traffic will show up as an overall performance loss.


I'm not too worried about that, myself, because in typical scenarios the 
extra WAL traffic generated by the FSM should be insignificant in volume 
compared to all the other WAL traffic. But Simon will probably demand 
some hard evidence of that ;-).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-10 Thread Robert Haas
 a) Verbosely spelling out the units in the default config file
   temp_buffers = 16 megabytes
   or
   temp_buffers = 16 milliblocks :-)
   Naive users who favor cut&paste will use the verbose words
   that should leave little room for confusion.  Power-users
   who know the short forms from the docs will presumably have
   read the descriptions.

I think it would make a lot of sense to encourage adding the word
buffers or blocks when that is the unit in question.  This is all
religion at this point, but I find it difficult to believe that there
is any real need to spell out megabytes.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Tom Lane
I observed a curious bug in autovac just now.  Since plain vacuum avoids
calling GetTransactionSnapshot, an autovac worker that happens not to
analyze any tables will never call GetTransactionSnapshot at all.
This means it will arrive at vac_update_datfrozenxid with
RecentGlobalXmin never having been changed from its boot value of
FirstNormalTransactionId, which means that it will fail to update the
database's datfrozenxid ... or, if the current value of datfrozenxid
is past 2 billion, that it will improperly advance datfrozenxid to
sometime in the future.

Once you get into this state in a reasonably idle database such as
template1, autovac is completely dead in the water: if it thinks
template1 needs to be vacuumed for wraparound, then every subsequent
worker will be launched at template1, every one will fail to advance
its datfrozenxid, rinse and repeat.  Even before that happens, the
DB's datfrozenxid will prevent clog truncation, which might explain
some of the recent complaints.

I've only directly tested this in HEAD, but I suspect the problem goes
back a ways.

On reflection I'm not even sure that this is strictly an autovacuum
bug.  It can be cast more generically as RecentGlobalXmin getting
used without ever having been set, and it sure looks to me like the
HOT patch may have introduced a few risks of that sort.

I'm thinking that maybe an appropriate fix is to insert a
GetTransactionSnapshot call at the beginning of InitPostgres'
transaction, thus ensuring that every backend has some vaguely sane
value for RecentGlobalXmin before it tries to do any database access.
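
(In sketch form, the idea is just the following; the exact placement
inside InitPostgres would need care:)

	/* at the start of InitPostgres' initial transaction */
	StartTransactionCommand();

	/*
	 * Ensure RecentGlobalXmin has some vaguely sane value before this
	 * backend does any database access.
	 */
	(void) GetTransactionSnapshot();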

Another thought is that even with that, an autovac worker is likely
to reach vac_update_datfrozenxid with a RecentGlobalXmin value that
was computed at the start of its run, and is thus rather old.
I wonder why vac_update_datfrozenxid is using the variable at all
rather than doing GetOldestXmin?  It's not like that function is
so performance-critical that it needs to avoid calling GetOldestXmin.

Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData,
is it actually necessary for lazy vacuum to avoid setting a snapshot?
It seems like it might be a good idea for it to do so in order to
keep its RecentGlobalXmin reasonably current.

I've only looked at this in HEAD, but I am thinking that we have
a real problem here in both HEAD and 8.3.  I'm less sure how bad
things are in the older branches.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-10 Thread David Fetter
On Tue, Sep 09, 2008 at 10:50:57PM -0400, Robert Haas wrote:
  When people aren't keeping track of their DDL, that is very
  strictly a process problem on their end.  When people are shooting
  themselves in the foot, it's a great disservice to market Kevlar
  shoes to them.
 
 I can't believe anyone is going to stop tracking their DDL because,
 ooh goody, now we have pg_class.creation_time.  They will look at
 and say either oh, this is nice or oh, this is useless and go on
 about their business.

I can easily believe that a feature like this might, out of a false
sense of security, help them make the decision not to start.

 I try pretty hard not to shoot myself in the foot.  But if someone
 comes up to me and offers me some shoes that are have the same cost,

Not the same.  This is extra code, so it will provide both new places
for bugs and extra maintenance costs.

 appearance,

Clearly not the same.

 comfort-level,

False comfort is bad.  Putting an anesthetic instead of support in a
shoe billed as orthopedic may make customers comfortable, but when
they continue to damage their foot with it, it's not a feature.

 and durability as regular shoes but are slightly more bullet
 resistant, should I refuse them on principle?

See above.

 Why?

See above.

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] using hash index when BETWEEN is specified

2008-09-10 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 I think it depends of ration of unique integer number in a table and 
 numbers of requested interval, number distribution and total number of rows.

 For example if you have 10 distinct number and each has 100 occurrence 
 then full scan is better (for between 1 and 5). But if each number 
 occurs 10x. Then using hash index should be effective.

I think this discussion is a complete waste of time.  Hash indexes don't
win against btrees for single indexscans currently.  Even if that ever
gets fixed, it's highly unlikely that they'd win for N separate
indexscans versus 1 indexscan, which is what a query rewrite of this
sort would produce.  Remember that the btree will have the desired range
of keys stored adjacently, whereas in a hash they are almost certainly
in distinct buckets, and likely not even close-together buckets if the
hash function is doing its job well.  So you really are talking about a
factor of N both in indexscan setup overhead and in I/O costs.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-10 Thread Jeff Davis
On Tue, 2008-09-09 at 09:47 -0400, Robert Haas wrote:
  3. I think this is a must fix because of the point about volatile
  functions --- changing it later will result in user-visible semantics
  changes, so we have to get it right the first time.
 
  I don't entirely agree with #3. It is user-visible, but only in the
  sense that someone is depending on undocumented multiple-evaluation
  behavior.
 
 What makes you think it's going to be undocumented?  Single versus
 multiple evaluation is a key aspect of this feature and certainly
 needs to be documented one way or the other.  I can't understand why
 we would introduce a standard syntax with non-standard behavior, but
 if we do, it certainly had better be mentioned in the documentation.
 

I meant that -- hypothetically if we did accept the feature as-is -- the
number of evaluations would be documented to be undefined, not N. That
would avoid the backwards-compatibility problem.

This one point is probably not worth discussing now, because argument
#1 and #2 stand on their own.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Alvaro Herrera
Tom Lane wrote:
 I observed a curious bug in autovac just now.  Since plain vacuum avoids
 calling GetTransactionSnapshot, an autovac worker that happens not to
 analyze any tables will never call GetTransactionSnapshot at all.
 This means it will arrive at vac_update_datfrozenxid with
 RecentGlobalXmin never having been changed from its boot value of
 FirstNormalTransactionId, which means that it will fail to update the
 database's datfrozenxid ... or, if the current value of datfrozenxid
 is past 2 billion, that it will improperly advance datfrozenxid to
 sometime in the future.

Ouch :-(


 I've only directly tested this in HEAD, but I suspect the problem goes
 back a ways.

Well, this logic was introduced in 8.2; I'm not sure if there's a
problem in 8.1, but I don't think so.

 On reflection I'm not even sure that this is strictly an autovacuum
 bug.  It can be cast more generically as RecentGlobalXmin getting
 used without ever having been set, and it sure looks to me like the
 HOT patch may have introduced a few risks of that sort.

Agreed.

Maybe we should boot RecentGlobalXmin with InvalidTransactionId, and
ensure where it's going to be used that it's not that.

 I'm thinking that maybe an appropriate fix is to insert a
 GetTransactionSnapshot call at the beginning of InitPostgres'
 transaction, thus ensuring that every backend has some vaguely sane
 value for RecentGlobalXmin before it tries to do any database access.

AFAIR there's an initial transaction in InitPostgres or something like
that.  Since it goes away quickly, it'd be a good place to ensure the
snapshot does not last much longer.

 Another thought is that even with that, an autovac worker is likely
 to reach vac_update_datfrozenxid with a RecentGlobalXmin value that
 was computed at the start of its run, and is thus rather old.
 I wonder why vac_update_datfrozenxid is using the variable at all
 rather than doing GetOldestXmin?  It's not like that function is
 so performance-critical that it needs to avoid calling GetOldestXmin.

The function is called only once per autovacuum iteration, and once in
manually-invoked vacuum, so certainly it's not performance-critical.

 Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData,
 is it actually necessary for lazy vacuum to avoid setting a snapshot?
 It seems like it might be a good idea for it to do so in order to
 keep its RecentGlobalXmin reasonably current.

Hmm, I think I'd rather be inclined to get a snapshot just when it's
going to finish.  That way, RecentGlobalXmin will be up to date even if
the vacuum ran for a long time.

 I've only looked at this in HEAD, but I am thinking that we have
 a real problem here in both HEAD and 8.3.  I'm less sure how bad
 things are in the older branches.

8.2 does contain the vac_update_datfrozenxid problem at the very least.
Older versions do not have that logic, so they are probably safe.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I observed a curious bug in autovac just now.  ...

  Maybe we should boot RecentGlobalXmin with InvalidTransactionId, and
  ensure where it's going to be used that it's not that.

Good idea --- an Assert at the use sites should be sufficient.

 Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData,
 is it actually necessary for lazy vacuum to avoid setting a snapshot?
 It seems like it might be a good idea for it to do so in order to
 keep its RecentGlobalXmin reasonably current.

 Hmm, I think I'd rather be inclined to get a snapshot just when it's
 going to finish.

I'm worried about keeping RecentGlobalXmin up to date during the
vacuums, not only at the end, because it will be used for HOT page
pruning during the vacuums.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Tom Lane wrote:

  Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData,
  is it actually necessary for lazy vacuum to avoid setting a snapshot?
  It seems like it might be a good idea for it to do so in order to
  keep its RecentGlobalXmin reasonably current.
 
  Hmm, I think I'd rather be inclined to get a snapshot just when it's
  going to finish.
 
 I'm worried about keeping RecentGlobalXmin up to date during the
 vacuums, not only at the end, because it will be used for HOT page
 pruning during the vacuums.

Oh, I see.  I didn't know we were doing HOT pruning during vacuum; does
it make sense?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'm worried about keeping RecentGlobalXmin up to date during the
 vacuums, not only at the end, because it will be used for HOT page
 pruning during the vacuums.

 Oh, I see.  I didn't know we were doing HOT pruning during vacuum; does
 it make sense?

Sorry, I got a bit confused there.  The vacuum's intentional pruning
will use its own OldestXmin variable, which is known current at the
start of the vacuum (and I think there were even proposals to advance
it more frequently than that).  However, a vacuum could require some
incidental system catalog fetches, which I think could result in
prune operations based on RecentGlobalXmin on the catalog pages
(cf index_getnext).

So it's probably not too terribly important ... as long as an autovac
worker doesn't live long enough that its RecentGlobalXmin threatens
to wrap around ... but I'm still interested in it as a code cleanup
measure.  Skipping the transaction snapshot fetch was a performance
kluge, and if we don't need it any more I'd like to get rid of that
distinction between full and lazy vacuum behavior.

Anyway I think we are on the same page about the rest of the issues.
Did you want to work on fixing them, or shall I?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Heikki Linnakangas

Alvaro Herrera wrote:

I didn't know we were doing HOT pruning during vacuum; does
it make sense?


Removing HOT-updated, dead tuples is a bit complicated. It needs to be 
done one HOT-chain at a time, and the root line pointer needs to be 
updated to the next live tuple in the chain. lazy_scan_heap() calls the 
regular pruning function heap_page_prune() to deal with those before 
doing the normal scan of line pointers.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 17:57 +0900, Fujii Masao wrote:

 My sequence covers several cases:
 
 * There is no missing WAL file.
 * There are a lot of missing WAL files.

This is the likely case for any medium+ sized database.

 * There are missing history files. Failover always generates a gap in
 the history files because the TLI is incremented when archive recovery
 is completed.

Yes, but failover doesn't happen while we are configuring replication,
it can only happen after we have configured replication. It would be
theoretically possible to take a copy from one server and then try to
synchronise with a 3rd copy of the same server, but that seems perverse
and bug prone. So I advise that we only allow replication when the
timeline of the standby matches the timeline of the master, having it as
an explicit check.

 In your design, doesn't the initial setup block the master?
 Does your design cover the above-mentioned cases?

The way I described it does not block the master. It does defer the
point at which we can start using synchronous replication, so perhaps
that is your objection. I think it is acceptable: good food takes time
to cook.

I have thought about the approach you've outlined, though it seems to me
now like a performance optimisation rather than something we must have.

IMHO it will be confusing to be transferring both old and new data at
the same time from master to slave. We will have two different processes
sending and two different processes receiving. You'll need to work
through about four times as many failure modes, all of which will need
testing. Diagnosing problems in it via the log hurts my head just
thinking about it. ISTM that will severely impact the initial robustness
of the software for this feature. Perhaps in time it is the right way.

Anyway, feels like we're getting close to some good designs. There isn't
much difference between what we're discussing here.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner question

2008-09-10 Thread Tom Raney

Tom Lane wrote:

Tom Raney [EMAIL PROTECTED] writes:

Why does the planner consider both input variations of each symmetric merge join?  The 
README says there is not a lot of difference between the two options.  When 
are there any differences?


The righthand side needs to support mark/restore, the left doesn't;
so depending on plan types one way might need a helper Materialize
node that the other way doesn't.  Also, duplicated values are a bit
cheaper to process on the left than the right.

regards, tom lane



Thank you for the explanation.

On a somewhat related issue, I am a bit stumped on the way path keys 
function.


In the following query and debug data, why would an index scan on a 
single relation contain a path key from a different relation?


optimizer/README says, The PathKeys data structure represents what is 
known about the sort order of the tuples generated by a particular 
Path.  A path's pathkeys field is a list of PathKey nodes, where the 
n'th item represents the n'th sort key of the result.


Why does the index scan for tenk1 include a path key from 
onek.unique2?  Is it implying an equivalence there?


-Tom Raney


bench=# explain select * from tenk1 JOIN onek ON 
tenk1.unique2=onek.unique2;


RELOPTINFO (tenk1): rows=1 width=244
path list:
SeqScan(tenk1) rows=1 cost=0.00..434.00
IdxScan(tenk1) rows=1 cost=0.00..583.25
  pathkeys: ((tenk1.unique2, onek.unique2))  <---

cheapest startup path:
SeqScan(tenk1) rows=1 cost=0.00..434.00

cheapest total path:
SeqScan(tenk1) rows=1 cost=0.00..434.00

RELOPTINFO (onek): rows=1000 width=244
path list:
SeqScan(onek) rows=1000 cost=0.00..44.00
IdxScan(onek) rows=1000 cost=0.00..72.25
  pathkeys: ((tenk1.unique2, onek.unique2))

cheapest startup path:
SeqScan(onek) rows=1000 cost=0.00..44.00

cheapest total path:
SeqScan(onek) rows=1000 cost=0.00..44.00

RELOPTINFO (tenk1/onek): rows=1000 width=488
path list:
MergeJoin(tenk1/onek) rows=1000 cost=0.52..144.24
  clauses: tenk1.unique2 = onek.unique2
IdxScan(tenk1) rows=1 cost=0.00..583.25
  pathkeys: ((tenk1.unique2, onek.unique2))
IdxScan(onek) rows=1000 cost=0.00..72.25
  pathkeys: ((tenk1.unique2, onek.unique2))
NestLoop(tenk1/onek) rows=1000 cost=0.00..1756.96
  clauses: tenk1.unique2 = onek.unique2
SeqScan(onek) rows=1000 cost=0.00..44.00
IdxScan(tenk1) rows=1 cost=0.00..1.70

cheapest startup path:
NestLoop(tenk1/onek) rows=1000 cost=0.00..1756.96
  clauses: tenk1.unique2 = onek.unique2
SeqScan(onek) rows=1000 cost=0.00..44.00
IdxScan(tenk1) rows=1 cost=0.00..1.70

cheapest total path:
MergeJoin(tenk1/onek) rows=1000 cost=0.52..144.24
  clauses: tenk1.unique2 = onek.unique2
IdxScan(tenk1) rows=1 cost=0.00..583.25
  pathkeys: ((tenk1.unique2, onek.unique2))
IdxScan(onek) rows=1000 cost=0.00..72.25
  pathkeys: ((tenk1.unique2, onek.unique2))

   QUERY PLAN
-
 Merge Join  (cost=0.52..144.24 rows=1000 width=488)
   Merge Cond: (tenk1.unique2 = onek.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..583.25
rows=1 width=244)
   ->  Index Scan using onek_unique2 on onek  (cost=0.00..72.25 rows=1000
width=244)
(4 rows)

bench=#

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-10 Thread Robert Haas
 I meant that -- hypothetically if we did accept the feature as-is -- the
 number of evaluations would be documented to be undefined, not N. That
 would avoid the backwards-compatibility problem.

 This one point is probably not worth discussing now, because argument
 #1 and #2 stand on their own.

Agreed.  Plus, both Tom and Pavel seem to think this is a relatively
solvable problem.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner question

2008-09-10 Thread Tom Lane
Tom Raney [EMAIL PROTECTED] writes:
 Why does the index scan for tenk1 include a path key from 
 onek.unique2?  Is it implying an equivalence there?

 bench=# explain select * from tenk1 JOIN onek ON 
 tenk1.unique2=onek.unique2;

Yes, for an example like that the planner knows that tenk1.unique2 and
onek.unique2 will have equal values in any valid join row, so it's okay
to suppose that a sort by one is the same as a sort by the other.  So
the pathkey items actually reference sets of variables
{tenk1.unique2, onek.unique2}
not just individual variables.

 RELOPTINFO (tenk1): rows=1 width=244
  path list:
  SeqScan(tenk1) rows=1 cost=0.00..434.00
  IdxScan(tenk1) rows=1 cost=0.00..583.25
    pathkeys: ((tenk1.unique2, onek.unique2))  <---

  cheapest startup path:
  SeqScan(tenk1) rows=1 cost=0.00..434.00

  cheapest total path:
  SeqScan(tenk1) rows=1 cost=0.00..434.00

Hm, I don't recognize this output format, is it coming from some custom
code?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-10 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:



Do you still have the iGen setup available? Want to give it a shot?


Not sure if I have it configured, need to check. But I'll try it or I'll ask 
Jignesh or Paul if they have free time. They are real benchmark gurus.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 11:52 -0400, Tom Lane wrote:

 I'm thinking that maybe an appropriate fix is to insert a
 GetTransactionSnapshot call at the beginning of InitPostgres'
 transaction, thus ensuring that every backend has some vaguely sane
 value for RecentGlobalXmin before it tries to do any database access.

Can't we just set RecentGlobalXmin without getting a Snapshot? There's
no need for a snapshot at that point, is there? Just lock ProcArrayLock,
read GlobalXmin, drop lock.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Wed, 2008-09-10 at 11:52 -0400, Tom Lane wrote:
 I'm thinking that maybe an appropriate fix is to insert a
 GetTransactionSnapshot call at the beginning of InitPostgres'
 transaction, thus ensuring that every backend has some vaguely sane
 value for RecentGlobalXmin before it tries to do any database access.

 Can't we just set RecentGlobalXmin without getting a Snapshot?

Well, certainly building an MVCC snapshot is more than we (appear to)
need, but I'm of the opinion that what we have got here is an unexpected
way in which these specialized transactions are unlike all others.
I think the safest fix is to make them more like normal transactions,
rather than invent still other ways to do things.  If there were a
serious performance argument against that, then yeah, but I don't see
one.  Backend startup isn't cheap in any case, and neither is a VACUUM,
so the incremental cost involved here seems negligible.

 There's
 no need for a snapshot at that point is there? Just lock ProcArrayLock,
 read GlobalXmin, drop lock.

There's no GlobalXmin variable.  We'd have to use GetOldestXmin();
which is cheaper than GetSnapshotData, but not hugely so.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner question

2008-09-10 Thread Tom Raney

Tom Lane wrote:

Tom Raney [EMAIL PROTECTED] writes:
Why does the index scan for tenk1 include a path key from 
onek.unique2?  Is it implying an equivalence there?


bench=# explain select * from tenk1 JOIN onek ON 
tenk1.unique2=onek.unique2;


Yes, for an example like that the planner knows that tenk1.unique2 and
onek.unique2 will have equal values in any valid join row, so it's okay
to suppose that a sort by one is the same as a sort by the other.  So
the pathkey items actually reference sets of variables
{tenk1.unique2, onek.unique2}
not just individual variables.


Thanks.




RELOPTINFO (tenk1): rows=1 width=244
 path list:
 SeqScan(tenk1) rows=1 cost=0.00..434.00
 IdxScan(tenk1) rows=1 cost=0.00..583.25
   pathkeys: ((tenk1.unique2, onek.unique2))  <---



 cheapest startup path:
 SeqScan(tenk1) rows=1 cost=0.00..434.00



 cheapest total path:
 SeqScan(tenk1) rows=1 cost=0.00..434.00


Hm, I don't recognize this output format, is it coming from some custom
code?


Yes, it is.  I thought it was easier to read the OPTIMIZER_DEBUG 
output with the relation names instead of the relation indexes.  I 
will post a patch against CVS HEAD if you think it will help others.


-Tom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Alvaro Herrera
Tom Lane wrote:

 Sorry, I got a bit confused there.  The vacuum's intentional pruning
 will use its own OldestXmin variable, which is known current at the
 start of the vacuum (and I think there were even proposals to advance
 it more frequently than that).  However, a vacuum could require some
 incidental system catalog fetches, which I think could result in
 prune operations based on RecentGlobalXmin on the catalog pages
 (cf index_getnext).

Hmm, right, and what Heikki said too.

 Anyway I think we are on the same page about the rest of the issues.
 Did you want to work on fixing them, or shall I?

Is this more or less what you had in mind?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Index: src/backend/access/heap/heapam.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.262
diff -c -p -r1.262 heapam.c
*** src/backend/access/heap/heapam.c	11 Aug 2008 11:05:10 -	1.262
--- src/backend/access/heap/heapam.c	10 Sep 2008 19:50:36 -
*** heapgetpage(HeapScanDesc scan, BlockNumb
*** 219,224 
--- 219,225 
  	/*
  	 * Prune and repair fragmentation for the whole page, if possible.
  	 */
+ 	Assert(TransactionIdIsValid(RecentGlobalXmin));
  	heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
  
  	/*
*** heap_hot_search_buffer(ItemPointer tid, 
*** 1469,1474 
--- 1470,1477 
  	if (all_dead)
  		*all_dead = true;
  
+ 	Assert(TransactionIdIsValid(RecentGlobalXmin));
+ 
  	Assert(ItemPointerGetBlockNumber(tid) == BufferGetBlockNumber(buffer));
  	offnum = ItemPointerGetOffsetNumber(tid);
  	at_chain_start = true;
Index: src/backend/access/index/indexam.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/index/indexam.c,v
retrieving revision 1.109
diff -c -p -r1.109 indexam.c
*** src/backend/access/index/indexam.c	19 Jun 2008 00:46:03 -	1.109
--- src/backend/access/index/indexam.c	10 Sep 2008 19:51:38 -
*** index_getnext(IndexScanDesc scan, ScanDi
*** 419,424 
--- 419,426 
  	SCAN_CHECKS;
  	GET_SCAN_PROCEDURE(amgettuple);
  
+ 	Assert(TransactionIdIsValid(RecentGlobalXmin));
+ 
  	/*
  	 * We always reset xs_hot_dead; if we are here then either we are just
  	 * starting the scan, or we previously returned a visible tuple, and in
Index: src/backend/commands/vacuum.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.376
diff -c -p -r1.376 vacuum.c
*** src/backend/commands/vacuum.c	13 Aug 2008 00:07:50 -	1.376
--- src/backend/commands/vacuum.c	10 Sep 2008 20:17:32 -
*** vac_update_datfrozenxid(void)
*** 790,803 
  	bool		dirty = false;
  
  	/*
! 	 * Initialize the min calculation with RecentGlobalXmin.	Any
! 	 * not-yet-committed pg_class entries for new tables must have
! 	 * relfrozenxid at least this high, because any other open xact must have
! 	 * RecentXmin = its PGPROC.xmin = our RecentGlobalXmin; see
! 	 * RecentXmin >= its PGPROC.xmin >= our RecentGlobalXmin; see
! 	 * starting with this.
  	 */
! 	newFrozenXid = RecentGlobalXmin;
  
  	/*
  	 * We must seqscan pg_class to find the minimum Xid, because there is no
--- 790,801 
  	bool		dirty = false;
  
  	/*
! 	 * Initialize the min calculation with GetOldestXmin, which is a
! 	 * reasonable approximation to the minimum relfrozenxid for not-yet-
! 	 * committed pg_class entries for new tables; see AddNewRelationTuple().
! 	 * So we cannot produce a wrong minimum by starting with this.
  	 */
! 	newFrozenXid = GetOldestXmin(true, true);
  
  	/*
  	 * We must seqscan pg_class to find the minimum Xid, because there is no
*** vacuum_rel(Oid relid, VacuumStmt *vacstm
*** 990,1007 
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  
! 	if (vacstmt-full)
! 	{
! 		/* functions in indexes may want a snapshot set */
! 		PushActiveSnapshot(GetTransactionSnapshot());
! 	}
! 	else
  	{
  		/*
! 		 * During a lazy VACUUM we do not run any user-supplied functions, and
! 		 * so it should be safe to not create a transaction snapshot.
! 		 *
! 		 * We can furthermore set the PROC_IN_VACUUM flag, which lets other
  		 * concurrent VACUUMs know that they can ignore this one while
  		 * determining their OldestXmin.  (The reason we don't set it during a
  		 * full VACUUM is exactly that we may have to run user-defined
--- 988,1003 
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  
! 	/*
! 	 * Functions in indexes may want a snapshot set.  Also, setting
! 	 * a snapshot ensures that RecentGlobalXmin is kept truly recent.
! 	 */
! 	

Re: [HACKERS] Potential Join Performance Issue

2008-09-10 Thread Simon Riggs

On Tue, 2008-09-09 at 11:21 -0700, Lawrence, Ramon wrote:

 Our research group has been using the PostgreSQL code base to test new
 join algorithms.

Sounds cool. I'm sure you'll come up with some good things.

You might be interested in this also
http://archives.postgresql.org/pgsql-hackers/2007-01/msg01600.php
after which Greg Stark and I were investigating using
alternate/compressed data structures to avoid the need to switch to
multi-batch hash joins. 

If we knew we were dealing with nearly contiguous ranges of discrete
values, we could store the missing values rather than the present values
using an HRL encoded bitmap. Other ideas are possible also, I'm sure.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Simon Riggs

On Wed, 2008-09-10 at 16:11 -0400, Tom Lane wrote:
 If there were a serious performance argument against that, then yeah,
 but I don't see one.

Maybe! Just finishing another patch, then I'll be starting Hot Standby
discussions, so we'll see.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Anyway I think we are on the same page about the rest of the issues.
 Did you want to work on fixing them, or shall I?

 Is this more or less what you had in mind?

Yeah, this looks like exactly what I had in mind for HEAD.  I'm not too
sure about the back branches though.  I think we could apply all of it
to 8.3, but further back is going to require a separate investigation
for each branch.  Will you take that on?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Anyway I think we are on the same page about the rest of the issues.
  Did you want to work on fixing them, or shall I?
 
  Is this more or less what you had in mind?
 
 Yeah, this looks like exactly what I had in mind for HEAD.  I'm not too
 sure about the back branches though.  I think we could apply all of it
 to 8.3, but further back is going to require a separate investigation
 for each branch.  Will you take that on?

Sure.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Tom Lane
I wrote:
 Yeah, this looks like exactly what I had in mind for HEAD.  I'm not too
 sure about the back branches though.  I think we could apply all of it
 to 8.3, but further back is going to require a separate investigation
 for each branch.  Will you take that on?

BTW, I did a quick look at all the uses of RecentGlobalXmin in the back
branches, and I think we might be all right before 8.2.  The older
branches do in fact init RecentGlobalXmin to InvalidTransactionId,
and the only thing they use it for is "is this tuple dead to everyone"
tests.  Since InvalidTransactionId compares older than anything else,
the only consequence of not having set it is overly-conservative
decisions not to mark tuples killed.  So unless you see a problem I
missed, I think we only have an issue-worth-fixing in 8.2 and 8.3.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-10 Thread Kevin Grittner
 Tom Lane [EMAIL PROTECTED] wrote: 
 
 The patch includes regression test changes that illustrate what it does.
 I am not sure about some of the corner cases --- anyone want to see if
 their understanding of the spec for interval string is different?
 
The patch seems to support extensions to the standard.
 
(1)  In the spec, an interval value or literal must be either
year-month or day-time.  (I guess they didn't want to try to deal with
the sticky issues of what it means to have an interval of, for
example, seven months and three days -- since an interval has no sense
of which seven months.)
 
(2)  The interval qualifier is not optional in the spec.
 
(3)  It seems to me that they were requiring that there be a
one-to-one match between the fields specified in the interval
qualifier and the fields present in the interval string.
 
(4)  I'm not 100% sure on this one, but it seemed to me that they were
requiring year to be four digits and other components (except for
fractional seconds) to be two digits.
 
So long as they are documented, there's nothing wrong with extensions.
Nothing I saw suggests that legal interval literals would be
misinterpreted.
 
 There is still some unfinished business if anyone wants to make it
 really exactly 100% spec compliant.  In particular the spec seems to
 allow a minus sign *outside* the string literal,
 
I agree.  They go so far as to point out how it should be interpreted
if the minus is present in both allowed locations (both inside and
outside the quotes):
 
5) The sign in a signed numeric literal or an interval literal is a
   monadic arithmetic operator. The monadic arithmetic operators + and -
   specify monadic plus and monadic minus, respectively. If neither
   monadic plus nor monadic minus are specified in a signed numeric
   literal or an interval literal, or if monadic plus is specified, then
   the literal is positive. If monadic minus is specified in a signed
   numeric literal or interval literal, then the literal is negative. If
   sign is specified in both possible locations in an interval literal,
   then the sign of the literal is determined by normal mathematical
   interpretation of multiple sign operators.
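
For example, under this rule INTERVAL - '-3' DAY (a monadic minus outside
the quotes and a sign inside them) denotes a positive interval of three
days, since the two minus signs multiply out to a plus.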
 
 and if I'm reading it
 right, a precision spec in combination with field restrictions ought to
 look like INTERVAL '...' DAY TO SECOND(3) not INTERVAL(3) '...' DAY TO
 SECOND.
 
Agreed.
 
 However, for these you'll get a syntax error instead of
 silently wrong answers if you try to use the other syntax, so it's not
 quite as pernicious as the matters addressed here.
 
Agreed.
 
-Kevin



[HACKERS] Transaction Snapshots and Hot Standby

2008-09-10 Thread Simon Riggs
Transaction snapshots are probably the most difficult problem for Hot
Standby to resolve. We *must* have a transaction snapshot to access
table data in a consistent way, or we must accept some inconsistency,
though that is not simple either.

We can derive transaction snapshots

  * remotely from primary node
  * locally on the standby node

If we derive a snapshot locally, then we will end up with a situation
where the xmin of the local snapshot precedes the xmin of the primary
node. When this occurs it will then be possible for WAL records to
arrive on the standby that request removal of rows that a transaction
might wish to see. Preventing that situation can be done by either
deferring WAL apply or by cancelling queries. 

We can defer WAL apply for particular tables only, but this could
significantly complicate the apply process and is not a suggested option
for the first release of this feature.

We might control query cancellation by tracking which tables have had
rows removed that would have been visible to particular queries. Again,
possible but suffers from the problem that tables on which HOT is
frequently active would be almost unusable. So not a general solution.

Simply ignoring WAL removal has been discussed and rejected.
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00753.php
Explicitly defining the tables a transaction wishes to see has also been
discussed and rejected.
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00268.php

So the most generally applicable and simplest initial solution for
generating snapshots is to take them from the remote primary node. The
proposal for this follows:

Transaction snapshots will be managed by a new process, Snapproc.
Snapproc will start only in recovery mode and will exit when normal
processing mode begins. The snapshot process will connect to the remote
primary node, export snapshot data and copy it to shared memory on
the standby node. When all standby backends have run
UnregisterSnapshot(), the snapshot will be unregistered on the
remote primary node as well.

The standby must not think a transaction is visible until all changes
made by it have been applied. So snapshots from the primary cannot be
used on the standby until the LSN at which they were taken has been
reached by WAL apply on the standby. Snapshots don't normally have an
LSN, so we must derive that information as well when we take a snapshot.
With asynchronous commits a transaction can be committed but not yet
written to disk, so we cannot rely on the current WAL write pointer.
Taking the WAL insert pointer is the safest thing to do, though it is
the most heavily contended. We don't want to hold ProcArrayLock while
requesting WALInsertLock, so we will get the LSN of the WAL insert
pointer *after* the snapshot has been derived (it doesn't matter that
much, as long as it's not before the snapshot).
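
A toy sketch of that ordering, with stand-in stubs for the backend
facilities (the stub names and values are hypothetical, not the real API):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
typedef struct { uint32_t xmin, xmax; } ToySnapshot;

/* --- hypothetical stand-ins for the real backend calls --- */
static ToySnapshot derive_snapshot(void)       /* under ProcArrayLock */
{
	ToySnapshot s = { 100, 105 };
	return s;
}
static XLogRecPtr wal_insert_pointer(void) { return 5000; }  /* primary */
static XLogRecPtr replay_pointer(void)     { return 4990; }  /* standby */

int
main(void)
{
	/* 1. Derive the snapshot first (ProcArrayLock held in the real thing). */
	ToySnapshot snap = derive_snapshot();

	/*
	 * 2. Only afterwards read the WAL insert pointer: an LSN taken after
	 * the snapshot can be too new but never too old, so it is a safe
	 * usability horizon for the standby.
	 */
	XLogRecPtr snap_lsn = wal_insert_pointer();

	/* 3. The standby must not use the snapshot until replay reaches it. */
	printf("snapshot (xmin %u) usable: %s\n", snap.xmin,
		   replay_pointer() >= snap_lsn ? "yes" : "not yet");
	return 0;
}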

So snapshots from the primary take time before they can be used. The
delay is exactly the current processing delay from primary to standby,
which is another reason why we do not wish to fall behind.

Taking snapshots from primary has a few disadvantages

  * snapshots take time before they are usable
  * requesting new snapshots is via remote request
  * snapshots on primary prevent row removal (but this was also an
advantage of this technique!)

If primary and standby node are connected by private ethernet then the
snapshot request time will be ~10ms, though that is probably 100 times
slower than current snapshot access. If primary and standby are on opposite
sides of an ocean then times could be as high as 500ms. Taking snapshots
directly could be annoyingly slow for small read-only statements, so we
need to buffer this process in various ways. For larger queries, this
may not be a problem at all, but we do not wish to limit or assume the
uses of the standby node. 

First of all, each backend will have the opportunity to reuse previous
snapshots both within transactions and across them. A userset parameter
snapshot_reuse_window = 0..6ms will define the time window in which
any request for a new snapshot will simply result in being fed the last
snapshot again. When the window on a snapshot has expired a newer
snapshot will be presented instead. This idea is similar to serializable
transactions, which continually reuse the same snapshot. This is a
useful parameter for normal processing as well, since it will reduce
contention on the ProcArrayLock for many applications. 
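
A minimal sketch of the reuse rule, assuming a monotonic clock and a
hypothetical derive_new_snapshot() standing in for the (possibly remote)
request; only the GUC name comes from the proposal:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef struct { unsigned xmin, xmax; } ToySnapshot;

static long snapshot_reuse_window_ms = 10;    /* the userset GUC proposed */
static ToySnapshot cached;
static struct timespec cached_at;
static bool have_cached = false;

static ToySnapshot derive_new_snapshot(void)  /* remote request in the design */
{
	ToySnapshot s = { 100, 105 };
	return s;
}

static long
elapsed_ms(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) * 1000 + (b.tv_nsec - a.tv_nsec) / 1000000;
}

static ToySnapshot
get_snapshot(void)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	if (have_cached && elapsed_ms(cached_at, now) <= snapshot_reuse_window_ms)
		return cached;                 /* feed the last snapshot again */

	cached = derive_new_snapshot();    /* window expired: newer snapshot */
	cached_at = now;
	have_cached = true;
	return cached;
}

int
main(void)
{
	ToySnapshot a = get_snapshot();    /* first call derives */
	ToySnapshot b = get_snapshot();    /* immediate second call reuses */

	printf("same window: xmin %u == %u\n", a.xmin, b.xmin);
	return 0;
}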

Snapshots can be reused across transactions in recovery mode, since they
are held in shared memory. The amount of shared memory dedicated to
storing snapshots will be max_connections * the maximum size of a snapshot.
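
As a rough worked example (illustrative numbers only): if a stored
snapshot is dominated by its xip array of up to max_connections four-byte
XIDs, then with max_connections = 100 the reservation is about 100
snapshots * (header + 100 * 4 bytes), i.e. a few tens of kilobytes, so
the overhead is modest.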

Since there is a delay between obtaining a new snapshot and it becoming
usable, the Snapshot process will buffer them until they become mature,
like a good Whiskey. The Snapshot process will take regular snapshots and
pre-age them so that when a backend requests a snapshot it will be 

Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-10 Thread Kevin Grittner
 Kevin Grittner [EMAIL PROTECTED] wrote: 
 
 (4)  I'm not 100% sure on this one, but it seemed to me that they were
 requiring year to be four digits and other components (except for
 fractional seconds) to be two digits.
 
That can't be right.  Maybe I saw that in datetime literal specs.
 
Apologies.
 
-Kevin



Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-10 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote: 
 I am not sure about some of the corner cases --- anyone want to see if
 their understanding of the spec for interval string is different?
 
 The patch seems to support extensions to the standard.

Right.  All of these were extensions that already existed in PG.

 (3)  It seems to me that they were requiring that there be a
 one-to-one match between the fields specified in the interval
 qualifier and the fields present in the interval string.

Yeah.  I couldn't actually find any such statement in SQL92, but
the SQL:2008 draft has this at 5.3 rule 30:

  30) Let N be the number of primary datetime fields in the precision of
  the interval literal, as specified by interval qualifier.

  The interval literal being defined shall contain N datetime components. 

and at rule 34:

  34) Within the definition of an interval literal that contains a
  year-month literal, the interval qualifier shall not specify DAY,
  HOUR, MINUTE, or SECOND. Within the definition of an interval
  literal that contains a day-time literal, the interval qualifier
  shall not specify YEAR or MONTH.

This seems to be requiring that not only do you give the exact number of
components, but the formatting must match the expectation.  So anything
we accept beyond that is gravy.  I think that most of the extension
cases were already being accepted in some form, and I'd be hesitant
to take them out for fear of breaking existing applications.
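
(For example, by rule 30 INTERVAL '2-6' YEAR TO MONTH is spec-legal -- the
qualifier names two fields and the string supplies two components --
whereas INTERVAL '2' YEAR TO MONTH, with only one component for two
fields, is not.)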
 
 There is still some unfinished business if anyone wants to make it
 really exactly 100% spec compliant ...
 
 I agree.

I committed the patch as presented, and I think I might go take a quick
look at the other two issues.  What I suspect I'll find is that the
minus sign issue isn't fixable without turning INTERVAL into a fully
reserved word, which is probably a cure worse than the disease.  However
it might be fairly easy to get the grammar to allow the precision in
either place.  (We'd want to keep the old way working for backward
compatibility.)

regards, tom lane



Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Alvaro Herrera
Tom Lane wrote:

 BTW, I did a quick look at all the uses of RecentGlobalXmin in the back
 branches, and I think we might be all right before 8.2.  The older
 branches do in fact init RecentGlobalXmin to InvalidTransactionId,
 and the only thing they use it for is is this tuple dead to everyone
 tests.  Since InvalidTransactionId compares older than anything else,
 the only consequence of not having set it is overly-conservative
 decisions not to mark tuples killed.  So unless you see a problem I
 missed, I think we only have an issue-worth-fixing in 8.2 and 8.3.

Actually, 8.2 initializes it to InvalidTransactionId too, so it doesn't
seem like there's a problem there either.  Since the problem stems from
using it as the initializer for the Min() calculation, there's no actual
bug on 8.2 either.  The bug only appeared on 8.3 when the initializer was
changed.  And given that there's no HOT in 8.2, there's no danger of
misusing it in page pruning either.

Still, I produced a patch for 8.2, but given all that I'm not sure it's
worth applying.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Index: src/backend/commands/vacuum.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.342.2.5
diff -c -p -r1.342.2.5 vacuum.c
*** src/backend/commands/vacuum.c	11 Feb 2008 19:14:38 -	1.342.2.5
--- src/backend/commands/vacuum.c	11 Sep 2008 01:08:10 -
*** vac_update_datfrozenxid(void)
*** 764,777 
  	bool		dirty = false;
  
  	/*
! 	 * Initialize the min calculation with RecentGlobalXmin.  Any
! 	 * not-yet-committed pg_class entries for new tables must have
! 	 * relfrozenxid at least this high, because any other open xact must have
! 	 * RecentXmin >= its PGPROC.xmin >= our RecentGlobalXmin; see
! 	 * AddNewRelationTuple().  So we cannot produce a wrong minimum by
! 	 * starting with this.
  	 */
! 	newFrozenXid = RecentGlobalXmin;
  
  	/*
  	 * We must seqscan pg_class to find the minimum Xid, because there is no
--- 764,775 
  	bool		dirty = false;
  
  	/*
! 	 * Initialize the min calculation with GetOldestXmin, which is a
! 	 * reasonable approximation to the minimum relfrozenxid for not-yet-
! 	 * committed pg_class entries for new tables; see AddNewRelationTuple().
! 	 * So we cannot produce a wrong minimum by starting with this.
  	 */
! 	newFrozenXid = GetOldestXmin(true, true);
  
  	/*
  	 * We must seqscan pg_class to find the minimum Xid, because there is no
*** vacuum_rel(Oid relid, VacuumStmt *vacstm
*** 965,982 
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  
! 	if (vacstmt->full)
! 	{
! 		/* functions in indexes may want a snapshot set */
! 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
! 	}
! 	else
  	{
  		/*
! 		 * During a lazy VACUUM we do not run any user-supplied functions, and
! 		 * so it should be safe to not create a transaction snapshot.
! 		 *
! 		 * We can furthermore set the inVacuum flag, which lets other
  		 * concurrent VACUUMs know that they can ignore this one while
  		 * determining their OldestXmin.  (The reason we don't set inVacuum
  		 * during a full VACUUM is exactly that we may have to run user-
--- 963,978 
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  
! 	/*
! 	 * Functions in indexes may want a snapshot set. Also, setting
! 	 * a snapshot ensures that RecentGlobalXmin is kept truly recent.
! 	 */
! 	ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
! 
! 	if (!vacstmt->full)
  	{
  		/*
! 		 * During a lazy VACUUM we can set the inVacuum flag, which lets other
  		 * concurrent VACUUMs know that they can ignore this one while
  		 * determining their OldestXmin.  (The reason we don't set inVacuum
  		 * during a full VACUUM is exactly that we may have to run user-
Index: src/backend/utils/init/postinit.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/utils/init/postinit.c,v
retrieving revision 1.172
diff -c -p -r1.172 postinit.c
*** src/backend/utils/init/postinit.c	5 Nov 2006 22:42:09 -	1.172
--- src/backend/utils/init/postinit.c	10 Sep 2008 23:43:45 -
***
*** 42,47 
--- 42,48 
  #include "utils/guc.h"
  #include "utils/portal.h"
  #include "utils/relcache.h"
+ #include "utils/tqual.h"
  #include "utils/syscache.h"
  #include "pgstat.h"
  
*** InitPostgres(const char *dbname, const c
*** 380,389 
  	on_shmem_exit(ShutdownPostgres, 0);
  
  	/*
! 	 * Start a new transaction here before first access to db
  	 */
  	if (!bootstrap)
  		StartTransactionCommand();
  
  	/*
  	 * Now that we have a transaction, we can take locks.  Take a writer's
--- 381,395 
  	on_shmem_exit(ShutdownPostgres, 0);
  
  	/*
! 	 * Start a new transaction here 

Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-10 Thread Josh Berkus



(1)  In the spec, an interval value or literal must be either
year-month or day-time.  (I guess they didn't want to try to deal with
the sticky issues of what it means to have an interval of, for
example, seven months and three days -- since an interval has no sense
of which seven months.)


Note that, for usability reasons, Karel some time ago tri-partitioned 
our intervals: year-month|day-week|hour-min-sec.


--Josh




Re: [HACKERS] Interesting glitch in autovacuum

2008-09-10 Thread Pavan Deolasee
On Wed, Sep 10, 2008 at 9:22 PM, Tom Lane [EMAIL PROTECTED] wrote:


 On reflection I'm not even sure that this is strictly an autovacuum
 bug.  It can be cast more generically as RecentGlobalXmin getting
 used without ever having been set, and it sure looks to me like the
 HOT patch may have introduced a few risks of that sort.


ISTM that HOT may be safe here because even if RecentGlobalXmin is not
set and has the boot-time value of FirstNormalTransactionId,
heap_page_prune_opt() would just return without doing any real work.
This is OK except in VACUUM, where we anyway use OldestXmin. So I
don't see a real problem here. But it's good to fix the larger problem
as per the suggestion.
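
A simplified toy of the guard being described -- toy types, not the
pruneheap.c source, and no wraparound handling; the point is only that a
boot-value horizon makes the fast-exit test fail so the function falls
out before doing any work:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

typedef struct { TransactionId pd_prune_xid; } ToyPage;

/* Stand-in for the prunability test: the page is worth a pruning pass
 * only if its recorded prune xid precedes the global horizon. */
static bool
page_is_prunable(const ToyPage *page, TransactionId horizon)
{
	return page->pd_prune_xid < horizon;   /* toy: plain comparison */
}

static void
heap_page_prune_opt_toy(ToyPage *page, TransactionId horizon)
{
	if (!page_is_prunable(page, horizon))
		return;                            /* boot-value horizon exits here */
	printf("would prune page\n");
}

int
main(void)
{
	ToyPage page = { 1000 };

	heap_page_prune_opt_toy(&page, FirstNormalTransactionId);  /* no-op */
	heap_page_prune_opt_toy(&page, 2000);                      /* prunes */
	return 0;
}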

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



[HACKERS] Commitfest patches mostly assigned ... status

2008-09-10 Thread Josh Berkus

Hackers,

At this point, almost all patches have been assigned to reviewers.  If 
you submitted a patch and don't get feedback by Saturday, take a look at 
who's in the reviewers column and send them a query.  Since I have no 
way to track when patches are assigned to reviewers, I have no idea if 
some of them are sitting on their hands.


Some patches have not been assigned to reviewers for various reasons. 
The following weren't assigned because they are complex and really need 
a high-end hacker or committer to take them on:


libpq events
rmgr hooks and contrib/rmgr_hook
CLUSTER using sort instead of index scan

The following were not assigned because there has already been 
discussion on this list debating whether they are a good idea.  These 
need consensus on this list before they get assigned to a reviewer:


remove --inputdir and --outputdir from pg_regress
GUC: Case-insensitive units
Allow has_table_privilege(...,'usage') on sequences

I've assigned some reviewers to WIP patches with instructions to 
report back on their general experience with building, functionality and 
spec.


Note that patches need not have only one reviewer!  If you have time, 
please take on testing some of the more complex patches, especially:

Column-level Permissions
Common Table Expressions
SE-PostgreSQL patches

Also, note that the following patches need performance testing on a 
variety of platforms.  Everyone should help with this.

GSoC Improved Hash Indexing
posix fadvises
operator restrictivity function for text search
CLUSTER using sort instead of index scan

Thanks for your input, and let's close this commitfest in a week!

--CommitFest Mom
