Re: [HACKERS] Streaming replication and a disk full in primary

2010-04-08 Thread Fujii Masao
Thanks for the great patch! I apologize for leaving the issue
half-finished for a long time :(

On Wed, Apr 7, 2010 at 7:02 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 In your version of this patch, the default was still the current
 behavior where the primary retains WAL files that are still needed by
 connected standby servers indefinitely. I think that's a dangerous
 default, so I changed it so that if you don't set standby_keep_segments,
 the primary doesn't retain any extra segments; the number of WAL
 segments available for standby servers is determined only by the
 location of the previous checkpoint, and the status of WAL archiving.
 That makes the code a bit simpler too, as we never care how far the
 walsenders are. In fact, the GetOldestWALSenderPointer() function is now
 dead code.

It's OK for me to change the default behavior. We can remove
the GetOldestWALSenderPointer() function.

doc/src/sgml/config.sgml
-archival or to recover from a checkpoint. If standby_keep_segments
+archival or to recover from a checkpoint. If <varname>standby_keep_segments</>

The word standby_keep_segments always needs the <varname> tag, I think.

Should we remove section 25.2.5.2 Monitoring from the documentation?

Why is standby_keep_segments used even if max_wal_senders is zero?
In that case, ISTM we don't need to keep any WAL files in pg_xlog
for the standby.

When XLogRead() reads across two WAL files and the older of them is recycled
while being read, the check of whether the read data is valid can fail.
This is because the variable recptr can already have advanced into the newer
WAL file before the check runs.

If walreceiver gets stuck for some reason, walsender can block in the
send() system call and get stuck as well.
With the patch, such a walsender can never exit because it never
returns to call XLogRead(). So I think the bgwriter needs to send an
exit signal to a walsender that has lagged too far behind. Thoughts?

The shared-memory pointer to the latest recycled WAL file is updated before
we check whether that file has actually been archived. If archiving is not
working for some reason, the WAL file which that shmem pointer indicates
might not actually have been recycled yet. In this case, the standby can
obtain the WAL file neither from the primary (because it's been marked as
latest recycled) nor from the archive (because it hasn't been archived yet).
This seems to be a big problem. How about moving the update of the shmem
to after calling XLogArchiveCheckDone() in RemoveOldXlogFiles()?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining Streaming Replication Open Items

2010-04-08 Thread Fujii Masao
On Thu, Apr 8, 2010 at 10:41 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 7, 2010 at 8:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 OK, that looks a lot less risky than I had understood from discussions.
 The main thing for me is it doesn't interfere with Startup or
 WalReceiver, so assuming it works I've got no objections. Thanks for
 chasing this down, good addition.

 Thanks.  Committed.

Thanks. Should the following TODO item be removed?

Redefine smart shutdown in standby mode to exit as soon as all
read-only connections are gone.
http://wiki.postgresql.org/wiki/Todo#Standby_server_mode

Or should it be changed to something like:

Change smart shutdown in standby mode so that it kills the startup
 and walreceiver process before waiting for the regular backends to die off

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [COMMITTERS] pgsql: Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset()

2010-04-08 Thread Heikki Linnakangas
Fujii Masao wrote:
 On Wed, Apr 7, 2010 at 7:23 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 This commit is a stop-gap solution until we figure out what exactly to
 do about that. Masao-san wrote a patch that included the TLI in the
 string returned by pg_last_xlog_receive/replay_location() (see
 http://archives.postgresql.org/message-id/3f0b79eb1003030603ibd0cbadjebb09fa424930...@mail.gmail.com
 and
 http://archives.postgresql.org/message-id/3f0b79eb1003300214r6cf98c46tc9be5d563ccf4...@mail.gmail.com),
 but it still wasn't clear it did the right thing in corner-cases where
 the TLI changes. Using GetRecoveryTargetTLI() for the tli returned by
 pg_last_receive_location() seems bogus, at least.
 
 Why? The tli of the last WAL record received is always the
 recovery target tli currently.

True.

Hmm, currently pg_last_xlog_receive_location() returns the last location
 streamed via streaming replication. Should that be changed so that it
also advances when a WAL segment is restored from archive? It seems
strange that pg_last_xlog_receive_location() can be smaller than
pg_last_xlog_replay_location().

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Re: [COMMITTERS] pgsql: Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset()

2010-04-08 Thread Simon Riggs
On Thu, 2010-04-08 at 09:54 +0300, Heikki Linnakangas wrote:
 Fujii Masao wrote:
  On Wed, Apr 7, 2010 at 7:23 PM, Heikki Linnakangas
  heikki.linnakan...@enterprisedb.com wrote:
  This commit is a stop-gap solution until we figure out what exactly to
  do about that. Masao-san wrote a patch that included the TLI in the
  string returned by pg_last_xlog_receive/replay_location() (see
  http://archives.postgresql.org/message-id/3f0b79eb1003030603ibd0cbadjebb09fa424930...@mail.gmail.com
  and
  http://archives.postgresql.org/message-id/3f0b79eb1003300214r6cf98c46tc9be5d563ccf4...@mail.gmail.com),
  but it still wasn't clear it did the right thing in corner-cases where
  the TLI changes. Using GetRecoveryTargetTLI() for the tli returned by
  pg_last_receive_location() seems bogus, at least.
  
  Why? The tli of the last WAL record received is always the
  recovery target tli currently.
 
 True.

Only in streaming mode. If you use the current TLI as I have suggested
then it will be correct in more cases.

 Hmm, currently pg_last_xlog_receive_location() returns the last location
  streamed via streaming replication. Should that be changed so that it
 also advances when a WAL segment is restored from archive? It seems
 strange that pg_last_xlog_receive_location() can be smaller than
 pg_last_xlog_replay_location().

Yes, it should be changed.

-- 
 Simon Riggs   www.2ndQuadrant.com




[HACKERS] Re: [COMMITTERS] pgsql: Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset()

2010-04-08 Thread Fujii Masao
On Thu, Apr 8, 2010 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-04-08 at 09:54 +0300, Heikki Linnakangas wrote:
 Fujii Masao wrote:
  On Wed, Apr 7, 2010 at 7:23 PM, Heikki Linnakangas
  heikki.linnakan...@enterprisedb.com wrote:
  This commit is a stop-gap solution until we figure out what exactly to
  do about that. Masao-san wrote a patch that included the TLI in the
  string returned by pg_last_xlog_receive/replay_location() (see
  http://archives.postgresql.org/message-id/3f0b79eb1003030603ibd0cbadjebb09fa424930...@mail.gmail.com
  and
  http://archives.postgresql.org/message-id/3f0b79eb1003300214r6cf98c46tc9be5d563ccf4...@mail.gmail.com),
  but it still wasn't clear it did the right thing in corner-cases where
  the TLI changes. Using GetRecoveryTargetTLI() for the tli returned by
  pg_last_receive_location() seems bogus, at least.
 
  Why? The tli of the last WAL record received is always the
  recovery target tli currently.

 True.

 Only in streaming mode. If you use the current TLI as I have suggested
 then it will be correct in more cases.

pg_last_xlog_receive_location() might also be executed after archive
recovery ends. In that case, using the current tli seems incorrect
because, after recovery, it always differs from the recovery target tli.

 Hmm, currently pg_last_xlog_receive_location() returns the last location
  streamed via streaming replication. Should that be changed so that it
 also advances when a WAL segment is restored from archive? It seems
 strange that pg_last_xlog_receive_location() can be smaller than
 pg_last_xlog_replay_location().

 Yes, it should be changed.

Should it advance when a WAL file in pg_xlog is read? If not,
pg_last_xlog_receive_location() can still be smaller than
pg_last_xlog_replay_location().

And how far should it advance when a WAL file is only
partially filled for some reason?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Dave Page
On Thu, Apr 8, 2010 at 6:11 AM, Greg Stark st...@mit.edu wrote:
 Because the poster chose to send it to pgsql-admin instead of
 pgsql-general (or pgsql-bugs) very few of the usual suspects had a
 chance to see it. 7 days later a question about a rather serious
 database corruption problem had no responses. I've never understood
 what the point of pgsql-admin is;  just about every question posted is
 an admin question of some sort.

I can't argue with that... but a counter argument is that merging
lists would significantly increase the traffic on -general, which may
not be appreciated by the many people who are only subscribed to one
or two of the affected lists. I would wager that the majority of
people aren't subscribed to more than a small number of the available
lists.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com



[HACKERS] Re: [COMMITTERS] pgsql: Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset()

2010-04-08 Thread Simon Riggs
On Thu, 2010-04-08 at 16:41 +0900, Fujii Masao wrote:
   Why? The tli of the last WAL record received is always the
   recovery target tli currently.
 
  True.
 
  Only in streaming mode. If you use the current TLI as I have suggested
  then it will be correct in more cases.
 
 pg_last_xlog_receive_location() might be executed also after archive
 recovery ends. In this case, using the current tli seems not correct
 because it's always different from the recovery target tli after recovery.

Which is why the code I wrote says if (RecoveryInProgress()).

  Hmm, currently pg_last_xlog_receive_location() returns the last location
   streamed via streaming replication. Should that be changed so that it
  also advances when a WAL segment is restored from archive? It seems
  strange that pg_last_xlog_receive_location() can be smaller than
  pg_last_xlog_replay_location().
 
  Yes, it should be changed.
 
 Should it advance when WAL file in pg_xlog is read? If not,
 pg_last_xlog_receive_location() can be smaller than
 pg_last_xlog_replay_location().
 
 And, how far should it advance when WAL file is
 partially-filled for some reasons?

Just make pg_last_xlog_receive_location() do exactly the same thing as
pg_last_xlog_replay_location() when working with files. No need to try
to keep two things exactly in sync.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Dave Page
On Thu, Apr 8, 2010 at 8:46 AM, Dave Page dp...@pgadmin.org wrote:

 I can't argue with that... but a counter argument is ...

Yes, I know. Clearly it's coffee time :-p



-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com



Re: [HACKERS] walreceiver is uninterruptible on win32

2010-04-08 Thread Fujii Masao
On Wed, Apr 7, 2010 at 1:45 AM, Magnus Hagander mag...@hagander.net wrote:
 No, I don't mean that. I mean store it in one place, and copy/link it
 into where it's used. Look at for example how crypt.c and
 getaddrinfo.c are handled in libpq.

Thanks for the advice!

 Not sure how that will play with PGXS, though, but I'm not entirely
 sure we care if it can be built that way?

Probably yes.

 If it does, there should be
 some way to get PGXS to execute that rule as well, I'm sure.

If we can copy/link the source file defining the new PQexec when
we compile dblink, a DLL doesn't seem to be required. So I'll
stop creating a new DLL for PGXS.

 Also note that per Tom's comments this is not a win32 only fix, so it
 shouldn't be called pgwin32_*().

Yep.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Magnus Hagander
On Thu, Apr 8, 2010 at 9:46 AM, Dave Page dp...@pgadmin.org wrote:
 On Thu, Apr 8, 2010 at 6:11 AM, Greg Stark st...@mit.edu wrote:
 Because the poster chose to send it to pgsql-admin instead of
 pgsql-general (or pgsql-bugs) very few of the usual suspects had a
 chance to see it. 7 days later a question about a rather serious
 database corruption problem had no responses. I've never understood
 what the point of pgsql-admin is;  just about every question posted is
 an admin question of some sort.

 I can't argue with that... but a counter argument is that merging
 lists would significantly increase the traffic on -general, which may
 not be appreciated by the many people who are only subscribed to one
 or two of the affected lists. I would wager that the majority of
 people aren't subscribed to more than a small number of the available
 lists.

That's actually something we can easily find out, if we can get a list
of the subscribers' emails into a Real Database. I know this bunch of
database geeks who write strange ess-cue-ell kweriis, or whatever
they call it, to make such analysis...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Oddly indented raw_expression_tree_walker

2010-04-08 Thread Tom Lane
Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp writes:
 I found raw_expression_tree_walker() is oddly indented in 8.4 and HEAD.
 I expected pgindent would fix that clutter, but it could not.
 Should we clean it up manually, or leave it as-is?

There is exactly zero point in a manual cleanup, because pgindent will
just do it again next time.  If you want to try to fix pgindent,
though, have at it.

regards, tom lane



Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint

2010-04-08 Thread Simon Riggs
On Tue, 2010-04-06 at 10:22 +0100, Simon Riggs wrote:

 Initial patch. I will be testing over next day. No commit before at
 least midday on Wed 7 Apr.

Various previous discussions sidelined a very important point: what
exactly does it mean to start recovery from a shutdown checkpoint?

If standby_mode is enabled and there is no source of WAL, then we get a
stream of messages saying

LOG:  record with zero length at 0/C88
...

but most importantly we never get to the main recovery loop, so Hot
Standby never gets to start at all. We can't keep retrying the request
for WAL and at the same time enter the retry loop, executing lots of
things that expect non-NULL pointers using a NULL xlog pointer.

What we are asking for here is a completely new state: the database is
not in recovery - by definition there is nothing at all to recover. 

The following patch adds Snapshot Mode, a very simple variation on the
existing code - emphasis on the simple:

LOG:  entering snapshot mode
LOG:  record with zero length at 0/C88
LOG:  consistent recovery state reached at 0/C88
LOG:  database system is ready to accept read only connections

This mode does *not* continually check to see if new WAL files have been
added. Startup just sits and waits, with backends allowed. If a trigger file
is specified, then we can leave recovery. Otherwise the Startup process just
sits doing nothing.

There's possibly an argument for inventing some more special modes where
we do allow read only connections but don't start the bgwriter. I don't
personally wish to do this at this stage of the release cycle. The
attached patch is non-invasive and safe and I want to leave it at that.

I will be committing later today, unless major objections, but I ask you
to read the patch before you sharpen your pen. It's simple.

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e2566a4..365cd17 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1719,6 +1719,88 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
 }
 
 /*
+ * StandbyRecoverPreparedTransactions
+ *
+ * Scan the pg_twophase directory and setup all the required information to
+ * allow standby queries to treat prepared transactions as still active.
+ * This is never called at the end of recovery - we use
+ * RecoverPreparedTransactions() at that point.
+ *
+ * Currently we simply call SubTransSetParent() for any subxids of prepared
+ * transactions.
+ */
+void
+StandbyRecoverPreparedTransactions(bool can_overwrite)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(TWOPHASE_DIR);
+	while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == 8 &&
+			strspn(clde->d_name, "0123456789ABCDEF") == 8)
+		{
+			TransactionId xid;
+			char	   *buf;
+			TwoPhaseFileHeader *hdr;
+			TransactionId *subxids;
+			int			i;
+
+			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
+
+			/* Already processed? */
+			if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
+			{
+				ereport(WARNING,
+						(errmsg("removing stale two-phase state file \"%s\"",
+								clde->d_name)));
+				RemoveTwoPhaseFile(xid, true);
+				continue;
+			}
+
+			/* Read and validate file */
+			buf = ReadTwoPhaseFile(xid, true);
+			if (buf == NULL)
+			{
+				ereport(WARNING,
+						(errmsg("removing corrupt two-phase state file \"%s\"",
+								clde->d_name)));
+				RemoveTwoPhaseFile(xid, true);
+				continue;
+			}
+
+			/* Deconstruct header */
+			hdr = (TwoPhaseFileHeader *) buf;
+			if (!TransactionIdEquals(hdr->xid, xid))
+			{
+				ereport(WARNING,
+						(errmsg("removing corrupt two-phase state file \"%s\"",
+								clde->d_name)));
+				RemoveTwoPhaseFile(xid, true);
+				pfree(buf);
+				continue;
+			}
+
+			/*
+			 * Examine subtransaction XIDs ... they should all follow main
+			 * XID, and they may force us to advance nextXid.
+			 */
+			subxids = (TransactionId *)
+				(buf + MAXALIGN(sizeof(TwoPhaseFileHeader)));
+			for (i = 0; i < hdr->nsubxacts; i++)
+			{
+				TransactionId subxid = subxids[i];
+
+				Assert(TransactionIdFollows(subxid, xid));
+				SubTransSetParent(xid, subxid, can_overwrite);
+			}
+		}
+	}
+	FreeDir(cldir);
+}
+
+/*
  * RecoverPreparedTransactions
  *
  * Scan the pg_twophase directory and reload shared-memory state for each
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3000ab7..d26b369 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -180,6 +180,7 @@ static TimestampTz recoveryLastXTime = 0;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyMode = false;
+static bool SnapshotMode = false;
 static char *PrimaryConnInfo = NULL;
 char	   *TriggerFile = NULL;
 
@@ -3845,7 +3846,7 @@ next_record_is_invalid:
 	}
 
 	/* In standby-mode, keep trying */
-	if (StandbyMode)
+	if 

Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint

2010-04-08 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Tue, 2010-04-06 at 10:22 +0100, Simon Riggs wrote:
 
 Initial patch. I will be testing over next day. No commit before at
 least midday on Wed 7 Apr.
 
 Various previous discussions sidelined a very important point: what
 exactly does it mean to start recovery from a shutdown checkpoint?

Hot standby should be possible as soon we know that the database is
consistent. That is, as soon as we've replayed WAL past the
minRecoveryPoint/backupStartPoint point indicated in pg_control.

 If standby_mode is enabled and there is no source of WAL, then we get a
 stream of messages saying
 
 LOG:  record with zero length at 0/C88
 ...
 
 but most importantly we never get to the main recovery loop, so Hot
 Standby never gets to start at all. We can't keep retrying the request
 for WAL and at the same time enter the retry loop, executing lots of
 things that expect non-NULL pointers using a NULL xlog pointer.

You mean it can't find even the checkpoint record to start replaying? I
think the behavior in that scenario is fine as it is. The database isn't
consistent (or at least we can't know if it is, because we don't know
the redo pointer) until you read and replay the first checkpoint record.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 3:46 AM, Dave Page dp...@pgadmin.org wrote:
 On Thu, Apr 8, 2010 at 6:11 AM, Greg Stark st...@mit.edu wrote:
 Because the poster chose to send it to pgsql-admin instead of
 pgsql-general (or pgsql-bugs) very few of the usual suspects had a
 chance to see it. 7 days later a question about a rather serious
 database corruption problem had no responses. I've never understood
 what the point of pgsql-admin is;  just about every question posted is
 an admin question of some sort.

 I can't argue with that... but a counter argument is that merging
 lists would significantly increase the traffic on -general, which may
 not be appreciated by the many people who are only subscribed to one
 or two of the affected lists. I would wager that the majority of
 people aren't subscribed to more than a small number of the available
 lists.

Yeah.  I read -performance, -hackers, -bugs, but not -sql, -admin,
-general.  Consolidating multiple mailing lists to increase viewership
of certain messages is only going to work if everyone who now follows
each of the smaller mailing lists does an equally good job following
the bigger one.  That doesn't seem like a safe assumption.

I might be able to buy an argument that -admin is too fuzzy to be
readily distinguishable, although I don't really know since I don't
read it.  But -performance seems to have a fairly well-defined charter
and it's a subset of messages I enjoy reading.  Of course if some
performance questions get posted elsewhere, yeah, I'll miss them, but
oh well: reading every message on every topic hasn't seemed like a
good way to address that problem.

...Robert



Re: [HACKERS] Remaining Streaming Replication Open Items

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 2:54 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Apr 8, 2010 at 10:41 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 7, 2010 at 8:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 OK, that looks a lot less risky than I had understood from discussions.
 The main thing for me is it doesn't interfere with Startup or
 WalReceiver, so assuming it works I've got no objections. Thanks for
 chasing this down, good addition.

 Thanks.  Committed.

 Thanks. The following TODO item should be removed?

 Redefine smart shutdown in standby mode to exit as soon as all
 read-only connections are gone.
 http://wiki.postgresql.org/wiki/Todo#Standby_server_mode

 Or change it to something like?

 Change smart shutdown in standby mode so that it kills the startup
  and walreceiver process before waiting for the regular backends to die off

Yeah, we should do one of those two things, but I don't much care which.

...Robert



Re: [HACKERS] [pgadmin-hackers] Feature request: limited deletions

2010-04-08 Thread Thom Brown
On 8 April 2010 11:55, Ian Barwick barw...@gmail.com wrote:

 2010/4/8 Thom Brown thombr...@gmail.com:
  I couldn't find any discussion on this, but the request is quite
  straightforward.  Implement a LIMIT on DELETE statements like SELECT
  statements.
 
  So you could write:
 
  DELETE FROM massive_table WHERE id < 4000 LIMIT 1;
 
  This would allow deletions in smaller batches rather than waiting
  potentially hours for the server to mark all those rows as deleted and
  commit it as one massive transaction.

 Is this a PgAdmin-specific question? If it is, apologies I am missing
 the context.

 If not, this is totally the wrong list, but why not use a subquery to
 control what is deleted?


 Ian Barwick


Erm... my mistake, I thought this was on the generic hackers list.  Moving
it over in this reply.

Thom


Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint

2010-04-08 Thread Simon Riggs
On Thu, 2010-04-08 at 13:33 +0300, Heikki Linnakangas wrote:

  If standby_mode is enabled and there is no source of WAL, then we get a
  stream of messages saying
  
  LOG:  record with zero length at 0/C88
  ...
  
  but most importantly we never get to the main recovery loop, so Hot
  Standby never gets to start at all. We can't keep retrying the request
  for WAL and at the same time enter the retry loop, executing lots of
  things that expect non-NULL pointers using a NULL xlog pointer.
 
 You mean it can't find even the checkpoint record to start replaying? 

Clearly I don't mean that; otherwise it wouldn't be starting from a
shutdown checkpoint. I think you are misunderstanding me.

Let me explain in more detail, though please also read the patch before
replying.

The patch I submitted at top of this thread works for allowing Hot
Standby during recovery. Yes, of course that occurs when the database is
consistent. The trick is to get recovery to the point where it can be
enabled. The second patch on this thread presents a way to get the
database to that point; it touches some of the other recovery code that
you and Masao have worked on. We *must* touch that code if we are to
enable Hot Standby in the way you desire.

In StartupXLOG(), when we get to the point where we "Find the first
record that logically follows the checkpoint", in the current code
ReadRecord() loops forever, spitting out
LOG: record with zero length at 0/C88
...

That prevents us from going further down StartupXLOG() to the point
where we start the InRedo loop and hence start hot standby. As long as
we retry we cannot progress further: this is the main problem.

So in the patch, I have modified the retry test in ReadRecord() so it no
longer retries iff there is no WAL source defined. Now, when
ReadRecord() exits, record == NULL at that point and so we do not (and
cannot) enter the redo loop.

So I have introduced the new mode (snapshot mode) to enter hot standby
anyway. That avoids us having to screw around with the loop logic for
redo. I don't see any need to support the case of where we have no WAL
source defined, yet we want Hot Standby but we also want to allow
somebody to drop a WAL file into pg_xlog at some future point. That has
no use case of value AFAICS and is too complex to add at this stage of
the release cycle.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Remaining Streaming Replication Open Items

2010-04-08 Thread Simon Riggs
On Thu, 2010-04-08 at 06:58 -0400, Robert Haas wrote:

  Thanks.  Committed.
 
  Thanks. The following TODO item should be removed?
 
  Redefine smart shutdown in standby mode to exit as soon as all
  read-only connections are gone.
  http://wiki.postgresql.org/wiki/Todo#Standby_server_mode

  Or change it to something like?
 
  Change smart shutdown in standby mode so that it kills the startup
   and walreceiver process before waiting for the regular backends to die off
 
 Yeah, we should do one of those two things, but I don't much care which.

I do. I see no reason to do the latter, ever, so should not be added to
any TODO.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [pgadmin-hackers] Feature request: limited deletions

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 7:05 AM, Thom Brown thombr...@gmail.com wrote:
 On 8 April 2010 11:55, Ian Barwick barw...@gmail.com wrote:

 2010/4/8 Thom Brown thombr...@gmail.com:
  I couldn't find any discussion on this, but the request is quite
  straightforward.  Implement a LIMIT on DELETE statements like SELECT
  statements.
 
  So you could write:
 
   DELETE FROM massive_table WHERE id < 4000 LIMIT 1;
 
  This would allow deletions in smaller batches rather than waiting
  potentially hours for the server to mark all those rows as deleted and
  commit it as one massive transaction.

 Is this a PgAdmin-specific question? If it is, apologies I am missing
 the context.

 If not, this is totally the wrong list, but why not use a subquery to
 control what is deleted?

 Erm... my mistake, I thought this was on the generic hackers list.  Moving
 it over in this reply.

I've certainly worked around the lack of this syntax more than once.
And I bet it's not even that hard to implement.
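For reference, the subquery workaround Ian alludes to is commonly written with ctid, since DELETE ... LIMIT isn't accepted (table and column names follow the example above; the batch size is illustrative):

```sql
-- Delete at most 10000 matching rows per statement; rerun until
-- zero rows are affected.  The ARRAY form typically lets the
-- planner use a TID scan on the outer delete.
DELETE FROM massive_table
WHERE ctid = ANY (ARRAY(
    SELECT ctid FROM massive_table
    WHERE id < 4000
    LIMIT 10000
));
```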

...Robert



Re: [HACKERS] Remaining Streaming Replication Open Items

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 7:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-04-08 at 06:58 -0400, Robert Haas wrote:

  Thanks.  Committed.
 
  Thanks. The following TODO item should be removed?
 
  Redefine smart shutdown in standby mode to exit as soon as all
  read-only connections are gone.
  http://wiki.postgresql.org/wiki/Todo#Standby_server_mode

  Or change it to something like?
 
  Change smart shutdown in standby mode so that it kills the startup
   and walreceiver process before waiting for the regular backends to die off

 Yeah, we should do one of those two things, but I don't much care which.

 I do. I see no reason to do the latter, ever, so should not be added to
 any TODO.

Well, stopping recovery earlier would mean fewer locks, which would
mean a better chance for the read-only backends to finish their work
and exit quickly.  But I'm not sure how much it's worth worrying
about.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining Streaming Replication Open Items

2010-04-08 Thread Simon Riggs
On Thu, 2010-04-08 at 07:53 -0400, Robert Haas wrote:

  I do. I see no reason to do the latter, ever, so should not be added to
  any TODO.
 
 Well, stopping recovery earlier would mean fewer locks, which would
 mean a better chance for the read-only backends to finish their work
 and exit quickly.  But I'm not sure how much it's worth worrying
 about.

The purpose of the lock is to prevent access to objects when they are in
inappropriate states for access. If we stopped startup and allowed
access, how do we know that things are in sufficiently good state to
allow access? We don't. If the Startup process is holding a lock then
that is the only safe thing to do. Otherwise we might allow access to a
table with a partially built index or other screw ups.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining Streaming Replication Open Items

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 8:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-04-08 at 07:53 -0400, Robert Haas wrote:

  I do. I see no reason to do the latter, ever, so should not be added to
  any TODO.

 Well, stopping recovery earlier would mean fewer locks, which would
 mean a better chance for the read-only backends to finish their work
 and exit quickly.  But I'm not sure how much it's worth worrying
 about.

 The purpose of the lock is to prevent access to objects when they are in
 inappropriate states for access. If we stopped startup and allowed
 access, how do we know that things are in sufficiently good state to
 allow access? We don't. If the Startup process is holding a lock then
 that is the only safe thing to do. Otherwise we might allow access to a
 table with a partially built index or other screw ups.

Hmm.  Good point.  I guess you could really only stop the startup
process safely when it wasn't holding any locks anyhow - you couldn't
just kill it and have it release the locks.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 6:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
 If standby_mode is enabled and there is no source of WAL, then we get a
 stream of messages saying

 LOG:  record with zero length at 0/C88
 ...

 but most importantly we never get to the main recovery loop, so Hot
 Standby never gets to start at all. We can't keep retrying the request
 for WAL and at the same time enter the retry loop, executing lots of
 things that expect non-NULL pointers using a NULL xlog pointer.

This is pretty much a corner case, so I don't think it's a good idea
to add a new mode to handle it.  It also seems like it would be pretty
inconsistent if we allow WAL to be dropped in pg_xlog, but only if we
are also doing archive recovery or streaming replication.  If we can't
support this case with the same code path we use otherwise, I think we
should revert to disallowing it.

Having said that, I guess I don't understand how having a source of
WAL solves the problem described above.  Do we always have to read at
least 1 byte of WAL from either SR or the archive before starting up?
If not, why do we need to do so here?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system table/view and sequence

2010-04-08 Thread Olivier Baheux
On 7 avr, 17:44, nicolas.barb...@gmail.com (Nicolas Barbier) wrote:
 2010/4/7 Olivier Baheux olivierbah...@gmail.com:

  i'm trying to find where are stored sequence definition
  (increment,minvalue,maxvalue,start,cache) in system tables. Atm I
  found everything exept sequence.

 It's in the sequence itself (which can be accessed like a table). The
 fact that this table is in fact a sequence is stored in pg_class:

 8<---------------------------------------------------------------
 itsme=# CREATE TABLE a (i serial);
 NOTICE:  CREATE TABLE will create implicit sequence "a_i_seq" for
 serial column "a.i"
 CREATE TABLE
 itsme=# SELECT * FROM a_i_seq;
  sequence_name | last_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called
 ---------------+------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
  a_i_seq       |          1 |            1 | 9223372036854775807 |         1 |           1 |       1 | f         | f
 (1 row)

 itsme=# SELECT relkind FROM pg_class WHERE relname = 'a_i_seq';
  relkind
 ---------
  S
 (1 row)
 8<---------------------------------------------------------------

 Nicolas

 --
 Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org)
 To make changes to your 
 subscription:http://www.postgresql.org/mailpref/pgsql-hackers

it works, thanks so much

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [pgadmin-hackers] Feature request: limited deletions

2010-04-08 Thread Csaba Nagy
Hi all,

On Thu, 2010-04-08 at 07:45 -0400, Robert Haas wrote:
  2010/4/8 Thom Brown thombr...@gmail.com:
   So you could write:
  
   DELETE FROM massive_table WHERE id < 4000 LIMIT 1;

 I've certainly worked around the lack of this syntax more than once.
 And I bet it's not even that hard to implement.

The fact that it's not implemented has nothing to do with its
complexity (in fact it is probably just a matter of enabling it) -
you'll have a hard time convincing some old-time hackers on this list
that the non-determinism inherent in this kind of query is
acceptable ;-)

There is a workaround, which in fact works quite well:

delete from massive_table where ctid = any(array(select ctid from
massive_table WHERE id < 4000 LIMIT 1));

Just run an EXPLAIN on it and you'll see it won't get any better, but
beware that it might be less optimal than you think: you will likely be
sequentially scanning the table for each chunk unless you put some
selective WHERE conditions on it too - and even then you'll still scan
the whole deleted part and not just the next chunk. The deleted records
won't get out of the way magically; you need to vacuum, and that's
probably a problem too on a big table. So on a massive table it will
most likely help you less than you think, and the run time per chunk
will increase with each chunk unless you're able to vacuum efficiently.
In any case you need to balance the chunk size against the scanned
portion of the table so that you get a reasonable run time per chunk
without too much overhead from the whole chunking process...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining Streaming Replication Open Items

2010-04-08 Thread Simon Riggs
On Thu, 2010-04-08 at 09:40 -0400, Robert Haas wrote:
 On Thu, Apr 8, 2010 at 8:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On Thu, 2010-04-08 at 07:53 -0400, Robert Haas wrote:
 
   I do. I see no reason to do the latter, ever, so should not be added to
   any TODO.
 
  Well, stopping recovery earlier would mean fewer locks, which would
  mean a better chance for the read-only backends to finish their work
  and exit quickly.  But I'm not sure how much it's worth worrying
  about.
 
  The purpose of the lock is to prevent access to objects when they are in
  inappropriate states for access. If we stopped startup and allowed
  access, how do we know that things are in sufficiently good state to
  allow access? We don't. If the Startup process is holding a lock then
  that is the only safe thing to do. Otherwise we might allow access to a
  table with a partially built index or other screw ups.
 
 Hmm.  Good point.  I guess you could really only stop the startup
 process safely when it wasn't holding any locks anyhow - you couldn't
 just kill it and have it release the locks.

... and if it isn't holding any locks at all, there is no reason to kill
Startup first => no TODO item.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining Streaming Replication Open Items

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 9:56 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-04-08 at 09:40 -0400, Robert Haas wrote:
 On Thu, Apr 8, 2010 at 8:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On Thu, 2010-04-08 at 07:53 -0400, Robert Haas wrote:
 
   I do. I see no reason to do the latter, ever, so should not be added to
   any TODO.
 
  Well, stopping recovery earlier would mean fewer locks, which would
  mean a better chance for the read-only backends to finish their work
  and exit quickly.  But I'm not sure how much it's worth worrying
  about.
 
  The purpose of the lock is to prevent access to objects when they are in
  inappropriate states for access. If we stopped startup and allowed
  access, how do we know that things are in sufficiently good state to
  allow access? We don't. If the Startup process is holding a lock then
  that is the only safe thing to do. Otherwise we might allow access to a
  table with a partially built index or other screw ups.

 Hmm.  Good point.  I guess you could really only stop the startup
 process safely when it wasn't holding any locks anyhow - you couldn't
 just kill it and have it release the locks.

 ... and if it isn't holding any locks at all, there is no reason to kill
 Startup first => no TODO item.

I think you could shut it down at the first point at which it is
holding no locks, rather than letting it continue recovering and
potentially retake some new locks.  That would be more consistent with
the general idea of what a smart shutdown is supposed to be about.  I
think the real question is whether it's worth the code complexity.  I
suspect most people use fast shutdown most of the time anyway in
real-world applications.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unsafe threading in syslogger on Windows

2010-04-08 Thread Heikki Linnakangas
On Windows, syslogger uses two threads. The main thread loops and polls
if any SIGHUPs have been received or if the log file needs to be
rotated. Another thread, pipe thread, does ReadFile() on the pipe that
other processes send their log messages to. ReadFile() blocks, and
whenever new data arrives, it is processed in the pipe thread.

Both threads use palloc()/pfree(), which are not thread-safe :-(.

It's hard to trigger a crash because the main thread mostly just sleeps,
and the pipe thread only uses palloc()/pfree() when it receives chunked
messages, larger than 512 bytes. Browsing the CVS history, this was made
visibly broken by the patch that introduced the message chunking. Before
that the pipe thread just read from the pipe and wrote to the log file,
which was safe. It has always used ereport() to report read errors,
though, which can do palloc(), but there shouldn't normally be any read
errors.

I chatted with Magnus about this, and he suggested using a Windows
critical section to make sure that only one of the threads is active at
a time. That seems suitable for back-porting, but I'd like to get rid of
this threading in CVS head, it seems too error-prone.

The reason it uses threads like this on Windows is explained in the
comments:
 /*
  * Worker thread to transfer data from the pipe to the current logfile.
  *
  * We need this because on Windows, WaitForSingleObject does not work on
  * unnamed pipes: it always reports signaled, so the blocking ReadFile won't
  * allow for SIGHUP; and select is for sockets only.
  */

But Magnus pointed out that our pgpipe() implementation on Windows
actually creates a pair of sockets instead of pipes, for exactly that
reason, so that you can use select() on the returned file descriptor.
For some reason syslogger explicitly doesn't use pgpipe() on Windows,
though, but calls CreatePipe(). I don't see any explanation why.

I'm going to see what happens if I remove all the #ifdef WIN32 blocks in
syslogger, and let it use pgpipe() and select() instead of the extra thread.

Thoughts?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint

2010-04-08 Thread Heikki Linnakangas
Simon Riggs wrote:
 In StartupXlog() when we get to the point where we Find the first
 record that logically follows the checkpoint, in the current code
 ReadRecord() loops forever, spitting out
 LOG: record with zero length at 0/C88
 ...
 
 That prevents us from going further down StartupXLog() to the point
 where we start the InRedo loop and hence start hot standby. As long as
 we retry we cannot progress further: this is the main problem.
 
 So in the patch, I have modified the retry test in ReadRecord() so it no
 longer retries iff there is no WAL source defined. Now, when
 ReadRecord() exits, record == NULL at that point and so we do not (and
 cannot) enter the redo loop.

Oh, I see.

 So I have introduced the new mode (snapshot mode) to enter hot standby
 anyway. That avoids us having to screw around with the loop logic for
 redo. I don't see any need to support the case of where we have no WAL
 source defined, yet we want Hot Standby but we also want to allow
 somebody to drop a WAL file into pg_xlog at some future point. That has
 no use case of value AFAICS and is too complex to add at this stage of
 the release cycle.

You don't need a new mode for that. Just do the same "are we consistent
now?" check you do in the loop once before calling ReadRecord to fetch
the record that follows the checkpoint pointer. Attached is a patch to
show what I mean. We just need to let postmaster know that recovery has
started a bit earlier, right after processing the checkpoint record, not
delaying it until we've read the first record after it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 1719,1724  PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
--- 1719,1806 
  }
  
  /*
+  * StandbyRecoverPreparedTransactions
+  *
+  * Scan the pg_twophase directory and setup all the required information to
+  * allow standby queries to treat prepared transactions as still active.
+  * This is never called at the end of recovery - we use
+  * RecoverPreparedTransactions() at that point.
+  *
+  * Currently we simply call SubTransSetParent() for any subxids of prepared
+  * transactions.
+  */
+ void
+ StandbyRecoverPreparedTransactions(bool can_overwrite)
+ {
+ 	DIR		   *cldir;
+ 	struct dirent *clde;
+ 
+ 	cldir = AllocateDir(TWOPHASE_DIR);
+ 	while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
+ 	{
+ 		if (strlen(clde-d_name) == 8 
+ 			strspn(clde-d_name, 0123456789ABCDEF) == 8)
+ 		{
+ 			TransactionId xid;
+ 			char	   *buf;
+ 			TwoPhaseFileHeader *hdr;
+ 			TransactionId *subxids;
+ 			int			i;
+ 
+ 			xid = (TransactionId) strtoul(clde-d_name, NULL, 16);
+ 
+ 			/* Already processed? */
+ 			if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
+ 			{
+ ereport(WARNING,
+ 		(errmsg(removing stale two-phase state file \%s\,
+ clde-d_name)));
+ RemoveTwoPhaseFile(xid, true);
+ continue;
+ 			}
+ 
+ 			/* Read and validate file */
+ 			buf = ReadTwoPhaseFile(xid, true);
+ 			if (buf == NULL)
+ 			{
+ ereport(WARNING,
+ 	  (errmsg(removing corrupt two-phase state file \%s\,
+ 			  clde-d_name)));
+ RemoveTwoPhaseFile(xid, true);
+ continue;
+ 			}
+ 
+ 			/* Deconstruct header */
+ 			hdr = (TwoPhaseFileHeader *) buf;
+ 			if (!TransactionIdEquals(hdr-xid, xid))
+ 			{
+ ereport(WARNING,
+ 	  (errmsg(removing corrupt two-phase state file \%s\,
+ 			  clde-d_name)));
+ RemoveTwoPhaseFile(xid, true);
+ pfree(buf);
+ continue;
+ 			}
+ 
+ 			/*
+ 			 * Examine subtransaction XIDs ... they should all follow main
+ 			 * XID, and they may force us to advance nextXid.
+ 			 */
+ 			subxids = (TransactionId *)
+ (buf + MAXALIGN(sizeof(TwoPhaseFileHeader)));
+ 			for (i = 0; i  hdr-nsubxacts; i++)
+ 			{
+ TransactionId subxid = subxids[i];
+ 
+ Assert(TransactionIdFollows(subxid, xid));
+ SubTransSetParent(xid, subxid, can_overwrite);
+ 			}
+ 		}
+ 	}
+ 	FreeDir(cldir);
+ }
+ 
+ /*
   * RecoverPreparedTransactions
   *
   * Scan the pg_twophase directory and reload shared-memory state for each
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 494,499  static XLogRecPtr minRecoveryPoint;		/* local copy of
--- 494,501 
  		 * ControlFile->minRecoveryPoint */
  static bool updateMinRecoveryPoint = true;
  
+ static bool reachedMinRecoveryPoint = false;
+ 
  static bool InRedo = false;
  
  /*
***
*** 547,552  static void ValidateXLOGDirectoryStructure(void);
--- 549,555 
  static void CleanupBackupHistory(void);
  static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
  static XLogRecord *ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt);
+ static void CheckRecoveryConsistency(void);
  static bool ValidXLOGHeader(XLogPageHeader hdr, int emode);
  static XLogRecord 

Re: [HACKERS] Unsafe threading in syslogger on Windows

2010-04-08 Thread Andrew Dunstan



Heikki Linnakangas wrote:

On Windows, syslogger uses two threads. The main thread loops and polls
if any SIGHUPs have been received or if the log file needs to be
rotated. Another thread, pipe thread, does ReadFile() on the pipe that
other processes send their log messages to. ReadFile() blocks, and
whenever new data arrives, it is processed in the pipe thread.

Both threads use palloc()/pfree(), which are not thread-safe :-(.

It's hard to trigger a crash because the main thread mostly just sleeps,
and the pipe thread only uses palloc()/pfree() when it receives chunked
messages, larger than 512 bytes. Browsing the CVS history, this was made
visibly broken by the patch that introduced the message chunking. Before
that the pipe thread just read from the pipe and wrote to the log file,
which was safe. It has always used ereport() to report read errors,
though, which can do palloc(), but there shouldn't normally be any read
errors.

I chatted with Magnus about this, and he suggested using a Windows
critical section to make sure that only one of the threads is active at
a time. That seems suitable for back-porting, but I'd like to get rid of
this threading in CVS head, it seems too error-prone.

The reason it uses threads like this on Windows is explained in the
comments:

/*
 * Worker thread to transfer data from the pipe to the current logfile.
 *
 * We need this because on Windows, WaitForSingleObject does not work on
 * unnamed pipes: it always reports signaled, so the blocking ReadFile won't
 * allow for SIGHUP; and select is for sockets only.
 */



But Magnus pointed out that our pgpipe() implementation on Windows
actually creates a pair of sockets instead of pipes, for exactly that
reason, so that you can use select() on the returned file descriptor.
For some reason syslogger explicitly doesn't use pgpipe() on Windows,
though, but calls CreatePipe(). I don't see any explanation why.

I'm going to see what happens if I remove all the #ifdef WIN32 blocks in
syslogger, and let it use pgpipe() and select() instead of the extra thread.

Sounds reasonable. Let's see how big the changes are on HEAD. I'm not 
sure it's worth creating a different smaller fix for the back branches.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint

2010-04-08 Thread Simon Riggs
On Thu, 2010-04-08 at 18:35 +0300, Heikki Linnakangas wrote:
 
  So I have introduced the new mode (snapshot mode) to enter hot
 standby
  anyway. That avoids us having to screw around with the loop logic
 for
  redo. I don't see any need to support the case of where we have no
 WAL
  source defined, yet we want Hot Standby but we also want to allow
  somebody to drop a WAL file into pg_xlog at some future point. That
 has
  no use case of value AFAICS and is too complex to add at this stage
 of
  the release cycle.
 
 You don't need a new mode for that. Just do the same "are we
 consistent now?" check you do in the loop once before calling
 ReadRecord to fetch the record that follows the checkpoint pointer.
 Attached is a patch to show what I mean. We just need to let
 postmaster know that recovery has started a bit earlier, right after
 processing the checkpoint record, not delaying it until we've read the
 first record after it.

OK, that seems better. I'm happy with that instead.

Have you tested this? Is it ready to commit?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint

2010-04-08 Thread Heikki Linnakangas
Simon Riggs wrote:
 OK, that seems better. I'm happy with that instead.
 
 Have you tested this? Is it ready to commit?

Only very briefly. I think the code is ready, but please review and test
to see I didn't miss anything.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: 
 Dave Page dp...@pgadmin.org wrote:
 Greg Stark st...@mit.edu wrote:
 
 Because the poster chose to send it to pgsql-admin instead of
 pgsql-general (or pgsql-bugs) very few of the usual suspects had
 a chance to see it. 7 days later a question about a rather
 serious database corruption problem had no responses.
 
I do monitor that list, and try to respond to those issues I can,
but had no clue what that message was about -- so I left it for
someone else to take up.  I often see Tom responding to posts on
that list, so I kinda figure anything serious (or where I get it
wrong) will be addressed by him, but this thread makes me wonder
whether we should advise people not to post there when there is any
indication of possible corruption or bugs.
 
 I've never understood what the point of pgsql-admin is; just
 about every question posted is an admin question of some sort.
 
I think you just answered your own question.  I've considered it to
be a list for DBAs (or those filling that role, regardless of title)
to discuss administrative and operational issues and best
practices.  That seems useful to me.
 
 a counter argument is that merging lists would significantly
 increase the traffic on -general, which may not be appreciated by
 the many people that are only subscribed to one or two of the
 affected lists. I would wager that the majority of people aren't
 subscribed to more than a small number of the available lists.
 
 Yeah.  I read -performance, -hackers, -bugs, but not -sql, -admin,
 -general.
 
My set is different, but the principle is the same -- I can't find
the time to read all messages to all lists (really, I've tried), so
I limit by list to try to target the issues of most interest to me.
 
 Consolidating multiple mailing lists to increase viewership of
 certain messages is only going to work if everyone who now follows
 each of the smaller mailing lists does an equally good job
 following the bigger one.  That doesn't seem like a safe
 assumption.
 
Agreed.
 
Perhaps further clarifying the charters of the various lists would
help, but folding too much into any one list is likely to reduce the
number of readers or cause spotty attention.  (When I was
attempting to follow all the lists, I'd typically give up when I
fell about 6000 messages behind, and try to start up again cold
after having missed a big interval of messages.)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 Perhaps further clarifying the charters of the various lists would
 help, but folding too much into any one list is likely to reduce the
 number of readers or cause spotty attention.  (When I was
 attempting to follow all the lists, I'd typically give up when I
 fell about 6000 messages behind, and try to start up again cold
 after having missed a big interval of messages.)

I don't quite agree with this, because -general is *already* at the level
where it takes a significant chunk of daily time to keep up with it.
All the other mergeable lists pale in comparison to its volume.
I stopped trying to read lists completely a while ago, and merely read
subject lines at this point, diving into threads that seem interesting
or important.

Merging the smaller lists that have a huge overlap of topics with -general
would thus be a win, as there would be a larger audience to reply to, and
fewer lists to administer and keep track of. It would also reduce the
confusion of "which list should I post this to?"

I think -admin should absolutely be folded in, -sql really should as well, 
and slightly less so -performance and -interfaces.

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201004081214
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAku+AV0ACgkQvJuQZxSWSsgVvgCbBh9vsx2cecfAhZQRmju4Vtyi
zz0An0OjXFGtAtOyTUZFDDWGxrRZltBB
=gDQ+
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint

2010-04-08 Thread Simon Riggs
On Thu, 2010-04-08 at 19:02 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  OK, that seems better. I'm happy with that instead.
  
  Have you tested this? Is it ready to commit?
 
 Only very briefly. I think the code is ready, but please review and test
 to see I didn't miss anything.

I'm going to need you to commit this. I'm on holiday now until 14 April,
so it's not going to get a retest before then otherwise; it's not smart
to commit and then go on holiday, IIRC.

I've reviewed your changes and they look correct to me; the main chunk
of code is mine and that was tested by me.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Josh Berkus
On 4/7/10 10:11 PM, Greg Stark wrote:
 Likewise I don't think we should have pgsql-performance or pgsql-sql
 or pgsql-novice -- any thread appropriate for any of these would be
 better served by sending it to pgsql-general anyways (with the
 exception of pgsql-performance which has a weird combination of hacker
 threads and user performance tuning threads). Sending threads to
 pgsql-general would get more eyes on them and would avoid a lot of the
 cross-posting headaches. What would someone subscribed to one of these
 lists but not pgsql-general get anyways but some random sample of
 threads that might be vaguely performance or admin related. They would
 still miss most of the administration and performance questions and
 discussions which happen on -general and -hackers as appropriate.

(1) Regarding -sql and -performance, I couldn't disagree more.  I agree
that the charter of -admin is extremely vague.

(2) This is *definitely* the wrong list for this discussion; it should
be on -www.

And, no, #2 was not meant to be ironic, even if it is.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set LC_COLLATE to de_DE_phoneb

2010-04-08 Thread Frank Jagusch
Am Donnerstag, den 08.04.2010, 10:27 +0900 schrieb Takahiro Itagaki:
 Frank Jagusch fr...@jagusch-online.de wrote:
 
  de_DE_phoneb is the name of an alternative sort order in German (only a
  few languages have alternate sort orders). You may find some information
  when you search MSDN for de_DE_phoneb, e.g.
  http://msdn.microsoft.com/en-en/library/ms404373.aspx
  This alternate sort order is supported by the OS, but I don't know how it
  is supported in the msvcrt.
 
 Hmmm, I found de-DE_phoneb in MSDN:
 http://msdn.microsoft.com/en-us/library/dd374060
 but setlocale(de-DE_phoneb) always fails at least on my machine.

Is it a Windows box? Maybe you need to install some German language
support?

 The doc says de-DE_phoneb is a locale name for 
 MAKELCID(MAKELANGID(LANG_GERMAN, SUBLANG_GERMAN), SORT_GERMAN_PHONE_BOOK).
 Some of native Win32 APIs could accept the locale and sort-order
 combination, but setlocale() in CRT seems to reject it.
 
 So, you could use the locale if you find a setlocale-compatible name of
 de-DE_phoneb.

As far as I have investigated, I didn't find one. I hoped to find such a
setlocale-compatible name, or a hint about where to search, here...

 Or, you cannot use it, unless we modify PostgreSQL to
 use Win32 locale functions instead of standard libc ones -- but it is
 hardly acceptable.

I thought PostgreSQL was using OS-specific functions for sorting (meaning
Win32 functions?). This does not sound good for my request. So I ask the
whole audience: are others out there asking for support for these
alternate sort orders? Is it worth discussing further in this direction?

Here is another idea: is there a way to define a custom collation for a
database?

Thanks so far,
Frank Jagusch
-- 
http://www.jagusch-online.de/frank


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] autovacuum and temp tables support

2010-04-08 Thread Oleg Bartunov

Hi there,

our client complained about a slow query which involves temporary tables.
Analyzing them manually solved the problem. I don't remember the
arguments against support for temporary tables in autovacuum. I'd
appreciate any pointers.

Also, would it be worth adding an autovacuum_enable_temp_tables variable
to control autovacuum behaviour?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Greg Stark
On Thu, Apr 8, 2010 at 5:09 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 My set is different, but the principle is the same -- I can't find
 the time to read all messages to all lists (really, I've tried), so
 I limit by list to try to target the issues of most interest to me.

But all it means is you get a random subset of the messages. You're
still missing most of the admin or sql or performance related threads
since they're mostly on -general anyways. Those three categories cover
pretty much all of -general.


-- 
greg



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 2:30 PM, Greg Stark st...@mit.edu wrote:
 On Thu, Apr 8, 2010 at 5:09 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 My set is different, but the principle is the same -- I can't find
 the time to read all messages to all lists (really, I've tried), so
 I limit by list to try to target the issues of most interest to me.

 But all it means is you get a random subset of the messages. You're
 still missing most of the admin or sql or performance related threads
 since they're mostly on -general anyways. Those three categories cover
 pretty much all of -general.

Maybe we should remove -general.  :-)

...Robert



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Kevin Grittner
Greg Stark st...@mit.edu wrote:
 
 But all it means is you get a random subset of the messages.
 You're still missing most of the admin or sql or performance
 related threads since they're mostly on -general anyways. Those
 three categories cover pretty much all of -general.
 
Perhaps -general should be eliminated in favor of more specific
lists?
 
-Kevin



Re: [HACKERS] autovacuum and temp tables support

2010-04-08 Thread Alvaro Herrera
Oleg Bartunov wrote:

 our client complained about slow query, which involves temporary tables.
 Analyzing them manually solved the problem. I don't remember
 arguments against temporary tables support by autovacuum. I'd
 appreciate any
 pointers.

Autovacuum can't process temp tables; they could reside in a backend's
private temp buffers (local memory, not shared).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Kevin Grittner
Greg Stark st...@mit.edu wrote:
 
 But all it means is you get a random subset of the messages.
 You're still missing most of the admin or sql or performance
 related threads since they're mostly on -general anyways. Those
 three categories cover pretty much all of -general.
 
Well, one of these more specific lists must be getting over half of
the messages relevant to its title, unless things are freakishly
evenly divided.  Message counts in the last 30 days:
 
 143 -novice
 199 -sql
 321 -admin
 436 -performance

1099  *subtotal*

1102 -general

2201  **total**

 
-Kevin



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Ned Lilly

+1 for the idea, and +1 for the Zork reference.  Hello sailor.

On 4/8/2010 1:11 AM Greg Stark wrote:

I've often said in the past that we have too many mailing lists with
overlapping and vague charters. I submit the following thread as
evidence that this causes real problems.

http://archives.postgresql.org/message-id/g2o4b46b5f01004010610ib8625426uae6ee90ac1435...@mail.gmail.com

Because the poster chose to send it to pgsql-admin instead of
pgsql-general (or pgsql-bugs) very few of the usual suspects had a
chance to see it. Seven days later, a question about a rather serious
database corruption problem had no responses. I've never understood
what the point of pgsql-admin is; just about every question posted is
an admin question of some sort.

Likewise I don't think we should have pgsql-performance or pgsql-sql
or pgsql-novice -- any thread appropriate for any of these would be
better served by sending it to pgsql-general anyways (with the
exception of pgsql-performance which has a weird combination of hacker
threads and user performance tuning threads). Sending threads to
pgsql-general would get more eyes on them and would avoid a lot of the
cross-posting headaches. What would someone subscribed to one of these
lists but not pgsql-general get anyways, but some random sample of
threads that might be vaguely performance- or admin-related? They would
still miss most of the administration and performance questions and
discussions, which happen on -general and -hackers as appropriate.





Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Jaime Casanova
On Thu, Apr 8, 2010 at 2:31 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 8, 2010 at 2:30 PM, Greg Stark st...@mit.edu wrote:
 On Thu, Apr 8, 2010 at 5:09 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 My set is different, but the principle is the same -- I can't find
 the time to read all messages to all lists (really, I've tried), so
 I limit by list to try to target the issues of most interest to me.

 But all it means is you get a random subset of the messages. You're
 still missing most of the admin or sql or performance related threads
 since they're mostly on -general anyways. Those three categories cover
 pretty much all of -general.

 Maybe we should remove -general.  :-)



if we want specific topics, then remove -general, -novice, -admin

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157



Re: [HACKERS] autovacuum and temp tables support

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 2:53 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Oleg Bartunov wrote:

 our client complained about slow query, which involves temporary tables.
 Analyzing them manually solved the problem. I don't remember
 arguments against temporary tables support by autovacuum. I'd
 appreciate any
 pointers.

 Autovacuum can't process temp tables; they could reside in a backend's
 private temp buffers (local memory, not shared).

One general thought I've had is that it would be nice if the first
attempt to SELECT against a table with no statistics would trigger an
automatic ANALYZE by the backend on which the query was executed.
It's pretty common to populate a table using INSERT, CTAS, or COPY
and then immediately run a query against it, and I've often found
that it's necessary to insert manual ANALYZE statements in there
to get decent query plans.
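The manual workaround described above looks roughly like this (table and column names are made up for illustration):

```sql
-- Populate a temp table, then ANALYZE by hand before the first query,
-- since autovacuum cannot see temp tables and the fresh table has no
-- statistics yet.
CREATE TEMP TABLE recent_orders AS
    SELECT * FROM orders WHERE created > now() - interval '1 day';

ANALYZE recent_orders;  -- without this, the planner guesses row counts

SELECT customer_id, count(*)
FROM recent_orders
GROUP BY customer_id;
```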

...Robert



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Joshua D. Drake
On Thu, 2010-04-08 at 15:06 -0400, Jaime Casanova wrote:
 On Thu, Apr 8, 2010 at 2:31 PM, Robert Haas robertmh...@gmail.com wrote:
  On Thu, Apr 8, 2010 at 2:30 PM, Greg Stark st...@mit.edu wrote:
  On Thu, Apr 8, 2010 at 5:09 PM, Kevin Grittner
  kevin.gritt...@wicourts.gov wrote:
  My set is different, but the principle is the same -- I can't find
  the time to read all messages to all lists (really, I've tried), so
  I limit by list to try to target the issues of most interest to me.
 
  But all it means is you get a random subset of the messages. You're
  still missing most of the admin or sql or performance related threads
  since they're mostly on -general anyways. Those three categories cover
  pretty much all of -general.
 
  Maybe we should remove -general.  :-)
 
 
 
 if we want specific topics, then remove -general, -novice, -admin

This will likely never fly, see the archives.

Joshua D. Drake


 
 -- 
 Atentamente,
 Jaime Casanova
 Soporte y capacitación de PostgreSQL
 Asesoría y desarrollo de sistemas
 Guayaquil - Ecuador
 Cel. +59387171157
 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or
Sir.




Re: [HACKERS] autovacuum and temp tables support

2010-04-08 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 8, 2010 at 2:53 PM, Alvaro Herrera
 
 Autovacuum can't process temp tables; they could reside in a
 backend's private temp buffers (local memory, not shared).
 
 it would be nice if the first attempt to SELECT against a table
 with no statistics would trigger an automatic ANALYZE by the
 backend on which the query was executed.
 
+1 as an RFE
 
-Kevin



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Jaime Casanova
On Thu, Apr 8, 2010 at 3:09 PM, Joshua D. Drake j...@commandprompt.com wrote:
 On Thu, 2010-04-08 at 15:06 -0400, Jaime Casanova wrote:
 

 if we want specific topics, then remove -general, -novice, -admin

 This will likely never fly, see the archives.


well, -novice should be easy... actually it has no reason to exist.
After all, what are the rules? That you should subscribe there for your
first 6 months of using Postgres, or until you take a course?

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157



Re: [HACKERS] autovacuum and temp tables support

2010-04-08 Thread Oleg Bartunov

On Thu, 8 Apr 2010, Robert Haas wrote:


On Thu, Apr 8, 2010 at 2:53 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

Oleg Bartunov wrote:


our client complained about slow query, which involves temporary tables.
Analyzing them manually solved the problem. I don't remember
arguments against temporary tables support by autovacuum. I'd
appreciate any
pointers.


Autovacuum can't process temp tables; they could reside in a backend's
private temp buffers (local memory, not shared).


One general thought I've had is that it would be nice if the first
attempt to SELECT against a table with no statistics would trigger an
automatic ANALYZE by the backend on which the query was executed.
It's pretty common to populate a table using INSERT, CTAS, or COPY
and then immediately run a query against it, and I've often found
that it's necessary to insert manual ANALYZE statements in there
to get decent query plans.


Oracle does this. So, is it worth adding support for this (configurable,
like Oracle's optimizer_dynamic_sampling)?


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] autovacuum and temp tables support

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 3:22 PM, Oleg Bartunov o...@sai.msu.su wrote:
 One general thought I've had is that it would be nice if the first
 attempt to SELECT against a table with no statistics would trigger an
 automatic ANALYZE by the backend on which the query was executed.
 It's pretty common to populate a table using INSERT, CTAS, or COPY
 and then immediately run a query against it, and I've often found
 that it's necessary to insert manual ANALYZE statements in there
 to get decent query plans.

 Oracle does this. So, is it worth adding support for this (configurable,
 like Oracle's optimizer_dynamic_sampling)?

Well, dynamic sampling is considerably more complicated than what I
proposed, which is just to force an ordinary ANALYZE before the first
query against the table.  It would be a very powerful feature if we
could use it to ameliorate, for example, the gross statistical errors
that sometimes occur when multiple, correlated filter conditions are
applied to the same base table; but I don't think it's in the direct
path of solving the present complaint.

...Robert



[HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Necati Batur
Benefits of Project

Partitioning refers to splitting what is logically one large table
into smaller physical pieces. Partitioning can provide several
benefits:

Query performance can be improved dramatically in certain situations,
particularly when most of the heavily accessed rows of the table are
in a single partition or a small number of partitions. The
partitioning substitutes for leading columns of indexes, reducing
index size and making it more likely that the heavily-used parts of
the indexes fit in memory.

When queries or updates access a large percentage of a single
partition, performance can be improved by taking advantage of
sequential scan of that partition instead of using an index and random
access reads scattered across the whole table.

Bulk loads and deletes can be accomplished by adding or removing
partitions, if that requirement is planned into the partitioning
design. ALTER TABLE is far faster than a bulk operation. It also
entirely avoids the VACUUM overhead caused by a bulk DELETE.

Seldom-used data can be migrated to cheaper and slower storage media.
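At the time of this thread, PostgreSQL implemented such partitioning manually with table inheritance plus routing triggers; a minimal hand-rolled sketch of that approach (table and function names are illustrative):

```sql
-- Parent table plus one child per month; CHECK constraints enable
-- constraint exclusion, and a trigger routes inserts to the right child.
CREATE TABLE measurement (logdate date NOT NULL, value int);

CREATE TABLE measurement_2010_04 (
    CHECK (logdate >= DATE '2010-04-01' AND logdate < DATE '2010-05-01')
) INHERITS (measurement);

CREATE OR REPLACE FUNCTION measurement_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2010-04-01'
       AND NEW.logdate < DATE '2010-05-01' THEN
        INSERT INTO measurement_2010_04 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- the row is already stored in the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_route BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert();
```

Dropping a month of data then becomes `DROP TABLE measurement_2010_04;`, the fast bulk-delete path mentioned above.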

Deliverables

*The trigger based operations can be done automatically

*The stored procedures can help us to do some functionalities like
check constraint problem

*manual VACUUM or ANALYZE commands can be handled by using triggers
DBMS SQL can help to provide faster executions

*Some more functionalities can be added to UPDATE operations to make
administrations easy

Timeline (not exact, but most probable)

Start at June 7 and end around September 7

*Warm up to the PostgreSQL environment (1-2 weeks)

*Determine the exact operations to be added to PostgreSQL

*Initial coding according to the work breakdown structure

*Start implementing in a distributed environment to check that the
initial functions work

*Write test cases for the code

*Further implementation to support the full functionality of the ideas

*Post it to the discussion site and collect feedback

*More support based on feedback

*Last tests and documentation of the final operations

About me

I am a senior student in computer engineering at IZTECH in Turkey. My
areas of interest are information management, OOP (Object Oriented
Programming), and currently bioinformatics. I have been working with an
Assistant Professor (Jens Allmer) in the molecular biology and genetics
department for one year. First, we worked on a protein database, 2DB,
and we presented the project at the HIBIT09 conference. The project was
“Database management system independence by amending 2DB with a
database access layer”. Currently, I am working on another project
(Kerb) as my senior project, which is a general sequential task
management system intended to reduce errors and save time
in biological experiments. We will present this project at HIBIT2010
too. Moreover, I am good at data structures and implementations in C.


Contact: e-mails; necatiba...@gmail.com , necati_ba...@hotmail.com(msn)



Re: [HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 3:58 PM, Necati Batur necatiba...@gmail.com wrote:
 *The trigger based operations can be done automatically

 *The stored procedures can help us to do some functionalities like
 check constraint problem

 *manual VACUUM or ANALYZE commands can be handled by using triggers
 DBMS SQL can help to provide faster executions

 *Some more functionalities can be added to UPDATE operations to make
 administrations easy

I think you need to be a LOT more specific about each of these items
and what you intend to do about them.  You also need to explain the
relationship between your work and Itagaki Takahiro's existing patch.

...Robert



Re: [HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Necati Batur
Making the items more specific would just amount to describing the
exact job, I guess. However, the detailed form will be hard to write now.
Or should I explain the SQL issues for each point?






Re: [HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 4:14 PM, Necati Batur necatiba...@gmail.com wrote:
 The more specific of the items will just be the exact job I guess.
 However the detailed form will be hard to write now.
 Or should I explain the sql issues for each point?

Partitioning is a big project.  It seems to me that if you want to
have any chance of making a meaningful contribution in one summer,
you're going to need to have a pretty specific idea of what you hope
to accomplish up front, and I don't think you have that right now.
The hard changes are not going to be adjustments to SQL syntax, but in
the guts of the planner, executor, system catalogs, etc.

...Robert



Re: [HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Necati Batur
In order to make a system change, a student needs to be more informed
about and experienced with the issue. Nonetheless, this step of the
work is actually not the phase for determining all the details, I
guess. Otherwise, I would just do a few weeks of coding in the summer
to complete the project, and I would be the person in charge of project
management :)





[HACKERS] some more data corruption

2010-04-08 Thread Alvaro Herrera
Hi,

So I'm still investigating data corruption issues.  They have some
serious TOAST index corruption too.

For example, notice how an indexscan returns invalid chunk_ids here:

select xmin,xmax,ctid,cmin, chunk_id, chunk_seq,length(chunk_data)
from pg_toast.pg_toast_18141
 where chunk_id between 194679500 and 194679700
order by chunk_id;
  xmin   |  xmax   |    ctid     | cmin | chunk_id  | chunk_seq | length
---------+---------+-------------+------+-----------+-----------+--------
 9767631 |       0 | (2855683,1) |   27 |  94929948 |         1 |     26
 9767659 |       0 | (2855683,5) |   28 |  94929949 |         1 |     26
 9767659 |       0 | (2855683,6) |   32 |  94929950 |         1 |     26
 9767659 |       0 | (2855683,7) |   34 |  94929951 |         1 |     26
 9767797 | 9773223 | (2855684,4) |   36 |  94929958 |         1 |     27
 9767797 | 9773223 | (2855684,5) |   38 |  94929959 |         1 |     27
 9767797 |       0 | (2855684,6) |   41 |  94929960 |         1 |     27



One thing that surprised me is that they are seeing such short values
being pushed out to TOAST.  The rows are typically 156 bytes wide, and
the values being pushed out are 30 bytes, so I fail to see why that
would happen.  Any theories?  Is this normal?
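One way to start diagnosing this is to check the column storage modes on the owning table, since out-of-line storage is normally only attempted when a tuple exceeds the toast threshold (about 2 kB by default). A diagnostic sketch — the OID 18141 is inferred from the `pg_toast_18141` name above:

```sql
-- 'p' = plain (never toasted); 'x', 'e', 'm' allow compression and/or
-- out-of-line storage. With 156-byte rows, values should normally stay
-- inline, so any toasting here is worth a closer look.
SELECT attname, atttypid::regtype, attstorage
FROM pg_attribute
WHERE attrelid = 18141::oid   -- the table that owns pg_toast_18141
  AND attnum > 0;
```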

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Robert Haas
On Thu, Apr 8, 2010 at 4:23 PM, Necati Batur necatiba...@gmail.com wrote:
 In order to make a system change a student need to be more informed
 and experienced for the issue.Nonetheless,this step of work is
 actually not the phase of determining all the details,I
 guess.Otherwise,I would just do a few weeks of coding in summer to
 complete the project and I would be the person in charge in project
 management :)

Well, obviously there are going to be details that won't get worked
out until you really settle down to do the project.  But I don't think
you've even really defined what exactly you would be working on.
Going through your deliverables one by one:

*The trigger based operations can be done automatically

What trigger-based operations are you talking about and how do you
plan to automate them?

*The stored procedures can help us to do some functionalities like
check constraint problem

What check constraint problem?  PostgreSQL already has check
constraints.  If you think they have a problem, say what it is.

*manual VACUUM or ANALYZE commands can be handled by using triggers
DBMS SQL can help to provide faster executions

We already have an autovacuum daemon that automates VACUUM and ANALYZE
commands and it works pretty well.  Certainly, there's room for
improvement, but what do you think needs improving?   What about the
current implementation do you not like and how do you propose to fix
it?

*Some more functionalities can be added to UPDATE operations to make
administrations easy

What administrations are currently difficult and what functionality do
you propose to add to simplify them?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Greg Stark
On Thu, Apr 8, 2010 at 7:58 PM, Ned Lilly n...@nedscape.com wrote:
 +1 for the idea, and +1 for the Zork reference.  Hello sailor.

fwiw it's older than Zork. It comes from Adventure
(http://en.wikipedia.org/wiki/Colossal_Cave_Adventure)

-- 
greg



Re: [HACKERS] A maze of twisty mailing lists all the same

2010-04-08 Thread Magnus Hagander
On Thu, Apr 8, 2010 at 20:35, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Greg Stark st...@mit.edu wrote:

 But all it means is you get a random subset of the messages.
 You're still missing most of the admin or sql or performance
 related threads since they're mostly on -general anyways. Those
 three categories cover pretty much all of -general.

 Perhaps -general should be eliminated in favor of more specific
 lists?

That sounds like a great way to make things harder for newbies and outsiders.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Unsafe threading in syslogger on Windows

2010-04-08 Thread Heikki Linnakangas
Andrew Dunstan wrote:
 Heikki Linnakangas wrote:
 I'm going to see what happens if I remove all the #ifdef WIN32 blocks in
 syslogger, and let it use pgpipe() and select() instead of the extra
 thread.

 Sounds reasonable. Let's see how big the changes are on HEAD. I'm not
 sure it's worth creating a different smaller fix for the back branches.

I tried that, and got a crash somewhere in the code that inherits the
syslogger pipe/socket to the child process. I don't understand why, and
I don't feel like debugging any deeper into that right now. If you or
someone else wants to give it a shot, that would be good. If not, I
might try again some other day after sleeping over it.

Anyway, here's the patch I had in mind for back-branches.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 22cf5f2..ee449ac 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -117,7 +117,7 @@ HANDLE		syslogPipe[2] = {0, 0};
 
 #ifdef WIN32
 static HANDLE threadHandle = 0;
-static CRITICAL_SECTION sysfileSection;
+static CRITICAL_SECTION sysloggerSection;
 #endif
 
 /*
@@ -268,7 +268,8 @@ SysLoggerMain(int argc, char *argv[])
 
 #ifdef WIN32
 	/* Fire up separate data transfer thread */
-	InitializeCriticalSection(&sysfileSection);
+	InitializeCriticalSection(&sysloggerSection);
+	EnterCriticalSection(&sysloggerSection);
 
 	threadHandle = (HANDLE) _beginthreadex(NULL, 0, pipeThread, NULL, 0, NULL);
 	if (threadHandle == 0)
@@ -423,8 +424,16 @@ SysLoggerMain(int argc, char *argv[])
 		 * On Windows we leave it to a separate thread to transfer data and
 		 * detect pipe EOF.  The main thread just wakes up once a second to
 		 * check for SIGHUP and rotation conditions.
+		 *
+		 * Server code isn't generally thread-safe, so we ensure that only
+		 * one of the threads is active at a time by entering the critical
+		 * section whenever we're not sleeping.
 		 */
+		LeaveCriticalSection(&sysloggerSection);
+
 		pg_usleep(1000000L);
+
+		EnterCriticalSection(&sysloggerSection);
 #endif   /* WIN32 */
 
 		if (pipe_eof_seen)
@@ -911,17 +920,9 @@ write_syslogger_file(const char *buffer, int count, int destination)
 	if (destination == LOG_DESTINATION_CSVLOG && csvlogFile == NULL)
 		open_csvlogfile();
 
-#ifdef WIN32
-	EnterCriticalSection(&sysfileSection);
-#endif
-
 	logfile = destination == LOG_DESTINATION_CSVLOG ? csvlogFile : syslogFile;
 	rc = fwrite(buffer, 1, count, logfile);
 
-#ifdef WIN32
-	LeaveCriticalSection(&sysfileSection);
-#endif
-
 	/* can't use ereport here because of possible recursion */
 	if (rc != count)
 		write_stderr("could not write to log file: %s\n", strerror(errno));
@@ -945,11 +946,21 @@ pipeThread(void *arg)
 	for (;;)
 	{
 		DWORD		bytesRead;
+		BOOL		result;
+
+		result = ReadFile(syslogPipe[0],
+		  logbuffer + bytes_in_logbuffer,
+		  sizeof(logbuffer) - bytes_in_logbuffer,
+		  &bytesRead, 0);
 
-		if (!ReadFile(syslogPipe[0],
-	  logbuffer + bytes_in_logbuffer,
-	  sizeof(logbuffer) - bytes_in_logbuffer,
-	  &bytesRead, 0))
+		/*
+		 * Enter critical section before doing anything that might touch
+		 * global state shared by the main thread. Anything that uses
+		 * palloc()/pfree() in particular is not safe outside the critical
+		 * section.
+		 */
+		EnterCriticalSection(&sysloggerSection);
+		if (!result)
 		{
 			DWORD		error = GetLastError();
 
@@ -966,6 +977,7 @@ pipeThread(void *arg)
 			bytes_in_logbuffer += bytesRead;
 			process_pipe_input(logbuffer, bytes_in_logbuffer);
 		}
+		LeaveCriticalSection(&sysloggerSection);
 	}
 
 	/* We exit the above loop only upon detecting pipe EOF */
@@ -974,6 +986,7 @@ pipeThread(void *arg)
 	/* if there's any data left then force it out now */
 	flush_pipe_input(logbuffer, bytes_in_logbuffer);
 
+	LeaveCriticalSection(&sysloggerSection);
 	_endthread();
 	return 0;
 }
@@ -1097,18 +1110,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 		_setmode(_fileno(fh), _O_TEXT); /* use CRLF line endings on Windows */
 #endif
 
-		/* On Windows, need to interlock against data-transfer thread */
-#ifdef WIN32
-		EnterCriticalSection(&sysfileSection);
-#endif
-
 		fclose(syslogFile);
 		syslogFile = fh;
 
-#ifdef WIN32
-		LeaveCriticalSection(&sysfileSection);
-#endif
-
 		/* instead of pfree'ing filename, remember it for next time */
 		if (last_file_name != NULL)
 			pfree(last_file_name);
@@ -1164,18 +1168,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 		_setmode(_fileno(fh), _O_TEXT); /* use CRLF line endings on Windows */
 #endif
 
-		/* On Windows, need to interlock against data-transfer thread */
-#ifdef WIN32
-		EnterCriticalSection(&sysfileSection);
-#endif
-
 		fclose(csvlogFile);
 		csvlogFile = fh;
 
-#ifdef WIN32
-		LeaveCriticalSection(&sysfileSection);
-#endif
-
 		/* instead of pfree'ing filename, remember it for next time */
 		if (last_csv_file_name != 

Re: [HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Greg Smith
An introduction to the current state of work in progress for adding 
improved partitioning features to PostgreSQL is documented at 
http://wiki.postgresql.org/wiki/Table_partitioning


If you can find a small, targeted piece of that overall plan that builds 
on the work already done, and is in the direction of the final goal 
here, you may be able to make useful progress in a few months time.  
This area is extremely well explored already.  There are 13 mailing list 
threads you'll need to browse through carefully just to have enough 
background that you're likely to build something new, rather than just 
wandering down a path that's already been followed but leads to a dead end.


You have picked a PostgreSQL feature that is dramatically more difficult 
than it appears to be, and I wouldn't expect you'll actually finish even 
a fraction of your goals in a summer of work.  You're at least in 
plentiful company--most students do the same.  As a rule, if you see a 
feature on our TODO list that looks really useful and fun to work on, 
it's only still there because people have tried multiple times to build 
it completely but not managed to do so because it's harder than it 
appears.  This is certainly the case with improving the partitioning 
support that's built in to the database.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] Thoughts on pg_hba.conf rejection

2010-04-08 Thread Joshua Tolley
On Wed, Apr 07, 2010 at 01:07:21PM -0400, Robert Haas wrote:
> On Wed, Apr 7, 2010 at 10:46 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Simon Riggs si...@2ndquadrant.com writes:
>>> When there is a specific reject rule, why does the server say
>>> FATAL:  no pg_hba.conf entry
>>
>> It's intentional.  We try to expose the minimum amount of knowledge
>> about the contents of pg_hba.conf to potential attackers.
>
> The problem with the message is not that it's uninformative, but that
> it's counterfactual.
>
> ...Robert

I agree (I noticed and was bothered by this today, as a matter of irrelevant
fact). I can support the idea of exposing as little as possible of
pg_hba.conf, but ISTM the "no pg_hba.conf entry" message is exposing too much, by
that standard. Just say something like "connection disallowed" and leave it at that
-- either it's disallowed by lack of a rule, or by existence of a reject
rule, or by something else entirely. As long as the message isn't clearly
wrong in the reject case, as it is now.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




[HACKERS] a faster compression algorithm for pg_dump

2010-04-08 Thread Joachim Wieland
I'd like to revive the discussion about offering another compression
algorithm than zlib to at least pg_dump. There has been a previous
discussion here:

http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php

and it ended without any real result. The results so far were:

- There exist BSD-licensed compression algorithms
- Nobody knows a patent that is in our way
- Nobody can confirm that no patent is in our way

I do see a very real demand for replacing zlib, which compresses quite
well but is slow as hell. For pg_dump, what people want is cheap
compression; they usually prefer an algorithm that compresses less
optimally but is really fast.

One question that I do not yet see answered is, do we risk violating a
patent even if we just link against a compression library, for example
liblzf, without shipping the actual code?

I have checked what other projects do, especially about liblzf, which
would be my favorite choice (BSD license, available for quite some
time...), and there are other projects that actually ship the lzf code
(I haven't found a project that just links to it). The most prominent
projects are

- KOffice (implements a derived version in
koffice-2.1.2/libs/store/KoXmlReader.cpp)
- Virtual Box (ships it in vbox-ose-1.3.8/src/libs/liblzf-1.51)
- TuxOnIce (formerly known as suspend2 - linux kernel patch, ships it
in the patch)

We have pg_lzcompress.c which implements the compression routines for
the tuple toaster. Are we sure that we don't violate any patents with
this algorithm?


Joachim



[HACKERS] extended operator classes vs. type interfaces

2010-04-08 Thread Robert Haas
There are a couple of features that were considered but not committed
for 9.0 which require additional information about the properties of
various data types.  At PGeast I had a chance to talk with Jeff Davis
about this briefly, and wanted to write up some of what we talked
about plus some further thoughts of my own before they went out of my
head.  The features I'm thinking about are:

1. knngist wants to use index scans to speed up queries of the form
"SELECT ... ORDER BY column op constant" (as opposed to the
existing machinery, which only knows how to use an index for "SELECT ...
ORDER BY column").
2. Window functions want to define windows over a range of values
defined by the underlying data type.  To do this, we need to define
what addition and subtraction mean for a particular data type.
3. Jeff Davis is interested in implementing range types.  When the
underlying base type is discrete, e.g. integers, you can say that
[1,3] = [1,4), but only if you know that 3 and 4 are consecutive (in
that order).

All of these problems seem loosely related: we're teaching the
database more about how certain types behave.  But there are some
important differences.  In case #1, we're trying to teach the planner
that if it sees a certain operator, it can map that operator onto an
AM strategy - so the knowledge is fundamentally AM-specific.  In the
remaining cases, the information needed is a property of the
underlying data type with no natural or logical relationship to an
access method.  For that reason, I don't think there's going to be any
clean way to create a single mechanism that will encompass all of
these needs, though maybe someone has a clever idea I'm not thinking
of.

What we discussed doing before for #1 (and it makes sense to me) is to
add a column pg_amop.amopcategory.  The existing operator class
functions will constitute one category - map a boolean operator onto
an index strategy number that can be used to treat a filter condition
as an index qual.  The functions knngist cares about will be a second
category - map an operator that returns some arbitrary type that is
legal in the context of an ORDER BY clause onto an index strategy
number that can regurgitate the tuples in the order defined by the
return type.

While it might be possible to shoehorn the remaining cases into the
operator class machinery, it seems likely that it will be nothing but
ugly.  The whole charter of the operator class machinery at least AIUI
is to map operators onto AM-specific index strategy numbers, and there
is neither an applicable AM nor a strategy number for it in any of
these cases.  So I think it's time to create a separate concept of
type interfaces (Jeff Davis proposed this name, and I like it).  What
might this look like?

Given a type T, I think we'd like to be able to define a type U as
the natural type to be added to or subtracted from T.  As Jeff
pointed out to me, this is not necessarily the same as the underlying
type.  For example, if T is a timestamp, U is an interval; if T is a
numeric, U is also a numeric; if T is a cidr, U is an integer.  Then
we'd like to define a canonical addition operator and a canonical
subtraction operator.  I think that would be sufficient for the needs
of RANGE BETWEEN ... PRECEDING AND ... FOLLOWING.  It would also be
nearly sufficient for range types, but in that case you also need to
specify the unit increment within U - i.e. a "1" value for the
datatype.  It may or may not be worth building the concept of a unit
increment into the type interface machinery, though: one could imagine
two different range types built over the same base type with different
unit increments - e.g. one timestamp range with unit increment = 1s,
and one with unit increment = 1m.  Under the first type [4pm,5pm) =
[4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm].

In a previous thread on this topic, in response to a question about
other possible needs for operator knowledge, Hitoshi Harada mentioned
range partitioning as another possible case.  If we have a partition
where x ranges from 1 to 100, that's basically saying that x >= 1 and
x <= 100, for suitable values of >= and <=.  I think we're already
habituated to doing this kind of thing by looking at the default btree
opclass for the data type and looking for the operator that
implements, e.g. BTLessEqualStrategyNumber.  It might be cleaner to
get all this information from type interfaces, but I'm not sure
whether it's reasonable (either for reasons of complexity or of
performance) to think about untangling all the places where we've
already made this assumption and redoing them.

Thoughts?

...Robert



Re: [HACKERS] GSOC PostgreSQL partitioning issue

2010-04-08 Thread Takahiro Itagaki

Greg Smith g...@2ndquadrant.com wrote:

> An introduction to the current state of work in progress for adding
> improved partitioning features to PostgreSQL is documented at
> http://wiki.postgresql.org/wiki/Table_partitioning

Also, here is my latest patch for it:
http://repo.or.cz/w/pgsql-fdw.git/shortlog/refs/heads/partition

I'd like to ask Necati what problem you will solve, rather than
what module you will develop. Performance? Usability? Or others?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





[HACKERS] C-Language Fun on VC2005 ERROR: could not load library

2010-04-08 Thread chaoyong wang

Hi,

I'm using VC2005 to create a PostgreSQL C-language function in my contrib
module xml_index, which imports another library, and I have added the
include and lib directories by changing Mkvcbuild.pm and config.pl.

But after I executed the following commands:

build DEBUG
perl install.pl "C:\Program Files\PostgreSQL\8.3"
initdb.exe -D "C:\Program Files\PostgreSQL\8.3\data" -E UTF8 --locale=C
pg_ctl -D "C:/Program Files/PostgreSQL/8.3/data" -l logfile start
createdb test
psql test

when I try to create the function with:

CREATE OR REPLACE FUNCTION create_xml_value_index(text,text,text)
RETURNS bool
AS '$libdir/xml_index'
LANGUAGE C STRICT IMMUTABLE;

it reports an error:

ERROR: could not load library "C:\Program Files\PostgreSQL\8.3\lib/xml_index.dll":
The specified module could not be found.

I checked the directory C:\Program Files\PostgreSQL\8.3\lib, and
xml_index.dll exists. I tried changing '$libdir/xml_index' to
'C:\\Program Files\\PostgreSQL\\8.3\\lib\\xml_index', but the error remains.

Has anyone ever encountered this problem? Thanks in advance.

Best Regards
Cristian
  
_
Your E-mail and More On-the-Go. Get Windows Live Hotmail Free.
https://signup.live.com/signup.aspx?id=60969

Re: [HACKERS] a faster compression algorithm for pg_dump

2010-04-08 Thread Greg Stark
On Fri, Apr 9, 2010 at 12:17 AM, Joachim Wieland j...@mcknight.de wrote:
> One question that I do not yet see answered is, do we risk violating a
> patent even if we just link against a compression library, for example
> liblzf, without shipping the actual code?


Generally patents are infringed on when the process is used. So
whether we link against or ship the code isn't really relevant. The
user using the software would need a patent license either way. We
want Postgres to be usable without being dependent on any copyright or
patent licenses.

Linking against it as an option isn't nearly as bad, since the user
compiling it can choose whether to include the restricted feature or
not. That's what we do with readline. However it's not nearly as
attractive when it restricts what file formats Postgres supports -- it
means someone might generate backup dump files that they later
discover they don't have a legal right to read and restore :(

-- 
greg
