[PATCHES] plpgpsm

2007-03-29 Thread Pavel Stehule

Hello

I have a problem sending what is probably too large a patch, so I am sending 
only a link to the patch instead. Is that correct? I sent this patch two times 
without success. This patch adds a new PL language to PostgreSQL. You can 
enable it by passing --with-sqlpsm to configure.


Proposal: http://archives.postgresql.org/pgsql-hackers/2007-03/msg01487.php

link to patch: http://www.pgsql.cz/patches/plpgpsm.diff.gz

Regards
Pavel Stehule





Re: [PATCHES] [PATCH] add CLUSTER table USING index (take 2)

2007-03-29 Thread Heikki Linnakangas

Holger Schurig wrote:

Index: src/doc/src/sgml/ref/cluster.sgml
===
*** src.orig/doc/src/sgml/ref/cluster.sgml  2007-03-28 23:02:12.0 +0200
--- src/doc/src/sgml/ref/cluster.sgml   2007-03-28 23:03:14.0 +0200
***************
*** 20,27 ****
  
   <refsynopsisdiv>
  <synopsis>
! CLUSTER <replaceable class="PARAMETER">indexname</replaceable> ON <replaceable class="PARAMETER">tablename</replaceable>
! CLUSTER <replaceable class="PARAMETER">tablename</replaceable>
  CLUSTER
  </synopsis>
   </refsynopsisdiv>
--- 20,26 ----
  
   <refsynopsisdiv>
  <synopsis>
! CLUSTER <replaceable class="PARAMETER">tablename</replaceable> [ USING <replaceable class="PARAMETER">indexname</replaceable> ]
  CLUSTER
  </synopsis>
   </refsynopsisdiv>


We still need to document the old syntax, especially if we don't change 
the example as well.



Index: src/src/backend/parser/gram.y
===
*** src.orig/src/backend/parser/gram.y  2007-03-28 22:58:48.0 +0200
--- src/src/backend/parser/gram.y   2007-03-28 22:59:15.0 +0200
***************
*** 209,215 ****
  
  %type <str>		relation_name copy_file_name
  		database_name access_method_clause access_method attr_name
! 		index_name name file_name
  
  %type <list>	func_name handler_name qual_Op qual_all_Op subquery_Op
  		opt_class opt_validator
--- 209,215 ----
  
  %type <str>		relation_name copy_file_name
  		database_name access_method_clause access_method attr_name
! 		index_name name file_name opt_cluster_using
  
  %type <list>	func_name handler_name qual_Op qual_all_Op subquery_Op
  		opt_class opt_validator


Is the placement of opt_cluster_using completely arbitrary? I'm not very 
familiar with the parser, but it really looks like those type definitions 
are in random order.
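
For reference, the production itself is tiny. A minimal sketch of such an
optional clause, based on this discussion rather than the patch's exact text
(the %type <str> list only declares the semantic value type, so its position
has no grammatical effect):

    opt_cluster_using:
    		USING index_name		{ $$ = $2; }
    		| /* EMPTY */			{ $$ = NULL; }
    	;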


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[PATCHES] Small addition to PGInterval

2007-03-29 Thread Hartmut Benz

Hi all,

please find attached a small extension to PGInterval:
- getTimeInMills   - returns the number of milliseconds in the interval
- a copy constructor PGInterval(PGInterval) for type-safe, cast-free cloning
- now implements Comparable<PGInterval>

--
Hartmut Benz,
TI-WMC, http://www.ti-wmc.nl


PGInterval.diff.gz
Description: application/gzip



Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Heikki Linnakangas
Here's an update, fixing a conflict with Tom's recent commit of Simon's 
patch to skip WAL inserts when archiving is not enabled.


Heikki Linnakangas wrote:
This patch makes CLUSTER MVCC-safe. Visibility information and update 
chains are preserved like in VACUUM FULL.


I created a new generic rewriteheap-facility to handle rewriting tables 
in a visibility-preserving manner. All the update chain tracking is done 
in rewriteheap.c, the caller is responsible for supplying the stream of 
tuples.
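
As a rough caller-side sketch of that division of labor (the function names
here are assumptions based on this description, not necessarily the patch's
exact API):

    /* Hypothetical usage; the begin/rewrite/end names are assumed, not quoted. */
    RewriteState rws = begin_heap_rewrite(new_heap, oldest_xmin, use_wal);
    HeapTuple    tup;

    while ((tup = fetch_next_tuple_in_cluster_order()) != NULL)
        rewrite_heap_tuple(rws, tup);   /* update-chain tracking happens inside */

    end_heap_rewrite(rws);              /* flush the last buffered page */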


CLUSTER is currently the only user of the facility, but I'm envisioning 
we might have other users in the future, for example a version of 
VACUUM FULL that rewrites the whole table. We could also use it to make 
ALTER TABLE MVCC-safe, but there are some issues with that, for example 
what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.


One complication in the implementation was the fact that heap_insert 
overwrites the visibility information, and it doesn't write the full 
tuple header to WAL. I ended up implementing a special-purpose 
raw_heap_insert function instead, which is optimized for bulk inserting 
a lot of tuples, knowing that we have exclusive access to the heap. 
raw_heap_insert keeps the current buffer locked over calls, until it 
gets full, and inserts the whole page to WAL as a single record using 
the existing XLOG_HEAP_NEWPAGE record type.


This makes CLUSTER a more viable alternative to VACUUM FULL. One 
motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
will appear to be empty in the dump file.


The documentation doesn't say anything about CLUSTER not being MVCC-safe, so 
I suppose there's no need to touch the docs. I sent a doc patch earlier 
to add a note about it; that doc patch should still be applied to the older 
release branches, IMO.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/heap/heapam.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.230
diff -c -r1.230 heapam.c
*** src/backend/access/heap/heapam.c	29 Mar 2007 00:15:37 -	1.230
--- src/backend/access/heap/heapam.c	29 Mar 2007 08:27:21 -
***
*** 3280,3285 
--- 3280,3322 
  	return log_heap_update(reln, oldbuf, from, newbuf, newtup, true);
  }
  
+ /*
+  * Perform XLogInsert of a full page image to WAL. Caller must make sure
+  * the page contents on disk are consistent with the page inserted to WAL.
+  */
+ XLogRecPtr
+ log_newpage(RelFileNode *rnode, BlockNumber blkno, Page page)
+ {
+ 	xl_heap_newpage xlrec;
+ 	XLogRecPtr	recptr;
+ 	XLogRecData rdata[2];
+ 
+ 	/* NO ELOG(ERROR) from here till newpage op is logged */
+ 	START_CRIT_SECTION();
+ 
+ 	xlrec.node = *rnode;
+ 	xlrec.blkno = blkno;
+ 
+ 	rdata[0].data = (char *) &xlrec;
+ 	rdata[0].len = SizeOfHeapNewpage;
+ 	rdata[0].buffer = InvalidBuffer;
+ 	rdata[0].next = &(rdata[1]);
+ 
+ 	rdata[1].data = (char *) page;
+ 	rdata[1].len = BLCKSZ;
+ 	rdata[1].buffer = InvalidBuffer;
+ 	rdata[1].next = NULL;
+ 
+ 	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
+ 
+ 	PageSetLSN(page, recptr);
+ 	PageSetTLI(page, ThisTimeLineID);
+ 
+ 	END_CRIT_SECTION();
+ 
+ 	return recptr;
+ }
+ 
  static void
  heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
  {
Index: src/backend/access/nbtree/nbtsort.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtsort.c,v
retrieving revision 1.110
diff -c -r1.110 nbtsort.c
*** src/backend/access/nbtree/nbtsort.c	9 Jan 2007 02:14:10 -	1.110
--- src/backend/access/nbtree/nbtsort.c	20 Mar 2007 17:58:20 -
***
*** 64,69 
--- 64,70 
  
  #include "postgres.h"
  
+ #include "access/heapam.h"
  #include "access/nbtree.h"
  #include "miscadmin.h"
  #include "storage/smgr.h"
***
*** 265,296 
  	if (wstate->btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		xl_heap_newpage xlrec;
! 		XLogRecPtr	recptr;
! 		XLogRecData rdata[2];
! 
! 		/* NO ELOG(ERROR) from here till newpage op is logged */
! 		START_CRIT_SECTION();
! 
! 		xlrec.node = wstate->index->rd_node;
! 		xlrec.blkno = blkno;
! 
! 		rdata[0].data = (char *) &xlrec;
! 		rdata[0].len = SizeOfHeapNewpage;
! 		rdata[0].buffer = InvalidBuffer;
! 		rdata[0].next = &(rdata[1]);
! 
! 		rdata[1].data = (char *) page;
! 		rdata[1].len = BLCKSZ;
! 		rdata[1].buffer = InvalidBuffer;
! 		rdata[1].next = NULL;
! 
! 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
! 
! 		PageSetLSN(page, recptr);
! 		PageSetTLI(page, ThisTimeLineID);
! 
! 		END_CRIT_SECTION();
  	}
  	else
  	{
--- 266,272 
  	if (wstate->btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		log_newpage(&wstate->index->rd_node, blkno, page);
  	}
  	else
  	

Re: [PATCHES] Small addition to PGInterval

2007-03-29 Thread Heikki Linnakangas

Hartmut Benz wrote:

please find attached a small extension to PGInterval:
- getTimeInMills   - returns the number of milliseconds in the interval
- a copy constructor PGInterval(PGInterval) for type-safe, cast-free cloning
- now implements Comparable<PGInterval>


This list is for patches to the PostgreSQL server. Please send patches for 
the JDBC driver to the pgsql-jdbc mailing list.


PS. Thanks for the patch!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-29 Thread Koichi Suzuki

Hi, here is some feedback on the comments:

Simon Riggs wrote:

On Wed, 2007-03-28 at 10:54 +0900, Koichi Suzuki wrote:


As written below, full page writes can be
categorized as follows:

1) Needed for crash recovery: the first page update after each checkpoint.
This has to be kept in WAL.

2) Needed for archive recovery: page updates between pg_start_backup and
pg_stop_backup. These have to be kept in the archive log.

3) For a log-shipping slave such as pg_standby: no full page writes will
be needed for this purpose.

My proposal deals with 2). So, if we mark each full page write, I'd
rather mark it when it is needed. We still need only one bit, because
case 3) does not need any mark.
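
To illustrate the one-bit mark (the flag name, value and helper below are
assumptions for illustration, not taken from the patch):

    /* One spare bit in the WAL record header distinguishes the two kinds
     * of full-page image. */
    #define XLR_FPW_FOR_ARCHIVE  0x08   /* hypothetical: image taken between
                                         * pg_start_backup and pg_stop_backup */

    /* archive side (pg_compresslog): keep only the marked images */
    if ((record->xl_info & XLR_FPW_FOR_ARCHIVE) == 0)
        strip_backup_blocks(record);    /* hypothetical helper */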


I'm very happy with this proposal, though I do still have some points in
detailed areas.

If you accept that 1 & 2 are valid goals, then 1 & 3 or 1, 2 & 3 are
also valid goals, ISTM. i.e. you might choose to use full_page_writes on
the primary and yet would like to see optimised data transfer to the
standby server. In that case, you would need the mark.


Yes, I need the mark.  In my proposal, only unmarked full-page writes, 
i.e. those written as the first update after a checkpoint, are to be 
removed offline (by pg_compresslog).





- Not sure why we need full_page_compress; why not just mark them
always? That harms no one. (Did someone else ask for that? If so, keep
it)

No, no one asked to have a separate option. There would be no harm in
doing so.  So, if we mark each full page write, I'd rather mark it when
it is needed. We still need only one bit, because case 3) does not need
any mark.


OK, different question: 
Why would anyone ever set full_page_compress = off? 


Why have a parameter that does so little? ISTM this is:

i) one more thing to get wrong

ii) cheaper to mark the block when appropriate than to perform the if()
test each time. That can be done only in the path where backup blocks
are present.

iii) If we mark the blocks every time, it allows us to do an offline WAL
compression. If the blocks aren't marked that option is lost. The bit is
useful information, so we should have it in all cases.


Not only full-page writes are written as WAL records.   In my proposal, 
both the full-page write and the logical log are written in a WAL record, 
which will make the WAL slightly bigger (five percent or so).   If 
full_page_compress = off, only a full-page write will be written in a 
WAL record.   I thought someone would not be happy with this size growth.


I agree to make this mandatory if everybody is happy with the extra 
logical log in WAL records with full page writes.


I'd like to have your opinion.




- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.

The only thing I had to do was create dummy full page writes to
maintain LSNs. Full page writes are omitted in the archive log, but we
have to keep the LSNs the same as those in the original WAL. In this
case, recovery has to read the logical log, not the dummy full page
writes. On the other hand, if both a logical log and a real full page
write are found in a log record, recovery has to use the real full
page write.


I apologise for not understanding your reply, perhaps my original
request was unclear.

In recovery.conf, I'd like to see a parameter such as

dummy_backup_blocks = off (default) | on

to explicitly indicate to the recovery process that backup blocks are
present, yet they are garbage and should be ignored. Having garbage data
within the system is potentially dangerous and I want to be told by the
user that they were expecting that and it's OK to ignore that data.
Otherwise I want to throw informative errors. Maybe it seems OK now, but
the next change to the system may have unintended consequences and it
may not be us making the change. "It's OK, the Alien will never escape
from the lab" is the starting premise for many good sci-fi horrors and I
want to watch them, not be in one myself. :-)

We can call it other things, of course. e.g.
ignore_dummy_blocks
decompressed_blocks
apply_backup_blocks


So far, we don't need any modification to the recovery and redo 
functions.   They ignore the dummies and apply the logical logs.   Also, 
if there are both a full page write and a logical log, current recovery 
selects the full page write to apply.


I agree to introduce this option if the 8.3 code introduces any conflict 
with the current behavior.   Or, we could introduce this option for 
future safety.  Do you think we should introduce it?


If this should be introduced now, what we should do is check this 
option when a dummy full page write appears.






Yes, I believe so. As pg_standby has no chance of encountering partial
page writes, I believe you can omit all the full page writes. Of course,
as Tom Lane suggested in
http://archives.postgresql.org/pgsql-hackers/2007-02/msg00034.php
removing full 

Re: [PATCHES] ecpg threading vs win32

2007-03-29 Thread Michael Meskes
On Mon, Mar 19, 2007 at 09:48:19AM +0100, Magnus Hagander wrote:
  Q2. Do we need to use PQescapeStringConn() instead of PQescapeString()?
 PQescapeString() is used to escape literals, and the documentation says
 PQescapeStringConn() should be used in multi-threaded client programs.
  
  http://momjian.us/main/writings/pgsql/sgml/libpq-exec.html#LIBPQ-EXEC-ESCAPE-STRING
 | PQescapeString can be used safely in single-threaded client programs
 | that work with only one PostgreSQL connection at a time 
 
 Seems so, but that's unrelated to this patch ;-) I'll leave the final
 comment on that up to Michael.

Looking at the source code, it seems to me that the connection argument
is only used once, in PQescapeStringInternal, which both functions call.
Here's the snippet:

if (conn)
	printfPQExpBuffer(&conn->errorMessage,
			  libpq_gettext("incomplete multibyte character\n"));

So this essentially serves only to put an error message into the
connection structure. However, having an empty connection makes
PQescapeStringConn return an error and an empty string, which I consider
a problem. I have to check that in detail, but we may run into a usage
of PQescapeString without an open connection, which would then fail.
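
For reference, the connection-aware form looks like this; PQescapeStringConn
is the real libpq call, while conn and input are assumed context:

    #include <libpq-fe.h>
    #include <stdio.h>
    #include <string.h>

    /* The output buffer must be at least 2*strlen(input)+1 bytes. */
    char   escaped[2 * 128 + 1];
    int    err = 0;
    size_t len = PQescapeStringConn(conn, escaped, input, strlen(input), &err);

    if (err)    /* e.g. incomplete multibyte input for the connection encoding */
        fprintf(stderr, "escaping failed: %s", PQerrorMessage(conn));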

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-29 Thread Simon Riggs
On Thu, 2007-03-29 at 17:50 +0900, Koichi Suzuki wrote:

 Not only full-page writes are written as WAL records.   In my proposal, 
 both the full-page write and the logical log are written in a WAL record, 
 which will make the WAL slightly bigger (five percent or so).   If 
 full_page_compress = off, only a full-page write will be written in a 
 WAL record.   I thought someone would not be happy with this size growth.

OK, I see what you're doing now and agree with you that we do need a
parameter. Not sure about the name you've chosen though - it certainly
confused me until you explained.

A parameter called ..._compress indicates to me that it would reduce
something in size whereas what it actually does is increase the size of
WAL slightly. We should have a parameter name that indicates what it
actually does, otherwise some people will choose to use this parameter
even when they are not using archive_command with pg_compresslog.

Some possible names...

additional_wal_info = 'COMPRESS'
add_wal_info
wal_additional_info
wal_auxiliary_info
wal_extra_data
attach_wal_info
...
others?

I've got some ideas for the future for adding additional WAL info for
various purposes, so it might be useful to have a parameter that can
cater for multiple types of additional WAL data. Or maybe we go for
something more specific like

wal_add_compress_info = on
wal_add__info ...

  In recovery.conf, I'd like to see a parameter such as
  
  dummy_backup_blocks = off (default) | on
  
  to explicitly indicate to the recovery process that backup blocks are
  present, yet they are garbage and should be ignored. Having garbage data
  within the system is potentially dangerous and I want to be told by the
  user that they were expecting that and it's OK to ignore that data.
  Otherwise I want to throw informative errors. Maybe it seems OK now, but
  the next change to the system may have unintended consequences and it
  may not be us making the change. "It's OK, the Alien will never escape
  from the lab" is the starting premise for many good sci-fi horrors and I
  want to watch them, not be in one myself. :-)
  
  We can call it other things, of course. e.g.
  ignore_dummy_blocks
  decompressed_blocks
  apply_backup_blocks
 
 So far, we don't need any modification to the recovery and redo 
 functions.   They ignore the dummies and apply the logical logs.   Also, 
 if there are both a full page write and a logical log, current recovery 
 selects the full page write to apply.
 
 I agree to introduce this option if the 8.3 code introduces any conflict 
 with the current behavior.   Or, we could introduce this option for 
 future safety.  Do you think we should introduce it?

Yes. You are skipping a correctness test and that should be by explicit
command only. It's no problem to include that as well, since you are
already having to specify pg_... decompress... but the recovery process
doesn't know whether or not you've done that.

 Anyway, could you try to run pg_standby with pg_compresslog and 
 pg_decompresslog?

After freeze, yes.

 
 Additional comment on page header removal:
 
 I found that it is not simple to keep the page headers in the compressed 
 archive log.   Because we eliminate unmarked full page writes and shift 
 the rest of the WAL file data, it is not simple to keep the page headers 
 as they are in the original WAL.   It is much simpler to remove the page 
 headers as well and rebuild them.   I'd like to keep the current 
 implementation on this point.

OK.

This is a good feature. Thanks for your patience with my comments.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





[PATCHES] bgwriter stats

2007-03-29 Thread Magnus Hagander
Attached is a new version of the bgwriter stats patch. Per previous
discussion, it now uses the stats system only. It introduces a new stats
message for bgwriter, and also introduces a global stats part of the
stats file for statistics not bound to a database or table.

I've included a couple more counters per ideas from Greg Smith in his
logging patch.

Unless there are objections, I'll go ahead and apply this version of the
patch (once the documentation is written, of course).
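
For orientation, the new message is roughly shaped like this; the field names
are assumptions inferred from the view columns in the patch below, not a
quote of the patch:

    /* Counters accumulate locally in the bgwriter and are shipped to the
     * stats collector periodically. */
    typedef struct PgStat_MsgBgWriter
    {
        PgStat_MsgHdr  m_hdr;                      /* standard message header */
        PgStat_Counter m_timed_checkpoints;        /* due to checkpoint_timeout */
        PgStat_Counter m_requested_checkpoints;    /* explicitly requested */
        PgStat_Counter m_buf_written_checkpoints;  /* written at checkpoints */
        PgStat_Counter m_buf_written_lru;          /* written by LRU scans */
        PgStat_Counter m_buf_written_all;          /* written by "all" scans */
        PgStat_Counter m_maxwritten_lru;           /* LRU scans hitting limit */
        PgStat_Counter m_maxwritten_all;           /* "all" scans hitting limit */
    } PgStat_MsgBgWriter;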

//Magnus

Index: src/backend/catalog/system_views.sql
===
RCS file: /cvsroot/pgsql/src/backend/catalog/system_views.sql,v
retrieving revision 1.36
diff -c -r1.36 system_views.sql
*** src/backend/catalog/system_views.sql	16 Mar 2007 17:57:36 -	1.36
--- src/backend/catalog/system_views.sql	29 Mar 2007 13:21:54 -
***
*** 364,366 
--- 364,376 
  pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
  pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted
  FROM pg_database D;
+ 
+ CREATE VIEW pg_stat_bgwriter AS
+   SELECT
+   pg_stat_get_bgwriter_timed_checkpoints() AS 
checkpoints_timed,
+   pg_stat_get_bgwriter_requested_checkpoints() AS 
checkpoints_req,
+   pg_stat_get_bgwriter_buf_written_checkpoints() AS 
buffers_checkpoint,
+   pg_stat_get_bgwriter_buf_written_lru() AS buffers_lru,
+   pg_stat_get_bgwriter_buf_written_all() AS buffers_all,
+   pg_stat_get_bgwriter_maxwritten_lru() AS maxwritten_lru,
+   pg_stat_get_bgwriter_maxwritten_all() AS maxwritten_all;
Index: src/backend/postmaster/bgwriter.c
===
RCS file: /cvsroot/pgsql/src/backend/postmaster/bgwriter.c,v
retrieving revision 1.36
diff -c -r1.36 bgwriter.c
*** src/backend/postmaster/bgwriter.c   17 Jan 2007 16:25:01 -  1.36
--- src/backend/postmaster/bgwriter.c   29 Mar 2007 13:21:57 -
***
*** 50,55 
--- 50,56 
  #include "access/xlog_internal.h"
  #include "libpq/pqsignal.h"
  #include "miscadmin.h"
+ #include "pgstat.h"
  #include "postmaster/bgwriter.h"
  #include "storage/fd.h"
  #include "storage/freespace.h"
***
*** 125,130 
--- 126,138 
  static BgWriterShmemStruct *BgWriterShmem;
  
  /*
+  * BgWriter statistic counters, as sent to the stats collector
+  * Stored directly in a stats message structure so it can be sent
+  * without needing to copy things around.
+  */
+ PgStat_MsgBgWriter BgWriterStats;
+ 
+ /*
   * GUC parameters
   */
  int   BgWriterDelay = 200;
***
*** 243,248 
--- 251,261 
MemoryContextSwitchTo(bgwriter_context);
  
/*
+* Initialize statistics counters to zero
+*/
+   memset(&BgWriterStats, 0, sizeof(BgWriterStats));
+ 
+   /*
 * If an exception is encountered, processing resumes here.
 *
 * See notes in postgres.c about the design of this coding.
***
*** 354,359 
--- 367,373 
checkpoint_requested = false;
do_checkpoint = true;
force_checkpoint = true;
+   BgWriterStats.m_requested_checkpoints++;
}
if (shutdown_requested)
{
***
*** 376,382 
--- 390,400 
now = time(NULL);
elapsed_secs = now - last_checkpoint_time;
  		if (elapsed_secs >= CheckPointTimeout)
+   {
do_checkpoint = true;
+   if (!force_checkpoint)
+   BgWriterStats.m_timed_checkpoints++;
+   }
  
/*
 * Do a checkpoint if requested, otherwise do one cycle of
***
*** 474,479 
--- 492,502 
}
  
/*
+* Send off activity statistics to the stats collector
+*/
+   pgstat_send_bgwriter();
+ 
+   /*
 * Nap for the configured time, or sleep for 10 seconds if 
there is no
 * bgwriter activity configured.
 *
***
*** 789,791 
--- 812,815 
  
END_CRIT_SECTION();
  }
+ 
Index: src/backend/postmaster/pgstat.c
===
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.151
diff -c -r1.151 pgstat.c
*** src/backend/postmaster/pgstat.c 28 Mar 2007 22:17:12 -  1.151
--- src/backend/postmaster/pgstat.c 29 Mar 2007 13:22:00 -
***
*** 135,140 
--- 135,150 
  static PgBackendStatus *localBackendStatusTable = NULL;
  static intlocalNumBackends = 0;
  
+ /*
+  * BgWriter global 

Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Heikki Linnakangas wrote:
 Here's an update, fixing a conflict with Tom's recent commit of Simon's 
 patch to skip WAL inserts when archiving is not enabled.
 
 Heikki Linnakangas wrote:
  This patch makes CLUSTER MVCC-safe. Visibility information and update 
  chains are preserved like in VACUUM FULL.
  
  [...]
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] bgwriter stats

2007-03-29 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Attached is a new version of the bgwriter stats patch. Per previous
 discussion, now uses the stats system only. Introduces a new stats message
 for bgwriter, and also introduces a global stats part of the stats file
 for statistics not bound to a database or table.

It looks sane in a quick once-over, but the comments need some love --- too
many typos and misspellings for my taste.

regards, tom lane



Re: [PATCHES] bgwriter stats

2007-03-29 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 Attached is a new version of the bgwriter stats patch. Per previous
 discussion, now uses the stats system only. Introduces a new stats message
 for bgwriter, and also introduces a global stats part of the stats file
 for statistics not bound to a database or table.
 
 It looks sane in a quick once-over, but the comments need some love --- too
 many typos and misspellings for my taste.

Ok. I'll be sure to work them over before I commit. Thanks for the quick
check.

//Magnus



Re: [PATCHES] LIMIT/SORT optimization

2007-03-29 Thread Gregory Stark

Updated patch attached:

1) Removes #if 0 optimizations

2) Changes #if 0 to #if NOT_USED for code that's there for completeness and to
   keep the code self-documenting, but isn't needed by anything currently

3) Fixed cost model to represent bounded sorts




sort-limit-v7.patch.gz
Description: Binary data


Gregory Stark [EMAIL PROTECTED] writes:

 Heikki Linnakangas [EMAIL PROTECTED] writes:

 There's a few blocks of code surrounded with #if 0 - #endif. Are those just
 leftovers that should be removed, or things that still need to be finished
 and enabled?

 Uhm, I don't remember, will go look, thanks.

Ok, they were leftover code from an optimization that I decided wasn't very
important to pursue. The code that was ifdef'd out detected when disk sorts
could abort a merge because they had already generated enough tuples
to satisfy the limit. 

But I never wrote the code to actually abort the run and it looks a bit
tricky. In any case the disk sort use case is extremely narrow; you would need
something like LIMIT 5 or more to do it and it would have to be an
input table huge enough to cause multiple rounds of merges.


I think I've figured out how to adjust the cost model. It turns out that it
doesn't usually matter whether the cost model is correct, since any case where
the optimization kicks in is a case where you're reading a small portion of the
input, so it's a case where an index would be *much* better if available. So
the only time the optimization is used is when there's no index available.
Nonetheless it's nice to get the estimates right so that higher levels in the
plan get reasonable values.

I think I figured out how to do the cost model. At least the results are
reasonable. I'm not sure if I've done it the right way though.
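
For intuition, the usual back-of-the-envelope comparison counts, as a sketch
rather than the patch's exact formula:

    #include <math.h>

    /* A full sort of N tuples costs about N*log2(N) comparisons; a bounded
     * (top-B) heap sort costs about N*log2(2B), much cheaper when B << N. */
    static double
    full_sort_compare_cost(double ntuples)
    {
        return ntuples * (log(ntuples) / log(2.0));
    }

    static double
    bounded_sort_compare_cost(double ntuples, double bound)
    {
        return ntuples * (log(2.0 * bound) / log(2.0));
    }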


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] tsearch_core patch for inclusion

2007-03-29 Thread Teodor Sigaev

http://www.sigaev.ru/misc/tsearch_core-0.43.gz

Changes:
1. The Ispell dictionary now supports the hunspell dictionary format, which is
   used by OpenOffice >= 2.0.2
   http://wiki.services.openoffice.org/wiki/Dictionaries
   The format changes are aimed, basically, at better support of compound words
   (German, Norwegian). So please test it - we don't know those languages at
   all.
2. Added recent fixes of contrib/tsearch2
3. Fixed usage of fopen/fclose

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [PATCHES] [PATCH] add CLUSTER table USING index (take 2)

2007-03-29 Thread Bruce Momjian

FYI, this is a great example of valuable patch review.

---

Heikki Linnakangas wrote:
 Holger Schurig wrote:
  [...]
 
 We still need to document the old syntax, especially if we don't change 
 the example as well.
 
  [...]
 
 Is the placement of opt_cluster_using completely arbitrary? I'm not very 
 familiar with the parser, but it really looks like those type definitions 
 are in random order.
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] [HACKERS] tsearch_core patch for inclusion

2007-03-29 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Teodor Sigaev wrote:
 http://www.sigaev.ru/misc/tsearch_core-0.43.gz
 
 [...]

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] [PATCH] add CLUSTER table USING index (take 2)

2007-03-29 Thread Holger Schurig
 FYI, this is a great example of valuable patch review.

It would have been better if the TODO entry had been right :-)



Re: [PATCHES] [PATCH] add CLUSTER table USING index (take 2)

2007-03-29 Thread Holger Schurig
 We still need to document the old syntax, especially if we don't change 
 the example as well.

I agree that the example should be re-written. But I'm not sure if I need
to have a paragraph about the old syntax. There are two reasons:

- I haven't seen any other SQL command where an old syntax was
  documented

- I thought I could get away without writing docs. After all, I'm
  not a native English speaker. That's a point where I could use
  some help ...  (maybe my English is good enough, but it's not
  worth making a take 4 to take 17 patch just for English
  grammar, typos, subtle meanings, whatever.)




  [...]
 
 Is the placement of opt_cluster_using completely arbitrary? I'm not very 
 familiar with the parser, but it really looks like those type definitions 
 are in random order.

I thought so. As you can see in the above patch, there are things
like opt_validator in the next %type <list> section.

There are many other %type <str> sections in gram.y, but I haven't
found a structure yet. For example, some tokens are named
OptSchemaName, some are named opt_encoding. Let's look at
this one. It's used in line 1090, defined in 1218. Before and
after the usage there are transaction_mode_list and
ColId_or_Sconst. Before and after the definition are
zone_value and again ColId_or_Sconst. But neither of these
three is defined in the same %type <str> section as
opt_encoding is.




[PATCHES] [PATCH] add CLUSTER table USING index (take 3)

2007-03-29 Thread Holger Schurig
SGML ref text (swapped parameter list, changed example text)

Also, I noticed that the text of the example spoke about a
table "employees", but the example used the table "emp". I
fixed this inconsistency.


Index: src/doc/src/sgml/ref/cluster.sgml
===
*** src.orig/doc/src/sgml/ref/cluster.sgml  2007-03-28 23:03:20.0 +0200
--- src/doc/src/sgml/ref/cluster.sgml   2007-03-29 21:32:26.0 +0200
***************
*** 20,27 ****
  
   <refsynopsisdiv>
  <synopsis>
! CLUSTER <replaceable class="PARAMETER">indexname</replaceable> ON <replaceable class="PARAMETER">tablename</replaceable>
! CLUSTER <replaceable class="PARAMETER">tablename</replaceable>
  CLUSTER
  </synopsis>
   </refsynopsisdiv>
--- 20,26 ----
  
   <refsynopsisdiv>
  <synopsis>
! CLUSTER <replaceable class="PARAMETER">tablename</replaceable> [ USING <replaceable class="PARAMETER">indexname</replaceable> ]
  CLUSTER
  </synopsis>
   </refsynopsisdiv>
***************
*** 77,95 ****
  
  <variablelist>
   <varlistentry>
!   <term><replaceable class="PARAMETER">indexname</replaceable></term>
    <listitem>
     <para>
!     The name of an index.
     </para>
    </listitem>
   </varlistentry>
  
   <varlistentry>
!   <term><replaceable class="PARAMETER">tablename</replaceable></term>
    <listitem>
     <para>
!     The name (possibly schema-qualified) of a table.
     </para>
    </listitem>
   </varlistentry>
--- 76,94 ----
  
  <variablelist>
   <varlistentry>
!   <term><replaceable class="PARAMETER">tablename</replaceable></term>
    <listitem>
     <para>
!     The name (possibly schema-qualified) of a table.
     </para>
    </listitem>
   </varlistentry>
  
   <varlistentry>
!   <term><replaceable class="PARAMETER">indexname</replaceable></term>
    <listitem>
     <para>
!     The name of an index.
     </para>
    </listitem>
   </varlistentry>
***************
*** 172,180 ****
  
  <para>
   Cluster the table <literal>employees</literal> on the basis of
!  its index <literal>emp_ind</literal>:
  <programlisting>
! CLUSTER emp_ind ON emp;
  </programlisting>
  </para>
  
--- 171,179 ----
  
  <para>
   Cluster the table <literal>employees</literal> on the basis of
!  its index <literal>employees_ind</literal>:
  <programlisting>
! CLUSTER employees USING employees_ind;
  </programlisting>
  </para>
  
***************
*** 182,188 ****
   Cluster the <literal>employees</literal> table using the same
   index that was used before:
  <programlisting>
! CLUSTER emp;
  </programlisting>
  </para>
  
--- 181,187 ----
   Cluster the <literal>employees</literal> table using the same
   index that was used before:
  <programlisting>
! CLUSTER employees;
  </programlisting>
  </para>
  
Index: src/src/backend/parser/gram.y
===
*** src.orig/src/backend/parser/gram.y  2007-03-28 23:03:20.0 +0200
--- src/src/backend/parser/gram.y   2007-03-28 23:03:35.0 +0200
***************
*** 209,215 ****
  
  %type <str>	relation_name copy_file_name
  	database_name access_method_clause access_method attr_name
! 	index_name name file_name
  
  %type <list>	func_name handler_name qual_Op qual_all_Op subquery_Op
  	opt_class opt_validator
--- 209,215 ----
  
  %type <str>	relation_name copy_file_name
  	database_name access_method_clause access_method attr_name
! 	index_name name file_name opt_cluster_using
  
  %type <list>	func_name handler_name qual_Op qual_all_Op subquery_Op
  	opt_class opt_validator
***************
*** 5327,5332 ****
--- 5327,5333 ----
   *
   *	QUERY:
   *		cluster <index_name> on <qualified_name>
+  *		cluster <qualified_name> USING <index_name>
   *		cluster <qualified_name>
   *		cluster
   *
***************
*** 5340,5350 ****
  			n->indexname = $2;
  			$$ = (Node*)n;
  		}
! 		| CLUSTER qualified_name
  		{
  			ClusterStmt *n = makeNode(ClusterStmt);
  			n->relation = $2;
! 			n->indexname = NULL;
  			$$ = (Node*)n;
  		}
  		| CLUSTER
--- 5341,5351 ----
  			n->indexname = $2;
  			$$ = (Node*)n;
  		}
! 		| CLUSTER qualified_name opt_cluster_using
  		{
  			ClusterStmt *n = makeNode(ClusterStmt);
  			n->relation = $2;
! 			n->indexname = $3;
  

Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-29 Thread Josh Berkus
Simon,

 OK, different question:
 Why would anyone ever set full_page_compress = off?

The only reason I can see is if compression costs us CPU but gains RAM & 
I/O.  I can think of a lot of applications ... benchmarks included ... 
which are CPU-bound but not RAM- or I/O-bound.  For those applications, 
compression is a bad tradeoff.

If, however, the CPU used for compression is made up elsewhere through 
smaller file processing, then I'd agree that we don't need a switch.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-29 Thread Simon Riggs
On Thu, 2007-03-29 at 11:45 -0700, Josh Berkus wrote:

  OK, different question:
  Why would anyone ever set full_page_compress = off?
 
 The only reason I can see is if compression costs us CPU but gains RAM & 
 I/O.  I can think of a lot of applications ... benchmarks included ... 
 which are CPU-bound but not RAM- or I/O-bound.  For those applications, 
 compression is a bad tradeoff.
 
 If, however, the CPU used for compression is made up elsewhere through 
 smaller file processing, then I'd agree that we don't need a switch.

Koichi-san has explained things for me now.

I misunderstood what the parameter did and, reading your post, ISTM you
have as well. I do hope Koichi-san will alter the name so that everybody
can understand what it does.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PATCHES] [PATCH] add CLUSTER table USING index (take 2)

2007-03-29 Thread Bruce Momjian
Holger Schurig wrote:
  FYI, this is a great example of valuable patch review.
 
 It would have been better if the TODO entry had been right :-)

It was right when I wrote it.  ;-)  I have updated it now.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] DEALLOCATE ALL

2007-03-29 Thread Bruce Momjian
Marko Kreen wrote:
 Then a pooler argument: there is one pooler where RandomJoe executes
 queries and another for a specific app where the subset of SQL it uses is
 known.  I want to RESET only specific things in the app case.  So it would
 be good if the RESETs for specific areas were available.
 
 Also the objections to Hans' patch give the impression that different
 pooling solutions want different RESET EVERYTHING, so again,
 it would be good if RESETs for different areas are available
 and the all-encompassing RESET EVERYTHING just ties all the specific
 RESETs together.

Totally agree.  Please make the adjustments to DEALLOCATE ALL, roll
that into the 2004 patch for RESET SESSION, and post an updated version.
Thanks.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] [PATCH] add CLUSTER table USING index (take 3)

2007-03-29 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Holger Schurig wrote:
 SGML ref text (swapped parameter list, changed example text)
 
 Also, I noticed that the text of the example spoke about a
 table "employees", but the example used the table "emp". I
 fixed this inconsistency.
 
 
 [...]
 

Re: [PATCHES] DEALLOCATE ALL

2007-03-29 Thread Neil Conway

Marko Kreen wrote:

 When pooling connections where prepared statements are in use,
 it is hard to give a new client a totally clean connection, as
 there may be allocated statements that cause errors when the
 new client starts preparing statements again.


I agree with the other comments that RESET SESSION is the right API for 
this, although we can also provide DEALLOCATE ALL, I suppose.


As to the implementation, calling hash_remove() in a loop seems a pretty 
unfortunate way to clear a hash table -- adding a hash_reset() function 
to the dynahash API would be cleaner and faster.
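
The existing dynahash API can already express this as a sequential scan plus
HASH_REMOVE; a sketch, with the statement-hash names as illustrative
stand-ins:

    /* "prepared_queries" and PreparedStatement stand in for the backend's
     * statement hash; the dynahash calls themselves are the real API. */
    static void
    drop_all_prepared_statements(HTAB *prepared_queries)
    {
        HASH_SEQ_STATUS seq;
        PreparedStatement *entry;

        hash_seq_init(&seq, prepared_queries);
        while ((entry = (PreparedStatement *) hash_seq_search(&seq)) != NULL)
            hash_search(prepared_queries, entry->stmt_name,
                        HASH_REMOVE, NULL);  /* removing the current entry is safe */
    }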


-Neil




Re: [PATCHES] Small code clean-up

2007-03-29 Thread Bruce Momjian

I have applied the initdb part of this patch, and Magnus did the
win32_shmem part.  Thanks.

---

ITAGAKI Takahiro wrote:
 Here are two small code clean-ups in initdb and win32_shmem.
 
 pg_char_to_encoding() was redundant in initdb because
 pg_valid_server_encoding() returns the same result if the encoding is
 valid.
 
 The changes in win32_shmem suppress the following warnings:
 | pg_shmem.c: In function `PGSharedMemoryCreate':
 | pg_shmem.c:137: warning: long unsigned int format, Size arg (arg 2)
 | pg_shmem.c:159: warning: long unsigned int format, Size arg (arg 2)
 
 
 *** initdb.c.orig	Mon Mar 19 01:50:43 2007
 --- initdb.c	Wed Mar 28 10:02:42 2007
 *** get_encoding_id(char *encoding_name)
 *** 709,716 ****
   
   	if (encoding_name && *encoding_name)
   	{
 ! 		if ((enc = pg_char_to_encoding(encoding_name)) >= 0 &&
 ! 			pg_valid_server_encoding(encoding_name) >= 0)
   			return encodingid_to_string(enc);
   	}
   	fprintf(stderr, _("%s: \"%s\" is not a valid server encoding name\n"),
 --- 709,715 ----
   
   	if (encoding_name && *encoding_name)
   	{
 ! 		if ((enc = pg_valid_server_encoding(encoding_name)) >= 0)
   			return encodingid_to_string(enc);
   	}
   	fprintf(stderr, _("%s: \"%s\" is not a valid server encoding name\n"),
 
 
 *** win32_shmem.c.orig	Wed Mar 28 10:17:14 2007
 --- win32_shmem.c	Wed Mar 28 10:16:36 2007
 *** PGSharedMemoryCreate(Size size, bool mak
 *** 136,142 ****
   	if (!hmap)
   		ereport(FATAL,
   				(errmsg("could not create shared memory segment: %lu", GetLastError()),
 ! 				 errdetail("Failed system call was CreateFileMapping(size=%lu, name=%s)", size, szShareMem)));
   
   	/*
   	 * If the segment already existed, CreateFileMapping() will return a
 --- 136,142 ----
   	if (!hmap)
   		ereport(FATAL,
   				(errmsg("could not create shared memory segment: %lu", GetLastError()),
 ! 				 errdetail("Failed system call was CreateFileMapping(size=%lu, name=%s)", (unsigned long) size, szShareMem)));
   
   	/*
   	 * If the segment already existed, CreateFileMapping() will return a
 *** PGSharedMemoryCreate(Size size, bool mak
 *** 158,164 ****
   	if (!hmap)
   		ereport(FATAL,
   				(errmsg("could not create shared memory segment: %lu", GetLastError()),
 ! 				 errdetail("Failed system call was CreateFileMapping(size=%lu, name=%s)", size, szShareMem)));
   
   	if (GetLastError() == ERROR_ALREADY_EXISTS)
   		ereport(FATAL,
 --- 158,164 ----
   	if (!hmap)
   		ereport(FATAL,
   				(errmsg("could not create shared memory segment: %lu", GetLastError()),
 ! 				 errdetail("Failed system call was CreateFileMapping(size=%lu, name=%s)", (unsigned long) size, szShareMem)));
   
   	if (GetLastError() == ERROR_ALREADY_EXISTS)
   		ereport(FATAL,
 
 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Patch for circular buffer in tuplestore to optimize merge joins (v1)

2007-03-29 Thread Bruce Momjian

Via IM, author says it is ready.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


stark wrote:
 
 This patch implements a circular buffer in tuplestore which drops old tuples
 as they're no longer needed. It uses this for merge joins, to avoid having to
 spill the tuplestore if no single value exceeds work_mem. It is also what's
 needed for both recursive query support and OLAP window functions (hence why
 it implements the more complex circular buffer rather than just moving the
 single tuple up to the head of the buffer).
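
The core trick, as a minimal illustration (not the patch's code): ring-index
arithmetic makes trimming old tuples cheap, with no memmove of the survivors.

    /* Logical position P maps to slot (P % size); advancing "head" discards
     * tuples the merge join can no longer rescan.  palloc/pfree as in the
     * backend; malloc/free would do elsewhere. */
    typedef struct
    {
        void **slots;   /* array of "size" entries */
        int    size;
        long   head;    /* oldest still-needed logical position */
        long   tail;    /* next logical position to fill */
    } TupRing;

    #define RING_SLOT(r, pos)  ((r)->slots[(pos) % (r)->size])

    static void
    ring_trim(TupRing *r, long new_head)   /* drop tuples before new_head */
    {
        for (; r->head < new_head; r->head++)
            pfree(RING_SLOT(r, r->head));
    }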
 
 This was mostly already done by Simon, I just finished the logic in 
 tuplesort.c.
 
 This is actually not quite polished, so I guess it's still a WIP, but it's
 certainly ready to be reviewed. All that remains is polishing. If there's
 anything in there people object to now, I would like to know. 
 

[ Attachment, skipping... ]

 
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] [PATCH] add CLUSTER table USING index (take 2)

2007-03-29 Thread Tom Lane
Holger Schurig [EMAIL PROTECTED] writes:
 I agree that the example should be re-written. But I'm not sure if I need
 to have a paragraph about the old syntax. There are two reasons:
 - I haven't seen any other SQL command where an old syntax was
   documented

If we were deprecating the old syntax with intention to remove it, that
might be a defensible position, but I didn't think we were doing that.

IMHO both forms seriously do need to be documented so that people will
understand that the index/table order is different.  Otherwise there'll
be enormous confusion.

  - I thought I could get away without writing docs. After all, I'm
    not a native English speaker. That's a point where I could use
    some help ...  (maybe my English is good enough, but it's not
    worth making a take 4 to take 17 patch just for English
    grammar, typos, subtle meanings, whatever.)

Your English seems fine to me, certainly more than good enough to
produce first-draft documentation.  Whoever reviews/commits it will
help out as needed.

  Is the placement of opt_cluster_using completely arbitrary? I'm not very 
  familiar with the parser, but it really looks like those type definitions 
  are in random order.

 I thought so.

Yeah, it's just a mess :=(.  Somebody might go through and sort them
into alphabetical order or something someday, but not today.

regards, tom lane



Re: [PATCHES] Current enums patch

2007-03-29 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Tom Dunstan wrote:
 Hi all
 
 Here's the current version of the enums patch. Not much change from last 
 time; the only thought-inducing stuff was fixing up some macros that 
 changed with the VARLENA changes, and adding a regression test to do 
 basic checking of RI behavior, after the discussions that we had 
 recently on the ri_trigger stuff with generic types. The actual behavior 
 was fixed by Tom's earlier patch, so this is just a sanity check.
 
 Cheers
 
 Tom

[ application/x-gzip is not supported, skipping... ]

 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-29 Thread Koichi Suzuki

Josh;

I'd like to explain again what the term "compression" in my proposal
means, and to show the resource consumption compared with cp and gzip.


My proposal is to remove unnecessary full page writes (they are needed
in crash recovery from inconsistent or partial writes) when we copy WAL
to the archive log, and to rebuild them as dummies when we restore from
the archive log.  The dummy is needed to maintain LSNs.  So it is very
different from general-purpose compression such as gzip, although
pg_compresslog compresses the archive log as a result.
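
A minimal sketch of the dummy idea (illustrative only, not pg_compresslog's
actual code): the image is dropped on the archive side and re-padded with
zeros on restore, so every record keeps its original LSN.

    #include <stdio.h>
    #include <stdint.h>

    typedef struct
    {
        uint32_t orig_len;   /* hypothetical: bytes removed with the image */
    } StrippedImageHdr;

    /* restore side (pg_decompresslog): write a zero-filled dummy of the
     * original length */
    static void
    write_dummy_image(FILE *out, const StrippedImageHdr *hdr)
    {
        static const char zeros[8192];
        size_t remaining = hdr->orig_len;

        while (remaining > 0)
        {
            size_t chunk = remaining < sizeof(zeros) ? remaining : sizeof(zeros);
            fwrite(zeros, 1, chunk, out);
            remaining -= chunk;
        }
    }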


As to CPU and I/O consumption, I've evaluated it as follows:

1) Collect all the WAL segments.
2) Copy them by different means: cp, pg_compresslog and gzip.

and compared the elapsed time as well as other resource consumption.

Benchmark: DBT-2
Database size: 120WH (12.3GB)
Total WAL size: 4.2GB (after 60min. run)

Elapsed time:
  cp:              120.6 sec
  gzip:            590.0 sec
  pg_compresslog:   79.4 sec

Resultant archive log size:
  cp:              4.2 GB
  gzip:            2.2 GB
  pg_compresslog:  0.3 GB

Resource consumption:
  cp:             user:   0.5 sec  system: 15.8 sec  idle:  16.9 sec  I/O wait: 87.7 sec
  gzip:           user: 286.2 sec  system:  8.6 sec  idle: 260.5 sec  I/O wait: 36.0 sec
  pg_compresslog: user:   7.9 sec  system:  5.5 sec  idle:  37.8 sec  I/O wait: 28.4 sec

Because the resultant log size is considerably smaller than with cp or
gzip, pg_compresslog needs much less I/O, and because the logic is much
simpler than gzip's, it does not consume much CPU.


The term "compress" may not be appropriate.  We may call this "log
optimization" instead.


So I don't see any reason why this (at least the optimization mark in
each log record) can't be integrated.


Simon Riggs wrote:

On Thu, 2007-03-29 at 11:45 -0700, Josh Berkus wrote:


OK, different question:
Why would anyone ever set full_page_compress = off?

The only reason I can see is if compression costs us CPU but gains RAM &
I/O.  I can think of a lot of applications ... benchmarks included ...
which are CPU-bound but not RAM- or I/O-bound.  For those applications,
compression is a bad tradeoff.

If, however, the CPU used for compression is made up elsewhere through
smaller file processing, then I'd agree that we don't need a switch.


As I wrote in reply to Simon's comment, I am concerned about only one
thing.

Without a switch, because both the full page write and the corresponding
logical log are included in WAL, this will increase WAL size slightly
(maybe about five percent or so).   If everybody is happy with this, we
don't need a switch.




Koichi-san has explained things for me now.

I misunderstood what the parameter did and reading your post, ISTM you
have as well. I do hope Koichi-san will alter the name to allow
everybody to understand what it does.



Here are some candidates:
full_page_writes_optimize
full_page_writes_mark: meaning it marks each full page write as needed
in crash recovery, needed in archive recovery, and so on.


I don't insist on these names.  It would be very helpful if you have any
suggestion that reflects what it really means.


Regards;
--
Koichi Suzuki



Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 Here's an update, fixing conflict by Tom's recent commit of Simon's 
 patch to skip WAL-inserts when archiving is not enabled.

Hmm, wouldn't it be better if the rewriteheap.c file was in
access/heap/ instead of commands/?



-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-29 Thread Koichi Suzuki

Hi,

Here's a patch reflecting some of Simon's comments.

1) Removed an elog call in a critical section.

2) Changed the names of the commands to pg_compresslog and pg_decompresslog.

3) Changed the diff option used to make the patch.

--
Koichi Suzuki


pg_lesslog.tgz
Description: Binary data
