Re: [PATCHES] patch implementing the multi-argument aggregates (SOC project)

2006-07-26 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> This patch is nowhere near ready for submission :-(.  Most of the
> comments seem to be "I don't know what to do here" ...

Just to be clear, I think what Tom's saying is that it's not ready to be *applied*.
Sending patches to this list early and often during development is generally
encouraged.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Trivial patch to double vacuum speed on tables with no indexes

2006-08-27 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

> stark [EMAIL PROTECTED] writes:
>> There isn't really any need for the second pass in lazy vacuum if the table
>> has no indexes.
>
> How often does that case come up in the real world, for tables that are
> large enough that you'd care about vacuum performance?

Admittedly it's not the most common scenario, but it does come up: ETL
applications, for example, that load data and then perform some manipulation
of it before loading it into its final destination. If they have many updates
to do they'll probably have to do vacuums between some of them.

Arguably if you don't have any indexes on a large table it's quite likely to
be *because* you're planning on doing some big updates such that it'll be
faster to simply rebuild the indexes when you're done anyways.

I would have had the same objection if it resulted in substantially more
complex code but it was so simple that it doesn't seem like a concern.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] [PATCHES] Trivial patch to double vacuum speed on tables with no indexes

2006-08-28 Thread Gregory Stark
) {
+		maxtuples = (maintenance_work_mem * 1024L) / sizeof(ItemPointerData);
+		maxtuples = Min(maxtuples, INT_MAX);
+		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+		/* stay sane if small maintenance_work_mem */
+		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+	} else {
+		maxtuples = MaxHeapTuplesPerPage;
+	}
 
	vacrelstats->num_dead_tuples = 0;
	vacrelstats->max_dead_tuples = (int) maxtuples;
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Concurrent connections in psql patch

2006-09-03 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

> Is this something people are interested in?  I am thinking no based on
> the lack of requests and the size of the patch.

Lack of requests? I was actually surprised by how enthusiastically people
reacted to it.

However I don't think the patch as-is is ready to be committed. Aside from
missing documentation and regression tests it was only intended to be a
proof-of-concept and to be useful for specific tests I was doing.

I did try to do a decent job: I handled \timing and server-tracked variables
like encoding. But I need to go back through the code and make sure there are
no other details like that.

It would be nice to get feedback from other developers looking at the
patch to confirm that there aren't more fundamental problems with the approach
and how it uses libpq before I go through the effort of cleaning up the
details.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] [PATCHES] Trivial patch to double vacuum speed on tables with no indexes

2006-09-04 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

> Tom Lane wrote:
>> Bruce Momjian [EMAIL PROTECTED] writes:
>>> Patch applied.  Thanks.
>>
>> Wait a minute.   This patch changes the behavior so that
>> LockBufferForCleanup is applied to *every* heap page, not only the ones
>> where there are removable tuples.  It's not hard to imagine scenarios
>> where that results in severe system-wide performance degradation.
>> Has there been any real-world testing of this idea?
>
> I see the no-index case now:
>
> +   if (nindexes)
> +       LockBuffer(buf, BUFFER_LOCK_SHARE);
> +   else
> +       LockBufferForCleanup(buf);
>
> Let's see what Greg says, or revert.

Hm, that's a good point. I could return it to the original method where it
released the share lock and did the LockBufferForCleanup only if necessary. I
thought it was awkward to acquire a lock then release it to acquire a
different lock on the same buffer, but it's true that it doesn't always have
to acquire the second lock.

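For reference, a minimal sketch of that two-step approach (LockBuffer and
LockBufferForCleanup are the real bufmgr API; buf, nindexes, and ndead stand
in for the surrounding vacuum state):

/* Take the cheap lock first; upgrade only when cleanup is actually needed. */
LockBuffer(buf, BUFFER_LOCK_SHARE);
/* ... scan the page, counting dead tuples into ndead ... */
if (nindexes == 0 && ndead > 0)
{
	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
	LockBufferForCleanup(buf);	/* waits for exclusive access, no other pins */
	/* must re-examine the page here: it can change while unlocked */
}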

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Some editing of docs for create index concurrently

2006-09-11 Thread Gregory Stark


The references to data warehousing and large tables were bothering me since --
while true -- it's not really the main use case for CREATE INDEX CONCURRENTLY.
Really it's OLTP systems that may or may not have large tables but regardless
cannot stand the downtime caused by locks.


Index: doc/src/sgml/indices.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/indices.sgml,v
retrieving revision 1.58
diff -c -r1.58 indices.sgml
*** doc/src/sgml/indices.sgml	25 Aug 2006 04:06:44 -0000	1.58
--- doc/src/sgml/indices.sgml	11 Sep 2006 14:36:04 -0000
***************
*** 91,101 ****
    </para>
  
    <para>
!    Creating an index on a large table can take a long time.  By default,
!    <productname>PostgreSQL</productname> allows reads (selects) to occur
!    on the table in parallel with index creation, but writes (inserts,
!    updates, deletes) are blocked until the index build is finished.
!    It is possible to allow writes to occur in parallel with index
     creation, but there are several caveats to be aware of &mdash;
     for more information see <xref linkend="SQL-CREATEINDEX-CONCURRENTLY"
     endterm="SQL-CREATEINDEX-CONCURRENTLY-title">.
--- 91,101 ----
    </para>
  
    <para>
!    By default when creating indexes <productname>PostgreSQL</productname>
!    allows reads (selects) to occur on the table being indexed in parallel with
!    index creation, but writes (inserts, updates, deletes) are blocked until
!    the index build is finished. In production environments this is often
!    unacceptable. It is possible to allow writes to occur in parallel with index
     creation, but there are several caveats to be aware of &mdash;
     for more information see <xref linkend="SQL-CREATEINDEX-CONCURRENTLY"
     endterm="SQL-CREATEINDEX-CONCURRENTLY-title">.
cvs diff: Diffing doc/src/sgml/ref
Index: doc/src/sgml/ref/create_index.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/create_index.sgml,v
retrieving revision 1.56
diff -c -r1.56 create_index.sgml
*** doc/src/sgml/ref/create_index.sgml	25 Aug 2006 04:06:45 -0000	1.56
--- doc/src/sgml/ref/create_index.sgml	11 Sep 2006 14:36:06 -0000
***************
*** 264,281 ****
   </indexterm>
  
   <para>
!   Creating an index for a large table can be a long operation. In large data
!   warehousing applications it can easily take hours or even days to build
!   indexes. It's important to understand the impact creating indexes has on a
!   system.
!  </para>
! 
!  <para>
    Normally <productname>PostgreSQL</> locks the table to be indexed against
    writes and performs the entire index build with a single scan of the
    table. Other transactions can still read the table, but if they try to
    insert, update, or delete rows in the table they will block until the
!   index build is finished.
   </para>
  
   <para>
--- 264,278 ----
   </indexterm>
  
   <para>
!   Creating an index can interfere with regular operation of a database.
    Normally <productname>PostgreSQL</> locks the table to be indexed against
    writes and performs the entire index build with a single scan of the
    table. Other transactions can still read the table, but if they try to
    insert, update, or delete rows in the table they will block until the
!   index build is finished. This could have a severe effect if the system is
!   a live production database. Large tables can take many hours to be
!   indexed, and even smaller tables can lock out writers for unacceptably
!   long periods for a production system.
   </para>
  
   <para>


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] More doc patches

2006-09-19 Thread Gregory Stark
.
! </para>
! 
! <sect2 id="ddl-partitioning-managing-partitions">
! <title>Managing Partitions</title>
! 
! <para>
!   Normally the set of partitions established when initially defining the
!   table is not intended to remain static. It's common to want to remove
!   old partitions of data from the partitioned tables and add new partitions
!   for new data periodically. One of the most important advantages of
!   partitioning is precisely that it allows this otherwise painful task to
!   be executed nearly instantaneously by manipulating the partition
!   structure rather than moving the large amounts of data around physically.
! </para>
! 
! <para>
!   The simplest option for removing old data is to simply DROP the partition
!   that is no longer necessary.
! 
! <programlisting>
! DROP TABLE measurement_y2003m02
! </programlisting>
! 
!   This can very quickly delete millions of records because it doesn't have
!   to individually delete every record. (Note however that some filesystems
!   may still take some time to delete large files.)
! </para>
! 
! <para>
!   Another option that is often preferable is to remove the partition from
!   the partitioned table but retain access to it as a table in its own
!   right.
  
  <programlisting>
  ALTER TABLE measurement_y2003m02 NO INHERIT measurement
  </programlisting>
  
!   This allows further operations on the data before actually dropping it.
!   Often this is a useful time to back up the data using COPY, pg_dump, or
!   other tools, for example. It can also be a useful time to aggregate data
!   into smaller formats, perform other data manipulations, or run reports.
! </para>
! 
! <para>
!   Similarly we can add a new partition to handle new data. We can create an
!   empty partition in the partitioned table just as the original partitions
!   were created above.
! 
! <programlisting>
! CREATE TABLE measurement_y2006m02 (
!     CHECK ( logdate &gt;= DATE '2006-02-01' AND logdate &lt; DATE '2006-03-01' )
! ) INHERITS (measurement);
! </programlisting>
  
!   Alternatively sometimes it's easier to create the table outside the
!   partition structure and move it in later. This allows the data to be
!   loaded and checked prior to it appearing in the partitioned table. It
!   also allows any data manipulation necessary to be performed without the
!   intermediate data appearing in queries against the partitioned table.
  
  <programlisting>
! CREATE TABLE measurement_y2006m02 (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
  \COPY measurement_y2006m02 FROM 'measurement_y2006m02'
  UPDATE ...
  ALTER TABLE measurement_y2006m02 ADD CONSTRAINT y2006m02 CHECK ( logdate &gt;= DATE '2006-02-01' AND logdate &lt; DATE '2006-03-01' );
  ALTER TABLE measurement_y2006m02 INHERIT measurement;
  </programlisting>
  
  </para>
  
  
+ <sect2 id="ddl-partitioning-caveats">
  <para>
! The following caveats currently apply to partitioned tables:
  <itemizedlist>
   <listitem>
    <para>


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] More doc patches

2006-09-19 Thread Gregory Stark

Neil Conway [EMAIL PROTECTED] writes:

> BTW, I personally prefer patches to be proper MIME attachments, created
> as a patch against the root of the Postgres source tree. However, I'm
> not sure what the other committers prefer.

Good to know. I usually generate diffs with cvs diff -c.
I don't know why I did it differently here; weird.
Sorry about that.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] [PATCHES] Eliminating phase 3 requirement for varlen increases via ALTER COLUMN

2006-10-26 Thread Gregory Stark

Jonah H. Harris [EMAIL PROTECTED] writes:

> On 10/26/06, Tom Lane [EMAIL PROTECTED] wrote:
>> This makes some really quite unacceptable assumptions about
>> the meaning and encoding of typmod ...
>
> True, so VARCHAR seems like the only one?  That's the only one I've
> really encountered in the field on a fairly regular basis.

I think what you want is to add a new entry in pg_type that lets a type
declare a method telling you whether a change is work-free or not. Then any
type, even user-defined types, can allow some changes to be work-free and
some not, without exposing any implementation details outside the type.

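To make that concrete, here is a sketch of what such a method might look
like; this is entirely hypothetical, since no such pg_type column exists
today:

/* Hypothetical: a per-type function, registered via pg_type, that reports
 * whether changing the typmod requires rewriting the stored data. */
typedef bool (*typmod_is_noop_fn) (int32 old_typmod, int32 new_typmod);

/* For varchar, growing (or dropping) the length limit is work-free.
 * The typmod is the max length plus VARHDRSZ, with -1 meaning unlimited. */
static bool
varchar_typmod_is_noop(int32 old_typmod, int32 new_typmod)
{
	return new_typmod < 0 ||
		(old_typmod >= 0 && new_typmod >= old_typmod);
}
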
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

> The added WAL volume should be pretty minimal, because only tuples that have
> gone untouched for a long time incur extra work.

That seems like a weak point in the logic. It seems like it would make VACUUM,
which is already an I/O hog, even more so. Perhaps something clever can be done
with vacuum_cost_delay and commit_siblings.

Something like inserting the delay between WAL logging and syncing the log and
writing to the heap. So if another transaction commits in the meantime we can
skip the extra fsync and continue.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Gregory Stark
Andrew Dunstan [EMAIL PROTECTED] writes:

> I'm not a big fan of ordering columns to optimise record layout, except in the
> most extreme cases (massive DW type apps). I think visible column order should
> be logical, not governed by physical considerations.

Well, as long as we're talking shoulds, the database should take care of this
for you anyways.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] WIP patch for operator families

2006-12-22 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> Attached is the current state of my work on generalizing operator classes
> into operator families, as per discussion last week.  It's far from
> ready to apply, but I thought I'd circulate it for comments.  What's done
> is ...
>
> Comments?

Didn't we say discussion should happen on -hackers so development progress
would be more visible?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] Assorted typos

2007-01-04 Thread Gregory Stark

A few assorted typos and grammar corrections I caught while skimming the source.


Index: src/backend/access/gin/ginvacuum.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginvacuum.c,v
retrieving revision 1.7
diff -c -r1.7 ginvacuum.c
*** src/backend/access/gin/ginvacuum.c	4 Oct 2006 00:29:48 -0000	1.7
--- src/backend/access/gin/ginvacuum.c	4 Jan 2007 12:45:44 -0000
***************
*** 34,40 ****
  /*
   * Cleans array of ItemPointer (removes dead pointers)
   * Results are always stored in *cleaned, which will be allocated
!  * if its needed. In case of *cleaned!=NULL caller is resposible to
   * enough space. *cleaned and items may point to the same
   * memory addres.
   */
--- 34,40 ----
  /*
   * Cleans array of ItemPointer (removes dead pointers)
   * Results are always stored in *cleaned, which will be allocated
!  * if it's needed. In case of *cleaned!=NULL caller is resposible to
   * enough space. *cleaned and items may point to the same
   * memory addres.
   */
Index: src/backend/access/heap/tuptoaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/heap/tuptoaster.c,v
retrieving revision 1.66
diff -c -r1.66 tuptoaster.c
*** src/backend/access/heap/tuptoaster.c	5 Oct 2006 23:33:33 -0000	1.66
--- src/backend/access/heap/tuptoaster.c	4 Jan 2007 12:45:45 -0000
***************
*** 1113,1119 ****
  	return;
  
  	/*
! 	 * Open the toast relation and it's index
  	 */
  	toastrel = heap_open(attr->va_content.va_external.va_toastrelid,
  						 RowExclusiveLock);
--- 1113,1119 ----
  	return;
  
  	/*
! 	 * Open the toast relation and its index
  	 */
  	toastrel = heap_open(attr->va_content.va_external.va_toastrelid,
  						 RowExclusiveLock);
***************
*** 1337,1343 ****
  	endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
  
  	/*
! 	 * Open the toast relation and it's index
  	 */
  	toastrel = heap_open(attr->va_content.va_external.va_toastrelid,
  						 AccessShareLock);
--- 1337,1343 ----
  	endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
  
  	/*
! 	 * Open the toast relation and its index
  	 */
  	toastrel = heap_open(attr->va_content.va_external.va_toastrelid,
  						 AccessShareLock);
Index: src/include/catalog/pg_type.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_type.h,v
retrieving revision 1.172
diff -c -r1.172 pg_type.h
*** src/include/catalog/pg_type.h	4 Oct 2006 00:30:08 -0000	1.172
--- src/include/catalog/pg_type.h	4 Jan 2007 12:45:59 -0000
***************
*** 236,242 ****
   * 
   */
  
! /* keep the following ordered by OID so that later changes can be made easier*/
  
  /* Make sure the typlen, typbyval, and typalign values here match the initial
     values for attlen, attbyval, and attalign in both places in pg_attribute.h
--- 236,242 ----
   * 
   */
  
! /* keep the following ordered by OID so that later changes can be made more easily */
  
  /* Make sure the typlen, typbyval, and typalign values here match the initial
     values for attlen, attbyval, and attalign in both places in pg_attribute.h


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> Gregory Stark [EMAIL PROTECTED] writes:
>> What did you think about protecting against torn writes using id numbers
>> every 512 bytes?
>
> Pretty much not happening; or are you volunteering to fix every part of
> the system to tolerate injections of inserted data anywhere in a stored
> datum?

I was thinking to do it at a low level as the xlog records are prepared to be
written to the filesystem and as the data is being read from disk. I haven't
read that code yet to see where to inject it but I understand there's already
a copy happening and it could be done there. Even if we optimize out all the
copies we could do it in the actual i/o call using readv/writev.

I wasn't thinking of doing it on actual disk buffers since it doesn't help us
avoid full page writes, but it could be done there too using readv/writev in
the smgr. That might be useful for a full_page_writes=off system so even if it
can't guarantee no corruption at least it can guarantee no silent corruption.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> Gregory Stark [EMAIL PROTECTED] writes:
>> Tom Lane [EMAIL PROTECTED] writes:
>>> Pretty much not happening; or are you volunteering to fix every part of
>>> the system to tolerate injections of inserted data anywhere in a stored
>>> datum?
>>
>> I was thinking to do it at a low level as the xlog records are prepared to be
>> written to the filesystem and as the data is being read from disk. I haven't
>> read that code yet to see where to inject it but I understand there's already
>> a copy happening and it could be done there.
>
> You understand wrong ... a tuple sitting on disk is normally read
> directly from the shared buffer, and I don't think we want to pay for
> copying it.

xlog records


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

> Oh, sorry, had the wrong context in mind.  I'm still not very impressed
> with the idea --- a CRC check will catch many kinds of problems, whereas
> this approach catches exactly one kind of problem.

Well in fairness I tossed in a throwaway comment at the end of that email
about heap pages. I'll do the same here since I can't resist. But the main
thread here is about xlog really.

It just seems to me like it's better to target each problem with a solution
that addresses it directly than to have one feature that we hope addresses
them all more or less.

Having a CRC in WAL but not in the heap seems kind of pointless. If your
hardware is unreliable the corruption could be anywhere. Depending on it to solve
multiple problems means we can't offer the option to disable it because it
would affect other things as well.

What I would like to see is a CRC option that would put CRC checks in every
disk page whether heap, index, WAL, control file, etc. I think we would
default that to off to match our current setup most closely.

Separately we would have a feature in WAL to detect torn pages so that we can
reliably detect the end of valid WAL. That would have to always be on. But
having it as a separate feature means the CRC could be optional.

Also, incidentally like I mentioned in my previous email, we could do the torn
page detection in heap pages too by handling it in the smgr using
readv/writev. No copies, no corrupted datums. Essentially the tags would be
inserted on the fly as the data was copied into kernel space.

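To illustrate the no-copy part, here is a rough standalone sketch of tag
injection using writev; the 512-byte sector and 4-byte tag sizes are
assumptions, and error handling and short-write retries are omitted:

#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SECTOR	512
#define TAGSZ	sizeof(uint32_t)
#define CHUNK	(SECTOR - TAGSZ)	/* payload bytes per sector */

/* Write buf with the same 4-byte tag injected at the start of every
 * sector-sized piece.  The kernel gathers the pieces, so the page itself
 * is never copied or modified in user space. */
static ssize_t
write_tagged(int fd, const char *buf, size_t len, uint32_t tag)
{
	size_t		nchunks = (len + CHUNK - 1) / CHUNK;
	struct iovec iov[2 * nchunks];
	uint32_t	tags[nchunks];
	size_t		i, off = 0;

	for (i = 0; i < nchunks; i++)
	{
		size_t		n = (len - off < CHUNK) ? len - off : CHUNK;

		tags[i] = tag;
		iov[2 * i].iov_base = &tags[i];
		iov[2 * i].iov_len = TAGSZ;
		iov[2 * i + 1].iov_base = (void *) (buf + off);
		iov[2 * i + 1].iov_len = n;
		off += n;
	}
	return writev(fd, iov, (int) (2 * nchunks));
}
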
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [pgsql-patches] Recalculating OldestXmin in a long-running vacuum

2007-01-17 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

> Tom Lane wrote:
>
>> In any case I'd like to see some evidence of significant real-world
>> benefit before adding such a conceptual wart ...
>
> I've asked our testers to do a TPC-C run with and without the
> patch. I'm not expecting it to make a huge difference there, but if you're
> using a big cost delay, it could make quite a difference for such a simple
> thing.

I think the key here is that it removes the size of the table from the list of
factors that govern the steady state. Currently, as the table grows, the maximum
age of the snapshot vacuum uses gets older too, which means a greater
percentage of the table is dedicated to dead tuple overhead (which in turn
means a larger table, which means an even greater percentage of dead tuples...).

With the patch the size of the table is no longer a factor. As long as the
work vacuum has to do on a given pass can keep up with the rate of creating
dead tuples, it won't matter how large the table is. The only factors that
determine the steady state are the rate of creation of dead tuples and the
rate at which vacuum can process them.

Unfortunately indexes, again, mean that's not entirely true. As the table
grows the indexes will grow as well and that will slow vacuum down. Though
indexes are usually smaller than tables.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] WIP: Recursive Queries

2007-02-06 Thread Gregory Stark
(rtindex, pstate->p_rtable));
--- 693,720 
  	if (IsA(n, RangeVar))
  	{
  		/* Plain relation reference */
+ 		RangeVar *rv = (RangeVar *)n;
  		RangeTblRef *rtr;
! 		RangeTblEntry *rte = NULL;
  		int			rtindex;
+ 		ListCell *cte;
+ 
+ 		if (pstate->p_ctenamespace && !rv->schemaname) {
+ 			/* We have to check if this is a reference to a common table
+ 			 * expression (ie subquery defined in the WITH clause) */
+ 			foreach(cte, pstate-p_ctenamespace)
+ 			{
+ RangeSubselect *r = (RangeSubselect*) lfirst(cte);
+ if (!strcmp(rv->relname, r->alias->aliasname))
+ {
+ 	rte = transformRangeCTE(pstate, rv, r);
+ 	break;
+ }
+ 			}
+ 		}
+ 		if (!rte)
+ 			rte = transformTableEntry(pstate, rv);
  
  		/* assume new rte is at end */
  		rtindex = list_length(pstate-p_rtable);
  		Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
Index: src/include/nodes/parsenodes.h
===
RCS file: /home/stark/src/REPOSITORY/pgsql/src/include/nodes/parsenodes.h,v
retrieving revision 1.338
diff -c -r1.338 parsenodes.h
*** src/include/nodes/parsenodes.h	9 Jan 2007 02:14:15 -	1.338
--- src/include/nodes/parsenodes.h	30 Jan 2007 18:09:27 -
***
*** 789,794 
--- 789,795 
  	/*
  	 * These fields are used only in upper-level SelectStmts.
  	 */
+ 	List		*with_cte_list; /* List of Common Table Expressions (ie WITH clause) */
  	SetOperation op;			/* type of set op */
  	bool		all;			/* ALL specified? */
  	struct SelectStmt *larg;	/* left child */
Index: src/include/parser/parse_clause.h
===
RCS file: /home/stark/src/REPOSITORY/pgsql/src/include/parser/parse_clause.h,v
retrieving revision 1.48
diff -c -r1.48 parse_clause.h
*** src/include/parser/parse_clause.h	9 Jan 2007 02:14:15 -	1.48
--- src/include/parser/parse_clause.h	30 Jan 2007 16:07:39 -
***
*** 16,21 
--- 16,22 
  
  #include parser/parse_node.h
  
+ extern void transformWithClause(ParseState *pstate, List *with_cte_list);
  extern void transformFromClause(ParseState *pstate, List *frmList);
  extern int setTargetTable(ParseState *pstate, RangeVar *relation,
  			   bool inh, bool alsoSource, AclMode requiredPerms);
Index: src/include/parser/parse_node.h
===
RCS file: /home/stark/src/REPOSITORY/pgsql/src/include/parser/parse_node.h,v
retrieving revision 1.51
diff -c -r1.51 parse_node.h
*** src/include/parser/parse_node.h	5 Jan 2007 22:19:57 -	1.51
--- src/include/parser/parse_node.h	30 Jan 2007 16:07:01 -
***
*** 58,63 
--- 58,70 
   * of ParseStates, only the topmost ParseState contains paramtype info; but
   * we copy the p_variableparams flag down to the child nodes for speed in
   * coerce_type.
+  *
+  * [1] Note that p_ctenamespace is a namespace for relations but distinct
+  * from p_relnamespace. p_ctenamespace is the list of relations that can be
+  * referred to in a FROM or JOIN clause. p_relnamespace is the list of
+  * relations which already have range table entries and therefore can be referred to in
+  * qualified variable references. Also, note that p_ctenamespace is a list
+  * of RangeSubselects, not a list of range table entries.
   */
  typedef struct ParseState
  {
***
*** 68,73 
--- 75,81 
   * node's fromlist) */
  	List	   *p_relnamespace; /* current namespace for relations */
  	List	   *p_varnamespace; /* current namespace for columns */
+ 	List	   *p_ctenamespace; /* current namespace for common table expressions [1] */
  	Oid		   *p_paramtypes;	/* OIDs of types for $n parameter symbols */
  	int			p_numparams;	/* allocated size of p_paramtypes[] */
  	int			p_next_resno;	/* next targetlist resno to assign */



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] LIMIT/SORT optimization

2007-02-07 Thread Gregory Stark

Earlier I mentioned I had implemented the limited sort optimization. This
kicks in quite frequently on web applications that implement paging. Currently
Postgres has to sort the entire data set, often in a disk sort. If the page is
early in the result set it's a lot more efficient to just keep the top n
records in memory and do a single pass through the result set.

The patch is quite simple and is mostly localized to tuplesort.c which
provides a new function to advise it of the maximum necessary. It uses the
existing heapsort functionality which makes it easy to keep the top n records
by peeking at the top element of the heap and removing it if the new record
would displace it.

In experimenting I found heap sort to be about half the speed of quicksort, so
I made it switch over to heap sort if the input size reaches 2x the specified
maximum or if it can avoid spilling to disk by switching.

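To make the mechanism concrete, here is a tiny standalone sketch of the
bounded-heap idea on plain ints; tuplesort does the same thing with its
existing tuple heap, and the names here are purely illustrative:

/* Keep the N smallest values seen so far in a max-heap: the root is the
 * worst survivor, so a new value enters only if it beats the root. */
static void
sift_down(int *heap, int n, int i)
{
	for (;;)
	{
		int		l = 2 * i + 1;
		int		r = 2 * i + 2;
		int		big = i;

		if (l < n && heap[l] > heap[big])
			big = l;
		if (r < n && heap[r] > heap[big])
			big = r;
		if (big == i)
			break;
		{
			int		tmp = heap[i];

			heap[i] = heap[big];
			heap[big] = tmp;
		}
		i = big;
	}
}

/* Offer one more input value to a full heap of the N best so far. */
static void
offer(int *heap, int n, int value)
{
	if (value < heap[0])		/* would displace the current worst */
	{
		heap[0] = value;
		sift_down(heap, n, 0);
	}
}
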
The two open issues (which are arguably the same issue) are how to get the
information down to the sort node and how to cost the plan. Currently it's a
bit of a hack: the Limit node peeks at its child and if it's a sort it calls a
special function to provide the limit.



limit-sort.patch.4
Description: Binary data





-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] [test]

2007-02-22 Thread Gregory Stark

[testing to see if -patches allows me to post yet. I sent a patch last night
but haven't seen it come through]

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Short varlena headers

2007-02-22 Thread Gregory Stark

I've tried repeatedly to send this patch but it doesn't seem to be getting
through. It's not in the archives or my inbox. Now that I look I realized that
none of the WIP versions of the patch that I sent arrived either. That's
fairly disappointing since I had made efforts to keep people apprised of the
development.

Here's a working patch that provides 1-byte headers and unaligned storage for
varlena data types storing < 128 bytes (actually <= 126 bytes).

 http://community.enterprisedb.com/varlena/patch-varvarlena-9.patch.gz


Things it does:

1) Changes the varlena api to use SET_VARSIZE instead of VARATT_SIZEP

2) Changes the heap_form_tuple api in a subtle way: attributes in a heap tuple
   may need to be detoasted even if the tuple is never written out to disk.

3) Changes the GETSTRUCT api in another subtle way: it's no longer safe to
   access the first varlena in a tuple directly through the GETSTRUCT
   interface. At least not unless special care is taken.

4) Saves a *ton* of space because it saves 3 of the 4 bytes of varlena
   overhead *and* the up to 4 bytes of alignment padding before every varlena.
   That turns out to be a lot more space than I realized. I'll post some
   sample schemas with space savings later.

5) Passes all postgres regression tests. 

Things it doesn't do:

1) 2-byte headers for objects that exceed 128 bytes :(

2) 0-byte headers for single ascii characters :(

3) avoid htonl/ntohl by using low order bits on little-endian machines

4) provide an escape hatch for types or columns that don't want this
   behaviour. Currently int2vector and oidvector are specifically exempted
   since they're used in the system tables, sometimes through the GETSTRUCT
   api. I doubt anything not used in the system tables has any business being
   exempted which only leaves us with the occasional text attribute which I
   plan to double check aren't problems.

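As a rough illustration of the 1-byte header idea, here is a standalone
sketch of the encoding concept; this is not the patch's actual bit layout,
just the general trick:

#include <stdint.h>
#include <string.h>

/* Use the low bit of the first byte as a "short header" flag, leaving
 * 7 bits for the total size -- hence the limit of 126 bytes of payload. */
#define IS_SHORT(p)			((((const uint8_t *) (p))[0] & 0x01) != 0)
#define SHORT_SIZE(p)		(((const uint8_t *) (p))[0] >> 1)	/* incl. header */
#define SET_SHORT(p, sz)	(((uint8_t *) (p))[0] = (uint8_t) (((sz) << 1) | 0x01))

/* Pack len (<= 126) bytes with a 1-byte header; returns bytes used.
 * No alignment padding is needed since everything is byte-addressed. */
static size_t
pack_short(uint8_t *dst, const void *data, size_t len)
{
	SET_SHORT(dst, len + 1);
	memcpy(dst + 1, data, len);
	return len + 1;
}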

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] 1-byte packed varlena headers

2007-02-26 Thread Gregory Stark


Updated patch at:

 http://community.enterprisedb.com/varlena/patch-varvarlena-12.patch.gz

This fixes a rather critical oversight which caused it all to appear to work
but not actually save any space. I've added an assertion check that the
predicted tuple size matches the tuple size generated by heap_form*tuple.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Numeric patch to add special-case representations for 8 bytes

2007-02-26 Thread Gregory Stark

I've uploaded a quick hack to store numerics in < 8 bytes when possible.

 http://community.enterprisedb.com/numeric-hack-1.patch

This is a bit of a kludge since it doesn't actually provide any interface for
external clients of the numeric module to parse the resulting data. Ie, the
macros in numeric.h will return garbage.

But I'm not entirely convinced that matters. It's not like those macros were
really useful to any other modules anyways since there was no way to extract
the actual digits.

The patch is also slightly unsatisfactory because as I discovered numbers like
1.1 are stored as two digits currently. But it does work and it does save a
substantial amount of space for integers.

postgres=# select n,pg_column_size(n) from n;
n | pg_column_size 
--+
1 |  2
   11 |  2
  101 |  2
 1001 |  3
10001 |  9
   11 |  9
  1.1 |  9
 10.1 |  9
100.1 |  9
   1000.1 |  9
  1.1 | 11
 10.1 | 11

I had hoped to get the second batch to be 3-4 bytes. But even now note how
much space is saved for small integers.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Updated Packed Varlena (relative to CVS post SET_VARSIZE changes)

2007-02-28 Thread Gregory Stark

I've updated the patch against Tom's recent commit of the SET_VARSIZE changes.
It's about 40% the size it was before, most of the remaining patch is actually
heaptuple.c and tuptoaster.c which is the key code.

 http://community.enterprisedb.com/varlena/patch-varvarlena-13.patch.gz

I had fixed most of contrib last night but I pulled it out of this patch to
keep the size down. I'll send in a separate patch for contrib.

I'm not sure what to do about all the GIST data types. All those index support
functions take C data structures as datums and I can't tell which ones are
supposed to be varlenas and which are just random C data structures.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Numeric patch to add special-case representations for 8 bytes

2007-02-28 Thread Gregory Stark

Patric Bechtel [EMAIL PROTECTED] writes:

> Maybe you want to have a look here:
> http://www2.hursley.ibm.com/decimal/DPDecimal.html

Well we're not really looking for the optimal packing in general. All the
problems here have to do with convenience in the implementation rather than
problems with the approach. It's easier in the code to have a fixed-size
scale and weight and to have them at the beginning of the data type, which is
fine for larger values but for small values they dominate and waste space.

But as far as the approach goes, I admit I find it a bit hard to believe that
we're still doing BCD in the 21st century at all.

If we stored the digits in base 2 with a base 10 exponent would it really be
too hard to output the digits? Even long multiplication and then reversing the
results doesn't seem like it would be too bad and surely there must exist
reasonable algorithms to do better.

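For what it's worth, getting decimal digits out of a machine-word integer is
just repeated division; multi-word values need long division but the idea is
the same. A sketch:

#include <stdint.h>

/* Write the decimal digits of v into out (which must hold at least 21
 * bytes); digits come out least-significant first and are then reversed. */
static int
decimal_digits(uint64_t v, char *out)
{
	int		n = 0;
	int		i;

	do
	{
		out[n++] = (char) ('0' + v % 10);
		v /= 10;
	} while (v > 0);

	for (i = 0; i < n / 2; i++)
	{
		char	t = out[i];

		out[i] = out[n - 1 - i];
		out[n - 1 - i] = t;
	}
	out[n] = '\0';
	return n;
}
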
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Numeric patch to add special-case representations for 8 bytes

2007-03-02 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> Patric Bechtel [EMAIL PROTECTED] writes:
>> Tom Lane wrote on 02.03.2007 14:38:
>>> Exact decimal fractions are no longer exact when converted to base 2.
>>
>> I think multiplying with base 10 until it's a whole number, then saving
>> that exponent with it, that's how I understood it.
>
> That hardly seems likely to win in terms of calculation efficiency ---
> for example, adding two numbers will now likely require a multiplication
> in order to align the values for addition.  Having to store the exponent
> doesn't sound that great for the original complaint of too much overhead
> for short numbers, either...

Adding two numbers with two different exponents would require multiplying to
set the exponents equal. 

a) normally you're adding together numbers of the same type, so you'll usually
   have the same precision.

b) It's only going to hurt for very large numbers where handling base-10^n
   numbers is *so* bad that the argument kind of breaks down.

c) I was picturing storing the exponent relative to the decimal place instead
   of relative to the first digit as we do now. That means numbers like 1x10^6
   might take more space but numbers like 123456 would take less since we
   could define a missing exponent to represent an exponent of 0.

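A sketch of what the alignment in (a) amounts to, treating each value as
coefficient * 10^exponent; overflow handling is omitted:

#include <stdint.h>

/* Add (c1 * 10^e1) + (c2 * 10^e2): scale whichever coefficient has the
 * larger exponent until the exponents match, then it's an integer add.
 * When both inputs share an exponent the loops never run at all. */
static void
decimal_add(int64_t c1, int e1, int64_t c2, int e2,
			int64_t *cr, int *er)
{
	while (e1 > e2)
	{
		c1 *= 10;
		e1--;
	}
	while (e2 > e1)
	{
		c2 *= 10;
		e2--;
	}
	*cr = c1 + c2;
	*er = e1;
}
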
Incidentally -- doing what I just castigated Jonah for doing -- I looked
around and both libgmp and the java math.BigDecimal libraries use base-2
digits. The latter with base-10 exponents no less.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> Simon Riggs [EMAIL PROTECTED] writes:
>> I'm slightly worried though since that seems to have changed from 8.2,
>> which I oprofiled over Christmas.
>
> If you were testing a case with wider rows than Heikki tested, you'd see
> less impact --- the cost of the old way was O(N^2) in the number of
> tuples that fit on a page, so the behavior gets rapidly worse as you get
> down to smaller tuple sizes.  (Come to think of it, the cmin/cmax
> collapse would be a factor here too.)

Or larger block sizes of course. A 32kB block would be 16x as bad, which starts
to be pretty serious.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

> Simon Riggs [EMAIL PROTECTED] writes:
>> Feedback from someone else looking to the problem last year. IIRC there
>> was a feeling that if we didn't have to search for delimiters the COPY
>> FROM input parsing could be easier.
>
> Of what use is the above comment?  You have to parse the input into
> fields somehow.

Well those two requirements aren't inconsistent if you're using fixed-width
input text files. Not that I'm enamoured of such file formats, just saying. 

The only file format I ever wanted when I was doing this kind of stuff is
tab-separated; all the others (comma-separated and (egads) *pipe*-separated?!)
are just disasters.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Heap page diagnostic/test functions (WIP)

2007-03-06 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

> I'll return the infomasks directly, for you to manipulate.
>
> Not happy with that, but open to suggestions.

Well the alternative would be a long list of boolean columns which would make
the output kind of long.

Perhaps a function pg_decode_infomask(varbit) which returns a ROW of booleans
with appropriate names would be a good compromise. If you want it you could
use it in your query.

Or perhaps you could include a ROW of booleans in your output already,
something like:

postgres=# insert into tuple_info values (b'000', ROW(false,false,false));
INSERT 0 1

postgres=# select * from tuple_info;
 infomask_bits | infomask_flags 
---+
 000   | (f,f,f)
(1 row)

postgres=# select (infomask_flags).* from tuple_info;
 flag_a | flag_b | flag_c 
++
 f  | f  | f
(1 row)

That might be kind of tricky to cons up though. I had to create a table to do
it here. 

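For the decoding itself, here is a sketch against the real HEAP_* flag bits
from access/htup.h; the set-returning plumbing to expose this as a ROW is
omitted, and the struct here is illustrative:

#include "postgres.h"
#include "access/htup.h"

typedef struct InfomaskFlags
{
	bool		hasnull;
	bool		xmin_committed;
	bool		xmin_invalid;
	bool		xmax_committed;
	bool		xmax_invalid;
	bool		updated;
} InfomaskFlags;

static InfomaskFlags
decode_infomask(uint16 infomask)
{
	InfomaskFlags flags;

	flags.hasnull = (infomask & HEAP_HASNULL) != 0;
	flags.xmin_committed = (infomask & HEAP_XMIN_COMMITTED) != 0;
	flags.xmin_invalid = (infomask & HEAP_XMIN_INVALID) != 0;
	flags.xmax_committed = (infomask & HEAP_XMAX_COMMITTED) != 0;
	flags.xmax_invalid = (infomask & HEAP_XMAX_INVALID) != 0;
	flags.updated = (infomask & HEAP_UPDATED) != 0;
	return flags;
}
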
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Packed Varlenas update

2007-03-08 Thread Gregory Stark

. Fixed all regressions in contrib (gist indexed) modules 
. Fixed a bug that occurs when a compressed datum ends up under 128 bytes
. Fixed a bug on zero-column tuples
. Added regression tests

 http://community.enterprisedb.com/varlena/patch-varvarlena-16.patch.gz

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Updated packed varlena patch

2007-03-09 Thread Gregory Stark

I added a little-endian special-case using the low order bits as described by
Tom.

Also, I noticed that int2vector and oidvector are already flagged in pg_type
with attstorage == 'p'. Really any type with attstorage == 'p' should be
exempted from packed varlena handling since it isn't expecting to be toasted.

This lets me clean up heaptuple.c a bit so now it doesn't depend on pg_type.h
and doesn't have any special case exemption for oidvector and int2vector. It
also means any other type can opt out from packed varlena packing by setting
attstorage. I've left inet and cidr with toast support and changed them to
have attstorage 'm' instead.

 http://community.enterprisedb.com/varlena/patch-varvarlena-17.patch.gz

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] non-recursive WITH clause support

2007-03-09 Thread Gregory Stark

Here's an updated patch that fixes the bug I had. This is now functional basic
non-recursive WITH clause support.

 http://community.enterprisedb.com/recursive/with-pg82stable-v2.patch.gz

It's a pretty short simple patch as is; it just directly inlines any WITH
clauses as if they had been written as subqueries. We'll have to do something
much more clever to get recursive queries to work but for non-recursive
queries that's sufficient.

Example:

postgres=# with a as (select 1 as x) select * from (select * from a) as x;
 x 
---
 1
(1 row)




-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] WIP patch for plan invalidation

2007-03-10 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes:

> Tom Lane [EMAIL PROTECTED] writes:
>
>> Comments?
>
> Why do CREATE/DROP/REINDEX DATABASE no longer call PreventTransactionChain?

sigh, nevermind.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Packed varlena patch update

2007-03-12 Thread Gregory Stark



I implemented Tom's suggestion of packing the external toast pointers
unaligned and copying them to a local struct to access the fields. This saves
3-6 bytes on every external toast pointer. Presumably this isn't interesting
if you're accessing the toasted values since they would be quite large
anyways, but it could be interesting if you're doing lots of queries that
don't access the toasted values. It does uglify the code a bit but it's all
contained in tuptoaster.c.

 http://community.enterprisedb.com/varlena/patch-varvarlena-19.patch.gz

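The gist of the unaligned handling is roughly the following; varatt_external
and its fields are the real struct, while the accessor used to skip past the
varlena header is illustrative:

/* The on-disk toast pointer may now sit at any byte offset, so copy it
 * into an aligned local struct before touching any of its fields. */
static bool
toast_datum_is_compressed(struct varlena *attr)
{
	struct varatt_external toast_pointer;

	memcpy(&toast_pointer, VARDATA(attr), sizeof(toast_pointer));
	return toast_pointer.va_extsize <
		toast_pointer.va_rawsize - VARHDRSZ;
}
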
I think this is the last of the todo suggestions that were mentioned on the
list previously, at least that I recall.

Some implications:

1) There's no longer any macro to determine if an external attribute is
   compressed. I could provide a function to do it if we need it but in all
   the cases where I see it being used we already needed to extract the fields
   of the toast pointer anyways, so it wouldn't make sense.

2) We check if a toasted value is replaced with a new one by memcmp'ing the
   entire toast pointer. We used to just compare valueid and relid, so this
   means we're now comparing extsize and rawsize as well. I can't see how they
   could vary for the same valueid though so I don't think that's actually
   changing anything.


Remaining things in the back of my mind though:

. I'm not 100% confident of the GIST changes. I think I may have changed a few
  too many lines of code there. It does pass all the contrib regressions
  though.

. I think there may be some opportunities for optimizing heaptuple.c. In
  particular the places where it uses VARSIZE_ANY where it knows some of the
  cases are impossible. It may not make a difference due to branch prediction
  though.

. I've left the htonl/ntohl calls in place in the BIG_ENDIAN #if-branch.
  They're easy enough to remove and it leaves us the option of removing the
  #if entirely and just use network byte order instead of the #ifdef.

  I'm a bit worried about modules building against postgres.h without
  including the config.h include file. Is that possible? Is it worth worrying
  about? I think I could make it fail to build rather than crash randomly.

. One of the regression tests I wrote makes a datum of 1/2Meg. Can all the
  build-farm machines handle that? I don't really need anything so large for
  the regression but it was the most convenient way to make something which
  after compressing was large enough to toast externally. It might be better
  to find some less compressible data.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] guc patch: Make variables fall back to default values

2007-03-12 Thread Gregory Stark

Peter Eisentraut [EMAIL PROTECTED] writes:

> Joachim Wieland wrote:
>> Attached is the long-awaited guc patch that makes values fall back to
>> their default values when they got removed (or commented) from the
>> configuration file. This has always been a source of confusion.
>
> I have applied your patch with some of the discussed corrections.  The
> use of memory allocation in the GUC code might still need some review.

This is a pretty major user-visible change. While I'm strongly in favour of it,
it seems like there ought to be some documentation file touched by this, no?
Or have I missed it?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] guc patch: Make variables fall back to default values

2007-03-13 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

> It's a release-note item ... assuming that it doesn't get reverted in
> the near future.  Could we have some attention to the all-red buildfarm?

It's not just a bug. There's code missing. 

The code seems to assume that all custom variables are strings. There are
about half a dozen Assert(variable->vartype == PGC_STRING) calls throughout
the patch. That's not true: plperl's use_strict is a boolean, and we have
DefineCustom*Variable functions for each type of variable. Perl bombs because
plperl.use_strict is a boolean.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Updated Packed Varlena patch v20 (final?)

2007-03-13 Thread Gregory Stark

Updated patch:

I went through the high traffic areas and levelled them up to using the new
macros to avoid detoasting smaller arguments. The key areas are really
btree operators since they have a noticeable effect on sorts, especially index
builds, when the data being sorted fits in memory.

There is a bit of a name-game here. The macros I made are called
VARDATA_ANY(datum), VARSIZE_ANY(datum), and VARSIZE_ANY_EXHDR(datum).
And the datatype macros are, for example,  PG_GETARG_TEXT_PP() and
DatumGetTextPP()  -- short for packed pointer.

Maybe not the prettiest names in the world but clear and I've found them
pretty easy to spot when I'm looking for inconsistent use of
VARSIZE_ANY-VARDATA for example. I thought of PVARSIZE/PVARDATA (for
packed) but I'm afraid it would camouflage such cases.

Anyone have any better ideas? If not I'm satisfied with them... 
Except maybe VARSIZE_ANY_EXHDR() which seems too long.

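For context, typical use looks roughly like this; it's modelled on the string
comparison functions but is a sketch, not the committed code:

/* Fetch "packed pointer" arguments and work through the _ANY macros, so
 * short-header (and unaligned) values never need a detoast/copy step. */
Datum
texteq_sketch(PG_FUNCTION_ARGS)
{
	text	   *a = PG_GETARG_TEXT_PP(0);
	text	   *b = PG_GETARG_TEXT_PP(1);
	bool		result;

	result = VARSIZE_ANY_EXHDR(a) == VARSIZE_ANY_EXHDR(b) &&
		memcmp(VARDATA_ANY(a), VARDATA_ANY(b),
			   VARSIZE_ANY_EXHDR(a)) == 0;

	PG_RETURN_BOOL(result);
}
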
I got to almost everything in varlena.c and varchar.c so that includes text,
bpchar, and bytea as well as varchar's few procedures. That includes probably
more than I really needed to, but as long as the datatypes are working with
bytes it's convenient enough.

Also, I replaced my tweaks to hstore and pg_trgm with Teodor's.

And of course updated CVS.

 http://community.enterprisedb.com/varlena/varvarlena-20.patch.gz

I'm going to be putting this aside for a little while. I think it's really
done. There's more that can be done of course, hit inet and numeric with the
new macros, for instance. But I want to see what happens when it gets reviewed
before I do that kind of bookkeeping.

One thing that I've left in there again is the htonl/ntohl macros in the
big-endian case. It really makes sense to either remove them or remove the
#ifdef.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] LIMIT/SORT optimization

2007-03-14 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

> Is there a newer version of this patch?

As requested, I've cut an updated version of this patch against CVS HEAD:

 http://community.enterprisedb.com/sort-limit-v5.patch.gz

The open issues I see are:

 Adjusting the costing of the sort to take into account the optimization

 Whether the communication between the Limit node and the Sort node is kosher
 or whether something more abstract is needed.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] LIMIT/SORT optimization

2007-03-14 Thread Gregory Stark

Gregory Stark [EMAIL PROTECTED] writes:

> Bruce Momjian [EMAIL PROTECTED] writes:
>
>> Is there a newer version of this patch?
>
> As requested, I've cut an updated version of this patch against CVS HEAD:
>
>  http://community.enterprisedb.com/sort-limit-v5.patch.gz

Someone asked why I've been posting links rather than attachments. The only
reason was because when I posted patches in the past they were dropped by the
mailing list. I would say refused except I never received a bounce, the
messages just never appeared on list or in the archive.

I'll try attaching this patch again, which is relatively small compared to the
recursive query patches and packed varlena patches which disappeared into the
ether. Also, this one is gzipped whereas in the past I usually attached
patches uncompressed so people could read them without saving and
uncompressing them. Perhaps one of those differences is the source of the
problem?

Do people prefer receiving attachments or downloadable links? 
Does the answer change if the patches are quite large?



sort-limit-v5.patch.gz
Description: Binary data

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Gregory Stark

Jeremy Drake [EMAIL PROTECTED] writes:

> BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros
> for those also?

Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic
constants used throughout the source code. I'm surprised there isn't a macro
for it though.

Do you need the alignment? If so I want to check the code against the packed
varlena patch. Just in case.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Gregory Stark

Gregory Stark [EMAIL PROTECTED] writes:

> Jeremy Drake [EMAIL PROTECTED] writes:
>
>> BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros
>> for those also?
>
> Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic
> constants used throughout the source code. I'm surprised there isn't a macro
> for it though.
>
> Do you need the alignment? If so I want to check the code against the packed
> varlena patch. Just in case.

Ah, it's just to construct an array, that's not a concern at all. And you're
detoasting the text data types before using or storing them so that's fine.


The only thing I would say is that this should maybe be a TextGetDatum() just
for code hygiene. It should be exactly equivalent though:

+ PointerGetDatum(matchctx-orig_str),

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-24 Thread Gregory Stark
> OTOH, do we have any evidence that this is worth bothering with at all?
> I fear that the cases of long-running transactions that are problems
> in the real world wouldn't be helped much --- for instance, pg_dump
> wouldn't change behavior because it uses a serializable transaction.

Well I think this would be the same infrastructure we would need to do the
other discussed improvement to address pg_dump's impact. That would require us
to publish the youngest xmax of the live snapshots. Vacuum could deduce that
that xid cannot possibly see any transactions between the youngest extant xmax
and the oldest in-progress transaction.
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] LIMIT/SORT optimization

2007-03-24 Thread Gregory Stark
>> Did Greg push a version which didn't carry the WIP label to it?  As
>> far as I remember he was still asking how to make the Sort and Limit
>> nodes communicate.
>
> Good question.  I asked for a new version of this patch and the WIP was
> only in the email subject line. Greg, is this ready for review?

Well my question was whether it was acceptable to do things this way or
whether there was a better way. If it's the right way then sure, if not then
no. I guess that's what review is for.

In short I'm not sure what constitutes "ready for review". Are you asking if
it's ready to apply? I don't know, that's why we have reviews. Or are you
asking if it's ready for someone to look at? What's the point of posting WIP
patches if you don't want someone to look at it?
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-25 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> Gregory Stark [EMAIL PROTECTED] writes:
>> Well I think this would be the same infrastructure we would need to do
>> the other discussed improvement to address pg_dump's impact. That
>> would require us to publish the youngest xmax of the live
>> snapshots. Vacuum could deduce that that xid cannot possibly see any
>> transactions between the youngest extant xmax and the oldest
>> in-progress transaction.
>
> ... and do what with the knowledge?  Not remove tuples, because any such
> tuples would be part of RECENTLY_DEAD update chains that that xact might
> be following now or in the future.

Well that just means it might require extra work, not that it would be
impossible. 

Firstly, some tuples would not be part of a chain and could be cleaned up
anyways. Others would be part of a HOT chain which might make it easier to
clean them up.

But even for tuples that are part of a chain there may be solutions. We could
truncate the tuple to just the MVCC information so subsequent transactions can
find the head. Or we might be able to go back and edit the forward link to
skip the dead intermediate tuples (and somehow deal with the race
conditions...)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-26 Thread Gregory Stark

I have a question about what would happen for a transaction running a command
like COPY FROM. Is it possible it would manage to arrange to have no live
snapshots at all? So it would have no impact on concurrent VACUUMs? What about
something running a large pg_restore?

Tom Lane [EMAIL PROTECTED] writes:

 On the whole though I think we should let this idea go till 8.4; 

I tend to agree but for a different reason. I think it's something that will
open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I
think (or hope at any rate) we'll think of at least a few new things we can do
with the new more precise information it exposes.

Just as an example, if you find you have no live snapshots can you throw out
the combocid hash? Any tuple you find with a combocid that's been discarded
that way must predate your current scan and therefore is deleted for you.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Numeric patch to add special-case representations for 8 bytes

2007-03-27 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 Greg, do you want to submit a patch for this or add a TODO item for this?

Well I never got any approval or rejection in principle so I don't know if
such a patch would be accepted even if it were implemented reasonably. If it
has the consensus needed to be a TODO item I would go ahead and do it.

The main design issue is that I was proposing to make it impossible to access
the internals of the numeric storage using macros. Currently some of the data
(the sign, dscale, and weight) is visible without having to call any special
numeric functions. I was proposing to use representations where those might
not be as easily accessible. 

However I don't think we have any consumers of that data outside of numeric.c
anyways. Is there anything using that functionality currently? Do we mind
losing it?

 ---

 Gregory Stark wrote:
 
 I've uploaded a quick hack to store numerics in < 8 bytes when possible.
 
  http://community.enterprisedb.com/numeric-hack-1.patch
 
 This is a bit of a kludge since it doesn't actually provide any interface for
 external clients of the numeric module to parse the resulting data. Ie, the
 macros in numeric.h will return garbage.
 
 But I'm not entirely convinced that matters. It's not like those macros were
 really useful to any other modules anyways since there was no way to extract
 the actual digits.
 
 The patch is also slightly unsatisfactory because as I discovered numbers 
 like
 1.1 are stored as two digits currently. But it does work and it does save a
 substantial amount of space for integers.
 
 postgres=# select n,pg_column_size(n) from n;
         n | pg_column_size 
 ----------+----------------
         1 |              2
        11 |              2
       101 |              2
      1001 |              3
     10001 |              9
    100001 |              9
       1.1 |              9
      10.1 |              9
     100.1 |              9
    1000.1 |              9
   10000.1 |             11
  100000.1 |             11
 
 I had hoped to get the second batch to be 3-4 bytes. But even now note how
 much space is saved for integers < 10001.
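To make the macro issue above concrete, here is a purely hypothetical short
form (none of these names or layouts exist in numeric.c) of the kind that
breaks the existing header macros: a compact value has no separate
sign/dscale/weight fields for a macro to point at.

#include <stdint.h>

/* Hypothetical compact format: the low bit tags the representation and,
 * for the compact case, the remaining 15 bits hold a sign-extended
 * integer.  Anything that doesn't fit would fall back to the
 * traditional digit-array form. */
#define NUMERIC_COMPACT_FLAG 0x01

static inline int
numeric_is_compact(uint16_t header)
{
    return (header & NUMERIC_COMPACT_FLAG) != 0;
}

static inline int16_t
numeric_compact_value(uint16_t header)
{
    /* arithmetic right shift recovers the signed 15-bit payload */
    return (int16_t) header >> 1;
}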

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 But what surprises me is that response times went up a with the patch. I
 don't know why.
 Maybe because of increased contention of ProcArrayLock?  (I assume you
 are using that, althought I haven't seen the patch)
 I am, but I doubt that's it. The response times are dominated by I/O, so any
 increase in lock contention would hardly show up. And the patch is only
 adding one GetOldestXmin call every 1000 scanned pages, which is nothing
 compared to the thousands of GetSnapshot calls the normal transactions are
 issuing per minute.

 The patch must have changed the I/O pattern in some subtle way.

 So are you stopping work on the patch?  I assume so.

 Yes, at least for now. I can't believe the patch actually hurts performance,
 but I'm not going to spend time investigating it.

Isn't this exactly what you would expect? It will clean up more tuples so
it'll dirty more pages. Especially given the pessimal way vacuum's dirty
buffers are handled until Simon's patch to fix that goes in.

The benefit of the patch that we would expect to see is that you won't need to
run VACUUM as often. In the long term we would expect the stock table to grow
less too but I doubt these tests were long enough to demonstrate that effect.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] [PATCH] add CLUSTER table ORDER BY index

2007-03-27 Thread Gregory Stark
Holger Schurig [EMAIL PROTECTED] writes:

 * psql tab-completion, it favours now CLUSTER table ORDER BY index

It occurs to me (sorry that I didn't think of this earlier) that if we're
going to use ORDER BY it really ought to take a list of columns. It would be
more consistent with what ORDER BY means in queries and one day we may want to
add support for ordering by arbitrary lists of columns even if no index
exists.

It may be reasonable to allow both to coexist and just have an arbitrary rule
that if an index of the specified name exists takes precedence over any
columns. I've never seen anyone name their indexes in a way that would
conflict with a column anyways.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] LIMIT/SORT optimization

2007-03-28 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 Some comments on the patch below.

Thanks!

 Gregory Stark wrote:



 The comment claims that we use heap sort when the user says he doesn't want to
 use glibc's qsort. I recall that we always use our own qsort implementation
 nowadays. And we never used the heap sort as a qsort replacement, did we?

Thanks, I had a version that used heap sort instead of qsort but that was
before I discovered what you said. So I stripped that useless bit out.

 In performsort, you convert the in-memory heap to a sorted list in one go. I
 wonder if it would be better to switch to a new TSS_ALLINHEAP state that means
 all tuples are now in the in-memory heap, and call tuplesort_heap_siftup in
 gettuple. 

The problem is that the heap is backwards. The head of the heap is the
greatest, ie, the last element we want to return. Hm, Is there such a thing as
a two-way heap?
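For what it's worth, the backwardness is inherent in the technique, as a toy
version over plain ints shows (a sketch, not tuplesort's code): keeping the K
smallest inputs means maintaining a max-heap of size K, so the root is the
largest survivor, i.e. the tuple we want to return last.

/* Restore the max-heap property downward from slot i. */
static void
sift_down(int *heap, int n, int i)
{
    for (;;)
    {
        int left = 2 * i + 1, right = left + 1, largest = i, tmp;

        if (left < n && heap[left] > heap[largest])
            largest = left;
        if (right < n && heap[right] > heap[largest])
            largest = right;
        if (largest == i)
            break;
        tmp = heap[i];
        heap[i] = heap[largest];
        heap[largest] = tmp;
        i = largest;
    }
}

/* Offer one input to a full k-element heap of the smallest values seen:
 * anything beating the current worst replaces the root. */
static void
heap_offer(int *heap, int k, int value)
{
    if (value < heap[0])
    {
        heap[0] = value;
        sift_down(heap, k, 0);
    }
}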

 There's a few blocks of code surrounded with #if 0 - #endif. Are those just
 leftovers that should be removed, or are things that still need to finished 
 and
 enabled?

Uhm, I don't remember, will go look, thanks.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] LIMIT/SORT optimization

2007-03-29 Thread Gregory Stark

Updated patch attached:

1) Removes #if 0 optimizations

2) Changes #if 0 to #if NOT_USED for code that's there for completeness and to
   keep the code self-documenting, but isn't needed by anything currently

3) Fixed cost model to represent bounded sorts




sort-limit-v7.patch.gz
Description: Binary data


Gregory Stark [EMAIL PROTECTED] writes:

 Heikki Linnakangas [EMAIL PROTECTED] writes:

 There's a few blocks of code surrounded with #if 0 - #endif. Are those just
 leftovers that should be removed, or are things that still need to finished 
 and
 enabled?

 Uhm, I don't remember, will go look, thanks.

Ok, they were leftover code from an optimization that I decided wasn't very
important to pursue. The code that was ifdef'd out detected when disk sorts
could abort a disk sort merge because it had already generated enough tuples
to satisfy the limit.

But I never wrote the code to actually abort the run and it looks a bit
tricky. In any case the disk sort use case is extremely narrow, you would need
something like LIMIT 5 or more to do it and it would have to be an
input table huge enough to cause multiple rounds of merges.


I think I've figured out how to adjust the cost model. It turns out that it
doesn't usually matter whether the cost model is correct since any case where
the optimization kicks in is a case where you're reading a small portion of
the input, so it's a case where an index would be *much* better if available.
So the only time the optimization gets used is when there's no index available.
Nonetheless it's nice to get the estimates right so that higher levels in the
plan get reasonable values.

I think I figured out how to do the cost model. At least the results are
reasonable. I'm not sure if I've done it the right way though.
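The shape of the adjustment, at least as I understand it (a back-of-envelope
sketch, not necessarily the exact formula in costsize.c): a full sort does
about N*log2(N) comparisons while a heapsort bounded to the top K does about
N*log2(2K), since each of the N input tuples is tested against a heap of at
most 2K entries.

#include <math.h>

/* Rough comparison counts for sorting n tuples, full vs bounded to k.
 * Illustrative only; the planner's real cost model has more terms. */
static double
full_sort_comparisons(double n)
{
    return n * (log(n) / log(2.0));
}

static double
bounded_sort_comparisons(double n, double k)
{
    return n * (log(2.0 * k) / log(2.0));
}

With n = 1,000,000 and k = 1,000 that works out to roughly 20 million versus
11 million comparisons, and unlike the full sort the bounded version never
needs to spill to disk.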


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Blocked post

2007-04-02 Thread Gregory Stark
Magnus Hagander [EMAIL PROTECTED] writes:

 Plaintext patch, plus two plaintext .c files.

 Then I have no clue. 

Actually I've had more success with .tar.gz than plain text attachments.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Packed Varlena Update (v21)

2007-04-06 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 I wrote:
 stark [EMAIL PROTECTED] writes:
 [ packed varlena patch ]

 Applied with revisions.

 Forgot to mention: one of the revisions was to not add the sizes.sql
 test, because the output was platform-dependent and is likely to get
 more so if any ability to change the toast thresholds gets put in.
 Can we improve that test to not expose any platform-dependent numbers?

I had imagined that we would have two versions of output files. Other than
32-bit versus 64-bit what other platform-dependent variations enter into it?

I'll look at it again on a 64-bit machine, but probably not for another week.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] non-recursive WITH clause support

2007-04-10 Thread Gregory Stark
David Fetter [EMAIL PROTECTED] writes:

 On Mon, Apr 09, 2007 at 06:43:08PM -0400, Tom Lane wrote:

 So this does not really add any new functionality, it's just variant
 syntax for something you can do about as easily without it, right?

 Not totally as easily.  For example, you can do some kinds of
 aggregation with a few fewer keystrokes.

I think fewer keystrokes is exactly what Tom meant by a variant syntax
without new functionality. That's an accurate description.

I suppose it depends in part on how important we think it is to add variant
syntaxes just because they're blessed by the ANSI standard. If this were a
syntax we were creating just for our user's convenience it would be a pretty
weak justification for an incompatibility. But if there are users who expect
this syntax to work because it's standard then it could be considered an
omission in our standards compliance.

I'm actually not too sure what the answer is. I hadn't heard of it before the
discussion about recursive queries myself.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Packed Varlena Update (v21)

2007-04-10 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 Added to TODO:

 o Allow single-byte header storage for arrays

Fwiw this is single-byte header storage for varlena array *elements*

The arrays themselves already get the packed varlena treatment.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Dead Space Map version 3 (simplified)

2007-04-23 Thread Gregory Stark
Hiroki Kataoka [EMAIL PROTECTED] writes:

 I think there is no problem.  Bloating will create pages containing
 unnecessary areas that will not be accessed.  Soon, those pages will be
 registered in the DSM.

Except the whole point of the DSM is to let us vacuum those pages *before*
that happens...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] LIMIT/SORT optimization

2007-04-25 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Bruce Momjian [EMAIL PROTECTED] writes:
 I did some performance testing of the patch, and the results were good. 
 I did this:

 	test=> CREATE TABLE test (x INTEGER);
 	test=> INSERT INTO test SELECT * FROM generate_series(1, 100);
 	test=> SET log_min_duration_statement = 0;
 	test=> SELECT * FROM test ORDER BY x LIMIT 3;

 LIMIT 3 seems an awfully favorable case; if the patch can only manage a
 factor of 4 speedup there, what happens at limit 10, 20, 100?  

I did a whole bunch of benchmarking and spent a pile of time gathering numbers
in gnumeric to make some pretty graphs. Then gnumeric crashed :( It's days
like this that make me appreciate our own code review process even if I'm
usually wishing it were a bit looser.

Until I gather all those numbers together again I'll just describe the
executive summary. I found that the limit of a factor of 4 improvement was
mostly an artifact of the very narrow and cheap to sort keys. When more of the
time is spent pushing around large tuples or in even moderately expensive
comparisons the speedup is much greater.

When I tested with narrow tuples consisting of only a single integer I found
that the maximum speedup was, as Bruce found, about 4x. I think what's going
on is that the overhead in the linear factor of just reading in all those
tuples is large enough to hide the improvements in sorting. Especially since
with such narrow tuples the resulting data is just too small to overflow
filesystem cache where the big benefits come.

When I tested with text strings ranging from 0-97 characters long in locale
en_GB.UTF-8 I found that the speedup kept going up as the input data set grew.

Sorting the top 1,000 strings from an input set of 1,000,000 strings went from
8m11s in stock Postgres to 12s using the bounded heapsort. 

(Which was an even better result than I had prior to fully randomizing the
data. It probably just got packed better on disk in the source table.)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] updated SORT/LIMIT patch

2007-04-25 Thread Gregory Stark

Updated patch against cvs update in case it makes applying easier.

One minor change:

. Added #include <limits.h> in tuplesort.h to pull in UINT_MAX
  (thanks to dpage for noticing this is necessary on OSX)



sort-limit-v8.patch.gz
Description: Binary data


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] updated SORT/LIMIT patch

2007-05-04 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Updated patch against cvs update in case it makes applying easier.

 Applied with revisions --- notably, I avoided adding any overhead to
 HEAPCOMPARE() by the expedient of reversing the logical sort order
 before heapify'ing.  We couldn't have done that before the NULLS_FIRST
 patch went in, but now it's trivial to make the sort order reverse
 fully.

Hum. The major change I see is the bit related to rescans where you made it
resort if the bound had changed. But surely the only way the bound can change
is if it's a parameter, and if there is a parameter then surely the executor
must be doing more than just a plain rescan? The sort key could have changed
if it depends on the parameter.

What does the executor do differently in the case of a subplan with a
parameter that makes it re-execute the plan from scratch and not just do a
simple rescan?

 Since you didn't include any documentation patch for the
 optimize_bounded_sort GUC variable, I assumed it was meant only for
 debugging and hid it behind #ifdef DEBUG_BOUNDED_SORT.

Sure, I originally had it #ifdef'd on TRACE_SORT but took it out for reasons
that I don't recall.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] Doc and comment patch for packed varlena

2007-05-09 Thread Gregory Stark
===
RCS file: /home/stark/src/REPOSITORY/pgsql/src/include/postgres.h,v
retrieving revision 1.80
diff -c -r1.80 postgres.h
*** src/include/postgres.h	4 May 2007 02:01:02 -	1.80
--- src/include/postgres.h	9 May 2007 15:38:37 -
***
*** 235,240 
--- 235,246 
   * use VARSIZE_ANY/VARSIZE_ANY_EXHDR/VARDATA_ANY.  The other macros here
   * should usually be used only by tuple assembly/disassembly code and
   * code that specifically wants to work with still-toasted Datums.
+  *
+  * WARNING: It is only safe to use VARDATA_ANY() -- typically with
+  * PG_DETOAST_DATUM_UNPACKED() -- if you really don't care about the alignment.
+  * Either because you're working with something like text where the alignment
+  * doesn't matter or because you're not going to access its constituent parts
+  * and just use things like memcpy on it anyways.
   */
  #define VARDATA(PTR)		VARDATA_4B(PTR)
  #define VARSIZE(PTR)		VARSIZE_4B(PTR)
***
*** 265,270 
--- 271,277 
  	  VARSIZE_4B(PTR)-4))
  
  /* caution: this will not work on an external or compressed-in-line Datum */
+ /* caution: this will return a possibly unaligned pointer */
  #define VARDATA_ANY(PTR) \
  	 (VARATT_IS_1B(PTR) ? VARDATA_1B(PTR) : VARDATA_4B(PTR))
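As a usage example of the pattern the new comment warns about (a sketch of a
typical caller; the macros are the real ones from postgres.h and fmgr.h):
text has no alignment-sensitive innards, so the possibly unaligned pointer is
harmless as long as we only measure or memcpy the bytes.

#include "postgres.h"
#include "fmgr.h"

/* Safe: we never reinterpret the payload as an aligned struct, so a
 * 1-byte-header (and hence possibly unaligned) datum is fine. */
Datum
text_copy_payload(PG_FUNCTION_ARGS)
{
    text       *t = PG_GETARG_TEXT_PP(0);      /* may still be packed */
    int32       len = VARSIZE_ANY_EXHDR(t);
    text       *result = (text *) palloc(VARHDRSZ + len);

    SET_VARSIZE(result, VARHDRSZ + len);
    /* VARDATA_ANY may be unaligned; memcpy doesn't care */
    memcpy(VARDATA(result), VARDATA_ANY(t), len);
    PG_RETURN_TEXT_P(result);
}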


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[PATCHES] Concurrent psql patch

2007-05-11 Thread Gregory Stark

Fixed the major omissions that made it incomplete.

- Added sgml documentation and \? usage
- Added basic mvcc regression tests using new functionality
- Fixed cursor-mode (FETCH_COUNT) functionality
- Removed \cwait in favour of psql variable ASYNC_DELAY

I'm still not sure it's quite polished enough to commit but if there's any
feedback I'll be happy to fix up anything that's not acceptable.

Also, if anyone has any better ideas for names than \cswitch and \cnowait
now's the time. I had intended them only as placeholders because I couldn't
think of anything better but it doesn't sound like anyone else has any better
ideas either. If not then we're going to be stuck with them. More or less,
it's explicitly described as an experimental feature in the docs so I
suppose we could always change them later.



concurrent-psql-v7.patch.gz
Description: Binary data


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] updated WIP: arrays of composites

2007-05-11 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 * I'm a bit concerned about dump order.  If a user wants to create
 types named foo and _foo, he can, but it will only work if he
 makes _foo first --- else the derived type for foo is in the way.
 Since pg_dump has no clue about that constraint, it might easily
 dump foo first leading to an unrestorable dump.  The most usable
 solution would be to auto-rename previously created array types,
 but I dunno how implementable that would be.

BTW, why exactly do we need array types to have names at all? The only
user-visible way to refer to these types is always by foo[] isn't it?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-13 Thread Gregory Stark
Jim Nasby [EMAIL PROTECTED] writes:

 I don't see how we could make the names shorter without moving away from a
 backslash command (which I'm guessing would be painful).

 Assuming we're stuck with a backslash command, \cs[witch] and \cn[owait]
 seem to be about as good as we could get.

I don't have \cs or \cn set up as abbreviations. 

I was originally thinking \c1, \c2, ... for \cswitch and \c for \cnowait. I'm
not sure if going for cryptic short commands is better or worse here.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-13 Thread Gregory Stark
David Fetter [EMAIL PROTECTED] writes:

 I was originally thinking \c1, \c2, ... for \cswitch and \c for
 \cnowait. I'm not sure if going for cryptic short commands is better
 or worse here.

 +1 for \c1, \c2, etc.

 What's the reasoning behind \c?  Does it send things into the
 background the way & does in the shell?

Sort of. It sends the *subsequent* command to the background... And unlike the
shell you can't usefully do anything more in the current session while the
command is in the background, you have to manually switch sessions before
issuing subsequent commands.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-13 Thread Gregory Stark

Jim C. Nasby [EMAIL PROTECTED] writes:

 On Sun, May 13, 2007 at 02:39:45PM +0100, Gregory Stark wrote:
 
 I was originally thinking \c1, \c2, ... for \cswitch and \c for \cnowait. 
 I'm
 not sure if going for cryptic short commands is better or worse here.

 \c sounds fine (as do \c1...\c9). \g& would probably be helpful as well
 (send query buffer to server in nowait mode).

Er, I just realized I typed the wrong thing there. It can't be \c since I do
assign a meaning to that: make a new connection to the same place as this
one.

I meant \&

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-13 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 David Fetter [EMAIL PROTECTED] writes:
 What's the reasoning behind \c?  Does it send things into the
 background the way & does in the shell?

 Sort of. It sends the *subsequent* command to the background...

 That sounds just bizarre.  Existing backslash commands that do something
 to a SQL command are typed *after* the command they affect (\g for
 instance).  I don't think you should randomly change that.

So would you prefer \g& as Jim Nasby suggested? I hadn't even considered that
previously since I'm not accustomed to using \g but it does seem kind of
pretty. I normally use ; but I suppose there's nothing wrong with just
declaring that asynchronous commands must be issued using \g& rather than use
the semicolon to fire them off.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] [DOCS] OS/X startup scripts

2007-05-14 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 David Fetter [EMAIL PROTECTED] writes:
 I haven't included the customary diffs.  This points me to some of the
 many deficiencies of CVS, namely that I would need write access in
 order to have it create a diff,

 Strange, it works fine for everyone else.

If you have rsh/ssh access to a CVS repository then you do in fact need write
access just to generate diffs. It is one of the annoyances of CVS but it
doesn't really matter for Postgres where we use pserver anonymous access.

Personally I find CVS so terribly slow for large trees like Postgres that it's
essential to use rsync to maintain a local CVS repository. That makes 'cvs
diff' remarkably fast.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] [DOCS] OS/X startup scripts

2007-05-14 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 You need write-access to add files, even on an anonymous server. We often get
 patches with new files as separate attachments because of that.

Oh quite right. I had forgotten but that was the original reason I switched to
using rsync. The alternative is to manually edit the Entries files to list the
new files.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-14 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 So would you prefer \g& as Jim Nasby suggested? I hadn't even considered that
 previously since I'm not accustomed to using \g but it does seem kind of
 pretty. I normally use ; but I suppose there's nothing wrong with just
 declaring that asynchronous commands must be issued using \g& rather than use
 the semicolon to fire them off.

 It makes sense to me... but what is the state of the session afterward?
 Should this be combined with switching to another connection?

It's an interesting idea since you'll inevitably have to switch connections.
If you issue a second query it forces the session to wait for the results.
(It doesn't seem like there's any point in keeping a queue of pending queries
per session.)

However we do still need a command to switch back anyways so there doesn't
seem to be any advantage in combining the two.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-14 Thread Gregory Stark

Jim C. Nasby [EMAIL PROTECTED] writes:

 Would \c# limit us to 9 concurrent connections? Might want

 \cs[witch] [session]

Hm, we kind of have a choice with \c#. Either we treat it as part of the
command in which case the way to connect to an integer-named database is to
include a space. We could even have it magically connect to a database if the
connection isn't already active.

But these kinds of inconsistent behaviours can be traps for users. It means
\c1 and \c 1 do different things even though \cpostgres and \c postgres
do the same thing. And it means \c1 might connect to a database named 1
today but switch sessions tomorrow.

Or we treat it as the first argument in which case even \c 9 switches to
session 9. I would prefer to do that but I fear there may be people with
databases named 9.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-14 Thread Gregory Stark

Jim C. Nasby [EMAIL PROTECTED] writes:

 Since this command will be getting used very frequently by anyone using
 concurrent connections interactively, it'd be nice if it was lower-case.
 It looks like that limits us to j, k, m, n, v, and y.  In unix this idea
 is about jobs, what about using \j?

Well currently it's not really terribly interesting to use interactively since
you could always just start a second shell and run a second instance of psql.
I really only have regression tests in mind for it. That's why I don't find it
a problem at all to only extend \g and not semicolon handling.

That said, I think a next step for this for interactive use would be to handle
C-z to background the currently running query. So perhaps it does make sense
to keep use cases like that when deciding on command names now.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] updated SORT/LIMIT patch

2007-05-15 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 [ greps a bit... ]  It looks like the only way that you could expose the
 bug in the current state of the system would be if the sort/limit with
 the outer parameter were the inside of a nestloop join in the subplan.
 nodeNestloop would set EXEC_FLAG_REWIND, causing nodeSort to set
 randomAccess, allowing ExecReScanSort to suppose that it could rewind
 the sort.

I finally managed to trigger this case and found that the checks don't
actually work:

postgres=# SELECT (SELECT n
                     FROM (VALUES (1)) AS x,
                          (SELECT n FROM generate_series(1,10) AS n
                            ORDER BY n LIMIT 1 OFFSET s-1) AS y) AS z
             FROM generate_series(1,10) AS s;
ERROR:  retrieved too many tuples in a bounded sort

What's going on is that nodeLimit.c only invokes recompute_limit when the first
tuple is actually generated. It has a comment saying (We can't do this any
earlier, because parameters from upper nodes may not be set until now.)
So the checks are still comparing the previous bound against the boundDone.

Attached is a small patch which fixes this case. It also makes the check
slightly more liberal -- we don't need to resort if the previous sort was
unbounded or the bound was greater than or equal to the new bound.

There is one bit I'm not too sure of. We may or may not end up requesting
tuples from our child node. If we do we have to ReScan it but by then we don't
have the exprCtx passed to the ReScan call. I just made it call ReScan always
even if we later decide we can just rewind the tuplesort, is that ok?

Also, I left a comment that it would be nice if we could peek at the
tuplesort's boundUsed and state to avoid resorting unnecessarily. Currently it
pretty much always resorts unless you construct a bizarre query like the above
to force the randomAccess flag to be true. Most of the time tuplesort is going
to sort in memory anyways even if random access isn't requested and resorting
is pointless.

I think it would be worthwhile adding a method to tuplesort to ask whether
random access is possible and how many tuples were actually kept. Then
nodeSort could ask it those values instead of just remembering what values
were requested.



sortlimit-fix-v2.diff.gz
Description: Binary data


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] updated SORT/LIMIT patch

2007-05-16 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 Gregory Stark wrote:

 Attached is a small patch which fixes this case. It also makes the check
 slightly more liberal -- we don't need to resort if the previous sort was
 unbounded or the bound was greater than or equal to the new bound.

 Huh, can you clarify this comment:

 +* XXX It would be nice to check tuplesortstate->boundUsed too but that
 +* seems like an abstraction violation. And for that matter to check
 +* the tuplesort to see if randomaccess is possible even if it wasn't
 +* requested so we don't resort input when the parameters haven't
 +* changed if it was sorted in memory.

 I'm having serious trouble parsing it.

Well in the executor currently we check node->boundedDone and node->boundDone
to see whether the previous execution of this node had a bound. If it did and
we now don't or if it did but now our bound is larger then we have to
re-execute.

However the tuplesort may have actually done an unbounded sort either because
the bound (actually bound*2) may not have been reached or because it spilled
to disk and we did a disk sort. It would be nice to check that and not have to
reexecute unnecessarily.

But that means peeking into tuplesort's state. I started doing a patch
yesterday to do this by exporting a new method on tuplesort:

bool
tuplesort_random_access(Tuplesortstate *state, bool bounded, unsigned tuples_needed)
{
    switch (state->status)
    {
        case TSS_FINALMERGE:
            return false;
        case TSS_SORTEDONTAPE:
            return state->randomAccess;
        case TSS_SORTEDINMEM:
            return (!state->boundUsed || (bounded && state->bound > tuples_needed));
        default:
            return false;
    }
}

That solves the abstraction barrier issue but I'm not sure if it's quite
detailed enough. Really Sort only needs to rewind the tuplesort which it can
do even if state-randomAccess is false. We may need two separate functions,
one which tests if rewind is supported and another which tests if random
access is supported. Also, I haven't looked at how the final merge case is
handled. It might be possible to support rescan (but not random access) in
that state as well.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] [DOCS] OS/X startup scripts

2007-05-16 Thread Gregory Stark
David Fetter [EMAIL PROTECTED] writes:

 On Wed, May 16, 2007 at 09:12:23AM +0100, Heikki Linnakangas wrote:
 Jim C. Nasby wrote:
 BTW, is there some trick to getting cvs diff to ignore files that
 aren't in the repo?
 
 Trick? That's what it does by default.

 I suspect he's talking about all the lines starting with '?' that diff
 produces.

 Lacking sophistication, I've been known to do:

 cvs diff [list of files here] |grep -v '^?' > the_file.diff

Those lines go to stderr. If you do cvs diff > file it spits out all the cvs
file statuses to the terminal but dumps the diff to the file.

It doesn't matter, diffs can contain arbitrary junk between the file diffs.
patch only looks at the things it recognizes.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-16 Thread Gregory Stark

So based on the feedback and suggestions here this is the interface I suggest:

\connect    - to open a new connection keeping the existing one
\g&         - to submit a command asynchronously (like & in the shell)
\S [Sess#]  - to _S_witch to a different _S_ession;
              if no connection # is specified, list the available _S_essions
\D          - _D_isconnect from current session (like ^D in the shell)

This leaves no way to submit an asynchronous command without using \g& but I'm
really not too concerned with that. I don't want to start messing with psql's
semicolon parsing behaviour and I'm mainly only concerned with this for
regression tests.

Another thought I had for the future is a \C command to simulate C-c and send
a query cancel. That would let us have regression tests that query
cancellation worked. The tests would presumably have to be written using
pg_sleep() to ensure they ran for long enough but even then there would be no
way to control exactly when the interrupt arrived.
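If it comes to that, libpq already exposes everything a \C command would
need; roughly (a sketch, with psql's bookkeeping omitted):

#include <stdio.h>
#include "libpq-fe.h"

/* Send a cancel request for whatever is running on conn. */
static void
send_cancel(PGconn *conn)
{
    char      errbuf[256];
    PGcancel *cancel = PQgetCancel(conn);

    if (cancel == NULL)
        return;
    if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
        fprintf(stderr, "could not send cancel request: %s\n", errbuf);
    PQfreeCancel(cancel);
}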

Attached is an updated patch.

I also found and fixed some missing ResetCancelConn()s. I think I got them all
and the behaviour seems correct in practice when cancelling various
combinations of synchronous queries, asynchronous queries, and backslash
commands. The one thing I wonder about is that I'm a bit concerned I may have
introduced an assumption about how many resultsets arrive from a single query.

I'll be offline for a few days but I'll be back Monday.



concurrent-psql-v8.patch.gz
Description: Binary data


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] updated SORT/LIMIT patch

2007-05-16 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 This patch makes what was already a hack into a full-fledged crock (and
 it's not just the self-doubting comments that make me distrust it).
 I think we need to rip out this ad-hoc parameter change signaling code
 and make it work through the regular chgParam mechanism.  Not sure about
 details though.  There may be no clean solution short of folding
 Sort and Limit into a single node type.

Well I can't disagree, I always was concerned about the inter-node
communication part. If I have power on the train I might look at it then but
otherwise I'm offline until Monday.

 I think it would be worthwhile adding a method to tuplesort to ask whether
 random access is possible and how many tuples were actually kept. Then
 nodeSort could ask it those values instead of just remembering what values
 were requested.

 I disagree with this line of development, as it amounts to exposing
 tuplesort implementation details as API.

I'm not sure I agree. What's the difference between using a boolean value
we pass to tuplesort requesting random access and using a boolean value we get
back from asking tuplesort?

The tuples_needed is a bit of a wart but then it's the same inevitable wart
as set_bound.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] updated SORT/LIMIT patch

2007-05-17 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes:

 Tom Lane [EMAIL PROTECTED] writes:

 This patch makes what was already a hack into a full-fledged crock (and
 it's not just the self-doubting comments that make me distrust it).
 I think we need to rip out this ad-hoc parameter change signaling code
 and make it work through the regular chgParam mechanism.  Not sure about
 details though.  There may be no clean solution short of folding
 Sort and Limit into a single node type.

 Well I can't disagree, I always was concerned about the inter-node
 communication part. If I have power on the train I might look at it then but
 otherwise I'm offline until Monday.

I did in fact have power on the train and due to the wonders of a local
rsync'd CVS repository I could even view cvs logs and diffs offline. It's
interesting
how I've read all this code and comments several times and each time I get
more out of it.

It doesn't look like the timing of the ExecRescan is an issue at all. There
are plenty of nodes that Rescan their children much later than when they first
start up. Even Nested Loop does so.

I do still need to read more about parameters and how the parameter sets are
initially constructed. It would be nice to set it up so the sort node gets
signalled using the existing infrastructure.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] CREATE TABLE LIKE INCLUDING INDEXES support

2007-05-17 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 NikhilS [EMAIL PROTECTED] writes:
 I just now realized that even though we allow the above. We do not allow:

 pg=# create table t1 (a int, b int, unique(a+b));

 Any specific reason for this behaviour?

 It'd be contrary to SQL spec.  The UNIQUE constraint takes a list of
 column names, full stop.

Does the SQL spec actually specify what happens if you provide a
non-compliant table definition like this? 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-21 Thread Gregory Stark
Andrew Dunstan [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 Attached is an updated patch.


 This patch appears to add a nonexistent test to the regression schedules.

I must have forgotten to cvs add it. Sorry.

Also, I forgot to mention previously there is an unrelated trivial hunk in
here. I noticed we free the password early, presumably for security reasons,
but don't actually overwrite the memory at that point. I added a memset in
there, otherwise the free seems kind of pointless. I normally don't bother
with security features like that since they don't really add any security
but as long as it's there it may as well do something vaguely useful.



concurrent-psql-v9.patch.gz
Description: Binary data

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] cluster test

2007-05-25 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 This is in the regression database after a completed regression run, so
 it's possible that it's a bit different state from what's seen at the
 instant the cluster test was running, but it sure looks like the
 expected results are what you get from a seqscan.  Would you force a
 seqscan and see what EXPLAIN shows as the cost on your machine?

Perhaps this comes down to 64- vs 32-bit datums and alignments and therefore
different size tables, which, because the planner does an lseek to measure the
table size, show up as different estimates for sequential scan costs?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Seq scans status update

2007-05-28 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 A point I have not figured out how to deal with is that in the patch as
 given, UnpinBuffer needs to know the strategy; and getting it that info
 would make the changes far more invasive.  But the patch's behavior here
 is pretty risky anyway, since the strategy global at the time of unpin
 might have nothing to do with how it was set when the buffer was
 acquired.  What I'm tempted to do is remove the special case there and
 adjust buffer acquisition instead (maybe make it decrement the
 usage_count when re-using a buffer from the ring).

Is there a reason UnpinBuffer has to be the one to increment the usage count
anyways? Why can't ReadBuffer handle incrementing the count and just trust
that it won't be decremented until the buffer is unpinned anyways?
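For reference, here is the bookkeeping in question reduced to a sketch
(simplified, not bufmgr.c itself; MAX_USAGE_COUNT stands in for the real
constant): the clock sweep only needs the bump to have happened by the time
the buffer is a candidate victim, so bumping at pin time looks equivalent.

#define MAX_USAGE_COUNT 5       /* stand-in for the real cap */

typedef struct
{
    int refcount;               /* pins currently held */
    int usage_count;            /* popularity, decayed by the sweep */
} BufDesc;

static void
pin_buffer(BufDesc *buf)
{
    buf->refcount++;
    if (buf->usage_count < MAX_USAGE_COUNT)
        buf->usage_count++;     /* bump at pin time, not at unpin */
}

static void
unpin_buffer(BufDesc *buf)
{
    buf->refcount--;            /* nothing else to do here */
}

/* Clock-sweep test: evict only unpinned buffers whose count has decayed. */
static int
can_evict(BufDesc *buf)
{
    if (buf->refcount > 0)
        return 0;
    if (buf->usage_count > 0)
    {
        buf->usage_count--;
        return 0;
    }
    return 1;
}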

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-05-29 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Andrew Dunstan [EMAIL PROTECTED] writes:
 if (pset.c->db->asyncStatus != PGASYNC_BUSY)
 {
 break;
 }

 There already is a defined API for this, namely PQisBusy().

 In any case, I rather concur with the XXX comment: busy-waiting like
 this sucks.  The correct way to do this is to get the socket numbers for
 the connections (via PQsocket), wait for any of them to be read-ready
 according to select() (or for the timeout to elapse, assuming that we
 think that behavior is good), then cycle through PQconsumeInput() and
 PQisBusy() on each connection.  See
 http://www.postgresql.org/docs/8.2/static/libpq-async.html

Huh, so it turns out we already have code that does exactly this in
pqSocketPoll and pqSocketCheck. Except that they have too little resolution
because they work with time_t which means we would have to wait at least 1-2
seconds.

And pqSocketCheck keeps looping when it gets an EINTR which doesn't seem like
the right thing for psql to do.

It would be nice to use these functions though because:

a) They get the SSL case right in that that they check the SSL buffer before
   calling select/poll.

b) They use poll if available and fall back to select

c) they would keep the select/poll system code out of psql where there's none
   of it currently.

So would I be better off adding a PQSocketPollms() which works in milliseconds
instead of seconds? Or should I just copy all this code into psql?
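For comparison, the loop Tom describes looks like this with the documented
libpq calls (a sketch; error handling is mostly elided, and unlike
pqSocketCheck it does not consult the SSL buffer first):

#include <sys/select.h>
#include "libpq-fe.h"

/* Block until the result of the query running on conn is complete. */
static void
wait_for_result(PGconn *conn)
{
    int sock = PQsocket(conn);

    while (PQisBusy(conn))
    {
        fd_set readable;

        FD_ZERO(&readable);
        FD_SET(sock, &readable);
        if (select(sock + 1, &readable, NULL, NULL, NULL) < 0)
            continue;           /* e.g. EINTR; a real caller should decide */
        if (!PQconsumeInput(conn))
            break;              /* connection trouble; PQgetResult reports it */
    }
}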

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Concurrent psql patch

2007-06-01 Thread Gregory Stark
Andrew Dunstan [EMAIL PROTECTED] writes:

 Unless you have a better solution I think we'd better do that, though.

 I notice that the header is also included in command.c even though it doesn't
 use PGASYNC_IDLE.

 Are you going to make these fixes?

Yes, sorry, I started to already but got distracted by some tests I've been 
running.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] Bug in date.c

2007-06-02 Thread Gregory Stark

I would be inclined to just go ahead and call textout which would
effectively be pallocing a copy. Is there some reason these functions in
particular shouldn't leak memory?

I've attached both a patch that does that and a patch that just makes the
minimal fix of calling textout when the error is thrown.

Alternatively it might be handy to have a custom escape in errmsg format
strings for text varlena data.



date.c.patch.gz
Description: Binary data


date-minimal.c.patch.gz
Description: Binary data


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Synchronized scans

2007-06-04 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 Were you thinking of storing the PID of the backend that originally created 
 the
 hint, or updating the PID every time the hint is updated? In any case, we 
 still
 wouldn't know if there's other scanners still running.

My reaction was if you always put the pid in when updating the hint it might
work out pretty well. When you finish you remove the hint iff the pid is your
own. 

It has exactly the same properties you describe of leaving a small window with
no hint when you finish a scan, but only if you were the last to update the
scan position, which I would expect would be pretty rare except for the last
scanner.

If a backend died it would leave a scan position behind but the next scanner
on that table would overwrite the pid and then remove it when it's finished.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Synchronized scans

2007-06-09 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 The vacuum-cost-limit issue may be sufficient reason to kill this idea;
 not sure.

We already have a much higher cost for blocks that cause i/o than blocks which
don't. I think if we had zero cost for blocks which don't cause i/o it would
basically work unless the sleep time was so large that the other scans managed
to cycle through the entire ring in that time.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Synchronized scans

2007-06-10 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 I don't think sync-scanning vacuum is worth pursuing, though, because of the
 other issues: index scans, vacuum cost accounting, and the fact that the 2nd
 pass would be harder to synchronize. There's a lot of other interesting ideas
 for vacuum that are more generally applicable.

I think we could probably arrange for vacuum to synchronize. If there's one
sequential scan running we have to imagine there are others coming along soon
too. so if we become desynchronized we'll just coerce the next one to start
where we want and follow it for a while.

However I have a another worry. Even if we did manage to get vacuum
synchronizing well what would it do to the sequential scan performance.
Instead of zipping along reading clean blocks into its small ring buffer and
discarding them when it's done it'll suddenly find many of its blocks dirty
when it goes to reuse them. Effectively we'll have just reinvented the problem
we had with vacuum previously albeit in a way which only hits sequential scans
particularly hard.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] Two aesthetic bugs in the 1-byte packed varlena code

2007-06-12 Thread Gregory Stark

Korry spotted two violations of the rule about accessing the first varlena
field directly which I had missed. In both cases I believe the accesses are
actually safe because they follow an int4 column and the data are either
explicitly detoasted or passed to textout which can handle 1-byte headers
fine.

One instance is in flatfiles.c when it accesses the password field which is a
text field. Here I don't see any reasonable way to avoid it and think we
should just document that we're bending the rules.

The other instance is in inv_api.c where it would be quite possible to use
fastgetattr() instead. But the column is always at the same fixed offset and
again it follows an int4 so it'll always be 4-byte aligned and work fine. So
for performance reasons perhaps we should keep this as well?
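The fastgetattr() form would look roughly like this (a sketch of the
alternative, not the committed code):

#include "postgres.h"
#include "fmgr.h"
#include "access/heapam.h"
#include "catalog/pg_largeobject.h"

/* Pull the chunk data out without trusting the C struct layout; this
 * copes with 1-byte headers and any preceding variable-width fields. */
static struct varlena *
getdatafield(HeapTuple tuple, TupleDesc tupdesc)
{
    bool  isnull;
    Datum d = fastgetattr(tuple, Anum_pg_largeobject_data, tupdesc, &isnull);

    if (isnull)
        elog(ERROR, "unexpected NULL chunk in pg_largeobject");
    return (struct varlena *) PG_DETOAST_DATUM(d);
}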

I've attached three patches. A patch which adds a comment for flatfiles.c. And
two patches for inv_api.c one which just adds a comment and one which converts
it to use fastgetattr().

Again, credit to Korry for spotting these. To do so he commented out all the
varlena fields (excluding int2vector and oidvector) from the struct
definitions and saw what broke. 

Why do we even have those fields in the structs if they're unsafe to use?
Perhaps we should #if 0 out all the unsafe fields from all the struct
definitions? That would avoid one of the most common new programmer bugs and
also let us document the few exceptions and why they're allowed.



inv_api.c-minimal.gz
Description: Binary data


flatfiles.c-minimal.gz
Description: Binary data


inv_api.c-fastgetattr.gz
Description: Binary data

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Two aesthetic bugs in the 1-byte packed varlena code

2007-06-12 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Why do we even have those fields in the structs if they're unsafe to use?

 1. genbki.sh

But genbki.sh wouldn't care if we put #if 0 around the unsafe ones, would it?

 2. As you note, they're not always unsafe to use.

Well, it seems like it would be nice to mark which ones are and aren't unsafe
rather than have them all be there waiting to trip people up.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Two aesthetic bugs in the 1-byte packed varlena code

2007-06-12 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 The other instance is in inv_api.c where it would be quite possible to use
 fastgetattr() instead. But the column is always at the same fixed offset and
 again it follows an int4 so it'll always be 4-byte aligned and work fine. So
 for performance reasons perhaps we should keep this as well?

 After looking closer, I think there are worse problems here: the code is
 still using VARSIZE/VARDATA etc, which it should not be because the
 field could easily be in 1-byte-header form.  

Well that's ok because VARATT_IS_EXTENDED returns true for 1-byte forms so
it'll detoast them first. We could avoid the detoasting but given that it's
expecting the chunks to be compressed anyways the memcpys of the smallest
chunks probably don't matter much either way. I'm assuming it's like toast in
that only the last chunk will be smaller than LOBLKSIZE anyways, right?

 I concur that we probably want to avoid adding fastgetattr for
 performance reasons, seeing that this table's layout seems unlikely
 to change.  But it seems like we might want to bother with a null
 test.  Thoughts?

There should never even be a null bitmap right? Maybe we should just
elog(ERROR) if we find HeapTupleHasNulls(tuple) to be true at all.

  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] Silly bug in pgbench's random number generator

2007-06-14 Thread Gregory Stark

pgbench's random number generator was only generating the first and last value
in the specified range half as often as other values in the range. Not that it
actually matters but it may as well do what it claims. This line has a pretty
long and sordid history with various people tweaking it one way and another.

cvs diff: Diffing contrib/pgbench
Index: contrib/pgbench/pgbench.c
===
RCS file: /home/stark/src/REPOSITORY/pgsql/contrib/pgbench/pgbench.c,v
retrieving revision 1.66
diff -u -r1.66 pgbench.c
--- contrib/pgbench/pgbench.c   24 May 2007 18:54:10 -  1.66
+++ contrib/pgbench/pgbench.c   14 Jun 2007 16:22:19 -
@@ -191,7 +191,7 @@
 static int
 getrand(int min, int max)
 {
-	return min + (int) (((max - min) * (double) random()) / MAX_RANDOM_VALUE + 0.5);
+	return min + (int) (((max - min + 1) * (double) random()) / MAX_RANDOM_VALUE);
 }
 
 /* call PQexec() and exit() on failure */
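A quick way to see the bias (a standalone sketch, not part of pgbench): sweep
r across the random() range and histogram what each formula produces for
getrand(1, 3). The old rounding form gives the endpoints about half the share
of the middle value; the new truncating form splits the range essentially
evenly.

#include <stdio.h>

#define MAX_RANDOM_VALUE 0x7fffffff

int
main(void)
{
    long old_counts[3] = {0, 0, 0};
    long new_counts[4] = {0, 0, 0, 0};  /* spare slot guards the r == MAX corner */
    long r;

    /* step by a smallish prime to sample the whole range cheaply */
    for (r = 0; r <= MAX_RANDOM_VALUE; r += 1021)
    {
        old_counts[(int) ((2 * (double) r) / MAX_RANDOM_VALUE + 0.5)]++;
        new_counts[(int) ((3 * (double) r) / MAX_RANDOM_VALUE)]++;
    }
    printf("old: %ld %ld %ld\n", old_counts[0], old_counts[1], old_counts[2]);
    printf("new: %ld %ld %ld\n", new_counts[0], new_counts[1], new_counts[2]);
    return 0;
}

Running it shows the old formula splitting roughly 25/50/25 and the new one
roughly a third each.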


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Maintaining cluster order on insert

2007-06-18 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 The reason for switching to the new API instead of the amsuggestblock API is
 CPU overhead. It avoids constructing the IndexTuple twice and descending the
 tree twice.

I wonder if it's possible to finesse this. Have the suggestblock function
remember the last block it suggested and somehow recognize when it's given the
same tuple to index. Keeping the block pinned would still be the sticky point
though.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-19 Thread Gregory Stark

 * Tom Lane ([EMAIL PROTECTED]) wrote:

 Yeah, I wouldn't want one per second.  Do we already track how long
 we've been waiting?

 No, because we're *asleep*.  You'd have to add an additional
 timeout-interrupt reason.  Plus there's a ton of interesting questions
 about what's safe to do from an interrupt service routine.

 In fact, I am scandalized to see that someone has inserted a boatload
 of elog calls into CheckDeadLock since 8.2 --- that seems entirely
 unsafe.  [ checks revision history... ]

Attached is a patch which moves the messages to ProcSleep(). To do this I had
to make the semaphore be signalled unconditionally whenever CheckDeadLock() is
called, regardless of whether it finds a hard deadlock. I'm not 100% sure
that's correct, but afaik we only use semaphores to signal state changes and
we deal with spurious semaphore firings everywhere.
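
The wait loop then ends up looking something like this (a from-memory sketch
of the idea, not the patch verbatim):

    for (;;)
    {
        PGSemaphoreLock(&MyProc->sem, true);
        if (MyProc->waitStatus != STATUS_WAITING)
            break;              /* lock granted, or we're aborting */

        /*
         * Otherwise it's a spurious firing, e.g. CheckDeadLock() ran,
         * found no hard deadlock, and woke us anyway so we can log the
         * wait here rather than in the interrupt handler.
         */
        if (log_lock_waits)
            elog(LOG, "process %d still waiting for lock", MyProc->pid);
    }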

Incidentally in looking at this I found that the early deadlock detection
never seems to fire. Reading the source it seems it ought to be firing
whenever we have a simple two-process deadlock. But instead I only get the
timeout-based detection.



checkpoint-log-messages-fix.patch.gz
Description: Binary data

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-19 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes:

 Incidentally in looking at this I found that the early deadlock detection
 never seems to fire. Reading the source it seems it ought to be firing
 whenever we have a simple two-process deadlock. But instead I only get the
 timeout-based detection.

Ok, I understand now that early deadlock detection only kicks in when doing
something like LOCK TABLE and even then only if you're deadlocking because
you're upgrading a lock. So this works as intended though it's much less
useful than I thought.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Applied with some further revisions to improve the usefulness of the log
 messages.  There's now one issued when the deadlock timeout elapses, and
 another when the lock is finally obtained:

 LOG:  process 14945 still waiting for AccessExclusiveLock on relation 86076 of database 86042 after 1003.354 ms
 ...
 LOG:  process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms

Is it possible for unlocking the semaphore to wake a process other than our
own? If so, checking log_lock_waits before signalling the semaphore arguably
locks us into having log_lock_waits be PGC_POSTMASTER. Currently it's
PGC_SIGHUP, which is odd since it could have been USERSET in the old regime.

Also, I did just think of a reason why maybe having the times in the messages
could be annoying: it makes it hard to write regression tests. I suppose
having the pids already interferes with regression tests though. Maybe we
should do something like optionally postprocess .out files with some sed
script like s/[0-9]+/###/ before running diff.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Is it possible for unlocking the semaphore to wake another process other than
 our own? In which case checking log_lock_waits before signalling the 
 semaphore
 arguably locks us into having log_lock_waits be PGC_POSTMASTER.

 How you figure that?

Well I'm not clear exactly what's going on with the semaphores here. If it's
possible for us to be printing the messages only as a result of another
backend unlocking the semaphore, then making the PGSemaphoreUnlock conditional
on log_lock_waits means you can't enable log_lock_waits after startup and get
deterministic behaviour: whether you get messages would depend on which other
backend happens to wake you up.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Which is always the same process:
  PGSemaphoreUnlock(&MyProc->sem);

Aaah! I just grokked what's going on with the semaphores here. It all makes a
lot more sense now. Nevermind.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 We could just allow any value up to 1.0, and note in the docs that you should
 leave some headroom, unless you don't mind starting the next checkpoint a bit
 late. That actually sounds pretty good.

What exactly happens if a checkpoint takes so long that the next checkpoint is
due to start before it finishes? Aside from it not actually helping, is there
much reason to avoid this situation? Have we ever actually tested it?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 If you write them twice, so what? You didn't even get to that point as an
 option until all the important stuff was taken care of and the system was
 near idle.

Well, even if it's near idle you were still occupying the i/o system for a
few milliseconds. If someone else came in with a block request at that time,
you extended their response time by that much.

 The elimination of the all-scan background writer means that true hot and
 dirty spots in the buffer cache, like popular index blocks on a heavily
 updated table that never get a zero usage_count, are never going to be
 written out other than as part of the checkpoint process.

If they're really popular blocks on a heavily updated table then writing them
out early doesn't buy us anything; they'll just be dirty again by the time the
checkpoint comes around.

The case where such writes help us is when the blocks weren't really popular,
but we never got around to writing them out, and then the system is busy when
we finally do have to write them out. In that case we wasted a chance to write
them out while the system was more idle.

But don't forget that we're still going through the OS's buffers. That will
smooth out a lot of the i/o we generate anyways. Just because Postgres is idle
doesn't mean the OS isn't busy flushing the buffers we wrote out when it was
busy.

 That's OK for now, but I'd like it to be the case that one day the
 database's I/O scheduling would eventually get to those, in order to
 optimize performance in the kind of bursty scenarios I've been mentioning
 lately.

I think the feeling is that the bursty scenarios are really corner cases.
Heikki described a case where you're dirtying tons of blocks without
triggering any WAL. That can happen but it's pretty peculiar.

I can imagine a scenario where you have a system that's very busy for 60s and
then idle for 60s repeatedly. And for some reason you configure a
checkpoint_timeout on the order of 20m or so (assuming you're travelling
precisely 60mph). 

In that scenario bgwriter's lru scan has to fight to keep up for 60s while
it's mostly writing out dirty pages that it could have flushed out during the
idle time. Effectively you're only making use of half the i/o bandwidth since
bgwriter doesn't do any work for half the duty cycle.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] dblink connection security

2007-07-01 Thread Gregory Stark
Joe Conway [EMAIL PROTECTED] writes:

 Robert Treat wrote:
 Patch based on recent -hackers discussions, it removes usage from public, and
 adds a note to the documentation about why this is necessary.


 I agree with the fix as the simplest and most sensible approach, and in 
 general
 with the doc change, but I'm not inclined to reference the security paper.
 Maybe something like:

As a security precaution, dblink revokes access from PUBLIC role
usage for the dblink_connect functions. It is not safe to allow
remote users to execute dblink from a database in a PostgreSQL
installation that allows local account access using the trust
authentication method. In that case, remote users could gain
access to other accounts via dblink. If trust authentication
is disabled, this is no longer an issue.

I think putting the emphasis on PostgreSQL trust authentication is missing the
broader point. I would suggest two paragraphs such as:

 dblink allows any connected user to attempt to connect to TCP/IP or Unix
 sockets from the database server as the user the database system is running as.
 This could allow users to circumvent access control policies based on the
 connecting user or the connecting host.

 In particular Postgres's trust authentication is one such system. It
 authenticates connecting users based on the unix userid of the process
 forming the connection. In typical configurations any user who is granted
 execute access to dblink can form connections as the postgres user which is
 the database super-user. If trust authentication is disabled this is no
 longer an issue.

 Therefore dblink requires you to explicitly grant execute privileges to users
 or roles you wish to allow to form connections. It does not grant these
 privileges to the PUBLIC role by default as other packages typically do. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] dblink connection security

2007-07-01 Thread Gregory Stark
Robert Treat [EMAIL PROTECTED] writes:

  In particular Postgres's trust authentication is one such system. It
  authenticates connecting users based on the unix userid of the process
  forming the connection. In typical configurations any user who is granted
  execute access to dblink can form connections as the postgres user which
  is the database super-user. If trust authentication is disabled this is
  no longer an issue.

 Did you mean s/trust/ident/g, otherwise I don't think I understand the 
 above...   granted the combination of trust for localhost does open a door 
 for remote users if they have access to dblink, but I don't think that's what 
 you were trying to say. 

Er, quite right. Moreover it's not even true that ``if ident authentication
is disabled this is no longer an issue''. It's still possible to have other
restrictions in pg_hba.conf which dblink would allow you to circumvent. That
sentence is too generous a promise.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



  1   2   3   >