Re: [HACKERS] 0x1A in control file on Windows

2008-09-17 Thread ITAGAKI Takahiro

Tom Lane <[EMAIL PROTECTED]> wrote:

> ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> > We probably need to add PG_BINARY when we open control files
> > because 0x1A is an end-of-file marker on Windows.
> 
> Well, why is that a bug?  If the platform is so silly as to define text
> files that way, who are we to argue?

Google says it is for backward compatibility with CP/M:
http://en.wikipedia.org/wiki/End-of-file
and adding O_BINARY is the answer.

http://codenewbie.com/forum/standard-c-c/1208-binary-i-o-file-reading-0x1a-trouble.html
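
For illustration, a minimal self-contained sketch of the idea (PG_BINARY is
PostgreSQL's portable alias for O_BINARY; the O_BINARY fallback define below
is only for this standalone sketch):

#include <stdio.h>
#include <fcntl.h>

#ifndef O_BINARY
#define O_BINARY 0				/* no-op on POSIX; only Windows needs it */
#endif

int
main(void)
{
	/*
	 * In text mode, Windows C runtimes stop reading at the first 0x1A
	 * (Ctrl+Z), the CP/M-era end-of-file marker.  Opening in binary mode
	 * disables that translation, so every byte of the file is readable.
	 */
	int			fd = open("global/pg_control", O_RDONLY | O_BINARY, 0);

	if (fd < 0)
		perror("open");
	return 0;
}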

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Heikki Linnakangas

Tom Lane wrote:

Alvaro Herrera <[EMAIL PROTECTED]> writes:

BTW in testing this patch I was surprised by the fact that temp tables
files are removed at checkpoint time,


[ blink... ]  Doesn't look like that should happen.  What is your
test case?


Hmph, must be because of the patch from last winter to prevent 
relfilenode reuse until next checkpoint. Looks like we didn't make an 
exception for temporary tables. Although it's harmless, we could put an 
isTempOrToastNamespace() test in there:


*** md.c	11 Aug 2008 11:05:11 -0000	1.139
--- md.c	18 Sep 2008 06:22:13 -0000
***
*** 19,24 
--- 19,25 
  #include 

  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "miscadmin.h"
  #include "postmaster/bgwriter.h"
  #include "storage/fd.h"
***
*** 324,330 
/*
 * Delete or truncate the first segment.
 */
!   if (isRedo || forkNum != MAIN_FORKNUM)
ret = unlink(path);
else
{
--- 325,331 
/*
 * Delete or truncate the first segment.
 */
! 	if (isRedo || forkNum != MAIN_FORKNUM ||
! 		isTempOrToastNamespace(rnode.spcNode))
	ret = unlink(path);
	else
	{

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 0x1A in control file on Windows

2008-09-17 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> I found a bug that pg_controldata ends with error if control files
> contain 0x1A (Ctrl+Z) on Windows.

> We probably need to add PG_BINARY when we open control files
> because 0x1A is an end-of-file marker on Windows.

Well, why is that a bug?  If the platform is so silly as to define text
files that way, who are we to argue?

regards, tom lane



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> BTW in testing this patch I was surprised by the fact that temp tables
> files are removed at checkpoint time,

[ blink... ]  Doesn't look like that should happen.  What is your
test case?

regards, tom lane



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tatsuo Ishii
Tom, thanks for the review.

Here is an in-progress report. Patches against CVS HEAD are attached.
(Items without comments are still work in progress.)
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > Included is the latest patches against CVS HEAD.
> 
> I spent some time reading this patch.  Here are a few comments in
> no particular order:
> 
> RangeRecursive node lacks copyfuncs/equalfuncs support.

Functions added.
 
> Query.recursive is missed in equalfuncs.c.  But rather than fix that,
> get rid of it entirely.  AFAICS the only use is in qual_is_pushdown_safe,
> and what is the value of that test?  The callers know perfectly well
> whether they are operating on a recursive RTE or not.  You might as well
> just delete all the useless qual-pushdown-attempt code from
> set_recursion_pathlist, and not need to touch qual_is_pushdown_safe
> at all.

Query.recursive removed and qual_is_pushdown_safe is untouched.

> Is physical_tlist optimization sensible for RecursiveScan?  We seem
> to use it for every other Scan node type.

Fixed; physical_tlist optimization is now enabled for RecursiveScan, I
believe.

> I dislike putting state into ExecutorState; that makes it impossible
> to have multiple recursion nodes in one plan tree.  It would probably
> be better for the Recursion and RecursiveScan nodes to talk to each
> other directly (compare HashJoin/Hash); although since they are not
> adjacent in the plan tree I admit I'm not sure how to do that.
> 
> es_disallow_tuplestore doesn't seem to need to be in ExecutorState
> at all, it could as well be in RecursionState.

It was a workaround to avoid infinite recursion in some cases. The
discussion came to the conclusion that it's the user's responsibility
to avoid such cases (otherwise the semantics of our recursive queries
would differ from the one defined in the standard), I believe.

es_disallow_tuplestore removed.

> I don't really like the way that Append nodes are being abused here.
> It would be better to allow nodeRecursion.c to duplicate a little code
> from nodeAppend.c, and have the child plans be direct children of
> the Recursion node.  BTW, is it actually possible to have more than
> two children?  I didn't spend enough time analyzing the restrictions
> in parse_cte to be sure.  If there are always just two then you could
> simplify the representation by treating it like a join node instead
> of an append.  (The RTE_RECURSIVE representation sure makes it look
> like there can be only two...)
> 
> Mark/restore support seems useless ... note the comment on
> ExecSupportsMarkRestore (which should be updated if this code
> isn't removed).
> 
> RecursiveScan claims to support backwards fetch, but does not in fact
> contain code to do so.  (Given that it will always be underneath
> Recursion, which can't do backwards fetch, I see little point in adding
> such code; fix execAmi.c instead.)
> 
> ExecInitRecursion doesn't seem to be on the same page about whether
> it supports backward scan as execAmi.c, either.
> 
> This comment in nodeRecursivescan.c seems bogus:
>   /*
>* Do not initialize scan tuple type, result tuple type and
>* projection info in ExecInitRecursivescan. These types are
>* initialized after initializing Recursion node.
>*/
> because the code seems to be doing exactly what the comment says it
> doesn't.
> 
> Numerous comments appear to have been copied-and-pasted and not modified
> from the original.  Somebody will have to go over all that text.
> 
> ruleutils.c fails completely for non-recursive WITH.  It *must* regenerate
> such a query with a WITH clause, not as a flattened subquery which is what
> you seem to be doing.  This isn't negotiable because the semantics are
> different.  This will mean at least some change in the parsetree
> representation.  Perhaps we could add a bool to subquery RTEs to mark them
> as coming from a nonrecursive WITH?  The tests added for RTE_RECURSIVE
> seem a bit ugly too.  If it thinks that can't happen it should Assert so,
> not just fall through silently.
> 
> commentary for ParseState.p_ctenamespace is gratuitously unlike the
> comment style for the other fields, and p_recursive_namespace isn't
> documented at all.
> 
> ParseState.p_in_with_clause is unused, should be removed.

Done.

> The WithClause struct definition is poorly commented.  It should be
> stated that it is used only pre-parse-analysis (assuming that continues
> to be true after you get done fixing ruleutils.c...), and it doesn't
> say what the elements of the subquery list are (specifically, what
> node type).  A lot of the other added structs and fields could use
> better commenting too.
> 
> For that matter "subquery" is a poor name for WithClause's list of CTEs,
> especially so since it's hard to search for.  It should be a plural name
> and I'd be inclined to use something like "ctes" not "subqueries".
> The term "subquery" is too overloaded already, so any place you

[HACKERS] Where is Aggregation Attribute

2008-09-17 Thread Zhe He
I want to get the attribute from an Aggregate (Agg) node; where is it?

typedef struct Agg
{
Plan plan;
AggStrategy aggstrategy;
int numCols; /* number of grouping columns */
AttrNumber *grpColIdx; /* their indexes in the target list */
Oid   *grpOperators; /* equality operators to compare with */
long numGroups; /* estimated number of groups in input */
} Agg;

-- 
Best Regards,
Zhe HE
TEL: (001) 646-789-3008
Address:965 Amsterdam Avenue,
New York, NY 10025

Master Student, CS Dept.
Columbia University
www.columbia.edu/~zh2132
---
07 Alumni
Bachelor of Eng, BUPT
www.bupt.edu.cn


[HACKERS] 0x1A in control file on Windows

2008-09-17 Thread ITAGAKI Takahiro
I found a bug that pg_controldata ends with error if control files
contain 0x1A (Ctrl+Z) on Windows.

We probably need to add PG_BINARY when we open control files
because 0x1A is an end-of-file marker on Windows.
This fix needs to be applied to the back branches (8.2 and 8.3) as well as HEAD.


Index: src/bin/pg_controldata/pg_controldata.c
===
--- src/bin/pg_controldata/pg_controldata.c (head)
+++ src/bin/pg_controldata/pg_controldata.c (pg_control_0x1A)
@@ -107,7 +107,7 @@
 
snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
 
-   if ((fd = open(ControlFilePath, O_RDONLY, 0)) == -1)
+   if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
{
		fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
				progname, ControlFilePath, strerror(errno));
Index: src/bin/pg_resetxlog/pg_resetxlog.c
===
--- src/bin/pg_resetxlog/pg_resetxlog.c (head)
+++ src/bin/pg_resetxlog/pg_resetxlog.c (pg_control_0x1A)
@@ -373,7 +373,7 @@
	char	   *buffer;
	pg_crc32	crc;
 
-   if ((fd = open(XLOG_CONTROL_FILE, O_RDONLY, 0)) < 0)
+   if ((fd = open(XLOG_CONTROL_FILE, O_RDONLY | PG_BINARY, 0)) < 0)
{
/*
	 * If pg_control is not there at all, or we can't read it, the odds

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: [HACKERS] New FSM patch

2008-09-17 Thread Decibel!

On Sep 17, 2008, at 9:30 AM, Heikki Linnakangas wrote:
I think we'd still need to WAL log operations that decrease the
amount of free space on a page. Otherwise, after we have partial
vacuum, we might never revisit a page and update the FSM, even
though there's usable space on the page, leaving the space
forgotten forever.



ISTM that it would be better to deal with such corner cases via
periodic non-partial vacuums, done by something like autovac, and
probably done with a higher-than-normal vacuum_cost_delay setting so
as to minimize the performance impact. That's likely a lot less
wasteful than further compounding lock contention for the WAL. Even
if it does result in more overall IO, you have to trade a *lot* of IO
to balance out the impact of lock contention.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera

BTW in testing this patch I was surprised by the fact that temp tables
files are removed at checkpoint time, rather than when the transaction
ends (at first I thought I had broken the removal of temp files).  Is
this a recent feature?

(I verified that this continues to work fine for WITH HOLD cursors too.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Ah -- like this?
> 
> +1, but there are two kinds of temp files in that module, and only
> one of them is relevant here.  Call it something like
> have_xact_temporary_files to make things clearer.

Ok, so that's what I called it.

Simon wrote:

> the if test should include
> || isProcExit
> 
> so you don't skip non-transactional temp files at proc exit when there
> weren't any created in the last transaction.

Yep, I noticed that too.  Thanks.

Should I backpatch this to 8.3?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Index: src/backend/storage/file/fd.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.143
diff -c -p -r1.143 fd.c
*** src/backend/storage/file/fd.c	1 Jan 2008 19:45:51 -0000	1.143
--- src/backend/storage/file/fd.c	17 Sep 2008 22:58:32 -0000
*** static int	max_safe_fds = 32;	/* default
*** 121,126 
--- 121,132 
  #define FD_TEMPORARY		(1 << 0)	/* T = delete when closed */
  #define FD_XACT_TEMPORARY	(1 << 1)	/* T = delete at eoXact */
  
+ /*
+  * Flag to tell whether it's worth scanning VfdCache looking for temp files to
+  * close
+  */
+ static bool		have_xact_temporary_files = false;
+ 
  typedef struct vfd
  {
  	signed short fd;			/* current FD, or VFD_CLOSED if none */
*** OpenTemporaryFile(bool interXact)
*** 889,894 
--- 895,903 
  	{
  		VfdCache[file].fdstate |= FD_XACT_TEMPORARY;
  		VfdCache[file].create_subid = GetCurrentSubTransactionId();
+ 
+ 		/* ensure cleanup happens at eoxact */
+ 		have_xact_temporary_files = true;
  	}
  
  	return file;
*** AtEOSubXact_Files(bool isCommit, SubTran
*** 1603,1609 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
--- 1612,1618 
  {
  	Index		i;
  
! 	if (have_xact_temporary_files)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
*** CleanupTempFiles(bool isProcExit)
*** 1679,1685 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
--- 1688,1698 
  {
  	Index		i;
  
! 	/*
! 	 * Careful here: at proc_exit we need extra cleanup, not just
! 	 * xact_temporary files.
! 	 */
! 	if (isProcExit || have_xact_temporary_files)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
*** CleanupTempFiles(bool isProcExit)
*** 1697,1702 
--- 1710,1717 
  	FileClose(i);
  			}
  		}
+ 
+ 		have_xact_temporary_files = false;
  	}
  
  	while (numAllocatedDescs > 0)



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 17:34 -0400, Alvaro Herrera wrote:

> Ah -- like this?

the if test should include
|| isProcExit

so you don't skip non-transactional temp files at proc exit when there
weren't any created in the last transaction.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 17:34 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > 
> > On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:
> > 
> > > We've been profiling a large system (8 CPUs, 64 GB of memory, some
> > > dozens of disks) which seems rather more swamped than it should.  Part
> > > of the problem seems to come from CleanupTempFiles, the second entry in
> > > oprofile output.
> > 
> > I'm glad you've observed this also. I saw it about two years ago but
> > wasn't able to convince anyone else it existed at the time.
> 
> I couldn't find it in the archives.

Perhaps it was a private conversation then, but the main point is I've
seen it too and believe it is a real-world problem.

> > The simple solution is to have a state variable so you can see whether a
> > backend has created any temp files in this transaction. Most don't, so I
> > think the above two solutions are overkill. If we created any, scan for
> > them; if not, don't. Just a simple boolean state, just as we have for
> > AtEOXact_RelationCache().
> 
> Ah -- like this?

Yeh, nice and simple.

Might be better to call it "need_eoxact_work" to mirror relcache.c?
any_temporary_files is fairly vague and could be misinterpreted.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Ah -- like this?

+1, but there are two kinds of temp files in that module, and only
one of them is relevant here.  Call it something like
have_xact_temporary_files to make things clearer.

I concur that the explicit test on SizeVfdCache > 0 is a waste of
effort, too.  It'll nearly always be true anyway.

regards, tom lane



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Simon Riggs wrote:

> > The simple solution is to have a state variable so you can see whether a
> > backend has created any temp files in this transaction. Most don't, so I
> > think the above two solutions are overkill. If we created any, scan for
> > them; if not, don't. Just a simple boolean state, just as we have for
> > AtEOXact_RelationCache().
> 
> Ah -- like this?

BTW I wonder if there's any point in checking SizeVfdCache > 0 in the
places where checking the new flag suffices.


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera
Simon Riggs wrote:
> 
> On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:
> 
> > We've been profiling a large system (8 CPUs, 64 GB of memory, some
> > dozens of disks) which seems rather more swamped than it should.  Part
> > of the problem seems to come from CleanupTempFiles, the second entry in
> > oprofile output.
> 
> I'm glad you've observed this also. I saw it about two years ago but
> wasn't able to convince anyone else it existed at the time.

I couldn't find it in the archives.

> The simple solution is to have a state variable so you can see whether a
> backend has created any temp files in this transaction. Most don't, so I
> think the above two solutions are overkill. If we created any, scan for
> them; if not, don't. Just a simple boolean state, just as we have for
> AtEOXact_RelationCache().

Ah -- like this?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Index: src/backend/storage/file/fd.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.143
diff -c -p -r1.143 fd.c
*** src/backend/storage/file/fd.c	1 Jan 2008 19:45:51 -0000	1.143
--- src/backend/storage/file/fd.c	17 Sep 2008 21:33:28 -0000
*** static int	max_safe_fds = 32;	/* default
*** 121,126 
--- 121,132 
  #define FD_TEMPORARY		(1 << 0)	/* T = delete when closed */
  #define FD_XACT_TEMPORARY	(1 << 1)	/* T = delete at eoXact */
  
+ /*
+  * Flag to tell whether it's worth scanning VfdCache looking for temp files to
+  * close
+  */
+ static bool		any_temporary_files = false;
+ 
  typedef struct vfd
  {
  	signed short fd;			/* current FD, or VFD_CLOSED if none */
*** OpenTemporaryFile(bool interXact)
*** 889,894 
--- 895,903 
  	{
  		VfdCache[file].fdstate |= FD_XACT_TEMPORARY;
  		VfdCache[file].create_subid = GetCurrentSubTransactionId();
+ 
+ 		/* ensure cleanup happens at eoxact */
+ 		any_temporary_files = true;
  	}
  
  	return file;
*** AtEOSubXact_Files(bool isCommit, SubTran
*** 1603,1609 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
--- 1612,1618 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0 && any_temporary_files)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
*** CleanupTempFiles(bool isProcExit)
*** 1679,1685 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
--- 1688,1694 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0 && (isProcExit || any_temporary_files))
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
*** CleanupTempFiles(bool isProcExit)
*** 1697,1702 
--- 1706,1713 
  	FileClose(i);
  			}
  		}
+ 
+ 		any_temporary_files = false;
  	}
  
  	while (numAllocatedDescs > 0)



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:

> We've been profiling a large system (8 CPUs, 64 GB of memory, some
> dozens of disks) which seems rather more swamped than it should.  Part
> of the problem seems to come from CleanupTempFiles, the second entry in
> oprofile output.

I'm glad you've observed this also. I saw it about two years ago but
wasn't able to convince anyone else it existed at the time.

> I can see two simple ways to solve this problem.  One is to create a
> second array that keeps pointers to the temp files in VfdCache.  Then,
> on CleanupTempFiles we scan that array instead of VfdCache directly.
> 
> The second one is to use a separate VfdCache for temp files, but this
> seems much more involved, requiring touch almost all of fd.c.
> 
> Of course, perhaps there's another solution which involves rethinking
> fd.c in a more thorough fashion, but I'm not sure how right offhand.

The simple solution is to have a state variable so you can see whether a
backend has created any temp files in this transaction. Most don't, so I
think the above two solutions are overkill. If we created any, scan for
them; if not, don't. Just a simple boolean state, just as we have for
AtEOXact_RelationCache().

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




[HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera
Hi,

We've been profiling a large system (8 CPUs, 64 GB of memory, some
dozens of disks) which seems rather more swamped than it should.  Part
of the problem seems to come from CleanupTempFiles, the second entry in
oprofile output.

This is the top 3 symbol report for a 2 minute oprofile run:

$ opreport '*postgres' -l
CPU: AMD64 processors, speed 2600 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 7000
samples  %symbol name
1963666   5.4862  AllocSetAlloc
1629477   4.5525  CleanupTempFiles
1560543   4.3599  hash_search_with_hash_value


Annotating the function results in this:

               :static void
               :CleanupTempFiles(bool isProcExit)
  1334  0.0037 :{ /* CleanupTempFiles total: 1629477  4.5525 */
               :	Index		i;
               :
    90 2.5e-04 :	if (SizeVfdCache > 0)
               :	{
               :		Assert(FileIsNotOpen(0));	/* Make sure ring not corrupted */
470706  1.3151 :		for (i = 1; i < SizeVfdCache; i++)
               :		{
281998  0.7879 :			unsigned short fdstate = VfdCache[i].fdstate;
               :
872073  2.4365 :			if ((fdstate & FD_TEMPORARY) && VfdCache[i].fileName != NULL)
               :			{
               :				/*
               :				 * If we're in the process of exiting a backend process, close
               :				 * all temporary files.  Otherwise, only close temporary files
               :				 * local to the current transaction.
               :				 */
               :				if (isProcExit || (fdstate & FD_XACT_TEMPORARY))
               :					FileClose(i);
               :			}
               :		}
               :	}
               :
  3198  0.0089 :	while (numAllocatedDescs > 0)
               :		FreeDesc(&allocatedDescs[0]);
    78 2.2e-04 :}



So we're scanning the array lots of times (there are about 1300
transactions per second), doing no useful work on each scan.

There are about 8900 files in the database, and since this is using a
connection pooler, it's quite likely that each session opens a large
subset of them at least once at some point, and so the VfdCache gets
very large.


I can see two simple ways to solve this problem.  One is to create a
second array that keeps pointers to the temp files in VfdCache.  Then,
on CleanupTempFiles we scan that array instead of VfdCache directly.

The second one is to use a separate VfdCache for temp files, but this
seems much more involved, requiring touching almost all of fd.c.

Of course, perhaps there's another solution which involves rethinking
fd.c in a more thorough fashion, but I'm not sure how, right offhand.
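
(For illustration only: a self-contained toy sketch of the first option,
the side array. Every name below is made up for the sketch, not taken
from fd.c or any posted patch.)

#include <stdbool.h>
#include <stdio.h>

#define CACHE_SIZE 8192			/* stand-in for a large VfdCache */
#define MAX_TEMP   64

static bool isTempSlot[CACHE_SIZE];	/* stand-in per-slot temp flag */
static int	tempSlots[MAX_TEMP];	/* side array: slots holding temp files */
static int	numTempSlots = 0;

static void
rememberTempSlot(int slot)
{
	isTempSlot[slot] = true;
	if (numTempSlots < MAX_TEMP)
		tempSlots[numTempSlots++] = slot;
}

static void
cleanupTempSlots(void)
{
	/* O(number of temp files) instead of O(CACHE_SIZE) per transaction */
	int			i;

	for (i = 0; i < numTempSlots; i++)
	{
		printf("closing temp slot %d\n", tempSlots[i]);
		isTempSlot[tempSlots[i]] = false;
	}
	numTempSlots = 0;
}

int
main(void)
{
	rememberTempSlot(42);
	cleanupTempSlots();			/* scans 1 entry, not 8192 */
	return 0;
}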

Thoughts?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[HACKERS] proposal - function's default parametes

2008-09-17 Thread Pavel Stehule
Hello,

before any implementation of named parameters we have to implement
parameters with default values. I don't see any sense in named parameters
without it. It's orthogonal to variadic functions (variadic functions
are called when we supply more parameters than the function has; default
parameters are used when we supply fewer parameters than the function
has). Firebird has a similar implementation. Default values are filled
in from the right:

create function foo(undefvar type, var1 type1 = expr1, var2 type2 =
expr2, ... var_n type_n = expr_n)

select foo(cexpr1, cexpr2) is transformed to

select foo(cexpr1, cexpr2, expr2, ... expr_n)

The problems with identifying the right function are the same as with
variadic functions.

Implementation:
In my prototype I used a special datatype, position_expression, that
carries a position and a parsed, transformed expression. pg_proc has a
new column "defaults" of type position_expression[]. The position will
be used for named parameters in the future, and using this type protects
us from compatibility problems.

postgres=# create or replace function x1(int = 1, int = 2, int = 3)
returns int as $$
  select $1+$2+$3;
$$ language sql;
CREATE FUNCTION
postgres=# select x1();
 x1
----
  6
(1 row)

postgres=# select x1(10);
 x1
----
 15
(1 row)

postgres=# select x1(10,20);
 x1
----
 33
(1 row)

postgres=# select x1(10,20,30);
 x1
----
 60
(1 row)

This feature is implemented at the SQL level, so all PL languages will be
supported.

Any comments or ideas are welcome.

regards
Pavel Stehule



Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Robert Haas
> I tend to agree with Alvaro that there's not very much of a use case for
> an analyze-only autovacuum mode.  Assuming that we get to the point of
> having a parallelizing pg_restore, it would be interesting to give it an
> option to include ANALYZE for each table it's loaded among the tasks
> that it schedules.  (I'm visualizing these commands as being made up by
> pg_restore itself, *not* added to the pg_dump output.)  Then you could
> have a reasonably optimal total workflow, whereas allowing autovacuum
> to try to schedule the ANALYZEs can't be.

In Simon's original email, he suggested forcing an automatic ANALYZE
on the server side after CREATE TABLE AS.  I objected on the grounds
that this won't fix anything for people who are doing bulk data loads
using any other mechanism.  Here, you're proposing the exact same
thing, except instead of restricting it to people who use CREATE TABLE
AS, you're restricting it to people who use a hypothetical
parallelized implementation of pg_restore.

While either of these is better than doing nothing, ISTM it would be
far better to give the database some smarts about what constitutes a
bulk data load (a whole bunch of insert operations on a newly created
table) and what to do about it (synchronous analyze just before the
first operation on the table that isn't an insert - and perhaps not
before).

...Robert



Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Simon Riggs
On Wed, 2008-09-17 at 10:52 -0400, Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> > Alvaro Herrera wrote:
> >> Why doesn't this new request conflict with that one?
> 
> > The problem back then was that a CREATE INDEX was waiting on the 
> > autoanalyze to finish, and the autoanalyze took a long time to finish 
> > because of vacuum_cost_delay. Now that we have the auto-cancel 
> > mechanism, that's not a problem.
> 
> Define "not a problem".  With auto-cancel, what will happen is that
> whatever work the autoanalyze does will be wasted.  It seems to me
> that the current complaint is about background autovacuum/autoanalyze
> wasting cycles during a bulk load, and there's certainly no purer waste
> than an analyze cycle that gets aborted.

OK, but that's an argument against auto-anything, not just against
splitting out autoanalyze and autovacuum.

> I tend to agree with Alvaro that there's not very much of a use case for
> an analyze-only autovacuum mode. 

Did he say that? I thought he said "we could do that"; what did that mean,
Alvaro?

I have a customer saying this would be a good thing, and I agree. The
roles of autovacuum and autoanalyze are not exactly matched, so why do
we force them to be run together or not at all? Why not allow
the user to specify whether they want both or not? It's an option; we're
not forcing anyone to do it that way if they don't want to.

>  Assuming that we get to the point of
> having a parallelizing pg_restore, it would be interesting to give it an
> option to include ANALYZE for each table it's loaded among the tasks
> that it schedules.  (I'm visualizing these commands as being made up by
> pg_restore itself, *not* added to the pg_dump output.)  Then you could
> have a reasonably optimal total workflow, whereas allowing autovacuum
> to try to schedule the ANALYZEs can't be.

That doesn't solve all problems, just ones with pg_restore. That's nice
and I won't turn it away, but what will we do about plain pg_dump and
about other table creations and loads?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




[HACKERS] Patch for SQL-Standard Interval output and decoupling DateStyle from IntervalStyle

2008-09-17 Thread Ron Mayer

Tom Lane wrote:

In fact, given that we are now
somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.


Short summary:

   The attached patch
 (1) adds a new GUC called "IntervalStyle" that decouples interval
 output from the "DateStyle" GUC, and
 (2) adds a new interval style that will match the SQL standards
 for interval literals when given interval data that meets the
 sql standard (year-month or date-time only; and no mixed sign).

Background:

   Currently Postgres outputs Intervals in one of two formats
   depending on DateStyle.  When DateStyle is 'ISO',  it outputs
   intervals like '1 year 2 mons 3 days -04:05:06.78' (though I
   know of no ISO interval standards that look like that).  When
   DateStyle is 'SQL' it outputs intervals like
   '@ 1 year 2 mons 3 days -4 hours -5 mins -6.78 secs' (though
   I know of no SQL interval standards that look like that).

The feature:

   The SQL standard only specifies interval literal strings
   for the two kinds of intervals (year-month and day-time)
   that are defined by the SQL spec.  It also doesn't account
   for postgres's intervals that can have mixed-sign components.
   I tried to make the output here be a logical extension
   of the spec, concatenating the year-month and day-time
   interval strings and forcing signs in the output that could
   otherwise have been ambiguous.

   This little table shows an example of the output of this
   new style compared to the existing postgres output for
   (a) a year-month interval, (b) a day-time interval, and
   (c) a not-quite-standard postgres interval with year-month
   and day-time components of varying signs.

'1-2' | '@ 1 year 2 mons'
'3 4:05:06.78'| '@ 3 days 4 hours 5 mins 6.78 secs'
'+1-2 -3 +4:05:06.78' | '@ 1 year 2 mons -3 days 4 hours 5 mins 6.78 secs'

The patch:

   Seems to work for me; and I believe I updated the docs.
   Many regression tests fail, though, because they assume
   the existing coupling of DateStyle and interval output
   styles.   If people like where this is going I can update
   those regression tests and add ones to test this new style.


*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 4090,4095  SET XML OPTION { DOCUMENT | CONTENT };
--- 4090,4117 

   
  
+  
+   IntervalStyle (string)
+   
+IntervalStyle configuration parameter
+   
+   
+
+ Sets the display format for interval values. 
+ The value sql_standard will output SQL Standard
+ strings when given intervals that conform to the SQL
+ standard (either year-month only or date-time only; and no
+ mixing of positive and negative components).
+ The value postgres will output intervals in
+ a format that matches what old releases had output when
+ the DateStyle was set to 'ISO'.
+ The value postgres_verbose will output intervals in
+ a format that matches what old releases had output when
+ the DateStyle was set to 'SQL'.
+
+   
+  
+ 
   
timezone (string)

*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
***
*** 2213,2218  January 8 04:05:06 1999 PST
--- 2213,2305 
  
 
  
+
+ Interval Output
+ 
+ 
+  interval
+  output format
+  formatting
+ 
+ 
+ 
+  The output format of the interval types can be set to one of three
+  styles sql_standard,
+  postgres, or postgres_verbose.  The default
+  is the postgres format.  
+   shows examples of each
+  output style.
+ 
+ 
+ 
+  The sql_standard style will output SQL standard
+  interval literal strings where the value of the interval
+  consists of only a year-month component or a datetime
+  component (as required by the SQL standard).  For an interval
+  containing both a year-month and a datetime component, the
+  output will be a SQL Standard unquoted year-month literal
+  string joined to a SQL Standard unquoted datetime literal
+  string with a space in between.
+ 
+ 
+ 
+  The postgres style will output intervals that match
+  the style PostgreSQL 8.3 output when the 
+  parameter was set to ISO.
+ 
+ 
+ 
+  The postgres_verbose style will output intervals that match
+  the style PostgreSQL 8.3 output when the 
+  parameter was set to SQL.
+ 
+ 
+  
+ 	   Interval Style Example
+ 	   
+ 		
+ 		 
+ 		  Style Specification
+ 		  Year-Month Interval
+ 		  DateTime Interval
+ 		  Nonstandard Extended Interval
+ 		 
+ 		
+ 		
+ 		 
+ 		  sql_standard
+ 		  1-2
+ 		  3 4:05:06
+ 		  -1-2 +3 -4:05:06
+ 		 
+ 		 
+ 		  postgres
+ 		  1 year 2 mons
+ 		  3 days 04:05:06
+ 		   -1 years -

Re: [HACKERS] text search patch status update?

2008-09-17 Thread Teodor Sigaev
I remember about that, but right now I haven't time to do the final review.
Sorry. Will return to it soon.


Sushant Sinha wrote:

Any status updates on the following patches?

1. Fragments in tsearch2 headlines:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php

2. Bug in hlCover:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php

-Sushant.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: Column level privileges was:(Re: [HACKERS] Extending grant insert on tables to sequences)

2008-09-17 Thread Jaime Casanova
On 9/17/08, Stephen Frost <[EMAIL PROTECTED]> wrote:
>
> > A piece which can be broken off pretty easily is adding support to track
> > the columns used through to the executor so we can check the permissions
> > in the right place.
>
> Jaime, have you had a chance to work on this?  It's next on my list and
> I'll start working on it tonight unless you've had a chance to get to
> it.  Please let me know.
>

Not really; I started to read the code... but was interrupted by a new
task... (if only we could send kill -9 signals to work tasks ;)

-- 
regards,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157



Re: [HACKERS] Patch for SQL-standard negative valued year-month literals

2008-09-17 Thread Stephen R. van den Berg
Tom Lane wrote:
>"Stephen R. van den Berg" <[EMAIL PROTECTED]> writes:
>> Intervals are a scalar, not an addition of assorted values, alternating signs
>> between fields would be wrong.

>Sorry, you're the one who's wrong on that.  We've treated intervals as
>three independent fields for years now (and before that it was two
>independent fields).

Ok, didn't know that.
Let's put it differently then: I can understand that the standard
considers it a scalar and not an addition, but apparently the addition
characteristic is being used in Postgres code already; that makes it
undesirable to change it indeed.
-- 
Sincerely,
   Stephen R. van den Berg.

He did a quarter of the work in *half* the time!



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tatsuo Ishii
> > To enable physical_tlist optimization, it seems build_physical_tlist,
> > use_physical_tlist and disuse_physical_tlist need to be
> > changed. build_physical_tlist and use_physical_tlist have been already
> > patched and only disuse_physical_tlist needs to be patched. Any other
> > place I miss to enable the optimization?
> 
> IIRC, the comment for build_physical_tlist hadn't been patched, but
> yeah that seems like about it.

Yeah, I need to fix sloppy comments in the existing patches all over
the place :-)
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
>> Is physical_tlist optimization sensible for RecursiveScan?  We seem
>> to use it for every other Scan node type.

> To enable physical_tlist optimization, it seems build_physical_tlist,
> use_physical_tlist and disuse_physical_tlist need to be
> changed. build_physical_tlist and use_physical_tlist have been already
> patched and only disuse_physical_tlist needs to be patched. Any other
> place I miss to enable the optimization?

IIRC, the comment for build_physical_tlist hadn't been patched, but
yeah that seems like about it.

regards, tom lane



Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Alvaro Herrera wrote:
>> Why doesn't this new request conflict with that one?

> The problem back then was that a CREATE INDEX was waiting on the 
> autoanalyze to finish, and the autoanalyze took a long time to finish 
> because of vacuum_cost_delay. Now that we have the auto-cancel 
> mechanism, that's not a problem.

Define "not a problem".  With auto-cancel, what will happen is that
whatever work the autoanalyze does will be wasted.  It seems to me
that the current complaint is about background autovacuum/autoanalyze
wasting cycles during a bulk load, and there's certainly no purer waste
than an analyze cycle that gets aborted.

I tend to agree with Alvaro that there's not very much of a use case for
an analyze-only autovacuum mode.  Assuming that we get to the point of
having a parallelizing pg_restore, it would be interesting to give it an
option to include ANALYZE for each table it's loaded among the tasks
that it schedules.  (I'm visualizing these commands as being made up by
pg_restore itself, *not* added to the pg_dump output.)  Then you could
have a reasonably optimal total workflow, whereas allowing autovacuum
to try to schedule the ANALYZEs can't be.

regards, tom lane



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tatsuo Ishii
> Is physical_tlist optimization sensible for RecursiveScan?  We seem
> to use it for every other Scan node type.

To enable physical_tlist optimization, it seems build_physical_tlist,
use_physical_tlist and disuse_physical_tlist need to be
changed. build_physical_tlist and use_physical_tlist have been already
patched and only disuse_physical_tlist needs to be patched. Any other
place I miss to enable the optimization?
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Andrew Chernow

Andrew Chernow wrote:

New patch following our discussion with updated docs.

few logical errors).  I don't think it makes sense to do it 
otherwise, it would be like calling free after a malloc failure.


I can live with that definition, but please document it.




To build on this analogy, PGEVT_CONNRESET is like a realloc.  Meaning,
the initial malloc (PGEVT_REGISTER) worked but the realloc
(PGEVT_CONNRESET) didn't ... you still have to free (PGEVT_CONNDESTROY)
the initial allocation.  It's documented that way.  Basically, if a
register succeeds, a destroy will always be sent regardless of what
happens with a reset.
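
For illustration, here is a minimal sketch of an event procedure following
that contract, using the API names from the patch under discussion (a
sketch under those assumptions, not a definitive implementation):

#include <stdlib.h>
#include "libpq-fe.h"
#include "libpq-events.h"

static int
myEventProc(PGEventId evtId, void *evtInfo, void *passThrough)
{
	switch (evtId)
	{
		case PGEVT_REGISTER:	/* the "malloc" */
			{
				PGEventRegister *e = (PGEventRegister *) evtInfo;
				void	   *data = malloc(64);

				if (data == NULL)
					return 0;	/* failure: clean up here; no DESTROY follows */
				PQsetInstanceData(e->conn, myEventProc, data);
				break;
			}
		case PGEVT_CONNRESET:	/* the "realloc" */
			/* even if we returned 0 here, CONNDESTROY would still arrive */
			break;
		case PGEVT_CONNDESTROY:	/* the "free": always sent after a
								 * successful REGISTER */
			{
				PGEventConnDestroy *e = (PGEventConnDestroy *) evtInfo;

				free(PQinstanceData(e->conn, myEventProc));
				break;
			}
		default:
			break;
	}
	return 1;					/* success */
}

/* usage: PQregisterEventProc(conn, myEventProc, "myEventProc", NULL); */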





I attached the wrong patch.  I'm sorry.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Index: doc/src/sgml/libpq.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/libpq.sgml,v
retrieving revision 1.261
diff -C6 -r1.261 libpq.sgml
*** doc/src/sgml/libpq.sgml 17 Sep 2008 04:31:08 -  1.261
--- doc/src/sgml/libpq.sgml 17 Sep 2008 14:19:29 -
***
*** 4911,4923 
 When a PGEVT_REGISTER event is received, the
 evtInfo pointer should be cast to a
 PGEventRegister *.  This structure contains a
 PGconn that should be in the
 CONNECTION_OK status; guaranteed if one calls
 PQregisterEventProc right after obtaining a good
!PGconn.

   
  
  
  
   PGEVT_CONNRESET
--- 4911,4925 
 When a PGEVT_REGISTER event is received, the
 evtInfo pointer should be cast to a
 PGEventRegister *.  This structure contains a
 PGconn that should be in the
 CONNECTION_OK status; guaranteed if one calls
 PQregisterEventProc right after obtaining a good
!PGconn.  When returning a failure code, all
!cleanup must be performed as no PGEVT_CONNDESTROY
!event will be sent.

   
  
  
  
   PGEVT_CONNRESET
***
*** 4941,4953 
  
 When a PGEVT_CONNRESET event is received, the
 evtInfo pointer should be cast to a
 PGEventConnReset *.  Although the contained
 PGconn was just reset, all event data remains
 unchanged.  This event should be used to reset/reload/requery any
!associated instanceData.

   
  
  
  
   PGEVT_CONNDESTROY
--- 4943,4956 
  
 When a PGEVT_CONNRESET event is received, the
 evtInfo pointer should be cast to a
 PGEventConnReset *.  Although the contained
 PGconn was just reset, all event data remains
 unchanged.  This event should be used to reset/reload/requery any
!associated instanceData.  A PGEVT_CONNDESTROY event
!is always sent, regardless of the event procedure's return value.

   
  
  
  
   PGEVT_CONNDESTROY
***
*** 5000,5023 
 PGEventResultCreate *.  The
 conn is the connection used to generate the
 result.  This is the ideal place to initialize any
 instanceData that needs to be associated with the
 result.  If the event procedure fails, the result will be cleared and
 the failure will be propagated.  The event procedure must not try to
!PQclear the result object for itself.

   
  
  
  
   PGEVT_RESULTCOPY
   

 The result copy event is fired in response to
 PQcopyResult.  This event will only be fired after
!the copy is complete.
  

  typedef struct
  {
  const PGresult *src;
  PGresult *dest;
--- 5003,5030 
 PGEventResultCreate *.  The
 conn is the connection used to generate the
 result.  This is the ideal place to initialize any
 instanceData that needs to be associated with the
 result.  If the event procedure fails, the result will be cleared and
 the failure will be propagated.  The event procedure must not try to
!PQclear the result object for itself.  When returning a
!failure code, all cleanup must be performed as no
!PGEVT_RESULTDESTROY event will be sent.

   
  
  
  
   PGEVT_RESULTCOPY
   

 The result copy event is fired in response to
 PQcopyResult.  This event will only be fired after
!the copy is complete.  Only source result event procedures that have
!successfully handled the PGEVT_RESULTCREATE event
!will be copied to the destination result.
  

  typedef struct
  {
  const PGresult *src;
  PGresult *dest;
***
*** 5029,5041 
 PGEventResultCopy *.  The
 src result is what was copied while the
 dest result is the copy destination.  This event
 can be used to provide a deep copy of instanceData,
 since PQcopyResult cannot do that.  If the event
 pro

Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Simon Riggs wrote:

On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:
Isn't autoanalyze a waste of time during a bulk load? Seems better to 
run ANALYZE manually at the end.

It's not a waste of time because it catches tables immediately after they
have been loaded, not just at the end of the bulk load. Running ANALYZE is
a waste of time if autoanalyze has already caught it, which is why that's
never been added onto the end of a pg_dump script. But currently this is
true only when we have both autoVACUUM and autoANALYZE enabled.


Hmm, one of the first complaints about defaulting autovacuum to on was
that it made restores so much longer *because* it was choosing to do
autoanalyzes on the tables as they were imported.  It was then that the
auto-cancel mechanism was introduced.

http://pgsql.markmail.org/message/rqyjkafuw43426xy

Why doesn't this new request conflict with that one?


The problem back then was that a CREATE INDEX was waiting on the 
autoanalyze to finish, and the autoanalyze took a long time to finish 
because of vacuum_cost_delay. Now that we have the auto-cancel 
mechanism, that's not a problem.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: Column level privileges was:(Re: [HACKERS] Extending grant insert on tables to sequences)

2008-09-17 Thread Stephen Frost
Jaime,

* Stephen Frost ([EMAIL PROTECTED]) wrote:
> * Jaime Casanova ([EMAIL PROTECTED]) wrote:
> > On 7/25/08, Stephen Frost <[EMAIL PROTECTED]> wrote:
> > > Yes, I'm working on it
> > 
> > hi, any work on it? may i help?
> 
> If you look at the commitfest, I've posted my WIP so far there.  Most of
> the grammar, parser, and catalog changes are there.  There's a couple of
> bugs in that code that I'm working to run down but otherwise I think
> it's pretty good.  I do need to add in the dependency tracking as well
> though, and that's what I'm planning to work on next.

I've now added dependency tracking and worked out a few kinks in the
code, both existing previously and from adding the dep tracking.  I'd
really like to simplify things in aclchk.c, perhaps by factoring out
more common bits into functional pieces, but it's been kind of a bear so
far.

The dependency tracking is being done by continuing to treat the table
as a single entity and just figuring out the total set (including all
column-level permissions) of roles for the entire table, rather than
introducing the sub-object concept.  This requires a bit of extra effort
when doing DDLs and GRANTs but simplifies the dependency tracking
itself, especially since we have to keep track of both table-level
permissions and column-level permissions separately.

I'm open to other suggestions/comments.  If people feel the sub-object
is a better approach, it would get somewhat more awkward because we'd
have to handle the relation-level dependencies as well as the
column-level ones.  Not impossible to do, of course, but a bit more
complicated than how it was done originally.

> A piece which can be broken off pretty easily is adding support to track
> the columns used through to the executor so we can check the permissions
> in the right place.

Jaime, have you had a chance to work on this?  It's next on my list and
I'll start working on it tonight unless you've had a chance to get to
it.  Please let me know.

Thanks,

Stephen




Re: [HACKERS] New FSM patch

2008-09-17 Thread Heikki Linnakangas

Zdenek Kotala wrote:
I tested it with DTrace on Solaris 10 on an 8-CPU SPARC machine. I got
similar results to yours. The main problem in your new implementation is
locking. On small tables, where the FSM fits on one page, clients spend
about 3/4 of their time waiting on the page lock.


That in itself is not that surprising. The test case does nothing else 
than exercise the FSM, so it's pretty obvious that all the time will be 
spent in the FSM. And they will be serialized on the lock on the single 
FSM page, like they are currently serialized on the FreeSpaceMapLock.


I think it's pretty unlikely we'd run into FSM congestion on a small 
table in real life. If you're (non-HOT) updating a table at such a high 
rate, it's not going to stay small for long.


On medium tables (two-level FSM), the WALInsertLock becomes
significant: it takes 1/4 of the waiting time. Page
waiting takes "only" 1/3.


That's a more interesting case from scalability point of view.

For the case of a medium size table, we could implement the optimization 
of a "fast-root", similar to B-tree, to speed up the searches. If the 
heap is small enough that not all FSM levels are needed, we could simply 
start searches from the lower levels. That should reduce the page 
locking significantly. However, when searching, the upper levels are 
locked in shared, not exclusive, mode, so it's not clear if that would 
help with that 1/3 that's now spent in page waiting.



Suggestions:

1) remove WAL logging. I think that FSM record should be recovered 
during processing of others WAL records (like insert, update). Probably 
only we need full page write on first modification after checkpoint.


Hmm, we don't have the infrastructure to do that, but I guess it's
doable. In case of a crash, the FSM information after recovery obviously
wouldn't be up-to-date. And the FSM information in a warm standby would
also lag behind.


One option would be to put RecordPageWithFreeSpace() calls into 
heap_redo, so that the FSM would be updated based on the WAL records we 
write anyway.


I think we'd still need to WAL log operations that decrease the amount
of free space on a page. Otherwise, after we have partial vacuum, we might
never revisit a page and update the FSM, even though there's usable
space on the page, leaving the space forgotten forever.


And the torn page problem would need to be handled somehow.

2) Break up the lock: use only a share lock for page locking and divide the
page into smaller parts for exclusive locking (at least for the root page).


That sounds difficult. But I don't think it's very likely to have 
congestion on a single FSM page in real life. Unless the root page 
becomes a bottleneck. Can you figure out how much of the lock waits come 
from the upper level and root pages?



However, your test case is too artificial.


I'm going to run an OLTP
workload and test it with a "real" workload.


Agreed. It should be possible to emulate the pattern the FSM really sees:

- tables become fuller and fuller, until no more tuples fit. Vacuum
sweeps through them periodically, adding a lot of free space to each page.
- RecordPageWithFreeSpace is only called when a tuple being inserted 
doesn't fit on the previous page the backend inserted to.


However, even if these artificial tests show that the new implementation 
is X times slower than the old one, the question that they can't answer 
is whether it matters or not. If only say 0.01% of the time was spent in 
FSM, a 10x slowdown would be acceptable. A full-blown OLTP benchmark 
should give some clue on that.


Other important test cases are various bulk insert kind of tests.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Alvaro Herrera
Simon Riggs wrote:
> 
> On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:

> > Isn't autoanalyze a waste of time during a bulk load? Seems better to 
> > run ANALYZE manually at the end.
> 
> It's not a waste of time because it catches tables immediately after they
> have been loaded, not just at the end of the bulk load. Running ANALYZE is
> a waste of time if autoanalyze has already caught it, which is why that's
> never been added onto the end of a pg_dump script. But currently this is
> true only when we have both autoVACUUM and autoANALYZE enabled.

Hmm, one of the first complaints about defaulting autovacuum to on was
that it made restores so much longer *because* it was choosing to do
autoanalyzes on the tables as they were imported.  It was then that the
auto-cancel mechanism was introduced.

http://pgsql.markmail.org/message/rqyjkafuw43426xy

Why doesn't this new request conflict with that one?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Patch for SQL-standard negative valued year-month literals

2008-09-17 Thread Ron Mayer

Tom Lane wrote:

"Stephen R. van den Berg" <[EMAIL PROTECTED]> writes:

Intervals are a scalar, not an addition of assorted values, alternating signs
between fields would be wrong.


Sorry, you're the one who's wrong on that.  We've treated intervals as
three independent fields for years now (and before that it was two
independent fields).  We're not going to throw away that capability.


+1 It's very useful.

Currently our terse input format that's similar to the SQL standard
rejects more mixed-sign intervals than I'd like.  I'd be quite
happy if:
  '1 2:03:-04'
gave me
  '1 day 2 hours 3 minutes -4 seconds'
but currently we reject that mixed-sign literal.


I'd just like to find a way to have SQL-standard input produce SQL-standard
output in the cases where the input happened to match the standard.

If we had a blank slate, my vote would be that
  '-1 2:03:04'  should mean what the SQL standard says it should.
  '-1 +2:03:04' should mean negative 1 day, plus 2 hours 3 minutes 4 seconds
  '1 2:03:-04'  should mean 1 day 2 hours 3 minutes minus 4 seconds
  '-1 2:03:+04'  should mean negative 1 day 2 hours 3 minutes plus 4 seconds
but I'm aware that there are backward compatibility issues.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Kevin Grittner
>>> Tom Lane <[EMAIL PROTECTED]> wrote: 
> "Robert Haas" <[EMAIL PROTECTED]> writes:
>>> I am not sure this rule is good. Somebody who develops on
>>> PostgreSQL will have problems when porting to other databases in the
>>> future. Reserved words in standards should be respected.
> 
>> If people want to write code that will work on multiple databases,
>> they should of course avoid using any SQL reserved words for
anything
>> other than their reserved purposes.
> 
> More than that, they have to actually test their SQL on each target
DB.
> Every DB (including us) is going to have some reserved words that
are
> not in the standard; so imagining that Postgres can all by itself
> protect you from this type of problem is doomed to failure anyway.
 
If someone wants portable code, they can use a development tool which
wraps ALL identifiers in quotes, every time.  That's what we do.  The
important thing is that, to the extent practicable, standard SQL code
is accepted and behaves in compliance with the standard.  I don't see
that it does anything to compromise that if you support additional,
non-standard syntax for extensions.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Tom Lane
Andrew Chernow <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Some thought would need to be given to how that interacts with
>> RESULTCOPY.  Presumably on the destination side, RESULTCOPY is the
>> equivalent of RESULTCREATE, ie, don't DESTROY if you didn't get COPY.
>> But what of the source side?  Arguably you shouldn't invoke COPY on
>> events that were never initialized in this object.

> You are correct.  The resultcopy function needs to check if the src 
> result eventproc was initialized.  BUT, that should not be possible 
> unless someone is not checking return values.  Meaning, the only way to 
> have an uninitialized eventproc in this instance is if something failed 
> during a resultcreate.  Actually, if you call PQmakeEmptyPGresult then 
> you can also run into this problem.  That can be solved by adding an 
> argument to makeResult as I previously suggested.

I still think it would be a good idea to expend a couple more lines in
PQcopyResult to cover the case --- the likelihood of there being code
paths that make an empty result and never invoke RESULTCREATE just seems
too high.  Defensive programming 'n all that.  In any case I'm not
convinced that we should force a RESULTCREATE cycle on external callers
of PQmakeEmptyPGResult.  If you guys didn't see a need for it in your
development then it probably doesn't exist.
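
Concretely, the couple of extra lines I have in mind would look about
like this (a sketch against the patch, and assuming the flag ends up
being named resultInitialized as suggested below):

/*
 * In PQcopyResult: don't fire RESULTCOPY for events that never
 * completed RESULTCREATE/RESULTCOPY on the source result, rather
 * than trusting every caller to have checked every return value.
 */
for (i = 0; i < dest->nEvents; i++)
{
    PGEventResultCopy evt;

    if (!src->events[i].resultInitialized)
        continue;               /* defensive: skip uninitialized event */

    evt.src = src;
    evt.dest = dest;
    if (!dest->events[i].proc(PGEVT_RESULTCOPY, &evt,
                              dest->events[i].passThrough))
    {
        PQclear(dest);
        return NULL;
    }
    dest->events[i].resultInitialized = TRUE;
}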

> If a resultcreate fails, then I think that resultdestroy should not be 
> delivered to that eventproc (same for resultcopy).  That is how register 
> works and how I saw this patch working (even though it appears I have a 
> few logical errors).  I don't think it makes sense to do it otherwise, 
> it would be like calling free after a malloc failure.

I can live with that definition, but please document it.

> The needDestroy member should be called resultInitialized.

Yeah, probably, so that you can set it FALSE in conn events and continue
to use memcpy to move things over.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NDirectFileRead and Write

2008-09-17 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> - NDirectFile{Read|Write} are renamed to BufFile{Read|Write}Count.
> - They are still declared in execdebug.h and buffile.c includes it.

After some thought I concluded that the least ugly way to do this
was to declare and define the variables in the same places as the
other counters that are printed by ShowBufferUsage.  It's not any
worse from a modularity standpoint than the other alternatives we
considered, and it seems to satisfy the principle of least
surprise a little better.  Committed it that way.

> I did not touch messages in ShowBufferUsage() in the patch.

I didn't change the message either.  Heikki's suggestion of "temp
blocks" doesn't seem very good to me because of the likelihood
of confusion with temp tables.  I don't have a better alternative
offhand.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Andrew Chernow

Tom Lane wrote:


Some thought would need to be given to how that interacts with
RESULTCOPY.  Presumably on the destination side, RESULTCOPY is the
equivalent of RESULTCREATE, ie, don't DESTROY if you didn't get COPY.
But what of the source side?  Arguably you shouldn't invoke COPY on
events that were never initialized in this object.



You are correct.  The resultcopy function needs to check if the src 
result eventproc was initialized.  BUT, that should not be possible 
unless someone is not checking return values.  Meaning, the only way to 
have an uninitialized eventproc in this instance is if something failed 
during a resultcreate.  Actually, if you call PQmakeEmptyPGresult then 
you can also run into this problem.  That can be solved by adding an 
argument to makeResult as I previously suggested.



I also think that a little bit of thought should go into whether or
not to call DESTROY on an event that *did* get a CREATE and failed it.
You could argue that one either way: should a failing CREATE operation
be expected to fully clean up after itself, or should it be expected
to leave things in a state where DESTROY can clean up properly?
I'm not entirely sure, but option A might force duplication of code
between CREATE's failure path and DESTROY.  Whichever semantics we
choose needs to be documented.



If a resultcreate fails, then I think that resultdestroy should not be 
delivered to that eventproc (same for resultcopy).  That is how register 
works and how I saw this patch working (even though it appears I have a 
few logical errors).  I don't think it makes sense to do it otherwise, 
it would be like calling free after a malloc failure.


The needDestroy member should be called resultInitialized.  Although the 
conn doesn't reference the 'resultInitialized' member, I should 
initialize it to FALSE.  I did not do this in the last patch ... 
register function.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal of SE-PostgreSQL patches (for CommitFest:Sep)

2008-09-17 Thread KaiGai Kohei

KaiGai Kohei wrote:

Peter, thanks for your comments.

 > Let's review:
 >
 > *) System-wide consistency in access controls could be nice to have in
 > some cases.  But is it really achievable?  In the typical three-tier web
 > application scenario, do you really have system-wide consistency?  Can
 > you configure your application server using SELinux?  I'm no expert on
 > these things, but I wonder, would it even work in a useful way, over the
 > network, with all the different threads, processes, and sessions going
 > on?  Or how about a desktop, pgAdmin with several database connections,
 > can those be isolated from each other or whatever the security setup may
 > be?

It's a good question. Yes, it is possible, needless to say. :)

We can configure Apache to invoke its contents handler with a proper
security context. The contents handler is a kind of Apache module that
handles various types of web content, such as *.html, *.php, *.cgi and
so on. The existing module (mod_selinux) enables invoking a CGI program
with a proper security context based on HTTP authentication. In
addition, the upcoming Linux kernel gained a feature to assign built-in
scripts their own security context.

SELinux applies its access controls, based on the assigned security
context, to various kinds of objects such as files, sockets, IPCs,
tables, columns and so on.

I can provide a demonstration; please wait a while for me to set it up.


The following URL shows the demonstration:
  http://kaigai.myhome.cx/index.php

It requires HTTP authentication, and you can choose one of "foo", "var"
or "baz". They can all be authenticated with the same password:
"sepgsql".

The web server assigns a per-user security context to its contents
handler, including the PHP script. It shows that the result set of the
SQL query depends on the security context of the client.

(note) This script always connects to the SE-PostgreSQL server with the
   "apache" role, which has privileged user rights.

Thanks,
--
KaiGai Kohei <[EMAIL PROTECTED]>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per-table autovacuum configuration

2008-09-17 Thread Martin Pihlak
Alvaro Herrera wrote:
>> any new thoughts on the matter? Perhaps someone is already working
>> on it?
> 
> It is still a wanted feature, but a couple of people have offered
> patches and I'm waiting for them ...
> 

Aha, good. I was considering taking a stab at it, but under the
circumstances will wait and see.

regards,
Martin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for SQL-standard negative valued year-month literals

2008-09-17 Thread Tom Lane
"Stephen R. van den Berg" <[EMAIL PROTECTED]> writes:
> Intervals are a scalar, not an addition of assorted values; alternating
> signs between fields would be wrong.

Sorry, you're the one who's wrong on that.  We've treated intervals as
three independent fields for years now (and before that it was two
independent fields).  We're not going to throw away that capability.
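
For reference, the stored representation (abbreviated from the backend's
timestamp header; the field comments are approximate) really is three
independently signed fields:

typedef struct
{
    TimeOffset  time;           /* all time units other than days,
                                 * months and years */
    int32       day;            /* days, after time for alignment */
    int32       month;          /* months and years, after time for
                                 * alignment */
} Interval;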

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for SQL-standard negative valued year-month literals

2008-09-17 Thread Stephen R. van den Berg
Tom Lane wrote:
>Ron Mayer <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>> If I read SQL 200N's spec correctly

>>   select interval '-1 1:00:00';

>> should mean "-1 days -1 hours",
>> yet 8.3 sees it as "-1 days +1 hours".

>I think we are kind of stuck on this one.  If we change it, then how
>would one represent -1 days +1 hours?  The spec's format is only sane
>if you assume all the fields must have the same sign, which is not
>the case for PG.

-1 days +1 hours = interval '-0 23:00:00'

Intervals are a scalar, not an addition of assorted values; alternating
signs between fields would be wrong.
-- 
Sincerely,
   Stephen R. van den Berg.

He did a quarter of the work in *half* the time!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tom Lane
"Robert Haas" <[EMAIL PROTECTED]> writes:
>> I am not sure this rule is good. Somebody who develops on PostgreSQL
>> will have problems when porting to other databases in the future.
>> Reserved words in standards should be respected.

> I disagree.  I have never ported an app written for PostgreSQL to
> another database system, and have no plans to start.  The fact that
> some other database system might barf on a particular bit of SQL is
> insufficient reason for PostgreSQL to do the same thing.

> If people want to write code that will work on multiple databases,
> they should of course avoid using any SQL reserved words for anything
> other than their reserved purposes.

More than that, they have to actually test their SQL on each target DB.
Every DB (including us) is going to have some reserved words that are
not in the standard; so imagining that Postgres can all by itself
protect you from this type of problem is doomed to failure anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-17 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Heikki Linnakangas wrote:

Let me describe this test case first:
- The test program calls RecordAndGetPageWithFreeSpace in a tight loop, 
with random values. There's no activity to the heap. In normal usage, 
the time spent in RecordAndGetPageWithFreeSpace is minuscule compared to the 
heap and index updates that cause RecordAndGetPageWithFreeSpace to be called.
- WAL was placed on a RAM drive. This is of course not how people set up 
their database servers, but the point of this test was to measure CPU 
speed and scalability. The impact of writing extra WAL is significant 
and needs to be taken into account, but that's a separate test and 
discussion, and needs to be considered in comparison to the WAL written 
by heap and index updates.

Another surprise was how badly both implementations scale. On CVS HEAD, 
I expected the performance to be roughly the same with 1 and 2 clients, 
because all access to the FSM is serialized on the FreeSpaceLock. But 
adding the 2nd client not only didn't help, but it actually made the 
performance much worse than with a single client. Context switching or 
cache line contention, perhaps? The new FSM implementation shows the 
same effect, which was an even bigger surprise. At table sizes > 32 MB, 
the FSM no longer fits on a single FSM page, so I expected almost a 
linear speed up with bigger table sizes from using multiple clients. 
That's not happening, and I don't know why. Although, going from 33MB to 
333 MB, the performance with 2 clients almost doubles, but it still 
doesn't exceed that with 1 client.

I tested it with DTrace on Solaris 10 on an 8-CPU SPARC machine, and I
got similar results to yours. The main problem in your new
implementation is locking. On small tables, where the FSM fits on one
page, clients spend about 3/4 of their time waiting on the page lock. On
medium tables (2-level FSM) the WALInsert lock becomes significant - it
takes 1/4 of the waiting time, while page waits take "only" 1/3.


I think the main reason for the scalability problem is that locking
enforces serialization.


Suggestions:

1) Remove WAL logging. I think the FSM record should be recovered during
processing of other WAL records (like insert and update). We probably
only need a full-page write on the first modification after a
checkpoint.


2) Break up the lock - use only a share lock for page locking and divide
the page into smaller parts for exclusive locking (at least for the root
page)
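
To sketch what I mean by 2) (everything here is made-up illustration:
the partition array, fsm_update_slot and fsm_set_avail stand in for
whatever the patch really uses, and propagating updates up the tree is
ignored):

#define N_FSM_PARTITIONS 16     /* stripes of slots within one FSM page */

static LWLockId fsm_partition_locks[N_FSM_PARTITIONS];

/*
 * Readers keep just a share lock on the FSM page; a writer additionally
 * takes a small exclusive lock covering only the stripe of slots it is
 * changing, so writers to different stripes don't block each other.
 */
static void
fsm_update_slot(Buffer buf, int slot, uint8 newvalue)
{
    LWLockId    partlock = fsm_partition_locks[slot % N_FSM_PARTITIONS];

    LockBuffer(buf, BUFFER_LOCK_SHARE);        /* readers not blocked */
    LWLockAcquire(partlock, LW_EXCLUSIVE);     /* serialize one stripe */
    fsm_set_avail(BufferGetPage(buf), slot, newvalue);
    LWLockRelease(partlock);
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}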

However, your test case is too artificial. I'm going to run an OLTP
workload and test it with a "real" workload.


Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Tom Lane
Andrew Chernow <[EMAIL PROTECTED]> writes:
>>> Now, it's questionable how tense we need to be about that as long as
>>> event proc failure aborts calling of later event procs.  That means
>>> that procs have to be robust against getting DESTROY with no CREATE
>>> calls in any case.  Should we try to make that less uncertain?

> I attached a patch that adds a 'needDestroy' member to PGEvent. It is
> set when resultcreate or resultcopy succeeds and checked during a
> PQclear.  That *should* resolve the issue of "no resultcreate but gets
> a resultdestroy".

Some thought would need to be given to how that interacts with
RESULTCOPY.  Presumably on the destination side, RESULTCOPY is the
equivalent of RESULTCREATE, ie, don't DESTROY if you didn't get COPY.
But what of the source side?  Arguably you shouldn't invoke COPY on
events that were never initialized in this object.

I also think that a little bit of thought should go into whether or
not to call DESTROY on an event that *did* get a CREATE and failed it.
You could argue that one either way: should a failing CREATE operation
be expected to fully clean up after itself, or should it be expected
to leave things in a state where DESTROY can clean up properly?
I'm not entirely sure, but option A might force duplication of code
between CREATE's failure path and DESTROY.  Whichever semantics we
choose needs to be documented.

Also, if we choose option B, then the same would hold for REGISTER
versus CONNDESTROY: failing REGISTER should still mean that you get
a CONNDESTROY.  So maybe that's an argument for option A, because
that's how REGISTER works now.
 
Lastly, the idea that was in the back of my mind was to resolve this
issue by unconditionally calling all the event procs at CREATE time,
not by cutting off the later ones if an earlier one failed.  Again,
I do not see a performance argument for skipping the extra steps,
and it seems to me that it might improve symmetry/robustness.
I'm not necessarily wedded to that approach but it bears thinking
about.
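
In code terms, roughly (a sketch only, reusing the patch's needDestroy
flag and the error path already present at the RESULTCREATE call site):

bool        failed = false;

for (i = 0; i < res->nEvents; i++)
{
    PGEventResultCreate evt;

    evt.conn = conn;
    evt.result = res;
    if (res->events[i].proc(PGEVT_RESULTCREATE, &evt,
                            res->events[i].passThrough))
        res->events[i].needDestroy = TRUE;
    else
        failed = true;          /* note the failure, but keep going */
}

if (failed)
{
    pqSetResultError(res, conn->errorMessage.data);
    res->resultStatus = PGRES_FATAL_ERROR;
}

That way every successfully-created event is guaranteed a matching
DESTROY, regardless of what its neighbors did.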

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Robert Haas
> I am not sure this rule is good. Somebody who develops on PostgreSQL
> will have problems when porting to other databases in the future.
> Reserved words in standards should be respected.

I disagree.  I have never ported an app written for PostgreSQL to
another database system, and have no plans to start.  The fact that
some other database system might barf on a particular bit of SQL is
insufficient reason for PostgreSQL to do the same thing.

If people want to write code that will work on multiple databases,
they should of course avoid using any SQL reserved words for anything
other than their reserved purposes.  But there is no reason for the
database system to unilaterally shove that down everyone's throat.  It
is very easy to overdo the idea of protecting users from themselves.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Andrew Chernow

Andrew Chernow wrote:

 > Now, it's questionable how tense we need to be about that as long as
 > event proc failure aborts calling of later event procs.  That means
 > that procs have to be robust against getting DESTROY with no CREATE
 > calls in any case.  Should we try to make that less uncertain?
 >
 >

I attached a patch that adds a 'needDestroy' member to PGEvent. It is 
set when resultcreate or resultcopy succeeds and checked during a 
PQclear.  That *should* resolve the issue of "no resultcreate but gets a 
resultdestroy".

Shoot.  I have a booboo in that last patch.  Attached is the corrected version.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Index: src/interfaces/libpq/fe-exec.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-exec.c,v
retrieving revision 1.198
diff -C6 -r1.198 fe-exec.c
*** src/interfaces/libpq/fe-exec.c	17 Sep 2008 04:31:08 -	1.198
--- src/interfaces/libpq/fe-exec.c	17 Sep 2008 10:49:10 -
***
*** 356,367 
--- 356,369 
  		if (!dest->events[i].proc(PGEVT_RESULTCOPY, &evt,
    dest->events[i].passThrough))
  		{
  			PQclear(dest);
  			return NULL;
  		}
+ 
+ 		dest->events[i].needDestroy = TRUE;
  	}
  
  	return dest;
  }
  
  /*
***
*** 378,396 
  		return NULL;
  
  	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
  	if (!newEvents)
  		return NULL;
  
- 	memcpy(newEvents, events, count * sizeof(PGEvent));
- 
- 	/* NULL out the data pointers and deep copy names */
  	for (i = 0; i < count; i++)
  	{
  		newEvents[i].data = NULL;
! 		newEvents[i].name = strdup(newEvents[i].name);
  		if (!newEvents[i].name)
  		{
  			while (--i >= 0)
  free(newEvents[i].name);
  			free(newEvents);
  			return NULL;
--- 380,398 
  		return NULL;
  
  	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
  	if (!newEvents)
  		return NULL;
  
  	for (i = 0; i < count; i++)
  	{
+ 		newEvents[i].proc = events[i].proc;
+ 		newEvents[i].passThrough = events[i].passThrough;
+ 		newEvents[i].needDestroy = FALSE;
  		newEvents[i].data = NULL;
! 		newEvents[i].name = strdup(events[i].name);
  		if (!newEvents[i].name)
  		{
  			while (--i >= 0)
  free(newEvents[i].name);
  			free(newEvents);
  			return NULL;
***
*** 663,679 
  
  	if (!res)
  		return;
  
  	for (i = 0; i < res->nEvents; i++)
  	{
! 		PGEventResultDestroy evt;
  
- 		evt.result = res;
- 		(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
-    res->events[i].passThrough);
  		free(res->events[i].name);
  	}
  
  	if (res->events)
  		free(res->events);
  
--- 665,685 
  
  	if (!res)
  		return;
  
  	for (i = 0; i < res->nEvents; i++)
  	{
! 		if(res->events[i].needDestroy)
! 		{
! 			PGEventResultDestroy evt;
! 
! 			evt.result = res;
! 			(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
! 	   res->events[i].passThrough);
! 		}
  
  		free(res->events[i].name);
  	}
  
  	if (res->events)
  		free(res->events);
  
***
*** 1609,1620 
--- 1615,1628 
    libpq_gettext("PGEventProc \"%s\" failed during PGEVT_RESULTCREATE event\n"),
    res->events[i].name);
  pqSetResultError(res, conn->errorMessage.data);
  res->resultStatus = PGRES_FATAL_ERROR;
  break;
  			}
+ 
+ 			res->events[i].needDestroy = TRUE;
  		}
  	}
  
  	return res;
  }
  
Index: src/interfaces/libpq/libpq-int.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/libpq-int.h,v
retrieving revision 1.132
diff -C6 -r1.132 libpq-int.h
*** src/interfaces/libpq/libpq-int.h	17 Sep 2008 04:31:08 -	1.132
--- src/interfaces/libpq/libpq-int.h	17 Sep 2008 10:49:10 -
***
*** 153,164 
--- 153,165 
  typedef struct PGEvent
  {
  	PGEventProc	proc;			/* the function to call on events */
  	char	   *name;			/* used only for error messages */
  	void	   *passThrough;	/* pointer supplied at registration time */
  	void	   *data;			/* optional state (instance) data */
+ 	int		   needDestroy; 	/* indicates a PGEVT_RESULTDESTROY is needed. */
  } PGEvent;
  
  struct pg_result
  {
  	int			ntups;
  	int			numAttributes;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Andrew Chernow

> Now, it's questionable how tense we need to be about that as long as
> event proc failure aborts calling of later event procs.  That means
> that procs have to be robust against getting DESTROY with no CREATE
> calls in any case.  Should we try to make that less uncertain?
>
>

I attached a patch that adds a 'needDestroy' member to PGEvent. It is set when 
resultcreate or resultcopy succeeds and checked during a PQclear.  That *should* 
resolve the issue of "no resultcreate but gets a resultdestroy".

> The general question of symmetry between RESULTCREATE and RESULTDESTROY
> callbacks is still bothering me.  As committed, PQmakeEmptyPGresult
> will copy events into its result, but not fire RESULTCREATE for them
> ... but they'll get RESULTDESTROY when it's deleted.  Is that what we
> want?


PQmakeEmptyPGresult was given thought.  The problem is every internal function 
that generates a result calls PQmakeEmptyPGresult, but those cases should not 
fire a resultcreate.  resultcreate should be called when the result is fully 
constructed (tuples and all) so the eventproc gets a useful PGresult.


One solution is to do something like the below:

PGresult *
PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
{
  return pqMakeEmptyPGresult(conn, status, TRUE);
}

PGresult *
pqMakeEmptyPGresult(PGconn *conn, ExecStatusType status, int fireEvents)
{
  /* existing function; the only change is handling fireEvents */
}

I am willing to create a patch for this.  Is this an acceptable idea?

> I don't have a lot of faith that PQgetResult is the only place
> inside libpq that needs to fire RESULTCREATE, either.
>

I looked again and I didn't see anything.  Is there something you're
thinking of?  ISTM that PQgetResult is called everywhere a result is
created (outside of 
PQmakeEmptyPGresult).


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Index: src/interfaces/libpq/fe-exec.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-exec.c,v
retrieving revision 1.198
diff -C6 -r1.198 fe-exec.c
*** src/interfaces/libpq/fe-exec.c	17 Sep 2008 04:31:08 -	1.198
--- src/interfaces/libpq/fe-exec.c	17 Sep 2008 10:40:41 -
***
*** 356,367 
--- 356,369 
  		if (!dest->events[i].proc(PGEVT_RESULTCOPY, &evt,
    dest->events[i].passThrough))
  		{
  			PQclear(dest);
  			return NULL;
  		}
+ 
+ 		dest->events[i].needDestroy = TRUE;
  	}
  
  	return dest;
  }
  
  /*
***
*** 378,394 
  		return NULL;
  
  	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
  	if (!newEvents)
  		return NULL;
  
- 	memcpy(newEvents, events, count * sizeof(PGEvent));
- 
- 	/* NULL out the data pointers and deep copy names */
  	for (i = 0; i < count; i++)
  	{
  		newEvents[i].data = NULL;
  		newEvents[i].name = strdup(newEvents[i].name);
  		if (!newEvents[i].name)
  		{
  			while (--i >= 0)
  free(newEvents[i].name);
--- 380,396 
  		return NULL;
  
  	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
  	if (!newEvents)
  		return NULL;
  
  	for (i = 0; i < count; i++)
  	{
+ 		newEvents[i].proc = events[i].proc;
+ 		newEvents[i].passThrough = events[i].passThrough;
+ 		newEvents[i].needDestroy = FALSE;
  		newEvents[i].data = NULL;
  		newEvents[i].name = strdup(newEvents[i].name);
  		if (!newEvents[i].name)
  		{
  			while (--i >= 0)
  free(newEvents[i].name);
***
*** 663,679 
  
  	if (!res)
  		return;
  
  	for (i = 0; i < res->nEvents; i++)
  	{
! 		PGEventResultDestroy evt;
  
- 		evt.result = res;
- 		(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
-    res->events[i].passThrough);
  		free(res->events[i].name);
  	}
  
  	if (res->events)
  		free(res->events);
  
--- 665,685 
  
  	if (!res)
  		return;
  
  	for (i = 0; i < res->nEvents; i++)
  	{
! 		if(res->events[i].needDestroy)
! 		{
! 			PGEventResultDestroy evt;
! 
! 			evt.result = res;
! 			(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
! 	   res->events[i].passThrough);
! 		}
  
  		free(res->events[i].name);
  	}
  
  	if (res->events)
  		free(res->events);
  
***
*** 1609,1620 
--- 1615,1628 
    libpq_gettext("PGEventProc \"%s\" failed during PGEVT_RESULTCREATE event\n"),
    res->events[i].name);
  pqSetResultError(res, conn->errorMessage.data);
  res->resultStatus = PGRES_FATAL_ERROR;
  break;
  			}
+ 
+ 			res->events[i].needDestroy = TRUE;
  		}
  	}
  
  	return res;
  }
  
Index: src/interfaces/libpq/libpq-int.h
===
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/libpq-int.h,v
retrieving revision 1.132
diff -C6 -r1.132 libpq-int.h
*** src/interfaces/libpq/libpq-int.h	17 Sep 2008 04:31:08 -	1.132
--- src/interfaces/libpq/libpq-int.h	17 Sep 2008 10:40:41 -
***
*** 153,164 
--- 153,165 
  typedef struct PGEvent
 

Re: [HACKERS] text search patch status update?

2008-09-17 Thread Heikki Linnakangas

Sushant Sinha wrote:

Patch #2. I think this is a straightforward bug fix.


Yes, I think you're right. In hlCover(), *q is 0 when the only match is 
the first item in the text, and we shouldn't bail out with "return 
false" in that case.
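
Schematically, it's the classic bug of letting 0 double as both a valid
position and a "not found" marker. This is just an illustration of the
bug class, not the actual hlCover() code (prs and query are hlCover()'s
arguments; word_matches_query is a made-up helper):

int         cover = -1;         /* buggy version effectively started at 0 */
int         i;

for (i = 0; i < prs->curwords; i++)
{
    if (word_matches_query(prs, query, i))  /* made-up helper */
    {
        cover = i;
        break;
    }
}

if (cover < 0)                  /* buggy version amounted to "== 0" */
    return false;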


But there seems to be something else going on here as well:

postgres=# select ts_headline('1 2 3 4 5', '2'::tsquery, 'MinWords=2, 
MaxWords=3');

 ts_headline
-------------
 2 3 4
(1 row)

postgres=# select ts_headline('aaa1 aaa2 aaa3 aaa4 
aaa5','aaa2'::tsquery, 'MinWords=2, MaxWords=3');

   ts_headline
-------------
 aaa2 aaa3
(1 row)

In the first example, you get three words, and in the 2nd, just two. It 
must be because of the default ShortWord setting of 3. Also, if only the 
last word matches, and it's a "short word", you get the whole text:


postgres=# select ts_headline('1 2 3 4 5','5'::tsquery, 'MinWords=2, 
MaxWords=3');

   ts_headline
-------------
 1 2 3 4 5
(1 row)

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Pavel Stehule
2008/9/17 Tom Lane <[EMAIL PROTECTED]>:
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
>>> Do we really have to make RECURSIVE a fully reserved keyword?
>
>> According to the standard, RECURSIVE is a reserved keyword, I believe.
>
> Sure, but our general rule is to make keywords no more reserved than
> is absolutely necessary to make the bison grammar unambiguous.  I
> haven't tested, but I'm thinking that if WITH is fully reserved then
> RECURSIVE shouldn't have to be.

I am not sure this rule is good. Somebody who develops on PostgreSQL
will have problems when porting to other databases in the future.
Reserved words in standards should be respected.

regards
Pavel Stehule

>
>regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXEC_BACKEND

2008-09-17 Thread Simon Riggs

On Tue, 2008-09-16 at 15:53 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > We keep talking about EXEC_BACKEND mode, though until recently I had
> > misunderstood what that meant. I also realised that I have more than
> > once neglected to take it into account when writing a patch - one recent
> > patch failed to do this.
> 
> > I can't find anything coherent in docs/readme/comments to explain why it
> > exists and what its implications are.
> 
> It exists because Windows doesn't have fork(), only the equivalent of
> fork-and-exec.  Which means that no state variables will be inherited
> from the postmaster by its child processes, and any state that needs to
> be carried across has to be handled explicitly.  You can define
> EXEC_BACKEND in a non-Windows build, for the purpose of testing code
> to see if it works in that environment.

OK, if it's that simple then I see why it's not documented. Thanks. I
thought there might be more to it than that.
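
For anyone else who was similarly confused, the difference is easy to
demonstrate outside PostgreSQL with a toy program (plain POSIX C,
nothing PostgreSQL-specific):

#include <stdio.h>
#include <unistd.h>

static int important_setting = 42;      /* think: a postmaster global */

int
main(int argc, char **argv)
{
    if (argc > 1)                       /* we are the exec'd child */
    {
        /* important_setting is 42 again here, whatever the parent did;
         * only what was passed explicitly survives the exec boundary */
        printf("exec child was handed: %s\n", argv[1]);
        return 0;
    }

    important_setting = 7;              /* "postmaster" changes state */

    if (fork() == 0)                    /* plain fork: state inherited */
    {
        printf("fork child sees: %d\n", important_setting);
        return 0;
    }

    if (fork() == 0)                    /* fork+exec, the Windows and
                                         * EXEC_BACKEND model: state must
                                         * be passed by hand */
    {
        char    buf[16];

        snprintf(buf, sizeof(buf), "%d", important_setting);
        execl(argv[0], argv[0], buf, (char *) NULL);
    }
    return 0;
}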

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:
> David Fetter wrote:
> > On Tue, Sep 16, 2008 at 08:59:08PM -0400, Alvaro Herrera wrote:
> >> Simon Riggs wrote:
> >>> Disabling autovacuum can have catastrophic effects, since it disables
> >>> the ANALYZing of tables.
> >>>
> >>> Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?
> >> You mean something like
> >> autovacuum = on / off / analyze ?
> >>
> >> We can certainly do that, but is there buy-in?
> > 
> > +1
> > 
> > Having autovacuum on during bulk loads can really tank performance,
> > but having autoanalyze on is good :)
> 
> Isn't autoanalyze a waste of time during a bulk load? Seems better to 
> run ANALYZE manually at the end.

It's not a waste of time because it catches tables as soon as they have
been loaded, not just at the end of the bulk load. Running ANALYZE is a
waste of time if autoanalyze has already caught it, which is why that's
never been added onto the end of a pg_dump script. But currently this is
true only when we have both autoVACUUM and autoANALYZE enabled.

> Adding that option feels natural to me, but it is a rather blunt 
> instrument. You can already do that with pg_autovacuum, though that 
> interface isn't very user-friendly. I whole-heartedly support the idea 
> of controlling autovacuum with storage options, e.g "ALTER TABLE ... 
> WITH (autoanalyze = on)".

Yes, have that option also, since it is fine tuning.

I definitely want a blunt instrument! I don't want to have to run ALTER
TABLE on *every* table. Even if you think that's possible, it won't work
in conjunction with interfaces submitting standard SQL, plus it won't
help if I simply forget.

This request comes from a real situation where a dump was reloaded
during the day when autovacuum was off and so ANALYZE was missed. Not my
mistake, but it took time to resolve - time that could have been avoided
by the new option suggested here.
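
With enum GUCs now in the tree the blunt instrument itself is cheap.
A hypothetical sketch - the AUTOVACUUM_* symbols and the wiring into
the launcher are made-up names, while config_enum_entry is the existing
enum-GUC machinery:

typedef enum
{
    AUTOVACUUM_OFF,
    AUTOVACUUM_ANALYZE_ONLY,
    AUTOVACUUM_ON
} AutoVacuumMode;

static const struct config_enum_entry autovacuum_options[] = {
    {"on", AUTOVACUUM_ON, false},
    {"off", AUTOVACUUM_OFF, false},
    {"analyze", AUTOVACUUM_ANALYZE_ONLY, false},
    {NULL, 0, false}
};

The launcher would then simply skip the vacuum half of its work in
AUTOVACUUM_ANALYZE_ONLY mode.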

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Heikki Linnakangas

David Fetter wrote:

On Tue, Sep 16, 2008 at 08:59:08PM -0400, Alvaro Herrera wrote:

Simon Riggs wrote:

Disabling autovacuum can have catastrophic effects, since it disables
the ANALYZing of tables.

Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?

You mean something like
autovacuum = on / off / analyze ?

We can certainly do that, but is there buy-in?


+1

Having autovacuum on during bulk loads can really tank performance,
but having autoanalyze on is good :)


Isn't autoanalyze a waste of time during a bulk load? Seems better to 
run ANALYZE manually at the end.


Adding that option feels natural to me, but it is a rather blunt 
instrument. You can already do that with pg_autovacuum, though that 
interface isn't very user-friendly. I whole-heartedly support the idea 
of controlling autovacuum with storage options, e.g "ALTER TABLE ... 
WITH (autoanalyze = on)".


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers