Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread David Fetter
On Tue, Sep 16, 2008 at 08:59:08PM -0400, Alvaro Herrera wrote:
 Simon Riggs wrote:
  Disabling autovacuum can have catastrophic effects, since it disables
  the ANALYZing of tables.
  
  Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?
 
 You mean something like
 autovacuum = on / off / analyze ?
 
 We can certainly do that, but is there buy-in?

+1

Having autovacuum on during bulk loads can really tank performance,
but having autoanalyze on is good :)

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal of SE-PostgreSQL patches (for CommitFest:Sep)

2008-09-17 Thread KaiGai Kohei

Greg Smith wrote:

On Wed, 17 Sep 2008, Peter Eisentraut wrote:

System-wide consistency in access controls could be nice to have in 
some cases.  But is it really achievable?  In the typical three-tier 
web application scenario, do you really have system-wide consistency?  
Can you configure your application server using SELinux?


Each of the tiers ends up with a mapping layer similar to the one 
implemented here to map SELinux permissions -> PostgreSQL. Java for 
example has a whole JVM security manager component that makes it 
straightforward to do such a mapping. 
http://articles.techrepublic.com.com/5100-10878_11-6178805.html is a 
good quick intro that shows how the call structure is similar to what 
the SE-PostgreSQL code does.


I guess these security architectures have the same origin.

The reference monitor concept requires all accesses to data objects to be
checked by a tamperproof, always-invoked module, based on its policy.
  http://en.wikipedia.org/wiki/Reference_monitor

SE-PostgreSQL uses the in-kernel SELinux as a reference monitor to check
all accesses to database objects via SQL.

And is SELinux really the desirable interface for a system-wide access 
control facility?  Why not port chmod or POSIX ACLs to PostgreSQL, or 
port SQL roles back to the operating system, or something else that 
captures what more people are actually using in practice.


The main feature of SELinux that this crowd likes is how it manages 
privilege escalation risk.  I'm not sure if POSIX ACLs, for example, are 
as effective at limiting the damage an exploitable suid binary can 
cause. As for what people are actually using: as someone who lives near 
the US capital I can tell you that installs using SELinux are quite 
plentiful around here--there really is no other UNIX-based technology 
for this purpose that's gotten traction inside this government like 
SELinux has.


Anyway, even though I think picking SELinux as the primary security 
mechanism to integrate with is a sensible choice and I'm confident that 
the rest of the software stack isn't a problem, I do share your concern 
that implementing row and column-level security would make more sense in 
a general way first.


Thanks for your explanation.

The PGACE security framework can host an OS-independent fine-grained
access control feature, like Oracle Label Security.
However, one concern is that we have only one CommitFest remaining.

As I mentioned in the previous message, I don't think it is strange
behavior for different security subsystems to make different decisions
at their individual granularities.


Ultimately, I see this patch as an interesting proof of concept -- it 
got us on the NSA site anyway -- but I can't see more than three 
people actually making use of it


I take it you've never seen how big the NSA fort^H^H^H^Hfacility is?  
I'm not sure exactly how many orders of magnitude your estimate is off 
by, but I know it's at least 2 just based on conversations I've been 
involved in with companies around here.  A lot of the US defense and 
homeland security related companies are adopting open-source software 
stacks because they can audit every level of the software, and there's a 
big void in that stack waiting for a database with the right security 
model to fill.  You are right that getting code contributions back again 
is a challenge though.


I don't have statistically reliable information. :)
However, I believe there is potentially strong demand for a secure database,
judging from the audience responses when I have given presentations about
SE-PostgreSQL on various occasions.

IIRC, Josh also said similar things.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]



Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Heikki Linnakangas

David Fetter wrote:

On Tue, Sep 16, 2008 at 08:59:08PM -0400, Alvaro Herrera wrote:

Simon Riggs wrote:

Disabling autovacuum can have catastrophic effects, since it disables
the ANALYZing of tables.

Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?

You mean something like
autovacuum = on / off / analyze ?

We can certainly do that, but is there buy-in?


+1

Having autovacuum on during bulk loads can really tank performance,
but having autoanalyze on is good :)


Isn't autoanalyze a waste of time during a bulk load? Seems better to 
run ANALYZE manually at the end.


Adding that option feels natural to me, but it is a rather blunt 
instrument. You can already do that with pg_autovacuum, though that 
interface isn't very user-friendly. I whole-heartedly support the idea 
of controlling autovacuum with storage options, e.g. ALTER TABLE ... 
WITH (autoanalyze = on).
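
A sketch of the two knobs under discussion (hypothetical syntax; neither exists in a released version):

```sql
-- The blunt instrument: a third value for the existing GUC
-- (hypothetical; autovacuum currently accepts only on/off):
SET autovacuum = 'analyze';

-- The fine-grained control: a per-table storage option
-- (also hypothetical, per the suggestion above):
ALTER TABLE bulk_loaded_table WITH (autoanalyze = on);
```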


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:
 David Fetter wrote:
  On Tue, Sep 16, 2008 at 08:59:08PM -0400, Alvaro Herrera wrote:
  Simon Riggs wrote:
  Disabling autovacuum can have catastrophic effects, since it disables
  the ANALYZing of tables.
 
  Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?
  You mean something like
  autovacuum = on / off / analyze ?
 
  We can certainly do that, but is there buy-in?
  
  +1
  
  Having autovacuum on during bulk loads can really tank performance,
  but having autoanalyze on is good :)
 
 Isn't autoanalyze a waste of time during a bulk load? Seems better to 
 run ANALYZE manually at the end.

It's not a waste of time, because it catches tables immediately after they
have been loaded, not just at the end of the bulk load. Running ANALYZE is a
waste of time if autoanalyze has already caught it, which is why that's
never been added onto the end of a pg_dump script. But currently this is
true only when we have both autoVACUUM and autoANALYZE enabled.

 Adding that option feels natural to me, but it is a rather blunt 
 instrument. You can already do that with pg_autovacuum, though that 
 interface isn't very user-friendly. I whole-heartedly support the idea 
 of controlling autovacuum with storage options, e.g. ALTER TABLE ... 
 WITH (autoanalyze = on).

Yes, let's have that option also, since it is for fine-tuning.

I definitely want a blunt instrument! I don't want to have to run ALTER
TABLE on *every* table. Even if you think that's possible, it won't work
in conjunction with interfaces submitting standard SQL, and it won't
work if I forget, either.

This request comes from a real situation where a dump was reloaded
during the day while autovacuum was off, and so the ANALYZE was missed.
Not my mistake, but it took time to resolve, which could have been
avoided by the new option suggested here.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] EXEC_BACKEND

2008-09-17 Thread Simon Riggs

On Tue, 2008-09-16 at 15:53 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  We keep talking about EXEC_BACKEND mode, though until recently I had
  misunderstood what that meant. I also realised that I have more than
  once neglected to take it into account when writing a patch - one recent
  patch failed to do this.
 
  I can't find anything coherent in docs/readme/comments to explain why it
  exists and what its implications are.
 
 It exists because Windows doesn't have fork(), only the equivalent of
 fork-and-exec.  Which means that no state variables will be inherited
 from the postmaster by its child processes, and any state that needs to
 be carried across has to be handled explicitly.  You can define
 EXEC_BACKEND in a non-Windows build, for the purpose of testing code
 to see if it works in that environment.

OK, if it's that simple then I see why it's not documented. Thanks. I
thought there might be more to it than that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Pavel Stehule
2008/9/17 Tom Lane [EMAIL PROTECTED]:
 Tatsuo Ishii [EMAIL PROTECTED] writes:
 Do we really have to make RECURSIVE a fully reserved keyword?

 According to the standard, RECURSIVE is a reserved keyword, I believe.

 Sure, but our general rule is to make keywords no more reserved than
 is absolutely necessary to make the bison grammar unambiguous.  I
 haven't tested, but I'm thinking that if WITH is fully reserved then
 RECURSIVE shouldn't have to be.

I am not sure this rule is good. Somebody who develops on
PostgreSQL will have problems when porting to other
databases in the future. Reserved words in the standard should be respected.
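
For context, the construct whose keywords are at issue looks like this in SQL:2008 syntax (a minimal example, not taken from the patch itself):

```sql
WITH RECURSIVE t(n) AS (
    SELECT 1
  UNION ALL
    SELECT n + 1 FROM t WHERE n < 5
)
SELECT n FROM t;
```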

regards
Pavel Stehule


regards, tom lane





Re: [HACKERS] text search patch status update?

2008-09-17 Thread Heikki Linnakangas

Sushant Sinha wrote:

Patch #2. I think this is a straightforward bug fix.


Yes, I think you're right. In hlCover(), *q is 0 when the only match is 
the first item in the text, and we shouldn't bail out with "return 
false" in that case.


But there seems to be something else going on here as well:

postgres=# select ts_headline('1 2 3 4 5', '2'::tsquery, 'MinWords=2, MaxWords=3');

 ts_headline
--------------
 <b>2</b> 3 4
(1 row)

postgres=# select ts_headline('aaa1 aaa2 aaa3 aaa4 aaa5', 'aaa2'::tsquery, 'MinWords=2, MaxWords=3');

   ts_headline
------------------
 <b>aaa2</b> aaa3
(1 row)

In the first example, you get three words, and in the second, just two. It 
must be because of the default ShortWord setting of 3. Also, if only the 
last word matches, and it's a short word, you get the whole text:


postgres=# select ts_headline('1 2 3 4 5', '5'::tsquery, 'MinWords=2, MaxWords=3');

   ts_headline
------------------
 1 2 3 4 <b>5</b>
(1 row)
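
If the ShortWord hypothesis is right, neutralizing it should make the single-character cases behave like the aaaN case (a hedged guess: ShortWord=0 should count every word toward the fragment):

```sql
SELECT ts_headline('1 2 3 4 5', '5'::tsquery,
                   'MinWords=2, MaxWords=3, ShortWord=0');
```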

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Andrew Chernow

 Now, it's questionable how tense we need to be about that as long as
 event proc failure aborts calling of later event procs.  That means
 that procs have to be robust against getting DESTROY with no CREATE
 calls in any case.  Should we try to make that less uncertain?



I attached a patch that adds a 'needDestroy' member to PGEvent.  It is set when 
resultcreate or resultcopy succeeds and checked during PQclear.  That *should* 
resolve the issue of getting a resultdestroy with no resultcreate.



The general question of symmetry between RESULTCREATE and RESULTDESTROY
callbacks is still bothering me.  As committed, PQmakeEmptyPGresult
will copy events into its result, but not fire RESULTCREATE for them
... but they'll get RESULTDESTROY when it's deleted.  Is that what we
want?  


PQmakeEmptyPGresult was given thought.  The problem is that every internal function 
that generates a result calls PQmakeEmptyPGresult, but those cases should not 
fire a resultcreate.  resultcreate should be called when the result is fully 
constructed (tuples and all) so the eventproc gets a useful PGresult.


One solution is to do something like the below:

PGresult *
PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
{
  return pqMakeEmptyPGresult(conn, status, TRUE);
}

PGresult *
pqMakeEmptyPGresult(PGconn *conn, ExecStatusType status, int fireEvents)
{
  // existing function, only change is handling fireEvents
}

I am willing to create a patch for this.  Is this an acceptable idea?

 I don't have a lot of faith that PQgetResult is the only place
 inside libpq that needs to fire RESULTCREATE, either.


I looked again and I didn't see anything.  Is there something you're thinking of? 
 ISTM that PQgetResult is called everywhere a result is created (outside of 
PQmakeEmptyPGresult).


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Index: src/interfaces/libpq/fe-exec.c
===
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-exec.c,v
retrieving revision 1.198
diff -C6 -r1.198 fe-exec.c
*** src/interfaces/libpq/fe-exec.c	17 Sep 2008 04:31:08 -	1.198
--- src/interfaces/libpq/fe-exec.c	17 Sep 2008 10:40:41 -
***
*** 356,367 
--- 356,369 
  		if (!dest->events[i].proc(PGEVT_RESULTCOPY, &evt,
  								  dest->events[i].passThrough))
  		{
  			PQclear(dest);
  			return NULL;
  		}
+ 
+ 		dest->events[i].needDestroy = TRUE;
  	}
  
  	return dest;
  }
  
  /*
***
*** 378,394 
  		return NULL;
  
  	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
  	if (!newEvents)
  		return NULL;
  
- 	memcpy(newEvents, events, count * sizeof(PGEvent));
- 
- 	/* NULL out the data pointers and deep copy names */
  	for (i = 0; i < count; i++)
  	{
  		newEvents[i].data = NULL;
  		newEvents[i].name = strdup(newEvents[i].name);
  		if (!newEvents[i].name)
  		{
  			while (--i >= 0)
  free(newEvents[i].name);
--- 380,396 
  		return NULL;
  
  	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
  	if (!newEvents)
  		return NULL;
  
  	for (i = 0; i < count; i++)
  	{
+ 		newEvents[i].proc = events[i].proc;
+ 		newEvents[i].passThrough = events[i].passThrough;
+ 		newEvents[i].needDestroy = FALSE;
  		newEvents[i].data = NULL;
  		newEvents[i].name = strdup(newEvents[i].name);
  		if (!newEvents[i].name)
  		{
  			while (--i >= 0)
  free(newEvents[i].name);
***
*** 663,679 
  
  	if (!res)
  		return;
  
  	for (i = 0; i < res->nEvents; i++)
  	{
! 		PGEventResultDestroy evt;
  
- 		evt.result = res;
- 		(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
- 								   res->events[i].passThrough);
  		free(res->events[i].name);
  	}
  
  	if (res->events)
  		free(res->events);
  
--- 665,685 
  
  	if (!res)
  		return;
  
  	for (i = 0; i < res->nEvents; i++)
  	{
! 		if (res->events[i].needDestroy)
! 		{
! 			PGEventResultDestroy evt;
! 
! 			evt.result = res;
! 			(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
! 									   res->events[i].passThrough);
! 		}
  
  		free(res->events[i].name);
  	}
  
  	if (res->events)
  		free(res->events);
  
***
*** 1609,1620 
--- 1615,1628 
  								  libpq_gettext("PGEventProc \"%s\" failed during PGEVT_RESULTCREATE event\n"),
  								  res->events[i].name);
  				pqSetResultError(res, conn->errorMessage.data);
  				res->resultStatus = PGRES_FATAL_ERROR;
  break;
  			}
+ 
+ 			res->events[i].needDestroy = TRUE;
  		}
  	}
  
  	return res;
  }
  
Index: src/interfaces/libpq/libpq-int.h
===
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/libpq-int.h,v
retrieving revision 1.132
diff -C6 -r1.132 libpq-int.h
*** src/interfaces/libpq/libpq-int.h	17 Sep 2008 04:31:08 -	1.132
--- src/interfaces/libpq/libpq-int.h	17 Sep 2008 10:40:41 -
***
*** 153,164 
--- 153,165 
  typedef struct PGEvent
  {
  	PGEventProc	proc;			/* the function to call on events */
  	char	   *name;			/* used only for error messages */
  	void	   *passThrough;	/* pointer supplied at registration time */
  	void	   *data;			/* optional state (instance) data */
+ 	int			needDestroy;	/* indicates a PGEVT_RESULTDESTROY is needed */
  } PGEvent;

Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Andrew Chernow

Andrew Chernow wrote:

  Now, it's questionable how tense we need to be about that as long as
  event proc failure aborts calling of later event procs.  That means
  that procs have to be robust against getting DESTROY with no CREATE
  calls in any case.  Should we try to make that less uncertain?
 
 

I attached a patch that adds a 'needDestroy' member to PGEvent.  It is 
set when resultcreate or resultcopy succeeds and checked during 
PQclear.  That *should* resolve the issue of getting a resultdestroy 
with no resultcreate.





Shoot.  I have a booboo in that last patch.  Attached is the corrected version.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Index: src/interfaces/libpq/fe-exec.c
===
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-exec.c,v
retrieving revision 1.198
diff -C6 -r1.198 fe-exec.c
*** src/interfaces/libpq/fe-exec.c	17 Sep 2008 04:31:08 -	1.198
--- src/interfaces/libpq/fe-exec.c	17 Sep 2008 10:49:10 -
***
*** 356,367 
--- 356,369 
  		if (!dest->events[i].proc(PGEVT_RESULTCOPY, &evt,
  								  dest->events[i].passThrough))
  		{
  			PQclear(dest);
  			return NULL;
  		}
+ 
+ 		dest->events[i].needDestroy = TRUE;
  	}
  
  	return dest;
  }
  
  /*
***
*** 378,396 
  		return NULL;
  
  	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
  	if (!newEvents)
  		return NULL;
  
- 	memcpy(newEvents, events, count * sizeof(PGEvent));
- 
- 	/* NULL out the data pointers and deep copy names */
  	for (i = 0; i < count; i++)
  	{
  		newEvents[i].data = NULL;
! 		newEvents[i].name = strdup(newEvents[i].name);
  		if (!newEvents[i].name)
  		{
  			while (--i >= 0)
  free(newEvents[i].name);
  			free(newEvents);
  			return NULL;
--- 380,398 
  		return NULL;
  
  	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
  	if (!newEvents)
  		return NULL;
  
  	for (i = 0; i < count; i++)
  	{
+ 		newEvents[i].proc = events[i].proc;
+ 		newEvents[i].passThrough = events[i].passThrough;
+ 		newEvents[i].needDestroy = FALSE;
  		newEvents[i].data = NULL;
! 		newEvents[i].name = strdup(events[i].name);
  		if (!newEvents[i].name)
  		{
  			while (--i >= 0)
  free(newEvents[i].name);
  			free(newEvents);
  			return NULL;
***
*** 663,679 
  
  	if (!res)
  		return;
  
  	for (i = 0; i < res->nEvents; i++)
  	{
! 		PGEventResultDestroy evt;
  
- 		evt.result = res;
- 		(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
- 								   res->events[i].passThrough);
  		free(res->events[i].name);
  	}
  
  	if (res->events)
  		free(res->events);
  
--- 665,685 
  
  	if (!res)
  		return;
  
  	for (i = 0; i < res->nEvents; i++)
  	{
! 		if (res->events[i].needDestroy)
! 		{
! 			PGEventResultDestroy evt;
! 
! 			evt.result = res;
! 			(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
! 									   res->events[i].passThrough);
! 		}
  
  		free(res->events[i].name);
  	}
  
  	if (res->events)
  		free(res->events);
  
***
*** 1609,1620 
--- 1615,1628 
  								  libpq_gettext("PGEventProc \"%s\" failed during PGEVT_RESULTCREATE event\n"),
  								  res->events[i].name);
  				pqSetResultError(res, conn->errorMessage.data);
  				res->resultStatus = PGRES_FATAL_ERROR;
  break;
  			}
+ 
+ 			res->events[i].needDestroy = TRUE;
  		}
  	}
  
  	return res;
  }
  
Index: src/interfaces/libpq/libpq-int.h
===
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/libpq-int.h,v
retrieving revision 1.132
diff -C6 -r1.132 libpq-int.h
*** src/interfaces/libpq/libpq-int.h	17 Sep 2008 04:31:08 -	1.132
--- src/interfaces/libpq/libpq-int.h	17 Sep 2008 10:49:10 -
***
*** 153,164 
--- 153,165 
  typedef struct PGEvent
  {
  	PGEventProc	proc;			/* the function to call on events */
  	char	   *name;			/* used only for error messages */
  	void	   *passThrough;	/* pointer supplied at registration time */
  	void	   *data;			/* optional state (instance) data */
+ 	int		   needDestroy; 	/* indicates a PGEVT_RESULTDESTROY is needed. */
  } PGEvent;
  
  struct pg_result
  {
  	int			ntups;
  	int			numAttributes;



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Robert Haas
 I am not sure this rule is good. Somebody who develops on
 PostgreSQL will have problems when porting to other
 databases in the future. Reserved words in the standard should be respected.

I disagree.  I have never ported an app written for PostgreSQL to
another database system, and have no plans to start.  The fact that
some other database system might barf on a particular bit of SQL is
insufficient reason for PostgreSQL to do the same thing.

If people want to write code that will work on multiple databases,
they should of course avoid using any SQL reserved words for anything
other than their reserved purposes.  But there is no reason for the
database system to unilaterally shove that down everyone's throat.  It
is very easy to overdo the idea of protecting users from themselves.

...Robert



Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Tom Lane
Andrew Chernow [EMAIL PROTECTED] writes:
 Now, it's questionable how tense we need to be about that as long as
 event proc failure aborts calling of later event procs.  That means
 that procs have to be robust against getting DESTROY with no CREATE
 calls in any case.  Should we try to make that less uncertain?

 I attached a patch that adds a 'needDestroy' member to PGEvent.  It is
 set when resultcreate or resultcopy succeeds and checked during
 PQclear.  That *should* resolve the issue of getting a resultdestroy
 with no resultcreate.

Some thought would need to be given to how that interacts with
RESULTCOPY.  Presumably on the destination side, RESULTCOPY is the
equivalent of RESULTCREATE, ie, don't DESTROY if you didn't get COPY.
But what of the source side?  Arguably you shouldn't invoke COPY on
events that were never initialized in this object.

I also think that a little bit of thought should go into whether or
not to call DESTROY on an event that *did* get a CREATE and failed it.
You could argue that one either way: should a failing CREATE operation
be expected to fully clean up after itself, or should it be expected
to leave things in a state where DESTROY can clean up properly?
I'm not entirely sure, but option A might force duplication of code
between CREATE's failure path and DESTROY.  Whichever semantics we
choose needs to be documented.

Also, if we choose option B, then the same would hold for REGISTER
versus CONNDESTROY: failing REGISTER should still mean that you get
a CONNDESTROY.  So maybe that's an argument for option A, because
that's how REGISTER works now.
 
Lastly, the idea that was in the back of my mind was to resolve this
issue by unconditionally calling all the event procs at CREATE time,
not by cutting off the later ones if an earlier one failed.  Again,
I do not see a performance argument for skipping the extra steps,
and it seems to me that it might improve symmetry/robustness.
I'm not necessarily wedded to that approach but it bears thinking
about.

regards, tom lane



Re: [HACKERS] New FSM patch

2008-09-17 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Heikki Linnakangas wrote:


snip



Let me describe this test case first:
- The test program calls RecordAndGetPageWithFreeSpace in a tight loop, 
with random values. There's no activity to the heap. In normal usage, 
the time spent in RecordAndGetWithFreeSpace is minuscule compared to the 
heap and index updates that cause RecordAndGetWithFreeSpace to be called.
- WAL was placed on a RAM drive. This is of course not how people set up 
their database servers, but the point of this test was to measure CPU 
speed and scalability. The impact of writing extra WAL is significant 
and needs to be taken into account, but that's a separate test and 
discussion, and needs to be considered in comparison to the WAL written 
by heap and index updates.




snip



Another surprise was how badly both implementations scale. On CVS HEAD, 
I expected the performance to be roughly the same with 1 and 2 clients, 
because all access to the FSM is serialized on the FreeSpaceLock. But 
adding the 2nd client not only didn't help, but it actually made the 
performance much worse than with a single client. Context switching or 
cache line contention, perhaps? The new FSM implementation shows the 
same effect, which was an even bigger surprise. At table sizes > 32 MB, 
the FSM no longer fits on a single FSM page, so I expected almost a 
linear speed up with bigger table sizes from using multiple clients. 
That's not happening, and I don't know why. Although, going from 33MB to 
333 MB, the performance with 2 clients almost doubles, but it still 
doesn't exceed that with 1 client.



I tested it with DTrace on Solaris 10 on an 8-CPU SPARC machine, and got 
results similar to yours. The main problem in your new implementation is 
locking. On small tables, where the FSM fits on one page, clients spend 
about 3/4 of their time waiting on the page lock. On medium tables 
(two-level FSM), the WALInsert lock becomes significant - it takes 1/4 
of the waiting time, while page waiting takes only 1/3.

I think the main reason for the scalability problem is that locking 
causes serialization.


Suggestions:

1) Remove WAL logging. I think FSM records should be recovered during 
the processing of other WAL records (like insert and update). We 
probably only need a full-page write on the first modification after a 
checkpoint.

2) Break up the lock - use only a shared lock for page locking, and 
divide the page into smaller parts for exclusive locking (at least for 
the root page).



However, your test case is too artificial. I'm going to test it with a 
real OLTP workload.


Zdenek








Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tom Lane
Robert Haas [EMAIL PROTECTED] writes:
 I am not sure this rule is good. Somebody who develops on
 PostgreSQL will have problems when porting to other
 databases in the future. Reserved words in the standard should be respected.

 I disagree.  I have never ported an app written for PostgreSQL to
 another database system, and have no plans to start.  The fact that
 some other database system might barf on a particular bit of SQL is
 insufficient reason for PostgreSQL to do the same thing.

 If people want to write code that will work on multiple databases,
 they should of course avoid using any SQL reserved words for anything
 other than their reserved purposes.

More than that, they have to actually test their SQL on each target DB.
Every DB (including us) is going to have some reserved words that are
not in the standard; so imagining that Postgres can all by itself
protect you from this type of problem is doomed to failure anyway.

regards, tom lane



Re: [HACKERS] Patch for SQL-standard negative valued year-month literals

2008-09-17 Thread Stephen R. van den Berg
Tom Lane wrote:
Ron Mayer [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 If I read SQL 200N's spec correctly

   select interval '-1 1:00:00';

 should mean -1 days -1 hours,
 yet 8.3 sees it as -1 days +1 hours.

I think we are kind of stuck on this one.  If we change it, then how
would one represent -1 days +1 hours?  The spec's format is only sane
if you assume all the fields must have the same sign, which is not
the case for PG.

-1 days +1 hours = interval '-0 23:00:00'

Intervals are a scalar, not an addition of assorted values; alternating signs
between fields would be wrong.
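
For illustration, the two spellings under discussion side by side (hedged: the exact display depends on the server version and interval style settings):

```sql
-- PostgreSQL's independently signed fields:
SELECT interval '-1 days +1 hours';

-- Stephen's single-sign spelling of the same quantity:
SELECT interval '-0 23:00:00';
```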
-- 
Sincerely,
   Stephen R. van den Berg.

He did a quarter of the work in *half* the time!



Re: [HACKERS] Patch for SQL-standard negative valued year-month literals

2008-09-17 Thread Tom Lane
Stephen R. van den Berg [EMAIL PROTECTED] writes:
 Intervals are a scalar, not an addition of assorted values; alternating signs
 between fields would be wrong.

Sorry, you're the one who's wrong on that.  We've treated intervals as
three independent fields for years now (and before that it was two
independent fields).  We're not going to throw away that capability.

regards, tom lane



Re: [HACKERS] per-table autovacuum configuration

2008-09-17 Thread Martin Pihlak
Alvaro Herrera wrote:
 any new thoughts on the matter? Perhaps someone is already working
 on it?
 
 It is still a wanted feature, but a couple of people have offered
 patches and I'm waiting for them ...
 

Aha, good. I was considering taking a stab at it, but under the
circumstances will wait and see.

regards,
Martin



Re: [HACKERS] Proposal of SE-PostgreSQL patches (for CommitFest:Sep)

2008-09-17 Thread KaiGai Kohei

KaiGai Kohei wrote:

Peter, thanks for your comments.

  Let's review:
 
  *) System-wide consistency in access controls could be nice to have in
  some cases.  But is it really achievable?  In the typical three-tier web
  application scenario, do you really have system-wide consistency?  Can
  you configure your application server using SELinux?  I'm no expert on
  these things, but I wonder, would it even work in a useful way, over the
  network, with all the different threads, processes, and sessions going
  on?  Or how about a desktop, pgAdmin with several database connections,
  can those be isolated from each other or whatever the security setup may
  be?

It's a good question. Yes, it is possible, needless to say. :)

We can configure Apache to invoke its content handlers with a proper security
context. A content handler is a kind of Apache module that handles various
kinds of web content like *.html, *.php, *.cgi and so on.
The existing mod_selinux module enables invoking a CGI program with a proper
security context based on HTTP authentication. In addition, the upcoming
Linux kernel has a feature to assign built-in scripts their own security
contexts.

SELinux applies its access controls based on the security contexts assigned
to various kinds of objects like files, sockets, IPCs, tables, columns and
so on.

I can provide a demonstration; please wait for a while while I set it up.


The following URL can show the demonstration:
  http://kaigai.myhome.cx/index.php

It requires HTTP authentication, and you can choose one of foo, var or baz.
They can all be authenticated with the same password: sepgsql.

The web server assigns a per-user security context to its content handler,
including the PHP script. It shows that the result set of a SQL query depends
on the security context of the client.

(note) This script always connects to the SE-PostgreSQL server with the apache role,
   which has privileged user rights.

Thanks,
--
KaiGai Kohei [EMAIL PROTECTED]



Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Andrew Chernow

Tom Lane wrote:


Some thought would need to be given to how that interacts with
RESULTCOPY.  Presumably on the destination side, RESULTCOPY is the
equivalent of RESULTCREATE, ie, don't DESTROY if you didn't get COPY.
But what of the source side?  Arguably you shouldn't invoke COPY on
events that were never initialized in this object.



You are correct.  The resultcopy function needs to check whether the src 
result's eventproc was initialized.  BUT, that should not be possible 
unless someone is not checking return values.  Meaning, the only way to 
have an uninitialized eventproc in this instance is if something failed 
during a resultcreate.  Actually, if you call PQmakeEmptyPGResult then 
you can also run into this problem.  That can be solved by adding an 
argument to makeResult, as I previously suggested.



I also think that a little bit of thought should go into whether or
not to call DESTROY on an event that *did* get a CREATE and failed it.
You could argue that one either way: should a failing CREATE operation
be expected to fully clean up after itself, or should it be expected
to leave things in a state where DESTROY can clean up properly?
I'm not entirely sure, but option A might force duplication of code
between CREATE's failure path and DESTROY.  Whichever semantics we
choose needs to be documented.



If a resultcreate fails, then I think that resultdestroy should not be 
delivered to that eventproc (same for resultcopy).  That is how register 
works and how I saw this patch working (even though it appears I have a 
few logical errors).  I don't think it makes sense to do it otherwise; 
it would be like calling free after a malloc failure.


The needDestroy member should be called resultInitialized.  Although the 
conn doesn't reference the 'resultInitialized' member, I should 
initialize it to FALSE.  I did not do this in the last patch ... 
register function.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/



Re: [HACKERS] NDirectFileRead and Write

2008-09-17 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
 - NDirectFile{Read|Write} are renamed to BufFile{Read|Write}Count.
 - They are still declared in execdebug.h and buffile.c includes it.

After some thought I concluded that the least ugly way to do this
was to declare and define the variables in the same places as the
other counters that are printed by ShowBufferUsage.  It's not any
worse from a modularity standpoint than the other alternatives we
considered, and it seems to satisfy the principle of least
surprise a little better.  Committed it that way.

 I did not touch messages in ShowBufferUsage() in the patch.

I didn't change the message either.  Heikki's suggestion of "temp
blocks" doesn't seem very good to me because of the likelihood
of confusion with temp tables.  I don't have a better alternative
offhand.

regards, tom lane



Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Tom Lane
Andrew Chernow [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Some thought would need to be given to how that interacts with
 RESULTCOPY.  Presumably on the destination side, RESULTCOPY is the
 equivalent of RESULTCREATE, ie, don't DESTROY if you didn't get COPY.
 But what of the source side?  Arguably you shouldn't invoke COPY on
 events that were never initialized in this object.

 You are correct.  The resultcopy function needs to check if the src 
 result eventproc was initialized.  BUT, that should not be possible 
 unless someone is not checking return values.  Meaning, the only way to 
 have an uninitialized eventproc in this instance is if something failed 
 during a resultcreate.  Actually, if you call PQmakeEmptyPGResult then 
 you can also run into this problem.  That can be solved by adding an 
 argument to makeResult as I previously suggested.

I still think it would be a good idea to expend a couple more lines in
PQcopyResult to cover the case --- the likelihood of there being code
paths that make an empty result and never invoke RESULTCREATE just seems
too high.  Defensive programming 'n all that.  In any case I'm not
convinced that we should force a RESULTCREATE cycle on external callers
of PQmakeEmptyPGResult.  If you guys didn't see a need for it in your
development then it probably doesn't exist.

 If a resultcreate fails, than I think that resultdestroy should not be 
 delivered to that eventproc (same for resultcopy).  That is how register 
 works and how I saw this patch working (eventhough it appears I have a 
 few logical errors).  I don't think it makes sense to do it otherwise, 
 it would be like calling free after a malloc failure.

I can live with that definition, but please document it.

 The needDestroy member should be called resultInitialized.

Yeah, probably, so that you can set it FALSE in conn events and continue
to use memcpy to move things over.

regards, tom lane



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Kevin Grittner
 Tom Lane [EMAIL PROTECTED] wrote: 
 Robert Haas [EMAIL PROTECTED] writes:
 I am not sure if this rule is good. Somebody who develops on
 PostgreSQL could have problems when porting to other
 databases in the future. Reserved words in standards should be
 respected.
 
 If people want to write code that will work on multiple databases,
 they should of course avoid using any SQL reserved words for
 anything other than their reserved purposes.
 
 More than that, they have to actually test their SQL on each target DB.
 Every DB (including us) is going to have some reserved words that are
 not in the standard; so imagining that Postgres can all by itself
 protect you from this type of problem is doomed to failure anyway.
 
If someone wants portable code, they can use a development tool which
wraps ALL identifiers in quotes, every time.  That's what we do.  The
important thing is that, to the extent practicable, standard SQL code
is accepted and behaves in compliance with the standard.  I don't see
that it does anything to compromise that if you support additional,
non-standard syntax for extensions.
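The "wrap ALL identifiers in quotes" approach can be sketched in a few lines (a hypothetical helper for illustration, not anything shipped with PostgreSQL); doubling embedded double-quotes follows the SQL standard's rule for delimited identifiers:

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier so reserved words are safe to use.

    Embedded double-quotes are doubled, per the SQL standard's rules
    for delimited identifiers.
    """
    return '"' + name.replace('"', '""') + '"'

# Even reserved words like "select" and "order" become legal column names:
cols = ", ".join(quote_ident(c) for c in ["select", "order", "my col"])
print(f'SELECT {cols} FROM {quote_ident("table")}')
# SELECT "select", "order", "my col" FROM "table"
```

Note that quoted identifiers are case-sensitive, so a tool taking this route must quote consistently everywhere.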
 
-Kevin



Re: [HACKERS] Patch for SQL-standard negative valued year-month literals

2008-09-17 Thread Ron Mayer

Tom Lane wrote:

Stephen R. van den Berg [EMAIL PROTECTED] writes:

Intervals are a scalar, not an addition of assorted values; alternating signs
between fields would be wrong.


Sorry, you're the one who's wrong on that.  We've treated intervals as
three independent fields for years now (and before that it was two
independent fields).  We're not going to throw away that capability.


+1 It's very useful.

Currently our terse input format that's similar to the SQL standard
rejects more mixed-sign intervals than I'd like.  I'd be quite
happy if:
  '1 2:03:-04'
gave me
  '1 day 2 hours 3 minutes -4 seconds'
but currently we reject that mixed-sign-literal.


I'd just like to find a way to have SQL-standard input produce SQL-standard
output in the cases where the input happened to match the standard.

If we had a blank slate, my vote would be that
  '-1 2:03:04'  should mean what the SQL standard says it should.
  '-1 +2:03:04' should mean negative 1 days, plus 2 hours 3 minutes 4 sec
  '1 2:03:-04'  should mean 1 day 2 hours 3 minutes minus 4 seconds
  '-1 2:03:+04'  should mean negative 1 day 2 hours 3 minutes plus 4 seconds
but I'm aware that there are backward compatibility issues.
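Python's timedelta happens to accept independently signed components too, so the desired reading of the mixed-sign literals above can be modeled (keyword names are Python's, not PostgreSQL's parser; and unlike PostgreSQL's three independent fields, timedelta collapses everything into one scalar, so this models only the arithmetic):

```python
from datetime import timedelta

# '1 2:03:-04' read as 1 day 2 hours 3 minutes -4 seconds:
iv = timedelta(days=1, hours=2, minutes=3, seconds=-4)
print(iv)  # components are summed into a single duration

# '-1 2:03:04' per the SQL standard: the leading sign covers every field.
sql_std = -timedelta(days=1, hours=2, minutes=3, seconds=4)
print(sql_std.total_seconds())  # -93784.0
```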



Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Alvaro Herrera
Simon Riggs wrote:
 
 On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:

  Isn't autoanalyze a waste of time during a bulk load? Seems better to 
  run ANALYZE manually at the end.
 
 It's not a waste of time because it catches tables immediately after they
 have been loaded, not just at the end of the bulk load. Running ANALYZE is a
 waste of time if autoanalyze has already caught it, which is why that's
 never been added onto the end of a pg_dump script. But currently this is
 true only when we have both autoVACUUM and autoANALYZE enabled.

Hmm, one of the first complaints about defaulting autovacuum to on was
that it made restores so much longer *because* it was choosing to do
autoanalyzes on the tables as they were imported.  It was then that the
auto-cancel mechanism was introduced.

http://pgsql.markmail.org/message/rqyjkafuw43426xy

Why doesn't this new request conflict with that one?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: Column level privileges was:(Re: [HACKERS] Extending grant insert on tables to sequences)

2008-09-17 Thread Stephen Frost
Jaime,

* Stephen Frost ([EMAIL PROTECTED]) wrote:
 * Jaime Casanova ([EMAIL PROTECTED]) wrote:
  On 7/25/08, Stephen Frost [EMAIL PROTECTED] wrote:
   Yes, I'm working on it
  
  hi, any work on it? may i help?
 
 If you look at the commitfest, I've posted my WIP so far there.  Most of
 the grammar, parser, and catalog changes are there.  There's a couple of
 bugs in that code that I'm working to run down but otherwise I think
 it's pretty good.  I do need to add in the dependency tracking as well
 though, and that's what I'm planning to work on next.

I've now added dependency tracking and worked out a few kinks in the
code, both existing previously and from adding the dep tracking.  I'd
really like to simplify things in aclchk.c, perhaps by factoring out
more common bits into functional pieces, but it's been kind of a bear so
far.

The dependency tracking is being done by continuing to treat the table
as a single entity and just figuring out the total set (including all
column-level permissions) of roles for the entire table, rather than
introducing the sub-object concept.  This requires a bit of extra effort
when doing DDLs and GRANTs but simplifies the dependency tracking
itself, especially since we have to keep track of both table-level
permissions and column-level permissions separately.

I'm open to other suggestions/comments.  If people feel the sub-object
is a better approach, it would get somewhat more awkward because we'd
have to handle the relation-level dependencies as well as the
column-level ones.  Not impossible to do, of course, but a bit more
complicated than how it was done originally.

 A piece which can be broken off pretty easily is adding support to track
 the columns used through to the executor so we can check the permissions
 in the right place.

Jaime, have you had a chance to work on this?  It's next on my list and
I'll start working on it tonight unless you've had a chance to get to
it.  Please let me know.

Thanks,

Stephen




Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Simon Riggs wrote:

On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:
Isn't autoanalyze a waste of time during a bulk load? Seems better to 
run ANALYZE manually at the end.

It's not a waste of time because it catches tables immediately after they
have been loaded, not just at the end of the bulk load. Running ANALYZE is a
waste of time if autoanalyze has already caught it, which is why that's
never been added onto the end of a pg_dump script. But currently this is
true only when we have both autoVACUUM and autoANALYZE enabled.


Hmm, one of the first complaints about defaulting autovacuum to on was
that it made restores so much longer *because* it was choosing to do
autoanalyzes on the tables as they were imported.  It was then that the
auto-cancel mechanism was introduced.

http://pgsql.markmail.org/message/rqyjkafuw43426xy

Why doesn't this new request conflict with that one?


The problem back then was that a CREATE INDEX was waiting on the 
autoanalyze to finish, and the autoanalyze took a long time to finish 
because of vacuum_cost_delay. Now that we have the auto-cancel 
mechanism, that's not a problem.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCHES] libpq events patch (with sgml docs)

2008-09-17 Thread Andrew Chernow

Andrew Chernow wrote:

New patch following our discussion with updated docs.

few logical errors).  I don't think it makes sense to do it 
otherwise, it would be like calling free after a malloc failure.


I can live with that definition, but please document it.




To build on this analogy, PGEVT_CONNRESET is like a realloc.  Meaning, if 
the initial malloc (PGEVT_REGISTER) worked but the realloc 
(PGEVT_CONNRESET) didn't, you still have to free (PGEVT_CONNDESTROY) the 
initial allocation.  It's documented that way.  Basically, if a register 
succeeds, a destroy will always be sent regardless of what happens with a reset.





I attached the wrong patch.  I'm sorry.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Index: doc/src/sgml/libpq.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/libpq.sgml,v
retrieving revision 1.261
diff -C6 -r1.261 libpq.sgml
*** doc/src/sgml/libpq.sgml 17 Sep 2008 04:31:08 -  1.261
--- doc/src/sgml/libpq.sgml 17 Sep 2008 14:19:29 -
***
*** 4911,4923 
 When a literalPGEVT_REGISTER/literal event is received, the
 parameterevtInfo/parameter pointer should be cast to a
 structnamePGEventRegister */structname.  This structure contains a
 structnamePGconn/structname that should be in the
 literalCONNECTION_OK/literal status; guaranteed if one calls
 functionPQregisterEventProc/function right after obtaining a good
!structnamePGconn/structname.
/para
   /listitem
  /varlistentry
  
  varlistentry
   termliteralPGEVT_CONNRESET/literal/term
--- 4911,4925 
 When a literalPGEVT_REGISTER/literal event is received, the
 parameterevtInfo/parameter pointer should be cast to a
 structnamePGEventRegister */structname.  This structure contains a
 structnamePGconn/structname that should be in the
 literalCONNECTION_OK/literal status; guaranteed if one calls
 functionPQregisterEventProc/function right after obtaining a good
!structnamePGconn/structname.  When returning a failure code, all
!cleanup must be performed as no literalPGEVT_CONNDESTROY/literal
!event will be sent.
/para
   /listitem
  /varlistentry
  
  varlistentry
   termliteralPGEVT_CONNRESET/literal/term
***
*** 4941,4953 
  
 When a literalPGEVT_CONNRESET/literal event is received, the
 parameterevtInfo/parameter pointer should be cast to a
 structnamePGEventConnReset */structname.  Although the contained
 structnamePGconn/structname was just reset, all event data remains
 unchanged.  This event should be used to reset/reload/requery any
!associated literalinstanceData/literal.
/para
   /listitem
  /varlistentry
  
  varlistentry
   termliteralPGEVT_CONNDESTROY/literal/term
--- 4943,4956 
  
 When a literalPGEVT_CONNRESET/literal event is received, the
 parameterevtInfo/parameter pointer should be cast to a
 structnamePGEventConnReset */structname.  Although the contained
 structnamePGconn/structname was just reset, all event data remains
 unchanged.  This event should be used to reset/reload/requery any
!associated literalinstanceData/literal.  A PGEVT_CONNDESTROY event
!is always sent, regardless of the event procedure's return value.
/para
   /listitem
  /varlistentry
  
  varlistentry
   termliteralPGEVT_CONNDESTROY/literal/term
***
*** 5000,5023 
 structnamePGEventResultCreate */structname.  The
 parameterconn/parameter is the connection used to generate the
 result.  This is the ideal place to initialize any
 literalinstanceData/literal that needs to be associated with the
 result.  If the event procedure fails, the result will be cleared and
 the failure will be propagated.  The event procedure must not try to
!functionPQclear/ the result object for itself.
/para
   /listitem
  /varlistentry
  
  varlistentry
   termliteralPGEVT_RESULTCOPY/literal/term
   listitem
para
 The result copy event is fired in response to
 functionPQcopyResult/function.  This event will only be fired after
!the copy is complete.
  
synopsis
  typedef struct
  {
  const PGresult *src;
  PGresult *dest;
--- 5003,5030 
 structnamePGEventResultCreate */structname.  The
 parameterconn/parameter is the connection used to generate the
 result.  This is the ideal place to initialize any
 literalinstanceData/literal that needs to be associated with the
 result.  If the event procedure fails, the result will be cleared and
 the failure will be propagated.  The event procedure must not try to
!

Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tatsuo Ishii
 Is physical_tlist optimization sensible for RecursiveScan?  We seem
 to use it for every other Scan node type.

To enable the physical_tlist optimization, it seems build_physical_tlist,
use_physical_tlist and disuse_physical_tlist need to be changed.
build_physical_tlist and use_physical_tlist have already been
patched and only disuse_physical_tlist still needs to be patched. Is there
any other place I missed that is needed to enable the optimization?
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Alvaro Herrera wrote:
 Why doesn't this new request conflict with that one?

 The problem back then was that a CREATE INDEX was waiting on the 
 autoanalyze to finish, and the autoanalyze took a long time to finish 
 because of vacuum_cost_delay. Now that we have the auto-cancel 
 mechanism, that's not a problem.

Define "not a problem".  With auto-cancel, what will happen is that
whatever work the autoanalyze does will be wasted.  It seems to me
that the current complaint is about background autovacuum/autoanalyze
wasting cycles during a bulk load, and there's certainly no purer waste
than an analyze cycle that gets aborted.

I tend to agree with Alvaro that there's not very much of a use case for
an analyze-only autovacuum mode.  Assuming that we get to the point of
having a parallelizing pg_restore, it would be interesting to give it an
option to include ANALYZE for each table it's loaded among the tasks
that it schedules.  (I'm visualizing these commands as being made up by
pg_restore itself, *not* added to the pg_dump output.)  Then you could
have a reasonably optimal total workflow, whereas allowing autovacuum
to try to schedule the ANALYZEs can't be.

regards, tom lane



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 Is physical_tlist optimization sensible for RecursiveScan?  We seem
 to use it for every other Scan node type.

 To enable physical_tlist optimization, it seems build_physical_tlist,
 use_physical_tlist and disuse_physical_tlist need to be
 changed. build_physical_tlist and use_physical_tlist have been already
 patched and only disuse_physical_tlist needs to be patched. Any other
 place I miss to enable the optimization?

IIRC, the comment for build_physical_tlist hadn't been patched, but
yeah that seems like about it.

regards, tom lane



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tatsuo Ishii
  To enable physical_tlist optimization, it seems build_physical_tlist,
  use_physical_tlist and disuse_physical_tlist need to be
  changed. build_physical_tlist and use_physical_tlist have been already
  patched and only disuse_physical_tlist needs to be patched. Any other
  place I miss to enable the optimization?
 
 IIRC, the comment for build_physical_tlist hadn't been patched, but
 yeah that seems like about it.

Yeah, I need to fix the sloppy comments in the existing patches all over
the place :-)
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] Patch for SQL-standard negative valued year-month literals

2008-09-17 Thread Stephen R. van den Berg
Tom Lane wrote:
Stephen R. van den Berg [EMAIL PROTECTED] writes:
 Intervals are a scalar, not an addition of assorted values; alternating signs
 between fields would be wrong.

Sorry, you're the one who's wrong on that.  We've treated intervals as
three independent fields for years now (and before that it was two
independent fields).

Ok, I didn't know that.
Let's put it differently then: I can understand that the standard
considers it a scalar and not an addition, but apparently the addition
characteristic is already being used in Postgres code; that indeed
makes it undesirable to change.
-- 
Sincerely,
   Stephen R. van den Berg.

He did a quarter of the work in *half* the time!



Re: Column level privileges was:(Re: [HACKERS] Extending grant insert on tables to sequences)

2008-09-17 Thread Jaime Casanova
On 9/17/08, Stephen Frost [EMAIL PROTECTED] wrote:

  A piece which can be broken off pretty easily is adding support to track
  the columns used through to the executor so we can check the permissions
  in the right place.

 Jamie, have you had a chance to work on this?  It's next on my list and
 I'll start working on it tonight unless you've had a chance to get to
 it.  Please let me know.


Not really; I started to read the code... but was interrupted by a new
task... (if only we could send kill -9 signals to work tasks ;)

-- 
regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157



Re: [HACKERS] text search patch status update?

2008-09-17 Thread Teodor Sigaev
I remember about that, but right now I haven't time to do the final review. Sorry. 
Will return to it soon.


Sushant Sinha wrote:

Any status updates on the following patches?

1. Fragments in tsearch2 headlines:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php

2. Bug in hlCover:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php

-Sushant.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



[HACKERS] Patch for SQL-Standard Interval output and decoupling DateStyle from IntervalStyle

2008-09-17 Thread Ron Mayer

Tom Lane wrote:

In fact, given that we are now
somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.


Short summary:

   The attached patch
 (1) adds a new GUC called IntervalStyle that decouples interval
 output from the DateStyle GUC, and
 (2) adds a new interval style that will match the SQL standards
 for interval literals when given interval data that meets the
 sql standard (year-month or date-time only; and no mixed sign).

Background:

   Currently Postgres outputs Intervals in one of two formats
   depending on DateStyle.  When DateStyle is 'ISO',  it outputs
   intervals like '1 year 2 mons 3 days -04:05:06.78' (though I
   know of no ISO interval standards that look like that).  When
   DateStyle is 'SQL' it outputs intervals like
   '@ 1 year 2 mons 3 days -4 hours -5 mins -6.78 secs' (though
   I know of no SQL interval standards that look like that).

The feature:

   The SQL standard only specifies interval literal strings
   for the two kinds of intervals (year-month and day-time)
   that are defined by the SQL spec.  It also doesn't account
   for postgres's intervals that can have mixed-sign components.
   I tried to make the output here be a logical extension
   of the spec, concatenating the year-month and day-time
   interval strings and forcing signs in the output that could
   otherwise have been ambiguous.

   This little table shows an example of the output of this
   new style compared to the existing postgres output for
   (a) a year-month interval, (b) a day-time interval, and
   (c) a not-quite-standard postgres interval with year-month
   and day-time components of varying signs.

'1-2' | '@ 1 year 2 mons'
'3 4:05:06.78'| '@ 3 days 4 hours 5 mins 6.78 secs'
'+1-2 -3 +4:05:06.78' | '@ 1 year 2 mons -3 days 4 hours 5 mins 6.78 secs'
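The forced-sign concatenation shown in the table can be sketched as a toy formatter over assumed (years, months, days, hours, minutes, seconds) fields; this models the behavior described above, not the patch's actual C code:

```python
def sql_standard_interval(years, months, days, h, m, s):
    """Format an interval as described above: a SQL-standard year-month
    literal joined to a day-time literal, with explicit signs forced
    when both components are present (to avoid ambiguity)."""
    has_ym = years or months
    has_dt = days or h or m or s
    mixed = has_ym and has_dt

    def sign(v):  # force a leading sign only in the mixed case
        return ("+" if v >= 0 else "-") if mixed else ("-" if v < 0 else "")

    parts = []
    if has_ym:
        parts.append(f"{sign(years)}{abs(years)}-{abs(months)}")
    if has_dt:
        parts.append(f"{sign(days)}{abs(days)}")
        parts.append(f"{sign(h)}{abs(h)}:{abs(m):02d}:{abs(s):05.2f}")
    return " ".join(parts)

print(sql_standard_interval(1, 2, 0, 0, 0, 0))      # 1-2
print(sql_standard_interval(0, 0, 3, 4, 5, 6.78))   # 3 4:05:06.78
print(sql_standard_interval(1, 2, -3, 4, 5, 6.78))  # +1-2 -3 +4:05:06.78
```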

The patch:

   Seems to work for me; and I believe I updated the docs.
   Many regression tests fail, though, because they assume
   the existing coupling of DateStyle and interval output
   styles.   If people like where this is going I can update
   those regression tests and add ones to test this new style.


*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 4090,4095  SET XML OPTION { DOCUMENT | CONTENT };
--- 4090,4117 
/listitem
   /varlistentry
  
+  varlistentry id=guc-intervalstyle xreflabel=IntervalStyle
+   termvarnameIntervalStyle/varname (typestring/type)/term
+   indexterm
+primaryvarnameIntervalStyle/ configuration parameter/primary
+   /indexterm
+   listitem
+para
+ Sets the display format for interval values. 
+ The value literalsql_standard/ will output SQL Standard
+ strings when given intervals that conform to the SQL
+ standard (either year-month only or date-time only; and no
+ mixing of positive and negative components).
+ The value literalpostgres/ will output intervals in
+ a format that matches what old releases had output when
+ the DateStyle was set to literal'ISO'/.
+ The value literalpostgres_verbose/ will output intervals in
+ a format that matches what old releases had output when
+ the DateStyle was set to literal'SQL'/.
+/para
+   /listitem
+  /varlistentry
+ 
   varlistentry id=guc-timezone xreflabel=timezone
termvarnametimezone/varname (typestring/type)/term
indexterm
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
***
*** 2213,2218  January 8 04:05:06 1999 PST
--- 2213,2305 
  /para
 /sect2
  
+sect2 id=interval-output
+ titleInterval Output/title
+ 
+ indexterm
+  primaryinterval/primary
+  secondaryoutput format/secondary
+  seealsoformatting/seealso
+ /indexterm
+ 
+ para
+  The output format of the interval types can be set to one of the three
+  styles literalsql_standard/, 
+  literalpostgres/, or literalpostgres_verbose/.  The default
+  is the literalpostgres/ format.  
+  xref
+  linkend=interval-style-output-table shows examples of each
+  output style.
+ /para
+ 
+ para
+  The literalsql_standard/ style will output SQL standard
+  interval literal strings where the interval value consists
+  of only a year-month component or a datetime
+  component (as required by the SQL standard).   For an interval
+  containing both a year-month and a datetime component, the
+  output will be a SQL Standard unquoted year-month literal
+  string joined to a SQL Standard unquoted datetime literal
+  string with a space in between.
+ /para
+ 
+ para
+  The literalpostgres/ style will output intervals that match
+  the style PostgreSQL 8.3 output when the xref 

Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Simon Riggs
On Wed, 2008-09-17 at 10:52 -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Alvaro Herrera wrote:
  Why doesn't this new request conflict with that one?
 
  The problem back then was that a CREATE INDEX was waiting on the 
  autoanalyze to finish, and the autoanalyze took a long time to finish 
  because of vacuum_cost_delay. Now that we have the auto-cancel 
  mechanism, that's not a problem.
 
 Define not a problem.  With auto-cancel, what will happen is that
 whatever work the autoanalyze does will be wasted.  It seems to me
 that the current complaint is about background autovacuum/autoanalyze
 wasting cycles during a bulk load, and there's certainly no purer waste
 than an analyze cycle that gets aborted.

OK, but that's an argument against auto-anything, not just against
splitting out autoanalyze and autovacuum.

 I tend to agree with Alvaro that there's not very much of a use case for
 an analyze-only autovacuum mode. 

Did he say that? I thought he said we could do that; what did you
mean, Alvaro?

I have a customer saying this would be a good thing and I agree. The
roles of Autovacuum and autoanalyze are not exactly matched, so why do
we force them to be run together or not at all? Why not allow
the user to specify whether they want both or not? It's an option, we're
not forcing anyone to do it that way if they don't want to.
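A sketch of how such an option might look in postgresql.conf; the `analyze` value here is hypothetical and does not exist at the time of this thread:

```
# postgresql.conf -- hypothetical third value for the autovacuum GUC
autovacuum = analyze    # disable autoVACUUM (e.g. during bulk loads)
                        # while keeping autoANALYZE enabled
```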

  Assuming that we get to the point of
 having a parallelizing pg_restore, it would be interesting to give it an
 option to include ANALYZE for each table it's loaded among the tasks
 that it schedules.  (I'm visualizing these commands as being made up by
 pg_restore itself, *not* added to the pg_dump output.)  Then you could
 have a reasonably optimal total workflow, whereas allowing autovacuum
 to try to schedule the ANALYZEs can't be.

That doesn't solve all problems, just ones with pg_restore. That's nice
and I won't turn it away, but what will we do about plain pg_dump and
about other table creations and loads?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Robert Haas
 I tend to agree with Alvaro that there's not very much of a use case for
 an analyze-only autovacuum mode.  Assuming that we get to the point of
 having a parallelizing pg_restore, it would be interesting to give it an
 option to include ANALYZE for each table it's loaded among the tasks
 that it schedules.  (I'm visualizing these commands as being made up by
 pg_restore itself, *not* added to the pg_dump output.)  Then you could
 have a reasonably optimal total workflow, whereas allowing autovacuum
 to try to schedule the ANALYZEs can't be.

In Simon's original email, he suggested forcing an automatic ANALYZE
on the server side after CREATE TABLE AS.  I objected on the grounds
that this won't fix anything for people who are doing bulk data loads
using any other mechanism.  Here, you're proposing the exact same
thing, except instead of restricting it to people who use CREATE TABLE
AS, you're restricting it to people who use a hypothetical
parallelized implementation of pg_restore.

While either of these is better than doing nothing, ISTM it would be
far better to give the database some smarts about what constitutes a
bulk data load (a whole bunch of insert operations on a newly created
table) and what to do about it (synchronous analyze just before the
first operation on the table that isn't an insert - and perhaps not
before).

...Robert



[HACKERS] proposal - function's default parametes

2008-09-17 Thread Pavel Stehule
Hello,

Before any implementation of named parameters, we have to implement
parameters with default values; I don't see any sense in named parameters
without them. This is orthogonal to variadic functions (a variadic function
is called when we supply more parameters than the function declares;
default parameters are used when we supply fewer). Firebird has a similar
implementation. Default values are filled in from the right:

create function foo(undefvar type, var1 type1 = expr1, var2 type2 =
expr2, ... var_n type_n = expr_n)

select foo(cexpr1, cexpr2) is transformed to

select foo(cexpr1, cexpr2, expr2, ... expr_n)

The problems with identifying the right function to call are the same as
with variadic functions.

Implementation:
In my prototype I used a special datatype, position_expression, which
carries a position together with the parsed and transformed expression.
pg_proc has a new column, defaults, of type position_expression[]. The
position will be used for named parameters in the future, and using this
type protects us from compatibility problems.

postgres=# create or replace function x1(int = 1, int = 2, int = 3)
returns int as $$
  select $1+$2+$3;
$$ language sql;
CREATE FUNCTION
postgres=# select x1();
 x1
----
  6
(1 row)

postgres=# select x1(10);
 x1
----
 15
(1 row)

postgres=# select x1(10,20);
 x1
----
 33
(1 row)

postgres=# select x1(10,20,30);
 x1
----
 60
(1 row)

This feature is implemented at the SQL level, so all PL languages will be supported.

Any comments or ideas are welcome.

regards
Pavel Stehule



[HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera
Hi,

We've been profiling a large system (8 CPUs, 64 GB of memory, some
dozens of disks) which seems rather more swamped than it should be.  Part
of the problem seems to come from CleanupTempFiles, the second entry in
oprofile output.

This is the top 3 symbol report for a 2 minute oprofile run:

$ opreport '*postgres' -l
CPU: AMD64 processors, speed 2600 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 7000
samples  %        symbol name
1963666   5.4862  AllocSetAlloc
1629477   4.5525  CleanupTempFiles
1560543   4.3599  hash_search_with_hash_value


Annotating the function results in this:

               :static void
               :CleanupTempFiles(bool isProcExit)
  1334  0.0037 :{ /* CleanupTempFiles total: 1629477  4.5525 */
               :    Index           i;
               :
    90 2.5e-04 :    if (SizeVfdCache > 0)
               :    {
               :        Assert(FileIsNotOpen(0));   /* Make sure ring not corrupted */
470706  1.3151 :        for (i = 1; i < SizeVfdCache; i++)
               :        {
281998  0.7879 :            unsigned short fdstate = VfdCache[i].fdstate;
               :
872073  2.4365 :            if ((fdstate & FD_TEMPORARY) && VfdCache[i].fileName != NULL)
               :            {
               :                /*
               :                 * If we're in the process of exiting a backend
               :                 * process, close all temporary files.  Otherwise,
               :                 * only close temporary files local to the current
               :                 * transaction.
               :                 */
               :                if (isProcExit || (fdstate & FD_XACT_TEMPORARY))
               :                    FileClose(i);
               :            }
               :        }
               :    }
               :
  3198  0.0089 :    while (numAllocatedDescs > 0)
               :        FreeDesc(allocatedDescs[0]);
    78 2.2e-04 :}



So we're scanning the array many times (there are about 1300
transactions per second), doing no useful work on each scan.

There are about 8900 files in the database, and since this is using a
connection pooler, it's quite likely that each session opens a large
subset of them at least once at some point, and so the VfdCache gets
very large.


I can see two simple ways to solve this problem.  One is to create a
second array that keeps pointers to the temp files in VfdCache.  Then,
on CleanupTempFiles we scan that array instead of VfdCache directly.

The second one is to use a separate VfdCache for temp files, but this
seems much more involved, requiring touching almost all of fd.c.

Of course, perhaps there's another solution which involves rethinking
fd.c in a more thorough fashion, but I'm not sure how, right offhand.

Thoughts?

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:

 We've been profiling a large system (8 CPUs, 64 GB of memory, some
 dozens of disks) which seems rather more swamped than it should be.  Part
 of the problem seems to come from CleanupTempFiles, the second entry in
 oprofile output.

I'm glad you've observed this also. I saw it about two years ago but
wasn't able to convince anyone else it existed at the time.

 I can see two simple ways to solve this problem.  One is to create a
 second array that keeps pointers to the temp files in VfdCache.  Then,
 on CleanupTempFiles we scan that array instead of VfdCache directly.
 
 The second one is to use a separate VfdCache for temp files, but this
 seems much more involved, requiring touch almost all of fd.c.
 
 Of course, perhaps there's another solution which involves rethinking
 fd.c in a more thorough fashion, but I'm not sure how right offhand.

Simple solution is to have a state variable so you can see whether a
backend has created any temp files in this transaction. Most don't, so I
think the above two solutions are overkill. If we created any, scan for
them, if not, don't. Just a simple boolean state, just as we have for
AtEOXact_RelationCache().

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera
Simon Riggs wrote:
 
 On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:
 
  We've been profiling a large system (8 CPUs, 64 GB of memory, some
  dozens of disks) which seems rather more swamped than it should be.  Part
  of the problem seems to come from CleanupTempFiles, the second entry in
  oprofile output.
 
 I'm glad you've observed this also. I saw it about two years ago but
 wasn't able to convince anyone else it existed at the time.

I couldn't find it in the archives.

 Simple solution is to have a state variable so you can see whether a
 backend has created any temp files in this transaction. Most don't, so I
 think the above two solutions are overkill. If we created any, scan for
 them, if not, don't. Just a simple boolean state, just as we have for
 AtEOXact_RelationCache().

Ah -- like this?

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Index: src/backend/storage/file/fd.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.143
diff -c -p -r1.143 fd.c
*** src/backend/storage/file/fd.c	1 Jan 2008 19:45:51 -	1.143
--- src/backend/storage/file/fd.c	17 Sep 2008 21:33:28 -
*** static int	max_safe_fds = 32;	/* default
*** 121,126 
--- 121,132 
  #define FD_TEMPORARY		(1 << 0)	/* T = delete when closed */
  #define FD_XACT_TEMPORARY	(1 << 1)	/* T = delete at eoXact */
  
+ /*
+  * Flag to tell whether it's worth scanning VfdCache looking for temp files to
+  * close
+  */
+ static bool		any_temporary_files = false;
+ 
  typedef struct vfd
  {
  	signed short fd;			/* current FD, or VFD_CLOSED if none */
*** OpenTemporaryFile(bool interXact)
*** 889,894 
--- 895,903 
  	{
  		VfdCache[file].fdstate |= FD_XACT_TEMPORARY;
  		VfdCache[file].create_subid = GetCurrentSubTransactionId();
+ 
+ 		/* ensure cleanup happens at eoxact */
+ 		any_temporary_files = true;
  	}
  
  	return file;
*** AtEOSubXact_Files(bool isCommit, SubTran
*** 1603,1609 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
--- 1612,1618 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0 && any_temporary_files)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
*** CleanupTempFiles(bool isProcExit)
*** 1679,1685 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
--- 1688,1694 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0 && (isProcExit || any_temporary_files))
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
*** CleanupTempFiles(bool isProcExit)
*** 1697,1702 
--- 1706,1713 
  	FileClose(i);
  			}
  		}
+ 
+ 		any_temporary_files = false;
  	}
  
  	while (numAllocatedDescs > 0)



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Simon Riggs wrote:

  Simple solution is to have a state variable so you can see whether a
  backend has created any temp files in this transaction. Most don't, so I
  think the above two solutions are overkill. If we created any, scan for
  them, if not, don't. Just a simple boolean state, just as we have for
  AtEOXact_RelationCache().
 
 Ah -- like this?

BTW I wonder if there's any point in checking SizeVfdCache > 0 in the
places where checking the new flag suffices.


-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Ah -- like this?

+1, but there are two kinds of temp files in that module, and only
one of them is relevant here.  Call it something like
have_xact_temporary_files to make things clearer.

I concur that the explicit test on SizeVfdCache > 0 is a waste of
effort, too.  It'll nearly always be true anyway.

regards, tom lane



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 17:34 -0400, Alvaro Herrera wrote:
 Simon Riggs wrote:
  
  On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:
  
   We've been profiling a large system (8 CPUs, 64 GB of memory, some
   dozens of disks) which seems rather more swamped than it should be.  Part
   of the problem seems to come from CleanupTempFiles, the second entry in
   oprofile output.
  
  I'm glad you've observed this also. I saw it about two years ago but
  wasn't able to convince anyone else it existed at the time.
 
 I couldn't find it in the archives.

Perhaps it was a private conversation then, but the main point is I've
seen it too and believe it is a real world problem.

  Simple solution is to have a state variable so you can see whether a
  backend has created any temp files in this transaction. Most don't, so I
  think the above two solutions are overkill. If we created any, scan for
  them, if not, don't. Just a simple boolean state, just as we have for
  AtEOXact_RelationCache().
 
 Ah -- like this?

Yeh, nice and simple.

Might be better to call it need_eoxact_work to mirror relcache.c?
any_temporary_files is fairly vague and could be misinterpreted.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 17:34 -0400, Alvaro Herrera wrote:

 Ah -- like this?

The if test should include
|| isProcExit

so you don't skip non-transactional temp files at proc exit when there
weren't any created in the last transaction.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Ah -- like this?
 
 +1, but there are two kinds of temp files in that module, and only
 one of them is relevant here.  Call it something like
 have_xact_temporary_files to make things clearer.

Ok, so that's what I called it.

Simon wrote:

 The if test should include
 || isProcExit
 
 so you don't skip non-transactional temp files at proc exit when there
 weren't any created in the last transaction.

Yep, I noticed that too.  Thanks.

Should I backpatch this to 8.3?

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Index: src/backend/storage/file/fd.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.143
diff -c -p -r1.143 fd.c
*** src/backend/storage/file/fd.c	1 Jan 2008 19:45:51 -	1.143
--- src/backend/storage/file/fd.c	17 Sep 2008 22:58:32 -
*** static int	max_safe_fds = 32;	/* default
*** 121,126 
--- 121,132 
  #define FD_TEMPORARY		(1 << 0)	/* T = delete when closed */
  #define FD_XACT_TEMPORARY	(1 << 1)	/* T = delete at eoXact */
  
+ /*
+  * Flag to tell whether it's worth scanning VfdCache looking for temp files to
+  * close
+  */
+ static bool		have_xact_temporary_files = false;
+ 
  typedef struct vfd
  {
  	signed short fd;			/* current FD, or VFD_CLOSED if none */
*** OpenTemporaryFile(bool interXact)
*** 889,894 
--- 895,903 
  	{
  		VfdCache[file].fdstate |= FD_XACT_TEMPORARY;
  		VfdCache[file].create_subid = GetCurrentSubTransactionId();
+ 
+ 		/* ensure cleanup happens at eoxact */
+ 		have_xact_temporary_files = true;
  	}
  
  	return file;
*** AtEOSubXact_Files(bool isCommit, SubTran
*** 1603,1609 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
--- 1612,1618 
  {
  	Index		i;
  
! 	if (have_xact_temporary_files)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
*** CleanupTempFiles(bool isProcExit)
*** 1679,1685 
  {
  	Index		i;
  
! 	if (SizeVfdCache > 0)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
--- 1688,1698 
  {
  	Index		i;
  
! 	/*
! 	 * Careful here: at proc_exit we need extra cleanup, not just
! 	 * xact_temporary files.
! 	 */
! 	if (isProcExit || have_xact_temporary_files)
  	{
  		Assert(FileIsNotOpen(0));		/* Make sure ring not corrupted */
  		for (i = 1; i < SizeVfdCache; i++)
*** CleanupTempFiles(bool isProcExit)
*** 1697,1702 
--- 1710,1717 
  	FileClose(i);
  			}
  		}
+ 
+ 		have_xact_temporary_files = false;
  	}
  
  	while (numAllocatedDescs > 0)



Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Alvaro Herrera

BTW in testing this patch I was surprised by the fact that temp table
files are removed at checkpoint time, rather than when the transaction
ends (at first I thought I had broken the removal of temp files).  Is
this a recent feature?

(I verified that this continues to work fine for WITH HOLD cursors too.)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] New FSM patch

2008-09-17 Thread Decibel!

On Sep 17, 2008, at 9:30 AM, Heikki Linnakangas wrote:
I think we'd still need to WAL log operations that decrease the amount of
free space on a page. Otherwise, after we have partial vacuum, we might
never revisit a page and update the FSM, even though there's usable space
on the page, leaving the space forgotten forever.



ISTM that it would be better to deal with such corner cases via periodic
non-partial vacuums, done by something like autovacuum, and probably done
with a higher-than-normal vacuum_cost_delay setting so as to minimize the
performance impact. That's likely a lot less wasteful than further
compounding lock contention for the WAL. Even if it does result in more
overall I/O, you have to trade a *lot* of I/O to balance out the impact of
lock contention.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






[HACKERS] 0x1A in control file on Windows

2008-09-17 Thread ITAGAKI Takahiro
I found a bug: pg_controldata ends with an error if the control file
contains 0x1A (Ctrl+Z) on Windows.

We probably need to add PG_BINARY when we open control files,
because 0x1A is an end-of-file marker in Windows text mode.
This fix needs to be applied to the back branches (8.2 and 8.3) as well as HEAD.


Index: src/bin/pg_controldata/pg_controldata.c
===
--- src/bin/pg_controldata/pg_controldata.c (head)
+++ src/bin/pg_controldata/pg_controldata.c (pg_control_0x1A)
@@ -107,7 +107,7 @@
 
 	snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
 
-   if ((fd = open(ControlFilePath, O_RDONLY, 0)) == -1)
+   if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
{
 		fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
 				progname, ControlFilePath, strerror(errno));
Index: src/bin/pg_resetxlog/pg_resetxlog.c
===
--- src/bin/pg_resetxlog/pg_resetxlog.c (head)
+++ src/bin/pg_resetxlog/pg_resetxlog.c (pg_control_0x1A)
@@ -373,7 +373,7 @@
char   *buffer;
pg_crc32crc;
 
-   if ((fd = open(XLOG_CONTROL_FILE, O_RDONLY, 0)) < 0)
+   if ((fd = open(XLOG_CONTROL_FILE, O_RDONLY | PG_BINARY, 0)) < 0)
{
/*
 	 * If pg_control is not there at all, or we can't read it, the odds

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




[HACKERS] Where is Aggregation Attribute

2008-09-17 Thread Zhe He
I want to get the attribute from an Aggregation (Agg) node. Where is it?

typedef struct Agg
{
Plan plan;
AggStrategy aggstrategy;
int numCols; /* number of grouping columns */
AttrNumber *grpColIdx; /* their indexes in the target list */
Oid   *grpOperators; /* equality operators to compare with */
long numGroups; /* estimated number of groups in input */
} Agg;

-- 
Best Regards,
Zhe HE
TEL: (001) 646-789-3008
Address:965 Amsterdam Avenue,
New York, NY 10025

Master Student, CS Dept.
Columbia University
www.columbia.edu/~zh2132
---
07 Alumni
Bachelor of Eng, BUPT
www.bupt.edu.cn


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-17 Thread Tatsuo Ishii
Tom, thanks for the review.

Here is an in-progress report. Patches against CVS HEAD attached.
(Items I haven't commented on below are still work in progress.)
--
Tatsuo Ishii
SRA OSS, Inc. Japan

 Tatsuo Ishii [EMAIL PROTECTED] writes:
  Included are the latest patches against CVS HEAD.
 
 I spent some time reading this patch.  Here are a few comments in
 no particular order:
 
 RangeRecursive node lacks copyfuncs/equalfuncs support.

Functions added.
 
 Query.recursive is missed in equalfuncs.c.  But rather than fix that,
 get rid of it entirely.  AFAICS the only use is in qual_is_pushdown_safe,
 and what is the value of that test?  The callers know perfectly well
 whether they are operating on a recursive RTE or not.  You might as well
 just delete all the useless qual-pushdown-attempt code from
 set_recursion_pathlist, and not need to touch qual_is_pushdown_safe
 at all.

Query.recursive removed and qual_is_pushdown_safe is untouched.

 Is physical_tlist optimization sensible for RecursiveScan?  We seem
 to use it for every other Scan node type.

Fixed; the physical_tlist optimization is now enabled for RecursiveScan,
I believe.

 I dislike putting state into ExecutorState; that makes it impossible
 to have multiple recursion nodes in one plan tree.  It would probably
 be better for the Recursion and RecursiveScan nodes to talk to each
 other directly (compare HashJoin/Hash); although since they are not
 adjacent in the plan tree I admit I'm not sure how to do that.
 
 es_disallow_tuplestore doesn't seem to need to be in ExecutorState
 at all, it could as well be in RecursionState.

It was a workaround to avoid infinite recursion in some cases. The
discussion came to the conclusion that it's the user's responsibility to
avoid such cases (otherwise the semantics of our recursive query would
differ from those defined in the standard), I believe.

es_disallow_tuplestore removed.

 I don't really like the way that Append nodes are being abused here.
 It would be better to allow nodeRecursion.c to duplicate a little code
 from nodeAppend.c, and have the child plans be direct children of
 the Recursion node.  BTW, is it actually possible to have more than
 two children?  I didn't spend enough time analyzing the restrictions
 in parse_cte to be sure.  If there are always just two then you could
 simplify the representation by treating it like a join node instead
 of an append.  (The RTE_RECURSIVE representation sure makes it look
 like there can be only two...)
 
 Mark/restore support seems useless ... note the comment on
 ExecSupportsMarkRestore (which should be updated if this code
 isn't removed).
 
 RecursiveScan claims to support backwards fetch, but does not in fact
 contain code to do so.  (Given that it will always be underneath
 Recursion, which can't do backwards fetch, I see little point in adding
 such code; fix execAmi.c instead.)
 
 ExecInitRecursion doesn't seem to be on the same page about whether
 it supports backward scan as execAmi.c, either.
 
 This comment in nodeRecursivescan.c seems bogus:
    /*
     * Do not initialize scan tuple type, result tuple type and
     * projection info in ExecInitRecursivescan. These types are
     * initialized after initializing Recursion node.
     */
 because the code seems to be doing exactly what the comment says it
 doesn't.
 
 Numerous comments appear to have been copied-and-pasted and not modified
 from the original.  Somebody will have to go over all that text.
 
 ruleutils.c fails completely for non-recursive WITH.  It *must* regenerate
 such a query with a WITH clause, not as a flattened subquery which is what
 you seem to be doing.  This isn't negotiable because the semantics are
 different.  This will mean at least some change in the parsetree
 representation.  Perhaps we could add a bool to subquery RTEs to mark them
 as coming from a nonrecursive WITH?  The tests added for RTE_RECURSIVE
 seem a bit ugly too.  If it thinks that can't happen it should Assert so,
 not just fall through silently.
 
 commentary for ParseState.p_ctenamespace is gratuitously unlike the
 comment style for the other fields, and p_recursive_namespace isn't
 documented at all.
 
 ParseState.p_in_with_clause is unused, should be removed.

Done.

 The WithClause struct definition is poorly commented.  It should be
 stated that it is used only pre-parse-analysis (assuming that continues
 to be true after you get done fixing ruleutils.c...), and it doesn't
 say what the elements of the subquery list are (specifically, what
 node type).  A lot of the other added structs and fields could use
 better commenting too.
 
 For that matter "subquery" is a poor name for WithClause's list of CTEs,
 especially so since it's hard to search for.  It should be a plural name
 and I'd be inclined to use something like "ctes" not "subqueries".
 The term "subquery" is too overloaded already, so any place you can
 refer to a WITH-list member as a "CTE" you should do so.
 
 WithClause node may need a 

Re: [HACKERS] optimizing CleanupTempFiles

2008-09-17 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 BTW in testing this patch I was surprised by the fact that temp table
 files are removed at checkpoint time,

[ blink... ]  Doesn't look like that should happen.  What is your
test case?

regards, tom lane



Re: [HACKERS] 0x1A in control file on Windows

2008-09-17 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
 I found a bug that pg_controldata ends with error if control files
 contain 0x1A (Ctrl+Z) on Windows.

 We probably need to add PG_BINARY when we open control files
 because 0x1A is an end-of-file marker on Windows.

Well, why is that a bug?  If the platform is so silly as to define text
files that way, who are we to argue?

regards, tom lane
