Re: [HACKERS] tsearch2 in 8.3

2007-04-25 Thread Jeremy Drake
On Tue, 24 Apr 2007, Bruce Momjian wrote:

 Naz Gassiep wrote:
  A few of us on IRC were wondering what the status of tsearch2 is in 8.3 ?
  Was it decided to include it in core or did we decide to keep FTS as a
  plugin?
  Some brief comments from anyone on the inside of the whole FTS issue
  would be greatly appreciated by us mere end users.
  Regards,

 The patch is in the patch queue and we will try to get it into 8.3.

Let me just say, that for me this is the most anticipated feature for 8.3.
Along with the patch to allow the database owner to create trusted PLs,
this will allow me to move all but one of my databases to my hosting
provider's PostgreSQL instance from my own instance running in my home
directory (the one I cannot move also requires dblink).  I can only
imagine there are other users out there in similar circumstances to mine.
I was lucky enough to find a hosting provider with shell access where I
can run a postgres instance, and lucky that I already had the know-how to do so.
Without running my own instance, my only other option was to choose the
lesser of two evils: do without FTS, or use MySQL. ;)

Sorry for the rant, I just wanted to make sure that people knew that this
is not just cosmetic, or a restructure for its own sake, but will actually
help real world users.


-- 
The cow is nothing but a machine which makes grass fit for us people to
eat.
-- John McNulty



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-25 Thread Zeugswetter Andreas ADI SD

  1) To deal with partial/inconsistent writes to the data file at crash
  recovery, we need full page writes at the first modification to pages
  after each checkpoint.   It consumes much of WAL space.
 
 We need to find a way around this someday.  Other DBs don't 
 do this; it may be because they're less durable, or because 
 they fixed the problem.

They either can only detect a failure later (this may be a very long
time depending on access and verify runs) or they also write page
images. Those that write page images usually write before images to a
different area that is cleared periodically (e.g. during checkpoint).

Writing to a different area was considered in pg, but there were more
negative issues than positive.
So imho pg_compresslog is the correct path forward. The current
discussion is only about whether we want a more complex pg_compresslog
and no change to current WAL, or an increased WAL size for a less
complex implementation.
Both would be able to compress the WAL to the same archive log size.

Andreas



[HACKERS] Buildfarm: Stage logs not available for MSVC builds

2007-04-25 Thread Dave Page
I just noticed that the stage logs aren't displayed against MSVC build
hosts as they are for regular hosts, eg:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mastodon&dt=2007-04-25%2001:00:02

vs.

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhal&dt=2007-04-25%2002:00:03

Is this WIP, or a bug to be fixed?

Regards Dave.



[HACKERS] autovacuum does not start in HEAD

2007-04-25 Thread ITAGAKI Takahiro
I found that autovacuum launcher does not launch any workers in HEAD.

AFAICS, we track the time each database is to be vacuumed in the following way:

1. In rebuild_database_list(), we initialize avl_dbase->adl_next_worker
   with (current_time + autovacuum_naptime / nDBs).
2. In do_start_worker(), we skip database entries whose adl_next_worker
   is between current_time and current_time + autovacuum_naptime.
3. If there are no jobs in do_start_worker(), we call rebuild_database_list()
   to rebuild database entries.

The point is that we use the same range (current_time to current_time +
autovacuum_naptime) at steps 1 and 2. We set adl_next_worker to values in that
range at step 1, and then skip all of them at step 2 because their values are
in the range. And if there is no database to vacuum, we re-initialize the
database list at step 3,
then we repeat the cycle.
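
To illustrate, here is a small stand-alone sketch (illustrative only, not the
real autovacuum code; names and numbers are simplified) showing that every
entry initialized at step 1 falls inside the window skipped at step 2:

#include <stdio.h>

#define NAPTIME 60.0			/* autovacuum_naptime, in seconds */
#define NDBS	4				/* number of databases */

int
main(void)
{
	double		now = 0.0;
	double		adl_next_worker[NDBS];
	int			i,
				skipped = 0;

	/* step 1: rebuild_database_list() spreads entries over one naptime */
	for (i = 0; i < NDBS; i++)
		adl_next_worker[i] = now + i * (NAPTIME / NDBS);

	/* step 2: do_start_worker() skips entries in [now, now + naptime) */
	for (i = 0; i < NDBS; i++)
		if (adl_next_worker[i] >= now && adl_next_worker[i] < now + NAPTIME)
			skipped++;

	printf("%d of %d databases skipped\n", skipped, NDBS);	/* prints: 4 of 4 */
	return 0;
}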

Or am I missing something?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




[HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Dave Page
I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might
be causing this?

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=vaquita&dt=2007-04-24%2020:00:05

The only other Vista buildfarm member (baiji, on the same physical box)
is running MSVC builds which don't yet test ECPG from what I can see.

Regards, Dave



Re: [HACKERS] Buildfarm: Stage logs not available for MSVC builds

2007-04-25 Thread Dave Page
Andrew Dunstan wrote:
 The problem is not because of MSVC, but because of member misconfiguration,
 by the look of it. The tar command string will need to be set in the
 config file and tar installed. I found that I needed bsdtar for Windows
 for this to work. See

Ah, OK, thanks - there was a typo in the path entry I'd added for tar.

'tis working now.

Regards, Dave



[HACKERS] Avoiding unnecessary reads in recovery

2007-04-25 Thread Heikki Linnakangas
In recovery, with full_pages_writes=on, we read in each page only to 
overwrite the contents with a full page image. That's a waste of time, 
and can have a surprisingly large effect on recovery time.


As a quick test on my laptop, I initialized a DBT-2 test with 5 
warehouses, and let it run for 2 minutes without think-times to generate 
some WAL. Then I did a kill -9 postmaster, and took a copy of the data 
directory to use for testing recovery.


With CVS HEAD, the recovery took ~ 2 minutes. With the attached patch, 
it took 5 seconds. (yes, I used the same not-yet-recovered data 
directory in both tests, and cleared the OS cache with echo 1 > 
/proc/sys/vm/drop_caches).


I was surprised how big a difference it makes, but when you think about 
it it's logical. Without the patch, it's doing roughly the same I/O as 
the test itself, reading in pages, modifying them, and writing them 
back. With the patch, all the reads are done sequentially from the WAL, 
and then written back in a batch at the end of the WAL replay which is a 
lot more efficient.


It's interesting that (with the patch) full_page_writes can *shorten* 
your recovery time. I've always thought it to have a purely negative 
effect on performance.


I'll leave it up to the jury if this tiny little change is appropriate 
after feature freeze...


While working on this, this comment in ReadBuffer caught my eye:


/*
 * During WAL recovery, the first access to any data page should
 * overwrite the whole page from the WAL; so a clobbered page
 * header is not reason to fail.  Hence, when InRecovery we may
 * always act as though zero_damaged_pages is ON.
 */
if (zero_damaged_pages || InRecovery)
{


But that assumption only holds if full_page_writes is enabled, right? I 
changed that in the attached patch as well, but if it isn't accepted 
that part of it should still be applied, I think.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/xlogutils.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xlogutils.c,v
retrieving revision 1.49
diff -c -r1.49 xlogutils.c
*** src/backend/access/transam/xlogutils.c	5 Jan 2007 22:19:24 -	1.49
--- src/backend/access/transam/xlogutils.c	25 Apr 2007 11:40:09 -
***
*** 226,232 
  	if (blkno < lastblock)
  	{
  		/* page exists in file */
! 		buffer = ReadBuffer(reln, blkno);
  	}
  	else
  	{
--- 226,235 
  	if (blkno < lastblock)
  	{
  		/* page exists in file */
! 		if(init)
! 			buffer = ZapBuffer(reln, blkno);
! 		else
! 			buffer = ReadBuffer(reln, blkno);
  	}
  	else
  	{
Index: src/backend/storage/buffer/bufmgr.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.216
diff -c -r1.216 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	30 Mar 2007 18:34:55 -	1.216
--- src/backend/storage/buffer/bufmgr.c	25 Apr 2007 11:44:27 -
***
*** 97,102 
--- 97,103 
  static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty,
    int set_flag_bits);
  static void buffer_write_error_callback(void *arg);
+ static Buffer ReadBuffer_common(Relation reln, BlockNumber blockNum, bool alloc_only);
  static volatile BufferDesc *BufferAlloc(Relation reln, BlockNumber blockNum,
  			bool *foundPtr);
  static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln);
***
*** 121,126 
--- 122,148 
  Buffer
  ReadBuffer(Relation reln, BlockNumber blockNum)
  {
+ 	return ReadBuffer_common(reln, blockNum, false);
+ }
+ 
+ /*
+  * ZapBuffer -- like ReadBuffer, but doesn't read the contents of the page
+  *		from disk. The caller is expected to completely rewrite the page,
+  *		regardless of the current contents. This should only be used in
+  *		recovery where there's no concurrent readers that might see the
+  *		contents of the page before the caller rewrites it.
+  */
+ Buffer
+ ZapBuffer(Relation reln, BlockNumber blockNum)
+ {
+ 	Assert(InRecovery);
+ 
+ 	return ReadBuffer_common(reln, blockNum, true);
+ }
+ 
+ static Buffer
+ ReadBuffer_common(Relation reln, BlockNumber blockNum, bool alloc_only)
+ {
  	volatile BufferDesc *bufHdr;
  	Block		bufBlock;
  	bool		found;
***
*** 253,269 
  	}
  	else
  	{
  		smgrread(reln->rd_smgr, blockNum, (char *) bufBlock);
  		/* check for garbage data */
  		if (!PageHeaderIsValid((PageHeader) bufBlock))
  		{
  			/*
! 			 * During WAL recovery, the first access to any data page should
! 			 * overwrite the whole page from the WAL; so a clobbered page
! 			 * header is not reason to fail.  Hence, when InRecovery we may
! 			 * always act as though zero_damaged_pages is ON.
  			 */
! 			if (zero_damaged_pages || InRecovery)
  			{
  

Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-25 Thread Kenneth Marshall
On Wed, Apr 25, 2007 at 10:00:16AM +0200, Zeugswetter Andreas ADI SD wrote:
 
   1) To deal with partial/inconsistent writes to the data file at crash
   recovery, we need full page writes at the first modification to pages
   after each checkpoint.   It consumes much of WAL space.
  
  We need to find a way around this someday.  Other DBs don't 
  do this; it may be because they're less durable, or because 
  they fixed the problem.
 
 They either can only detect a failure later (this may be a very long
 time depending on access and verify runs) or they also write page
 images. Those that write page images usually write before images to a
 different area that is cleared periodically (e.g. during checkpoint).
 
 Writing to a different area was considered in pg, but there were more
 negative issues than positive.
 So imho pg_compresslog is the correct path forward. The current
 discussion is only about whether we want a more complex pg_compresslog
 and no change to current WAL, or an increased WAL size for a less
 complex implementation.
 Both would be able to compress the WAL to the same archive log size.
 
 Andreas
 
I definitely am in the camp of not increasing WAL size at all. If we
need a bit more complexity to ensure that, so be it. Any approach that
increases WAL volume would need to have an amazing benefit to make it
warranted. This certainly does not meet that criterion.

Ken




Re: [HACKERS] autovacuum does not start in HEAD

2007-04-25 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
 I found that autovacuum launcher does not launch any workers in HEAD.
 
 AFAICS, we track the time each database is to be vacuumed in the following
 way:
 
 1. In rebuild_database_list(), we initialize avl_dbase->adl_next_worker
    with (current_time + autovacuum_naptime / nDBs).
 2. In do_start_worker(), we skip database entries whose adl_next_worker
    is between current_time and current_time + autovacuum_naptime.
 3. If there are no jobs in do_start_worker(), we call rebuild_database_list()
    to rebuild database entries.
 
 The point is that we use the same range (current_time to current_time +
 autovacuum_naptime) at steps 1 and 2. We set adl_next_worker to values in
 that range at step 1, and then skip all of them at step 2 because their
 values are in the range. And if there is no database to vacuum, we
 re-initialize the database list at step 3,
 then we repeat the cycle.
 
 Or am I missing something?

Note that rebuild_database_list skips databases that don't have stat
entries.  Maybe that's what's confusing your examination.  When the list
is empty, workers are launched only every naptime seconds; and then it'll
also pick only databases with stat entries.  All other databases will be
skipped until the max_freeze_age is reached.  Right after an initdb or a
WAL replay, all database stats are deleted.

The point of (1) is to spread the starting of workers in the
autovacuum_naptime interval.

The point of (2) is that we don't want to process a database that was
processed too recently (less than autovacuum_naptime seconds ago).  This
is useful in the cases where databases are dropped, so the launcher is
awakened earlier than what the schedule would say if the dropped
database were not in the list.  It is possible that I confused the
arithmetic in there (because TimestampDifference does not return
negative results so there may be strange corner cases), but the last
time I examined it it was correct.

The point of (3) is to cover the case where no databases were previously
being autovacuumed but one may now need vacuuming (i.e. just after a
database got its stat entry).

The fact that some databases may not have stat entries tends to confuse
the logic, both in rebuild_database_list and do_start_worker.  If it's
not documented enough maybe it needs extra clarification in code
comments.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-25 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

While working on this, this comment in ReadBuffer caught my eye:


/*
 * During WAL recovery, the first access to any data page should
 * overwrite the whole page from the WAL; so a clobbered page
 * header is not reason to fail.  Hence, when InRecovery we may
 * always act as though zero_damaged_pages is ON.
 */
if (zero_damaged_pages || InRecovery)
{


But that assumption only holds if full_page_writes is enabled, right? I 
changed that in the attached patch as well, but if it isn't accepted 
that part of it should still be applied, I think.


On second thought, my fix still isn't 100% right because one could turn 
full_page_writes on before starting replay.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-25 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 While working on this, this comment in ReadBuffer caught my eye:

  /*
   * During WAL recovery, the first access to any data page should
   * overwrite the whole page from the WAL; so a clobbered page
   * header is not reason to fail.  Hence, when InRecovery we may
   * always act as though zero_damaged_pages is ON.
   */
  if (zero_damaged_pages || InRecovery)
  {

 But that assumption only holds if full_page_writes is enabled, right? I 
 changed
 that in the attached patch as well, but if it isn't accepted that part of it
 should still be applied, I think.

Well it's only true if full_page_writes was on when the WAL was written. Which
isn't necessarily the same as saying it's enabled during recovery...

As long as there's a backup block in the log we can use it to clobber pages in
the heap -- which is what your patch effectively does anyways. If we're
replaying a log entry where there isn't a backup block and we find a damaged
page then we're in trouble. Either the damaged page was in a previous backup
block or it's the recovery itself that's damaging it. 

In the latter case it would be pretty useful to abort the recovery so the user
doesn't lose his backup and has a chance to recover properly (possibly after
reporting and fixing the bug).

So in short I think with your patch this piece of code no longer has a role.
Either your patch kicks in and we never even look at the damaged page at all,
or we should be treating it as corrupt data and just check zero_damaged_pages
alone and not do anything special in recovery.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Michael Meskes
On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote:
 I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might
 be causing this?

Hmm, first glance suggests some permission problems. 

I never touched a Vista system so far, so I'm at a loss as far as
details are concerned.

I also saw that wombat is segfaulting in ecpg tests but not only with
CVS HEAD but also trying to test 8.2. Any idea what's going on with this
machine?

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!



[HACKERS] database size estimates

2007-04-25 Thread Francois Deliege

Hi,

I am trying to estimate the size of a table composed of 51754000 rows.
Each row has 31 attributes: 16 x bit(24) and 15 x bit(8)

So, the payload should be:
51754000 x ( 16 x 24 + 15 x 8 ) bits = 3115 MB

Now, from what I understand from the PostgreSQL manual, the overhead 
is composed of

32 bytes per row + 20 bytes per page.
This leads me to approx. 1700 MB overhead.
Therefore a total table size of 4900 MB.

However, when I load the table, the table reaches the size of 21500 MB, 
i.e., 400% of my estimate.

The table has no toast table, no index.
So I am wondering if someone could give me a better estimate.

Cheers,

Francois




Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Dave Page
Michael Meskes wrote:
 On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote:
 I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might
 be causing this?
 
 Hmm, first glance suggests some permission problems. 

Yes, that was my thought as well, however I ran cacls down the entire
build tree, granting full control to Everyone on everything, but the
problem persists.

Additionally, all the other tests pass, including make check and make
installcheck, so unless the ECPG tests are trying to create something
outside of the buildtree, I don't think it is a permissions issue.

Is there anything I can try building/running manually to help debug this?

 I also saw that wombat is segfaulting in ecpg tests but not only with
 CVS HEAD but also trying to test 8.2. Any idea what's going on with this
 machine?
 
 Michael

Can't help with that one I'm afraid.

Regards, Dave



Re: [HACKERS] database size estimates

2007-04-25 Thread Heikki Linnakangas

Francois Deliege wrote:

Hi,

I am trying to estimate the size of a table composed of 51754000 rows.
Each row has 31 attributes: 16 x bit(24) and 15 x bit(8)

So, the payload should be:
51754000 x ( 16 x 24 + 15 x 8 ) bits = 3115 MB


What data types are those exactly? If those 24-bit fields are in fact 
text, varchar, char(x) or other data types that are stored as variable 
length fields, the varlen header will take 4 bytes. And then there's 
alignment, those 24-bit fields are almost certainly 4-byte aligned, 
which means that there'll be one byte of padding between them.


The upcoming 8.3 release will be much better in that respect: it'll use 
just a 1-byte varlen header per field instead of 4 bytes for small 
values like yours. You might want to test a CVS snapshot.
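
As a rough back-of-the-envelope (assuming bit(n) is stored with a 4-byte
varlena header plus a 4-byte bit-length word in front of the data, and
int-aligned, which is how I read the varbit code; treat these numbers as
illustrative only):

   bit(24) on 8.2:  4 + 4 + 3 = 11 bytes, padded to 12
   bit(8)  on 8.2:  4 + 4 + 1 =  9 bytes, padded to 12
   data per row:    31 x 12 = 372 bytes
   row overhead:    ~32 bytes (tuple header + item pointer)
   total:           51754000 x ~404 bytes = roughly 20 GB

which is in the same ballpark as the 21500 MB you're actually seeing.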


Now, from what I understand from the PostgreSQL manual, the overhead 
is composed of

32 bytes per row + 20 bytes per page.
This leads me to approx. 1700 MB overhead.
Therefore a total table size of 4900 MB.


In addition, there will be on average half a row's worth of wasted space on 
every page.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Kill a Long Running Query

2007-04-25 Thread Mageshwaran

Hi ,
Any body tell me how to kill a long running query in postgresql, is 
there any statement to kill a query, and also tell me how to log slow 
queries to a log file.


Regards
J Mageshwaran

** DISCLAIMER **
Information contained and transmitted by this E-MAIL is proprietary to 
Sify Limited and is intended for use only by the individual or entity to 
which it is addressed, and may contain information that is privileged, 
confidential or exempt from disclosure under applicable law. If this is a 
forwarded message, the content of this E-MAIL may not have been sent with 
the authority of the Company. If you are not the intended recipient, an 
agent of the intended recipient or a  person responsible for delivering the 
information to the named recipient,  you are notified that any use, 
distribution, transmission, printing, copying or dissemination of this 
information in any way or in any manner is strictly prohibited. If you have 
received this communication in error, please delete this mail & notify us 
immediately at [EMAIL PROTECTED]







Re: [HACKERS] [ADMIN] Kill a Long Running Query

2007-04-25 Thread Aaron Bono

On 4/25/07, Mageshwaran [EMAIL PROTECTED] wrote:


Hi ,
Any body tell me how to kill a long running query in postgresql, is
there any statement to kill a query, and also tell me how to log slow
queries to a log file.

Regards
J Mageshwaran




See if this helps:
http://archives.postgresql.org/pgsql-hackers-win32/2004-12/msg00039.php




--
==
  Aaron Bono
  Aranya Software Technologies, Inc.
  http://www.aranya.com
  http://codeelixir.com
==


Re: [HACKERS] Kill a Long Running Query

2007-04-25 Thread Andrew Dunstan

Mageshwaran wrote:

Hi ,
Any body tell me how to kill a long running query in postgresql, is 
there any statement to kill a query, and also tell me how to log slow 
queries to a log file.





First, please do not cross-post like this. Pick the correct list and use it.

Second, this query definitely does not belong on the -hackers list.

Third, please find a way of posting to lists that does not include a 
huge disclaimer and advertisements. If that is added by your company's 
mail server, you should look at using some other method of posting such 
as gmail.


Fourth, please read our excellent documentation. It contains the answers 
to your questions, I believe.


cheers

andrew



Re: [HACKERS] Kill a Long Running Query

2007-04-25 Thread Heikki Linnakangas
Please don't cross-post to multiple mailing lists. And pgsql-hackers is 
not the correct list for basic usage questions. And long end-of-mail 
disclaimers are not generally appreciated.


Mageshwaran wrote:
Any body tell me how to kill a long running query in postgresql, is 
there any statement to kill a query, 


See the user manual on administration functions, pg_cancel_backend in 
particular:


http://www.postgresql.org/docs/8.2/interactive/functions-admin.html

Basically you issue a SELECT * FROM pg_stat_activity, or plain ps to 
find out the pid of the backend executing the long running query, and 
then use pg_cancel_backend (or kill -INT) to cancel it.
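
If you need to do this programmatically, here's a minimal sketch in C using
libpq (column names as in 8.2's pg_stat_activity; the 5-minute threshold and
the connection string are just examples, and you need sufficient privileges
to see other backends' queries):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;
	int			i;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	/* cancel every backend whose query has been running for > 5 minutes */
	res = PQexec(conn,
				 "SELECT procpid, pg_cancel_backend(procpid) "
				 "FROM pg_stat_activity "
				 "WHERE current_query <> '<IDLE>' "
				 "  AND now() - query_start > interval '5 minutes'");

	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		for (i = 0; i < PQntuples(res); i++)
			printf("cancelled backend %s\n", PQgetvalue(res, i, 0));
	else
		fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

	PQclear(res);
	PQfinish(conn);
	return 0;
}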



and also tell me how to log slow queries to a log file.


Using the log_min_duration_statement configuration variable.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-25 Thread Heikki Linnakangas

Gregory Stark wrote:

So in short I think with your patch this piece of code no longer has a role.
Either your patch kicks in and we never even look at the damaged page at all,
or we should be treating it as corrupt data and just check zero_damaged_pages
alone and not do anything special in recovery.


Good point. Adjusted patch attached. I added some comments as well.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/xlogutils.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xlogutils.c,v
retrieving revision 1.49
diff -c -r1.49 xlogutils.c
*** src/backend/access/transam/xlogutils.c	5 Jan 2007 22:19:24 -	1.49
--- src/backend/access/transam/xlogutils.c	25 Apr 2007 14:27:05 -
***
*** 226,232 
  	if (blkno < lastblock)
  	{
  		/* page exists in file */
! 		buffer = ReadBuffer(reln, blkno);
  	}
  	else
  	{
--- 226,235 
  	if (blkno < lastblock)
  	{
  		/* page exists in file */
! 		if (init)
! 			buffer = ZapBuffer(reln, blkno);
! 		else
! 			buffer = ReadBuffer(reln, blkno);
  	}
  	else
  	{
Index: src/backend/storage/buffer/bufmgr.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.216
diff -c -r1.216 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	30 Mar 2007 18:34:55 -	1.216
--- src/backend/storage/buffer/bufmgr.c	25 Apr 2007 14:36:15 -
***
*** 17,22 
--- 17,26 
   *		and pin it so that no one can destroy it while this process
   *		is using it.
   *
+  * ZapBuffer() -- like ReadBuffer, but destroys the contents of the 
+  *		page. Used in recovery when the page is completely overwritten 
+  *		from WAL.
+  *
   * ReleaseBuffer() -- unpin a buffer
   *
   * MarkBufferDirty() -- mark a pinned buffer's contents as dirty.
***
*** 97,102 
--- 101,107 
  static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty,
    int set_flag_bits);
  static void buffer_write_error_callback(void *arg);
+ static Buffer ReadBuffer_common(Relation reln, BlockNumber blockNum, bool alloc_only);
  static volatile BufferDesc *BufferAlloc(Relation reln, BlockNumber blockNum,
  			bool *foundPtr);
  static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln);
***
*** 121,126 
--- 126,155 
  Buffer
  ReadBuffer(Relation reln, BlockNumber blockNum)
  {
+ 	return ReadBuffer_common(reln, blockNum, false);
+ }
+ 
+ /*
+  * ZapBuffer -- like ReadBuffer, but doesn't read the contents of the page
+  *		from disk. The caller is expected to completely rewrite the page,
+  *		regardless of the current contents. This should only be used in
+  *		recovery where there's no concurrent readers that might see the
+  *		contents of the page before the caller rewrites it.
+  */
+ Buffer
+ ZapBuffer(Relation reln, BlockNumber blockNum)
+ {
+ 	Assert(InRecovery);
+ 
+ 	return ReadBuffer_common(reln, blockNum, true);
+ }
+ 
+ /*
+  * ReadBuffer_common -- common logic of ReadBuffer and ZapBuffer.
+  */
+ static Buffer
+ ReadBuffer_common(Relation reln, BlockNumber blockNum, bool alloc_only)
+ {
  	volatile BufferDesc *bufHdr;
  	Block		bufBlock;
  	bool		found;
***
*** 253,269 
  	}
  	else
  	{
  		smgrread(reln->rd_smgr, blockNum, (char *) bufBlock);
  		/* check for garbage data */
  		if (!PageHeaderIsValid((PageHeader) bufBlock))
  		{
  			/*
! 			 * During WAL recovery, the first access to any data page should
! 			 * overwrite the whole page from the WAL; so a clobbered page
! 			 * header is not reason to fail.  Hence, when InRecovery we may
! 			 * always act as though zero_damaged_pages is ON.
  			 */
! 			if (zero_damaged_pages || InRecovery)
  			{
  ereport(WARNING,
  		(errcode(ERRCODE_DATA_CORRUPTED),
--- 282,304 
  	}
  	else
  	{
+ 		/* 
+ 		 * Read in the page, unless the caller intends to overwrite it
+ 		 * and just wants us to allocate a buffer.
+ 		 */
+ 		if (!alloc_only)
+ 		{
  		smgrread(reln->rd_smgr, blockNum, (char *) bufBlock);
  		/* check for garbage data */
  		if (!PageHeaderIsValid((PageHeader) bufBlock))
  		{
  			/*
! 			 * We used to ignore corrupt pages in WAL recovery, but
! 			 * that was only ever safe if full_page_writes was enabled.
! 			 * Now the caller sets alloc_only to true if he intends to 
! 			 * overwrite the whole page, which is already checked above.
  			 */
! 			if (zero_damaged_pages)
  			{
  ereport(WARNING,
  		(errcode(ERRCODE_DATA_CORRUPTED),
***
*** 277,282 
--- 312,318 
   errmsg(invalid page header in block %u of relation \%s\,
  		blockNum, RelationGetRelationName(reln;
  		}
+ 		}
  	}
  
  	if (isLocalBuf)
Index: src/include/storage/bufmgr.h
===
RCS file: 

Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Andrew Dunstan

Dave Page wrote:

Michael Meskes wrote:
  

On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote:


I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might
be causing this?
  
Hmm, first glance suggests some permission problems. 



Yes, that was my thought as well, however I ran cacls down the entire
build tree, granting full control to Everyone on everything, but the
problem persists.
  


Please don't do that on your buildfarm repo copy (if that's what you 
did). You should not touch *anything* inside it. If you need to do this, 
make a copy (see later) and alter that.


If you did do this to the buildfarm repo copy, please blow it away so 
that buildfarm will get a fresh clean copy next time it runs.



Additionally, all the other tests pass, including make check and make
installcheck, so unless the ECPG tests are trying to create something
outside of the buildtree, I don't think it is a permissions issue.

Is there anything I can try building/running manually to help debug this?
  



To recreate the failing buildfarm environment (including making an 
alterable repo copy), run the following (with suitable local mods):


run_build.pl --test --keepall

At the end of that you should have the build tree that was actually 
used, and the install tree too if it got that far. Then you can start 
playing manually.


Remember to move or remove the kept trees before your next buildfarm run.

cheers

andrew





Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Dave Page
Andrew Dunstan wrote:
 Please don't do that on your buildfarm repo copy (if that's what you
 did). You should not touch *anything* inside it. If you need to do this,
 make a copy (see later) and alter that.
 
 If you did do this to the buildfarm repo copy, please blow it away so
 that buildfarm will get a fresh clean copy next time it runs.

No, I reset the permissions on /msys. That's entirely necessary on Vista
where the permissions are so locked down by default that you can't even
create it without some measure of pain.

Regards, Dave.



Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Andrew Dunstan

Dave Page wrote:

Andrew Dunstan wrote:
  

Please don't do that on your buildfarm repo copy (if that's what you
did). You should not touch *anything* inside it. If you need to do this,
make a copy (see later) and alter that.

If you did do this to the buildfarm repo copy, please blow it away so
that buildfarm will get a fresh clean copy next time it runs.



No, I reset the permissions on /msys. That's entirely necessary on Vista
where the permissions are so locked down by default that you can't even
create it without some measure of pain.

  


Yes, my Vista experience so far has been very unpleasant indeed. It 
strikes me as the OS equivalent of jumping the shark. I haven't even 
thought about Msys ...


cheers

andrew



Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-25 Thread Simon Riggs
On Wed, 2007-04-25 at 13:48 +0100, Heikki Linnakangas wrote:

 I was surprised how big a difference it makes, but when you think about 
 it it's logical. Without the patch, it's doing roughly the same I/O as 
 the test itself, reading in pages, modifying them, and writing them 
 back. With the patch, all the reads are done sequentially from the WAL, 
 and then written back in a batch at the end of the WAL replay which is a 
 lot more efficient.

Interesting patch.

It would be good to see a longer term test. I'd expect things to fall
back to about 50% of the time on a longer recovery where the writes need
to be written out of cache.

I was thinking of getting the bgwriter to help out to improve matters
there.


Patch-wise, I love the name ZapBuffer() but would think that
OverwriteBuffer() would be more descriptive.

As regards the zero_damaged_pages question, I raised that some time ago
but we didn't arrive at an explicit answer. All I would say is we can't
allow invalid pages in the buffer manager at any time, whatever options
we have requested, otherwise other code will fail almost immediately.
I'm not sure there's another option?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] temporal variants of generate_series()

2007-04-25 Thread Neil Conway
On Thu, 2007-04-12 at 14:56 -0700, Andrew Hammond wrote:
 I've written the following function definitions to extend
 generate_series to support some temporal types (timestamptz, date and
 time). Please include them if there's sufficient perceived need or
 value.

I could see these being useful, but a PL/PgSQL implementation is not
eligible for inclusion in the core backend (since PL/PgSQL is not
enabled by default).

-Neil





Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Mark Wong

On 4/25/07, Michael Meskes [EMAIL PROTECTED] wrote:

I also saw that wombat is segfaulting in ecpg tests but not only with
CVS HEAD but also trying to test 8.2. Any idea what's going on with this
machine?


I generated a stack trace for REL8_2_STABLE, but I'm not sure how
helpful it is.  Let me know what other information I can provide...
Looks like I don't have symbols for libc.

Core was generated by `sql/update '.
Program terminated with signal 11, Segmentation fault.
#0  ECPGget_variable (ap=<value optimized out>, type=<value optimized out>,
    var=0x10028730, indicator=1 '\001') at execute.c:131
131             var->ind_value = *((char **) (var->ind_pointer));
(gdb) bt
#0  ECPGget_variable (ap=<value optimized out>, type=<value optimized out>,
    var=0x10028730, indicator=1 '\001') at execute.c:131
#1  0x04048948 in ECPGdo (lineno=28, compat=<value optimized out>,
    force_indicator=<value optimized out>,
    connection_name=<value optimized out>, query=<value optimized out>)
    at execute.c:195
#2  0x1d20 in main (argc=<value optimized out>,
    argv=<value optimized out>) at update.pgc:28
#3  0x0463ce4c in .generic_start_main () from /lib/libc.so.6
#4  0x0463d0f8 in .__libc_start_main () from /lib/libc.so.6
#5  0x in ?? ()


Regards,
Mark



Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Andrew Dunstan

Mark Wong wrote:

On 4/25/07, Michael Meskes [EMAIL PROTECTED] wrote:

I also saw that wombat is segfaulting in ecpg tests but not only with
CVS HEAD but also trying to test 8.2. Any idea what's going on with this
machine?


I generated a stack trace for REL8_2_STABLE, but I'm not sure how
helpful it is.  Let me know what other information I can provide...
Looks like I don't have symbols for libc.

Core was generated by `sql/update '.
Program terminated with signal 11, Segmentation fault.
#0  ECPGget_variable (ap=<value optimized out>, type=<value optimized out>,
    var=0x10028730, indicator=1 '\001') at execute.c:131
131             var->ind_value = *((char **) (var->ind_pointer));
(gdb) bt
#0  ECPGget_variable (ap=<value optimized out>, type=<value optimized out>,
    var=0x10028730, indicator=1 '\001') at execute.c:131
#1  0x04048948 in ECPGdo (lineno=28, compat=<value optimized out>,
    force_indicator=<value optimized out>,
    connection_name=<value optimized out>, query=<value optimized out>)
    at execute.c:195
#2  0x1d20 in main (argc=<value optimized out>,
    argv=<value optimized out>) at update.pgc:28
#3  0x0463ce4c in .generic_start_main () from /lib/libc.so.6
#4  0x0463d0f8 in .__libc_start_main () from /lib/libc.so.6
#5  0x in ?? ()





I think you'll need to compile with optimisation turned off and then try 
running the test under debugger control, putting a breakpoint in 
ECPGget_variable() and then stepping through it. I wonder what value of 
var->ind_pointer it is getting?


cheers

andrew




Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-25 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 As regards the zero_damaged_pages question, I raised that some time ago
 but we didn't arrive at an explicit answer. All I would say is we can't
 allow invalid pages in the buffer manager at any time, whatever options
 we have requested, otherwise other code will fail almost immediately.

Yeah --- the proposed new bufmgr routine should probably explicitly zero
the content of the buffer.  It doesn't really matter in the context of
WAL recovery, since there can't be any concurrent access to the buffer,
but it'd make it safe to use in non-WAL contexts (I think there are
other places where we know we are going to init the page and so a
physical read is a waste of time).  Also, this would let the patch be

+   if (alloc_only)
+   MemSet...
+   else
smgrread...

and you don't need to hack out the PageHeaderIsValid test, since it will
allow zeroed pages.

Possibly ReadZeroedBuffer would be a better name?

regards, tom lane



Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 I think you'll need to compile with optimisation turned off and then try 
 running the test under debugger control, putting a breakpoint in 
 ECPGget_variable() and then stepping through it. I wonder what value of 
 var->ind_pointer it is getting?

You could probably inspect the contents of *var in that dump without
having to recompile.  Given that this is PPC64, I'm betting on a pointer
size or alignment problem ...

regards, tom lane



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-25 Thread Josh Berkus
Andreas,

 Writing to a different area was considered in pg, but there were more
 negative issues than positive.
 So imho pg_compresslog is the correct path forward. The current
 discussion is only about whether we want a more complex pg_compresslog
 and no change to current WAL, or an increased WAL size for a less
 complex implementation.
 Both would be able to compress the WAL to the same archive log size.

Huh?  As conceived, pg_compresslog does nothing to lower log volume for 
general purposes, just on-disk storage size for archiving.  It doesn't help 
us at all with the tremendous amount of log we put out for an OLTP server, 
for example.

Not that pg_compresslog isn't useful on its own for improving warm standby 
manageability, but it's completely separate from addressing the "we're logging 
too much" issue.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Mark Wong

On 4/25/07, Tom Lane [EMAIL PROTECTED] wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
 I think you'll need to compile with optimisation turned off and then try
 running the test under debugger control, putting a breakpoint in
 ECPGget_variable() and then stepping through it. I wonder what value of
 var->ind_pointer it is getting?

You could probably inspect the contents of *var in that dump without
having to recompile.  Given that this is PPC64, I'm betting on a pointer
size or alignment problem ...


Does this help?

(gdb) p var->type
$1 = 4267828624
(gdb) p var->value
$2 = (void *) 0x1
(gdb) p var->pointer
$3 = (void *) 0x1
(gdb) p var->varcharsize
$4 = 3
(gdb) p var->arrsize
$5 = 4
(gdb) p var->offset
$6 = 29
(gdb) p var->ind_type
$7 = 0
(gdb) p var->ind_pointer
$8 = (void *) 0x0
(gdb) p var->ind_varcharsize
$9 = 0
(gdb) p var->ind_arrsize
$10 = 0
(gdb) p var->ind_offset
$11 = 5
(gdb) p var->next
$12 = (struct variable *) 0x0

Regards,
Mark



Re: [HACKERS] [BUGS] BUG #3245: PANIC: failed to re-find shared lock object

2007-04-25 Thread Tom Lane
I wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 We could have two kinds of seq scans, with and without support for 
 concurrent inserts.

 Yeah, I considered that too, but it just seems too error-prone.  We
 could maybe make it trustworthy by having hash_seq_search complain if
 it noticed there had been any concurrent insertions --- but then you're
 putting new overhead into hash_seq_search, which kind of defeats the
 argument for it (and hash_seq_search is a bit of a bottleneck, so extra
 cycles there matter).

I just finished looking through the uses of hash_seq_search, and
realized that there is one place where it would be a bit painful to
convert to the insertion-safe approach I'm proposing; namely nodeAgg.c.
The places where the hashtable iteration is started and used are
scattered, and we don't really track whether the iteration is done or
not, so it's hard to be sure where to cancel the iteration.  It could
probably be made to work but it seems like it'd be fragile.

I still don't want to introduce more checking overhead into
hash_seq_search, though, so what I'm now thinking about is a new
dynahash primitive named something like hash_freeze, which'd mark a
hashtable as disallowing insertions.  If the hashtable is frozen before
hash_seq_init then we don't add it to the central list of scans, and
therefore there is no cleanup to do at the end.  nodeAgg can use this
mode since it doesn't modify its hashtable anymore after beginning its
readout scan.

BTW, we didn't really get into details, but for the insertion-safe case
I'm envisioning adding a routine hash_seq_term, which you would need
to call if and only if you abandon a hash_seq_search scan without
running it to completion (if you do the complete scan, hash_seq_search
will automatically call hash_seq_term before returning NULL).  All but
a very small number of places run their searches to completion and
therefore won't require any source code changes with this API.
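
To illustrate, intended usage would look roughly like this (just a sketch of
the proposed API; the entry type and the emit()/found_what_we_wanted()
helpers are made up for the example):

	HASH_SEQ_STATUS status;
	MyEntry    *entry;

	/* nodeAgg-style caller: no more insertions after this point, so the
	 * scan is never registered and needs no explicit termination */
	hash_freeze(hashtable);
	hash_seq_init(&status, hashtable);
	while ((entry = (MyEntry *) hash_seq_search(&status)) != NULL)
		emit(entry);

	/* a caller that may abandon the scan early must clean up explicitly */
	hash_seq_init(&status, hashtable);
	while ((entry = (MyEntry *) hash_seq_search(&status)) != NULL)
	{
		if (found_what_we_wanted(entry))
		{
			hash_seq_term(&status);
			break;
		}
	}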

Thoughts?

regards, tom lane



Re: [HACKERS] strange buildfarm failures

2007-04-25 Thread Stefan Kaltenbrunner
Stefan Kaltenbrunner wrote:
 two of my buildfarm members had different but pretty weird looking
 failures lately:
 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03
 
 and
 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02
 
 
 any ideas on what might causing those ?

lionfish just failed too:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09


Stefan



Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Mark Wong

On 4/25/07, Tom Lane [EMAIL PROTECTED] wrote:

Mark Wong [EMAIL PROTECTED] writes:
 Does this help?

 (gdb) p var->ind_pointer
 $8 = (void *) 0x0

Well, that seems to be the reason why it's failing to indirect through
ind_pointer ... but why is it only failing on your machine and not
everyone else's?  I think this indicates something unportable about
ecpg's usage of va_list.

Hmm, and I don't have to look far to find a smoking gun:

#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || 
defined(__x86_64__))
if (create_statement(lineno, compat, force_indicator, con, &stmt, 
query, args) == false)
#else
if (create_statement(lineno, compat, force_indicator, con, &stmt, query, 
&args) == false)
#endif

Why in the world is that like that?  We don't have such a kluge
anyplace else we use va_list.  stringinfo.c for instance has
never needed any such thing.

Mark, does your gcc define __powerpc__, or only __powerpc64__?


$ touch foo.c; gcc -E -dM foo.c | grep __p ; rm foo.c
#define __powerpc64__ 1
#define __powerpc__ 1

Regards,
Mark



Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Andrew Dunstan

Tom Lane wrote:


Hmm, and I don't have to look far to find a smoking gun:

#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || 
defined(__x86_64__))
if (create_statement(lineno, compat, force_indicator, con, &stmt, 
query, args) == false)
#else
if (create_statement(lineno, compat, force_indicator, con, &stmt, query, 
&args) == false)
#endif


  

I also see:

#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || 
defined(__x86_64__))

#define APREF ap
#else
#define APREF *ap
#endif


But I also see that my amd64/FC6 machine does pass these tests with gcc.

It would certainly be nice if we could simplify all this. And if not, we 
should have a note about why it's needed.


cheers

andrew




[HACKERS] My upcoming travel

2007-04-25 Thread Bruce Momjian
I am leaving Friday for a two week trip to Sydney and Melbourne,
Australia, Mumbai and Pune, India, and London, England.

I will be attending Open Cebit in Sydney, and a Sydney PostgreSQL Users
Group meeting.  London is planning to put together a user group meeting.
I expect both events to appear on the main PostgreSQL web page once
they are finalized.  If Melbourne, Mumbai, or Pune have user groups, I
would be glad to visit them too.

I will try to read email during the trip, but obviously not as
frequently.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
 



Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-25 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 But I also see that my amd64/FC6 machine does pass these tests with gcc.

Yeah, but the typedef represented by va_list can and probably does vary
between amd64 and ppc64.  I haven't an easy way to check, but I wonder
whether it's not an array type on ppc.

I'm of the opinion that ecpg is trying to do something here that's not
portable.  The C99 draft I have says

   [#3] The type declared is

   va_list

   which is an object type  suitable  for  holding  information
   needed  by the macros va_start, va_arg, va_end, and va_copy.
   If access to the varying arguments is  desired,  the  called
   function  shall declare an object (referred to as ap in this
   subclause) having type va_list.  The object ap may be passed
   as an argument to another function; if that function invokes
   the va_arg macro with parameter ap, the value of ap  in  the
   calling function is indeterminate and shall be passed to the
   va_end macro prior to any further reference to ap. (198)

   

  198 It is permitted to create a pointer to a va_list and pass
  that  pointer  to  another  function,  in  which case the
  original function may make further use  of  the  original
  list after the other function returns.

The footnote seems to say that what the code is doing is OK ... but
there isn't any such footnote in the Single Unix Spec:
http://www.opengroup.org/onlinepubs/007908799/xsh/stdarg.h.html
which makes me wonder just how portable it really is.

My recommendation is to get rid of the APREF hack, deal only in
va_list not &va_list, and inline ECPGget_variable into the two
places it's used to avoid the question of passing va_lists around
after they've been modified.  The routine's not that big (especially
seeing that only half of it is actually shared by the two callers)
and it's just not worth the notational effort, let alone any portability
risks, to factor it out.
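
For reference, the pattern the standard does guarantee to work looks like
this (stand-alone illustration only, nothing to do with the actual ecpg
code): hand the callee a va_copy'd list, and never touch a list again after
a callee has consumed from it, other than to va_end it.

#include <stdarg.h>
#include <stdio.h>

static int
first_arg(va_list ap)			/* consumes one argument from its own copy */
{
	return va_arg(ap, int);
}

static void
demo(int n,...)
{
	va_list		ap;
	va_list		apcopy;

	va_start(ap, n);
	va_copy(apcopy, ap);		/* callee gets the copy ... */
	printf("first = %d\n", first_arg(apcopy));
	va_end(apcopy);				/* ... which we may only va_end afterwards */

	while (n-- > 0)				/* the original list is still usable here */
		printf("next = %d\n", va_arg(ap, int));
	va_end(ap);
}

int
main(void)
{
	demo(3, 10, 20, 30);
	return 0;
}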

regards, tom lane



Re: [HACKERS] Fragmentation project

2007-04-25 Thread Gustavo Tonini

Josh,

On 4/23/07, Josh Berkus [EMAIL PROTECTED] wrote:

Gustavo,

  Oh, you're talking about distributing partitions across different nodes
  and parallelizing queries. No, we don't do that today.

 Yes. This is the goal. Well, I will try it. I'll send the project
 reports to this list. Comments will be valuable. Wish me good
 luck...

You might join/look at the PgPoolII project, which is working on parallel
query amoung other things.



pgpool is an interesting approach to this, but I think that functionality
such as inserting a record at one backend that is redirected to another,
and verifying deadlocks, puts demands on the network when acquiring locks
on the referenced records/tables on several hosts. So, IMO, this may have
to be implemented inside the DBMS. As Marko wrote, this is a non-trivial
solution...
Really, pgpool could be improved to act as a process coordinator, but that
will need some changes in the backend too.
This is a non-trivial implementation, but there are several users waiting
for an effective solution for distributing data across a cluster. These
users currently buy commercial solutions or build one themselves.

Gustavo.



Re: [HACKERS] Fragmentation project

2007-04-25 Thread Gustavo Tonini

Marko,

On 4/24/07, Marko Kreen [EMAIL PROTECTED] wrote:

On 4/23/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 Oh, you're talking about distributing partitions across different nodes
 and parallelizing queries. No, we don't do that today.

PL/Proxy actually works like that, only in smaller scope -
for function calls only.



I think that the proposed functionalities cannot be implemented at the PL level...

Gustavo.


General solution that partitions free-form SQL
will be non-trivial...





[HACKERS] Re: [BUGS] BUG #3245: PANIC: failed to re-find shared lock object

2007-04-25 Thread Heikki Linnakangas

Tom Lane wrote:

I wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
We could have two kinds of seq scans, with and without support for 
concurrent inserts.



Yeah, I considered that too, but it just seems too error-prone.  We
could maybe make it trustworthy by having hash_seq_search complain if
it noticed there had been any concurrent insertions --- but then you're
putting new overhead into hash_seq_search, which kind of defeats the
argument for it (and hash_seq_search is a bit of a bottleneck, so extra
cycles there matter).


I just finished looking through the uses of hash_seq_search, and
realized that there is one place where it would be a bit painful to
convert to the insertion-safe approach I'm proposing; namely nodeAgg.c.
The places where the hashtable iteration is started and used are
scattered, and we don't really track whether the iteration is done or
not, so it's hard to be sure where to cancel the iteration.  It could
probably be made to work but it seems like it'd be fragile.

I still don't want to introduce more checking overhead into
hash_seq_search, though, so what I'm now thinking about is a new
dynahash primitive named something like hash_freeze, which'd mark a
hashtable as disallowing insertions.  If the hashtable is frozen before
hash_seq_init then we don't add it to the central list of scans, and
therefore there is no cleanup to do at the end.  nodeAgg can use this
mode since it doesn't modify its hashtable anymore after beginning its
readout scan.


This plan includes having the list of hash tables that mustn't be 
expanded? And the list would be cleaned up at the end of transaction, to 
avoid leaks.



BTW, we didn't really get into details, but for the insertion-safe case
I'm envisioning adding a routine hash_seq_term, which you would need
to call if and only if you abandon a hash_seq_search scan without
running it to completion (if you do the complete scan, hash_seq_search
will automatically call hash_seq_term before returning NULL).  All but
a very small number of places run their searches to completion and
therefore won't require any source code changes with this API.


Sounds good to me.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-25 Thread Heikki Linnakangas

Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

As regards the zero_damaged_pages question, I raised that some time ago
but we didn't arrive at an explicit answer. All I would say is we can't
allow invalid pages in the buffer manager at any time, whatever options
we have requested, otherwise other code will fail almost immediately.


Yeah --- the proposed new bufmgr routine should probably explicitly zero
the content of the buffer.  It doesn't really matter in the context of
WAL recovery, since there can't be any concurrent access to the buffer,
but it'd make it safe to use in non-WAL contexts (I think there are
other places where we know we are going to init the page and so a
physical read is a waste of time).


Is there? I can't think of any. Extending a relation doesn't count.

Zeroing the buffer explicitly might be a good idea anyway. I agree it 
feels a bit dangerous to have a moment when there's a garbled page in 
buffer cache, even if only in recovery.



Also, this would let the patch be

+   if (alloc_only)
+   MemSet...
+   else
smgrread...

and you don't need to hack out the PageHeaderIsValid test, since it will
allow zeroed pages.


Well, I'd still put the PageHeaderIsValid test in the else-branch. Not 
like the few cycles spent on the test matter, but I don't see a reason 
not to.
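
Roughly what I have in mind, as a sketch only -- alloc_only is the flag proposed in the patch, and the variable names (bufBlock, reln, blockNum) are just the usual bufmgr.c ones from memory, so treat them as illustrative:

    if (alloc_only)
    {
        /* caller promises to initialize the page, so just hand back zeros */
        MemSet((char *) bufBlock, 0, BLCKSZ);
    }
    else
    {
        smgrread(reln->rd_smgr, blockNum, (char *) bufBlock);

        /* only pages actually read from disk need the header check */
        if (!PageHeaderIsValid((PageHeader) bufBlock))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("invalid page header in block %u of relation \"%s\"",
                            blockNum, RelationGetRelationName(reln))));
    }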



Possibly ReadZeroedBuffer would be a better name?


Yeah, sounds good. Read is a bit misleading, since it doesn't actually 
read in the buffer, but it's good that the name is closely related to 
ReadBuffer.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [BUGS] BUG #3245: PANIC: failed to re-find shared lock object

2007-04-25 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I still don't want to introduce more checking overhead into
 hash_seq_search, though, so what I'm now thinking about is a new
 dynahash primitive named something like hash_freeze, which'd mark a
 hashtable as disallowing insertions.  If the hashtable is frozen before
 hash_seq_init then we don't add it to the central list of scans, and
 therefore there is no cleanup to do at the end.  nodeAgg can use this
 mode since it doesn't modify its hashtable anymore after beginning its
 readout scan.

 This plan includes having the list of hash tables that mustn't be 
 expanded? And the list would be cleaned up at the end of transaction, to 
 avoid leaks.

Right, all that's still the same.  This is just a way to exempt certain
scans from that machinery, by allowing the caller to declare he doesn't
need to modify the hashtable anymore.  AFAICS that covers our needs,
at least for the present.
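
In caller terms the two modes would look roughly like this (sketch only: hash_freeze and hash_seq_term are the proposed names, HTAB and HASH_SEQ_STATUS come from utils/hsearch.h, and MyEntry plus both function names are made up for illustration):

    typedef struct MyEntry
    {
        int     key;
        int     count;
    } MyEntry;

    /* Insertion-safe case: abandoning the scan early needs hash_seq_term() */
    static void
    scan_until_found(HTAB *htab, int target)
    {
        HASH_SEQ_STATUS status;
        MyEntry    *entry;

        hash_seq_init(&status, htab);
        while ((entry = (MyEntry *) hash_seq_search(&status)) != NULL)
        {
            if (entry->key == target)
            {
                hash_seq_term(&status);     /* scan abandoned: clean up */
                break;
            }
        }
        /* a scan that runs to completion cleans up by itself */
    }

    /* Frozen case (nodeAgg style): no insertions after this point, so the
     * scan is never registered and needs no cleanup at all */
    static void
    readout_all(HTAB *htab)
    {
        HASH_SEQ_STATUS status;
        MyEntry    *entry;

        hash_freeze(htab);
        hash_seq_init(&status, htab);
        while ((entry = (MyEntry *) hash_seq_search(&status)) != NULL)
        {
            /* emit entry->key, entry->count, etc. */
        }
    }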

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-25 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 but it'd make it safe to use in non-WAL contexts (I think there are
 other places where we know we are going to init the page and so a
 physical read is a waste of time).

 Is there? I can't think of any. Extending a relation doesn't count.

No, but re-using a free page in an index does.  I'm not sure which index
AMs know for sure the page is free, and which have to read it and check,
but I think there's at least some scope for that.
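
For instance, an AM that knows a page is free could do something like this (sketch only, using the ReadZeroedBuffer name discussed upthread; rel, blkno and specialSize are illustrative, the rest are the ordinary buffer/page calls):

    Buffer      buf;
    Page        page;
    Size        specialSize = 0;    /* AM-specific; 0 just for illustration */

    /* no physical read: the buffer comes back all-zeros */
    buf = ReadZeroedBuffer(rel, blkno);
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    page = BufferGetPage(buf);
    PageInit(page, BufferGetPageSize(buf), specialSize);

    /* ... fill in the page, mark the buffer dirty, write WAL as appropriate ... */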

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-25 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Andreas,
 So imho pg_compresslog is the correct path forward. The current
 discussion is only about whether we want a more complex pg_compresslog
 and no change to current WAL, or an increased WAL size for a less
 complex implementation.
 Both would be able to compress the WAL to the same archive log size.

 Huh?  As conceived, pg_compresslog does nothing to lower log volume for 
 general purposes, just on-disk storage size for archiving.  It doesn't help 
 us at all with the tremendous amount of log we put out for an OLTP server, 
 for example.

I don't see how what you said refutes what he said.  The sticking point
here is that the patch as-proposed *increases* the log volume before
compression.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] strange buildfarm failures

2007-04-25 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Stefan Kaltenbrunner wrote:
 two of my buildfarm members had different but pretty weird looking
 failures lately:
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03
 and
 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02
 
 any ideas on what might be causing those?

 lionfish just failed too:

 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09

And had a similar failure a few days ago.  The curious thing is that
what we get in the postmaster log is

LOG:  server process (PID 23405) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

You would think SIGABRT would come from an assertion failure, but
there's no preceding assertion message in the log.  The other
characteristic of these crashes is that *all* of the failing regression
instances report terminating connection because of crash of another
server process, which suggests strongly that the crash was in an
autovacuum process (if it were bgwriter or stats collector the
postmaster would've said so).  So I think the recent autovac patches
are at fault.  I spent a bit of time trolling for a spot where the code
might abort() without having printed anything, but didn't find one.

If any of the buildfarm owners can get a stack trace from the core dump
of one of these events, it'd be mighty helpful.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] database size estimates

2007-04-25 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 Francois Deliege wrote:
 Hi,

 I am trying to estimate the size of a table composed of 51754000 rows.
 Each row has 31 attributes: 16 x bit(24) and 15 x bit(8)

 So, the payload should be:
 51754000 x ( 16 x 24 + 15 x 8 ) bits = 3115 MB

 What data types are those exactly? If those 24-bit fields are in fact text,
 varchar, char(x) or other data types that are stored as variable length 
 fields,

And sadly that includes bit() if you're being literal.

As of Postgres 8.1 you can see how much space a column is taking up using the
pg_column_size() function. This won't include alignment padding but will
include the length header for that column.

You can see how much a given row is taking up by passing the entire row to
pg_column_size with something like pg_column_size(tab.*)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fragmentation project

2007-04-25 Thread Josh Berkus

Gustavo,


pgpool is an interesting approach to this, but consider the functionality of
inserting a record at one backend that is then redirected to another, and of
detecting deadlocks when locks must be acquired over the network on the
referenced records/tables across several hosts. So, IMO, this should be
implemented inside the DBMS. As Marko wrote, this is a non-trivial solution...
pgpool could be extended to act as a process coordinator, but that would need
some changes in the backend too.
This is a non-trivial implementation, but there are several users waiting for
an effective solution for distributing data across a cluster.
Today these users either buy commercial solutions or build their own.


Yeah, I was just thinking that if you start from scratch you're not 
likely to get very far ... you might look at some of the existing 
partial solutions (pgPoolII, PostgreSQLForest, ExtenDB, etc.) and see if 
you can improve them.


--Josh

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-25 Thread Koichi Suzuki

Hi,

Zeugswetter Andreas ADI SD wrote:
I don't insist on the name and the default of the GUC parameter.
 I'm afraid wal_fullpage_optimization = on (default) causes
some confusion, because the default behavior of WAL itself
becomes a bit different.


Seems my wal_fullpage_optimization is not a good name if it caused
misinterpretation already :-(

Amount of WAL after 60min. run of DBT-2 benchmark 
wal_add_optimization_info = off (default) 3.13GB

how about wal_fullpage_optimization = on (default)


The meaning of wal_fullpage_optimization = on (default)
would be the same as your wal_add_optimization_info = off (default).
(Reversed name, reversed meaning of the boolean value)

It would be there to *turn off* the (default) WAL full_page
optimization.
For your pg_compresslog it would need to be set to off. 
add_optimization_info sounded like added info about/for some optimization,
which it is not. We turn off an optimization with the flag for the benefit
of an easier pg_compresslog implementation.


For pg_compresslog to remove full page writes, we need 
wal_add_optimization_info=on.




As already said, I would decouple this setting from the part that sets
the removable full-page flag in WAL and makes recovery able to skip
dummy records. That part I would do unconditionally.

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq




--
-
Koichi Suzuki

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] autovacuum does not start in HEAD

2007-04-25 Thread ITAGAKI Takahiro
I wrote:
 I found that autovacuum launcher does not launch any workers in HEAD.

The attached autovacuum-fix.patch should fix the problem. I changed
the comparison that decides the next autovacuum target to use
'greater or equal' instead of 'greater'.

The point is timer resolution; on some platforms the timer only has a
resolution of milliseconds. We initialize adl_next_worker with
current_time in rebuild_database_list(), but we can read exactly the same
value again in do_start_worker(), because there is no measurable difference
on those low-resolution platforms.
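
Roughly, the problem and the fix look like this (a hypothetical simplification, not the exact code in autovacuum.c; avdb and launch_worker_for are made-up names):

    /*
     * With a millisecond timer, adl_next_worker can be exactly equal to
     * current_time, so the old strict comparison never fired.
     */
    TimestampTz current_time = GetCurrentTimestamp();

    if (current_time >= avdb->adl_next_worker)      /* was: '>' */
    {
        /* this database is due for autovacuum */
        launch_worker_for(avdb);
    }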


Another attached patch, autovacuum-debug.patch, is just for printf-debugging.
I got the following logs without the fix -- autovacuum never runs.

# SELECT oid, datname FROM pg_database ORDER BY oid;
  oid  |  datname  
---+---
 1 | template1
 11494 | template0
 11495 | postgres
 16384 | bench
(4 rows)

# pgbench bench -s1 -c1 -t10
[with configurations of autovacuum_naptime = 10s and log_min_messages = debug1]

LOG:  do_start_worker skip : 230863399.25, 230863399.25, 230863409.25
LOG:  rebuild_database_list: db=11495, time=230863404.25
LOG:  rebuild_database_list: db=16384, time=230863409.25
DEBUG:  autovacuum: processing database bench
LOG:  do_start_worker skip : 230863404.25, 230863404.25, 230863414.25
LOG:  do_start_worker skip : 230863404.25, 230863409.25, 230863414.25
LOG:  rebuild_database_list: db=11495, time=230863409.25
LOG:  rebuild_database_list: db=16384, time=230863414.25
LOG:  do_start_worker skip : 230863409.25, 230863409.25, 230863419.25
LOG:  do_start_worker skip : 230863409.25, 230863414.25, 230863419.25
LOG:  rebuild_database_list: db=11495, time=230863414.25
LOG:  rebuild_database_list: db=16384, time=230863419.25
...
(no autovacuum activities forever)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



autovacuum-debug.patch
Description: Binary data


autovacuum-fix.patch
Description: Binary data

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] strange buildfarm failures

2007-04-25 Thread Alvaro Herrera
Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
  Stefan Kaltenbrunner wrote:
  two of my buildfarm members had different but pretty weird looking
  failures lately:
  http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03
  and
  
  http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02
  
  any ideas on what might be causing those?

Just for the record, quagga and emu failures don't seem related to the
report below.  They don't crash; the regression.diffs contains data that
suggests that there may be data corruption of some sort.

INSERT INTO INET_TBL (c, i) VALUES ('192.168.1.2/30', '192.168.1.226');
ERROR:  invalid cidr value: %{

This doesn't seem to make much sense.


  lionfish just failed too:
 
  http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09
 
 And had a similar failure a few days ago.  The curious thing is that
 what we get in the postmaster log is
 
 LOG:  server process (PID 23405) was terminated by signal 6: Aborted
 LOG:  terminating any other active server processes
 
 You would think SIGABRT would come from an assertion failure, but
 there's no preceding assertion message in the log.  The other
 characteristic of these crashes is that *all* of the failing regression
 instances report terminating connection because of crash of another
 server process, which suggests strongly that the crash was in an
 autovacuum process (if it were bgwriter or stats collector the
 postmaster would've said so).  So I think the recent autovac patches
 are at fault.  I spent a bit of time trolling for a spot where the code
 might abort() without having printed anything, but didn't find one.

Hmm.  I kept an eye on the buildfarm for a few days, but saw nothing
that could be connected to autovacuum so I neglected it.

This is the other failure:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-20%2005:30:14

It shows the same pattern.  I am baffled -- I don't understand how it
can die without reporting the error.

Apparently it crashes rather frequently, so it shouldn't be too
difficult to reproduce on manual runs.  If we could get it to run with a
higher debug level, it might prove helpful to further pinpoint the
problem.

The core file would be much better obviously (first and foremost to
confirm that it's autovacuum that's crashing ... )

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] strange buildfarm failures

2007-04-25 Thread Stefan Kaltenbrunner
Alvaro Herrera wrote:
 Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Stefan Kaltenbrunner wrote:
 two of my buildfarm members had different but pretty weird looking
 failures lately:
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03
 and

 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02

 any ideas on what might be causing those?
 
 Just for the record, quagga and emu failures don't seem related to the
 report below.  They don't crash; the regression.diffs contains data that
 suggests that there may be data corruption of some sort.
 
 INSERT INTO INET_TBL (c, i) VALUES ('192.168.1.2/30', '192.168.1.226');
 ERROR:  invalid cidr value: %{
 
 This doesn't seem to make much sense.

Yeah, on further reflection it looks like the failures from emu and
quagga are unrelated to the issue lionfish is experiencing.

 
 
 lionfish just failed too:
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09
 And had a similar failure a few days ago.  The curious thing is that
 what we get in the postmaster log is

 LOG:  server process (PID 23405) was terminated by signal 6: Aborted
 LOG:  terminating any other active server processes

 You would think SIGABRT would come from an assertion failure, but
 there's no preceding assertion message in the log.  The other
 characteristic of these crashes is that *all* of the failing regression
 instances report terminating connection because of crash of another
 server process, which suggests strongly that the crash was in an
 autovacuum process (if it were bgwriter or stats collector the
 postmaster would've said so).  So I think the recent autovac patches
 are at fault.  I spent a bit of time trolling for a spot where the code
 might abort() without having printed anything, but didn't find one.
 
 Hmm.  I kept an eye on the buildfarm for a few days, but saw nothing
 that could be connected to autovacuum so I neglected it.
 
 This is the other failure:
 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-20%2005:30:14
 
 It shows the same pattern.  I am baffled -- I don't understand how it
 can die without reporting the error.

I should have mentioned that initially - but I think the failure from
2007-04-20 is not related at all.
The failure from 2007-04-20 was very likely caused by the kernel
running totally out of memory (lionfish is a very resource-starved box
with only 48MB of RAM and 128MB of swap at that time - do we have a recent
patch that increases memory usage quite a lot?).
I immediately added another 128MB of swap after that, and I don't think
the failure from yesterday is the same (at least there are no kernel
logs that indicate a similar issue).
 
 Apparently it crashes rather frequently, so it shouldn't be too
 difficult to reproduce on manual runs.  If we could get it to run with a
 higher debug level, it might prove helpful to further pinpoint the
 problem.

a manual run of the buildfarm script takes ~4.5 hours on lionfish ;-)

 
 The core file would be much better obviously (first and foremost to
 confirm that it's autovacuum that's crashing ... )

I will see what I can come up with ...


Stefan

---(end of broadcast)---
TIP 6: explain analyze is your friend