Re: [HACKERS] Bugtraq: Having Fun With PostgreSQL

2007-06-28 Thread Jeremy Drake
On Tue, 26 Jun 2007, Andrew Dunstan wrote:

 Jeremy Drake wrote:

  2. If you cannot tell what process is connecting on a local socket (which
  I suspect you cannot portably),


 See ident_unix() in hba.c.

 It might not be 100% portable but I think it's fairly close for platforms
 that actually have unix sockets.

It looks to me (looking at docs on the various functions used there) that
only Linux supports getting the PID of the connecting process.  The other
various *BSD methods tend only to give the uid and gid, which will not be
helpful if the connection is coming from another backend in the same
cluster.

In the Linux case, it looks like one would need to get the client pid, try
to get the PGPROC entry for it, and if it exists, get the roleid out of that
and allow connections as that role.

For any other case, some sort of painful protocol hack would be in order.
The best way I can see is to check whether the client process is owned by the
same user as the database cluster, and if so send an auth request (like
the SCM_CRED one), which would be responded to with the pid and a random
token stored in the PGPROC entry.  The server then proves the backend
really is the one it claims to be by looking up the PID's PGPROC entry
and making sure the token matches.
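A toy sketch of that token check (hypothetical types and names throughout; the
real PGPROC layout and auth exchange would differ):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a PGPROC entry (not the real struct). */
typedef struct
{
    int      pid;         /* backend's process id              */
    uint64_t auth_token;  /* random token generated at startup */
} FakeProc;

/*
 * Verify that a connecting client really is the backend it claims to
 * be: look up the claimed pid's entry and compare the token the client
 * presented against the one stored there.
 */
static bool
verify_backend(const FakeProc *procs, int nprocs,
               int claimed_pid, uint64_t presented_token)
{
    for (int i = 0; i < nprocs; i++)
        if (procs[i].pid == claimed_pid)
            return procs[i].auth_token == presented_token;
    return false;               /* no such backend */
}
```

The random token matters because the pid alone is guessable; only the real
backend knows the value stored in its own PGPROC entry.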

This is all just thinking out loud, of course...  I have no plans to
implement this in the short-term, but it may be an interesting project in
the future.




-- 
I like to believe that people in the long run are going to do more to
promote peace than our governments.  Indeed, I think that people want
peace so much that one of these days governments had better get out of
the way and let them have it.
-- Dwight D. Eisenhower

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] self defined data type with limit?

2007-06-28 Thread Michael Enke



Heikki Linnakangas wrote:

Michael Enke wrote:


My primary goal is to get quasi numeric ordering on text column, e.g.
1
2
10
Normal order with varchar would be
1
10
2



You don't need a custom type for that. A custom operator class with 
custom comparison operators is enough.




Ok, I tried with ordinary varchar and my own operators/op class. But then:
1) the index is never used (I created it 'with' my opclass)
2) the 'order by' doesn't care about my operator class; it uses the normal
varchar ordering.

Regards,
Michael



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-28 Thread Greg Smith

On Thu, 28 Jun 2007, ITAGAKI Takahiro wrote:


Do you need to increase shared_buffers in such case?


If you have something going wild creating dirty buffers with a high usage 
count faster than they are being written to disk, increasing the size of 
the shared_buffers cache can just make the problem worse--now you have an 
even bigger pile of dirty mess to shovel at checkpoint time.  The existing 
background writers are particularly unsuited to helping out in this 
situation; I think the new planned implementation will be much better.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] self defined data type with limit?

2007-06-28 Thread Tom Lane
Michael Enke [EMAIL PROTECTED] writes:
 Heikki Linnakangas wrote:
 You don't need a custom type for that. A custom operator class with 
 custom comparison operators is enough.

 Ok, I tried with ordinary varchar and my own operators/op class. But then:
 1) the index is never used (I created it 'with' my opclass)
 2) the 'order by' doesn't care about my operator class; it uses the normal
 varchar ordering.

Yeah, because ORDER BY is still going to look to the default varchar
opclass to determine what the ordering is supposed to be.  Assuming
your custom less-than operator is named <, you'd have to write
ORDER BY col USING <
to get this sort order.

If you want ORDER BY on the column to default to your custom ordering,
the only way is a distinct datatype that you can make your custom
opclass be the default for.
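As an illustration of the comparison semantics being discussed, here is a
standalone C sketch of a digits-aware comparator; the function name and shape
are hypothetical, and a real opclass would wrap logic like this in the btree
support-function boilerplate:

```c
#include <ctype.h>
#include <string.h>

/*
 * Compare two strings so that runs of digits order numerically:
 * "2" sorts before "10".  Non-digit parts fall back to byte
 * comparison.  Returns <0, 0, >0 like strcmp.
 */
static int
natcmp(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (isdigit((unsigned char) *a) && isdigit((unsigned char) *b))
        {
            /* skip leading zeros so "007" compares like "7" */
            while (*a == '0') a++;
            while (*b == '0') b++;

            size_t la = 0, lb = 0;
            while (isdigit((unsigned char) a[la])) la++;
            while (isdigit((unsigned char) b[lb])) lb++;

            if (la != lb)
                return la < lb ? -1 : 1;   /* shorter run = smaller number */

            int c = strncmp(a, b, la);     /* same length: bytewise works  */
            if (c != 0)
                return c;
            a += la;
            b += lb;
        }
        else
        {
            if (*a != *b)
                return (unsigned char) *a < (unsigned char) *b ? -1 : 1;
            a++;
            b++;
        }
    }
    /* shorter string sorts first when one is a prefix of the other */
    return (*a != 0) - (*b != 0);
}
```

With a comparator like this behind the opclass's operators, '1', '2', '10'
sort in that order rather than '1', '10', '2'.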

The domain idea might work, I'm not totally sure.  Defining
functions/operators on a domain is a bit ticklish because anything but
an exact match will get smashed to the domain base type and thus not
match your function.  I think though that in this case you might get
away with it because it would be an exact match --- it's worth a try
anyway.

regards, tom lane



[HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Pavan Deolasee

During one of the HOT stress tests, an assertion failed at tqual.c:1178
in HeapTupleSatisfiesVacuum(). The assertion failure looked really
strange because the assertion checks for HEAP_XMAX_COMMITTED,
which we set just a couple of lines above. I inspected the core dump
and found that the flag is *set* properly. That was even more strange.
I confirmed that we are holding a SHARE lock on the buffer, as we
do at several other places while checking/setting the infomask bits.

We had a theory that somebody clears the flag after the asserting
process sets it and before it checks it. The other process also sets it
back before core dump is generated because core shows the flag
being set properly. The chances of this happening are very slim and
can further be ruled out because I carefully looked at the code and found
that the flag can only be cleared holding an exclusive lock on the buffer.

So we suspected an interaction between multiple processes each holding
a SHARE lock and setting/checking different bits in the infomask, and
we could theoretically say that such an interaction can potentially lead to
missing hint bit updates. I can think of the following:

Process P1 is setting bit 0 and process P2 setting bit 1 of an integer
'x' whose current value is say 0.

  P1                           P2
  load x into register A       load x into register B
  A = A | 0x0001               B = B | 0x0002
  store A to x
                               store B to x

At the end, P1's update is missing! If P1's further processing is based
on the bit-check, it would go completely wrong.

This easily explains the assertion and core dump analysis. We can
possibly remove that assertion and any other similar assertions
(unless someone can find a hole in the above analysis). But I am
more worried about other similar race conditions where hint bit updates
go missing and thus causing severe MVCC failures.

Btw, to validate the race condition I quickly wrote a simple C
program which attaches to shared memory. Each instance of the
process sets/clears and checks a separate bit. It clearly demonstrates
the danger. The code is attached. Compile and run with an integer
argument to tell it which bit to set/reset.
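The attached program aside, the lost update can be replayed deterministically;
this sketch simply executes the interleaving from the table above in a single
thread:

```c
/*
 * Deterministic replay of the interleaving described above: two
 * "processes" each load x into a private register, OR in their own
 * bit, and store the register back.  The second store overwrites the
 * first one's update, so bit 0 is lost.
 */
static unsigned int
race_result(void)
{
    unsigned int x = 0;

    unsigned int regA = x;   /* P1: load x into register A */
    unsigned int regB = x;   /* P2: load x into register B */

    regA |= 0x0001;          /* P1: set bit 0 */
    regB |= 0x0002;          /* P2: set bit 1 */

    x = regA;                /* P1: store A to x */
    x = regB;                /* P2: store B to x -- clobbers P1's update */

    return x;                /* bit 0 is missing from the result */
}
```

If P1's subsequent logic assumes its bit is set (as the failed Assert did),
it goes wrong exactly as described.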


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


test.c
Description: Binary data



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 So we suspected an interaction between multiple processes each holding
 a SHARE lock and setting/checking different bits in the infomask, and
 we could theoretically say that such an interaction can potentially lead to
 missing hint bit updates.

Yeah.  This is in fact something that's been foreseen, but I guess it
didn't occur to anyone that those Asserts would fail.  I concur with
removing them.

regards, tom lane



[HACKERS] write past chunk end in ExprContext / to_char

2007-06-28 Thread Patrick Welche
With today's CVS code (originally noticed with 8.2beta3), on a PC where
INT_MAX=0x7FFF=2147483647

postgres=# select version();
                              version
--------------------------------------------------------------------------------
 PostgreSQL 8.3devel on i386-unknown-netbsdelf4.99.20, compiled by GCC gcc (GCC) 4.1.2 20070110 prerelease (NetBSD nb1 20070603)
(1 row)

postgres=# select to_char(2147483647,'999,999,999');
   to_char
--
  ###,###,###
(1 row)

postgres=# select to_char(2147483648,'999,999,999');
WARNING:  detected write past chunk end in ExprContext 0x845509c
WARNING:  detected write past chunk end in ExprContext 0x845509c
   to_char
--
  ###,###,###
(1 row)

postgres=# select to_char(2147483648,'99,999,999');
   to_char   
-
  ##,###,###
(1 row)

postgres=# select to_char(2147483648,'9,999,999,999');
to_char 

  2,147,483,648
(1 row)

postgres=# select to_char(1234567890123,'999,999,999,999');
WARNING:  detected write past chunk end in ExprContext 0x845509c
WARNING:  detected write past chunk end in ExprContext 0x845509c
 to_char  
--
  ###,###,###,###
(1 row)

postgres=# select to_char(1234567890123,'99,999,999,999');
 to_char 
-
  ##,###,###,###
(1 row)

So strangely, to get the worrying WARNING, I seem to need a value past INT_MAX
with a format string one position shorter than necessary - no wonder I
seemed to only see it randomly...

Thoughts?

Cheers,

Patrick



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Heikki Linnakangas

Pavan Deolasee wrote:

During one of the HOT stress tests, an assertion failed at tqual.c:1178
in HeapTupleSatisfiesVacuum(). The assertion failure looked really
strange because the assertion checks for HEAP_XMAX_COMMITTED,
which we set just a couple of lines above. I inspected the core dump
and found that the flag is *set* properly. That was even more strange.
I confirmed that we are holding a SHARE lock on the buffer, as we
do at several other places while checking/setting the infomask bits.

We had a theory that somebody clears the flag after the asserting
process sets it and before it checks it. The other process also sets it
back before core dump is generated because core shows the flag
being set properly. The chances of this happening are very slim and
can further be ruled out because I carefully looked at the code and found
that the flag can only be cleared holding an exclusive lock on the buffer.

So we suspected an interaction between multiple processes each holding
a SHARE lock and setting/checking different bits in the infomask, and
we could theoretically say that such an interaction can potentially lead to
missing hint bit updates. I can think of the following:


FWIW, this can be reproduced by single-stepping with a debugger:

First, you need a tuple that's committed dead but no hint bits have been 
set:


BEGIN; truncate foo; INSERT INTO foo values (1,'foo'); DELETE FROM Foo; 
commit;


In one backend, set a breakpoint at HeapTupleSatisfiesMVCC line 953, where 
it sets the XMIN_COMMITTED hint bit:


 else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
 {
     tuple->t_infomask |= HEAP_XMIN_COMMITTED;
     SetBufferCommitInfoNeedsSave(buffer);
 }

Issue a SELECT * FROM foo, and step a single instruction that fetches 
the infomask field from memory to a register.


Open another backend, set a breakpoint to HeapTupleSatisfiesVacuum line 
1178:


 else if (TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
 {
     tuple->t_infomask |= HEAP_XMAX_COMMITTED;
     SetBufferCommitInfoNeedsSave(buffer);
 }
 else
 {
     /*
      * Not in Progress, Not Committed, so either Aborted or crashed
      */
     tuple->t_infomask |= HEAP_XMAX_INVALID;
     SetBufferCommitInfoNeedsSave(buffer);
     return HEAPTUPLE_LIVE;
 }
 /* Should only get here if we set XMAX_COMMITTED */
 Assert(tuple->t_infomask & HEAP_XMAX_COMMITTED);
 }

And issue VACUUM foo. It'll stop on that breakpoint.

Let the first backend continue. It will clear the XMAX_COMMITTED field.

Now let the 2nd backend continue and you get an assertion failure.


AFAICS, we can simply remove the assertion. But are there any 
codepaths that assume that after calling HeapTupleSatisfiesSnapshot, all 
appropriate hint bits are set?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] write past chunk end in ExprContext / to_char

2007-06-28 Thread Tom Lane
Patrick Welche [EMAIL PROTECTED] writes:
 With today's CVS code (originally noticed with 8.2beta3), on a PC where
 INT_MAX=0x7FFF=2147483647

 postgres=# select to_char(2147483648,'999,999,999');
 WARNING:  detected write past chunk end in ExprContext 0x845509c
 WARNING:  detected write past chunk end in ExprContext 0x845509c

Yech ... it's scribbling on the output of int8out, which is bad enough,
but it's assuming that buffer will be long enough when it demonstrably
isn't.

Some days I think we ought to throw out formatting.c and rewrite it from
scratch; it's probably the most poorly-coded module in all of Postgres.

regards, tom lane



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 AFAICS, we can simply remove the assertion. But are there any 
 codepaths that assume that after calling HeapTupleSatisfiesSnapshot, all 
 appropriate hint bits are set?

There had better not be, since we are going to postpone setting hint
bits for recently-committed transactions as part of the async-commit
patch.

A quick grep suggests that VACUUM FULL might be at risk here.

regards, tom lane



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Alvaro Herrera
Tom Lane escribió:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  AFAICS, we can simply remove the assertion. But are there any 
  codepaths that assume that after calling HeapTupleSatisfiesSnapshot, all 
  appropriate hint bits are set?
 
 There had better not be, since we are going to postpone setting hint
 bits for recently-committed transactions as part of the async-commit
 patch.
 
 A quick grep suggests that VACUUM FULL might be at risk here.

That particular case seems easily fixed since VACUUM FULL must hold an
exclusive lock; and we can forcibly set sync commit for VACUUM FULL.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane escribió:
 A quick grep suggests that VACUUM FULL might be at risk here.

 That particular case seems easily fixed since VACUUM FULL must hold an
 exclusive lock; and we can forcibly set sync commit for VACUUM FULL.

Uh, that wouldn't help.  The problem is that if VACUUM FULL is *looking
at* a recently-committed tuple, tqual.c might decide it can't set the
hint bit yet because it's not certain the commit record for that other
transaction is flushed.

We could possibly hack things so that inside a VACUUM FULL (maybe plain
vacuum too?), we prefer flushing xlog to leaving hint bits unset.
That's likely to be messy though.

Probably a cleaner and more robust answer is to make VACUUM FULL call
tqual.c again in the places where it currently assumes it can look
directly at the hint bits.

regards, tom lane



Re: [HACKERS] write past chunk end in ExprContext / to_char

2007-06-28 Thread imad

This is the problematic part in formatting.c, function dch_time.

int siz = strlen(tmtcTzn(tmtc));

if (arg == DCH_TZ)
    strcpy(inout, tmtcTzn(tmtc));
else
{
    char   *p = palloc(siz);

    strcpy(p, tmtcTzn(tmtc));
    strcpy(inout, str_tolower(p));
    pfree(p);
}
return siz;


Here, doing the palloc with siz + 1 solves the issue, but to follow the
existing convention, pstrdup should be used instead; it is written
specifically for this purpose.
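For illustration, a pstrdup-style helper in plain C (hypothetical name, using
malloc instead of palloc) shows where the extra byte goes:

```c
#include <stdlib.h>
#include <string.h>

/*
 * A pstrdup equivalent outside the backend: allocate strlen(s) + 1
 * bytes.  The "+ 1" for the terminating '\0' is exactly what the
 * palloc(siz) call in the snippet above forgets, so the strcpy there
 * writes one byte past the end of the allocated chunk.
 */
static char *
dup_cstring(const char *s)
{
    size_t len = strlen(s);
    char  *p = malloc(len + 1);

    if (p != NULL)
        memcpy(p, s, len + 1);   /* copies the terminator too */
    return p;
}
```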

Probably too small a change for a patch ?


--Imad
www.EnterpriseDB.com


On 6/29/07, Tom Lane [EMAIL PROTECTED] wrote:

Patrick Welche [EMAIL PROTECTED] writes:
 With today's CVS code (originally noticed with 8.2beta3), on a PC where
 INT_MAX=0x7FFF=2147483647

 postgres=# select to_char(2147483648,'999,999,999');
 WARNING:  detected write past chunk end in ExprContext 0x845509c
 WARNING:  detected write past chunk end in ExprContext 0x845509c

Yech ... it's scribbling on the output of int8out, which is bad enough,
but it's assuming that buffer will be long enough when it demonstrably
isn't.

Some days I think we ought to throw out formatting.c and rewrite it from
scratch; it's probably the most poorly-coded module in all of Postgres.

regards, tom lane






[HACKERS] lazy vacuum sleeps with exclusive lock on table

2007-06-28 Thread Alvaro Herrera
Hi,

I noticed that lazy vacuum acquires an exclusive lock at the end, to be
able to truncate the table.  This is not a surprise.  If it cannot
acquire the lock, it simply skips truncating the table and goes on with
life.

However, what's problematic is that if a non-zero cost delay has been
set, it will happily take naps while determining what to truncate :-(
This seems a bad idea. It may also explain why some people are seeing
autovacuum blocking other processes, and it readily explains why this
happens even when there are no non-granted locks for autovacuum.

Comments?  I think we should remove the sleep in the truncate phase.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Simon Riggs
On Thu, 2007-06-28 at 15:16 -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  AFAICS, we can simply remove the assertion. But are there any 
  codepaths that assume that after calling HeapTupleSatisfiesSnapshot, all 
  appropriate hint bits are set?
 
 There had better not be, since we are going to postpone setting hint
 bits for recently-committed transactions as part of the async-commit
 patch.
 
 A quick grep suggests that VACUUM FULL might be at risk here.

No, we're clear: I caught that issue specifically for VACUUM FULL fairly
early on. VF assumes all hint bits are set after the first scan, so we
flush prior to the scan to ensure it's safe to set the hint bits. There
are no concurrent hint bit setters, so we are good.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] lazy vacuum sleeps with exclusive lock on table

2007-06-28 Thread Simon Riggs
On Thu, 2007-06-28 at 17:16 -0400, Alvaro Herrera wrote:

 I noticed that lazy vacuum acquires an exclusive lock at the end, to be
 able to truncate the table.  This is not a surprise.  If it cannot
 acquire the lock, it simply skips truncating the table and goes on with
 life.
 
 However, what's problematic is that if a non-zero cost delay has been
 set, it will happily take naps while determining what to truncate :-(
 This seems a bad idea. It may also explain why some people are seeing
 autovacuum blocking other processes, and it readily explains why this
 happens even when there are no non-granted locks for autovacuum.
 
 Comments?  I think we should remove the sleep in the truncate phase.

Do we have any timings for that lock-out? Even with a largish sleep
delay, I can't imagine it's locked out for that long.

Seems like VACUUM shouldn't try just once to get the lock. It could be
very frustrating to wait hours for a VACUUM to finish, only to find a
small query prevents file truncation. That's just too random. It should
retry as many times as there are blocks for it to truncate, i.e. it tries
harder to truncate the more it needs to do so.
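The retry policy suggested here might look roughly like this (a hypothetical
helper, not actual vacuum code; try_lock stands in for a conditional lock
attempt):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the policy described above: make one conditional lock
 * attempt per block that could be truncated, so VACUUM tries harder
 * when there is more to gain.  Returns true once the lock is held.
 */
static bool
lock_with_retries(size_t blocks_to_truncate,
                  bool (*try_lock)(void *), void *lock_state)
{
    for (size_t attempt = 0; attempt < blocks_to_truncate; attempt++)
        if (try_lock(lock_state))
            return true;
    return false;               /* give up; skip truncation as today */
}

/* trivial callback for illustration: succeeds once *calls_left hits 0 */
static bool
succeed_after(void *state)
{
    int *calls_left = state;
    return (*calls_left)-- <= 0;
}
```

In a real implementation each failed attempt would presumably also sleep
briefly rather than spin.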

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Simon Riggs
On Thu, 2007-06-28 at 15:29 -0400, Alvaro Herrera wrote:
 Tom Lane escribió:
  Heikki Linnakangas [EMAIL PROTECTED] writes:
   AFAICS, we can simply remove the assertion. But are there any 
   codepaths that assume that after calling HeapTupleSatisfiesSnapshot, all 
   appropriate hint bits are set?
  
  There had better not be, since we are going to postpone setting hint
  bits for recently-committed transactions as part of the async-commit
  patch.
  
  A quick grep suggests that VACUUM FULL might be at risk here.
 
 That particular case seems easily fixed since VACUUM FULL must hold an
 exclusive lock; and we can forcibly set sync commit for VACUUM FULL.

Exactly what it does!

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] lazy vacuum sleeps with exclusive lock on table

2007-06-28 Thread Alvaro Herrera
Simon Riggs wrote:
 On Thu, 2007-06-28 at 17:16 -0400, Alvaro Herrera wrote:
 
  I noticed that lazy vacuum acquires an exclusive lock at the end, to be
  able to truncate the table.  This is not a surprise.  If it cannot
  acquire the lock, it simply skips truncating the table and goes on with
  life.
  
  However, what's problematic is that if a non-zero cost delay has been
  set, it will happily take naps while determining what to truncate :-(
  This seems a bad idea. It may also explain why some people are seeing
  autovacuum blocking other processes, and it readily explains why this
  happens even when there are no non-granted locks for autovacuum.
  
  Comments?  I think we should remove the sleep in the truncate phase.
 
 Do we have any timings for that lock-out? Even with a largish sleep
 delay, I can't imagine it's locked out for that long.

I created a table like this:

create table foo (a int);
begin;
insert into foo select * from generate_series(1, 100);
rollback;

It took 14 seconds to truncate with a 50ms vacuum delay.

What I'm requesting here is that the sleep in count_nondeletable_pages()
be removed and that change backpatched to 8.2 and 8.1.

 Seems like VACUUM shouldn't try just once to get the lock. It could be
 very frustrating to wait hours for a VACUUM to finish, only to find a
 small query prevents file truncation. That's just too random. It should
 retry as many times as there are blocks for it to truncate i.e. it tries
 harder to truncate the more it needs to do so.

We don't know how many pages we can truncate until after we have
acquired the exclusive lock and examined the pages in question, scanning
backwards from the end of the table.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-06-28 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Thu, 2007-06-28 at 15:16 -0400, Tom Lane wrote:
 A quick grep suggests that VACUUM FULL might be at risk here.

 No, we're clear: I caught that issue specifically for VACUUM FULL fairly
 early on. VF assumes all hint bits are set after the first scan, so we
 flush prior to the scan to ensure it's safe to set the hint bits.

Flush what prior to the scan?

The methodology I suggested earlier (involving tracking LSN only at the
level of pg_clog pages) isn't going to make that work, unless you
somehow force the XID counter forward to the next page boundary.
It might be that that level of tracking is too coarse anyway, since
it essentially says that you can't hint any transaction until the
next 32K-transaction boundary is reached.

regards, tom lane



Re: [HACKERS] write past chunk end in ExprContext / to_char

2007-06-28 Thread Tom Lane
imad [EMAIL PROTECTED] writes:
 This is the problematic part in formatting.c, function dch_time.
 int siz = strlen(tmtcTzn(tmtc));

 if (arg == DCH_TZ)
 strcpy(inout, tmtcTzn(tmtc));
 else
 {
 char   *p = palloc(siz);

 strcpy(p, tmtcTzn(tmtc));
 strcpy(inout, str_tolower(p));
 pfree(p);
 }
 return siz;

Hmm.  That was not the buffer overrun I was looking at, but it sure
looks like another one :-(.  Thanks for spotting it!

regards, tom lane



Re: [HACKERS] lazy vacuum sleeps with exclusive lock on table

2007-06-28 Thread ITAGAKI Takahiro

Alvaro Herrera [EMAIL PROTECTED] wrote:

 What I'm requesting here is that the sleep in count_nondeletable_pages()
 be removed and that change backpatched to 8.2 and 8.1.

Agreed. We'd better shorten the exclusive locking as much as possible.

 We don't know how many pages we can truncate until after we have
 acquired the exclusive lock and examined the pages in question, scanning
 backwards from the end of the table.

But many OSes do not do read-ahead for backward scans. I have a test
result showing that truncating a large part of a table takes a very long
time. Would it be better to change it to a forward scan? For example,
start at (tail of the file - 16MB) and scan 16MB of segment forward
to decide the position for truncation. If we can truncate all of the segment,
recheck from (tail of the file - 32MB) and repeat.
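A sketch of that segment-wise scan over an in-memory page map (hypothetical
helper, not vacuumlazy.c code; seg_pages stands in for 16MB worth of pages):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Work back from the end of the relation one segment at a time, but
 * scan forward *within* each segment, which is friendlier to OS
 * read-ahead than a pure backward page-by-page scan.  Returns the new
 * length of the relation in pages (everything past the last nonempty
 * page can be truncated).
 */
static size_t
truncation_point(const bool *page_is_empty, size_t nblocks, size_t seg_pages)
{
    size_t keep = nblocks;              /* pages not yet proven empty */

    while (keep > 0)
    {
        size_t start = keep > seg_pages ? keep - seg_pages : 0;
        size_t last_nonempty = 0;
        bool   found = false;

        /* forward scan within this segment */
        for (size_t i = start; i < keep; i++)
        {
            if (!page_is_empty[i])
            {
                found = true;
                last_nonempty = i;
            }
        }

        if (found)
            return last_nonempty + 1;   /* truncate everything after it */

        keep = start;                   /* whole segment empty: step back */
    }
    return 0;                           /* relation is entirely empty */
}
```

The real code would of course read the pages themselves under the exclusive
lock rather than consult a precomputed map.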

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] lazy vacuum sleeps with exclusive lock on table

2007-06-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 What I'm requesting here is that the sleep in count_nondeletable_pages()
 be removed and that change backpatched to 8.2 and 8.1.

Are you sure that that is, and always will be, the only sleep in that
part of the code path?

Seems like it might be better to adjust the cost_delay parameters after
we acquire exclusive lock.  I'm not sure dialing them all the way back
to zero is a good idea, but certainly we could make them more
aggressive.

 Seems like VACUUM shouldn't try just once to get the lock.

 We don't know how many pages we can truncate until after we have
 acquired the exclusive lock and examined the pages in question, scanning
 backwards from the end of the table.

We could estimate this during the forward scan.  The backward scan is
still necessary to see if anyone has inserted tuples after we looked at
a page, but certainly a page that VACUUM failed to empty will still be
nonempty, so we can determine an upper bound on how much might be
truncatable.

However, that's an orthogonal concern and should probably be discussed
separately.

regards, tom lane



Re: [HACKERS] AutoVacuum Behaviour Question

2007-06-28 Thread Alvaro Herrera

Bruce, please make sure to keep the list copied on replies.  I think
there is an important bug here and I don't want it to get lost just
because I lose track of it.  I'm also crossposting to pgsql-hackers.

Bruce McAlister wrote:

 okidoki, I tried this:
 
 blueface-crm=# select relname, nspname from pg_class join pg_namespace
 on (relnamespace = pg_namespace.oid) where
 pg_is_other_temp_schema(relnamespace);
  relname  |  nspname
 --+
  temp4295 | pg_temp_63
 (1 row)
 
 blueface-crm=# select pg_stat_get_backend_pid(63);
  pg_stat_get_backend_pid
 -
 6661
 (1 row)
 
 blueface-crm=# select datname, client_addr, client_port from
 pg_stat_activity where procpid = '6661';
 datname | client_addr | client_port
 +-+-
  whitelabel-ibb | 10.6.0.181  |1587
 (1 row)
 
 Is that correct? If it is then I'm really confused, how can a connection
 to the whitelabel-ibb database create temporary tables in the
 blueface-crm database?

Well, it certainly seems like this shouldn't be happening.  Maybe the
table belonged to a session that crashed, but the pg_class entry has not
been cleaned up -- possibly because that backend has not connected to
that particular database.

Maybe autovacuum itself could do something about cleaning up this kind
of stuff on sight (dropping temp tables belonging to sessions that
crashed).  I'm not sure though.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] How do we create the releases?

2007-06-28 Thread Bruce Momjian
Robert Treat wrote:
 And while we're talking about things that suck wrt packaging, I noticed it's 
 now been over a year since I first complained about the stable snapshots in 
 our ftp directory being outdated 
 (http://www.postgresql.org/ftp/stable_snapshot/), if no one is going to fix 
 that, can we remove them? 

They look current now.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] lazy vacuum sleeps with exclusive lock on table

2007-06-28 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  What I'm requesting here is that the sleep in count_nondeletable_pages()
  be removed and that change backpatched to 8.2 and 8.1.
 
 Are you sure that that is, and always will be, the only sleep in that
 part of the code path?

It is currently, as far as I can see, the only sleep.  I think we could
backpatch the removal of that call, and consider changing the
cost_delay parameters when we acquire the exclusive lock in HEAD.

I haven't tried with fancy features like GiST indexes though.  Maybe
there are more sleep calls in the vacuum code for those.  But from what I
can gather, all the index cleanup is done before trying to truncate the
relation, so we should be safe.

Also, we don't release the exclusive lock; we hold it until commit.
Maybe it would be a good idea to release it as soon as we're done with
it.

 Seems like it might be better to adjust the cost_delay parameters after
 we acquire exclusive lock.  I'm not sure dialing them all the way back
 to zero is a good idea, but certainly we could make them more
 aggressive.

Hmm.  The shorter we hold the exclusive lock, the better.  I think an extra
bit of I/O for a short moment is warranted in this case -- better than
holding a lock that lots of processes could be waiting on.

  Seems like VACUUM shouldn't try just once to get the lock.
 
  We don't know how many pages we can truncate until after we have
  acquired the exclusive lock and examined the pages in question, scanning
  backwards from the end of the table.
 
 We could estimate this during the forward scan.  The backward scan is
 still necessary to see if anyone has inserted tuples after we looked at
 a page, but certainly a page that VACUUM failed to empty will still be
 nonempty, so we can determine an upper bound on how much might be
 truncatable.
 
 However, that's an orthogonal concern and should probably be discussed
 separately.

Right (and furthermore we shouldn't backpatch it).

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
This is a foot just waiting to be shot  (Andrew Dunstan)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] AutoVacuum Behaviour Question

2007-06-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Well, it certainly seems like this shouldn't be happening.  Maybe the
 table belonged to a session that crashed, but the pg_class entry has not
 been cleaned up -- possibly because that backend has not connected to
 that particular database.

Hm --- a crash would mean that the temp table would remain until some
other session (a) connected to the same database (b) using the same
BackendId (sinval slot number), and (c) decided to create some temp
tables of its own.  So indeed it's not implausible that the table could
hang around for a long time, especially if you were unlucky enough that
the original creator had been using a very high BackendId slot.  (Which
pg_temp schema is this table attached to, anyway?)
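[Purely as an illustration, not from the original thread: the question of
which pg_temp schema holds the orphan can be answered with a catalog query
along these lines.]

```sql
-- Hypothetical diagnostic query: list ordinary tables living in any
-- pg_temp_N schema, so a table left behind by a crashed backend can be
-- spotted along with its BackendId-numbered schema.
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname LIKE 'pg\_temp\_%'
  AND c.relkind = 'r';
```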

 Maybe autovacuum itself could do something about cleaning up this kind
 of stuff on sight (-- dropping temp tables belonging to sessions that
 crash).  I'm not sure though.

Yeah, we had better investigate some way to clean them up.  It was never
obvious before that it mattered to get rid of orphan temp tables, but I
guess it does.

Another possibility is just to ignore temp tables while computing
datvacuumxid.  A temp table that survives for > 2G transactions is going
to be trouble, but I'm not sure there's anything we can usefully do
about it anyway --- certainly autovacuum has no power to fix it.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] write past chunk end in ExprContext / to_char

2007-06-28 Thread Bruce Momjian
Tom Lane wrote:
 Patrick Welche [EMAIL PROTECTED] writes:
  With today's CVS code (originally noticed with 8.2beta3), on a PC where
  INT_MAX=0x7FFF=2147483647
 
  postgres=# select to_char(2147483648,'999,999,999');
  WARNING:  detected write past chunk end in ExprContext 0x845509c
  WARNING:  detected write past chunk end in ExprContext 0x845509c
 
 Yech ... it's scribbling on the output of int8out, which is bad enough,
 but it's assuming that buffer will be long enough when it demonstrably
 isn't.
 
 Some days I think we ought to throw out formatting.c and rewrite it from
 scratch; it's probably the most poorly-coded module in all of Postgres.

Agreed from personal experience.  I am in there whacking it around, it
seems, every release.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [GENERAL] AutoVacuum Behaviour Question

2007-06-28 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Well, it certainly seems like this shouldn't be happening.  Maybe the
  table belonged to a session that crashed, but the pg_class entry has not
  been cleaned up -- possibly because that backend has not connected to
  that particular database.
 
 Hm --- a crash would mean that the temp table would remain until some
 other session (a) connected to the same database (b) using the same
 BackendId (sinval slot number), and (c) decided to create some temp
 tables of its own.  So indeed it's not implausible that the table could
 hang around for a long time, especially if you were unlucky enough that
 the original creator had been using a very high BackendId slot.  (Which
 pg_temp schema is this table attached to, anyway?)

It's pg_temp_63.  Backend 63 is running in another database.  It seems
perfectly possible that a backend connects to database A, creates a temp
table, crashes, then connects to database B after restart and then keeps
running there forever :-(

  Maybe autovacuum itself could do something about cleaning up this kind
  of stuff on sight (-- dropping temp tables belonging to sessions that
  crash).  I'm not sure though.
 
 Yeah, we had better investigate some way to clean them up.  It was never
 obvious before that it mattered to get rid of orphan temp tables, but I
 guess it does.

Would it be enough to delete the tuple from pg_class?  I guess that will
leave behind the tuples in pg_attribute etc, but I don't see another way
to drop it ...  Maybe UPDATE to move it to the local temp schema and
then DROP it?

Or maybe it works to do DROP TABLE pg_temp_63.temp2394 as superuser ...?
I haven't tried.
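[An untested sketch of that manual cleanup, exactly as the message
suggests -- schema-qualified, run as superuser; pg_temp_63 and temp2394 are
the names from Alvaro's report above:]

```sql
-- Untested: as superuser, drop the orphaned temp table directly by
-- schema-qualifying it, since pg_temp_63 is not in our own search_path.
DROP TABLE pg_temp_63.temp2394;
```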

 Another possibility is just to ignore temp tables while computing
 datvacuumxid.  A temp table that survives for > 2G transactions is going
 to be trouble, but I'm not sure there's anything we can usefully do
 about it anyway --- certainly autovacuum has no power to fix it.

Yes, I was going to suggest that, though it doesn't seem right.

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 18.1, W 73º 13' 56.4
Tiene valor aquel que admite que es un cobarde (Fernandel)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] AutoVacuum Behaviour Question

2007-06-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Yeah, we had better investigate some way to clean them up.  It was never
 obvious before that it mattered to get rid of orphan temp tables, but I
 guess it does.

 Would it be enough to delete the tuple from pg_class?

No, you need a full DROP.  I don't see that that's harder than removing
only the pg_class tuple --- the problem in either case is to be sure
it's OK.  In particular, how to avoid a race condition against an
incoming backend that adopts that BackendId?  Worst-case, you could be
deleting a temp table he just made.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] AutoVacuum Behaviour Question

2007-06-28 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Yeah, we had better investigate some way to clean them up.  It was never
  obvious before that it mattered to get rid of orphan temp tables, but I
  guess it does.
 
  Would it be enough to delete the tuple from pg_class?
 
 No, you need a full DROP.  I don't see that that's harder than removing
 only the pg_class tuple --- the problem in either case is to be sure
 it's OK.  In particular, how to avoid a race condition against an
 incoming backend that adopts that BackendId?  Worst-case, you could be
 deleting a temp table he just made.

Oh, I was just thinking of a way for Bruce to get out of his current
situation.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] AutoVacuum Behaviour Question

2007-06-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Oh, I was just thinking in way for Bruce to get out of his current
 situation.

Oh, for that a manual drop table as superuser should work fine.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] Doc update for pg_start_backup

2007-06-28 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Added a note to the docs that pg_start_backup can take a long time to 
 finish now that we spread out checkpoints:

I was starting to wordsmith this, and then wondered whether it's not
just a stupid idea for pg_start_backup to act that way.  The reason
you're doing it is to take a base backup, right?  What are you going
to take the base backup with?  I do not offhand know of any backup
tools that don't suck major amounts of I/O bandwidth.  That being
the case, you're simply not going to schedule the operation during
full-load periods.  And that leads to the conclusion that
pg_start_backup should just use CHECKPOINT_IMMEDIATE and not slow
you down.
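[For reference, not part of the original mail: the base-backup bracketing
being discussed looks like the following; the debate is only about whether
the checkpoint triggered by the first call should be immediate or spread
out.  The label 'nightly' is illustrative.]

```sql
-- Start the backup: this forces a checkpoint, which under spread
-- checkpoints can take a long time to return.
SELECT pg_start_backup('nightly');
-- ... copy the data directory with an external tool (tar, rsync, ...) ...
SELECT pg_stop_backup();
```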

Thoughts?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings