Re: [HACKERS] pg_stop_backup wait bug fix

2008-12-03 Thread Heikki Linnakangas

Fujii Masao wrote:

On Wed, Dec 3, 2008 at 5:13 AM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:

Agreed, should use XLByteToPrevSeg. But I wonder if we can just replace the
current XLByteToSeg call with XLByteToPrevSeg? That would offset the return
value of the function by one byte as well, as well as the value printed to
the backup history file. In fact, I think the original patch got that wrong;
it would return the location of the *beginning* of the last xlog file.


You're right. As you say, the value (stopxlogfilename) printed to the backup
history file is wrong. But since that value is fortunately not used, no
trouble has come up. So I think we can just replace them.


Changing the return value doesn't seem like a good idea. If nothing 
else, it would be complicated to explain what it returns. I committed a 
patch that changes the waiting behavior, but not the return value or 
what's written into the backup label file.



I also noticed that the 2nd BackupHistoryFileName call in that function is
useless; histfilepath variable is already filled in earlier.


Somewhat confusingly, BackupHistoryFileName is called only once. Isn't the 1st
call (which is probably the one you meant) BackupHistoryFilePath?


Ouch, you're right. That's subtle.


In order to prevent
confusion, should we add a new local variable (histfilename) for the backup
history file name?


Agreed. I included that in the patch.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions and temp tables

2008-12-03 Thread Heikki Linnakangas

Emmanuel Cecchet wrote:
I think that the Assert in is_temp_rel(Oid) in tablecmds.c should be 
replaced by if (on_commits == NULL) return false;
As the use case below shows, a regular table can be created and hold a 
LOCKTAG_RELATION lock that will trigger the call to is_temp_rel in 
is_preparable_locktag. The Assert will fail if no temp table was accessed.


Yes, you're right.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions and temp tables

2008-12-03 Thread Heikki Linnakangas

Emmanuel Cecchet wrote:
There is a problem with temp tables with ON COMMIT DELETE ROWS that are created 
inside a transaction.
Take the 2pc_on_delete_rows_transaction.sql test case and change the 
creation statement, instead of

create temp table foo(x int) on commit delete rows;
try
create temp table foo(x serial primary key) on commit delete rows;

The test will fail. It looks like the onCommit field is not properly 
updated when serial or primary key is used in that context. I did not 
figure out why.


A serial column uses a sequence behind the scenes.

Hmm. Seems like we would need to treat sequences and indexes the same as 
tables with ON COMMIT DELETE ROWS, i.e. release the locks early and not 
error out.


All in all, this is getting pretty messy. My patch felt a bit hackish to 
begin with, and having to add special cases for sequences and indexes 
would make it even more so. And what about temporary views? I'm starting 
to feel that instead of special-casing temp relations, we need to move 
into the opposite direction and make temp relations more like regular 
relations. Unfortunately, that's not going to happen in the 8.4 
timeframe :-(. Let's try the other approach in 8.5.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup wait bug fix

2008-12-03 Thread Ibrar Ahmed
Hi,

I have looked at the patch and it looks OK to me. BTW, I am not very
familiar with this area of the code, so I am not in a position to
argue about the patch :-). I haven't found an easy way to test the patch.

On Wed, Dec 3, 2008 at 1:24 PM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:
 Fujii Masao wrote:

 On Wed, Dec 3, 2008 at 5:13 AM, Heikki Linnakangas
 [EMAIL PROTECTED] wrote:

 Agreed, should use XLByteToPrevSeg. But I wonder if we can just replace
 the
 current XLByteToSeg call with XLByteToPrevSeg? That would offset the
 return
 value of the function by one byte as well, as well as the value printed
 to
 the backup history file. In fact, I think the original patch got that
 wrong;
 it would return the location of the *beginning* of the last xlog file.

 You're right. As you say, the value (stopxlogfilename) printed to the
 backup
 history file is wrong. But, since the value is not used fortunately,
 any troubles
 have not come up. So, I think that we can just replace them.

 Changing the return value doesn't seem like a good idea. If nothing else, it
 would be complicated to explain what it returns. I committed a patch that
 changes the waiting behavior, but not the return value or what's written
 into the backup label file,

 I also noticed that the 2nd BackupHistoryFileName call in that function
 is
 useless; histfilepath variable is already filled in earlier.

 Somewhat confusingly, BackupHistoryFileName is called only once. Isn't 1st
 (which probably you thought) BackupHistoryFilePath?

 Ouch, you're right. That's subtle.

 In order to prevent
 confusion, we should add new local variable (histfilename) for the backup
 history file name?

 Agreed. I included that in the patch.

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
   Ibrar Ahmed
   EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Greg Stark



On 3 Dec 2008, at 06:57 AM, Heikki Linnakangas [EMAIL PROTECTED] 
 wrote:



Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
If *that* is a use case we're interested in, the incoming tuples  
could be accumulated in backend-private memory, and inserted into  
the index at commit. That would be a lot simpler, with no need to  
worry about concurrent inserts or vacuums.
Doesn't work --- the index would yield wrong answers for later queries
in the same transaction.


Queries would still need to check the backend-private list.



More to the point -- at least if I'm guessing right about Tom's  
thoughts -- queries would still have to check the heap. That is, the  
backend-private list would just be a proxy for buffered *index* tuples.


If we do this though it would be really nice to do it at a higher  
level than the indexam. If we could do it for any indexam that  
provides a kind of bulk insert method that would be great.


I'm just not sure how to support all the indexable operators for the  
various indexams on the local buffered list.


Incidentally buffering btree index inserts was originally Heikki's idea.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_all_tables vs NULLs

2008-12-03 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 I've noticed that pg_stat_all_tables returns NULL for idx_scan and
 idx_tup_fetch if there are no indexes present on a table.
 
 Is this actually intended, or is that something that should be fixed?
 
 Hmm.  I suspect it's an implementation artifact rather than something
 that was consciously chosen, but on reflection it doesn't seem like a
 bad thing.  If we just COALESCE'd it to zero (which I assume is what
 you have in mind) then there would be no distinction in the view
 between "you have no indexes" and "there are indexes but they aren't
 being used".

But does it make sense to look for that information in pg_stat_*_tables,
really? If you want to know if an index exists for a table, you'd
normally go look in the system tables, not the statistics views, I think.


 I'd vote to leave it alone, I think.

I can go for that as well, though. I'd say "Let's document it instead,
then", but it seems the stats views documentation is very short on what
actually goes in the fields. But I guess we could just add a "(NULL if
no indexes are present)" note to that?

In the long term it might be worthwhile to rewrite that section of the
docs to focus more on the stats views (giving each its own section with
more information about it than just a list of fields) and less on the
underlying implementation functions. But that's a different day ;-)


//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] maintenance memory vs autovac

2008-12-03 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 Greg Stark wrote:
 One concern I have about this is people asking "how come when I
 run vacuum manually it takes x minutes, but when autovacuum runs it,
 it takes 5x minutes?"
 
 As long as the default is the same, people would get at least an initial
 clue that it might have something to do with them changing a
 configuration parameter...
 
 It seems like mostly a confusion-generator to me.  Is there any actual
 evidence that autovac should use a different maintenance_work_mem than
 other processes?

The use-case that made me think of that is one with lots of autovac
workers in a system with lots of small tables in different databases.

Turns out I read the documentation for autovac wrong. I understood it as:
if I wanted it to look at 1000 databases at once, I needed
autovac_workers at 1000. Talked a bit off-list with Alvaro and realized
that's not what it is, but the documentation is a bit unclear on
that - I will work on fixing that.

Which means there's probably no real use-case for lots of autovac
workers that each needs only a little maint_work_mem, in which case
having such an extra parameter would become unnecessary.

//Magnus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] maintenance memory vs autovac

2008-12-03 Thread Magnus Hagander
Guillaume Smet wrote:
 On Wed, Dec 3, 2008 at 2:00 AM, Tom Lane [EMAIL PROTECTED] wrote:
 It seems like mostly a confusion-generator to me.  Is there any actual
 evidence that autovac should use a different maintenance_work_mem than
 other processes?
 
 IMHO, the point is that we were used to considering
 maintenance_work_mem as a "one process at a time" thing. Even if that's
 not really true, we usually didn't do maintenance tasks on a concurrent
 basis.
 The autovacuum workers change that and make it a default behaviour (as
 we can have 3*maintenance_work_mem by default).

It's still one per process, it's just that autovac uses more than one
process. It's probably worthwhile to add a note about the effects of
autovacuum around the documentation of maintenance_work_mem, though.

//Magnus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Erroring out on parser conflicts

2008-12-03 Thread Greg Stark



On 3 Dec 2008, at 03:32 AM, Tom Lane [EMAIL PROTECTED] wrote:


Bruce Momjian [EMAIL PROTECTED] writes:

FYI, this is going to make it hard for developers to test CVS changes
until they get their grammar cleaned up; perhaps add a comment on how
to disable the check?


Well, the point is that their grammar changes are broken if that check
fails, so I'm not sure what the value of testing a known-incorrect
grammar might be.  It wouldn't necessarily act the same after being
fixed.



Well, surely the C code the parser invokes will behave the same. A lot  
of C hackers are not bison grammar hackers. Even many of us former  
bison grammar hackers are way rusty. There have been a number of times  
when someone has posted an otherwise working patch with a grammar  
conflict that you fixed.


Bruce, surely nobody would object if you posted a patch to add a  
comment. People would of course quibble with the wording, but that's  
just par for the course.


Perhaps something like "Postgres has a policy of maintaining zero  
parser conflicts. If you disable this for testing, make sure you  
re-enable it and eliminate any conflicts, or post to -hackers asking  
for advice."


I'm not sure where to put a comment pointing them to the %expect  
line though. What does the error look like if they violate it?






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

2008-12-03 Thread Magnus Hagander
Hiroshi Inoue wrote:
 I think the thing is that as long as the encodings are compatible
 (latin1 with different names, for example) it worked fine.

  In any case I think the problem is that gettext is
 looking at a setting that is not what we are looking at.  Particularly
 with the 8.4 changes to allow per-database locale settings, this has
 got to be fixed in a bulletproof way.
 
 Attached is a new patch to apply bind_textdomain_codeset() to most
 server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
 and PG_EUC_JIS_2004. EUC-JP may be OK for EUC_JIS_2004.
 
 Unfortunately it's hard for Saito-san and me to check encodings
 other than EUC-JP.


In principle this looks good, I think, but I'm a bit worried around the
lack of testing. I can do some testing under LATIN1 which is what we use
in Sweden (just need to get gettext working *at all* in my dev
environment again - I've somehow managed to break it), and perhaps we
can find someone to do a test in an eastern-european locale to get some
more datapoints?

Can you outline the steps one needs to go through to show the problem,
so we can confirm it's fixed in these locales?

//Magnus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] maintenance memory vs autovac

2008-12-03 Thread Guillaume Smet
On Wed, Dec 3, 2008 at 10:49 AM, Magnus Hagander [EMAIL PROTECTED] wrote:
 The autovacuum workers change that and make it a default behaviour (as
 we can have 3*maintenance_work_mem by default).

 It's still one per process, it's just that autovac uses more than one
 process.

I agree. What I implied is that by default you have 3 autovacuum
workers so the behaviour has changed, even if it didn't change in a
technical way.

 It's probably worthwhile to add a note about the effects of
 autovacuum around the documentation of maintenance_work_mem, though.

+1
A lot of people set maintenance_work_mem quite high because of the old
behaviour.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] maintenance memory vs autovac

2008-12-03 Thread Gregory Stark
Guillaume Smet [EMAIL PROTECTED] writes:

 On Wed, Dec 3, 2008 at 10:49 AM, Magnus Hagander [EMAIL PROTECTED] wrote:
 It's probably worthwhile to add a note about the effects of
 autovacuum around the documentation of maintenance_work_mem, though.

 +1
 A lot of people set maintenance_work_mem quite high because of the old
 behaviour.

The high level view is that an admin will expect to be able to allocate all
the memory on his machine (at least all the memory he expects Postgres to use)
as something like:

   shared_buffers and sundry shared mem
 + max_connections*work_mem
 + maintenance_work_mem
 + filesystem cache

(Yes, max_connections isn't quite right there but that's the general idea)

If you have 1G of ram and allocate 200M of shared buffers, 1M of work_mem of
which you don't expect more than a hundred concurrent allocations, and want
about half your ram set aside for filesystem cache, you would be quite
reasonable to expect to have about 256M to play with for maintenance_work_mem
-- which in my experience is a nice value (lower than that is noticeably
slower and greater has little effect on sorting data sets I've seen).

But if you set things up that way you could end up with three autovacuum
daemons running with 256M allocated each on a 1G machine. That's pretty
frightening, especially with 200M of shared buffers.

We definitely need at the very least a prominent warning in the
maintenance_work_mem documentation. Users can always raise it for manually run
commands if they're sure they're only running one at a time.

But all of this isn't a new issue is it? I thought we've had multiple
autovacuum workers since 8.3. Have there been any complaints?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] maintenance memory vs autovac

2008-12-03 Thread Magnus Hagander
Gregory Stark wrote:
 Guillaume Smet [EMAIL PROTECTED] writes:
 
 On Wed, Dec 3, 2008 at 10:49 AM, Magnus Hagander [EMAIL PROTECTED] wrote:
 It's probably worthwhile to add a note about the effects of
 autovacuum around the documentation of maintenance_work_mem, though.
 +1
 A lot of people set maintenance_work_mem quite high because of the old
 behaviour.

snip

 We definitely need at the very least a prominent warning in the
 maintenance_work_mem documentation. Users can always raise it for manually run
 commands if they're sure they're only running one at a time.

Yeah.


 But all of this isn't a new issue is it? I thought we've had multiple
 autovacuum workers since 8.3. Have there been any complaints?

Yes, that's why I brought it up. Haven't seen complaints on-list, but
have heard a couple from customers off-list. Not necessarily so much
complaints as "what does this mean?", but questions nevertheless.

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] snapshot leak and core dump with serializable transactions

2008-12-03 Thread Pavan Deolasee
The following test triggers a snapshot leak warning and subsequently dumps
core. Though this looks very similar to another bug report, this is a
different issue.


postgres=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
BEGIN
postgres=# SAVEPOINT A;
SAVEPOINT
postgres=# SELECT count(*) from pg_class;
 count
---
   227
(1 row)

postgres=# RELEASE SAVEPOINT A;
WARNING:  Snapshot reference leak: Snapshot 0x9e3e4d4 still referenced
RELEASE
postgres=# SELECT count(*) from pg_class;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!


I looked at this briefly and ISTM that there are a couple of issues here:

1. Since SAVEPOINT A is the first statement in the transaction, a
subtransaction is started and CurrentResourceOwner is set to the resource
owner of the subtransaction. Later, when the serializable snapshot is taken,
it's recorded in the subtransaction's resource owner. Obviously, when the
subtransaction commits, it complains about the snapshot leak because the
serializable snapshot is not yet unregistered.

So I tried to ensure that the serializable snapshot is always recorded in
the TopTransactionResourceOwner. It solved the above issue, but there is
still a core dump when the top transaction is committed. That leads to the
second issue.

2. In CommitTransaction(), I think we should call AtEOXact_Snapshot *before*
releasing the resource owners. Otherwise, ResourceOwnerReleaseInternal
complains about snapshot leak and then forcefully unregisters the snapshot.
Later when AtEOXact_Snapshot is called, it again tries to unregister the
serializable snapshot and assertion fails.

The attached patch fixes these issues.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Index: src/backend/access/transam/xact.c
===
RCS file: /repositories/postgreshome/cvs/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.269
diff -c -p -r1.269 xact.c
*** src/backend/access/transam/xact.c	19 Nov 2008 10:34:50 -	1.269
--- src/backend/access/transam/xact.c	3 Dec 2008 12:47:35 -
*** CommitTransaction(void)
*** 1685,1690 
--- 1685,1691 
  	smgrDoPendingDeletes(true);
  
  	AtEOXact_MultiXact();
+ 	AtEOXact_Snapshot(true);
  
  	ResourceOwnerRelease(TopTransactionResourceOwner,
  		 RESOURCE_RELEASE_LOCKS,
*** CommitTransaction(void)
*** 1706,1712 
  	AtEOXact_ComboCid();
  	AtEOXact_HashTables(true);
  	AtEOXact_PgStat(true);
- 	AtEOXact_Snapshot(true);
  	pgstat_report_xact_timestamp(0);
  
  	CurrentResourceOwner = NULL;
--- 1707,1712 
Index: src/backend/utils/time/snapmgr.c
===
RCS file: /repositories/postgreshome/cvs/pgsql/src/backend/utils/time/snapmgr.c,v
retrieving revision 1.7
diff -c -p -r1.7 snapmgr.c
*** src/backend/utils/time/snapmgr.c	25 Nov 2008 20:28:29 -	1.7
--- src/backend/utils/time/snapmgr.c	3 Dec 2008 12:47:36 -
*** GetTransactionSnapshot(void)
*** 136,142 
--- 136,145 
  		 */
  		if (IsXactIsoLevelSerializable)
  		{
+ 			ResourceOwner oldowner = CurrentResourceOwner;
+ 			CurrentResourceOwner = TopTransactionResourceOwner;
  			CurrentSnapshot = RegisterSnapshot(CurrentSnapshot);
+ 			CurrentResourceOwner = oldowner;
  			registered_serializable = true;
  		}
  
*** AtEOXact_Snapshot(bool isCommit)
*** 480,486 
--- 483,494 
  		 * refcount to the serializable snapshot.
  		 */
  		if (registered_serializable)
+ 		{
+ 			ResourceOwner oldowner = CurrentResourceOwner;
+ 			CurrentResourceOwner = TopTransactionResourceOwner;
  			UnregisterSnapshot(CurrentSnapshot);
+ 			CurrentResourceOwner = oldowner;
+ 		}
  
  		if (RegisteredSnapshots != 0)
  			elog(WARNING, "%d registered snapshots seem to remain after cleanup",

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] maintenance memory vs autovac

2008-12-03 Thread Magnus Hagander
Guillaume Smet wrote:
 On Wed, Dec 3, 2008 at 10:49 AM, Magnus Hagander [EMAIL PROTECTED] wrote:
 The autovacuum workers change that and make it a default behaviour (as
 we can have 3*maintenance_work_mem by default).
 It's still one per process, it's just that autovac uses more than one
 process.
 
 I agree. What I implied is that by default you have 3 autovacuum
 workers so the behaviour has changed, even if it didn't change in a
 technical way.
 
 It's probably worthwhile to add a note about the effects of
 autovacuum around the documentation of maintenance_work_mem, though.
 
 +1
 A lot of people set maintenance_work_mem quite high because of the old
 behaviour.

How about something as simple as this?

//Magnus
*** doc/src/sgml/config.sgml
--- doc/src/sgml/config.sgml
***
*** 881,886  SET ENABLE_SEQSCAN TO OFF;
--- 881,891 
  than <varname>work_mem</varname>.  Larger settings might improve
  performance for vacuuming and for restoring database dumps.
 </para>
+<para>
+ Note that when autovacuum runs, up to
+ <xref linkend="guc-autovacuum-max-workers"> times this memory may be
+ allocated, so be careful not to set the default value too high.
+</para>
 </listitem>
   </varlistentry>
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-03 Thread Fujii Masao
Hi,

On Tue, Dec 2, 2008 at 10:09 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 On Tue, 2008-12-02 at 21:37 +0900, Fujii Masao wrote:

 Thanks for taking many hours to review the code!!

 On Mon, Dec 1, 2008 at 8:42 PM, Simon Riggs [EMAIL PROTECTED] wrote:
  Can you confirm that all the docs on the Wiki page are up to date? There
  are a few minor discrepancies that make me think it isn't.

 Documentation is ongoing. Sorry for my slow progress.

 BTW, I'm going to add and change the sgml files listed on wiki.
 http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Documentation_Plan

 I'm patient, I know it takes time. Happy to spend hours on the review,
 but I want to do that knowing I agree with the higher level features and
 architecture first.

Since I thought that the figure was more intelligible for some people
than my poor English, I illustrated the architecture first.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Detailed_Design

Are there any other parts which should be illustrated for review?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_get_keywords descriptions

2008-12-03 Thread Peter Eisentraut

= select distinct catcode, catdesc from pg_get_keywords();
 catcode |catdesc
-+---
 C   | Column name
 T   | Type or function name
 R   | Reserved
 U   | Unreserved

I find the descriptions of C and T quite confusing.  For example, saying 
that "authorization" is a "type or function name" (T) is somewhat bogus.


In 
http://developer.postgresql.org/pgdocs/postgres/sql-keywords-appendix.html, 
the terms


C = non-reserved (cannot be function or type)
T = reserved (can be function or type)

are used.  Should we use these here as well (possibly adding "name")?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot leak and core dump with serializable transactions

2008-12-03 Thread Alvaro Herrera
Pavan Deolasee wrote:

 2. In CommitTransaction(), I think we should call AtEOXact_Snapshot *before*
 releasing the resource owners. Otherwise, ResourceOwnerReleaseInternal
 complains about snapshot leak and then forcefully unregisters the snapshot.
 Later when AtEOXact_Snapshot is called, it again tries to unregister the
 serializable snapshot and assertion fails.

Hmm, I've been wondering if we can get away with not having
AtEOXact_Snapshot at all.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] maintenance memory vs autovac

2008-12-03 Thread Alvaro Herrera
Magnus Hagander wrote:
 Tom Lane wrote:

  It seems like mostly a confusion-generator to me.  Is there any actual
  evidence that autovac should use a different maintenance_work_mem than
  other processes?
 
 The use-case that made me think of that is one with lots of autovac
 workers in a system with lots of small tables in different databases.

Another thing to consider here is that lazy vacuum will scale down its
memory usage depending on table size.

 Turns out I read the documentation for autovac wrong. I understood that
 if I wanted it to look at 1000 databases at once, I needed
 autovac_workers at 1000. Talked a bit offlist with Alvaro and realized
 that's not what it is, but that the documentation is a bit unclear on
 that - will work on fixing that.

Yeah, Rob Treat has also asked me twice about this, so it's probably
worth rewriting.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cvs head initdb hangs on unixware

2008-12-03 Thread ohp

On Tue, 2 Dec 2008, Heikki Linnakangas wrote:


Date: Tue, 02 Dec 2008 20:47:19 +0200
From: Heikki Linnakangas [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Zdenek Kotala [EMAIL PROTECTED],
pgsql-hackers list pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

[EMAIL PROTECTED] wrote:

Stack trace for p1, program postmaster
*[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97]
 [1] fsm_search_avail( presumed: 0x2, 0x6, 0x1)  [0x81e68d9]
 [2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 0x8047416, 
0xb4) [0x81e6385]

 [3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00]
 [4] RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59]
 [5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042]
 [6] simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310) 
[0x8097297]

 [7] InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210]
 [8] boot_yyparse( presumed: 0x, 0x3, 0x8047ab8) [0x80c822b]
 [9] BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4)  [0x80ca233]
 [10] AuxiliaryProcessMain(0x4, 0x8047ab4)  [0x80cab3b]
 [11] main(0x4, 0x8047ab4, 0x8047ac8)   [0x8177dce]
 [12] _start()  [0x807ff96]

Seems interesting!

We've already had problems with the UnixWare optimizer; I hope this one is 
fixable!


Looking at fsm_rebuild_page, I wonder if the compiler is treating int as an 
unsigned integer? That would cause an infinite loop.



No, a simple printf of nodeno shows it starting at 4096, going all the way 
down to 0, then starting back at 4096...


I wonder if the leftchild/rightchild definitions have something to do with 
it...


--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges+33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
Here's an updated version, with a lot of smaller cleanups, and using 
relcache invalidation to notify other backends when the visibility map 
fork is extended. I already committed the change to FSM to do the same. 
I'm now confident enough to commit this patch early next week.


Committed.

I haven't done any doc changes for this yet. I think a short section in 
the database internal storage chapter is probably in order, and the 
fact that plain VACUUM skips pages should be mentioned somewhere. I'll 
skim through references to vacuum and see what needs to be changed.


Hmm. It just occurred to me that I think this circumvented the 
anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid 
anymore. We'll need to disable the skipping when autovacuum is triggered 
to prevent wraparound. VACUUM FREEZE does that already, but it's 
unnecessarily aggressive in freezing.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Re: [BUGS] libpq does not manage SSL callbacks properly when other libraries are involved.

2008-12-03 Thread Magnus Hagander
Bruce Momjian wrote:
 Bruce Momjian wrote:
 Thanks for the review, Magnus.  I have adjusted the patch to use the
 same mutex every time the counter is accessed, and adjusted the
 pqsecure_destroy() call to properly decrement in the right place.

 Also, I renamed the libpq global destroy function to be clearer
 (the function is not exported).
 
 Here is an updated version of the patch to match CVS HEAD.

I've updated it to match what's CVS HEAD now, and made some minor
modifications. Renamed destroySSL() to make it consistent with
initializeSSL(). Added and changed some comments. ssldiff.patch contains
my changes against Bruce's patch.

I also removed the #ifdef NOT_USED parts. They are in CVS history if we
need them, and they're trivial things anyway, so I think this is much
cleaner.

With this, it looks fine to me. Especially since we've seen some testing
from the PHP folks already.
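
For readers skimming the diff below, the core pattern under discussion — a 
global connection counter whose every access is guarded by the same mutex, 
with library-wide teardown happening only when the last connection releases 
it — can be sketched as follows. The function and variable names here are 
illustrative stand-ins, not the real libpq internals.

```c
#include <pthread.h>

/* Illustrative sketch of the reference-counting pattern in the patch
 * below; names are made up and do not match the real libpq code. */
static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
static int	ssl_open_connections = 0;
static int	ssl_lib_initialized = 0;

void
ssl_acquire(void)
{
	pthread_mutex_lock(&ssl_config_mutex);
	if (ssl_open_connections++ == 0)
		ssl_lib_initialized = 1;	/* stand-in for one-time SSL library init */
	pthread_mutex_unlock(&ssl_config_mutex);
}

void
ssl_release(void)
{
	pthread_mutex_lock(&ssl_config_mutex);
	if (ssl_open_connections > 0 && --ssl_open_connections == 0)
		ssl_lib_initialized = 0;	/* stand-in for freeing callbacks/context */
	pthread_mutex_unlock(&ssl_config_mutex);
}

int
ssl_connection_count(void)
{
	int			n;

	pthread_mutex_lock(&ssl_config_mutex);
	n = ssl_open_connections;
	pthread_mutex_unlock(&ssl_config_mutex);
	return n;
}
```

The getter takes the same mutex, so callers always observe a consistent 
counter value — the property the patch is after.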

//Magnus
*** src/backend/libpq/be-secure.c
--- src/backend/libpq/be-secure.c
***
*** 88,94  static DH  *tmp_dh_cb(SSL *s, int is_export, int keylength);
  static int	verify_cb(int, X509_STORE_CTX *);
  static void info_cb(const SSL *ssl, int type, int args);
  static void initialize_SSL(void);
- static void destroy_SSL(void);
  static int	open_server_SSL(Port *);
  static void close_SSL(Port *);
  static const char *SSLerrmessage(void);
--- 88,93 
***
*** 193,209  secure_initialize(void)
  }
  
  /*
-  *	Destroy global context
-  */
- void
- secure_destroy(void)
- {
- #ifdef USE_SSL
- 	destroy_SSL();
- #endif
- }
- 
- /*
   * Indicate if we have loaded the root CA store to verify certificates
   */
  bool
--- 192,197 
***
*** 844,862  initialize_SSL(void)
  }
  
  /*
-  *	Destroy global SSL context.
-  */
- static void
- destroy_SSL(void)
- {
- 	if (SSL_context)
- 	{
- 		SSL_CTX_free(SSL_context);
- 		SSL_context = NULL;
- 	}
- }
- 
- /*
   *	Attempt to negotiate SSL connection.
   */
  static int
--- 832,837 
*** src/interfaces/libpq/fe-secure.c
--- src/interfaces/libpq/fe-secure.c
***
*** 44,49 
--- 44,50 
  #endif
  #include <arpa/inet.h>
  #endif
+ 
  #include <sys/stat.h>
  
  #ifdef ENABLE_THREAD_SAFETY
***
*** 89,108  static bool verify_peer_name_matches_certificate(PGconn *);
  static int	verify_cb(int ok, X509_STORE_CTX *ctx);
  static int	client_cert_cb(SSL *, X509 **, EVP_PKEY **);
  static int	init_ssl_system(PGconn *conn);
  static int	initialize_SSL(PGconn *);
! static void destroy_SSL(void);
  static PostgresPollingStatusType open_client_SSL(PGconn *);
  static void close_SSL(PGconn *);
  static char *SSLerrmessage(void);
  static void SSLerrfree(char *buf);
- #endif
  
- #ifdef USE_SSL
  static bool pq_initssllib = true;
- 
  static SSL_CTX *SSL_context = NULL;
  #endif
  
  /*
   * Macros to handle disabling and then restoring the state of SIGPIPE handling.
   * Note that DISABLE_SIGPIPE() must appear at the start of a block.
--- 90,121 
  static int	verify_cb(int ok, X509_STORE_CTX *ctx);
  static int	client_cert_cb(SSL *, X509 **, EVP_PKEY **);
  static int	init_ssl_system(PGconn *conn);
+ static void destroy_ssl_system(void);
  static int	initialize_SSL(PGconn *);
! static void destroySSL(void);
  static PostgresPollingStatusType open_client_SSL(PGconn *);
  static void close_SSL(PGconn *);
  static char *SSLerrmessage(void);
  static void SSLerrfree(char *buf);
  
  static bool pq_initssllib = true;
  static SSL_CTX *SSL_context = NULL;
+ 
+ #ifdef ENABLE_THREAD_SAFETY
+ static int ssl_open_connections = 0;
+ 
+ #ifndef WIN32
+ static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
+ #else
+ static pthread_mutex_t ssl_config_mutex = NULL;
+ static long win32_ssl_create_mutex = 0;
  #endif
  
+ #endif	/* ENABLE_THREAD_SAFETY */
+ 
+ #endif /* SSL */
+ 
+ 
  /*
   * Macros to handle disabling and then restoring the state of SIGPIPE handling.
   * Note that DISABLE_SIGPIPE() must appear at the start of a block.
***
*** 186,192  void
  pqsecure_destroy(void)
  {
  #ifdef USE_SSL
! 	destroy_SSL();
  #endif
  }
  
--- 199,205 
  pqsecure_destroy(void)
  {
  #ifdef USE_SSL
! 	destroySSL();
  #endif
  }
  
***
*** 734,739  client_cert_cb(SSL *ssl, X509 **x509, EVP_PKEY **pkey)
--- 747,755 
  }
  
  #ifdef ENABLE_THREAD_SAFETY
+ /*
+  *	Callback functions for OpenSSL internal locking
+  */
  
  static unsigned long
  pq_threadidcallback(void)
***
*** 765,818  pq_lockingcallback(int mode, int n, const char *file, int line)
  #endif   /* ENABLE_THREAD_SAFETY */
  
  /*
!  * Also see similar code in fe-connect.c, default_threadlock()
   */
  static int
  init_ssl_system(PGconn *conn)
  {
  #ifdef ENABLE_THREAD_SAFETY
! #ifndef WIN32
! 	static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
! #else
! 	static pthread_mutex_t init_mutex = NULL;
! 	static long mutex_initlock = 0;
! 
! 	if (init_mutex == NULL)
  	{
! 		while 

Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 Hmm. It just occurred to me that I think this circumvented the anti-wraparound
 vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
 disable the skipping when autovacuum is triggered to prevent wraparound. 
 VACUUM
 FREEZE does that already, but it's unnecessarily aggressive in freezing.

Having seen how the anti-wraparound vacuums work in the field I think merely
replacing it with a regular vacuum which covers the whole table will not
actually work well.

What will happen is that, because nothing else is advancing the relfrozenxid,
the age of the relfrozenxid for all tables will advance until they all hit
autovacuum_max_freeze_age. Quite often all the tables were created around the
same time so they will all hit autovacuum_max_freeze_age at the same time.

So a database which was operating fine and receiving regular vacuums at a
reasonable pace will suddenly be hit by vacuums for every table all at the
same time, 3 at a time. If you don't have vacuum_cost_delay set that will
cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
the small busy tables from getting vacuumed regularly due to the backlog in
anti-wraparound vacuums.

Worse, vacuum will set the freeze_xid to nearly the same value for all of the
tables. So it will all happen again in another 100M transactions. And again in
another 100M transactions, and again...

I think there are several things which need to happen here.

1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
   means unnecessary full table vacuums long before they accomplish anything.

2) Include a factor which spreads out the anti-wraparound freezes in the
   autovacuum launcher. Some ideas:

. we could implicitly add random(vacuum_freeze_min_age) to the
  autovacuum_max_freeze_age. That would spread them out evenly over 100M
  transactions.

. we could check if another anti-wraparound vacuum is still running and
  implicitly add a vacuum_freeze_min_age penalty to the
  autovacuum_max_freeze_age for each running anti-wraparound vacuum. That
  would spread them out without introducing non-determinism, which
  seems better.

. we could leave autovacuum_max_freeze_age and instead pick a semi-random
  vacuum_freeze_min_age. This would mean the first set of anti-wraparound
  vacuums would still be synchronized but subsequent ones might be spread
  out somewhat. There's not as much room to randomize this though and it
  would affect how much i/o vacuum did which makes it seem less palatable
  to me.

3) I also think we need to put a clamp on the vacuum_cost_delay. Too many
   people are setting it to unreasonably high values which results in their
   vacuums never completing. Actually I think what we should do is junk all
   the existing parameters and replace it with a vacuum_nice_level or
   vacuum_bandwidth_cap from which we calculate the cost_limit and hide all
   the other parameters as internal parameters.
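
The second idea under point 2 can be sketched as a deterministic stagger 
(hypothetical names; this is not actual autovacuum launcher code): each 
anti-wraparound vacuum already running pushes the effective trigger 
threshold out by another vacuum_freeze_min_age, so tables that would all 
hit the wall together get spread out instead.

```c
#include <assert.h>

/* Hypothetical sketch of the deterministic stagger idea above, not real
 * autovacuum code: each running anti-wraparound vacuum adds a
 * freeze_min_age penalty to the threshold at which the next table
 * qualifies. */
static int
effective_max_freeze_age(int max_freeze_age,
						 int freeze_min_age,
						 int running_wraparound_vacuums)
{
	return max_freeze_age + freeze_min_age * running_wraparound_vacuums;
}

static int
needs_wraparound_vacuum(int table_xid_age,
						int max_freeze_age,
						int freeze_min_age,
						int running_wraparound_vacuums)
{
	return table_xid_age > effective_max_freeze_age(max_freeze_age,
													freeze_min_age,
													running_wraparound_vacuums);
}
```

With one anti-wraparound vacuum already running, a table just past the 
base threshold no longer qualifies, and gets picked up on a later cycle.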

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 Hmm. It just occurred to me that I think this circumvented the  
 anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid  
 anymore. We'll need to disable the skipping when autovacuum is triggered  
 to prevent wraparound. VACUUM FREEZE does that already, but it's  
 unnecessarily aggressive in freezing.

Heh :-)  Yes, this should be handled sanely, without having to invoke
FREEZE.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Magnus Hagander
Gregory Stark wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 
 Hmm. It just occurred to me that I think this circumvented the 
 anti-wraparound
 vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need 
 to
 disable the skipping when autovacuum is triggered to prevent wraparound. 
 VACUUM
 FREEZE does that already, but it's unnecessarily aggressive in freezing.
 
 Having seen how the anti-wraparound vacuums work in the field I think merely
 replacing it with a regular vacuum which covers the whole table will not
 actually work well.
 
 What will happen is that, because nothing else is advancing the relfrozenxid,
 the age of the relfrozenxid for all tables will advance until they all hit
 autovacuum_max_freeze_age. Quite often all the tables were created around the
 same time so they will all hit autovacuum_max_freeze_age at the same time.
 
 So a database which was operating fine and receiving regular vacuums at a
 reasonable pace will suddenly be hit by vacuums for every table all at the
 same time, 3 at a time. If you don't have vacuum_cost_delay set that will
 cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
 the small busy tables from getting vacuumed regularly due to the backlog in
 anti-wraparound vacuums.
 
 Worse, vacuum will set the freeze_xid to nearly the same value for all of the
 tables. So it will all happen again in another 100M transactions. And again in
 another 100M transactions, and again...
 
 I think there are several things which need to happen here.
 
 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
means unnecessary full table vacuums long before they accomplish anything.
 
 2) Include a factor which spreads out the anti-wraparound freezes in the
autovacuum launcher. Some ideas:
 
 . we could implicitly add random(vacuum_freeze_min_age) to the
   autovacuum_max_freeze_age. That would spread them out evenly over 100M
   transactions.
 
 . we could check if another anti-wraparound vacuum is still running and
   implicitly add a vacuum_freeze_min_age penalty to the
   autovacuum_max_freeze_age for each running anti-wraparound vacuum. That
   would spread them out without introducing non-determinism, which
   seems better.
 
 . we could leave autovacuum_max_freeze_age and instead pick a semi-random
   vacuum_freeze_min_age. This would mean the first set of anti-wraparound
   vacuums would still be synchronized but subsequent ones might be spread
   out somewhat. There's not as much room to randomize this though and it
   would affect how much i/o vacuum did which makes it seem less palatable
   to me.

How about a way to say that only one (or a config parameter for n) of
the autovac workers can be used for anti-wraparound vacuum? Then the
other slots would still be available for the
small-but-frequently-updated tables.



 3) I also think we need to put a clamp on the vacuum_cost_delay. Too many
people are setting it to unreasonably high values which results in their
vacuums never completing. Actually I think what we should do is junk all
the existing parameters and replace it with a vacuum_nice_level or
vacuum_bandwidth_cap from which we calculate the cost_limit and hide all
the other parameters as internal parameters.

It would certainly be helpful if it was just a single parameter - the
arbitraryness of the parameters there now make them pretty hard to set
properly - or at least easy to set wrong.


//Magnus



Re: [HACKERS] snapshot leak and core dump with serializable transactions

2008-12-03 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 2. In CommitTransaction(), I think we should call AtEOXact_Snapshot *before*
 releasing the resource owners.

That's absolutely wrong.  It'll complain about whatever snapshots the
owners still hold.

regards, tom lane



[HACKERS] tuplestore potential performance problem

2008-12-03 Thread Hitoshi Harada
While attacking this issue(*1), I found that a tuplestore that has
spilled to file has a potential performance problem.

The performance problem introduced by Heikki's new approach was caused
by BufFile's frequent buffer flushes in access patterns where you put a
new row in, read a row from the middle, then put another row in, and so
on. When the tuplestore switches its internal mode from TSS_WRITEFILE to
TSS_READFILE, the underlying BufFile seeks to the read pointer and
flushes its dirty buffer if the read pointer is not near the write
pointer. Switching from reading back to writing also forfeits the
benefit of the OS disk cache.

This is not critical in TSS_INMEM.

So I decided to keep writing until the end once the tuplestore switches
from memory mode to file mode, rather than switching between reading and
writing randomly, which almost recovers the earlier performance. I am
not sure, but I am afraid that nodeCtescan uses similar logic. Doesn't
CTE have a problem with large data sets?

Regards,

*1:http://archives.postgresql.org/pgsql-hackers/2008-12/msg00077.php


-- 
Hitoshi Harada



Re: [HACKERS] tuplestore potential performance problem

2008-12-03 Thread Tom Lane
Hitoshi Harada [EMAIL PROTECTED] writes:
 While attacking this issue(*1), I found that a tuplestore that has
 spilled to file has a potential performance problem.

 The performance problem introduced by Heikki's new approach was caused
 by BufFile's frequent buffer flushes in access patterns where you put a
 new row in, read a row from the middle, then put another row in, and so
 on. When the tuplestore switches its internal mode from TSS_WRITEFILE
 to TSS_READFILE, the underlying BufFile seeks to the read pointer and
 flushes its dirty buffer if the read pointer is not near the write
 pointer. Switching from reading back to writing also forfeits the
 benefit of the OS disk cache.

 This is not critical in TSS_INMEM.

 So I decided to keep writing until the end once the tuplestore switches
 from memory mode to file mode, rather than switching between reading
 and writing randomly, which almost recovers the earlier performance. I
 am not sure, but I am afraid that nodeCtescan uses similar logic.
 Doesn't CTE have a problem with large data sets?

If this means a lot of contortion/complication in the upper-level code,
seems like it'd be better to address the performance issue within
tuplestore/buffile.  We could keep separate buffers for write and read
perhaps.  But do you have real evidence of a performance problem?
I'd sort of expect the kernel's disk cache to mitigate this pretty well.
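
The separate read/write buffer idea can be illustrated with a toy store 
that keeps independent read and write positions over one backing array, so 
interleaved puts and gets never force a flush-and-seek. This is a sketch of 
the access pattern only; the real BufFile operates on temporary files with 
its own 8 kB buffering.

```c
#include <assert.h>
#include <string.h>

/* Toy illustration of independent read and write positions: interleaved
 * writes and reads never disturb each other's position, analogous to
 * giving BufFile separate buffers for each direction. */
#define STORE_SIZE 1024

typedef struct
{
	char		data[STORE_SIZE];
	long		read_pos;		/* next byte to read */
	long		write_pos;		/* next byte to write (end of data) */
} DualPtrStore;

static void
store_put(DualPtrStore *s, const char *buf, long len)
{
	memcpy(s->data + s->write_pos, buf, len);
	s->write_pos += len;
}

static long
store_get(DualPtrStore *s, char *buf, long len)
{
	long		avail = s->write_pos - s->read_pos;

	if (len > avail)
		len = avail;
	memcpy(buf, s->data + s->read_pos, len);
	s->read_pos += len;
	return len;
}
```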

regards, tom lane



Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-03 Thread Simon Riggs

On Wed, 2008-12-03 at 21:37 +0900, Fujii Masao wrote:

 Since I thought that the figure was more intelligible for some people
 than my poor English, I illustrated the architecture first.
 http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Detailed_Design
 
 Are there any other parts which should be illustrated for review?

Those are very useful, thanks.

Some questions to check my understanding (expected answers in brackets)

* Diagram on p.2 has two Archives. We have just one (yes)

* We send data continuously, whether or not we are in sync/async? (yes)
So the only difference between sync/async is whether we wait when we
flush the commit? (yes)

* If we have synchronous_commit = off do we ignore
synchronous_replication = on (yes)

* If two transactions commit almost simultaneously and one is sync and
the other async then only the sync backend will wait? (Yes)


Do we definitely need the archiver to move the files written by
walreceiver to archive and then move them back out again? Seems like we
can streamline that part in many (all?) cases.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 If we do this though it would be really nice to do it at a higher  
 level than the indexam. If we could do it for any indexam that  
 provides a kind of bulk insert method that would be great.

 I'm just not sure how to support all the indexable operators for the  
 various indexams on the local buffered list.

In principle, just return all those TIDs marked lossy, please recheck.
This is a bit brute-force but I'm not sure any useful optimization is
possible.

regards, tom lane



Re: [HACKERS] tuplestore potential performance problem

2008-12-03 Thread Hitoshi Harada
 I don't have hard evidence, only reasoning; no strace was done. So it
 may not be caused by the flushing, but this commit improves performance
 considerably, back to the earlier patch's level: around 44 sec, down
 from around 76 sec.


Oh, I mean, 116sec to 44sec.


-- 
Hitoshi Harada



Re: [HACKERS] tuplestore potential performance problem

2008-12-03 Thread Hitoshi Harada
2008/12/3 Tom Lane [EMAIL PROTECTED]:
 If this means a lot of contortion/complication in the upper-level code,
 seems like it'd be better to address the performance issue within
 tuplestore/buffile.  We could keep separate buffers for write and read
 perhaps.  But do you have real evidence of a performance problem?
 I'd sort of expect the kernel's disk cache to mitigate this pretty well.

regards, tom lane

I don't have hard evidence, only reasoning; no strace was done. So it
may not be caused by the flushing, but this commit improves performance
considerably, back to the earlier patch's level: around 44 sec, down
from around 76 sec.

http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=87d9b8ac5dca9fae5f3ac4f3218d8fb4eca8b5b0;hp=f1976a9d002b20006ac31ca85db27abcf56e9ea2

where pos = -1 means spool all rows until the end.

The earlier approach buffered the whole table; Heikki's newer approach
buffered row by row while reading. The newest approach buffers row by
row while still in memory and, once spilled to file, holds all the
remaining tuples until writing is finished before reading, something
like a hybrid method.

Regards,

-- 
Hitoshi Harada



Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Heikki Linnakangas

Tom Lane wrote:

Greg Stark [EMAIL PROTECTED] writes:
If we do this though it would be really nice to do it at a higher  
level than the indexam. If we could do it for any indexam that  
provides a kind of bulk insert method that would be great.


I'm just not sure how to support all the indexable operators for the  
various indexams on the local buffered list.


In principle, just return all those TIDs marked lossy, please recheck.
This is a bit brute-force but I'm not sure any useful optimization is
possible.


You could flush the local buffer to the index whenever the index is 
queried. Not sure if it's better than returning them for recheck, though.


This wouldn't work for unique indexes, BTW, but that's not a problem for 
GIN.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Heikki Linnakangas

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:


Hmm. It just occurred to me that I think this circumvented the anti-wraparound
vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
FREEZE does that already, but it's unnecessarily aggressive in freezing.


FWIW, it seems the omission is actually the other way 'round. Autovacuum 
always forces a full-scanning vacuum, making the visibility map useless 
for autovacuum. This obviously needs to be fixed.



What will happen is that, because nothing else is advancing the relfrozenxid,
the age of the relfrozenxid for all tables will advance until they all hit
autovacuum_max_freeze_age. Quite often all the tables were created around the
same time so they will all hit autovacuum_max_freeze_age at the same time.

So a database which was operating fine and receiving regular vacuums at a
reasonable pace will suddenly be hit by vacuums for every table all at the
same time, 3 at a time. If you don't have vacuum_cost_delay set that will
cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
the small busy tables from getting vacuumed regularly due to the backlog in
anti-wraparound vacuums.

Worse, vacuum will set the freeze_xid to nearly the same value for all of the
tables. So it will all happen again in another 100M transactions. And again in
another 100M transactions, and again...


But we already have that problem, don't we? When you initially load your 
database, all tuples will have the same xmin, and all tables will have 
more or less the same relfrozenxid. I guess you can argue that it 
becomes more obvious if vacuums are otherwise cheaper, but I don't think 
the visibility map makes that much difference to suddenly make this 
issue urgent.


Agreed that it would be nice to do something about it, though.


I think there are several things which need to happen here.

1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
   means unnecessary full table vacuums long before they accomplish anything.


It allows you to truncate clog. If I did my math right, 200M 
transactions amounts to ~50MB of clog. Perhaps we should still raise it, 
disk space is cheap after all.
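
That arithmetic can be checked directly, assuming 2 bits of commit status 
per transaction in clog (a back-of-the-envelope sketch, not real clog code):

```c
#include <assert.h>

/* Back-of-the-envelope check of the clog size estimate above: each
 * transaction's commit status occupies 2 bits, so N transactions take
 * N * 2 / 8 bytes of clog before truncation. */
static long
clog_bytes(long xacts)
{
	return xacts * 2 / 8;
}
```

200M transactions come to 50,000,000 bytes, i.e. the ~50MB quoted.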


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:

 Hmm. It just occurred to me that I think this circumvented the 
 anti-wraparound
 vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need 
 to
 disable the skipping when autovacuum is triggered to prevent wraparound. 
 VACUUM
 FREEZE does that already, but it's unnecessarily aggressive in freezing.

 FWIW, it seems the omission is actually the other way 'round. Autovacuum 
 always
 forces a full-scanning vacuum, making the visibility map useless for
 autovacuum. This obviously needs to be fixed.

How does it do that? Is there some option in the VacStmt to control this? Do
we just need a syntax to set that option?


How easy is it to tell what percentage of the table needs to be vacuumed? If
it's > 50% perhaps it would make sense to scan the whole table? (Hm. Not
really if it's a contiguous 50% though...)

Another idea: Perhaps each page of the visibility map should have a frozenxid
(or multiple frozenxids?). Then if an individual page of the visibility map is
old we could force scanning all the heap pages covered by that map page and
update it. I'm not sure we can do that safely though without locking issues --
or is it ok because it's vacuum doing the updating?

 Worse, vacuum will set the freeze_xid to nearly the same value for all of the
 tables. So it will all happen again in another 100M transactions. And again 
 in
 another 100M transactions, and again...

 But we already have that problem, don't we? When you initially load your
 database, all tuples will have the same xmin, and all tables will have more or
 less the same relfrozenxid. I guess you can argue that it becomes more obvious
 if vacuums are otherwise cheaper, but I don't think the visibility map makes
 that much difference to suddenly make this issue urgent.

We already have that problem but it only bites in a specific case: if you have
no other vacuums being triggered by the regular dead tuple scale factor. The
normal case is intended to be that autovacuum triggers much more frequently
than every 100M transactions to reduce bloat.

However in practice this specific case does seem to arise rather alarmingly
easy. Most databases do have some large tables which are never deleted from or
updated. Also, the default scale factor of 20% is actually quite easy to never
reach if your tables are also growing quickly -- effectively moving the
goalposts further out as fast as the updates and deletes bloat the table.

The visibility map essentially widens this specific use case to cover *all*
tables. Since the relfrozenxid would never get advanced by regular vacuums the
only time it would get advanced is when they all hit the 200M wall
simultaneously.

 Agreed that it would be nice to do something about it, though.

 I think there are several things which need to happen here.

 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
means unnecessary full table vacuums long before they accomplish anything.

 It allows you to truncate clog. If I did my math right, 200M transactions
 amounts to ~50MB of clog. Perhaps we should still raise it, disk space is 
 cheap
 after all.

Ah. Hm. Then perhaps this belongs in the realm of the config generator people
are working on. They'll need a dial to say how much disk space you expect your
database to take in addition to how much memory your machine has available.
50M is nothing for a 1TB database but it's kind of silly to have to keep
hundreds of megs of clogs on a 1MB database.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] cvs head initdb hangs on unixware

2008-12-03 Thread Andrew Dunstan



[EMAIL PROTECTED] wrote:


Looking at fsm_rebuild_page, I wonder if the compiler is treating 
int as an unsigned integer? That would cause an infinite loop.



No, a simple printf of nodeno shows it starting at 4096, going all the 
way down to 0, then starting back at 4096...


I wonder if the leftchild/rightchild definitions have something to do with 
it...


With probably no relevance at all, I notice that this routine is 
declared extern, although it is only referenced in its own file 
apparently. Don't we have a tool that checks that?


cheers

andrew



Re: [HACKERS] cvs head initdb hangs on unixware

2008-12-03 Thread Heikki Linnakangas

[EMAIL PROTECTED] wrote:

On Tue, 2 Dec 2008, Heikki Linnakangas wrote:


Date: Tue, 02 Dec 2008 20:47:19 +0200
From: Heikki Linnakangas [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Zdenek Kotala [EMAIL PROTECTED],
pgsql-hackers list pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

[EMAIL PROTECTED] wrote:

Suivi de pile correspondant à p1, Programme postmaster
*[0] fsm_rebuild_page( présumé: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97]
 [1] fsm_search_avail( présumé: 0x2, 0x6, 0x1)  [0x81e68d9]
 [2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 
0x8047416, 0xb4) [0x81e6385]
 [3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) 
[0x81e5a00]

 [4] RelationGetBufferForTuple( présumé: 0x84b2250, 0xb4, 0) [0x8099b59]
 [5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042]
 [6] simple_heap_insert( présumé: 0x84b2250, 0x853a338, 0x853a310) 
[0x8097297]

 [7] InsertOneTuple( présumé: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210]
 [8] boot_yyparse( présumé: 0x, 0x3, 0x8047ab8) [0x80c822b]
 [9] BootstrapModeMain( présumé: 0x66, 0x8454600, 0x4)  [0x80ca233]
 [10] AuxiliaryProcessMain(0x4, 0x8047ab4)  [0x80cab3b]
 [11] main(0x4, 0x8047ab4, 0x8047ac8)   [0x8177dce]
 [12] _start()  [0x807ff96]

seems interesting!

We've had problems already with unixware optimizer, hope this one is 
fixable!


Looking at fsm_rebuild_page, I wonder if the compiler is treating 
int as an unsigned integer? That would cause an infinite loop.


No, a simple printf of nodeno shows it starting at 4096, going all the 
way down to 0, then starting back at 4096...


Hmm, it's probably looping in fsm_search_avail then. In a fresh cluster, 
there shouldn't be any broken FSM pages that need rebuilding.


I'd like to see what the FSM page in question looks like. Could you try 
to run initdb with -d -n options? I bet you'll get an infinite number 
of lines like:


DEBUG: fixing corrupt FSM block 1, relation 123/456/789

Could you zip up the FSM file of that relation (a file called e.g. 
789_fsm) and send it over? Or the whole data directory; it shouldn't 
be that big.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 Looks like I need to add Python 2.5+Linux to my testing set.  I did not
 expect that the UNIX distributions of Python 2.5 would ship with wintypes.py
 at all.  I think I can fix this on the spot though.  On line 40, you'll find
 this bit:

 except ImportError:

 Change that to the following:

 except ImportError,ValueError:

That didn't work, same error message.

 And it should pass that point.  If it doesn't, you can try the completely
 general:

 except:

That worked.

The settings that this initially spit out (I guess it defaults to
mixed mode) didn't look too sane to me, because as discussed
elsewhere on this thread 50 is not a reasonable value for
default_statistics_target for my installation.  It also wanted to set
constraint_exclusion to on, which I'm pretty confident is useless.

Then I tried -T web and got what seemed like a more reasonable set
of values.  But I wasn't sure I needed that many connections, so I
added -c 150 to see how much difference that made.  Kaboom!

$ ./pgtune -i ~postgres/data/postgresql.conf -T web -c 150
Traceback (most recent call last):
  File "./pgtune", line 463, in <module>
wizardTune(config,options,settings)
  File "./pgtune", line 403, in wizardTune
'web':mem/con, 'oltp':mem/con,'dw':mem/con/2,
TypeError: unsupported operand type(s) for /: 'int' and 'str'

I'm not sure what mixed mode is supposed to be, but based on what
I've seen so far, I'm skeptical of the idea that encouraging people
to raise default_statistics_target to 50 and turn on
constraint_exclusion is reasonable.  I'm also a bit surprised that
there doesn't seem to be anything here that depends on the size of the
database, even order-of-magnitude.  It seems like the right value for
checkpoint_segments, at least, might depend on that.

...Robert



Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Heikki Linnakangas

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Hmm. It just occurred to me that I think this circumvented the anti-wraparound
vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
FREEZE does that already, but it's unnecessarily aggressive in freezing.

FWIW, it seems the omission is actually the other way 'round. Autovacuum always
forces a full-scanning vacuum, making the visibility map useless for
autovacuum. This obviously needs to be fixed.


How does it do that? Is there some option in the VacStmt to control this? Do
we just need a syntax to set that option?


The way it works now is that if VacuumStmt->freeze_min_age is not -1 
(which means use the default), the visibility map is not used and the 
whole table is scanned. Autovacuum always sets freeze_min_age, so it's 
never using the visibility map. Attached is a patch I'm considering to 
fix that.



How easy is it to tell what percentage of the table needs to be vacuumed? If
it's > 50% perhaps it would make sense to scan the whole table? (Hm. Not
really if it's a contiguous 50% though...)


Hmm. You could scan the visibility map to see how much you could skip by 
using it. You could account for contiguity.



Another idea: Perhaps each page of the visibility map should have a frozenxid
(or multiple frozenxids?). Then if an individual page of the visibility map is
old we could force scanning all the heap pages covered by that map page and
update it. I'm not sure we can do that safely though without locking issues --
or is it ok because it's vacuum doing the updating?


We discussed that a while ago:

http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

Tom was concerned about making the visibility map not just a hint but 
critical data. Rightly so. This is certainly 8.5 stuff; perhaps it would 
be more palatable after we get the index-only-scans working using the 
visibility map, since the map would be critical data anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index fd2429a..3e3cb9d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -171,10 +171,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	vacrelstats->hasindex = (nindexes > 0);
 
 	/* Should we use the visibility map or scan all pages? */
-	if (vacstmt->freeze_min_age != -1)
-		scan_all = true;
-	else
-		scan_all = false;
+	scan_all = vacstmt->scan_all;
  
 	/* Do the vacuuming */
 	lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index eb7ab4d..2781f6e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2771,6 +2771,7 @@ _copyVacuumStmt(VacuumStmt *from)
 	COPY_SCALAR_FIELD(analyze);
 	COPY_SCALAR_FIELD(verbose);
 	COPY_SCALAR_FIELD(freeze_min_age);
+	COPY_SCALAR_FIELD(scan_all);
 	COPY_NODE_FIELD(relation);
 	COPY_NODE_FIELD(va_cols);
 
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index d4c57bb..86a032f 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1436,6 +1436,7 @@ _equalVacuumStmt(VacuumStmt *a, VacuumStmt *b)
 	COMPARE_SCALAR_FIELD(analyze);
 	COMPARE_SCALAR_FIELD(verbose);
 	COMPARE_SCALAR_FIELD(freeze_min_age);
+	COMPARE_SCALAR_FIELD(scan_all);
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_NODE_FIELD(va_cols);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 85f4616..1aab75c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -5837,6 +5837,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 	n->analyze = false;
 	n->full = $2;
 	n->freeze_min_age = $3 ? 0 : -1;
+	n->scan_all = $3;
 	n->verbose = $4;
 	n->relation = NULL;
 	n->va_cols = NIL;
@@ -5849,6 +5850,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 	n->analyze = false;
 	n->full = $2;
 	n->freeze_min_age = $3 ? 0 : -1;
+	n->scan_all = $3;
 	n->verbose = $4;
 	n->relation = $5;
 	n->va_cols = NIL;
@@ -5860,6 +5862,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 	n->vacuum = true;
 	n->full = $2;
 	n->freeze_min_age = $3 ? 0 : -1;
+	n->scan_all = $3;
 	n->verbose |= $4;
 	$$ = (Node *)n;
 }
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 8d8947f..2c68779 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2649,6 +2649,7 @@ autovacuum_do_vac_analyze(autovac_table *tab,
 	vacstmt.full = false;
 	vacstmt.analyze = tab->at_doanalyze;
 	vacstmt.freeze_min_age = tab->at_freeze_min_age;
+	vacstmt.scan_all = tab->at_wraparound;
 	vacstmt.verbose = false;
 	vacstmt.relation = NULL;	

Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Joshua D. Drake
On Wed, 2008-12-03 at 13:30 -0500, Robert Haas wrote:
  Looks like I need to add Python 2.5+Linux to my testing set.  I did not
  expect that the UNIX distributions of Python 2.5 would ship with wintypes.py
  at all.  I think I can fix this on the spot though.  On line 40, you'll find
  this bit:
 
  except ImportError:
 
  Change that to the following:
 
  except ImportError,ValueError:
 
 That didn't work, same error message.
 
  And it should pass that point.  If it doesn't, you can try the completely
  general:
 
  except:
 
 That worked.
 
 The settings that this initially spit out (I guess it defaults to
 mixed mode) didn't look too sane to me, because as discussed
 elsewhere on this thread 50 is not a reasonable value for
 default_statistics_target for my installation.  It also wanted to set
 constraint_exclusion to on, which I'm pretty confident is useless.
 
 Then I tried -T web and got what seemed like a more reasonable set
 of values.  But I wasn't sure I needed that many connections, so I
 added -c 150 to see how much difference that made.  Kaboom!
 
 $ ./pgtune -i ~postgres/data/postgresql.conf -T web -c 150
 Traceback (most recent call last):
    File "./pgtune", line 463, in <module>
  wizardTune(config,options,settings)
    File "./pgtune", line 403, in wizardTune
 'web':mem/con, 'oltp':mem/con,'dw':mem/con/2,
 TypeError: unsupported operand type(s) for /: 'int' and 'str'
 
 I'm not sure what mixed mode is supposed to be, but based on what
 I've seen so far, I'm skeptical of the idea that encouraging people
 to raise default_statistics_target to 50 and turn on
 constraint_exclusion is reasonable.

Why?

   I'm also a bit surprised that
 there doesn't seem to be anything here that depends on the size of the
 database, even order-of-magnitude.  It seems like the right value for
 checkpoint_segments, at least, might depend on that.

What does checkpoint_segments have to do with the size of the database?

Joshua D. Drake




 
 ...Robert
 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [HACKERS] Transactions and temp tables

2008-12-03 Thread Emmanuel Cecchet
I would really like to have support for temp tables at least for the 
case where the table is created and dropped in the same transaction. But 
I guess that the other limitations on indexes, sequences and views would 
still hold, right?


manu

Heikki Linnakangas wrote:

Emmanuel Cecchet wrote:
There is a problem with temp tables with on delete rows that are 
created inside a transaction.
Take the 2pc_on_delete_rows_transaction.sql test case and change the 
creation statement, instead of

create temp table foo(x int) on commit delete rows;
try
create temp table foo(x serial primary key) on commit delete rows;

The test will fail. It looks like the onCommit field is not properly 
updated when serial or primary key is used in that context. I did not 
figure out why.

A serial column uses a sequence behind the scenes.

Hmm. Seems like we would need to treat sequences and indexes the same 
as tables with ON COMMIT DELETE ROWS, i.e release the locks early and 
don't error out.


All in all, this is getting pretty messy. My patch felt a bit hackish 
to begin with, and having to add special cases for sequences and 
indexes would make it even more so. And what about temporary views? 
I'm starting to feel that instead of special-casing temp relations, we 
need to move into the opposite direction and make temp relations more 
like regular relations. Unfortunately, that's not going to happen in 
the 8.4 timeframe :-(. Let's try the other approach in 8.5.





--
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development & Consulting

--
Web: http://www.frogthinker.org
email: [EMAIL PROTECTED]
Skype: emmanuel_cecchet




Re: [HACKERS] snapshot leak and core dump with serializable transactions

2008-12-03 Thread Alvaro Herrera
Pavan Deolasee escribió:
 On Wed, Dec 3, 2008 at 7:42 PM, Tom Lane [EMAIL PROTECTED] wrote:
 
  That's absolutely wrong.  It'll complain about whatever snapshots the
  owners still hold.

 You must be right; I don't understand that code much. But don't we expect
 the snapshots to be cleanly released at that point, and if not, we flash
 warnings anyway? AtEOXact_Snapshot only unregisters the serialized
 snapshot, which otherwise the resource owner release would complain about.

Yeah, we need two at-commit routines, one of which needs to be called
early.  I'm prepping a patch.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] snapshot leak and core dump with serializable transactions

2008-12-03 Thread Alvaro Herrera
Alvaro Herrera escribió:

 Yeah, we need two at-commit routines, one of which needs to be called
 early.  I'm prepping a patch.

Here it is ... the large object patch is also included.  I've created
new functions to specify the resource owner to register a snapshot in;
now that there are two callers, it seems likely that there will be more
in the future.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Index: src/backend/access/transam/xact.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.269
diff -c -p -r1.269 xact.c
*** src/backend/access/transam/xact.c	19 Nov 2008 10:34:50 -	1.269
--- src/backend/access/transam/xact.c	3 Dec 2008 19:30:35 -
*** CommitTransaction(void)
*** 1667,1672 
--- 1667,1675 
  	/* Clean up the relation cache */
  	AtEOXact_RelationCache(true);
  
+ 	/* Clean up the snapshot manager */
+ 	AtEarlyCommit_Snapshot();
+ 
  	/*
  	 * Make catalog changes visible to all backends.  This has to happen after
  	 * relcache references are dropped (see comments for
*** PrepareTransaction(void)
*** 1906,1911 
--- 1909,1917 
  	/* Clean up the relation cache */
  	AtEOXact_RelationCache(true);
  
+ 	/* Clean up the snapshot manager */
+ 	AtEarlyCommit_Snapshot();
+ 
  	/* notify and flatfiles don't need a postprepare call */
  
  	PostPrepare_PgStat();
Index: src/backend/storage/large_object/inv_api.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/large_object/inv_api.c,v
retrieving revision 1.135
diff -c -p -r1.135 inv_api.c
*** src/backend/storage/large_object/inv_api.c	2 Nov 2008 01:45:28 -	1.135
--- src/backend/storage/large_object/inv_api.c	3 Dec 2008 18:59:43 -
*** inv_open(Oid lobjId, int flags, MemoryCo
*** 247,253 
  	}
 	else if (flags & INV_READ)
 	{
! 		retval->snapshot = RegisterSnapshot(GetActiveSnapshot());
 		retval->flags = IFS_RDLOCK;
  	}
  	else
--- 247,254 
  	}
 	else if (flags & INV_READ)
 	{
! 		retval->snapshot = RegisterSnapshotOnOwner(GetActiveSnapshot(),
!    TopTransactionResourceOwner);
 		retval->flags = IFS_RDLOCK;
  	}
  	else
*** void
*** 270,277 
  inv_close(LargeObjectDesc *obj_desc)
  {
  	Assert(PointerIsValid(obj_desc));
 	if (obj_desc->snapshot != SnapshotNow)
! 		UnregisterSnapshot(obj_desc->snapshot);
  	pfree(obj_desc);
  }
  
--- 271,281 
  inv_close(LargeObjectDesc *obj_desc)
  {
  	Assert(PointerIsValid(obj_desc));
+ 
 	if (obj_desc->snapshot != SnapshotNow)
! 		UnregisterSnapshotFromOwner(obj_desc->snapshot,
! 	TopTransactionResourceOwner);
! 
  	pfree(obj_desc);
  }
  
Index: src/backend/utils/time/snapmgr.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/utils/time/snapmgr.c,v
retrieving revision 1.7
diff -c -p -r1.7 snapmgr.c
*** src/backend/utils/time/snapmgr.c	25 Nov 2008 20:28:29 -	1.7
--- src/backend/utils/time/snapmgr.c	3 Dec 2008 19:28:50 -
*** GetTransactionSnapshot(void)
*** 136,142 
  		 */
  		if (IsXactIsoLevelSerializable)
  		{
! 			CurrentSnapshot = RegisterSnapshot(CurrentSnapshot);
  			registered_serializable = true;
  		}
  
--- 136,143 
  		 */
  		if (IsXactIsoLevelSerializable)
  		{
! 			CurrentSnapshot = RegisterSnapshotOnOwner(CurrentSnapshot,
! 	  TopTransactionResourceOwner);
  			registered_serializable = true;
  		}
  
*** ActiveSnapshotSet(void)
*** 345,351 
  
  /*
   * RegisterSnapshot
!  * 		Register a snapshot as being in use
   *
   * If InvalidSnapshot is passed, it is not registered.
   */
--- 346,352 
  
  /*
   * RegisterSnapshot
!  * 		Register a snapshot as being in use by the current resource owner
   *
   * If InvalidSnapshot is passed, it is not registered.
   */
*** RegisterSnapshot(Snapshot snapshot)
*** 371,376 
--- 372,396 
  }
  
  /*
+  * As above, but register it on a specific resource owner
+  */
+ Snapshot
+ RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
+ {
+ 	Snapshot		retval;
+ 	ResourceOwner	save_CurrentResourceOwner;
+ 
+ 	save_CurrentResourceOwner = CurrentResourceOwner;
+ 	CurrentResourceOwner = owner;
+ 
+ 	retval = RegisterSnapshot(snapshot);
+ 
+ 	CurrentResourceOwner = save_CurrentResourceOwner;
+ 
+ 	return retval;
+ }
+ 
+ /*
   * UnregisterSnapshot
   *
   * Decrement the reference count of a snapshot, remove the corresponding
*** UnregisterSnapshot(Snapshot snapshot)
*** 395,400 
--- 415,433 
  	}
  }
  
+ void
+ UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner)
+ {
+ 	ResourceOwner	save_CurrentResourceOwner;
+ 
+ 	save_CurrentResourceOwner = 

Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 I'm not sure what mixed mode is supposed to be, but based on what
 I've seen so far, I'm skeptical of the idea that encouraging people
 to raise default_statistics_target to 50 and turn on
 constraint_exclusion is reasonable.

 Why?

Because both of those settings are strictly worse for my database than
the defaults.  I don't have any partitioned tables, and see:

http://archives.postgresql.org/pgsql-hackers/2008-11/msg01837.php

   I'm also a bit surprised that
 there doesn't seem to be anything here that depends on the size of the
 database, even order-of-magnitude.  It seems like the right value for
 checkpoint_segments, at least, might depend on that.

 What does checkpoint_segments have to do with the size of the database?

It seems unlikely that you would want 256 MB of checkpoint segments on
a database that is only 100 MB (or even 500 MB).  But you might very
well want that on a database that is 1 TB.

...Robert



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 On Wed, 2008-12-03 at 13:30 -0500, Robert Haas wrote:
 I'm not sure what mixed mode is supposed to be, but based on what
 I've seen so far, I'm skeptical of the idea that encouraging people
 to raise default_statistics_target to 50 and turn on
 constraint_exclusion is reasonable.

 Why?

Well did you have any response to what I posited before? I said mixed should
produce the same settings that the default initdb settings produce. At least
on a moderately low-memory machine that initdb targets.

It sure seems strange to me to have initdb which presumably is targeting a
mixed system -- where it doesn't know for sure what workload will be run --
produce a different set of values than the tuner on the same machine.

   I'm also a bit surprised that
 there doesn't seem to be anything here that depends on the size of the
 database, even order-of-magnitude.  It seems like the right value for
 checkpoint_segments, at least, might depend on that.

 What does checkpoint_segments have to do with the size of the database?

I had the same reaction but I think he's right.

checkpoint_segments is the maximum amount of space you want the WAL to take up
(ideally). Presumably on a small database you don't want hundreds of megabytes
of WAL for a 10M database. But on a terabyte data warehouse sitting on a big
SAN you're not going to be concerned with how much space the WAL files are
taking. In fact, really it would be nice if we allowed units of space (MB, GB,
etc) for checkpoint_segments.

I used to think of checkpoint_segments in terms of transaction rate and
maximum tolerable recovery time but really if those are your constraints
you're better off using checkpoint_timeout I think.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 Well did you have any response to what I posited before? I said mixed should
 produce the same settings that the default initdb settings produce. At least
 on a moderately low-memory machine that initdb targets.

I'm actually really skeptical of this whole idea of modes.  The main
thing mode does, other than set max connections, is handle desktop
differently than other modes by decreasing shared_buffers and
effective_cache_size by 4x and work_mem by 3x.  And the default
settings for max_connections are a SWAG that could easily be way off
for any particular installation.  I think it would be more useful to
get rid of modes, accept that the user is going to have to specify
max_connections if the default of, say, 100 is not reasonable, and
handle the desktop case by telling the user to rerun the tool
overriding the system memory with a lower value.

I'm not sure if you've thought about this, but there is also a
difference between max_connections and maximum LIKELY connections.
For example my apps don't have too many users, since they are
internal-facing.  But setting max_connections to 100 gives me a nice
buffer just in case everyone decides to log on at once.  Still, for
performance reasons, I'd prefer to calculate based on a more likely
scenario, where the concurrent user count might be only 10 or 20.

...Robert



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 I can see an argument about constraint_exclusion but
 default_statistics_target I don't.

Why not?  I don't want to accept a big increase in ANALYZE times (or
planning times, though I'm really not seeing that at this point)
without some benefit.

 It seems unlikely that you would want 256 MB of checkpoint segments on
 a database that is only 100 MB (or even 500 MB).  But you might very
 well want that on a database that is 1 TB.

 It also seems unlikely that you would hit 256MB of checkpoint segments
 on a 100MB database before checkpoint_timeout and if you did, you
 certainly did need them.

So why do we have this parameter at all?

...Robert



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 It also seems unlikely that you would hit 256MB of checkpoint segments
 on a 100MB database before checkpoint_timeout and if you did, you
 certainly did need them.

 Remember PostgreSQL only creates the segments when it needs them. 

Should we change the initdb output then?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 If you are concerned about the analyze time between 10, 50 and 150, I
 would suggest that you are concerned about the wrong things. Remember

I can't rule that out.  What things do you think I should be concerned
about?  ISTM that default_statistics_target trades off ANALYZE time
and query planning time vs. the possibility of better plans.  If the
former considerations are not an issue for dst = 50, then maybe we
should emit 50 by default.  But the limited evidence that has been
published in this forum thus far doesn't support that contention.

  It also seems unlikely that you would hit 256MB of checkpoint segments
  on a 100MB database before checkpoint_timeout and if you did, you
  certainly did need them.

 So why do we have this parameter at all?

 Excellent question, for a different thread :)

I think the rhetorical answer is so that we don't fill up the disk,
which gets us back to database size.

...Robert



Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes:

 Heikki Linnakangas [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
means unnecessary full table vacuums long before they accomplish 
 anything.

 It allows you to truncate clog. If I did my math right, 200M transactions
 amounts to ~50MB of clog. Perhaps we should still raise it, disk space is 
 cheap
 after all.

Hm, the more I think about it the more this bothers me. It's another subtle
change from the current behaviour. 

Currently *every* vacuum tries to truncate the clog. So you're constantly
trimming off a little bit.

With the visibility map (assuming you fix it not to do full scans all the
time) you can never truncate the clog just as you can never advance the
relfrozenxid unless you do a special full-table vacuum.

I think in practice most people had a read-only table somewhere in their
database which prevented the clog from ever being truncated anyways, so
perhaps this isn't such a big deal.

But the bottom line is that the anti-wraparound vacuums are going to be a lot
more important and much more visible now than they were in the past.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Joshua D. Drake
On Wed, 2008-12-03 at 17:33 -0500, Robert Haas wrote:
 On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
  If you are concerned about the analyze time between 10, 50 and 150, I
  would suggest that you are concerned about the wrong things. Remember
 
 I can't rule that out.  What things do you think I should be concerned
 about?

Your database's production performance with the changed parameter.
More often than not, your problem (if you have one) isn't going to be
that default_statistics_target is too high.

   ISTM that default_statistics_target trades off ANALYZE time
 and query planning time vs. the possibility of better plans.  If the
 former considerations are not an issue for dst = 50, then maybe we
 should emit 50 by default.  But the limited evidence that has been
 published in this forum thus far doesn't support that contention.
 

Actually there are years' worth of evidence in these archives. Not that
50 is the right number, but that the current settings are definitely
wrong and that higher ones are needed. People generally start
around 100 and go from there, except where they don't, and then someone
like Tom, me, or some other person says, Oh, you need to increase
default_statistics_target.

There is no empirical evidence that 50 is the right setting but there is
more than enough anecdotal evidence to suggest that 50 is a lot better
than 10 and that even higher than 50 is reasonable. In an effort to
follow the PostgreSQL conservative mantra, 50 is a good compromise. 


   It also seems unlikely that you would hit 256MB of checkpoint segments
   on a 100MB database before checkpoint_timeout and if you did, you
   certainly did need them.
 
  So why do we have this parameter at all?
 
  Excellent question, for a different thread :)
 
 I think the rhetorical answer is so that we don't fill up the disk,

I don't think at any time I have said to myself, I am going to set this
parameter low so I don't fill up my disk. If I am saying that to myself,
I have greatly underestimated the hardware for the task. Consider
that we are quarreling over what amounts to a nominal amount of hard
drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more
capacity than that.

*If* hard drive space is (that much of) a concern then you are having
other problems already that pgtune won't satisfy, and you should be
manually tuning the conf in the first place.

Joshua D. Drake


 which gets us back to database size.
 
 ...Robert
 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Kevin Grittner
 Robert Haas [EMAIL PROTECTED] wrote: 
 On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 If you are concerned about the analyze time between 10, 50 and 150, I
 would suggest that you are concerned about the wrong things. Remember
 
 I can't rule that out.  What things do you think I should be concerned
 about?  ISTM that default_statistics_target trades off ANALYZE time
 and query planning time vs. the possibility of better plans.  If the
 former considerations are not an issue for dst = 50, then maybe we
 should emit 50 by default.  But the limited evidence that has been
 published in this forum thus far doesn't support that contention.
 
One more data point to try to help.
 
While the jump in default_statistics_target from 10 to 1000
resulted in a plan time increase for a common query from 50 ms to 310
ms, at a target of 50 the plan time was 53 ms.  Analyze time was 7.2
minutes and 18.5 minutes for targets of 10 and 50.  This is an 842 GB
database on an 8 processor (3.5 GHz Xeon) machine with 64 GB RAM
running (soon to be updated) PostgreSQL 8.2.7.
 
Based on the minimal plan time increase of this test, we're going to
try 50 in production and see how it goes.
 
It's worth pondering that at the target of 1000, had we put that into
production, running this query 300,000 times per day would have used
21 hours and 40 minutes of additional CPU time per day on planning the
runs of this one query, while a target of 50 only consumes an
additional 15 minutes of 3.5 GHz CPU time per day.
 
-Kevin



Re: [HACKERS] In-place upgrade: catalog side

2008-12-03 Thread Bruce Momjian
Zdenek Kotala wrote:
 If you compare with pg_migrator, there is better handling of locale, and I 
 think vacuum freeze is used correctly. Also the shuffling with tablespaces 
 is a little bit different (it should avoid moving data outside of the 
 mountpoint). But in principle the idea is the same.
 
  -There are 10 TODO items listed for the pg_migrator project, most or all 
  of which look like they should be squashed before this is really complete. 
  Any chance somebody (Korry?) has an improved version of this floating 
  around beyond what's in the pgfoundry CVS already?
 
 As I mentioned before, pg_migrator and pg_upgrade.sh are not a good way; 
 they are a workaround. It does not make sense to continue in this way.

As the author of the original shell script, which was in
/contrib/pg_upgrade, I think the code has grown to the point where it
should be reimplemented in something like Perl.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Gregory Stark wrote:


It sure seems strange to me to have initdb which presumably is targeting a
mixed system -- where it doesn't know for sure what workload will be run --
produce a different set of values than the tuner on the same machine.


It's been a long time since the output from initdb was targeting anything 
but a minimal system with an untuned kernel and limited resources.  If you 
invert the normal tuning recommendations, as if its initial configuration 
were the output from typical practice, it would be aiming at a system with 
approximately 128MB of RAM.  That feels about right to me; when I had 
128MB of RAM in my high-end P2-300 server running PG 7.0, 32MB of 
shared_buffers was huge and 3 checkpoint segments was plenty.  I don't 
recall regularly dirtying things fast enough to see checkpoints occurring 
too fast then, like you can do trivially nowadays.  Here in 2008, I push 
checkpoint_segments up to 10 even for the most trivial apps, lest the logs 
fill with those buggers the first time I run an update on a table.


Right now, my program doesn't fiddle with any memory settings if you've 
got less than 256MB of RAM.  Were someone to champion the idea that 
*nothing* should be fiddled with in those cases, that's not an 
unreasonable position.  I'm not the sort to be too concerned myself that 
the guy who thinks he's running a DW on a system with 64MB of RAM might 
get bad settings, but it's a fair criticism to point that out as a 
problem.



In fact, really it would be nice if we allowed units of space (MB, GB,
etc) for checkpoint_segments.


That's a good way to think about this, let's run with that for a minute. 
The values I'm throwing in there look like this (if your tab stops aren't 
at 8 characters this will suck):


Type      Segs   Completion Target   Max Segments   Max Usage
web          8   0.7                           23   368MB
oltp        16   0.9                           47   752MB
dw          64   0.9                          187   3GB
mixed       16   0.9                           47   752MB
desktop      3   0.5                            9   144MB

Is 368MB of overhead unreasonable for a web application database today, 
where you can get a mirrored pair of disks for under $1/GB?  It's only the 
DW case that even starts to leave trivial territory.  Your example of 
somebody who thinks the overhead is too high on their 10MB database is 
already being blown away even at the default of 3 segments (assuming that 
data has enough churn on it to go through that many segments ever--if it 
doesn't then the maximum doesn't matter anyway).
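The Max Segments and Max Usage columns follow from the worst-case bound in the WAL configuration docs, (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of 16MB each; a rough Python sketch to reproduce them (the 16MB segment size is assumed to be the compiled-in default):

```python
# Reproduce the worst-case WAL footprint for each profile, using the
# documented bound:
#   segments ~= (2 + checkpoint_completion_target) * checkpoint_segments + 1
# The 16MB segment size is the compiled-in default.
SEGMENT_MB = 16

profiles = {
    "web":     (8, 0.7),
    "oltp":    (16, 0.9),
    "dw":      (64, 0.9),
    "mixed":   (16, 0.9),
    "desktop": (3, 0.5),
}

def max_wal_usage_mb(segs, completion_target):
    # int(x + 0.5): round to the nearest whole segment
    max_segments = int((2 + completion_target) * segs + 1 + 0.5)
    return max_segments, max_segments * SEGMENT_MB

for name, (segs, target) in profiles.items():
    n, mb = max_wal_usage_mb(segs, target)
    print(f"{name:8s} {n:4d} segments  ~{mb}MB")
```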


The reality here is that it's the recovery playback time that's the real 
bear.  If I were trying to argue against me, what would be more persuasive 
is some tests showing how long it takes to sort through, clean up, and 
replay the appropriate portions of as many as 47 segments worth of WAL 
after an unclean shutdown when checkpoint_segments=16.  Given how long 
that takes, it might be possible to show that a modern system takes a while 
to process that much WAL volume.  It's pretty rare I run into that (usually 
only after I do something abusive), whereas complaints about the logs 
filling with checkpoint warnings on systems set to the default seem to pop 
up all the time.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Kevin Grittner [EMAIL PROTECTED] writes:

 One more data point to try to help.
  
 While the jump from a default_statistics_target from 10 to 1000
 resulted in a plan time increase for a common query from 50 ms to 310
 ms, at a target of 50 the plan time was 53 ms.  

That sounds like it would be an interesting query to analyze in more detail.
Is there any chance you could run the complete graph and get a chart of analyze
times for all statistics values from 1..1000? And log the explain plans to a
file so we can look for at what statistics targets the plan changed?
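One way to run that kind of sweep is to generate a psql script that bumps the target, re-analyzes, and logs timed EXPLAIN output for later diffing. A minimal sketch; the table name and query file are placeholders, not anything from Kevin's setup:

```python
# Generate a psql script that sweeps default_statistics_target, re-ANALYZEs,
# and replays the query file at each step so plan changes can be diffed.
# "my_table" and "query.sql" are placeholder names.
def make_sweep_script(targets, table="my_table", query_file="query.sql"):
    steps = ["\\timing on"]
    for t in targets:
        steps.append(f"SET default_statistics_target = {t};")
        steps.append(f"ANALYZE {table};")
        steps.append(f"\\echo -- target {t}")
        steps.append(f"\\i {query_file}")  # the EXPLAIN of the query under test
    return "\n".join(steps)

# e.g. feed the output to: psql mydb -f sweep.sql > sweep.log
print(make_sweep_script(range(10, 1001, 10)))
```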

Or if the data is public I would be interested in looking at doing it if you
want to send it to me.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Joshua D. Drake
On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:

 I started to do this for you last week but got side-tracked. Do you have any
 time for this?

I can do it if you have a script.

 So how big should a minimum postgres install be, not including your data?
 Is 100M reasonable? Should we say Postgres requires 200M? 500? 1G? Whatever
 number we pick (or allow the user to pick) will determine how large this value
 ought to be. And incidentally also provide a bound on
 autovacuum_freeze_max_age as Heikki pointed out on another thread.
 

I fail to see what any of the above paragraph has to do with
checkpoint_segments.

Anyway, I have made my arguments.

Joshua D. Drake


 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Robert Haas wrote:


I'm not sure if you've thought about this, but there is also a
difference between max_connections and maximum LIKELY connections.


It's actually an implicit assumption of the model Josh threw out if you 
stare at the numbers.  The settings for work_mem are twice as high per 
connection in the Web+OLTP application cases, based on the assumption that 
you're just not going to get everybody doing sorting at once in those 
situations.  I toyed with exposing that as an explicit connection load 
duty factor, then remembered I was trying to deliver something rather 
than tweak the parameters forever.  It may be a bit too aggressive as 
written right now in those cases.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Guillaume Smet wrote:


- it would be really nice to make it work with Python 2.4 as RHEL 5 is
a Python 2.4 thing and it is a very widespread platform out there,


The 2.5 stuff is only required in order to detect memory on Windows.  My 
primary box is RHEL5 and runs 2.4, it works fine there.



- considering the audience of this tool, I think you should explain in
the usage text which type of workload implies each database type (DW,
OLTP, Web, Mixed, Desktop).


Once I'm done with the docs I'll refer over to those, it's too much to put 
into the usage without cluttering it.



- it would be nice to be able to define the architecture (32-64 bits)
from the command line (especially considering I won't be able to run
it on our target boxes which are all RHEL 5 :))


I'm starting to lean toward making everything that gets detected also 
possible to override, for these cases.  I want to make this just work 
in as many cases as possible, but the situation where someone is 
configuring/testing on a system other than the server is pretty common. 
Now that I think about it, I often set up configs on my 32-bit laptop and 
then move them over onto 64-bit servers.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 On Wed, 3 Dec 2008, Gregory Stark wrote:

 It sure seems strange to me to have initdb which presumably is targeting a
 mixed system -- where it doesn't know for sure what workload will be run --
 produce a different set of values than the tuner on the same machine.

 It's been a long time since the output from initdb was targeting anything but
 a minimal system with an untuned kernel and limited resources.  If you invert
 the normal tuning recommendations, as if its initial configuration were the
 output from typical practice, it would be aiming at a system with approximately
 128MB of RAM.  That feels about right to me; when I had 128MB of RAM in my
 high-end P2-300 server running PG 7.0, 32MB of shared_buffers was huge and 3
 checkpoint segments was plenty.  I don't recall regularly dirtying things fast
 enough to see checkpoints occurring too fast then, like you can do trivially
 nowadays.

Well I think there's also an assumption in initdb that Postgres can't assume
it's on a dedicated machine. So whether it's 32MB on a dedicated 128MB machine
or 32MB on a 256MB machine where it's only expected to be half the workload of
the machine it works out to about the same thing.

 Right now, my program doesn't fiddle with any memory settings if you've got
 less than 256MB of RAM.  

What I'm suggesting is that you shouldn't have to special case this. That you
should expect whatever formulas you're using to produce the same values as
initdb if they were run on the same machine initdb is targeting.

But actually I'm more concerned with the *non* memory related parameters. It
may make sense to tweak those one way or the other for oltp or dss but mixed
should be exactly what initdb produces since that's exactly what it's
targeting -- a system that will have a wide mixture of queries and must
function reasonably well for both data warehouse and oltp queries.

 Type      Segs   Completion Target   Max Segments   Max Usage
 web          8   0.7                           23   368MB
 oltp        16   0.9                           47   752MB
 dw          64   0.9                          187   3GB
 mixed       16   0.9                           47   752MB
 desktop      3   0.5                            9   144MB

(incidentally using tab stops in emails is probably a bad idea because of
quoting as above)

Uhm, I hadn't actually seen this list before. I don't understand how web is
different from oltp. A web service really is just one (very typical) example
of an oltp application.

And desktop seems like an outlier here. I suppose it's meant to capture
whether postgres is on a dedicated box? But it's possible to have a
non-dedicated oltp application or non-dedicated data warehouse box just as
easily. It's an orthogonal issue from the oltp/data-warehouse axis.

 Is 368MB of overhead unreasonable for a web application database today

Well I think it's more than most people expect a single application install to
take up before they start putting data in it. It would probably work better if
we were asking how big their database was and then could say, well, you said
you had 10G of data so 300MB of overhead isn't going to be so bad.


 The reality here is that it's the recovery playback time that's the real bear.

I agree, but then that's what checkpoint_timeout is for, no? It might take
longer to replay but the recovery time should bear some relation to how long
it took to write out the wal. More so than to the sheer size of the wal. 

 whereas complaints about the logs filling with checkpoint warnings on systems
 set to the default seem to pop up all the time.

filling? The cure to having too much space taken up by logs is to take up
space with, well, logs?

The logs are filling up with warnings which explain exactly what parameter to
adjust. Are there really complaints about this?

I'm really beginning to think the root of the problem is the name. If it were
transaction_log_max_space and measured in megabytes people would be happy to
say ok, I'll make space for 100MB of logs or whatever. Today they don't know
what to set it to or what the impact of setting it will be.
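Taking that idea seriously, a hypothetical transaction_log_max_space setting could be mapped back to checkpoint_segments by inverting the documented worst-case bound; a rough sketch (the parameter name is Stark's suggestion, not an existing GUC):

```python
# Map a WAL space budget (MB) back to checkpoint_segments by inverting
#   max segments ~= (2 + checkpoint_completion_target) * checkpoint_segments + 1
# "transaction_log_max_space" is hypothetical; 16MB is the default segment size.
SEGMENT_MB = 16

def segments_for_budget(budget_mb, completion_target=0.5):
    max_segments = budget_mb // SEGMENT_MB
    # at least one WAL segment always exists
    return max(1, int((max_segments - 1) / (2 + completion_target)))

# A 100MB budget allows 6 segments worst-case, i.e. checkpoint_segments = 2.
```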

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Robert Haas wrote:


Then I tried -T web and got what seemed like a more reasonable set
of values.  But I wasn't sure I needed that many connections, so I
added -c 150 to see how much difference that made.  Kaboom!


That and the import errors are fixed in the version attached (just replacing 
the script, not its data); thanks for testing that out.



I'm not sure what mixed mode is supposed to be, but based on what
I've seen so far, I'm a skeptical of the idea that encouraging people
to raise default_statistics_target to 50 and turn on
constraint_exclusion is reasonable.


The statistics stuff is obviously a broader discussion, will let that rage 
in existing threads.  The reason for setting constraint_exclusion in the 
mixed case is that people who just pick the defaults without reading 
anything will get a configuration that supports partitions usefully.  One 
of the HINTs I intend to throw out for that specific case is that they 
should turn it off if they don't ever intend to use partitions.


The idea of the mixed mode is that you want to reduce the odds someone 
will get a massively wrong configuration if they're not paying attention. 
Is it worse to suffer from additional query overhead if you're sloppy with 
the tuning tool, or to discover that additional partitions didn't work as you 
expected?  That's a tough call; I could invert things, so that it defaults 
to off in mixed mode, as always, and just produces a HINT to turn it on. 
I don't have a really strong opinion there either way.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

pgtune.gz
Description: Binary data



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 Is it worse to suffer from additional query overhead if you're sloppy with
 the tuning tool, or to discover that additional partitions didn't work as you
 expected?

Surely that's the same question we faced when deciding what the Postgres
default should be? 

That and the unstated other question: Is someone more likely to use partitions
without reading the manual, or to not use partitions without reading the manual
about the down-sides of constraint_exclusion (in the partitioning
section)?


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark

Joshua D. Drake [EMAIL PROTECTED] writes:

 On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:

 I started to do this for you last week but got side-tracked. Do you have any
 time for this?

 I can do it if you have a script.

Well, I can send you what I have so far but it still needs more work. I only
got as far as the graphs I sent earlier which don't include scanning for
changed plans. Also, if you have any sample databases with skewed data sets
that would be interesting.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Alvaro Herrera
Gregory Stark escribió:
 Joshua D. Drake [EMAIL PROTECTED] writes:

  I don't think at any time I have said to my self, I am going to set this
  parameter low so I don't fill up my disk. If I am saying that to myself
  I have either greatly underestimated the hardware for the task. Consider
  that we are quarreling over what amounts to a nominal amount of hard
  drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more
  capacity than that.
 
 Well my phone has 16G of RAM, why not 1000?

I don't think the disk space used is the only consideration here.  You
also have to keep recovery time in mind.  If you set it to 1000,
recovery would take way too long.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:

 I started to do this for you last week but got side-tracked. Do you have any
 time for this?

 I can do it if you have a script.

 So how big should a minimum postgres install be, not including your data?
 Is 100M reasonable? Should we say Postgres requires 200M? 500? 1G? Whatever
 number we pick (or allow the user to pick) will determine how large this value
 ought to be. And incidentally also provide a bound on
 autovacuum_freeze_max_age as Heikki pointed out on another thread.
 

 I fail to see what any of the above paragraph has to do with
 checkpoint_segments.

Are we all on the same page on what checkpoint_segments does? It's the number
of segments of WAL log postgres will allow to accumulate before it triggers a
checkpoint and trims off any it doesn't need. 

That means even if your database is just churning, updating the same records
over and over, the WAL will grow to this size before Postgres makes any attempt
to trim it (unless it hits checkpoint_timeout, but that's a separate tunable).
If you're loading data, all your data will go into the heap *and* the WAL log
until it hits this size and triggers a checkpoint.

So this is the minimum amount of extra space you need in addition to your data
for a functioning Postgres database install. If you don't anticipate Postgres
using this much space and don't set aside enough room for it, your database is
at risk of randomly stopping and producing errors when it can't create the new
log files it needs.

It's interesting that this is the *only* parameter we can come up with that
really directly depends on disk space. The tables are obviously going to be as
big as they have to be and there's not much to do about that. If we could
eliminate this parameter it would be a lot nicer. 


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [HACKERS] Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

2008-12-03 Thread Hiroshi Inoue

Magnus Hagander wrote:

Hiroshi Inoue wrote:

I think the thing is that as long as the encodings are compatible
(latin1 with different names for example) it worked fine.


 In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at.  Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.

Attached is a new patch to apply bind_textdomain_codeset() to most
server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
and PG_EUC_JIS_2004. EUC-JP may be OK for EUC_JIS_2004.

Unfortunately it's hard for Saito-san and me to check encodings
other than EUC-JP. 


In principle this looks good, I think, but I'm a bit worried about the
lack of testing.


Thanks and I agree with you.

 I can do some testing under LATIN1 which is what we use

in Sweden (just need to get gettext working *at all* in my dev
environment again - I've somehow managed to break it), and perhaps we
can find someone to do a test in an eastern-european locale to get some
more datapoints?

Can you outline the steps one needs to go through to show the problem,
so we can confirm it's fixed in these locales?


Saito-san and I have been working on another related problem, changing the
LC_MESSAGES locale properly under Windows, and should be able to provide a
patch in a few days. It seems preferable to apply that patch as well, so that
the message catalog can be changed easily.

Attached is an example in which LC_MESSAGES is cht_twn(zh_TW) and
the server encoding is EUC-TW. You can see it as a UTF-8 text
because the client_encoding is set to UTF-8 first.

BTW you can see another problem at line 4 in the text.
At that point LC_MESSAGES is still Japanese and postgres fails
to convert a Japanese error message to EUC_TW encoding. That's
no wonder, but it doesn't seem preferable.

regards,
Hiroshi Inoue
set client_encoding to utf_8;
SET
1;
psql:cmd/euctw.sql:2: ERROR:  character 0xb9e6 of encoding EUC_TW has no 
equivalent in UTF8
select current_database();
 current_database 
--
 euctw
(1 s)

show server_encoding;
 server_encoding 
-
 EUC_TW
(1 s)

show lc_messages;
lc_messages 

 Japanese_Japan.932
(1 s)

set lc_messages to cht;
SET
select a;
psql:cmd/euctw.sql:7: 錯誤:  欄位a不存在
LINE 1: select a;
   ^
1;
psql:cmd/euctw.sql:8: 錯誤:  在語法錯誤附近發生 1
LINE 1: 1;
^
select * from a;
psql:cmd/euctw.sql:9: 錯誤:  relation a不存在
LINE 1: select * from a;
  ^



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Joshua D. Drake
On Wed, 2008-12-03 at 22:17 -0300, Alvaro Herrera wrote:
 Gregory Stark escribió:
  Joshua D. Drake [EMAIL PROTECTED] writes:
 
   I don't think at any time I have said to my self, I am going to set this
   parameter low so I don't fill up my disk. If I am saying that to myself
   I have either greatly underestimated the hardware for the task. Consider
   that we are quarreling over what amounts to a nominal amount of hard
   drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more
   capacity than that.
  
  Well my phone has 16G of RAM, why not 1000?
 
 I don't think the disk space used is the only consideration here.  You
 also have to keep recovery time in mind.  If you set it to 1000,
 recovery would take way too long.

Well certainly but the original argument that came back was, (from
Robert Haas):


It seems unlikely that you would want 256 MB of checkpoint segments on
a database that is only 100 MB (or even 500 MB).  But you might very
well want that on a database that is 1 TB.


My whole point is that:

1. It seems unlikely that you would hit 256MB of checkpoint segments on
a 100MB database before checkpoint_timeout, and if you did, you certainly
did need them (the checkpoint segments).

2. taking up space is such a minute concern in comparison to the
potential benefit.

Recovery is certainly a consideration but, let's be realistic, it is the
last consideration because it is the least likely to happen. What is
more likely to happen is IO spikes because we are recycling logs too
much.

I know we have some other facilities to deal with that now too but it
doesn't completely negate the problem and in my opinion, increasing the
checkpoint_segments provides no perceivable downside in production use
but does provide significant perceivable upside.

Joshua D. Drake




-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Thu, 4 Dec 2008, Gregory Stark wrote:


Right now, my program doesn't fiddle with any memory settings if you've got
less than 256MB of RAM.


What I'm suggesting is that you shouldn't have to special case this. That you
should expect whatever formulas you're using to produce the same values as
initdb if they were run on the same machine initdb is targeting.


The reason that approach isn't taken is that the model here assumes the OS 
overhead is negligible relative to everything else going on.  If you've 
only got a small amount of RAM, that assumption is so badly broken that 
you can't just extend the curves for everything down to there and expect 
that what comes out will make any sense.  I started to make a more 
complicated bit that did scale down to the bottom by modeling the overhead 
better, but Josh talked me out of doing it for now.



But actually I'm more concerned with the *non* memory related parameters. It
may make sense to tweak those one way or the other for oltp or dss but mixed
should be exactly what initdb produces since that's exactly what it's
targeting -- a system that will have a wide mixture of queries and must
function reasonably well for both data warehouse and oltp queries.


The only way this line of discussion will go is toward talking about what 
should be changed in initdb to make it more representative of the current 
real world, and I know that's not going anywhere (see 
default_statistics_target=10).  The idea that the sample configuration 
is tuned usefully for any application whatsoever gets nothing from me but 
a chuckle.



And desktop seems like an outlier here. I suppose it's meant to capture
whether postgres is on a dedicated box? But it's possible to have a
non-dedicated oltp application or non-dedicated data warehouse box just as
easily.


That's the target for something that's not a dedicated server--a desktop 
PC you use as a general workstation, maybe you're installing PostgreSQL as 
a developer that's competing with your web server and other apps; 
something like that.  There might be a better name for that.



Is 368MB of overhead unreasonable for a web application database today


Well I think it's more than most people expect a single application install to
take up before they start putting data in it.


Segments don't get allocated until you churn through that much WAL 
activity; that figure is an upper-bound after you've pushed more than that 
worth of data through WAL and into the database.  The only example where 
this overhead isn't dwarfed by the size of the resulting database is where 
some small number of records are inserted, then constantly updated and 
vacuumed.  And you know what?  The person doing that is likely to really 
benefit from having checkpoint_segments set to a larger value.  Update and 
vacuum heavy workloads are exactly the sort where you end up checkpointing 
too often with the default parameters.



I'm really beginning to think the root of the problem is the name. If it were
transaction_log_max_space and measured in megabytes people would be happy to
say ok, I'll make space for 100MB of logs or whatever. Today they don't know
what to set it to or what the impact of setting it will be.


Unless they do something crazy like read the documentation:

http://www.postgresql.org/docs/8.3/static/wal-configuration.html
There will always be at least one WAL segment file, and will normally not 
be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 
files. Each segment file is normally 16 MB (though this size can be 
altered when building the server). You can use this to estimate space 
requirements for WAL.


Too complicated for most people, you say?  I agree; that's why I put some 
annotated examples of what those translate into at 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server, like 32 
(checkpoint every 512MB).


What fun.  I'm beginning to remember why nobody has ever managed to 
deliver a community tool that helps with this configuration task before.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 The idea of the mixed mode is that you want to reduce the odds someone will
 get a massively wrong configuration if they're not paying attention. Is it
 worse to suffer from additional query overhead if you're sloppy with the
  tuning tool, or to discover that additional partitions didn't work as you expected?
  That's a tough call; I could invert things, so that it defaults to off in
 mixed mode, as always, and just produces a HINT to turn it on. I don't have
 a really strong opinion there either way.

I think that the strong feelings about default_statistics_target and
constraint_exclusion come from the fact that when they are too low (in
the first case) or off (in the second case), you can get very, very
bad query plans.  The penalties in the opposite direction are more
subtle.  If they're so subtle that we don't care about incurring them,
then let's change initdb too.  If not, then let's not have the tuning
tool generate them by default either.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] V2 of PITR performance improvement for 8.4

2008-12-03 Thread Koichi Suzuki
Agreed.

I borrowed WAL parsing code from XLogdump and I think WAL parsing
should be another candidate.

2008/12/3 Fujii Masao [EMAIL PROTECTED]:
 Hi,

 On Thu, Nov 27, 2008 at 9:04 PM, Koichi Suzuki [EMAIL PROTECTED] wrote:
 Please find enclosed a revised version of pg_readahead and a patch to
 invoke pg_readahead.

 Some similar functions exist in xlog.c and pg_readahead.c (for example,
 RecordIsValid). I think that we should unify them as a common function,
 which would help in developing tools (for example, xlogdump) that treat WAL
 in the future.

 Regards,

 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center




-- 
--
Koichi Suzuki



[HACKERS] Re: pgsql: Properly unregister OpenSSL callbacks when libpq is done with

2008-12-03 Thread Kris Jurka

Magnus Hagander wrote:

Log Message:
---
Properly unregister OpenSSL callbacks when libpq is done with
its connection. This is required for applications that unload
the libpq library (such as PHP) in which case we'd otherwise
have pointers to these functions when they no longer exist.


Breaks the build with --enable-thread-safety and --with-openssl because 
of this typo.


Kris Jurka
*** a/src/interfaces/libpq/fe-secure.c
--- b/src/interfaces/libpq/fe-secure.c
***
*** 918,925  destroy_ssl_system(void)
  			 * This means we leak a little memory on repeated load/unload
  			 * of the library.
  			 */
! 			free(pqlockarray);
! 			pqlockarray = NULL;
  		}
  	}
  
--- 918,925 
  			 * This means we leak a little memory on repeated load/unload
  			 * of the library.
  			 */
! 			free(pq_lockarray);
! 			pq_lockarray = NULL;
  		}
  	}
  



Re: [HACKERS] Re: pgsql: Properly unregister OpenSSL callbacks when libpq is done with

2008-12-03 Thread Bruce Momjian
Kris Jurka wrote:
 Magnus Hagander wrote:
  Log Message:
  ---
  Properly unregister OpenSSL callbacks when libpq is done with
  its connection. This is required for applications that unload
  the libpq library (such as PHP) in which case we'd otherwise
  have pointers to these functions when they no longer exist.
 
 Breaks the build with --enable-thread-safety and --with-openssl because 
 of this typo.

Thanks, applied.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 What fun.  I'm beginning to remember why nobody has ever managed to deliver
 a community tool that helps with this configuration task before.

I have to say I really like this tool.  It may not be perfect but it's
a lot easier than trying to do this analysis from scratch.  And we are
really only arguing about a handful of settings.

It wouldn't take a lot to convince me that checkpoint_segments=3 is
too low.  I easily blew through that testing the bulk-insert tuning
patch.  I'm curious why wal_buffers is being set to 512 *
checkpoint_segments.  Are they related?  The default value for
wal_buffers is only 64 kB, which means someone thought you shouldn't
need much space for this at all, but this suggests a setting in the
4-32 MB range, an increase of ~2 orders of magnitude.  For all I know
that could be right but it's a big increase.
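Restating the heuristic described above as a sketch (this is a reading of the wizard's output, not its actual source, so the formula is an assumption): wal_buffers in kB appears to be set to 512 * checkpoint_segments, which lands in the 4-32 MB range for checkpoint_segments between 8 and 64.

```python
# Hypothetical restatement of the wizard's apparent heuristic:
# wal_buffers (in kB) = 512 * checkpoint_segments.
def wal_buffers_kb(checkpoint_segments: int) -> int:
    return 512 * checkpoint_segments

# The stock default is only 64 kB; these settings are ~2 orders of
# magnitude larger, as the message notes.
for cs in (8, 16, 64):
    print(cs, wal_buffers_kb(cs) / 1024.0, "MB")  # 4.0, 8.0, 32.0 MB
```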

Regarding the religious war now in progress, I think it would be
awfully good for someone to offer some thoughts on how to figure out
which particular columns on which particular tables need a higher
statistics target.  That might allow us to either (a) build a wizard
that helps you find those problems that could perhaps be used
alongside this one or (b) incorporate those same smarts into core.

...Robert



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-03 Thread Bruce Momjian
KaiGai Kohei wrote:
 I updated the patch set of SE-PostgreSQL (revision 1268).
 
 [1/6] 
 http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1268.patch
 [2/6] 
 http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1268.patch
 [3/6] 
 http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1268.patch
 [4/6] 
 http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1268.patch
 [5/6] 
 http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1268.patch
 [6/6] 
 http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1268.patch
 
 Draft of the SE-PostgreSQL documentation is here:
   http://wiki.postgresql.org/wiki/SEPostgreSQL
 
 List of updates:
 - The patches are rebased to the CVS HEAD.
 - RelOptions related hooks are cleaned up.
 - The Row-level ACL feature is chosen by default.
 - rowacl_table_default() is added to show the default ACL of the table.
 - The initial revision of regression test for Row-level ACL is added.
 
 If you have any comments on the patches, could you share them?
 They need not be comprehensive. Don't hesitate to submit.

I looked over the patch and was wondering why you chose to have a
configure option to disable row-level ACLs.  I assume that could just be
always enabled.  In fact, perhaps that is the first part of the patch
that should be applied because it doesn't rely on SE-Linux.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark

Greg Smith [EMAIL PROTECTED] writes:

 On Thu, 4 Dec 2008, Gregory Stark wrote:

 What I'm suggesting is that you shouldn't have to special case this. That you
 should expect whatever formulas you're using to produce the same values as
 initdb if they were run on the same machine initdb is targeting.

 The reason that approach isn't taken is that the model here assumes the OS
 overhead is negligible relative to everything else going on.  

ok that does make sense. But the non-memory parameters...

... I've cut part of my response for a separate thread ...

 And desktop seems like an outlier here. I suppose it's meant to capture
 whether postgres is on a dedicated box? But it's possible to have a
 non-dedicated oltp application or non-dedicated data warehouse box just as
 easily.

 That's the target for something that's not a dedicated server--a desktop PC 
 you
 use as a general workstation, maybe you're installing PostgreSQL as a 
 developer
 that's competing with your web server and other apps; something like that.
 There might be a better name for that.

My point was more that you could have a data warehouse on a non-dedicated
machine, you could have a web server on a non-dedicated machine, or you could
have a mixed server on a non-dedicated machine. I don't see how you would
decide whether to set constraint_exclusion for desktop, for example.

 Is 368MB of overhead unreasonable for a web application database today?

 Well I think it's more than most people expect a single application install 
 to
 take up before they start putting data in it.

 Segments don't get allocated until you churn through that much WAL activity;
 that figure is an upper-bound after you've pushed more than that worth of data
 through WAL and into the database.  The only example where this overhead isn't
 dwarfed by the size of the resulting database 

Right, well, no, it won't be dwarfed -- it'll be about the same size. Ie, if
you load 100MB into the database there'll be about 100MB of logs generated. Up
to the point where you hit this maximum upper bound.

But yes, right that it's the upper bound for the extra space allocated in
addition to the size of the database. And how much extra space should we
allocate? 

I don't see why this extra space bound should depend on the type of OLTP vs.
DSS workload, only on how much disk space is available that the admin is
willing to dedicate to Postgres. Assuming that the admin of a 1TB server is
willing to dedicate 1GB to logs, and that the admin of a 1GB server would be
annoyed to give up more than a few hundred megabytes, seems as reasonable a
place as any to start.
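For concreteness, the upper bound being argued about can be sketched from the formula in the documentation of this era: the number of WAL segment files normally stays below (2 + checkpoint_completion_target) * checkpoint_segments + 1, each segment being 16 MB. The completion-target value below is an assumed configuration, not something stated in the thread.

```python
SEGMENT_MB = 16  # size of one WAL segment file

def wal_space_upper_bound_mb(checkpoint_segments, completion_target=0.5):
    # Documented steady-state bound on the number of segment files.
    files = (2 + completion_target) * checkpoint_segments + 1
    return files * SEGMENT_MB

print(wal_space_upper_bound_mb(3))   # stock default: 136.0 MB
print(wal_space_upper_bound_mb(64))  # a large DW profile: 2576.0 MB
```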

 is where some small number of records are inserted, then constantly updated
 and vacuumed. And you know what? The person doing that is likely to really
 benefit from having checkpoint_segments set to a larger value. Update and
 vacuum heavy workloads are exactly the sort where you end up checkpointing
 too often with the default parameters.

Well there are a few problems with this. a) we have HOT now so you don't need
any vacuums to be part of the picture. b) if you're updating the same pages
over and over again a checkpoint will be super-quick since there will only be
a few pages to write out so no you don't really need some large
checkpoint_segments for any performance reason.

Josh's logic is impeccable -- for the specific use case he's describing of a
truly dedicated server with enough disk space for a major production database.
But not every install is going to have gigabytes of space reserved for it and
not every admin is going to realize that he really should set aside gigabytes
of space even though he only expects his database to be a few megabytes.

 I'm really beginning to think the root of the problem is the name. If it were
 transaction_log_max_space and measured in megabytes people would be happy 
 to
 say ok, I'll make space for 100MB of logs or whatever. Today they don't 
 know
 what to set it to or what the impact of setting it will be.

 Unless they do something crazy like read the documentation:

Well we know nobody does that :/

It's great that Postgres has such great documentation but whenever we have the
chance to replace something with an option which doesn't need any
documentation that would be even better. I'm just exploring whether that's an
option here.

 What fun.  I'm beginning to remember why nobody has ever managed to deliver a
 community tool that helps with this configuration task before.

Well I don't think this is why. Nobody's even tried to do this side of things
before. They always got bogged down in trying to parse config files and such.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Thu, 4 Dec 2008, Gregory Stark wrote:


Greg Smith [EMAIL PROTECTED] writes:


Is it worse to suffer from additional query overhead if you're sloppy with
the tuning tool, or to discover additional partitions didn't work as you
expected?


Surely that's the same question we faced when deciding what the Postgres
default should be?


Gosh, you're right.  I'm really new here, and I just didn't understand how 
things work.  I should have known that there was lots of thorough research 
into that setting before the default was set.  (hangs head in shame)


Wait, what list am I on?  pgsql-hackers?  Oh, crap, that can't be right at 
all then.  This one is actually an interesting example of how this stuff 
ends up ossified without being revisited, I'm glad you brought it up.


First we have to visit the 8.1 and 8.2 documentation.  There we find the 
real reason it originally defaulted to off:


http://www.postgresql.org/docs/8.1/static/runtime-config-query.html 
"Currently, constraint_exclusion is disabled by default because it risks 
incorrect results if query plans are cached: if a table constraint is 
changed or dropped, the previously generated plan might now be wrong, and 
there is no built-in mechanism to force re-planning."  It stayed off for 
that reason for years.


Then the plan invalidation stuff went into 8.3 that made this no longer 
true.  Bruce even removed the item from the TODO list that used to say 
that constraint_exclusion should be improved to allow it to be used for 
all statements with little performance impact.  Then a couple of months 
later, when the 8.3 docs were being worked on, Tom updated the text to 
remove the obsolete warning about the plan risks:


http://archives.postgresql.org/pgsql-committers/2007-03/msg00372.php

Leaving only the leftovers of the original caveat about how it can also 
cause some overhead as the reason for why it was still off--a concern 
which was certainly more serious when that text was written in 2005 than 
it is today for multiple reasons.


How much was that overhead lowered by the work done in 8.3?  I can't find 
any public information suggesting that was ever even discussed.  The only 
thing I found when poking around looking for it is that Tom had expressed 
some concerns that the proof overhead was still too large back in 2006: 
http://archives.postgresql.org/pgsql-committers/2006-02/msg00035.php


But you know what?  The cached proof comparison bit Tom committed a couple 
of weeks ago shifted the mechanics of the overhead for this specific case 
around, so even if we did have 8.3 results they'd need to get re-run at 
this point anyway.  See below for more on what might be different soon.


So, if you want to say that turning on constraint_exclusion by default is 
a horrible idea because it adds significant overhead, and you have any 
sort of evidence that will still be true for 8.4 on the kind of hardware 
8.4 is likely to run on, I would greatly appreciate that information.


But presuming that serious thought must have gone into every decision made 
about the defaults for all the performance-related parameters in 
postgresql.conf is something we all know just ain't so.  What I see is a 
parameter that doesn't add enough overhead relative to query execution 
time on today's systems that I've noticed whether it was on or off, one 
that's set to off only by historical accident combined with basic 
conservatism (mainly from Tom, as far as I can tell; he's a nice reliable 
source for that).  Whereas if it's accidentally set wrong, it can lead to 
massively wrong plans.  I'm not sure what the right move here is, but the 
appeal to authority approach for defending the default here isn't going to 
work on me.



That and the unstated other question: Is someone more likely to use partitions
without reading the manual, or not use partitions without reading the manual
about the down-sides of constraint_exclusion (in the partitioning
section)?


Have you started thinking about the implications of 
http://archives.postgresql.org/message-id/[EMAIL PROTECTED] 
yet?  It is a bold new world of people who partition with less time stuck 
in the manual first we approach, and I was very much thinking about that 
when mulling over whether I agreed with Josh's suggestion to put that into 
the default mixed settings before I went with it (that's right--I wrote 
all the above and it wasn't even my idea originally).  If that doesn't 
make it into 8.4 I will yield to your statement of the boring, 
manual-reading status quo still being on target.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-03 Thread KaiGai Kohei

Bruce Momjian wrote:

KaiGai Kohei wrote:

I updated the patch set of SE-PostgreSQL (revision 1268).

[1/6] 
http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1268.patch
[2/6] 
http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1268.patch
[3/6] 
http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1268.patch
[4/6] 
http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1268.patch
[5/6] 
http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1268.patch
[6/6] 
http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1268.patch

Draft of the SE-PostgreSQL documentation is here:
  http://wiki.postgresql.org/wiki/SEPostgreSQL

List of updates:
- The patches are rebased to the CVS HEAD.
- RelOptions related hooks are cleaned up.
- The Row-level ACL feature is chosen by default.
- rowacl_table_default() is added to show the default ACL of the table.
- The initial revision of regression test for Row-level ACL is added.

If you have any comments on the patches, could you share them?
They need not be comprehensive. Don't hesitate to submit.


I looked over the patch and was wondering why you chose to have a
configure option to disable row-level ACLs.


There are no explicit reasons.
I thought it was natural, just as we can build the Linux kernel without any
of its enhanced security features (SELinux, SMACK, and so on).

I don't oppose eliminating the --disable-row-acl option; however, it is
not clear to me whether it should be an unavoidable selection in the
future or not.


I assume that could just be always enabled.


It is not always enabled. When we build with the SE-PostgreSQL feature,
the rest of the enhanced security features (including the row-level ACL)
are disabled automatically, as we discussed before.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 On Thu, 4 Dec 2008, Gregory Stark wrote:

 Greg Smith [EMAIL PROTECTED] writes:

 Is it worse to suffer from additional query overhead if you're sloppy with
 the tuning tool, or to discover additional partitions didn't work as you
 expected?

 Surely that's the same question we faced when deciding what the Postgres
 default should be?

 Gosh, you're right.  I'm really new here, and I just didn't understand how
 things work.  I should have known that there was lots of thorough research 
 into
 that setting before the default was set.  (hangs head in shame)

Oh no, in this case I meant just that if we want to change it we should change
it in *both* places. That the argument you're making applies just as much to
the Postgres default as it does to the mixed workload default in the tuner.

But I admit I totally didn't remember that the main reason it was originally
off was the lack of plan invalidation. That does rather change things. Perhaps
we should be enabling it now.

If we do though, it shouldn't default one way and then get randomly flipped by
a tool that has the same information to make its decision on. What I'm saying
is that mixed is the same information that initdb had about the workload.

If we do change this then I wonder if we need the parameter at all. I mean, we
don't generally have parameters to turn off random parts of the optimizer...

 How much was that overhead lowered by the work done in 8.3?  I can't find any
 public information suggesting that was ever even discussed.  

Well it does have to compare every constraint with every clause and do a
moderately complex analysis. It's never going to be super-fast for complex
queries. But on the other hand it should drop out pretty fast if the tables
haven't got any constraints so it does seem like it's only hurting people when
they would want it on anyways.

 What I see is a parameter that doesn't add enough overhead relative to query
 execution time on today's systems that I've noticed whether it was on or off

There's a danger in this. There's some famous, probably apocryphal, example of
a fast food restaurant that taste tested their menu and got great results.
Then they cheapened an ingredient and their testers couldn't taste the
difference. Then they did that with another ingredient and another and so on
and each time the testers couldn't taste a difference. And in the end they
ended up rolling out a dramatically inferior menu which people panned compared
to the original...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Mark Wong
On Mon, Dec 1, 2008 at 9:32 PM, Greg Smith [EMAIL PROTECTED] wrote:
 On Mon, 1 Dec 2008, Mark Wong wrote:

 So then I attempted to see if there might have been a difference between the
 execution time of each individual query with the above parameters. The
 queries that don't seem to be affected are Q1, Q4, Q12, Q13, and Q15.  Q17
 suggests that anything higher than default_statistics_target=10 is an
 improvement.  The rest of the queries appears not to follow any particular
 trend with respect to default_statistics_target.

 The interesting ones are Q2, Q9, Q17, Q18, and Q20, and that data is much
 more useful than the summary.  As you mention, Q17 improves significantly
 with a higher target.  All of the rest are dramatically slower in one or
 both tests going from default_statistics_target=10 to 100.  Those look like
 the most useful data points on the X axis--the increases from 100 up to 1000
 aren't particularly interesting in most of these, except in Q20 where the
 Power Test seems to oscillate between degrees of good and bad behavior
 seemingly at random.

 My picks for the most useful graphs from the long list Mark sent:

 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png
 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png
 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png
 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png
 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png

 I think the tests you could consider next is to graph the target going from
 10 to 100 in steps of 10 just for those 5 queries.  If it gradually
 degrades, that's interesting but hard to nail down.  But if there's a sharp
 transition, getting an explain plan for the two sides of that should provide
 some insight.  I'm really more interested in the ones that slowed down than
 the one that improved, understanding that might finally provide some
 evidence against increasing it by default.

I've updated the charts to include results from setting
default_statistics_target from 20-90.  The links to the charts are the
same.  The links to the raw data are in
http://207.173.203.223/~markwkm/community6/dbt3/ in directories 45-52.

Regards,
Mark



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 If we do though, it shouldn't default one way and then get randomly flipped by
 a tool that has the same information to make its decision on. What I'm saying
 is that mixed is the same information that initdb had about the workload.

+1.

 If we do change this then I wonder if we need the parameter at all. I mean, we
 don't generally have parameters to turn off random parts of the optimizer...

It probably isn't a good idea to both change the default setting and
remove the parameter in the same release.  It would be awesome if this
is cheap enough now to have it on by default - but constraints are
pretty useful for maintaining data integrity, so it's conceivable to
me that someone could have a lot of constraints most of which are
unuseful for query planning.

...Robert



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Thu, 4 Dec 2008, Gregory Stark wrote:

My point was more that you could have a data warehouse on a 
non-dedicated machine, you could have a web server on a non-dedicated 
machine, or you could have a mixed server on a non-dedicated machine.


I should just finish the documentation, where there will be a big 
disclaimer saying THESE SETTINGS ASSUME A SERVER DEDICATED TO 
POSTGRESQL!  That's the context here.  Why, after you follow my tuning 
instructions, you're lucky if the server will run anything but the 
database afterwards.



Josh's logic is impeccable -- for the specific use case he's describing of a
truly dedicated server with enough disk space for a major production database.
But not every install is going to have gigabytes of space reserved for it and
not every admin is going to realize that he really should set aside gigabytes
of space even though he only expects his database to be a few megabytes.


It's really quite simple.  Josh and I don't care directly about disk space 
used by the WAL for people with trivial databases.  At all.  Whatsoever. 
Maybe once, long ago, when we were young and frugal and skinny[1]; not 
now, or probably ever again in the future.  If that's your concern, maybe 
there can be some companion utility named pgmiser that lowers parameters 
back down again.  Your mascot can be some sort of animal that efficiently 
lives off small scraps of food or something.[2]


The context here is pgtune, which is aiming to make a fat elephant of a 
server faster so that there's an answer to people who say My benchmarks 
are all running really slow, is this because my system with 16PT of RAM is 
only using 32MB of it for the database?  This sucks, I'm going back to 
Oracle which used all my RAM.  If there are people who instead think, 
hey, I'll run this tuning utility to make my database faster, then it 
will also be a lot smaller!, maybe we can find a class about space/time 
tradeoffs in algorithm design to send them to or something.[3]


There are exactly two important things here.  The first is how large 
checkpoint_settings needs to be in order to for the considerable overhead 
of checkpoints to be bearable.  That drives the setting up.  Our super-fat 
DW application gets set to at least 64 so that when you bulk-load another 
TB of data into it, that doesn't get bottlenecked dumping gigabytes of 
dirty buffers every few seconds.  If the database crashes and recovery 
reads or writes a bunch of data, who cares about random writes because 
your SAN has a 4GB write cache on it and dozens of drives slaving away.


Driving the setting down is knowing how much time you'll have to wait for 
recovery to happen, which is really a measure of what your tolerance for 
downtime is.  We're thinking that someone who picks the Desktop tuning may 
have no tolerance for the database to be sluggish coming back up after 
Windows crashed and they rebooted, so tiny setting for them to make 
recovery super fast.


Everybody else in our sample profiles fall in the middle of those two 
extremes, which is why the values curve the way they do.  Web app? 
Probably not a lot of write volume, probably trouble if it's down a long 
time; how about 8, on the low side, but it gives checkpoints more time to 
spread out their I/O so worst-case latency isn't as bad.  That's the sort 
of analysis those numbers come from.  Do performance tuning and juggle 
these trade-offs for long enough for new people all the time, you get a 
gut feel for the right ballpark an app should start at based on its type. 
The whole idea behind this tool is that we're taking some of that hard-won 
knowledge and trying to automate the distribution of it.
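The profile logic of the last few paragraphs can be restated as a toy table. The 64 and 8 come straight from the text; the desktop value of 3 (the stock initdb default) is an assumption standing in for the "tiny setting" described, as is the fallback for unknown profiles.

```python
# Toy restatement of the pgtune-style profile reasoning described above.
CHECKPOINT_SEGMENTS_BY_PROFILE = {
    "dw": 64,       # bulk loads: keep checkpoints from bottlenecking them
    "web": 8,       # modest write volume, faster crash recovery
    "desktop": 3,   # minimal recovery time after a crash or reboot
}

def pick_checkpoint_segments(profile: str) -> int:
    # Fall back to the middling web value for unrecognized profiles
    # (an assumption, not part of the original tool).
    return CHECKPOINT_SEGMENTS_BY_PROFILE.get(profile, 8)

print(pick_checkpoint_segments("dw"))  # 64
```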



It's great that Postgres has such great documentation but whenever we have the
chance to replace something with an option which doesn't need any
documentation that would be even better. I'm just exploring whether that's an
option here.


I would be glad to have a post-CommitFest discussion of this very topic as 
it's quite a pain to me in its current form.  Just not right now because 
it's too late to touch it.


Nobody's even tried to do this side of things before. They always got 
bogged down in trying to parse config files and such.


It's actually because most of them were working in Perl, which encourages 
deviant behavior where people delight in converting useful ideas into 
illegible punctuation rather than actually getting anything done.  Except 
for that other Greg around here who's not involved in this discussion, his 
Perl is pretty good.


[1] Josh is being aggressively bulked up right now for his next sumo 
match.


[2] Like a rat, which would give you an excuse to add the long overdue 
PL/Ratfor.


[3] This wouldn't actually help them learn anything, but it would make 
their heads explode at which point all their problems are gone.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 I think the tests you could consider next is to graph the target going from
 10 to 100 in steps of 10 just for those 5 queries.  If it gradually
 degrades, that's interesting but hard to nail down.  But if there's a sharp
 transition, getting an explain plan for the two sides of that should provide
 some insight.  I'm really more interested in the ones that slowed down than
 the one that improved, understanding that might finally provide some
 evidence against increasing it by default.

 I've updated the charts to include results from setting
 default_statistics_target from 20-90.  The links to the charts are the
 same.  The links to the raw data are in
 http://207.173.203.223/~markwkm/community6/dbt3/ in directories 45-52.

I still think we're missing the boat here because it's not really the
same query every time.

...Robert



Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Mark Wong wrote:


http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png


I've updated the charts to include results from setting
default_statistics_target from 20-90.


Interesting.

Q2: Explodes hard when the target hits 100, but not before.  Would be 
interesting to compare the explain plans at, say, 50 and 150 to see if 
it's possible to nail down what is shifting so much.


Q3: goes bonkers as soon as the target hits 20, so the interesting plans 
to compare are 10 and 20.


Q17:  our one from the original set that improved a bunch with the larger 
target gets all that benefit just from going to 20.  Would be interesting 
to compare the plans at 10 and 20 to see what changed so much with such a 
small difference.


Q18:  looks like it was in some sort of local bad area around 100-400 
before, with some more context that one doesn't look interesting anymore.


Q20:  also doesn't look very interesting anymore.  The results at 10 were 
nice, and the ones at 100 were among the unusually bad ones, but it's 
pretty random--if there was something inherently bad related to the stats, 
there wouldn't be low points around 200.


Out of those, the most interesting one to me (as someone who is trying to 
defend raising the target some but not going crazy with that) is Q3.  The 
reason I say that is that everything else is better or basically the same 
raising the target from 10, as long as you don't go too high (>=100). 
That one falls apart immediately with a larger target which seems weird.
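One mechanism worth keeping in mind when reading these cliffs (background on how ANALYZE of this era sizes its work, offered as an assumption rather than something established by the thread): the row sample scales linearly with the target, at roughly 300 * statistics_target rows, and each column keeps up to `target` most-common values and `target` histogram buckets.

```python
# Sketch of how the statistics target drives ANALYZE's sample size.
def analyze_sample_rows(statistics_target: int) -> int:
    return 300 * statistics_target

for target in (10, 20, 100):
    print(target, analyze_sample_rows(target))  # 3000, 6000, 30000
# Going from 10 to 20 doubles the sample; that alone can change which
# values reach the MCV list, which is one plausible way a plan like
# Q3's can flip sharply at such a small target change.
```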


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-03 Thread Fujii Masao
Hi,

On Wed, Dec 3, 2008 at 11:33 PM, Simon Riggs [EMAIL PROTECTED] wrote:
 I'm patient, I know it takes time. Happy to spend hours on the review,
 but I want to do that knowing I agree with the higher level features and
 architecture first.

I wrote the features and restrictions of Synch Rep. Please also check
it together with the architecture figures.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#User_Overview

 Some questions to check my understanding (expected answers in brackets)

 * Diagram on p.2 has two Archives. We have just one (yes)

No, we need an archive on both the primary and the standby. The primary
needs an archive because a base backup is required when starting the
standby. Meanwhile, the standby needs an archive for cooperating with
pg_standby.

If the directory where pg_standby checks is the same as the directory
where walreceiver writes the WAL, a half-written WAL file might be
restored by pg_standby, and continuous recovery would fail. So, we have
to separate the directories, and I assigned pg_xlog and archive to them.

Another idea: walreceiver writes the WAL to a file with a temporary name,
and renames it to the final name when the file is complete. Then pg_standby
doesn't restore a half-written WAL file. But this makes failover more
difficult because the unrenamed WAL file remains.
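That temporary-name idea is an instance of the usual write-then-rename pattern, sketched here generically (a minimal illustration under the assumption of a POSIX filesystem, not the actual walreceiver code): readers such as pg_standby never observe a partially written segment because the rename is atomic within a filesystem.

```python
import os
import tempfile

def install_wal_segment(data: bytes, final_path: str) -> None:
    # Write the segment under a temporary name in the target directory,
    # so the final rename stays within one filesystem and is atomic.
    dirname = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the contents durable first
        os.rename(tmp_path, final_path)  # atomically expose the full file
    except BaseException:
        os.unlink(tmp_path)  # never leave a stray partial file behind
        raise
```

The failover caveat from the message still applies: a crash between write and rename leaves a `.partial` file that someone must clean up or adopt.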

Do you have any other good idea?


 * We send data continuously, whether or not we are in sync/async? (yes)

Yes.

 So the only difference between sync/async is whether we wait when we
 flush the commit? (yes)

Yes.
And, in the async case, the backend basically doesn't send the wakeup signal
to walsender.


 * If we have synchronous_commit = off do we ignore
 synchronous_replication = on (yes)

No, we can configure them independently. synchronous_commit covers
only local writing of the WAL. If synch_*commit* should cover both local
writing and replication, I'd like to add a new GUC that covers only local
writing (synchronous_local_write?).
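As a concrete sketch of how the two settings could combine (note that synchronous_replication and the suggested synchronous_local_write are names from this proposal, not settings in any released PostgreSQL):

```
# postgresql.conf -- hypothetical Synch Rep configuration
synchronous_commit = off        # commit does not wait for the local WAL flush
synchronous_replication = on    # but it does wait for the standby to acknowledge
```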


 * If two transactions commit almost simultaneously and one is sync and
 the other async then only the sync backend will wait? (Yes)

Yes.



 Do we definitely need the archiver to move the files written by
 walreceiver to archive and then move them back out again?

Yes; that's needed in order to cooperate with pg_standby.

 Seems like we
 can streamline that part in many (all?) cases.

Agreed. But I thought such streamlining was a TODO for next time.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] In-place upgrade: catalog side

2008-12-03 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:

Greg Smith napsal(a):
-There are 10 TODO items listed for the pg_migrator project, most or 
all of which look like they should be squashed before this is really 
complete. Any chance somebody (Korry?) has an improved version of 
this floating around beyond what's in the pgfoundry CVS already?


As I mentioned before, pg_migrator and pg_upgrade.sh are not a good way; 
they are a workaround. It does not make sense to continue in this way.


Why not?



The problem is that pg_dump does not export all the data that is important for an 
upgrade, for example relfilenode and so on. However, the biggest problem here is dropped 
columns (thanks for pointing me to this issue). A dropped column does not retain information 
about its type. It might be possible to fake it somehow during a dump and drop the columns 
again after restore, but I'm not convinced that is what we want.


The solution is good for now as a starter, but it is far from a final solution.

Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Gregory Stark wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:



I don't think at any time I have said to myself, I am going to set this
parameter low so I don't fill up my disk. If I am saying that to myself,
I have greatly underestimated the hardware for the task. Consider
that we are quarreling over what amounts to a nominal amount of hard
drive space: 1000 checkpoint_segments = 16G of space. My phone has more
capacity than that.

Well my phone has 16G of RAM, why not 1 ?


I don't think the disk space used is the only consideration here.  You
also have to keep recovery time in mind.  If you set it to 1000,
recovery would take way too long.


Presumably if you set checkpoint_segments to a high value, you'd use 
checkpoint_timeout to limit recovery time.
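For instance (illustrative values only, to be tuned per system), one could allow a large amount of WAL between checkpoints while capping the time between them, which in turn bounds how much WAL crash recovery has to replay:

```
# postgresql.conf -- illustrative values
checkpoint_segments = 1000    # allow plenty of WAL between checkpoints
checkpoint_timeout  = 5min    # but checkpoint at least this often,
                              # bounding crash-recovery replay time
```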


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] In-place upgrade: catalog side

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Zdenek Kotala wrote:

It works fine for 8.3-8.4 too, but I'm working on cleanup and fixing 
bugs. I hope that I will send an updated version to the community today.


That would be great.  It didn't feel like you were quite done with it yet. 
I'll be glad to help test it out, just didn't want to jump into that if it 
was known to still have issues that were being worked on.  Please let us 
know what the remaining bugs you know about are at that point, I really 
don't want this part of things to get ignored just because the page format 
stuff is the harder part.


It is more of a workaround or temporary solution. This approach is easy, but it has 
a lot of limitations. The problem with toast tables is one, but the biggest problem is 
with dropped columns. And maybe there will be more issues. The problem with a dump 
is that you lose internal data.


Can you be a bit more specific about what the problems with TOAST and 
dropped columns are?  If those are covered in your presentation or came up 
already and I missed it, just point me that way; I'm still working my way 
through parts of this and don't expect to ever have it all in my head like 
you do at this point.  Obviously this approach is going to be somewhat 
traumatic even if perfectly executed because of things like losing table 
statistics.


As we move closer to final crunch time here, what I am trying to keep 
clear in my head is which bits are absolutely required to do any type of 
in-place upgrade, whether or not the page format changes in 8.4.  What's 
nice is that those parts I can be testing right now just by trying to 
upgrade from 8.3 to 8.4.  Barring things like the TOAST problem you 
mention getting in the way, the fundamental approach taken by these 
upgrade scripts seems workable for the job even it's not optimal, and 
that's a whole lot better than nothing at all.


I personally prefer to have a special mode (like bootstrap) which converts data 
from the old catalog to the new format.


That's a perfectly fine idea I would like to see too.  But if we have to 
write such a thing from scratch right now, I'm afraid that may be too late 
to implement and still ship the next release on schedule.  And if such 
bootstrap code is needed, we sure need to make sure the prototype it's 
going to be built on is solid ASAP.  That's what I want to help you look 
into if you can catch me up a bit here.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers