Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-05 Thread Simon Riggs
On Wed, 2007-07-04 at 22:27 +0100, Gregory Stark wrote:
 Pavel Stehule [EMAIL PROTECTED] writes:

  Catalog bloat is one unwanted effect. A second is that our temp tables
  behave differently than those of other major RDBMSs, and are awkward to
  work with in stored procedures. A third argument for implementing
  global temp tables is full support of ANSI SQL.
 
 I think the ANSI concept of temporary tables which are defined once but give
 you a fresh empty work-space for each transaction only makes sense if you're
 thinking in terms of an OLTP environment. Otherwise you would just go ahead
 and do the DDL to create new tables for each query and not worry about the
 down-sides.
 
 The advantages of the ANSI temporary tables are all things you would worry
 about in an OLTP environment but not a data warehousing environment:

IIRC there were similar problems with temp table usage at many DW sites
using Teradata. The issue was about locking, specifically the access
rights required. We might have that problem, or not, but the issues
related to significant numbers of temp tables affect many types of
application, not just OLTP.
-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-05 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 2007/7/4, Bruce Momjian [EMAIL PROTECTED]:
 The solution is to fix the bloat, not add a work-around.

 The bloat is a direct consequence of performing DDL in the midst of an OLTP
 transaction.

 Hardly.  It's a consequence of our current implementation of temp
 tables; that does not necessarily imply that we cannot fix it without
 an API change.

Sure, we could change our regular temporary tables to not create new records
in pg_class at all, but I don't think it would make a big difference to DSS
users. And I think for OLTP you would still want all the other advantages the
standard api gives you.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Still recommending daily vacuum...

2007-07-05 Thread Michael Paesold

Alvaro Herrera wrote:

So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway.  So why not let the threshold be 0 and be done with it?


For very small tables, setting a threshold of 0 could mean a vacuum 
after every single row update (or every other row). I think that is just 
burning cycles. What about a threshold of 10 or 50, to have at least 
some sanity limit? Even though the cost of vacuum of a small table is 
low, it is still not free, IMHO, no?
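
For reference, the trigger rule being discussed works roughly like this
(a minimal sketch with simplified names, not the actual autovacuum.c code):

    #include <stdbool.h>

    /*
     * Sketch of the autovacuum trigger condition: vacuum once the number
     * of dead tuples exceeds base threshold + scale factor * table size.
     * With vac_base_thresh = 0, a table of a few dozen rows qualifies
     * after only a handful of updates, which is the "burning cycles"
     * concern above.
     */
    static bool
    needs_autovacuum(double reltuples, double n_dead_tuples,
                     double vac_base_thresh,    /* autovacuum_vacuum_threshold */
                     double vac_scale_factor)   /* autovacuum_vacuum_scale_factor */
    {
        double  vacthresh = vac_base_thresh + vac_scale_factor * reltuples;

        return n_dead_tuples > vacthresh;
    }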


Best Regards
Michael Paesold




Re: [HACKERS] GRANT ROLE and triggers

2007-07-05 Thread Richard Huxton

Claudio Rossi wrote:

Hello, I'm trying to write a trigger function which uses the GRANT ROLE
statement. The scenario is: I have a group role (let's call it A) which
does not have superuser privileges, and I want to grant membership in A
to the current user after an insert on a particular table, then revoke
it after a delete on that table. I write the trigger function as a
superuser and then log in normally as an unprivileged user.


I think you want to define your function as SECURITY DEFINER, then it'll 
do what you want.


This isn't a hackers question, by the way; the hackers list is for 
developing PostgreSQL itself. This question is better suited to the 
general / sql / admin lists, perhaps.


--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-07-05 Thread Simon Riggs
On Thu, 2007-06-28 at 20:23 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Thu, 2007-06-28 at 15:16 -0400, Tom Lane wrote:
  A quick grep suggests that VACUUM FULL might be at risk here.
 
  No, we're clear: I caught that issue specifically for VACUUM FULL fairly
  early on. VF assumes all hint bits are set after the first scan, so we
  flush prior to the scan to ensure it's safe to set the hint bits.
 
 Flush what prior to the scan?
 
 The methodology I suggested earlier (involving tracking LSN only at the
 level of pg_clog pages) isn't going to make that work, unless you
 somehow force the XID counter forward to the next page boundary.
 It might be that that level of tracking is too coarse anyway, since
 it essentially says that you can't hint any transaction until the
 next 32K-transaction boundary is reached.

Solutions I'm going for are these:

- Force XLogFlush() prior to initial VF scan. Tqual will set hint bits
if WAL has been flushed, else it will be deferred, so no WAL flushes
will be forced by normal hint bit setting and VF will work without
needing any crufty special cases or rework of VF logic.

- Use Tom's LSN tracking at clog page level, but make the LSN tracking
store an array of LSNs rather than just one (sketched below). Array size
is fixed at NUMBER_TRACKED_LSNS_PER_PAGE, so that each LSN covers
32,000/NUMBER_TRACKED_LSNS_PER_PAGE transactions. I'd guess that storing
8 per page would be optimal, so each stored LSN would cover about 4,000
transactions - probably around 1 second's worth of transactions when the
feature is used.
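
A minimal sketch of the slot mapping described above (names and the
simplified LSN type are illustrative assumptions, not the actual patch):

    #include <stdint.h>

    #define CLOG_XACTS_PER_PAGE          32768  /* xids covered by one clog page */
    #define NUMBER_TRACKED_LSNS_PER_PAGE 8
    #define XACTS_PER_LSN_SLOT  (CLOG_XACTS_PER_PAGE / NUMBER_TRACKED_LSNS_PER_PAGE)

    typedef uint32_t TransactionId;
    typedef uint64_t XLogRecPtr;        /* simplified; the real type differs */

    /* LSN slots for one clog page; the real thing would keep one such
     * array per buffered clog page */
    static XLogRecPtr group_lsn[NUMBER_TRACKED_LSNS_PER_PAGE];

    /*
     * Remember the commit record's LSN for xid's group.  Hint bits for any
     * xid in this group may only be set once WAL has been flushed past the
     * group's LSN.
     */
    static void
    remember_commit_lsn(TransactionId xid, XLogRecPtr commit_lsn)
    {
        int     slot = (xid % CLOG_XACTS_PER_PAGE) / XACTS_PER_LSN_SLOT;

        if (commit_lsn > group_lsn[slot])
            group_lsn[slot] = commit_lsn;
    }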

Comments?

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-07-05 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 I'd guess that storing 8 per page would be optimal, so each stored LSN
 would cover about 4,000 transactions - probably around 1 second's worth
 of transactions when the feature is used.

This is something we can experiment with, but I suspect that while 8 might be
sufficient for many use cases, there would be others where more would be
better. The cost of having more LSNs stored in the clog would be pretty small.

On TPC-C, which has longer transactions, we only see on the order of
1,000 txn/min on moderate hardware. So a setting like 128, which gives a
granularity of 256 transactions, would correspond to about 15s, which is not
so much longer than the xmin horizon implied by the 90th-percentile response
time of 2*5s.
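
For reference, the arithmetic behind the 15s figure, assuming a 32,768-xid
clog page: 32,768 xids / 128 slots = 256 xids per slot, and at roughly
1,000 txn/min a slot's worth of 256 transactions spans 256/1,000 min,
i.e. about 15 seconds.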

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] todo: Hash index creation

2007-07-05 Thread Heikki Linnakangas

Kenneth Marshall wrote:

I definitely agree with Tom's assessment. If we cannot make the hash
index as performant as it is in theory, none of the other refinements
are worth it. You would need to use BTree if you were concerned about
speed (and who isn't?).


I just got an idea.

Hash indexes would take much less space, and be more efficient to 
search, if we only stored the hash of the key in the index. Such index 
tuples would be fixed size, so we could get rid of the overhead of the 
length-field in IndexTuple, as well as the line pointers.


Of course, the key value would need to be rechecked after fetching the 
heap tuple, but that shouldn't be a problem assuming there are few collisions.


Another idea: when searching, we scan the whole bucket page looking for 
matching tuples. If we stored the tuples ordered by hash value within 
page, we could do a binary search instead.
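
A rough sketch of that search (the entry layout and names here are
illustrative assumptions, not the hash AM's actual on-page format):

    #include <stdint.h>

    typedef struct
    {
        uint32_t    hashval;        /* hash of the indexed key */
        uint32_t    heap_blkno;     /* pointer to the heap tuple */
        uint16_t    heap_offset;
    } FixedHashEntry;

    /*
     * Binary search over fixed-size entries kept sorted by hash value on a
     * bucket page: returns the index of the first entry with hashval >=
     * target, or nentries if there is none.  The caller still rechecks the
     * actual key against the heap tuple, since different keys can collide.
     */
    static int
    hash_page_lower_bound(const FixedHashEntry *entries, int nentries,
                          uint32_t target)
    {
        int     lo = 0;
        int     hi = nentries;

        while (lo < hi)
        {
            int     mid = lo + (hi - lo) / 2;

            if (entries[mid].hashval < target)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }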


These changes might give the hash indexam the edge over b-tree that it needs.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Still recommending daily vacuum...

2007-07-05 Thread Florian G. Pflug

Michael Paesold wrote:

Alvaro Herrera wrote:

So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway.  So why not let the threshold be 0 and be done with it?


For very small tables, setting a threshold of 0 could mean a vacuum 
after every single row update (or every other row). I think that is just 
burning cycles. What about a threshold of 10 or 50, to have at least 
some sanity limit? Even though the cost of vacuum of a small table is 
low, it is still not free, IMHO, no?


A bit off-topic (because probably not realistic in the 8.3 timeframe) -
but maybe the threshold should be specified in terms of the expected number
of pages to be freed, instead of specifying a bias on the number of modified
rows as is done now. Then 1 would probably be a reasonable default, because
a vacuum that won't free at least one page seems not really worth the
effort - it won't save any future I/O bandwidth.

Just an idea I got while following this thread...

greetings, Florian Pflug



Re: [HACKERS] Still recommending daily vacuum...

2007-07-05 Thread Kevin Grittner
 On Tue, Jul 3, 2007 at  5:34 PM, in message
[EMAIL PROTECTED], Alvaro Herrera
[EMAIL PROTECTED] wrote: 
 Kevin Grittner wrote:
 
 Autovacuum is enabled with very aggressive settings, to cover small
 tables, including one with about 75 rows that can be updated 100 or more
 times per second.  Even with these settings there is zero chance of any
 table of even moderate size hitting the autovacuum threshold between our
 scheduled vacuums.
 
 Sounds like you would be served by setting those specific tables to a
 lower vacuum scale factor (keeping a more normal default for the rest of
 the tables), and having a non-zero vacuum delay setting (to avoid
 excessive I/O consumption).  Have you tried that?
 
I did play with that, but it doesn't seem to make sense in our environment.
We have about 100 databases, most of them scattered around the state, and
any extra maintenance like that has a cost, particularly with the daily
cluster changing the oid.  Both from doing the math and from experience,
I can say that autovacuum only affects the small, frequently updated
tables, so I could see no benefit.  Am I missing something?  (I can't see
where this causes any extra I/O.)
 
Our tables tend to fall into one of four categories: small tables with high
update rates, medium tables (millions or tens of millions of rows) with
thousands or tens of thousands of updates per day, static tables of various
sizes that are only modified as part of a software release, and big honking
tables (100s of GB) which are either insert-only or are insert with
periodic purge of old rows.  Only the first group has a chance of being
autovacuumed in normal operations.  Even the purges don't cause it to kick
in.
 
 In terms of our autovacuum settings, we have several different types of
 databases, and in all of them we seem to do well with these changes from
 the 8.2 defaults, combined with (except for the above configuration) a
 nightly database vacuum:
  
 autovacuum_naptime = 10s
 
 Another change in 8.3 is that the naptime is per-database, i.e. the time
 between two consecutive autovac runs on a database.  So with a setting
 of 10s, if you have 10 databases there will be one autovac run per
 second, whereas on 8.2 there would be one autovac every 10 seconds
 (unless you run out of worker slots).
 
That's fine.  We actually want it every ten seconds in a production
database.  When you can have more updates per second than there are rows
in a small table, frequent vacuums are good.  As long as the table doesn't
bloat too badly, the vacuum is typically 10 to 20 milliseconds.  I'm sure
that part of it is that the table tends to remain fully cached. When these
tables were vacuumed once per minute, we ran into performance problems.
 
 Oh, the tiny, high-update tables occasionally bloat to hundreds or
 thousands of pages because of long-running transactions, so we schedule
 a daily cluster on those, just to keep things tidy.
 
 If you can afford the cluster then there's no problem.  I don't expect
 that to change in 8.3.
 
Here also we're talking 10 to 20 milliseconds.  I understand that in 8.2
that leaves a chance of an error, but we seem to have dodged that bullet
so far.  Has that gotten any safer in 8.3?
 
-Kevin
 




Re: [HACKERS] [PATCH] A crash and subsequent recovery of the master can cause the slave to get out-of-sync

2007-07-05 Thread Tom Lane
[ back to dealing with this patch, finally ]

Florian G. Pflug [EMAIL PROTECTED] writes:
 While creating the patch, I've been wondering whether it might be worthwhile
 to note that we just did recovery in the ShutdownCheckpoint
 (or create a new checkpoint type RecoveryCheckpoint). This would allow
 for more error checking, because then the slave could check that
 safe_restartpoint() is true for all ShutdownCheckpoints that did not
 follow a recovery.

I concur that this is a good idea --- we should have a third checkpoint
record type that shows that a crash recovery occurred.  However, we can
probably only do that for 8.3 and beyond.  If we try to do it in
existing release branches then there's likelihood of trouble due to WAL
incompatibility between master and standby.  While we do advise people
to update their standbys first, I don't think it's worth risking such
problems just to add some more error checking.

Conclusion: we should apply Florian's patch as-is in 8.2, do something
morally equivalent in 8.1 and before, and invent a
CrashRecoveryCheckpoint record type in HEAD.

regards, tom lane



Re: [HACKERS] [PATCH] A crash and subsequent recovery of the master can cause the slave to get out-of-sync

2007-07-05 Thread Florian G. Pflug

Tom Lane wrote:

[ back to dealing with this patch, finally ]

Florian G. Pflug [EMAIL PROTECTED] writes:

While creating the patch, I've been wondering whether it might be worthwhile
to note that we just did recovery in the ShutdownCheckpoint
(or create a new checkpoint type RecoveryCheckpoint). This would allow
for more error checking, because then the slave could check that
safe_restartpoint() is true for all ShutdownCheckpoints that did not
follow a recovery.


I concur that this is a good idea --- we should have a third checkpoint
record type that shows that a crash recovery occurred.  However, we can
probably only do that for 8.3 and beyond.  If we try to do it in
existing release branches then there's likelihood of trouble due to WAL
incompatibility between master and standby.  While we do advise people
to update their standbys first, I don't think it's worth risking such
problems just to add some more error checking.



Conclusion: we should apply Florian's patch as-is in 8.2, do something
morally equivalent in 8.1 and before, and invent a
CrashRecoveryCheckpoint record type in HEAD.


Sounds good.

Do you want me to code up such patches for 8.1 and 8.3 in the next days,
or is someone else already working on it?

greetings, Florian Pflug





Re: [HACKERS] [PATCH] A crash and subsequent recovery of the master can cause the slave to get out-of-sync

2007-07-05 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Conclusion: we should apply Florian's patch as-is in 8.2, do something
 morally equivalent in 8.1 and before, and invent a
 CrashRecoveryCheckpoint record type in HEAD.

 Sounds good.

Actually, now that I look closer, this patch seems completely wrong.
It's predicated on an assumption that rm_cleanup won't write WAL entries
describing what it did ... but, at least in the btree case, it does.
(I think gist/gin might not, but that's a bug in those AMs not in xlog.)
I'm therefore wondering what test case led you to think there was
something wrong.

regards, tom lane



[HACKERS] usleep feature for pgbench

2007-07-05 Thread Jan Wieck

To test some changes in Slony I needed a

\usleep [microseconds|:variable]

in pgbench's scripting language to be able to have hundreds of 
concurrent running transactions without totally swamping the system. I 
was wondering if anyone would object to permanently adding this to the 
pgbench code?



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] unclean SPI_scroll_cursor_move documentation, is SPI_tuptable valid?

2007-07-05 Thread Tom Lane
Pavel Stehule [EMAIL PROTECTED] writes:
 In the docs, the description of the return value says:
 SPI_processed and SPI_tuptable are set as in SPI_execute if successful.
 But for a MOVE statement SPI_tuptable is undefined. A MOVE statement only
 moves the cursor; it doesn't return anything.

Fixed, thanks.

regards, tom lane



Re: [HACKERS] usleep feature for pgbench

2007-07-05 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 To test some changes in Slony I needed a
  \usleep [microseconds|:variable]
 in pgbench's scripting language to be able to have hundreds of 
 concurrent running transactions without totally swamping the system. I 
 was wondering if anyone would object to permanently adding this to the 
 pgbench code?

How's it implemented?  Does the whole pgbench app freeze up for the
duration of the sleep, or does it just delay that one client script?

regards, tom lane



Re: [HACKERS] [PATCH] A crash and subsequent recovery of the master can cause the slave to get out-of-sync

2007-07-05 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

Tom Lane wrote:

Conclusion: we should apply Florian's patch as-is in 8.2, do something
morally equivalent in 8.1 and before, and invent a
CrashRecoveryCheckpoint record type in HEAD.



Sounds good.


Actually, now that I look closer, this patch seems completely wrong.
It's predicated on an assumption that rm_cleanup won't write WAL entries
describing what it did ... but, at least in the btree case, it does.
(I think gist/gin might not, but that's a bug in those AMs not in xlog.)
I'm therefore wondering what test case led you to think there was
something wrong.


It wasn't a testcase - I was trying to understand the xlog code while working
on my concurrent walreplay patch, and wondered what happens if the master
crashes and then recovers while the slave keeps running.


I've re-read my original email to Simon, and it seems that I believed
that rm_cleanup methods won't be able to write to the xlog because they are
called during recovery.

But StartupXLOG *does* make the WAL appendable *before* the rm_cleanup methods
are called.

So I now think (at least for btree) that everything is fine, and that I was
just being stupid.

Sorry for the noise, guys
greetings, Florian Pflug




Re: [HACKERS] usleep feature for pgbench

2007-07-05 Thread Jan Wieck

On 7/5/2007 3:34 PM, Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:

To test some changes in Slony I needed a
 \usleep [microseconds|:variable]
in pgbench's scripting language to be able to have hundreds of 
concurrent running transactions without totally swamping the system. I 
was wondering if anyone would object to permanently adding this to the 
pgbench code?


How's it implemented?  Does the whole pgbench app freeze up for the
duration of the sleep, or does it just delay that one client script?


Only that one client script.

The \usleep command itself calculates a wakeup time as a struct timeval, 
stores it in the state struct, and sets a sleeping flag, also in the state 
struct.  So both are per client script/connection, and multiple clients can 
be sleeping in an overlapping fashion. The loop in main() then picks the 
lowest timeval (if anyone is sleeping) and uses that to calculate the 
timeout for select(). doCustom() resets the sleeping flag for those 
scheduled for wakeup and advances the client to the next scripted command.


I think I've coded it in a way that if one doesn't use the \usleep 
command at all, it will never even call gettimeofday() and use a NULL 
timeout in select() as it used to.
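
A rough sketch of that timeout computation (names are assumptions for
illustration, not the actual patch; timercmp/timersub/timerclear are the
BSD helpers from <sys/time.h>):

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/time.h>

    typedef struct
    {
        bool            sleeping;   /* is this client inside a \usleep? */
        struct timeval  until;      /* absolute wakeup time */
    } CState;

    /*
     * Compute the select() timeout from the earliest wakeup time among
     * sleeping clients.  Returns NULL when nobody sleeps, so select() can
     * block indefinitely as it used to.
     */
    static struct timeval *
    next_wakeup_timeout(CState *clients, int nclients, struct timeval *timeout)
    {
        struct timeval  now, min_until;
        bool            found = false;
        int             i;

        for (i = 0; i < nclients; i++)
        {
            if (!clients[i].sleeping)
                continue;
            if (!found || timercmp(&clients[i].until, &min_until, <))
                min_until = clients[i].until;
            found = true;
        }
        if (!found)
            return NULL;

        gettimeofday(&now, NULL);
        if (timercmp(&min_until, &now, <))
            timerclear(timeout);            /* wakeup already due */
        else
            timersub(&min_until, &now, timeout);
        return timeout;
    }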



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



[HACKERS] Bgwriter strategies

2007-07-05 Thread Heikki Linnakangas

I ran some DBT-2 tests to compare different bgwriter strategies:

http://community.enterprisedb.com/bgwriter/

imola-336 was run with minimal bgwriter settings, so that most writes 
are done by backends. imola-337 was patched with an implementation of 
Tom's bgwriter idea, trying to aggressively keep all pages with 
usage_count=0 clean. imola-340 was run with a patch along the lines of 
Itagaki's original patch, ensuring that there are as many clean pages in 
front of the clock head as were consumed by backends since the last 
bgwriter iteration.


All test runs were also patched to count the number of buffer allocations 
and the number of buffer flushes performed by the bgwriter and by backends. 
Here are those results (I hope the indentation gets through properly):


                     imola-336   imola-337   imola-340
writes by checkpoint     38302       30410       39529
writes by bgwriter      350113     2205782     1418672
writes by backends     1834333      265755      787633
writes total           2222748     2501947     2245834
allocations            2683170     2657896     2699974

It looks like Tom's idea is not a winner; it leads to more writes than 
necessary. But the OS caches the writes, so let's look at the actual I/O 
performed to be sure, from iostat:


http://community.enterprisedb.com/bgwriter/writes-336-337-340.jpg

The graph shows that on imola-337, there was indeed more write traffic 
than on the other two test runs.


On imola-340, there's still a significant amount of backend writes. I'm 
still not sure what we should be aiming at. Is 0 backend writes our goal?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Bgwriter strategies

2007-07-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
                      imola-336   imola-337   imola-340
 writes by checkpoint     38302       30410       39529
 writes by bgwriter      350113     2205782     1418672
 writes by backends     1834333      265755      787633
 writes total           2222748     2501947     2245834
 allocations            2683170     2657896     2699974

 It looks like Tom's idea is not a winner; it leads to more writes than 
 necessary.

The incremental number of writes is not that large; only about 10% more.
The interesting thing is that those extra writes must represent
buffers that were re-touched after their usage_count went to zero, but
before they could be recycled by the clock sweep.  While you'd certainly
expect some of that, I'm surprised it is as much as 10%.  Maybe we need
to play with the buffer allocation strategy some more.

The very small difference in NOTPM among the three runs says that either
this whole area is unimportant, or DBT2 isn't a good test case for it;
or maybe that there's something wrong with the patches?

 On imola-340, there's still a significant amount of backend writes. I'm 
 still not sure what we should be aiming at. Is 0 backend writes our goal?

Well, the lower the better, but not at the cost of a very large increase
in total writes.

 Imola-340 was run with a patch along the lines of Itagaki's original
 patch, ensuring that there are as many clean pages in front of the clock
 head as were consumed by backends since the last bgwriter iteration.

This seems intuitively wrong, since in the presence of bursty request
behavior it'll constantly be getting caught short of buffers.  I think
you need a safety margin and a moving-average decay factor.  Possibly
something like

buffers_to_clean = Max(buffers_used * 1.1,
                       buffers_to_clean * 0.999);

where buffers_used is the current observation of demand.  This would
give us a safety margin such that buffers_to_clean is not less than
the largest demand observed in the last 100 iterations (0.999 ^ 100
is about 0.90, cancelling out the initial 10% safety margin), and it
takes quite a while for the memory of a demand spike to be forgotten
completely.
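
A minimal sketch of that smoothing rule (variable names are assumptions,
not the bgwriter's actual code):

    /*
     * Keep a slowly-decaying estimate of how many clean buffers the
     * bgwriter should have ready.  The 1.1 factor is the 10% safety
     * margin; the 0.999 decay means a demand spike is remembered for
     * on the order of a hundred iterations before it fades.
     */
    static double   buffers_to_clean = 0.0;

    static void
    update_cleaning_target(int buffers_used)   /* buffers consumed since last round */
    {
        double  demand  = buffers_used * 1.1;
        double  decayed = buffers_to_clean * 0.999;

        buffers_to_clean = (demand > decayed) ? demand : decayed;
    }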

regards, tom lane



Re: [HACKERS] usleep feature for pgbench

2007-07-05 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 I think I've coded it in a way that if one doesn't use the \usleep 
 command at all, it will never even call gettimeofday() and use a NULL 
 timeout in select() as it used to.

Did you check that the observed performance for non-usleep-using scripts
didn't change?  If this extra overhead causes a reduction in reported
TPS rates it would make it hard to compare older and newer tests.

Other than that I've got no objection to it.

regards, tom lane



Re: [HACKERS] usleep feature for pgbench

2007-07-05 Thread Alvaro Herrera
Tom Lane wrote:
 Jan Wieck [EMAIL PROTECTED] writes:
  I think I've coded it in a way that if one doesn't use the \usleep 
  command at all, it will never even call gettimeofday() and use a NULL 
  timeout in select() as it used to.
 
 Did you check that the observed performance for non-usleep-using scripts
 didn't change?  If this extra overhead causes a reduction in reported
 TPS rates it would make it hard to compare older and newer tests.

I keep wondering, why is that important?  If you want to compare results
of different PG versions, surely you can run the tests with the newest
version of pgbench?

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes. (http://slashdot.org/comments.pl?sid=44793cid=4647152)



Re: [HACKERS] usleep feature for pgbench

2007-07-05 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Did you check that the observed performance for non-usleep-using scripts
 didn't change?  If this extra overhead causes a reduction in reported
 TPS rates it would make it hard to compare older and newer tests.

 I keep wondering, why is that important?  If you want to compare results
 of different PG versions, surely you can run the tests with the newest
 version of pgbench?

It's more about not having to repeat old test cases every time you want
to have numbers comparable to your latest case.

regards, tom lane



Re: [HACKERS] Bgwriter strategies

2007-07-05 Thread Greg Smith

On Thu, 5 Jul 2007, Heikki Linnakangas wrote:

It looks like Tom's idea is not a winner; it leads to more writes than 
necessary.


What I came away with as the core of Tom's idea is that the cleaning/LRU 
writer shouldn't ever scan the same section of the buffer cache twice, 
because anything that resulted in a new dirty buffer will be unwritable by 
it until the clock sweep passes over it.  I never took that to mean that 
idea necessarily had to be implemented as trying to aggressively keep all 
pages with usage_count=0 clean.


I've been making slow progress on this myself, and the question I've been 
trying to answer is whether this fundamental idea really matters or not. 
One clear benefit of that alternate implementation is that it should allow 
setting a lower value for the interval without being as concerned that 
you're wasting resources by doing so, which I've found to be a problem with 
the current implementation: it will consume a lot of CPU scanning the same 
section right now if you lower that too much.


As far as your results go, first off I'm really glad to see someone else 
comparing checkpoint/backend/bgwriter writes the same way I've been doing, 
so I finally have someone else's results to compare against.  I expect that 
the optimal approach here is a hybrid one that structures scanning of the 
buffer cache the new way Tom suggests, but limits the number of writes to 
"just enough".  I happen to be fond of the "just enough" computation based 
on a weighted moving average I wrote before, but there's certainly room for 
multiple implementations of that part of the code to evolve.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[HACKERS] enable-integer-datetimes vs datetime hash functions

2007-07-05 Thread Tom Lane
The hash opclasses for time, timestamptz, timestamp use the hashfloat8()
hash function even when we are using integer datetimes.  I had been
aware of this for awhile but thought it was just harmless ugliness ...
so long as you get a hash value, who cares how it was computed?  But on
second thought, this means that almost any arbitrary bitpattern can be
fed to hashfloat8(), and in particular an IEEE signaling NAN could be
fed to it, possibly resulting in an unexpected error.  (I tried this
on my Linux box and didn't get an error, possibly because the only
float operation actually performed is a non-NaN-aware comparison; but
perhaps other platforms would show the failure.)

Meanwhile, timetz_hash and interval_hash have the opposite problem: they
use plain ol' hash_any for structs that might contain either float8 or
int8.  That works fine for integer datetimes but would give different
hash codes for positive and negative float zeroes.  I'm not certain how
to get the datetime code to compute a negative float zero, but I
wouldn't swear it can't be done, either.

Since we've already broken hash index compatibility for 8.3, this seems
like a good time to clean this up.  Barring objections, I will make
physically separate hash functions for each of these datatypes and give
them two appropriate code paths depending on #ifdef HAVE_INT64_TIMESTAMP.
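
A sketch of the sort of type-specific function described above (the
hashint8()/hashfloat8() helpers exist today; the wrapper itself is
illustrative, not necessarily what will be committed):

    #include "postgres.h"
    #include "fmgr.h"
    #include "access/hash.h"

    /*
     * Hash a timestamp using a code path that matches its on-disk
     * representation, instead of feeding raw bits to hashfloat8().
     */
    Datum
    timestamp_hash(PG_FUNCTION_ARGS)
    {
    #ifdef HAVE_INT64_TIMESTAMP
        /* int64 representation: hash it as an int8 */
        return hashint8(fcinfo);
    #else
        /* float8 representation: hashfloat8() treats -0 and +0 alike */
        return hashfloat8(fcinfo);
    #endif
    }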

regards, tom lane



Re: [HACKERS] usleep feature for pgbench

2007-07-05 Thread Jan Wieck

On 7/5/2007 5:30 PM, Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:
I think I've coded it in a way that if one doesn't use the \usleep 
command at all, it will never even call gettimeofday() and use a NULL 
timeout in select() as it used to.


Did you check that the observed performance for non-usleep-using scripts
didn't change?  If this extra overhead causes a reduction in reported
TPS rates it would make it hard to compare older and newer tests.


Given pgbench's unpredictability of results ... let's see.

I ran

dropdb x
createdb x
pgbench -i -s10 x
psql -c 'checkpoint' x
sleep 1
psql -c 'checkpoint' x
pgbench -s10 -c5 -t1 x
pgbench -s10 -c5 -t1 x
pgbench -s10 -c5 -t1 x

Original pgbench reported 39, 37 and 33 TPS. With my patch applied it 
reported 40, 38 and 33 TPS. Inserting a \usleep 1 after the update to 
accounts in a script equivalent to the default one changed those numbers to 
40, 37 and 33. I interpret that as "does not change observed performance".




Other than that I've got no objection to it.


Will be committed after adjusting the README.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
