[HACKERS] non-blocking CREATE INDEX in 8.2??

2007-07-13 Thread Jim C. Nasby
According to http://developer.postgresql.org/index.php/Feature_Matrix,
8.2 has non-blocking CREATE INDEX, which is news to me. Is it correct?

http://developer.postgresql.org/index.php?title=Feature_Matrix&oldid=1734
is the version where that was added (very early in the history of the
page).
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] non-blocking CREATE INDEX in 8.2??

2007-07-13 Thread Jim C. Nasby
On Fri, Jul 13, 2007 at 01:45:18PM -0700, Joshua D. Drake wrote:
 Jim C. Nasby wrote:
 According to http://developer.postgresql.org/index.php/Feature_Matrix,
 8.2 has non-blocking CREATE INDEX, which is news to me. Is it correct?
 
 CREATE INDEX CONCURRENTLY

Well, I guess it's a good thing someone created that cheat-sheet, 'cause
I can't keep which feature went into what version straight anymore. :)
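
For anyone else keeping score, the 8.2 syntax is simply (index and table
names made up for illustration):

    CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);

It builds the index without blocking writes the way a plain CREATE INDEX
does, at the cost of two table scans and some waiting on concurrent
transactions.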
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Idea: Comments on system catalogs?

2007-07-09 Thread Jim C. Nasby
On Wed, Jul 04, 2007 at 01:03:20PM +0200, Dawid Kuroczko wrote:
 Hello.
 
 I think it could be a nice idea to put descriptions from
 http://www.postgresql.org/docs/8.2/static/catalogs.html
 into system catalogs itself.  I.e., make a bunch of
 
 COMMENT ON COLUMN pg_class.relname
IS 'Name of the table, index, view, etc.';
 ...
 COMMENT ON COLUMN pg_class.relkind
IS 'r = ordinary table, i = index, S = sequence, v = view, c =
 composite type, t = TOAST table';
 
 and so on.
 
 I think it could be helpful, when you're writing your own selects
 on system catalogs.
 
 Perhaps it could be optional (as a contrib .sql file).
 
 If you like the idea, I could prepare a script which will
 convert documentation into .sql file with series of
 COMMENT ON .. IS ...;

Actually, this does exist for some things in the catalog; I suspect it
just wasn't done in the past (perhaps Postgres didn't originally have
comments). I think it would be a useful addition. But I think it'd need
to be more than just a .sql file (initdb would probably need to be
modified). Ideally, we'd be able to suck the info out of the appropriate
.sgml files.
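
For instance, once loaded, the descriptions would be queryable with the
existing col_description() function; a quick sketch:

    SELECT a.attname,
           col_description(a.attrelid, a.attnum) AS description
      FROM pg_attribute a
     WHERE a.attrelid = 'pg_class'::regclass
       AND a.attnum > 0
     ORDER BY a.attnum;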
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] psql/pg_dump vs. dollar signs in identifiers

2007-07-09 Thread Jim C. Nasby
On Mon, Jul 09, 2007 at 07:04:27PM +0100, Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  Now, because we surround the pattern with ^...$ anyway, I can't offhand
  see a use-case for putting $ with its regexp meaning into the pattern.
 
 It's possible to still usefully use $ in the regexp, but its existence at the
 end means there should always be a way to write the regexp without needing
 another one inside.

Unless you're doing multi-line regex, what's the point of a $ anywhere
but the end of the expression? Am I missing something? Likewise with ^.

I'm inclined to escape $ as Tom suggested.
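
(For anyone following along: dollar signs are legal inside identifiers,
which is why the escaping question matters at all.) A made-up example:

    CREATE TABLE payroll$2007 (id int);
    -- with $ escaped in psql's pattern handling, \dt payroll$2007
    -- would match this table instead of treating $ as a regexp anchor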
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Jim C. Nasby
On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
 Joshua D. Drake wrote:
 Alvaro Herrera wrote:
 Joshua D. Drake wrote:
 Did we change the default autovac parameters for 8.3 (beyond turning 
 it on?) because on any reasonably used database, they are way too 
 conservative.
 
 We're still on time to change them ...  Any concrete proposals?
 
 I could provide numbers from production high use databases. We could 
 probably back those down a little and make more reasonable numbers.
 
 Please do so. Perhaps others can also tell their typical settings.

FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
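
For comparison, a quick way to see what a given installation is actually
running with:

    SELECT name, setting
      FROM pg_settings
     WHERE name LIKE 'autovacuum%'
     ORDER BY name;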
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Jim C. Nasby
On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
 
 Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
 text now.
 
 Is there a reason to say anything beyond use autovac?

There is; I know that things like web session tables aren't handled very
well by autovacuum if there are any moderately large tables (anything
that will take more than a few minutes to vacuum). Eventually we should
be able to accommodate that case with multiple workers, but we'll need a
mechanism to ensure that at least one worker doesn't get tied up in
large vacuums.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-03 Thread Jim C. Nasby
On Tue, Jul 03, 2007 at 11:49:05AM -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  Alvaro Herrera [EMAIL PROTECTED] writes:
   Tom Lane escribió:
   I rather doubt that.  The most likely implementation would involve
   cloning a template entry into pg_class.
  
   How about a new relkind which causes the table to be located in
   PGDATA/base/dboid/pg_temp_backendid/relfilenode
   So each backend can have its own copy of the table with the same
   relfilenode; there's no need for extra catalog entries.
  
  Uh-huh.  And what do you do with relpages, reltuples, relfrozenxid, and
  pg_statistic entries?  What if one backend wants to TRUNCATE or CLUSTER
  its copy (requiring a new relfilenode)?  Where does ALTER TABLE fit into
  this?
 
 And what is the use-case for this functionality?  What does it give us
 that we don't already have?

The use case is any system that uses temp tables in an OLTP setting,
which certainly isn't uncommon. The problem is that today (and likewise
with a global temp table that still writes to the catalogs), every OLTP
operation that creates or drops a temp table is doing DDL. At best, that
leads to a lot of catalog bloat. Right now, it also appears to expose
some race conditions (we've got a customer that's been bitten by this,
and we've been able to reproduce some odd behavior in the lab).
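
To make the pattern concrete, picture something like this (made-up
example) running in every transaction; each execution inserts and then
deletes rows in pg_class, pg_attribute, pg_type and friends:

    BEGIN;
    CREATE TEMP TABLE session_scratch (k int, v text) ON COMMIT DROP;
    -- ... populate and use session_scratch ...
    COMMIT;  -- the table and its catalog entries go away here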
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




[HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Jim C. Nasby
From
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

Recommended practice for most sites is to schedule a database-wide
VACUUM once a day at a low-usage time of day, supplemented by more
frequent vacuuming of heavily-updated tables if necessary. (Some
installations with extremely high update rates vacuum their busiest
tables as often as once every few minutes.) If you have multiple
databases in a cluster, don't forget to VACUUM each one; the program
vacuumdb  might be helpful.

Do we still want that to be our formal recommendation? ISTM it would be
more logical to recommend a combination of autovac, daily vacuumdb -a if
you can afford it and have a quiet period, and frequent manual vacuuming
of things like web session tables.
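
By frequent manual vacuuming I just mean something as simple as this run
from cron every few minutes (table name made up):

    VACUUM ANALYZE web_sessions;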

I'm happy to come up with a patch, but I figure there should be
consensus first...
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-13 Thread Jim C. Nasby
On Sun, Jun 10, 2007 at 08:49:24PM +0100, Heikki Linnakangas wrote:
 Jim C. Nasby wrote:
 On Thu, Jun 07, 2007 at 10:16:25AM -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 Thinking about this whole idea a bit more, it occurred to me that the 
 current approach to "write all, then fsync all" is really a historical 
 artifact of the fact that we used to use the system-wide sync call 
 instead of fsyncs to flush the pages to disk. That might not be the best 
 way to do things in the new load-distributed-checkpoint world.
 How about interleaving the writes with the fsyncs?
 I don't think it's a historical artifact at all: it's a valid reflection
 of the fact that we don't know enough about disk layout to do low-level
 I/O scheduling.  Issuing more fsyncs than necessary will do little
 except guarantee a less-than-optimal scheduling of the writes.
 
 If we extended relations by more than 8k at a time, we would know a lot
 more about disk layout, at least on filesystems with a decent amount of
 free space.
 
 I doubt it makes that much difference. If there was a significant amount 
 of fragmentation, we'd hear more complaints about seq scan performance.
 
 The issue here is that we don't know which relations are on which drives 
 and controllers, how they're striped, mirrored etc.

Actually, isn't pre-allocation one of the tricks that Greenplum uses to
get its seqscan performance?
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-09 Thread Jim C. Nasby
On Thu, Jun 07, 2007 at 10:16:25AM -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Thinking about this whole idea a bit more, it occurred to me that the 
  current approach to "write all, then fsync all" is really a historical 
  artifact of the fact that we used to use the system-wide sync call 
  instead of fsyncs to flush the pages to disk. That might not be the best 
  way to do things in the new load-distributed-checkpoint world.
 
  How about interleaving the writes with the fsyncs?
 
 I don't think it's a historical artifact at all: it's a valid reflection
 of the fact that we don't know enough about disk layout to do low-level
 I/O scheduling.  Issuing more fsyncs than necessary will do little
 except guarantee a less-than-optimal scheduling of the writes.

If we extended relations by more than 8k at a time, we would know a lot
more about disk layout, at least on filesystems with a decent amount of
free space.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-08 Thread Jim C. Nasby
On Fri, Jun 08, 2007 at 09:49:56AM -0400, Matthew O'Connor wrote:
 Michael Paesold wrote:
 Matthew T. O'Connor schrieb:
 Do we need a configurable autovacuum naptime at all?  I know I put it 
 in the original contrib autovacuum because I had no idea what knobs 
 might be needed.  I can't see a good reason to ever have a naptime 
 longer than the default 60 seconds, but I suppose one might want a 
 smaller naptime for a very active system?
 
 A PostgreSQL database on my laptop for testing. It should use as little 
 resources as possible while being idle. That would be a scenario for 
 naptime greater than 60 seconds, wouldn't it?
 
 Perhaps, but that isn't the use case PostgreSQL is being designed for. 
  If that is what you really need, then you should probably disable 
 autovacuum.  Also, a very long naptime means that autovacuum will still 
 wake up at random times to do the work.  At least with a short 
 naptime, it will do the work shortly after you update your tables.

Agreed. Maybe 10 minutes would make sense, but the overhead of checking
to see if anything needs vacuuming is pretty tiny.

There *is* reason to allow setting the naptime smaller, though (or at
least there was; perhaps Alvaro's recent changes negate this need):
clusters that have a large number of databases. I've worked with folks
who are in a hosted environment and give each customer their own
database; it's not hard to get a couple hundred databases that way.
Setting the naptime higher than a second in such an environment would
mean it could be hours before a database is checked for vacuuming.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-08 Thread Jim C. Nasby
On Thu, Jun 07, 2007 at 12:13:09PM -0700, Andrew Hammond wrote:
 On 6/7/07, Jim C. Nasby [EMAIL PROTECTED] wrote:
 On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:
  The launcher is set up to wake up in autovacuum_naptime seconds at most.
  So if the user configures a ridiculous time (for example 86400 seconds,
  which I've seen) then the launcher would not detect the postmaster death
 
 Is there some threshold after which we should have PostgreSQL emit a
 warning to the effect of "autovacuum_naptime is very large. Are you
 sure you know what you're doing?"
 
 Yeah, I've seen people set that up with the intention of "now autovacuum
 will only run during our slow time!". I'm thinking it'd be worth
 mentioning in the docs that this won't work, and instead suggesting that
 they run vacuumdb -a or equivalent at that time instead. Thoughts?
 
 Hmmm... it seems to me that points new users towards not using
 autovacuum, which doesn't seem like the best idea. I think it'd be

I think we could easily word it so that it's clear that just letting
autovacuum do its thing is preferred.

 better to say that setting the naptime really high is a Bad Idea.
 Instead, if they want to shift maintenance to off hours they should
 consider using a cron job that bonks around the
 pg_autovacuum.vac_base_thresh or vac_scale_factor values for tables
 they don't want vacuumed during operational hours (set them really
 high at the start of operational hours, then to normal during off
 hours). Tweaking the enable column would work too, but they presumably
 don't want to disable ANALYZE, although it's entirely likely that new
 users don't know what ANALYZE does, in which case they _really_ don't
 want to disable it.
 
That sounds like a rather ugly solution, and one that would be hard to
implement; not something to be putting in the docs.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Command tags in create/drop scripts

2007-06-08 Thread Jim C. Nasby
On Tue, Jun 05, 2007 at 05:52:39PM -, Andrew Hammond wrote:
 On Jun 5, 9:19 am, [EMAIL PROTECTED] (Alvaro Herrera) wrote:
  Zdenek Kotala wrote:
   Tom Lane wrote:
   Bruce Momjian [EMAIL PROTECTED] writes:
   Is this a TODO?
 
   I don't think so; there is no demand from anybody but Zdenek to remove
   those programs.  Has it ever even come up before?
 
  Personally I found it really strange to have createuser and createdb
  shipped by Postgres when I started using it.  I just didn't complain.
 
 +1. Given the prevalence of the pg_foo convention, those names are
 clunky. So is initdb. I'm less creative than Zdenek, so I'd suggest
 simply renaming to pg_createuser and friends with the same command
 line options as the originals. Have the binaries check $0 and emit a
 warning about using the deprecated name to STDERR if called by a name
 that doesn't have the pg_ prefix. Default to symlinking the old names
 for backwards compatibility until 9.0.

+1
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




[HACKERS] Issues with factorial operator

2007-06-08 Thread Jim C. Nasby
I'm working with a customer that recently discovered that some code had
generated the following nice query...

SELECT ... WHERE table_id = 92838278! AND ...

So their production server now has several processes that are trying to
compute some absurdly large factorial. There are two issues here:

1) the computation doesn't check for signals. This means both a plain
kill and pg_cancel_backend() are useless.

2) Even though the answer is going to be an obscene number of digits,
and that's supposed to be fed into a numeric, there's no overflow or
bounds checking occurring. This is true even if I store into a field
defined as numeric:

decibel=# create table n(n numeric);
CREATE TABLE
decibel=# insert into n select !;
INSERT 0 1
decibel=# select char_length(trim(n, '0')) from n;
 char_length 
-
9466
(1 row)

So at the very least the documentation is confusing:

The type numeric can store numbers with up to 1000 digits of precision
and perform calculations exactly.
...
Specifying

NUMERIC

without any precision or scale creates a column in which numeric values
of any precision and scale can be stored, up to the implementation limit
on precision.

Yet here we have a numeric that's storing nearly 10,000 digits of
precision.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-07 Thread Jim C. Nasby
On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:
 The launcher is set up to wake up in autovacuum_naptime seconds at most.
 So if the user configures a ridiculous time (for example 86400 seconds,
 which I've seen) then the launcher would not detect the postmaster death

Yeah, I've seen people set that up with the intention of "now autovacuum
will only run during our slow time!". I'm thinking it'd be worth
mentioning in the docs that this won't work, and instead suggesting that
they run vacuumdb -a or equivalent at that time instead. Thoughts?
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my researc

2007-06-06 Thread Jim C. Nasby
On Sat, Jun 02, 2007 at 01:37:19PM +, Tasneem Memon wrote:
 We can make the system ask the user as to what membership degree s/he wants 
 to get the values, but we don't want to make the system interactive, where a 
 user gives a membership degree value of his/her choice. These operators are 
 supposed to work just like the other operators in SQL.. you just put them in 
 the query and get a result. I have put 0.8 because all the case studies I 
 have made for the NEAR,  0.8 seems to be the best choice.. 0.9 narrows the 
 range.. 0.75 or 0.7 gets those values also that are irrelevant.. However, 
 these values will no more seem to be irrelevant when we haven't got any 
 values till the md 0.8, so the operator fetches them when they are the 
 NEARest. 
  
While having them function just like any other operator is good, it
seems like you're making quite a bit of an assumption for the user;
namely that you know what their data looks like better than they might.
Is it not possible that someone would come along with a dataset that
looks different enough from your test cases so that the values you
picked wouldn't work?
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] TOAST usage setting

2007-06-04 Thread Jim C. Nasby
On Fri, Jun 01, 2007 at 01:50:12PM -0400, Bruce Momjian wrote:
 I think the long-term solution is to go to a 2k/8k fragment/block model,
 but that isn't going to happen for 8.3.
 
There might well have been lessons learned since UFS (anyone know what
ZFS does in this regard?), but I agree that we want to be able to do a
mix of full chunks and fragments.

 The big question is do we want to drop the target tuple size down to
 512, and increase the chunk size to 8k for 8.3?  Dropping the tuple size
 down to 512 is going to give us some smaller TOAST values to fill in
 free space created by the 8k chunk size, assuming you have both types of
 values in the table.  Do we want to increase the access time of long
 TOAST by 6% if it means having more wasted space for lots of 4.1k
 values?

If we do that people could see their disk space usage increase by up to
16x: currently 513 bytes fits in heap and takes (roughly) 513 bytes; if
we make that change it would then get toasted and take 8K. I don't think
we want to do that. Disk space aside, it's almost certain to seriously
hurt performance as soon as you don't fit entirely in memory.

How big is the hit for setting both to 512? Also, is this something that
could be set at initdb instead of compile time? That would make it
easier for folks to go back to the old behavior if they needed to...
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




[HACKERS] ecpg test runs out of disk space

2007-05-27 Thread Jim C. Nasby
Just had a rather disturbing event happen on platypus..

[EMAIL 
PROTECTED]:07]~/buildfarm/HEAD/pgsql.39276/src/interfaces/ecpg/test/results:271ll
 preproc-variable.*
-rw-r--r--  1 buildfarm  decibel 6328 May 26 23:42 preproc-variable.c
-rw-r--r--  1 buildfarm  decibel  76460670919 May 27 02:18 
preproc-variable.stderr
-rw-r--r--  1 buildfarm  decibel  14668726272 May 27 03:07 
preproc-variable.stdout

The processes that were running (I'd already killed 39276 without thinking...):
[EMAIL 
PROTECTED]:11]~/buildfarm/HEAD/pgsql.39276/src/interfaces/ecpg/test/results:282ps
 auxww
USERPID %CPU %MEM   VSZ   RSS  TT  STAT STARTED  TIME COMMAND
buildfarm 65843 94.5  0.1 20848  3056  ??  RN   11:42PM 204:15.00 
preproc/variable
buildfarm 61647  0.0  0.0  5776  1012  ??  IN   11:42PM   0:00.01 gmake -j4 
NO_LOCALE=1 check
buildfarm 61654  0.0  0.0  5784  1032  ??  IN   11:42PM   0:00.01 gmake -C test 
check
buildfarm 61876  0.0  0.0  5224  1624  ??  IN   11:42PM   0:00.09 sh 
./pg_regress --dbname=regress1 --temp-install --top-builddir=../../../.. 
--temp-port=55678 --multibyte=SQL_ASCII --load-language=plpgsql --no-locale
buildfarm 65430  0.0  0.1 54168  4656  ??  IN   11:42PM   0:00.21 [postgres]
buildfarm 65846  0.0  0.1 54232  3720  ??  SNs  11:42PM   0:01.55 postmaster: 
writer process(postgres)
buildfarm 65847  0.0  0.1 17884  3244  ??  SNs  11:42PM   0:00.11 postmaster: 
stats collector process(postgres)
buildfarm 65848  0.0  0.1 54452  3508  ??  INs  11:42PM   0:00.05 postmaster: 
autovacuum launcher process(postgres)
buildfarm 69673  0.0  0.1 30584  3676  ??  S 3:03AM   0:00.07 sshd: [EMAIL 
PROTECTED] (sshd)
buildfarm 69674  0.0  0.1  9452  3284  p3  Ss3:03AM   0:00.07 -tcsh (tcsh)
buildfarm 69886  0.0  0.0  4740  1040  p3  R+3:11AM   0:00.00 ps auxww

preproc-variable.stdout contains the following line, over and over:

  ?, born 140737488349640, age = -5704, married (null), children = 4196007

Here's the interesting bits from .stderr:
[EMAIL 
PROTECTED]:10]~/buildfarm/HEAD/pgsql.39276/src/interfaces/ecpg/test/results:277head
 -n 100 preproc-variable.stderr 
[NO_PID]: ECPGdebug: set to 1
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ECPGconnect: opening database regress1 on DEFAULT port DEFAULT 
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ECPGexecute line 46: QUERY: set datestyle to iso on connection 
regress1
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ECPGexecute line 46 Ok: SET
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ECPGexecute line 49: QUERY: create  table family ( name char  ( 8 )   
 , born integer   , age smallint   , married date, children integer   ) 
on connection regress1
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ECPGexecute line 49 Ok: CREATE TABLE
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ECPGexecute line 52: QUERY: insert into family ( name  , married  , 
children  ) values ( 'Mum' , '19870714' , 3 )  on connection regress1
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: terminating connection because of crash of another server 
process[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: raising sqlcode 0
[NO_PID]: sqlca: code: 0, state: 57P02
[NO_PID]: ECPGexecute line 52: error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[NO_PID]: sqlca: code: 0, state: 57P02
[NO_PID]: raising sqlstate YE000 (sqlcode: -400) in line 52, ''server closed 
the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
' in line 5'.
[NO_PID]: sqlca: code: -400, state: YE000
sql error 'server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
' in line 5
[NO_PID]: ECPGexecute line 53: QUERY: insert into family ( name  , born  , 
married  , children  ) values ( 'Dad' , '19610721' , '19870714' , 3 )  on 
connection regress1
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ECPGexecute line 53: error: no connection to the server
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: raising sqlstate YE000 (sqlcode: -400) in line 53, ''no connection to 
the server
' in line 53.'.
[NO_PID]: sqlca: code: -400, state: YE000
sql error 'no connection to the server
' in line 53.
[NO_PID]: ECPGexecute line 54: QUERY: insert into family ( name  , age  ) 
values ( 'Child 1' , 16 )  on connection regress1
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ECPGexecute line 54: error: no connection to the server
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: raising sqlstate YE000 (sqlcode: -400) in line 54, ''no connection to 
the server
' in line 54.'.
[NO_PID]: sqlca: code: -400, state: YE000
sql error 'no connection to the server
' in line 54.
[NO_PID]: ECPGexecute line 55: QUERY: insert into family ( name  , age  ) 
values ( 'Child 2' , 14 )  on connection regress1

Re: [HACKERS] COPY into a view; help w. design patch

2007-05-21 Thread Jim C. Nasby
On Sat, May 19, 2007 at 01:41:47PM -0400, Tom Lane wrote:
  I _could_ make tables that correspond
  to the views and put BEFORE INSERT triggers on them and
  have the triggers insert into the views (or the equivalent),
  but then the users would have to use the views for most
  things and the corresponding tables when doing a COPY
  or using the application's data import function.
 
 There's been previous discussion of allowing BEFORE INSERT triggers
 on views, so long as the triggers always return NULL to suppress
 the actual insertion attempt (ie, we'd move the can't insert into
 view test out of the rewriter and put it downstream of trigger firing
 in the executor).  So far no one's figured out how to make that idea
 work for UPDATE/DELETE, but maybe you could argue that even if it
 only worked for INSERT it'd be a useful feature.  It'd certainly solve
 the problem for COPY.

What about adding COPY support to rules? ISTM if you want to copy into a
view you probably want to insert into it as well, so why not use the
same mechanism? Presumably a COPY rule would also be faster than a
trigger.
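
For the simple case, the existing mechanism is just an INSTEAD rule along
these lines (a sketch, names made up), and presumably COPY could be
taught to follow the same rewrite:

    CREATE TABLE orders (id int, status text);
    CREATE VIEW active_orders AS
        SELECT id, status FROM orders WHERE status = 'active';

    CREATE RULE active_orders_insert AS
        ON INSERT TO active_orders
        DO INSTEAD INSERT INTO orders (id, status)
        VALUES (NEW.id, NEW.status);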
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] COPY into a view; help w. design patch

2007-05-21 Thread Jim C. Nasby
On Mon, May 21, 2007 at 05:02:29PM +, Karl O. Pinc wrote:
 
 On 05/21/2007 11:23:57 AM, Jim C. Nasby wrote:
 
 What about adding COPY support to rules? ISTM if you want to copy into
 a
 view you probably want to insert into it as well, so why not use the
 same mechanism? Presumably a COPY rule would also be faster than a
 trigger.
 
 I'd say there's no difference between the rule you'd use
 for COPYing and the rule you'd use for INSERTing,
 which is why my patch produces an
 INSERT statement and then proceeds to (attempt
 to) execute the statement for every row of data
 to be copied.  If you don't have a rule that allows
 INSERT into the view you get (the already existing)
 error with a hint that tells you to make an INSERT
 rule.

As Tom mentioned, that's very non-transparent to users. You're also
assuming that converting a COPY to a string of INSERTS (which would then
get pushed through the rule system one-by-one) would be as efficient as
just copying into a table. I don't believe that's the case.

I haven't studied the rule code, but at least for the simple case of
redirecting a copy into a view to a single table (ie: a single statement
INSTEAD rule that has no where clause) the copy command should be able
to be changed by the rule so that it's just inserting into a different
table. The performance should then be the same as if you copied directly
into that table in the first place.

This doesn't mean that a row-by-row capability (or the ability to have
COPY generate insert statements) would be bad, but they are not the same
as a simple rewrite of a COPY command (ie: adding COPY support to rules).
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Not ready for 8.3

2007-05-18 Thread Jim C. Nasby
On Thu, May 17, 2007 at 11:02:31PM -0400, Greg Smith wrote:
 On Thu, 17 May 2007, David Fetter wrote:
 
 Would you be interested in providing this meat?  You're uniquely
 qualified because your shins still smart from all the things you
 barked them on :)
 
 Unfortunately I'm temporarily on the other side of this problem; all the 
 time I have to spare right now is going to into reviewing other people's 
 patches.
 
 Even more unfortunately, I'm not 100% happy with what I'm doing, and 
 certainly wouldn't want to present the shoddy techniques I'm using as 
 recommended.  If anyone else has useful info on this subject (keeping a 
 local repository in sync with HEAD via rsync while working on branches) 
 they'd like to pass along, please drop me a message with whatever level of 
 documentation you might have--even rough notes would be a help.  Once I'm 
 finished with the reviews I'm trying to contribute, I will sort through 
 any suggestions I get and turn that into formal documentation.  This has 
 left enough shin damage that I'd enjoy completely slaying the cause.

Rather than trying to get something FAQ-perfect right now, can you at
least brain-dump what your current process is so that others can learn?
I've created a wiki page for this and added Heikki's tips:
http://developer.postgresql.org/index.php/Working_with_CVS
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Async commands (like drop index)

2007-05-18 Thread Jim C. Nasby
On Thu, May 17, 2007 at 12:30:45PM -0700, Joshua D. Drake wrote:
 It seems that it may be useful to allow something like:
 
 DROP INDEX NOWAIT.
 
 The idea being, that the terminal will come back, the index will be 
 dropped in the background. If it doesn't drop, it rolls back like normal 
 and logs.
 
 I bring this up now, as an idea. We can argue about it later... :)

Assuming the concurrent psql stuff gets in, do you still see a use for
this?
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Lack of urgency in 8.3 reviewing

2007-05-18 Thread Jim C. Nasby
On Fri, May 18, 2007 at 12:33:11AM +0800, Cui Shijun wrote:
 Thank you for your suggestions, I am thinking about Full page writes
 improvement. It seems not so complicated, just fit for a novice like
 me.
 I'll work on it.   :-)
 
Updated on http://developer.postgresql.org/index.php/Todo:PatchStatus
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Lack of urgency in 8.3 reviewing

2007-05-18 Thread Jim C. Nasby
On Fri, May 18, 2007 at 03:21:00PM +0200, Zdenek Kotala wrote:
 What's about
 
 - full page writes improvement
 
 but I will have time after PGCon.

Added you to the list for that at
http://developer.postgresql.org/index.php/Todo:PatchStatus
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Async commands (like drop index)

2007-05-18 Thread Jim C. Nasby
On Fri, May 18, 2007 at 01:39:56PM -0400, Neil Conway wrote:
 On Fri, 2007-18-05 at 13:29 -0400, Alvaro Herrera wrote:
  I think what Joshua really wants is an equivalent of this
 
 That's not what his original email asked for, at any rate.
 
  start:
  BEGIN;
  LOCK TABLE foo IN ACCESS EXCLUSIVE MODE NOWAIT;
  -- if fail, rollback and go to start
  DROP INDEX foo_idx;
  COMMIT;
  
  The idea is that the lock is only acquired if immediately available,
  thus not blocking other queries which would otherwise be blocked behind
  the DROP INDEX.
 
 ISTM this can easily be implemented with statement_timeout (which is
 more general to boot).

Well, with statement_timeout, and writing a function...

This functionality would actually be useful, but I'm not sure if it's
worth including in core. It would be really nice to have an example of
how to do this in a PostgreSQL Cookbook somewhere though.
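
For the record, the statement_timeout version is roughly (reusing
Alvaro's foo_idx; the one-second value is arbitrary):

    BEGIN;
    SET LOCAL statement_timeout = 1000;   -- milliseconds
    DROP INDEX foo_idx;
    COMMIT;
    -- if the lock isn't obtained (or the drop doesn't finish) within a
    -- second, the statement errors out and the transaction rolls back

Strictly it isn't identical to NOWAIT, since the timeout also fires if
the drop itself is merely slow, but it's close enough for this purpose.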
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Managing the community information stream

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 10:46:50AM -0400, Alvaro Herrera wrote:
   I am not sure.  We will have to investigate more the capabilities of the
   bug tracking system we intend to use.  In the worst case one could add
   the URL for the archived message copy; second worst would be bouncing
   (hopefully not forward) the interesting messages to the bug address.  
  
  Sounds like what I do with the TODO list now.
 
 Except that this is the *worst case* with the bug tracker, whereas for
 the TODO list it is not only the worst case, it is also the best case
 and the only case at all.

And any number of people can manage it (just like the wiki).
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Not ready for 8.3

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 08:58:44AM +0100, Dave Page wrote:
 Jim C. Nasby wrote:
  On Tue, May 15, 2007 at 10:32:14PM +0200, Magnus Hagander wrote:
  Stefan Kaltenbrunner wrote:
  They are not stable.  The items should point to the archives, which are
  supposedly more stable.  (I had already fixed one item in PatchStatus
  this morning).  Really it would be much nicer to have links using the
  Message-Id but I doubt that's at all doable.
  hrm - I see so is there a particular reason for that behaviour ?
  They're stable until Bruce removes something from the queue. When
  something is removed, it's renumbered.
 
  It's how mhonarc works. It's the same with the archives - if we delete a
  mail, they get renumbered. So we never should delete, we should just
  blank out, but it has happened a couple of times.
  
  Isn't there any other archiver we could use? The lack of URL stability
  in mhonarc is bad enough, but the cross-month issue is just horrible.
 
 Nothing useful last time I looked (a year or two back admittedly). I
 have a design for one in mind that I was looking to prototype - there
 are some php classes that would make it quite simple to get messages
 into a database either via procmail, or from an mbox.
 
 the stumbling block I was running into was rewriting the old archives
 URLs to the new ones.

How much visibility do we have into the mhonarc database? We should be
able to come up with a simple redirector that would point the old
mhonarc URLs to URLs for the new system...
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] 8.3 pending patch queue

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 12:33:38AM -0300, Marc G. Fournier wrote:
 Someone (you, I think) advocated a '3 weeks and then dump the rest of the 
 patches' (not quite as strong of wording, but similar) ... why not split the 
 patches list up:
 
 submitted patches, not reviewed
 reviewed patches, needs work, waiting on author
 reviewed patches, ready for commit.
 
 Once feature freeze started, the first list should only get small patches to 
 it, easily reviewed and committed ... then, focus on reviewing list A and 
 move 
 the patch to list B or commit it ... once list A is cleared off, we go into 
 Beta ... if a patch on list B gets re-submitted before Beta, it gets reviewed 
 and either committed, or punt'd to the next release ...

I don't think we want to be adding anything new in beta. But if we went
into 'alpha' when list A is cleared that might work.

(BTW, it's not really clear which list A is...)

 That leaves Freeze - Beta being as long as it takes to get through List A 
 ... 
 and the only thing punt'd to the next release being that which really isn't 
 ready ...
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Testing concurrent psql

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 09:43:36AM -0400, Gregory Stark wrote:
 
 I'm looking for corner cases where the concurrent psql patch doesn't handle
 things properly. I'm wondering about multiple result sets but I can't think of
 any cases where I can test them.
 
 If you submit multiple commands at the psql prompt then psql seems to stop at
 the first semicolon and send the query separately. The only way to send
 multiple queries together seems to be with -c and I don't think we have any
 intention to support asynchronous queries via that route. Regular queries
 still work fine via this route.
 
 I seem to recall there was a way to construct scenarios that returned multiple
 result sets via rules but I don't know how to arrange that. Anyone remember?

An ALSO SELECT rule?
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Not ready for 8.3

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 04:34:56PM +0100, Dave Page wrote:
 How much visibility do we have into the mhonarc database? We should be
 able to come up with a simple redirector that would point the old
 mhonarc URLs to URLs for the new system...
 
 <cough>database?</cough>
 
And here I thought the reason we used the POS was because it was the
only archiver that used PostgreSQL as its backend...

 It's a file system. It simply generates HTML (or, in our case, PHP) files 
 from each message in an mbox. That's one of the reasons for the monthly 
 break - without it, the directories would be unusably full of files.
 
 I think the current URLs represent the month, and the ID of the message as it 
 comes out of the mbox I believe. We could probably write a script to 
 dump a list of message IDs, directories and mbox positions I imagine, 
 and then import that into a new database.
 
Yeah, if the files still resemble real emails then we can probably come
up with a way to pull the data in.

 It's been on my list to rewrite the whole archive system for a while for 
 various reasons. There is quite a bit of crossover with the patch 
 tracker I proposed so I was hoping to look at both together.

Let me know when you start on that...
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Not ready for 8.3

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 07:48:10PM +0200, Magnus Hagander wrote:
 Dave Page wrote:
  I think the current URLs represent the month, and the ID of the message as
  it comes out of the mbox I believe. We could probably write a script
  to dump a list of message IDs, directories and mbox positions I
  imagine, and then import that into a new database.
   
  Yeah, if the files still resemble real emails then we can probably come
  up with a way to pull the data in.
  
  We have all the mbox files, so we can import them from there as raw
  messages.
 
 yeah, that's clearly the best source to work from. It's *possible* to work
 from the mhonarc files (I've done it before), but it's more work.

We'd want the old URLs to be redirected too, so at some point we'll have
to deal with mhonarc.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Not ready for 8.3

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 09:32:44PM +0100, Richard Huxton wrote:
 Dave Page wrote:
 Richard Huxton wrote:
 Magnus Hagander wrote:
 It's been on my list to rewrite the whole archive system for a while
 for various reasons. There is quite a bit of crossover with the patch
 tracker I proposed so I was hoping to look at both together.
 Let me know when you start on that...
 Roger.
 Same here - I've done something similar (off mhonarc files and in much
 smaller scale) before, and I'm definitely interested in helping out.
 Is everyone aware of this system that runs on a well-known open-source
 database?
   http://www.archiveopteryx.org/
 I've used it in a small way, and while I don't claim to have looked at
 it in detail it seems to pretty much do what it claims to.
 
 
 Yeah, I looked at it in the past. The database storage part is actually
 pretty simple - it's the web front end that's going to take more effort,
 and that's what that product doesn't do (or if it does, it's a secondary
 function they don't shout about).
 
 It's supposed to have something in the latest version, I think. I used 
 it as backing store for a small workflow app, so I've got some simple 
 views/functions I added and PHP code (cake-php framework) for displaying 
 messages if it'll be of any use.
 
 My one concern with the schema was that there didn't seem to be a way to 
 partition archives (e.g. by year) to make maintenance a little simpler 
 for large databases.

Luckily I happen to know of a database that would make that transparent
to the app...

But I tend to agree with Dave; the storage part is pretty easy. If we've
still got to write our own front-end ISTM it'd be better to just make it
exactly what we want...
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Managing the community information stream

2007-05-15 Thread Jim C. Nasby
On Tue, May 15, 2007 at 01:18:42PM -0400, Bruce Momjian wrote:
  In Debian's bug tracking system, when the bug is created (which is done
  by sending an email to a certain address) it gets a number, and the
  email is distributed to certain lists.  People can then reply to that
  mail, and send messages to [EMAIL PROTECTED] and it gets tracked in
  the bug, and you can see all those messages in the bug report.  I
  ass-ume that BZ 3.0 does something similar.
 
 But often a TODO item has multiple threads containing details (often
 months apart), and it isn't obvious at the time the thread is started
 that this will happen.  Note the number of TODO items that now have
 multiple URLs.  How is that handled?

Worst-case, for those cases we add URLs to the tracker manually like you
do now. The big advantage is that most of the time that's not needed.
And in cases where it's not automatic we can grant any number of people
permission to add that information to the tracker, because that
permission wouldn't be tied to CVS commit privs.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Jim C. Nasby
On Tue, May 15, 2007 at 10:25:35AM -0700, Jeff Davis wrote:
 On Tue, 2007-05-15 at 10:42 +0100, Heikki Linnakangas wrote:
  Luke Lonergan wrote:
   32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache
   effect.
   
   How about using 256/blocksize?
  
  Sounds reasonable. We need to check the effect on the synchronized 
  scans, though.
  
 
 I am a little worried that there will be greater differences in position
 as the number of scans increases. If we have only 8 buffers and several
 scans progressing, will they all be able to stay within a few buffers of
 each other at any given time?
 
 Also, with 8 buffers, that means each scan must report every 4 pages at
 most (and maybe every page), which increases lock contention (the new
 design Heikki and I discussed requires a lock every time a backend
 reports its position).

Given that spoiling the L2 cache is a trivial cost compared to extra
physical IO, ISTM we should go with a largish ring for sync scans. What
do you think would be the ideal size? 32 buffers?
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Not ready for 8.3

2007-05-15 Thread Jim C. Nasby
On Tue, May 15, 2007 at 12:42:28PM -0400, Bruce Momjian wrote:
 Joshua D. Drake wrote:
   Patch status:
   
 http://developer.postgresql.org/index.php/Todo:PatchStatus
  
  If... this is actually a problem (I leave to other committers and 
  reviewers to comment) then I suggest we push all patches without a 
  reviewer as of now to 8.4.
  
  Leaving only those patches that have confirmed reviewers to be worked 
  through.
  
  FYI, whoever did that Todo:Patch status, Bravo! That is easily one of 
  the smallest but best improvements to the process I have seen in recent 
  memory.
 
 I did one of those for previous releases.  I guess you forgot.  It was
 done by someone else this time only because I was going to be traveling.

Unless you're really in love with doing that sort of thing it's really
good that someone else did it. You're one of a handful of folks that can
actually review patches, while there's any number of us that can update
a wiki.

Speaking of reviewers... should we put some thought into how we can
increase the number of people who can review code? It seems that's one
of our biggest bottlenecks...
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Not ready for 8.3

2007-05-15 Thread Jim C. Nasby
On Tue, May 15, 2007 at 10:32:14PM +0200, Magnus Hagander wrote:
 Stefan Kaltenbrunner wrote:
  They are not stable.  The items should point to the archives, which are
  supposedly more stable.  (I had already fixed one item in PatchStatus
  this morning).  Really it would be much nicer to have links using the
  Message-Id but I doubt that's at all doable.
  
  hrm - I see so is there a particular reason for that behaviour ?
 
 They're stable until Bruce removes something from the queue. When
 something is removed, it's renumbered.
 
 It's how mhonarc works. It's the same with the archives - if we delete a
 mail, they get renumbered. So we never should delete, we should just
 blank out, but it has happened a couple of times.

Isn't there any other archiver we could use? The lack of URL stability
in mhonarc is bad enough, but the cross-month issue is just horrible.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Bulk inserts and usage_count

2007-05-15 Thread Jim C. Nasby
On Tue, May 15, 2007 at 04:37:28PM +0100, Heikki Linnakangas wrote:
 While testing the buffer ring patch, I noticed that bulk inserts with 
 both INSERT and COPY pin and unpin the buffer they insert to for every 
 tuple. That means that the usage_count of all those buffers is bumped 
<snip>
 A fix for COPY will fall naturally out of the buffer ring patch, but not 
 for INSERT.
 
 A more general fix would be to somehow keep the last insertion page 
 pinned across calls to heap_insert.

ISTR discussion in the past about having things like COPY and INSERT
INTO ... SELECT building entire pages in one shot once they exhaust the
FSM. Not only would it address this issue, but it would probably improve
performance in many ways (less locking and unlocking, ability to
pre-sort before inserting into indexes, fewer calls to FSM, probably a
bunch of other things).
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Not ready for 8.3

2007-05-15 Thread Jim C. Nasby
On Tue, May 15, 2007 at 07:01:39PM -0400, Bruce Momjian wrote:
  Unless you're really in love with doing that sort of thing it's really
  good that someone else did it. You're one of a handful of folks that can
  actually review patches, while there's any number of us that can update
  a wiki.
 
 Sure.  Good idea.  All I had to do was ask (10 times, though).  ;-)
 
Even having to ask 10 times undoubtedly was a lot faster than doing it
yourself though. :)

  Speaking of reviewers... should we put some thought into how we can
  increase the number of people who can review code? It seems that's one
  of our biggest bottlenecks...
 
 Sure.  That is going to move us forward.

So... any ideas? :)
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Interaction of PITR backups and Bulkoperationsavoiding WAL

2007-05-15 Thread Jim C. Nasby
Simon intended to commit this per
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01761.php
(actually, there was a change in what was being done). I suspect this
item isn't valid any longer.

On Tue, May 15, 2007 at 07:30:58PM -0400, Bruce Momjian wrote:
 
 This has been saved for the 8.4 release:
 
   http://momjian.postgresql.org/cgi-bin/pgpatches_hold
 
 ---
 
 Simon Riggs wrote:
  On Fri, 2007-03-09 at 11:47 -0500, Tom Lane wrote:
   Simon Riggs [EMAIL PROTECTED] writes:
On Fri, 2007-03-09 at 11:15 -0500, Tom Lane wrote:
It strikes me that allowing archive_command to be changed on the fly
might not be such a good idea though, or at least it shouldn't be
possible to flip it from empty to nonempty during live operation.
   
I'd rather fix it the proposed way than force a restart. ISTM wrong to
have an availability feature cause downtime.
   
   I don't think that people are very likely to need to turn archiving on
   and off on-the-fly.  Your proposed solution introduces a great deal of
   complexity (and risk of future bugs-of-omission, to say nothing of race
   conditions) to solve a non-problem.  We have better things to be doing
   with our development time.
  
  It's certainly a quicker fix. Unless others object, I'll set
  archive_command to only be changeable at server startup.
  
  -- 
Simon Riggs 
EnterpriseDB   http://www.enterprisedb.com
  
  
  
 
 -- 
   Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
   EnterpriseDB   http://www.enterprisedb.com
 
   + If your life is a hard drive, Christ can be your backup. +
 
 

-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Performance monitoring

2007-05-13 Thread Jim C. Nasby
On Sun, May 13, 2007 at 07:54:20AM +0100, Heikki Linnakangas wrote:
 Maybe we should improve the stats system so that we can collect events 
 with timestamps and durations, but in my experience log files actually 
 are the most reliable and universal way to collect real-time performance 
 information. Any serious tool has a generic log parser. The other 
 alternative is SNMP. I welcome the efforts on pgsnmpd..

Having timing information in the stats system would be useful, but I'm
not sure how it could actually be done. But at least if the information
is in the stats system it's easy to programmatically collect and process.
SNMP is just one example of that (fwiw I agree with Magnus that it
probably doesn't make sense to turn pgsnmpd into a log parser...)
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



[HACKERS] Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)

2007-05-12 Thread Jim C. Nasby
Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
 If you know when the checkpoint ended, and you know how long each of the 
 pieces took, you can reconstruct the other times easily.  The way you 
 describe this it is true--that the summary is redundant given the 
 detail--but if you put yourself in the shoes of a log file parser the 
 other way around is easier to work with.  Piecing together log entries 
 is a pain, splitting them is easy.
 
 If I had to only keep one line out of this, it would be the one with the 
 summary.  It would be nice to have it logged at INFO.
 
 Yeah, if we have the summary line we don't need the other lines and vice 
 versa. I have sympathy for parsing log files, I've done that a lot in 
 the past and I can see what you mean. Having the individual lines is 
 nice when you're monitoring a running system; you don't get the summary 
 line until the checkpoint is finished. I suppose we can have both the 
 individual lines and the summary, the extra lines shouldn't hurt anyone, 
 and you won't get them unless you turn on the new log_checkpoints 
 parameter anyway.

Not to beat a dead horse, but do we really want to force folks to be
parsing logs for performance monitoring? Especially if that log parsing
is just going to result in data being inserted into a table anyway?

I know there's concern about performance of the stats system and maybe
that needs to be addressed, but pushing users to log parsing is a lot of
extra effort, non-standard, likely to be overlooked, and doesn't play
well with other tools. It also conflicts with all the existing
statistics framework.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] utf8 COPY DELIMITER?

2007-04-18 Thread Jim C. Nasby
On Tue, Apr 17, 2007 at 02:28:18PM -0400, Tom Lane wrote:
 I doubt that supporting a single multibyte character would be an
 interesting extension --- if we wanted to do anything at all there, we'd
 just generalize the delimiter to be an arbitrary string.  But it would
 certainly slow down COPY by some amount, which is an area where you'll
 get push-back for performance losses, so you'd need to make a convincing
 use-case for it.

Couldn't we use a fast code path (what we have now) for the case when
the delimiter is a single byte? That would allow for multi-character
delimiters without penalizing those that don't use them.
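
For example (table and file names made up), the first form works today
because the delimiter is a single byte, while the second is rejected
today since COPY insists on a single-character delimiter:

    COPY import_raw FROM '/tmp/data.txt' WITH DELIMITER E'\t';
    -- COPY import_raw FROM '/tmp/data.txt' WITH DELIMITER '||';  -- rejected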

As for use case, I worked on migrating some stuff out of a MySQL
database a while ago, and having arbitrary string delimiters would have
made life easier.
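
For illustration, roughly what I have in mind (table and file names are made
up, and the multi-character form is hypothetical; today COPY only accepts a
single one-byte delimiter):

    -- works today, and would keep using the existing fast path
    COPY orders FROM '/tmp/orders.dat' WITH DELIMITER '|';

    -- hypothetical: arbitrary string delimiter, handled by a slower match path
    COPY orders FROM '/tmp/orders.dat' WITH DELIMITER '~|~';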
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Background LRU Writer/free list

2007-04-18 Thread Jim C. Nasby
On Wed, Apr 18, 2007 at 09:09:11AM -0400, Greg Smith wrote:
 I'm mostly done with my review of the Automatic adjustment of 
 bgwriter_lru_maxpages patch.  In addition to issues already brought up 
 with that code, there are some small things that need to be done to merge 
 it with the recent pg_stat_bgwriter patch, and I have some concerns about 
 its unbounded scanning of the buffer pool; I'll write that up in more 
 detail or just submit an improved patch as I get time this week.
 
 But there's a fundamental question that has been bugging me, and I think 
 it impacts the direction that code should take.  Unless I'm missing 
 something in my reading, buffers written out by the LRU writer aren't ever 
 put onto the free list.  I assume this is to stop from prematurely 
 removing buffers that contain useful data.  In cases where a substantial 
 percentage of the buffer cache is dirty, the LRU writer has to scan a 
 significant portion of the pool looking for one of the rare clean buffers, 
 then write it out.  When a client goes to grab a free buffer afterward, it 
 has to scan the same section of the pool to find the now clean buffer, 
 which seems redundant.
 
 With the new patch, the LRU writer is fairly well bounded in that it 
 doesn't write out more than it thinks it will need; you shouldn't get into 
 a situation where many more pages are written than will be used in the 
 near future.  Given that mindset, shouldn't pages the LRU scan writes just 
 get moved onto the free list?

I've wondered the same thing myself.

If we're worried about freeing pages that we might want back, we could
change the code so that ReadBuffer would also look at the free list if
it couldn't find a page before going to the OS for it.

So if you make this change will BgBufferSync start incrementing
StrategyControl->nextVictimBuffer and decrementing buf->usage_count like
StrategyGetBuffer does now?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Hacking on PostgreSQL via GIT

2007-04-18 Thread Jim C. Nasby
On Wed, Apr 18, 2007 at 06:39:34PM +1200, Martin Langhoff wrote:
 Keyword expansions are generally bad because SCM tools should track
 _content_ - and keyword expansions _modify_ it to add metadata that is
 somewhat redundant, obtainable in other ways, and should just not be in
 the middle of the _data_. Those modifications lead to patches that have
 bogus hunks and sometimes don't apply, MD5/SHA1 checksums that don't
 match and a whole lot of uncertainty.

Then how do you tell what version a file is if it's outside of a
checkout?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Hacking on PostgreSQL via GIT

2007-04-18 Thread Jim C. Nasby
On Thu, Apr 19, 2007 at 10:07:08AM +1200, Martin Langhoff wrote:
 Jim C. Nasby wrote:
  Then how do you tell what version a file is if it's outside of a
  checkout?
 
 It's trivial for git to answer that - the file will either be pristine,
 and then we can just scan for the matching SHA1, or modified, and we can
 scan (taking a weee bit more time) which are the closest matches in
 your history, in what branches and commits.
 
 The actual scripting for this isn't written just yet -- Linus posted a
 proof-of-concept shell implementation along the lines of
 
  git rev-list --no-merges --full-history v0.5..v0.7 --
 src/widget/widget.c > rev-list
 
best_commit=none
best=100
while read commit
do
    git cat-file blob $commit:src/widget/widget.c > tmpfile
    lines=$(diff reference-file tmpfile | wc -l)
    if [ $lines -lt $best ]
    then
        echo "Best so far: $commit $lines"
        best=$lines
    fi
done < rev-list
 
 and it's fast. One of the good properties of this is that you can ask
 for a range of your history (v0.5 to v0.7 in the example) and an exact
 path (src/widget/widget.c) but you can also say --all (meaning in all
 branches) and a handwavy over there, like src. And git will take an
 extra second or two on a large repo, but tell you about all the good
 candidates across the branches.
 
 Metadata is metadata, and we can fish it out of the SCM easily - and
 data is data, and it's silly to pollute it with metadata that is mostly
 incidental.
 
 If I find time today I'll post to the git list a cleaned up version of
 Linus' shell script as
 
 git-findclosestmatch head or range or --all path/to/scan/ \
 randomfile.c

Not bad... took you 40 lines to answer my question. Let's see if I can
beat that...

  Then how do you tell what version a file is if it's outside of a
  checkout?

Answer: you look at the $Id$ (or in this case, $PostgreSQL$) tag.

Sorry, tried to get it to 2 lines, but couldn't. ;)

I understand the argument about metadata and all, and largely agree with
it. But on the other hand I think a version identifier is a critical
piece of information; it's just as critical as the file name when it
comes to identifying the information contained in the file.

Or does GIT not use filenames, either? :)
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Autovacuum vs statement_timeout

2007-04-17 Thread Jim C. Nasby
On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
 Tom Lane wrote:
  I seem to remember that we'd agreed that autovacuum should ignore any
  globally set statement_timeout, on the grounds that a poorly chosen
  setting could indefinitely prevent large tables from being vacuumed.
 
 On a vaguely related matter, should programs such as pg_dump, vacuumdb, 
 and reindexdb disable statement_timeout?

Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
to that list as well (really, pg_dump(all) should output a command to
disable statement_timeout).
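
For the dump output itself that could be as simple as emitting

    SET statement_timeout = 0;

at the top of the file; the command-line utilities would need to issue the
same SET on their own connections.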
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] where to write small reusable functions ?

2007-04-17 Thread Jim C. Nasby
On Fri, Apr 13, 2007 at 03:02:28PM +0200, Dany DeBontridder wrote:
 On 4/13/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 
 Dany DeBontridder wrote:
  I'm working to implement a new feature to pg_dump: the ability to dump
  objects like function, indexes...
 
 pg_dump already dumps functions and indexes.
 
 Right but you can't dump only one or two functions or only the functions and
 nothing else. (the same for index, triggers...)

You should make sure and read past discussion about this, as well as
propose a design to the community before getting too far into a patch.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] SoC Students/Projects selected

2007-04-17 Thread Jim C. Nasby
On Sun, Apr 15, 2007 at 08:00:23PM -0300, Josh Berkus wrote:
 Now, while each of these students has an assigned mentor, that doesn't 
 mean other people shouldn't help.  If you're interested in their work, 
 please pitch in.
 
 Note that we'll also be using the pgsql-students mailing list for 
 discussions about SoC itself.
 
Will all discussions take place here or in -students?

 Unfortunately, we only got ONE proposal to work on the buildfarm, and 
 that one was snagged by another project.  So I'm going to be proposing 
 that we use SPI funds to get the Buildfarm work done; if anyone knows a 
 likely candidate, speak up.

Probably worth posting about that to -announce and/or -general...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Autovacuum vs statement_timeout

2007-04-17 Thread Jim C. Nasby
On Tue, Apr 17, 2007 at 12:51:51PM -0700, Joshua D. Drake wrote:
 Jim C. Nasby wrote:
 On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
 Tom Lane wrote:
 I seem to remember that we'd agreed that autovacuum should ignore any
 globally set statement_timeout, on the grounds that a poorly chosen
 setting could indefinitely prevent large tables from being vacuumed.
 On a vaguely related matter, should programs such as pg_dump, vacuumdb, 
 and reindexdb disable statement_timeout?
 
 Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
 to that list as well (really, pg_dump(all) should output a command to
 disable statement_timeout).
 
 I don't know if that should be a default or not. It is certainly easy 
 enough to disable it should you want to.

How would you disable it for those command-line utilities? Or are you
referring to disabling it via an ALTER ROLE SET ... for superusers?

ISTM current behavior is a bit of a foot-gun. These are administrative
shell commands that aren't going to be run by Joe-user.
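
For reference, the ALTER ROLE route would be something along the lines of
(the role name is just an example):

    ALTER ROLE maintenance SET statement_timeout = 0;

but that only helps the people who already know the trap is there, which is
rather the point.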
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: CREATE INDEX and HOT (was [HACKERS] Question: pg_classattributes and race conditions ?)

2007-03-22 Thread Jim C. Nasby
On Mon, Mar 19, 2007 at 12:05:19PM +, Simon Riggs wrote:
 I was unwilling to compromise to have HOT if only one index existed, but
 IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this
 release. (We can always use vertical partitioning techniques to allow
 additional access paths to be added to the same table - I'd be very
 happy to document that with worked examples, if requried).

I'm not sure where we're sitting with this, but I've got another idea I
haven't seen (one that I think is better than an arbitrary limit on the
number of indexes)... what if we just disallow non-concurrent index
builds on hot tables? It sounds like the additional pain involved in
chilling an entire table and keeping it chilled for the index build is
even more overhead than just doing a concurrent index build.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] TOASTing smaller things

2007-03-21 Thread Jim C. Nasby
On Wed, Mar 21, 2007 at 12:37:36PM -0400, Chris Browne wrote:
 4.  A different mechanism would be to add a fifth storage column
 strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
 say, TOAST.
 
 At present, the 4 values are essentially advisory; columns get TOASTed
 if the column permits EXTENDED storage, but that only occurs if the
 size is greater than TOAST_TUPLE_THRESHOLD.
 
 If the new value was chosen, the column would *always* get stored as
 TOAST.
 
Rather than a hard and fast limit of 0, why not allow defining a size
threshold? And while we're at it, how about a size threshold for when to
try compressing, too?
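
Something along these lines, say (the THRESHOLD forms are purely hypothetical
syntax, riffing on the existing SET STORAGE command; table and column names
are made up):

    ALTER TABLE messages ALTER COLUMN body SET STORAGE EXTERNAL;        -- exists today
    ALTER TABLE messages ALTER COLUMN body SET TOAST THRESHOLD 512;     -- hypothetical
    ALTER TABLE messages ALTER COLUMN body SET COMPRESS THRESHOLD 2048; -- hypothetical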

 Presumably #1 or #2 could readily get into 8.3 as they're pretty easy;
 #3 is a bit trickier, whilst #4 is probably not 8.3-fittable.
 
 Question:
 
 Which of these sounds preferable?

1 and 2 (variations on how to set the denominator) sound completely
ugly. Trying to minimize wasted space in a toast table is great for a
default, but exposing something like that to the users via any kind of
setting seems way too obtuse.

#3 (GUC for number of bytes) may not make sense for performance reasons,
as Tom mentioned. I'm hoping that it would be easy to check either
pg_class or pg_attribute to see if a table/column has non-standard
toast/compression limits.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Interaction of PITR backups and Bulkoperationsavoiding WAL

2007-03-20 Thread Jim C. Nasby
On Fri, Mar 09, 2007 at 04:57:18PM +, Simon Riggs wrote:
 On Fri, 2007-03-09 at 11:47 -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   On Fri, 2007-03-09 at 11:15 -0500, Tom Lane wrote:
   It strikes me that allowing archive_command to be changed on the fly
   might not be such a good idea though, or at least it shouldn't be
   possible to flip it from empty to nonempty during live operation.
  
   I'd rather fix it the proposed way than force a restart. ISTM wrong to
   have an availability feature cause downtime.
  
  I don't think that people are very likely to need to turn archiving on
  and off on-the-fly.  Your proposed solution introduces a great deal of
  complexity (and risk of future bugs-of-omission, to say nothing of race
  conditions) to solve a non-problem.  We have better things to be doing
  with our development time.
 
 It's certainly a quicker fix. Unless others object, I'll set
 archive_command to only be changeable at server startup.

I think the docs should also explain why it's server-start only, since
if someone wanted to they could circumvent the behavior by having
archive_command call a shell script that changes its behavior.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Stats for multi-column indexes

2007-03-20 Thread Jim C. Nasby
On Mon, Mar 19, 2007 at 06:55:56PM -0700, Jeff Davis wrote:
 On Mon, 2007-03-19 at 21:24 -0400, Tom Lane wrote:
  Jeff Davis [EMAIL PROTECTED] writes:
   We can already keep stats for a functional index. Is there a reason we
   can't keep stats for a multi-column index?
  
  The questions that need to be answered are (1) what stats are you gonna
  collect, and (2) exactly what are you going to do with them when you
  have 'em?
  
  All the previous discussions have stalled on the question of how to
  avoid trying to collect stats about an exponentially large number of
  column combinations; we've never even reached the question of what
  stats we'd actually want given that a particular combination has been
  determined to be interesting.  Perhaps that's a trivial question,
  but it's been a mighty long time since I took statistics ...
  
 
 I know we can't keep stats on every combination of columns. My initial
 idea would be to only keep stats about a multi-column index (and
 probably optional for those, too).
 
 My thinking was that we could keep a histogram (and MCVs, etc.) of the
 non-scalar key in the multi-column index. That would provide the data
 the planner needs to answer a query like WHERE a = 1 and b < 1000 if a
 and b are dependent and you have an index on (a,b).
snip 
 AndrewSN pointed out on IRC that keeping a histogram of non-scalar
 values is not as easy as I thought, because PostgreSQL doesn't allow
 arrays of composite types, among other problems.

I don't think the array problem is that big a deal, since PostgreSQL
doesn't enforce array dimensions at all. You can just make the arrays
for multi-column stats 2 dimensional, though handling indexes with
different data types among the columns would be a bit tricky... right
now the only choice I can think of would be to require that values could
be cast to and from text and just store text in the array. Though
obviously it'd be better to just allow arrays of composite types...

The other challenge is that you can't make all the same assumptions with
a multi-field histogram that you can with a single-field one. For
example, if this is our index:

a   b
-   -
1   1
1   2
...
1   1000
2   500
2   501
...
3   5000

The histogram would likely position the buckets such that 1,1000 and
2,500 would fall within one bucket, which means the planner has no idea
that b doesn't exceed 1000 when a is 1. I'm not sure how big of an issue
that is in reality, though, because the planner does know that the
bucket can only represent so many rows.

It might be worth coming up with a different means to store the
histogram for the multi-column case.
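
As a concrete (made-up) example of the kind of case that trips the planner up
today:

    CREATE TABLE facts (a int, b int);
    CREATE INDEX facts_a_b ON facts (a, b);

    -- With only per-column stats, the selectivities of a = 1 and b < 1000 are
    -- multiplied together as if the columns were independent, which can be
    -- way off when a and b are correlated as in the example above.
    SELECT * FROM facts WHERE a = 1 AND b < 1000;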

 Is this a worthwhile area of exploration?

ISTM it trips people up often enough to make it worth at least
exploring...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Auto creation of Partitions

2007-03-08 Thread Jim C. Nasby
On Thu, Mar 08, 2007 at 01:52:27PM +0530, NikhilS wrote:
 I think it'd be great to make adding and removing partitions as
 simple as ALTER TABLE. I don't think that DELETE should be the
 mechanism to drop a partition, though. Again, DML statements
 shouldn't be performing DDL.
 
 
 Since partition is inheritance-based, a simple DROP or  NO INHERIT will do
 the job to deal with the partition. Do we want to reinvent additional syntax
 when these are around and are documented?

Well, if the syntax for adding a new partition eventually ends up as
ALTER TABLE ADD PARTITION, then it would make more sense that you remove
a partition via ALTER TABLE DROP PARTITION.
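
In other words something like this (hypothetical syntax; neither form exists
today, and the names are made up):

    ALTER TABLE measurements ADD PARTITION y2007m03
        CHECK (logdate >= '2007-03-01' AND logdate < '2007-04-01');
    ALTER TABLE measurements DROP PARTITION y2007m02;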
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Auto creation of Partitions

2007-03-08 Thread Jim C. Nasby
On Thu, Mar 08, 2007 at 10:12:30AM +, Gregory Stark wrote:
 Jim Nasby [EMAIL PROTECTED] writes:
 
  One possibility would be to require 3 functions for a partitioned table: one
  accepts the partitioning key and tells you what partition it's in, one that
  tells you what the minimum partitioning key for a partition would be, and one
  that tells you what the maximum would be. If the user supplied those 3
  functions, I think it would be possible to automatically generate code for
  the triggers and check constraints. The min/max partition key functions might
  allow you to more efficiently do partition elimination, too.
 
 But then it would be harder to tell whether a clause implied a given
 partition. That is, if you have a partition constraint of col OP const then
 we can test whether a query clause of col OP2 const2 implies that constraint
 when planning (or actually whether it implies it's false to exclude the
 partition). If you have a constraint like P1(const) it'll be pretty hard to
 do much with that.

Well, you could tell what partition 'const' was in; I would think that
plus knowledge about OP2 would allow you to decide what partitions you
need to look at.

There's also nothing to prevent us from also adding the constraints and
using constraint exclusion as well. In fact, I think we'd want to have
the constraints just so we know that a given partition only contains the
data we want it to.
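
To make the three-function idea concrete, here's a trivial sketch using an
integer key split into fixed-size ranges (the names and the partitioning
scheme are purely illustrative):

    CREATE FUNCTION part_for_key(k bigint) RETURNS int AS
        $$ SELECT ($1 / 1000000)::int $$ LANGUAGE sql IMMUTABLE;

    CREATE FUNCTION part_min_key(p int) RETURNS bigint AS
        $$ SELECT $1::bigint * 1000000 $$ LANGUAGE sql IMMUTABLE;

    CREATE FUNCTION part_max_key(p int) RETURNS bigint AS
        $$ SELECT ($1 + 1)::bigint * 1000000 - 1 $$ LANGUAGE sql IMMUTABLE;

Given a clause like k > 12345678, the planner could call part_for_key(12345678)
and only consider that partition and the ones after it.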
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Auto creation of Partitions

2007-03-08 Thread Jim C. Nasby
One other thought... a lot of this discussion seems to parallel the
debate from a while ago about whether SERIAL should act like a macro
(ie: it just sets everything up and users are free to monkey under the
hood afterwards), or whether it should be its own 'closed-box'
construct.

Currently, we seem to be leaning towards partition management being a
'macro', with child tables very exposed, etc. I don't know if that's
good or bad, but it's probably worth some thought.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Estimating seq_page_fetch and random_page_fetch

2007-03-08 Thread Jim C. Nasby
On Thu, Mar 08, 2007 at 05:35:03PM +, Gregory Stark wrote:
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  Umar Farooq Minhas [EMAIL PROTECTED] writes:
  How can we accurately estimate the seq_page_fetch and
  random_page_fetch costs from outside the postgres using for example a
  C routine.
 
  Use a test case larger than memory.  Repeat many times to average out
  noise.  IIRC, when I did the experiments that led to the current
  random_page_cost of 4.0, it took about a week before I had numbers I
  trusted.
 
 When I was running tests I did it on a filesystem where nothing else was
 running. Between tests I unmounted and remounted it. As I understand it Linux
 associates the cache with the filesystem and not the block device and discards
 all pages from cache when the filesystem is unmounted.
 
 That doesn't contradict anything Tom said, it might be useful as an additional
 tool though.

Another trick I've used in the past is to just run the machine out of
memory, using the following:

/*
 * $Id: clearmem.c,v 1.1 2003/06/29 20:41:33 decibel Exp $
 *
 * Utility to clear out a chunk of memory and zero it. Useful for flushing
 * disk buffers.
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    /* argv[1] is the number of megabytes to allocate and zero out */
    if (!calloc(atoi(argv[1]), 1024*1024)) { printf("Error allocating memory.\n"); }
    return 0;
}

I'll monitor top while that's running to ensure that some stuff gets
swapped out to disk. I believe this might still leave some cached data
in other areas of the kernel, but it's probably not enough to worry
about.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Resumable vacuum proposal and design overview

2007-03-01 Thread Jim C. Nasby
On Wed, Feb 28, 2007 at 10:14:24PM +, Heikki Linnakangas wrote:
 cache instead. In the index scan phase, it's randomly accessed, but if 
 the table is clustered, it's in fact not completely random access. In 
 the 2nd vacuum pass, the array is scanned sequentially again. I'm not 

Only if there's only one index on the table... otherwise I'd argue that
you're much less likely to be searching the TID list incrementally.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-03-01 Thread Jim C. Nasby
On Wed, Feb 28, 2007 at 12:16:10PM -0500, Bruce Momjian wrote:
 background writer, and I think after a server crash, all pages would
 have to be read and checked.  The good news is that both of these are

Would they? If you're doing recovery you'd have to read all pages
dirtied since the last checkpoint... could there be pages other than
those that had been torn?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Dead Space Map version 2

2007-03-01 Thread Jim C. Nasby
On Wed, Feb 28, 2007 at 04:10:09PM +0900, ITAGAKI Takahiro wrote:
 Jim C. Nasby [EMAIL PROTECTED] wrote:
 
  At some point it might make sense to convert the FSM into a bitmap; that
  way everything just scales with database size.
 
  In the meantime, I'm not sure if it makes sense to tie the FSM size to
  the DSM size, since each FSM page requires 48x the storage of a DSM
  page. I think there's also a lot of cases where FSM size will not scale
  the same was DSM size will, such as when there's historical data in the
  database.
 
 Bitmapped FSM is interesting. Maybe strict accuracy is not needed for FSM.
 If we change FSM to use 2 bits/page bitmaps, it requires only 1/48 shared
 memory by now. However, 6 bytes/page is small enough for normal use. We need
 to reconsider it if we would go into TB class heavily updated databases.
 
 
  That raises another question... what happens when we run out of DSM
  space?
 
 First, discard completely clean memory chunks in DSM. 'Clean' means all of
 the tuples managed by the chunk are frozen. This is a lossless transition.
 
 Second, discard tracked tables and its chunks that is least recently
 vacuumed. We can assume those tables have many dead tuples and almost
 fullscan will be required. We don't bother to keep tracking to such tables.
 
 Many optimizations should still remain at this point, but I'll make
 a not-so-complex suggestions in the meantime.

Actually, I have to agree with Heikki and Takayuki-san... I really like
the idea of managing DSM (and FSM for that matter) using shared_buffers.
If we do that, that means that we could probably back them to disk very
easily.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] SOC user quotas

2007-03-01 Thread Jim C. Nasby
On Wed, Feb 28, 2007 at 02:29:52PM -0800, Joshua D. Drake wrote:
 
  I don't know, but in my opinion, I don't see anything bad in requiring
  dropping the data if the quota is full. That's what usually occurs in
  the case of normal filesystem quota... If you don't have a space there,
  you cannot edit files, copy them etc...
  And that solution should be definitely better than the filesystem quota
  for the PostgreSQL user for example.
 
 The bad point is not that we would rollback the transaction. The bad
 point is what happens when you need to rollback a transaction and in
 your scenario it is quite plausible that a large rollback could occur,
 more than once, causing the requirement of something like a vacuum full
 to clean things up.

ISTM that if the transaction is that big it's likely going to be
extending the heap, which means you'd get space back on a plain vacuum.

As for things like CLUSTER, and REINDEX it would probably be useful to
make an exception, since we know that those operations are intended to
shrink the size of a relation.

I also think there's a lot to be said for a soft limit.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Revitalising VACUUM FULL for 8.3

2007-03-01 Thread Jim C. Nasby
On Thu, Mar 01, 2007 at 04:54:44PM +0200, Hannu Krosing wrote:
   I was not suggesting that we remove visible rows through truncation.
  
  Sure, unless you suggest to not truncate during this vacuum run ?
  But we are talking about vacuum full, so truncation is essential.
  
  It was suggested to do a dummy null update to move live tuples up front.
  The old version is still visible for serializable txns.
 
 There should probably be a wait-other-trxs to finish stage between
 COMPACT and 2nd VACUUM if truncation at this cycle is absolutely needed

Or you could just do a vacuum a bit later. If we've also got the DSM at
that point, vacuum should be fast.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Revitalising VACUUM FULL for 8.3

2007-03-01 Thread Jim C. Nasby
On Thu, Mar 01, 2007 at 01:05:28PM +, Simon Riggs wrote:
 ISTM a radical approach is needed, so I'm very open to discussion about
 this and how we cope.
 
 If we break down the main thoughts into a few parts:
 
 1. would like a way to CLUSTER/VACUUM FULL where we don't have to move
 all of the tuple versions, just the current ones.
 
 2. would like a way to compact a table more efficiently
 
 Your idea does (2) in a concurrent manner, which is very good.

Along similar lines, I think it would also be very useful to have a mode
where any time a tuple on the last X pages gets updated it's moved off
of it's existing page in the relation. Kind of like a 'shrink the heap
in the background'. Granted, this probably won't compact as much as
something more aggressive would, but it should be essentially free.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 01:26:00AM -0500, Matthew T. O'Connor wrote:
 Tom Lane wrote:
 Matthew T. O'Connor matthew@zeut.net writes:
 I'm not sure what you are saying here, are you now saying that partial 
 vacuum won't work for autovac?  Or are you saying that saving state as 
 Jim is describing above won't work?
 
 I'm saying that I don't like the idea of trying to stop on a dime by
 saving the current contents of vacuum's dead-TID array to disk with the
 idea that we can trust those values 100% later.  Saving the array is
 expensive both in runtime and code complexity, and I don't believe we
 can trust it later --- at least not without even more expensive-and-
 complex measures, such as WAL-logging every such save :-(
 
 I'm for stopping only after completing an index-cleaning pass, at the
 point where we empty the dead-TID array anyway.  If you really have to
 have stop on a dime, just kill -INT the process, accepting that you
 will have to redo your heap scan since the last restart point.
 
 OK, so if I understand correct, a vacuum of a table with 10 indexes on 
 it can be interrupted 10 times, once after each index-cleaning pass? 
 That might have some value, especially breaking up the work required to 
 vacuum a large table. Or am I still not getting it?

It'd stop after scanning the heap, scanning all the indexes, and then
cleaning the heap. After that's done it no longer needs any of the
dead-TID info; anytime before that it does need that info, and Tom's
objection is that trying to store that info is a bad idea.

The problem with this is that typically it takes a long time to go
through a complete vacuum cycle; minutes at least, and preferably
longer. Decreasing that cycle time will greatly increase the amount of
IO required for vacuuming a table with any indexes, because every time
you cycle through you have to read the entire index. That's why I don't
see it as being useful at all for getting autovac to work on hot tables
- if you actually got that cycle time low enough you'd kill the system
with all the extra index scanning.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 12:54:28AM -0500, Matthew T. O'Connor wrote:
 Jim C. Nasby wrote:
 On Mon, Feb 26, 2007 at 10:18:36PM -0500, Matthew T. O'Connor wrote:
 Jim C. Nasby wrote:
 Here is a worst case example: A DB with 6 tables all of which are highly 
 active and will need to be vacuumed constantly.  While this is totally 
 hypothetical, it is how I envision things working (without the threshold).
 
 I fail to see how a simple 6 table case is 'worst case'. It's common to
 see hundreds of tables, and I've run across more than one database with
 thousands of tables (think partitioning). In cases like those it's
 certainly possible, perhaps even likely that you would get many daemons
 running in the database at one time just from different tables suddenly
 needing vacuuming and appearing at a higher point in the list than other
 tables. With 100 ~1G tables getting updates it certainly wouldn't be
 hard to end up with 10 of those being vacuumed all at the same time.
 
 Yes 6 tables is small, the worst-case part of the example was that all 
 the tables would need to be vacuumed constantly.  Most databases only 
 have a few hot tables.  Most tables only need to vacuumed every once in 
 a while.
 
It's not the hot tables that are the issue; it's how many large tables
(hot or not) can come up for vacuuming in order. For example, if
A-Z are all large tables (ie: a few GB), with A being the largest and Z
the smallest, think about what happens here:

Round 1: A needs vacuuming. Daemon gets to it and starts working.
Round 2: B now needs vacuuming. It's slightly smaller than A, so daemon
2 gets to it.
Round 3: C now needs vacuuming. Daemon 3.
...
Round 26: Z now needs vacuuming. Daemon 26 picks it up.

You now have 26 daemons running in the database.

Now, we can argue about how likely that scenario is, but I don't think
it's relevant. What matters is that it *is* possible, and as long as
that's the case you'd have to have some kind of limit. (While this
simple 26 table example is definitely worst-case, if you've got hundreds
of tables that are all multiple GB in size I think it wouldn't be hard
at all for you to end up with a dozen or more daemons all hammering
away).

 I do like the idea since it should be easier to tune, but I think we
 still need some limit on it. Perhaps as a first-pass we could just have
 a hard limit and log a message and/or set a flag any time we hit it.
 That would hopefully allow us to get information about how big a problem
 it really is. We could go one step further and say that the last daemon
 that can start in a database will only vacuum tables that can be done
 quickly; that's essentially what we've been talking about, except the
 limit we've been discussing would be hard-coded at 2.
 
 I'm confused, what limit would be set at 2?  The number of concurrent 
 workers?  I've never said that.

The point I was making is that the proposal about limiting the 2nd
daemon to only processing tables it can do in a short period of time is
akin to setting a limit of only 2 daemons in a database at a time.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote:
 Jim C. Nasby wrote:
 
  The advantage to keying this to autovac_naptime is that it means we
  don't need another GUC, but after I suggested that before I realized
  that's probably not the best idea. For example, I've seen clusters that
  are running dozens-hundreds of databases; in that environment you really
  need to turn naptime way down (to like a second). In that case you
  wouldn't want to key to naptime.
 
 Actually, I've been thinking that it would be a good idea to change the
 semantics of autovacuum_naptime so that it means the average time to
 start a worker in any given database.  That way, the time between
 autovac runs is not dependent on the number of databases you have.

BTW, another issue that I don't think we can ignore: we actually need to
do this on a per-tablespace level, or at least have the ability to
disable or somehow limit it. While it's not common, there are users that
run a hundred or more databases in a single cluster; it would be ugly if
we suddenly had 100 vacuums trying to run on the same set of drives
concurrently.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 11:44:28AM +0900, Galy Lee wrote:
 For example, there is one table:
- The table is a hundreds GBs table.
- It takes 4-8 hours to vacuum such a large table.
- Enabling cost-based delay may make it last for 24 hours.
- It can be vacuumed during night time for 2-4 hours.
 
 It is true there is no such restrict requirement that vacuum
 need to be interrupt immediately, but it should be stopped in an
 *predictable way*. In the above example, if we have to wait for the end
  of one full cycle of cleaning, it may take up to 8 hours for vacuum to
 stop after it has received a stop request. This seems quite unacceptable.

Even with very large tables, you could likely still fit things into a
specific time frame by adjusting how much time is spent scanning for
dead tuples. The idea would be to give vacuum a target run time, and it
would monitor how much time it had remaining, taking into account how
long it should take to scan the indexes based on how long it's been
taking to scan the heap. When the amount of time left becomes less than
the estimate of the amount of time required to scan the indexes (and
clean the heap), you stop the heap scan and start scanning indexes. As
long as the IO workload on the machine doesn't vary wildly between the
heap scan and the rest of the vacuum process, I would expect this to
work out fairly well.

While not as nice as the ability to 'stop on a dime' as Tom puts it,
this would be much easier and safer to implement. If there's still a
need for something better after that we could revisit it at that time.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Seeking Google SoC Mentors

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 12:47:14AM -0500, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  Yes, but the list being discussed is SoC projects that the community
  would like to see done, which means most people would assume that #1
  isn't an issue.
 
  We need to make sure that every project on the list of SoC ideas is
  supported by the community.
 
 Agreed, except that in most cases a one-liner description of an idea
 isn't enough to get a meaningful reading on whether people think it's
 sane or not.  To take our current example: do you think a one-liner
 description of full disjunctions would have gotten any feedback, except
 for requests for more detail?
 
 I'm not sure how we fix that --- laying out every acceptable project
 in great detail in advance won't happen for lack of manpower, and wouldn't
 be a good idea even if we could do it, because that sounds like a great
 way to stifle creativity.  At the same time we can hardly promise to
 accept every wild-west idea that someone manages to turn into some kind
 of code.  What can we tell the students other than get as much feedback
 as you can, as early as you can?

I agree we certainly don't want to go designing these projects in
advance, but I think we could at least ensure that the community buys
into the concept of each project. ISTM one of the big issues with FD is
that most people didn't even really understand what exactly it was or
how it might be useful, which made getting broad acceptance even harder.

For example, these TODOs seem like they have good acceptance by the
community (though they might not be good SoC projects for other
reasons):
Simplify ability to create partitioned tables
Allow auto-selection of partitioned tables for min/max() operations
Allow commenting of variables in postgresql.conf to restore them to
defaults

Examples of ideas that might not be good because it's unclear that the
community supports them:
Stop-on-a-dime partial vacuum
Adding a replication solution to the backend
Putting time travel support back in

Granted, the 'not good idea' list is pretty exaggerated simply because
it's not as easy to find examples of that on the TODO list, since stuff
on the TODO list is generally supported. Some of the 'temporal database'
items that had been suggested probably fall into the category of 'might
be a good idea, but the community hasn't decided that yet'. So maybe we
should be limiting SoC projects to stuff that's already on the TODO
list..
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Dead Space Map version 2

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 12:55:21AM -0500, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  Yes, DSM would make FSM recovery more important, but I thought it was
  recoverable now? Or is that only on a clean shutdown?
 
 Currently we throw away FSM during any non-clean restart.  This is
 probably overkill but I'm quite unclear what would be a safe
 alternative.

My thought would be to revert to a FSM that has pages marked as free
that no longer are. Could be done by writing the FSM out every time we
add pages to it. After an unclean restart backends would be getting
pages from the FSM that didn't have free space, in which case they'd
need to yank that page out of the FSM and request a new one. Granted,
this means extra IO until the FSM gets back to a realistic state, but I
suspect that's better than bloating tables out until the next vacuum.
And it's ultimately less IO than re-vacuuming every table to rebuild the
FSM.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Dead Space Map version 2

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 05:38:39PM +0900, ITAGAKI Takahiro wrote:
 Jim C. Nasby [EMAIL PROTECTED] wrote:
 
   If we do UPDATE a tuple, the original page containing the tuple is marked
   as HIGH and the new page where the updated tuple is placed is marked as 
   LOW.
  
  Don't you mean UNFROZEN?
 
 No, the new tuples are marked as LOW. I intend to use UNFROZEN and FROZEN
 pages as all tuples in the pages are visible to all transactions for
 index-only-scan in the future.
 
Ahh, ok. Makes sense, though I tend to agree with others that it's
better to leave that off for now, or at least do the initial patch
without it.
 
  What makes it more important to mark the original page as HIGH instead
  of LOW, like the page with the new tuple? The description of the states
  indicates that there would likely be a lot more dead tuples in a HIGH
  page than in a LOW page.
  
  Perhaps it would be better to have the bgwriter take a look at how many
  dead tuples (or how much space the dead tuples account for) when it
  writes a page out and adjust the DSM at that time.
 
 Yeah, I feel it is worth optimizable, too. One question is, how we treat
 dirty pages written by backends not by bgwriter? If we want to add some
 works in bgwriter, do we also need to make bgwriter to write almost of
 dirty pages?

IMO yes, we want the bgwriter to be the only process that's normally
writing pages out. How close we are to that, I don't know...
 
   * Agressive freezing
   We will freeze tuples in dirty pages using OldestXmin but FreezeLimit.
  
  Do you mean using OldestXmin instead of FreezeLimit?
 
 Yes, we will use OldestXmin as the threshold to freeze tuples in
 dirty pages or pages that have some dead tuples. Or, many UNFROZEN
 pages still remain after vacuum and they will cost us in the next
 vacuum preventing XID wraparound.

Another good idea. If it's not too invasive I'd love to see that as a
stand-alone patch so that we know it can get in.
 
   I'm thinking to change them into 2 new paramaters. We will allocates 
   memory
   for DSM that can hold all of estimated_database_size, and for FSM 50% or
   something of the size. Is this reasonable?
   
  I don't think so, at least not until we get data from the field about
  what's typical. If the DSM is tracking every page in the cluster then
  I'd expect the FSM to be closer to 10% or 20% of that, anyway.
 
 I'd like to add some kind of logical flavors to max_fsm_pages
 and max_dsm_pages. For DSM, max_dsm_pages should represent the
 whole database size. In the other hand, what meaning does
 max_fsm_pages have? (estimated_updatable_size ?)

At some point it might make sense to convert the FSM into a bitmap; that
way everything just scales with database size.

In the meantime, I'm not sure if it makes sense to tie the FSM size to
the DSM size, since each FSM page requires 48x the storage of a DSM
page. I think there's also a lot of cases where FSM size will not scale
the same was DSM size will, such as when there's historical data in the
database.

That raises another question... what happens when we run out of DSM
space?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 12:12:22PM -0500, Matthew T. O'Connor wrote:
 Jim C. Nasby wrote:
 On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote:
 Jim C. Nasby wrote:
 
 The advantage to keying this to autovac_naptime is that it means we
 don't need another GUC, but after I suggested that before I realized
 that's probably not the best idea. For example, I've seen clusters that
 are running dozens-hundreds of databases; in that environment you really
 need to turn naptime way down (to like a second). In that case you
 wouldn't want to key to naptime.
 Actually, I've been thinking that it would be a good idea to change the
 semantics of autovacuum_naptime so that it means the average time to
 start a worker in any given database.  That way, the time between
 autovac runs is not dependent on the number of databases you have.
 
 BTW, another issue that I don't think we can ignore: we actually need to
 do this on a per-tablespace level, or at least have the ability to
 disable or somehow limit it. While it's not common, there are users that
 run a hundred or more databases in a single cluster; it would be ugly if
 we suddenly had 100 vacuums trying to run on the same set of drives
 concurrently.
 
 I think we all agree that autovacuum needs to become tablespace aware at 
 some point, but I think that is further down the line, we're having 
 enough trouble figuring things out without that additional complication.

Sure, we just need a way to disable the multiple autovac daemon stuff
then.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 10:49:32AM +, Simon Riggs wrote:
  I dislike introducing new nonstandard syntax (Oracle compatible is not
  standard).  If we did this I'd vote for control via a GUC setting only;
  I think that is more useful anyway, as an application can be made to run
  with such a setting without invasive source code changes.
 
 OK.
 
 Having read through all of the above things again, ISTM that we should
 make this functionality available by a new GUC commit_fsync_delay, which
 must be set explicitly > 0 before this feature can be used at all. If I
 confused Tom by using commit_delay, then I'll confuse others also and
 group commit and deferred fsync are different techniques with different
 robustness guarantees. When enabled it should have a clear message in
 the log to show that some commits might be using commit_nowait. 
 
 I'd even welcome a more descriptive term that summed up the relaxed
 transaction guarantee implied by the use of the deferred fsync
 technique. Perhaps even a very explicit USERSET GUC:
 
   transaction_guarantee = on (default) | off

So would you set commit_fsync_delay on a per-transaction basis? That
doesn't make much sense to me... I guess I'm not seeing how you would
explicitly mark transactions that you didn't want to fsync immediately.
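
(For the transaction_guarantee variant I assume the marking would just be the
usual GUC dance, something like the sketch below; the GUC is only proposed at
this point and the table is made up:

    BEGIN;
    SET LOCAL transaction_guarantee = off;
    INSERT INTO clickstream (page, hit_time) VALUES ('/index', now());
    COMMIT;

It's the bare commit_fsync_delay form that I don't see how to scope to a
single transaction.)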
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] No ~ operator for box, point

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 20, 2007 at 04:22:27PM -0500, Bruce Momjian wrote:
 Added to TODO:
 
   * Add missing operators for geometric data types
   
 Some geometric types do not have the full suite of geometric 
 operators,
 e.g. box @ point
 

I've started looking at this, and ISTM that at least part of this could
be solved by allowing some implicit casts. Given that the geometry data
types are point, line, lseg, box, path, polygon, circle, I think the
following should be safe:

box -> polygon
lseg -> open path
polygon -> closed path

I would argue that this is similar to int2 -> int4 -> int8: a box is a
type of polygon, a polygon is a closed path (that doesn't intersect,
which needs to be added to the docs, btw), and a line segment is an open
path.

Is there any reason not to make these casts implicit? If there is,
what's the best way to go about adding operators for cases where
equivalent operators exist? (IE: @(box,point) doesn't exist, but
@(polygon,point) does, and should suffice for @(box,point) with
appropriate casting)
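
The first of those would presumably be close to a one-liner, given that a
box-to-polygon conversion function already exists (sketch only; if an explicit
cast is already registered for the pair, this would instead be a tweak to its
pg_cast.castcontext entry):

    CREATE CAST (box AS polygon) WITH FUNCTION polygon(box) AS IMPLICIT;

and similarly for the other two, assuming suitable conversion functions exist
for them.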

Actually, looking at one example (@(point,box) vs @(point,poly)), part
of the reason is that it's far simpler to deal with a box than a generic
polygon.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 05:18:28PM -0500, Bruce Momjian wrote:
 Jim C. Nasby wrote:
  On Mon, Feb 26, 2007 at 10:56:58PM +, Simon Riggs wrote:
   2. remove fsync parameter
  
  Why? Wouldn't fsync=off still speed up checkpoints? ISTM you'd still
  want this for things like database restores.
 
 I think we will remove fsync in favor of the new delay, and allow -1 to
 be the same behavior as fsync off.

Well, presumably we'd still allow fsync for some number of versions...

Actually, I don't know that combining both settings is a wise move. The
delay should still provide crash protection, whereas with fsync=off
you've got absolutely no protection from anything. That's a huge
difference, and one that IMHO warrants a separate setting (and a big,
fat WARNING in the comment for that setting).
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 07:17:37PM -0500, Bruce Momjian wrote:
  Actually, I don't know that combining both settings is a wise move. The
  delay should still provide crash protection, whereas with fsync=off
  you've got absolutely no protection from anything. That's a huge
  difference, and one that IMHO warrants a separate setting (and a big,
  fat WARNING in the comment for that setting).
 
 Yes, it needs a warning, or perhaps we just tell people to set it to
 something high and that is all they can do.

Before doing that I'd want to see how the performance compares to
fsync=off. My guess is that fsync=off is a big gain during checkpoints.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Jim C. Nasby
On Mon, Feb 26, 2007 at 09:22:42PM -0500, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Matthew T. O'Connor wrote:
  I'm not sure it's a good idea to tie this to the vacuum cost delay 
  settings either, so let me as you this, how is this better than just 
  allowing the admin to set a new GUC variable like 
  autovacuum_hot_table_size_threshold  (or something shorter) which we can 
  assign a decent default of say 8MB.
 
  Yeah, maybe that's better -- it's certainly simpler.
 
 I'm not liking any of these very much, as they seem critically dependent
 on impossible-to-tune parameters.  I think it'd be better to design this
 around having the first worker explicitly expose its state (list of
 tables to process, in order) and having subsequent workers key off that
 info.  The shared memory state could include the OID of the table each
 worker is currently working on, and we could keep the to-do list in some
 simple flat file for instance (since we don't care about crash safety).
 
 I'm not certain exactly what key off needs to mean; perhaps each
 worker should make its own to-do list and then discard items that are
 either in-progress or recently done by another worker when it gets to
 them.
 
The real problem is trying to set that up in such a fashion that keeps
hot tables frequently vacuumed; that means that the 2nd daemon in a
database either needs to avoid large tables that will take 'a
significant' length of time to vacuum, or you need to allow any number
of daemons in each database (which sounds like a good way to thrash the
machine).

 I think an absolute minimum requirement for a sane design is that no two
 workers ever try to vacuum the same table concurrently, and I don't see
 where that behavior will emerge from your proposal; whereas it's fairly
 easy to make it happen if non-first workers pay attention to what other
 workers are doing.

Isn't there a special lock acquired on a relation by vacuum? Can't we
just check for that? Seems much simpler than building out the ability
for daemons to see what each other is doing (and that still wouldn't
take manual vacuums into account).
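
Lazy vacuum takes ShareUpdateExclusiveLock on the relation, so even from SQL
you can get a rough answer about whether something (autovacuum or a manual
run) is already vacuuming a given table. The table name below is a
placeholder, and ANALYZE and a few other commands take the same lock, so it's
only a rough check:

    SELECT l.pid
      FROM pg_locks l
      JOIN pg_class c ON c.oid = l.relation
     WHERE c.relname = 'some_hot_table'
       AND l.mode = 'ShareUpdateExclusiveLock'
       AND l.granted;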
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Jim C. Nasby
On Mon, Feb 26, 2007 at 08:11:44PM -0300, Alvaro Herrera wrote:
 Matthew T. O'Connor wrote:
  Alvaro Herrera wrote:
 
  The second mode is the hot table worker mode, enabled when the worker
  detects that there's already a worker in the database.  In this mode,
  the worker is limited to those tables that can be vacuumed in less than
  autovacuum_naptime, so large tables are not considered.  Because of
  this, it'll generally not compete with the first mode above -- the
  tables in plain worker were sorted by size, so the small tables were
  among the first vacuumed by the plain worker.  The estimated time to
  vacuum may be calculated according to autovacuum_vacuum_delay settings,
  assuming that all pages constitute cache misses.
  
  How can you determine what tables can be vacuumed within 
  autovacuum_naptime?
 
 My assumption is that
 pg_class.relpages * vacuum_cost_page_miss * vacuum_cost_delay = time to vacuum

Need to take vacuum_cost_limit into account.

The advantage to keying this to autovac_naptime is that it means we
don't need another GUC, but after I suggested that before I realized
that's probably not the best idea. For example, I've seen clusters that
are running dozens-hundreds of databases; in that environment you really
need to turn naptime way down (to like a second). In that case you
wouldn't want to key to naptime.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Jim C. Nasby
On Mon, Feb 26, 2007 at 06:23:22PM -0500, Matthew T. O'Connor wrote:
 Alvaro Herrera wrote:
 Matthew T. O'Connor wrote:
 How can you determine what tables can be vacuumed within 
 autovacuum_naptime?
 
 My assumption is that
 pg_class.relpages * vacuum_cost_page_miss * vacuum_cost_delay = time to 
 vacuum
 
 This is of course not the reality, because the delay is not how long
 it takes to fetch the pages.  But it lets us have a value with which we
 can do something.  With the default values, vacuum_cost_delay=10,
 vacuum_cost_page_miss=10, autovacuum_naptime=60s, we'll consider tables
 of under 600 pages, 4800 kB (should we include indexes here in the
 relpages count?  My guess is no).
 
 I'm not sure how pg_class.relpages is maintained but what happens to a 
 bloated table?  For example, a 100 row table that is constantly updated 
 and hasn't been vacuumed in a while (say the admin disabled autovacuum 
 for a while), now that small 100 row table has 1000 pages in it most of 
 which are just bloat, will we miss this table?  Perhaps basing this on 
 reltuples would be better?

The entire point of this is to ensure that the second daemon will only
vacuum tables that it can finish very quickly. If you let a table bloat
so it's too big, then you just can't vacuum it very frequently without
risking all your other hot tables bloating because they're no longer
getting vacuumed.

The reality is that you can actually vacuum a pretty good-sized table in
60 seconds with typical cost-delay settings (ie: defaults except
cost_delay set to 10). That means you can do 9 pages ~100 times a
second, or 54k pages a minute. Even with a vacuum_cost_delay of 20,
that's still 27k pages per minute.
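
For the record, the arithmetic behind those numbers; my assumption is
that the pages are already cached but get dirtied (cost_page_hit = 1
plus cost_page_dirty = 20) against the default cost_limit of 200:

  SELECT 200 / (1 + 20)                      AS pages_per_round,   -- ~9
         1000 / 10                           AS rounds_per_second, -- 10ms delay
         (200 / (1 + 20)) * (1000 / 10) * 60 AS pages_per_minute;  -- 54000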
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Seeking Google SoC Mentors

2007-02-26 Thread Jim C. Nasby
On Mon, Feb 26, 2007 at 09:10:38PM -0500, Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  Well, here's a question. Given the recent discussion re full 
  disjunction, I'd like to know what sort of commitment we are going to 
  give people who work on proposed projects.
 
 Um, if you mean are we going to promise to accept a patch in advance of
 seeing it, the answer is certainly not.  Still, a SoC author can improve
 his chances in all the usual ways, primarily by getting discussion and
 rough consensus on a spec and then on an implementation sketch before
 he starts to do much code.  Lots of showstopper problems can be caught
 at that stage.
 
 I think the main problems with the FD patch were (1) much of the
 community was never actually sold on it being a useful feature,
 and (2) the implementation was not something anyone wanted to accept
 into core, because of its klugy API.  Both of these points could have
 been dealt with before a line of code had been written, but they were
 not :-(

Yes, but the list being discussed is SoC projects that the community
would like to see done, which means most people would assume that #1
isn't an issue.

We need to make sure that every project on the list of SoC ideas is
supported by the community.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-26 Thread Jim C. Nasby
On Mon, Feb 26, 2007 at 10:56:58PM +, Simon Riggs wrote:
 2. remove fsync parameter

Why? Wouldn't fsync=off still speed up checkpoints? ISTM you'd still
want this for things like database restores.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Dead Space Map version 2

2007-02-26 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 12:05:57PM +0900, ITAGAKI Takahiro wrote:
 Each heap pages have 4 states for dead space map; HIGH, LOW, UNFROZEN and
 FROZEN. VACUUM uses the states to reduce the number of target pages.
 
 - HIGH : High priority to vacuum. Maybe many dead tuples in the page.
 - LOW  : Low priority to vacuum Maybe few dead tuples in the page.
 - UNFROZEN : No dead tuples, but some unfrozen tuples in the page.
 - FROZEN   : No dead nor unfrozen tuples in the page.
 
 If we do UPDATE a tuple, the original page containing the tuple is marked
 as HIGH and the new page where the updated tuple is placed is marked as LOW.

Don't you mean UNFROZEN?

 When we commit the transaction, the updated tuples needs only FREEZE.
 That's why the after-page is marked as LOW. However, If we rollback, the
 after-page should be vacuumed, so we should mark the page LOW, not UNFROZEN.
 We don't know the transaction will commit or rollback at the UPDATE.

What makes it more important to mark the original page as HIGH instead
of LOW, like the page with the new tuple? The description of the states
indicates that there would likely be a lot more dead tuples in a HIGH
page than in a LOW page.

Perhaps it would be better to have the bgwriter look at how many dead
tuples a page has (or how much space they account for) when it writes the
page out, and adjust the DSM at that time.

 * Agressive freezing
 We will freeze tuples in dirty pages using OldestXmin but FreezeLimit.
 This is for making FROZEN pages but not UNFROZEN pages as far as possible
 in order to reduce works in XID wraparound vacuums.

Do you mean using OldestXmin instead of FreezeLimit?

Perhaps it might be better to save that optimization for later...

 In current implementation, DSM allocates a bunch of memory at start up and
 we cannot modify it in running. It's maybe enough because DSM consumes very
 little memory -- 32MB memory per 1TB database.
 
 There are 3 parameters for FSM and DSM.
 
   - max_fsm_pages = 204800
   - max_fsm_relations = 1000 (= max_dsm_relations)
   - max_dsm_pages = 4096000
 
 I'm thinking to change them into 2 new paramaters. We will allocates memory
 for DSM that can hold all of estimated_database_size, and for FSM 50% or
 something of the size. Is this reasonable?
 
I don't think so, at least not until we get data from the field about
what's typical. If the DSM is tracking every page in the cluster then
I'd expect the FSM to be closer to 10% or 20% of that, anyway.

 I've already have a recovery extension. However, it can recover DSM
 but not FSM. Do we also need to restore FSM? If we don't, unreusable
 pages might be left in heaps. Of cource it could be reused if another
 tuple in the page are updated, but VACUUM will not find those pages.

Yes, DSM would make FSM recovery more important, but I thought it was
recoverable now? Or is that only on a clean shutdown?

I suspect we don't need perfect recoverability... theoretically we could
just commit the FSM after vacuum frees pages and leave it at that; if we
revert to that after a crash, backends will grab pages from the FSM only
to find there's no more free space, at which point they could pull the
page from the FSM and find another one. This would lead to degraded
performance for a while after a crash, but that might be a good
trade-off.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Expanding DELETE/UPDATE returning

2007-02-26 Thread Jim C. Nasby
On Mon, Feb 26, 2007 at 11:14:01PM -0500, Tom Lane wrote:
 Rusty Conover [EMAIL PROTECTED] writes:
  I didn't see this on the TODO list, but if it is my apologies.  Is it  
  in the cards to expand the functionality of DELETE/UPDATE returning  
  to be able to sort the output of the rows returned?
 
 No.
 
  Or allow delete  
  and update to be used in sub-queries?
 
 That's been discussed but the implementation effort seems far from
 trivial.  One big problem is that a sub-query can normally be
 re-executed multiple times, eg on the inner side of a join; whereas
 that's clearly not acceptable for an insert/update/delete.

Couldn't we avoid that by writing the data to a tuplestore? Or is it too
hard to detect the cases when that would need to happen?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote:
 Jim C. Nasby wrote:
 
  The advantage to keying this to autovac_naptime is that it means we
  don't need another GUC, but after I suggested that before I realized
  that's probably not the best idea. For example, I've seen clusters that
  are running dozens-hundreds of databases; in that environment you really
  need to turn naptime way down (to like a second). In that case you
  wouldn't want to key to naptime.
 
 Actually, I've been thinking that it would be a good idea to change the
 semantics of autovacuum_naptime so that it means the average time to
 start a worker in any given database.  That way, the time between
 autovac runs is not dependent on the number of databases you have.

Hrm... how would that work?

BTW, another thought is to only sleep if you've scanned through every
database and found nothing to do.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Jim C. Nasby
On Mon, Feb 26, 2007 at 10:48:49PM -0500, Tom Lane wrote:
 Matthew T. O'Connor matthew@zeut.net writes:
  That does sounds simpler. Is chunk-at-a-time a realistic option for 8.3?
 
 It seems fairly trivial to me to have a scheme where you do one
 fill-workmem-and-scan-indexes cycle per invocation, and store the
 next-heap-page-to-scan in some handy place (new pg_class column updated
 along with relpages/reltuples, likely).  Galy is off in left field with
 some far more complex ideas :-( but I don't see that there's all that
 much needed to support this behavior ... especially if we don't expose
 it to the SQL level but only support it for autovac's use.  Then we're
 not making any big commitment to support the behavior forever.

The problem I see there is that the case we're trying to fix is tables
that need to be vacuumed every few minutes. As I posted elsewhere, it's
reasonable to assume a vacuum rate of ~1000 pages/second for a small
table that's going to be in memory (assuming that vacuum dirties every
page). That means that you can only dirty about 60k pages per cycle,
which seems way too small to be practical unless we come up with a way to
avoid scanning the indexes on every cycle.

The proposal to save enough state to be able to resume a vacuum at
pretty much any point in its cycle might work; we'd have to benchmark
it.  With the default maintenance_work_mem of 128M it would mean writing
out 64M of state every minute on average, which is likely to take
several seconds to fsync (though, maybe we wouldn't need to fsync it...)
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-26 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 11:05:45AM +0700, Jeroen T. Vermeulen wrote:
 On Tue, February 27, 2007 06:06, Joshua D. Drake wrote:
 
  Why do we want this?? Because some apps have *lots* of data and many
  really don't care whether they lose a few records. Honestly, I've met
  people that want this, even after 2 hours of discussion and
  understanding. Plus probably lots of MySQLers also.
 
  Most users will take speed over data loss any day. Whether we want to
  admit it or not.
 
 In that case, wouldn't it make just as much sense to have an equivalent
 for this special transaction mode on individual statements, without
 transaction context?  I'm guessing that who don't really know or want
 transactions would never start one, running lots of loose statements
 instead that otherwise get committed individually.

I don't think it makes sense to optimize for people who can't be
bothered to learn about a transaction. In any case, that option is
there; you just set the GUC in the session.
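
For illustration only (the setting name here is the one this proposal
eventually shipped under in 8.3, synchronous_commit; at the time of this
thread the name was still up in the air, and the table is made up):

  SET synchronous_commit TO off;   -- relaxed durability for this session only
  INSERT INTO clickstream_log VALUES (now(), 'page hit');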
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Jim C. Nasby
On Mon, Feb 26, 2007 at 10:18:36PM -0500, Matthew T. O'Connor wrote:
 Jim C. Nasby wrote:
 On Mon, Feb 26, 2007 at 06:23:22PM -0500, Matthew T. O'Connor wrote:
 I'm not sure how pg_class.relpages is maintained but what happens to a 
 bloated table?  For example, a 100 row table that is constantly updated 
 and hasn't been vacuumed in a while (say the admin disabled autovacuum 
 for a while), now that small 100 row table has 1000 pages in it most of 
 which are just bloat, will we miss this table?  Perhaps basing this on 
 reltuples would be better?
 
 The entire point of this is to ensure that the second daemon will only
 vacuum tables that it can finish very quickly. If you let a table bloat
 so it's too big, then you just can't vacuum it very frequently without
 risking all your other hot tables bloating because they're no longer
 getting vacuumed.
 
 The reality is that you can actually vacuum a pretty good-sized table in
 60 seconds with typical cost-delay settings (ie: defaults except
 cost_delay set to 10). That means you can do 9 pages ~100 times a
 second, or 54k pages a minute. Even with a vacuum_cost_delay of 20,
 that's still 27k pages per minute.
 
 At the risk of sounding like a broken record, I still think the size 
 limit threshold is unnecessary.  Since all workers will be working in on 
 tables in size order, younger workers will typically catch older workers 
 fairly quickly since the tables will be either small, or recently 
 vacuumed and not need work.  And since younger workers exit when they 
 catch-up to an older worker, there is some inherent stability in the 
 number of workers.
 
 Here is a worst case example: A DB with 6 tables all of which are highly 
 active and will need to be vacuumed constantly.  While this is totally 
 hypothetical, it is how I envision things working (without the threshold).

I fail to see how a simple 6 table case is 'worst case'. It's common to
see hundreds of tables, and I've run across more than one database with
thousands of tables (think partitioning). In cases like those it's
certainly possible, perhaps even likely that you would get many daemons
running in the database at one time just from different tables suddenly
needing vacuuming and appearing at a higher point in the list than other
tables. With 100 ~1G tables getting updates it certainly wouldn't be
hard to end up with 10 of those being vacuumed all at the same time.

I do like the idea since it should be easier to tune, but I think we
still need some limit on it. Perhaps as a first-pass we could just have
a hard limit and log a message and/or set a flag any time we hit it.
That would hopefully allow us to get information about how big a problem
it really is. We could go one step further and say that the last daemon
that can start in a database will only vacuum tables that can be done
quickly; that's essentially what we've been talking about, except the
limit we've been discussing would be hard-coded at 2.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-26 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 12:37:42AM -0500, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  The proposal to save enough state to be able to resume a vacuum at
  pretty much any point in its cycle might work; we'd have to benchmark
  it.  With the default maintenance_work_mem of 128M it would mean writing
  out 64M of state every minute on average, which is likely to take
  several seconds to fsync (though, maybe we wouldn't need to fsync it...)
 
 Which is exactly why we needn't bother benchmarking it.  Even if it
 weren't complex and unsafe, it will be a net loss when you consider the
 fact that it adds I/O instead of removing it.

Well, it depends on how often you're doing that. Adding extra IO at the
end of 4 hours of vacuuming isn't going to make any real difference, but
once a minute...

Looks like partial vacuum won't help this problem. :(
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Simple Column reordering

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 02:09:55PM +, Simon Riggs wrote:
  If you really want an interim solution, what about a builtin function 
  that would explicitly mutate the definition and table contents (if any) 
  along the lines you want? (assuming that's lots less work than just 
  doing the whole thing right to start with). Or even one which just 
  *displayed* the optimal order might be sufficient assistance to DBAs who 
  want to take advantage of this.
 
 I think the only interim solution now is to put functionality into
 PgAdmin et al to optimize the column order.

Well, if it comes to that it would be good to have pgAdmin et al. driven
by logic in the database, so that people using psql can benefit as well.
Perhaps a function that is passed an existing table and re-creates it in
optimal order (if it's empty...). Or at least spits out a CREATE TABLE
statement for you that's in optimal order.
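
Even a plain catalog query gets most of the way there; a rough sketch
('my_table' is a placeholder) that lists columns fixed-width first,
widest first, with varlena types last, which is roughly the order the
alignment rules favor:

  SELECT a.attname, t.typname, t.typlen, t.typalign
    FROM pg_attribute a
    JOIN pg_type t ON t.oid = a.atttypid
   WHERE a.attrelid = 'my_table'::regclass
     AND a.attnum > 0
     AND NOT a.attisdropped
   ORDER BY (t.typlen < 0), t.typlen DESC, a.attnum;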
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] timestamp subtraction (was Re: [SQL] formatting intervals with to_char)

2007-02-23 Thread Jim C. Nasby
Yes, but if it was '2004-01-02 01:00:00'-'2004-01-01 00:00:00' it should
return 25:00:00, not 1 day 1:00.

I agree with Tom that this should be changed; I'm just arguing that we
might well need a backwards-compatibility solution for a while. At the
very least we'd need to make this change very clear to users.
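
To illustrate the escape hatch being discussed: under the proposed
behavior the raw difference comes back in hours, and justify_hours()
gets you the old display on request:

  SELECT '2004-01-02 01:00:00'::timestamp - '2004-01-01 00:00:00'::timestamp;
  --  25:00:00   (proposed behavior)

  SELECT justify_hours('2004-01-02 01:00:00'::timestamp
                       - '2004-01-01 00:00:00'::timestamp);
  --  1 day 01:00:00   (old display, applied explicitly)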

On Tue, Feb 20, 2007 at 08:07:11PM -0500, Bruce Momjian wrote:
 
 One problem with removing justify_hours() is that this is going to
 return '24:00:00', rather than '1 day:
   
   test= select '2004-01-02 00:00:00'::timestamptz - '2004-01-01
   00:00:00'::timestamptz;
?column?
   --
24:00:00
   (1 row)
 
 ---
 
 Jim Nasby wrote:
  On Oct 5, 2006, at 11:50 AM, Tom Lane wrote:
   regression=# select ('2006-09-15 23:59:00'::timestamp - '2006-09-01  
   09:30:41'::timestamp);
?column?
   --
14 days 14:28:19
   (1 row)
  
   should be reporting '350:28:19' instead.
  
   This is a hack that was done to minimize the changes in the regression
   test expected outputs when we changed type interval from months/ 
   seconds
   to months/days/seconds.  But I wonder whether it wasn't a dumb idea.
   It is certainly inconsistent, as noted in the code comments.
  
   I'm tempted to propose that we remove the justify_hours call, and tell
   anyone who really wants the old results to apply justify_hours() to  
   the
   subtraction result for themselves.  Not sure what the fallout would  
   be,
   though.
  
  I suspect there's applications out there that are relying on that  
  being nicely formated for display purposes.
  
  I agree it should be removed, but we might need a form of backwards  
  compatibility for a version or two...
  --
  Jim Nasby[EMAIL PROTECTED]
  EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
  
  
  
 
 -- 
   Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
   EnterpriseDB   http://www.enterprisedb.com
 
   + If your life is a hard drive, Christ can be your backup. +
 

-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Status of Hierarchical Queries

2007-02-23 Thread Jim C. Nasby
On Thu, Feb 22, 2007 at 07:59:35AM +, Gregory Stark wrote:
 Gavin Sherry [EMAIL PROTECTED] writes:
 
  On Thu, 22 Feb 2007, Gregory Stark wrote:
 
  But in a simple recursive tree search you have a node which wants to do a 
  join
  between the output of tree level n against some table to produce tree level
  n+1. It can't simply execute the plan to produce tree level n since that's 
  the
  same tree it's executing itself. If it calls the Init method on itself 
  it'll
  lose all its state.
 
  There's another reason it can't just execute the previous node. You really
  don't want to recompute all the results for level n when you go to produce
  level n+1. You want to keep them around from the previous iteration. 
  Otherwise
  you have an n^2 algorithm.
 
  Right. When I've spent some idle cycles thinking through this in the past
  I figured that in a non-trivial query, we'd end up with a bunch of
  materialisations, one for each level of recursion. That sounds very ugly.
 
 Well as long as you have precisely one for each level of recursion I think
 you're doing ok. The problem is if you do it the naive way you calculate the
 first level, then for the second level you recalculate the first level again,
 then for the third level you recalculate both of the previous two, ... So you
 end up with n copies of the first level, n-1 copies of the second level, ...
 
 If you reuse the result sets for subsequent recursive calls then you actually
 only need to keep then nth level around until you're done generating the n+1
 level.
 
 The trick is being able to have two different call sites in the plan tree
 pulling records out of the Materialize node at different points in the result
 set. That currently isn't possible.

So it's sounding like the best we can get in 8.3 is WITH doing
single-level subquery replacement?
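
That is, something like this (table names made up), where the WITH term
is really just an inline view with no recursion:

  WITH big_orders AS (
      SELECT customer_id, sum(amount) AS total
        FROM orders
       GROUP BY customer_id
  )
  SELECT customer_id, total
    FROM big_orders
   WHERE total > 1000;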
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Priorities for users or queries?

2007-02-23 Thread Jim C. Nasby
On Wed, Feb 21, 2007 at 04:14:35PM +, José Orlando Pereira wrote:
 Benjamin Arai wrote:
  Is there a way to give priorities to queries or users? Something similar to 
  NICE in Linux. My goal is to give the updating (backend) application a very 
  low priority and give the web application a high priority to avoid
  disturbing the user experience.  
 
  Thanks in advance!
 
 You might want to look at the following, as they mention a PostgreSQL 
 prototype:
 
 http://www.cs.cmu.edu/~natassa/aapubs/conference/priority mechanisms.pdf

That URL should be...
http://www.cs.cmu.edu/~natassa/aapubs/conference/priority%20mechanisms.pdf

There has been extensive discussion on the bizgres list about ways to
implement priorities, and I believe that paper was mentioned. You should
look at the archives.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 01:22:17PM -0300, Alvaro Herrera wrote:
 Jim C. Nasby wrote:
  On Thu, Feb 22, 2007 at 10:32:44PM -0500, Matthew T. O'Connor wrote:
 
   I'm not sure this is a great idea, but I don't see how this would result 
   in large numbers of workers working in one database.   If workers work 
   on tables in size order, and exit as soon as they catch up to an older 
   worker, I don't see the problem.  Newer works are going to catch-up to 
   older workers pretty quickly since small tables will vacuum fairly 
   quickly.
  
  The reason that won't necessarily happen is because you can get large
  tables popping up as needing vacuuming at any time.
 
 Right.
 
 We know that a table that needs frequent vacuum necessarily has to be
 small -- so maybe have the second worker exit when it catches up with
 the first, or when the next table is above 1 GB, whichever happens
 first.  That way, only the first worker can be processing the huge
 tables.  The problem with this is that if one of your hot tables grows
 a bit larger than 1 GB, you suddenly have a change in autovacuuming
 behavior, for no really good reason.
 
 And while your second worker is processing the tables in the hundreds-MB
 range, your high-update 2 MB tables are neglected :-(

That's why I'm thinking it would be best to keep the maximum table size
the second worker will take on small. It probably also makes sense to tie
it to time and not size, since the key factor is that you want it to hit
the high-update tables every X number of seconds.

If we wanted to get fancy, we could factor in how far over the vacuum
threshold a table is, so even if the table is on the larger size, if
it's way over the threshold the second vacuum will hit it.

You know, maybe the best way to handle this is to force both vacuums to
exit after a certain amount of time, probably with a longer time limit
for the first vacuum in a database. That would mean that after
processing a large table for 10 minutes, the first vacuum would
exit/re-evaluate what work needs to be done. That would mean
medium-sized tables wouldn't get completely starved.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Simple Column reordering

2007-02-23 Thread Jim C. Nasby
First, it would absolutely be best if we just got the full blown patch
into 8.3 and were done with it. I don't think anyone's arguing against
that... it's a question of what we can do if that can't happen (and it
does sound like the patch lost its maintainer when the direction
changed towards doing both physical and logical ordering code at the
same time).

On Fri, Feb 23, 2007 at 01:04:14PM -0300, Alvaro Herrera wrote:
 Jim C. Nasby wrote:
  On Fri, Feb 23, 2007 at 02:09:55PM +, Simon Riggs wrote:
If you really want an interim solution, what about a builtin function 
that would explicitly mutate the definition and table contents (if any) 
along the lines you want? (assuming that's lots less work than just 
doing the whole thing right to start with). Or even one which just 
*displayed* the optimal order might be sufficient assistance to DBAs 
who 
want to take advantage of this.
   
   I think the only interim solution now is to put functionality into
   PgAdmin et al to optimize the column order.
  
  Well, if it comes to that it would be good to have pgAdmin et al. driven
  by logic in the database, so that people using psql can benefit as well.
  Perhaps a function that is passed an existing table and re-creates it in
  optimal order (if it's empty...). Or at least spits out a CREATE TABLE
  statement for you that's in optimal order.
 
 That's just working around the fact that the engine is not smart enough
 to do the right thing (semi-) automatically.  We don't support that kind
 of operation, just like we don't support optimizer hints.

Except it's kind of the opposite... in this case, the database actually
knows better about what fields have what alignment, etc. At least if
users can ask the database what it considers the best column order, they
can choose to use it.

I also don't see why we should restrict that information to users of
pgAdmin or other 3rd party tools and not support those that just use
psql.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] SCMS question

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 08:32:34AM -0800, Joshua D. Drake wrote:
 I am happy to help with this any way I can, because I would love to see
 CVS take a big diving leap off the backend of mysql into the truncated
 data set of hell.

That quote made the whole argument coming up again worthwhile. :)
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Proposal for Implenting read-only queries during wal replay (SoC 2007)

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 10:57:24PM +, Heikki Linnakangas wrote:
 Florian G. Pflug wrote:
 I plan to submit a proposal for implementing support for
 read-only queries during wal replay as a Google Summer of Code 2007
 project.
 
 I've been browsing the postgres source-code for the last few days,
 and came up with the following plan for a implementation.
 
 I'd be very interested in any feedback on the propsoal - especially
 of the you overlooked this an that, it can never work that way kind ;-)
 
 I had the same thought roughly two years ago:
 
 http://archives.postgresql.org/pgsql-hackers/2005-11/msg01043.php
 
 People weren't very interested in having a read-only mode. I think it 
 would be a nice feature if it's not too complicated.

Every customer I've ever talked to about HA has either asked about it or
thought it was a great idea. We should definitely do it if it's not
too difficult.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] SCMS question

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 04:24:29PM -0700, Warren Turkal wrote:
 On Friday 23 February 2007 15:50, you wrote:
  How to people get a branch?  Do they have their own logins?
 
 If monotone is something like Git, you just create it in your local working 
 copy and push is somewhere public when you are ready, or you can just 
 generate the changeset and submit that.

One additional benefit... we've talked about allowing certain patches to
be automatically verified by the buildfarm to guard against bitrot and
provide additional testing, but one of the big issues has been how to
distribute those patches. Using a distributed SCM would make it easier
to do this; we'd just have to supply buildfarm machines with a list of
approved branches that they should be pulling from (or perhaps some SCMs
would allow a means of tagging that).
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] [GENERAL] pg_autovacuum should allow NULL values

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 06:47:52PM -0500, Tom Lane wrote:
 I wrote:
  I don't find this particularly important, because we have never intended
  direct update of catalog entries to be a primary way of interacting with
  the system.  The current pg_autovacuum setup is a stopgap until the dust
  has settled enough that we know what sort of long-term API we want for
  autovacuum.
 
 I just had an epiphany about that.  We've wanted to avoid setting the
 autovacuum knobs in stone, because it's pretty obvious they're not
 ready, and that has prevented us from inventing any nice SQL syntax for
 managing the per-table settings.
 
 Meanwhile, the storage-parameter infrastructure got added in 8.2.
 Isn't that an absolutely ideal framework for managing per-table autovac
 settings?  We could drop the separate pg_autovacuum catalog altogether
 and keep all the info in pg_class.reloptions.  Advantages:
 
 * The infrastructure is all there already, including ALTER TABLE and
 pg_dump support.
 
 * The parameter names are not SQL keywords, and the syntax isn't
 hardwired to any particular set of them.  So it would be fairly painless
 to change the set of supported parameters, with or without backwards
 compatibility to keep on recognizing an old parameter.
 
 Disadvantages:
 
 * Wouldn't be forwards-compatible with any hacks that people might
 currently have to dump and restore pg_autovacuum contents.  But you
 could probably make a script to read your existing table and emit
 ALTER TABLE SET commands instead.

Actually, if we wanted to we should be able to create a view that takes
the place of the current pg_autovacuum. With appropriate rules and some
functions, you could probably even make it updatable.
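
A minimal sketch of that, assuming the settings land in reloptions under
names like these (only two columns shown; making it updatable would take
rules and helper functions on top of this):

  CREATE VIEW pg_autovacuum_compat AS
  SELECT c.oid AS vacrelid,
         substring(array_to_string(c.reloptions, ',')
                   from 'autovacuum_vacuum_threshold=([0-9]+)')::int
             AS vac_base_thresh,
         substring(array_to_string(c.reloptions, ',')
                   from 'autovacuum_vacuum_scale_factor=([0-9.]+)')::float4
             AS vac_scale_factor
    FROM pg_class c
   WHERE c.relkind = 'r';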
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-22 Thread Jim C. Nasby
On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
 My Proposal:  If we require admins to identify hot tables tables, then: 
 1) Launcher fires-off a worker1 into database X.
 2) worker1 deals with hot tables first, then regular tables.
 3) Launcher continues to launch workers to DB X every autovac naptime. 
 4) worker2 (or 3 or 4 etc...) sees it is alone in DB X, if so it acts as 
 worker1 did above.  If worker1 is still working in DB X then worker2 
 looks for hot tables that are being starved because worker1 got busy. 
 If worker2 finds no hot tables that need work, then worker2 exits.

Rather than requiring people to manually identify hot tables, what if we
just prioritize based on table size? So if a second autovac process hits
a specific database, it would find the smallest table in need of
vacuuming that it should be able to complete before the next naptime and
vacuum that. It could even continue picking tables until it can't find
one that it could finish within the naptime. Granted, it would have to
make some assumptions about how many pages it would dirty.

ISTM that's a lot easier than forcing admins to mark specific tables.
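
A very rough sketch of the selection: it assumes the default thresholds
(50 + 0.2 * reltuples), default cost settings with a 10ms delay, and a
60 second naptime; the stats counters are cumulative rather than since
the last vacuum, so this is only an approximation:

  -- Tables that look like they need vacuuming and that a worker could
  -- plausibly finish inside one naptime, smallest first.
  SELECT c.relname, c.relpages
    FROM pg_class c
    JOIN pg_stat_user_tables s ON s.relid = c.oid
   WHERE (s.n_tup_upd + s.n_tup_del) > 50 + 0.2 * c.reltuples
     AND (c.relpages * 10.0 / 200) * 10 / 1000.0 < 60
   ORDER BY c.relpages;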
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-22 Thread Jim C. Nasby
On Thu, Feb 22, 2007 at 09:32:57AM -0500, Matthew T. O'Connor wrote:
 Jim C. Nasby wrote:
 On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
   
 My Proposal:  If we require admins to identify hot tables tables, then: 
 1) Launcher fires-off a worker1 into database X.
 2) worker1 deals with hot tables first, then regular tables.
 3) Launcher continues to launch workers to DB X every autovac naptime. 
 4) worker2 (or 3 or 4 etc...) sees it is alone in DB X, if so it acts as 
 worker1 did above.  If worker1 is still working in DB X then worker2 
 looks for hot tables that are being starved because worker1 got busy. 
 If worker2 finds no hot tables that need work, then worker2 exits.
 
 
  Rather than requiring people to manually identify hot tables, what if we
 just prioritize based on table size? So if a second autovac process hits
 a specific database, it would find the smallest table in need of
 vacuuming that it should be able to complete before the next naptime and
 vacuum that. It could even continue picking tables until it can't find
 one that it could finish within the naptime. Granted, it would have to
 make some assumptions about how many pages it would dirty.
 
 ISTM that's a lot easier than forcing admins to mark specific tables.
 
 So the heuristic would be:
 * Launcher fires off workers into a database at a given interval 
 (perhaps configurable?)
 * Each worker works on tables in size order. 
 * If a worker ever catches up to an older worker, then the younger 
 worker exits.
 
 This sounds simple and workable to me, perhaps we can later modify this 
 to include some max_workers variable so that a worker would only exit if 
 it catches an older worker and there are max_workers currently active.

That would likely result in a number of workers running in one database,
unless you limited how many workers per database. And if you did that,
you wouldn't be addressing the frequently updated table problem.

A second vacuum in a database *must* exit after a fairly short time so
that we can go back in and vacuum the important tables again (well or
the 2nd vacuum has to periodically re-evaluate what tables need to be
vacuumed).
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] autovacuum next steps, take 2

2007-02-22 Thread Jim C. Nasby
On Thu, Feb 22, 2007 at 09:35:45AM +0100, Zeugswetter Andreas ADI SD wrote:
 
  vacuum should be a process with the least amount of voodoo. 
  If we can just have vacuum_delay and vacuum_threshold, where 
  threshold allows an arbitrary setting of how much bandwidth 
  we will allot to the process, then that is a beyond wonderful thing.
  
  It is easy to determine how much IO you have, and what you can spare.
 
 The tricky part is what metric to use. Imho IO per second would be
 good.
 In a typical DB scenario that is the IO bottleneck, not the Mb/s.

Well, right now they're one and the same... but yeah, IO/sec probably
does make more sense.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


