Re: [PATCHES] Fast CLUSTER

2007-03-27 Thread Simon Riggs
On Thu, 2007-02-22 at 22:49 +, Simon Riggs wrote:
 On Tue, 2007-02-20 at 14:38 -0300, Alvaro Herrera wrote:
 
  Cool.  I noticed that the SGML seems broken here:
 
 Corrected. 
 
  You need to close the listitem and para opened in the COPY mention.
  
   + 
   + static void
   + heap_sync_relation(Relation rel)
   + {
   + if (!rel->rd_istemp)
  
  No comment in this function?
 
 I've added more comments as you suggest.
 
 Thanks for the review.

Could we add this to the unapplied patches queue? It seems to have been
missed off the list. Thanks.

There is a probable conflict with Heikki's recent CLUSTER patch, so I'm
happy to re-write this patch after that has been applied. So it's OK to
stick it at the bottom of the queue.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PATCHES] Unbroke srcdir!=builddir compilation

2007-03-27 Thread Marko Kreen

Let's be decent.

--
marko


pg13.diff
Description: Binary data



Re: [PATCHES] Numeric patch to add special-case representations for 8 bytes

2007-03-27 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 Greg, do you want to submit a patch for this or add a TODO item for this?

Well I never got any approval or rejection in principle so I don't know if
such a patch would be accepted even if it were implemented reasonably. If it
has the consensus needed to be a TODO item I would go ahead and do it.

The main design issue is that I was proposing to make it impossible to access
the internals of the numeric storage using macros. Currently some of the data
(the sign, dscale, and weight) is visible without having to call any special
numeric functions. I was proposing to use representations where those might
not be as easily accessible. 

However I don't think we have any consumers of that data outside of numeric.c
anyways. Is there anything using that functionality currently? Do we mind
losing it?

 ---

 Gregory Stark wrote:
 
 I've uploaded a quick hack to store numerics in < 8 bytes when possible. 
 
  http://community.enterprisedb.com/numeric-hack-1.patch
 
 This is a bit of a kludge since it doesn't actually provide any interface for
 external clients of the numeric module to parse the resulting data. Ie, the
 macros in numeric.h will return garbage.
 
 But I'm not entirely convinced that matters. It's not like those macros were
 really useful to any other modules anyways since there was no way to extract
 the actual digits.
 
 The patch is also slightly unsatisfactory because as I discovered numbers 
 like
 1.1 are stored as two digits currently. But it does work and it does save a
 substantial amount of space for integers.
 
 postgres=# select n,pg_column_size(n) from n;
       n | pg_column_size 
 --------+----------------
       1 |              2
      11 |              2
     101 |              2
    1001 |              3
   10001 |              9
      11 |              9
     1.1 |              9
    10.1 |              9
   100.1 |              9
  1000.1 |              9
     1.1 |             11
    10.1 |             11
 
 I had hoped to get the second batch to be 3-4 bytes. But even now note how
 much space is saved for integers 1.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] Patch for circular buffer in tuplestore to optimize merge joins (v1)

2007-03-27 Thread stark

This patch implements a circular buffer in tuplestore which drops old tuples
as they're no longer needed. It uses this for merge joins to avoid having to
spill the tuplestore if no single value exceeds work_mem. It also is what's
needed for both recursive query support and OLAP window functions (hence why
it implements the more complex circular buffer rather than just moving the
single tuple up to the head of the buffer).

This was mostly already done by Simon, I just finished the logic in tuplesort.c.

This is actually not quite polished so I guess it's still a WIP but it's
certainly ready to be reviewed. All that remains is polishing. If there's
anything in there people object to now I would like to know. 



mergejoin-circbuffer-v1.patch.gz
Description: Binary data


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-27 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Uh, no, that's not very clear.  A long-running transaction will be a
  VACUUM bottleneck because of its own XID, never mind its xmin.
 
  Well, my secondary addition was to start MyProc->xmin with the current
  xid counter, rather than your own xid.
 
 Don't tell me you seriously believe that would work.

I do.  Please tell me why it will not.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] patch adding new regexp functions

2007-03-27 Thread Bruce Momjian
Jeremy Drake wrote:
 On Mon, 26 Mar 2007, Bruce Momjian wrote:
 
  Tom Lane wrote:
   Or were you speaking to the question of whether to adjust the regexp
   patch to conform more nearly to the coding practices found elsewhere?
   I agree with that, but I thought there was already a submitted patch
   for it.
 
  Yes, regex patch adjustment, and I have not seen a patch which makes
  such adjustments.
 
 http://archives.postgresql.org/pgsql-patches/2007-03/msg00285.php

Ah, yes, I still have that in my mailbox, but didn't think it was
related because there was discussion _after_ the patch was posted.  Will
add the patch to the queue now.  Thanks.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] patch adding new regexp functions

2007-03-27 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Jeremy Drake wrote:
 On Thu, 22 Mar 2007, Tom Lane wrote:
 
  I'd vote for making this new code look like the rest of it, to wit
  hardwire the values.
 
 Attached please find a patch which does this.
 
 
 
 -- 
 Although written many years ago, Lady Chatterley's Lover has just been
 reissued by the Grove Press, and this pictorial account of the
 day-to-day life of an English gamekeeper is full of considerable
 interest to outdoor minded readers, as it contains many passages on
 pheasant-raising, the apprehending of poachers, ways to control vermin,
 and other chores and duties of the professional gamekeeper.
 Unfortunately, one is obliged to wade through many pages of extraneous
 material in order to discover and savour those sidelights on the
 management of a midland shooting estate, and in this reviewer's opinion
 the book cannot take the place of J. R. Miller's Practical
 Gamekeeping.
   -- Ed Zern, Field and Stream (Nov. 1959)

[ Attachment, skipping... ]

 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-27 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 Well, my secondary addition was to start MyProc->xmin with the current
 xid counter, rather than your own xid.
 
 Don't tell me you seriously believe that would work.

 I do.  Please tell me why it will not.

You can't have GlobalXmin greater than your own xid, else log space
(particularly pg_subtrans) may be recycled too soon.

regards, tom lane



Re: [PATCHES] Improvement of procArray.xmin for VACUUM

2007-03-27 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
  Well, my secondary addition was to start MyProc->xmin with the current
  xid counter, rather than your own xid.
  
  Don't tell me you seriously believe that would work.
 
  I do.  Please tell me why it will not.
 
 You can't have GlobalXmin greater than your own xid, else log space
 (particularly pg_subtrans) may be recycled too soon.

OK, so maybe GlobalXmin would have to be split into two new variables
that control log space and dead-row detection separately.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Fast CLUSTER

2007-03-27 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Simon Riggs wrote:
 On Tue, 2007-02-20 at 14:38 -0300, Alvaro Herrera wrote:
 
  Cool.  I noticed that the SGML seems broken here:
 
 Corrected. 
 
  You need to close the listitem and para opened in the COPY mention.
  
   + 
   + static void
   + heap_sync_relation(Relation rel)
   + {
   + if (!rel->rd_istemp)
  
  No comment in this function?
 
 I've added more comments as you suggest.
 
 Thanks for the review.
 
 -- 
   Simon Riggs 
   EnterpriseDB   http://www.enterprisedb.com
 

[ Attachment, skipping... ]

 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Fast CLUSTER

2007-03-27 Thread Bruce Momjian
Simon Riggs wrote:
 On Thu, 2007-02-22 at 22:49 +, Simon Riggs wrote:
  On Tue, 2007-02-20 at 14:38 -0300, Alvaro Herrera wrote:
  
   Cool.  I noticed that the SGML seems broken here:
  
  Corrected. 
  
   You need to close the listitem and para opened in the COPY mention.
   
+ 
+ static void
+ heap_sync_relation(Relation rel)
+ {
+   if (!rel->rd_istemp)
   
   No comment in this function?
  
  I've added more comments as you suggest.
  
  Thanks for the review.
 
 Could we add this to the unapplied patches queue? It seems to have been
 missed off the list. Thanks.

Done.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[PATCHES] DEALLOCATE ALL

2007-03-27 Thread Marko Kreen

When pooling connections where prepared statements are in use,
it is hard to give a new client a totally clean connection, as
there may be allocated statements that give errors when the
new client starts preparing statements again.

Currently PgPool solves the situation by parsing all queries
and keeping a list of prepared statements.  This may not be a big
problem for it as it parses the queries anyway, but for a simple
pooler like PgBouncer (see pgfoundry) that does not look inside
libpq packets it is a huge problem.

Attached is a patch that allows the keyword ALL to be used with
DEALLOCATE and then frees all prepared queries.  I think it is
useful for most pooling situations, not only PgBouncer.
Also it's similar in spirit to RESET ALL.

I did it in a slightly hacky way - if DeallocateStmt->name is
NULL it signifies the DEALLOCATE ALL command.  All the code
that looks into DeallocateStmt seems to survive the situation
so it should be no problem.  If still a new node is needed
or an additional field in the node I can rework the patch.

--
marko


dealloc_all.diff
Description: Binary data



Re: [PATCHES] Unbroke srcdir!=builddir compilation

2007-03-27 Thread Bruce Momjian

Applied.

---

Marko Kreen wrote:
 Let's be decent.
 
 -- 
 marko

[ Attachment, skipping... ]

 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Bruce Momjian
Heikki Linnakangas wrote:
 Alvaro Herrera wrote:
  Heikki Linnakangas wrote:
  
  I ran two 24h test runs with DBT-2, one with the patch and one without. 
  To get comparable, predictable results, I turned autovacuum off and ran 
  a manual vacuum in a loop on the stock-table alone.
 
  As expected, the steady-state of the stock table is smaller with the 
  patch. But only by ~2%, that's slightly less than I expected.
 
  But what surprises me is that response times went up a bit with the patch. I 
  don't know why.
  
  Maybe because of increased contention of ProcArrayLock?  (I assume you
  are using that, although I haven't seen the patch)
 
 I am, but I doubt that's it. The response times are dominated by I/O, so 
 any increase in lock contention would hardly show up. And the patch is 
 only adding one GetOldestXmin call every 1000 scanned pages, which is 
 nothing compared to the thousands of GetSnapshot calls the normal 
 transactions are issuing per minute.
 
 The patch must have changed the I/O pattern in some subtle way.

So are you stopping work on the patch?  I assume so.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Fast CLUSTER

2007-03-27 Thread Simon Riggs
On Tue, 2007-03-27 at 10:28 -0400, Bruce Momjian wrote:
 Simon Riggs wrote:
  On Thu, 2007-02-22 at 22:49 +, Simon Riggs wrote:
  Could we add this to the unapplied patches queue? It seems to have been
  missed off the list. Thanks.
 
 Done.

Many thanks

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Heikki Linnakangas

Bruce Momjian wrote:
 Heikki Linnakangas wrote:
  Alvaro Herrera wrote:
   Heikki Linnakangas wrote:
    I ran two 24h test runs with DBT-2, one with the patch and one without. 
    To get comparable, predictable results, I turned autovacuum off and ran 
    a manual vacuum in a loop on the stock-table alone.

    As expected, the steady-state of the stock table is smaller with the 
    patch. But only by ~2%, that's slightly less than I expected.

    But what surprises me is that response times went up a bit with the patch. I 
    don't know why.

   Maybe because of increased contention of ProcArrayLock?  (I assume you
   are using that, although I haven't seen the patch)

  I am, but I doubt that's it. The response times are dominated by I/O, so 
  any increase in lock contention would hardly show up. And the patch is 
  only adding one GetOldestXmin call every 1000 scanned pages, which is 
  nothing compared to the thousands of GetSnapshot calls the normal 
  transactions are issuing per minute.

  The patch must have changed the I/O pattern in some subtle way.

 So are you stopping work on the patch?  I assume so.


Yes, at least for now. I can't believe the patch actually hurts 
performance, but I'm not going to spend time investigating it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] WIP patch - INSERT-able log statements

2007-03-27 Thread Bruce Momjian

What is the status of this patch?

---

FAST PostgreSQL wrote:
 Hi,
 
 I've been working on the following TODO item and attached is an initial 
 patch. (It is only partial and not yet completely functional)
 
 Allow server log information to be output as INSERT statements 
 This would allow server log information to be easily loaded into a database 
 for analysis. 
 
 I want to confirm, if what I have done so far is what community is looking 
 for and also want to clear some doubts.
 
 What is done so far
 ---
 
 Two postgresql.conf variables
 
 #log_output_type = 'text' #Valid values are 'SQL' or 'text'
 #log_output_table_name = 'auditlogs'
 
 These control how to output the log. Defaults to 'text' which is status quo. 
 If it is set to 'SQL' log will be output as INSERT commands.
 
 The second variable is of interest. We need to specify a table in the insert 
 command. My preferred option is for the user to give one and he can create it 
 if and when he wants to. The alternative is we decide the table name and make 
 initdb to create one.  
 
 The proposed log output structure
 --
 INSERT INTO user_defined_table values( timestamp_with_milliseconds,  
 timestamp, username,  databasename, sessionid,  host_and_port, host, proc_id, 
 command_tag,  session_start, transaction_id,  error_severity,  
 SQL_State_Code, error_message);
 
 All these columns will follow the current rules of log output. ie, unless 
 explicitly requested by the user, these columns will have NULL. User can still 
 give log_line_prefix in any order he wants, and logger will output it in 
 appropriate columns. The code has been modified to do 
 this.
 
 Issues/Questions are:
 - How about 'Statement duration log'.  This will come to the logger as a 
 single string and after the query execution. In the existing log we can make 
 sense of the duration log by matching it with the statement above it or by 
 the statement which gets printed besides it (Again as 
 a single string). But when this is loaded onto a table it doesn't make much 
 sense unless everything is in a single row. (My preferred option is to add 
 another column to the table structure defined above as 'duration'. But 
 haven't figured out how to achieve this, because the 
 statement is printed first and then the duration as another log.)
 
 - If the SQL log output is to the syslog, then it becomes pretty awkward and 
 possibly useless because our current syslog writer function breaks up the log 
 into several lines to accommodate various platforms. Syslog also then adds 
 other information before outputting it, which 
 cannot be loaded onto a table. The preferred option is to educate the user 
 through documentation that SQL type log output is best served when it is 
 output to stderr and redirected to a file? Same goes with other aspects such 
 as verbose and various other statistics log. 
 
 - There are also other minor issues such as, the actual query currently gets 
 output in log as 'Statement: CREATE '. For sql type log we may not 
 need the 'Statement:' part as it will be in a column ? Do we remove this in 
 both text and SQL outputs ?
 
 Rgds,
 Arul Shaji
 
 This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 
 003 693 481. It is confidential to the ordinary user of the email address to 
 which it was addressed and may contain copyright and/or legally privileged 
 information. No one else may read, print, store, copy or forward all or any 
 of it or its attachments. If you receive this email in error, please return 
 to sender. Thank you.
 
 If you do not wish to receive commercial email messages from Fujitsu 
 Australia Software Technology Pty Ltd, please email [EMAIL PROTECTED]

[ Attachment, skipping... ]

 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[PATCHES] Concurrent psql v4 [WIP]

2007-03-27 Thread stark

This is just an update against CVS.

The interface is still as described at this URL:

http://community.enterprisedb.com/concurrent/index.html

It's true there isn't any documentation in the patch but I'm not sure we're
actually settled on the interface.

I think our experience here is that the \cwait command was a mistake. You
never need it in interactive use, and you need to use it *religiously* when
writing regression tests. If you miss one your regression test will fail
randomly depending on the timing. Instead we should just have a psql setting
that makes it automatically wait before every connection switch. That ensures
you catch any bugs where a command fails to block when it should, and also
ensures you catch the output of an unblocked command as soon as you switch
connections to it.

The other issue is the cursor behaviour. Currently it looks like below and it
starts a new pager for each block. I'm not sure this is really all that bad
myself but I have a feeling others will disagree. I'm not exactly sure what
the right behaviour is though, if this is an asynchronous command issued with
\cnowait then it would be strange to suddenly start executing synchronously once
the first bit of data arrives.

postgres[2]=# select * from generate_series(1,4);
 generate_series 
-----------------
               1
               2
               3
               4
(10 rows)

postgres[2]=# \set FETCH_COUNT 1

postgres[2]=# select * from generate_series(1,10);
 generate_series 
-----------------
               1
(1 row)

 generate_series 
-----------------
               2
(1 row)

 generate_series 
-----------------
               3
(1 row)

 generate_series 
-----------------
               4
(1 row)

 generate_series 
-----------------
(0 rows)




concurrent-psql-v4.patch.gz
Description: Binary data


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

     But what surprises me is that response times went up a bit with the patch. I
     don't know why.

    Maybe because of increased contention of ProcArrayLock?  (I assume you
    are using that, although I haven't seen the patch)

   I am, but I doubt that's it. The response times are dominated by I/O, so any
   increase in lock contention would hardly show up. And the patch is only
   adding one GetOldestXmin call every 1000 scanned pages, which is nothing
   compared to the thousands of GetSnapshot calls the normal transactions are
   issuing per minute.

   The patch must have changed the I/O pattern in some subtle way.

  So are you stopping work on the patch?  I assume so.

 Yes, at least for now. I can't believe the patch actually hurts performance,
 but I'm not going to spend time investigating it.

Isn't this exactly what you would expect? It will clean up more tuples so
it'll dirty more pages. Especially given the pessimal way vacuum's dirty
buffers are handled until Simon's patch to fix that goes in.

The benefit of the patch that we would expect to see is that you won't need to
run VACUUM as often. In the long term we would expect the stock table to grow
less too but I doubt these tests were long enough to demonstrate that effect.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Heikki Linnakangas

Gregory Stark wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:

  Yes, at least for now. I can't believe the patch actually hurts performance,
  but I'm not going to spend time investigating it.

 Isn't this exactly what you would expect? It will clean up more tuples so
 it'll dirty more pages. Especially given the pessimal way vacuum's dirty
 buffers are handled until Simon's patch to fix that goes in.

Hmm. Yeah, maybe it'll get better when we get that fixed..

 The benefit of the patch that we would expect to see is that you won't need to
 run VACUUM as often. In the long term we would expect the stock table to grow
 less too but I doubt these tests were long enough to demonstrate that effect.


The size did reach a steady state about half-way through the test, see 
the logs here:


patched
http://community.enterprisedb.com/oldestxmin/92/server/relsizes.log

unpatched
http://community.enterprisedb.com/oldestxmin/93/server/relsizes.log

The test was a success in that sense: the patch did reduce the steady 
state size of the stock table.


Maybe we would see a gain in transactions per minute or response times 
if we traded off the smaller table size to run vacuum slightly less 
frequently.. But as I said I don't want to spend time running more tests 
for what seems like a small benefit.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Bruce Momjian wrote:
 So are you stopping work on the patch?  I assume so.

 Yes, at least for now. I can't believe the patch actually hurts 
 performance, but I'm not going to spend time investigating it.

Are we withdrawing the patch from consideration for 8.3 then?
I had assumed it was still a live candidate, but if it seems to
lose in pgbench maybe we had better set it aside.

regards, tom lane



Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Heikki Linnakangas

Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Bruce Momjian wrote:
   So are you stopping work on the patch?  I assume so.

  Yes, at least for now. I can't believe the patch actually hurts 
  performance, but I'm not going to spend time investigating it.

 Are we withdrawing the patch from consideration for 8.3 then?
 I had assumed it was still a live candidate, but if it seems to
 lose in pgbench maybe we had better set it aside.


I haven't tried pgbench, the tests I ran were with DBT-2.

Just to summarize again: the patch did help to keep the stock table 
smaller, but the response times were higher with the patch.


Maybe we should keep this issue open until we resolve the vacuum WAL 
flush issue? I can then rerun the same tests to see if this patch is a 
win after that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Bruce Momjian
Heikki Linnakangas wrote:
 Tom Lane wrote:
  Heikki Linnakangas [EMAIL PROTECTED] writes:
  Bruce Momjian wrote:
  So are you stopping work on the patch?  I assume so.
  
  Yes, at least for now. I can't believe the patch actually hurts 
  performance, but I'm not going to spend time investigating it.
  
  Are we withdrawing the patch from consideration for 8.3 then?
  I had assumed it was still a live candidate, but if it seems to
  lose in pgbench maybe we had better set it aside.
 
 I haven't tried pgbench, the tests I ran were with DBT-2.
 
 Just to summarize again: the patch did help to keep the stock table 
 smaller, but the response times were higher with the patch.
 
 Maybe we should keep this issue open until we resolve the vacuum WAL 
 flush issue? I can then rerun the same tests to see if this patch is a 
 win after that.

Would you like to add a TODO item?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-03-27 Thread Heikki Linnakangas

Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Maybe we should keep this issue open until we resolve the vacuum WAL 
  flush issue? I can then rerun the same tests to see if this patch is a 
  win after that.

 Sounds like a plan, if you are willing to do that.

Sure, just rerunning the same tests isn't much work.

Bruce Momjian wrote:
 Would you like to add a TODO item?

I don't know how we track things like this. Maybe add it to the end of the 
patch queue, with a link to this discussion so that we remember that it 
needs more testing?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [pgsql-patches] pg_get_domaindef

2007-03-27 Thread Bruce Momjian

I have removed this TODO item:

* %Add pg_get_acldef(), pg_get_typedefault(), pg_get_attrdef(),
  pg_get_tabledef(), pg_get_domaindef(), pg_get_functiondef()

  These would be for application use, not for use by pg_dump.

Seems there is a lack of interest in adding this feature because of
maintenance concerns.

The attached patch is rejected for the same reason.  Sorry for the
confusion.


---

FAST PostgreSQL wrote:
 On Thu, 25 Jan 2007 02:25, Tom Lane wrote:
  Andrew Dunstan [EMAIL PROTECTED] writes:
   FAST PostgreSQL wrote:
   Please find attached the patch with modifications
  
   are you proposing to implement the other functions in this TODO item
   (pg_get_acldef(), pg_get_typedefault(), pg_get_attrdef(),
   pg_get_tabledef(), pg_get_functiondef() ) ?
 
  I haven't entirely understood the use case for any of these.  It's not
 
 Any consensus on these functions? If we decide against having these then it's 
 better to remove them from the TODO list temporarily/permanently.
 
 Rgds,
 Arul Shaji
 
 
  pg_dump, for a number of reasons: one being that pg_dump still has to
  support older backend versions, and another being that every time we
  let backend SnapshotNow functions get involved, we take another hit to
  pg_dump's claim to produce a consistent MVCC snapshot.
 
  But my real objection is: do we really want to support duplicative code
  in both pg_dump and the backend?  Updating pg_dump is already a major
  PITA whenever one adds a new feature; doubling that work isn't
  attractive.  (And it'd be double, not just a copy-and-paste, because of
  the large difference in the operating environment.)  So I want to hear a
  seriously convincing use-case that will justify the maintenance load we
  are setting up for ourselves.  "Somebody might want this" is not
  adequate.
 
  Perhaps a better area of work would be the often-proposed refactoring of
  pg_dump into a library and driver program, wherein the library could
  expose individual functions such as fetch the SQL definition of this
  object.  Unfortunately, that'll be a huge project with no payoff until
  the end...
 
  regards, tom lane
 This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 
 003 693 481. It is confidential to the ordinary user of the email address to 
 which it was addressed and may contain copyright and/or legally privileged 
 information. No one else may read, print, store, copy or forward all or any 
 of it or its attachments. If you receive this email in error, please return 
 to sender. Thank you.
 
 If you do not wish to receive commercial email messages from Fujitsu 
 Australia Software Technology Pty Ltd, please email [EMAIL PROTECTED]
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] DEALLOCATE ALL

2007-03-27 Thread Alvaro Herrera
Marko Kreen wrote:
 When pooling connections where prepared statements are in use,
 it is hard to give new client totally clean connection as
 there may be allocated statements that give errors when
 new client starts preparing statements again.

Huh, didn't we have a RESET SESSION command to do just that?  What about
cursors, for example?

 I did it in a slightly hacky way - if DeallocateStmt->name is
 NULL it signifies a DEALLOCATE ALL command.  All the code
 that looks into DeallocateStmt seems to survive the situation,
 so it should be no problem.  If a new node is still needed,
 or an additional field in the node, I can rework the patch.

Wouldn't it be easier to just add a bool to DeallocateStmt?

-- 
Alvaro Herrerahttp://www.PlanetPostgreSQL.org
Si un desconocido se acerca y te regala un CD de Ubuntu ...
 Eso es ...  Eau de Tux



Re: [PATCHES] DEALLOCATE ALL

2007-03-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Marko Kreen wrote:
 When pooling connections where prepared statements are in use,
 it is hard to give new client totally clean connection as
 there may be allocated statements that give errors when
 new client starts preparing statements again.

 Huh, didn't we have a RESET SESSION command to do just that?  What about
 cursors, for example?

We don't actually *have* one, but I believe it was agreed that that is
the right API to provide.  If a pooler has to remember to clear prepared
statements, GUCs, cursors, and who knows what else, it'll be perpetually
broken because there'll be something it omits.

There might be a use-case for DEALLOCATE ALL, but needs of poolers
aren't it.  I'd be inclined to vote against this unless someone can
point to a better use-case.

regards, tom lane



Re: [PATCHES] Numeric patch to add special-case representations for 8 bytes

2007-03-27 Thread Heikki Linnakangas

Gregory Stark wrote:

The main design issue is that I was proposing to make it impossible to access
the internals of the numeric storage using macros. Currently some of the data
(the sign, dscale, and weight) is visible without having to call any special
numeric functions. I was proposing to use representations where those might
not be as easily accessible. 


However I don't think we have any consumers of that data outside of numeric.c
anyways. Is there anything using that functionality currently? Do we mind
losing it?


The data would still be available through a function, right? If there's 
no-one accessing that information currently, there's no 
backwards-compatibility issue. I think this is a non-issue.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] DEALLOCATE ALL

2007-03-27 Thread Marko Kreen

On 3/27/07, Tom Lane [EMAIL PROTECTED] wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:
 Marko Kreen wrote:
 When pooling connections where prepared statements are in use,
 it is hard to give new client totally clean connection as
 there may be allocated statements that give errors when
 new client starts preparing statements again.

 Huh, didn't we have a RESET SESSION command to do just that?  What about
 cursors, for example?

We don't actually *have* one, but I believe it was agreed that that is
the right API to provide.  If a pooler has to remember to clear prepared
statements, GUCs, cursors, and who knows what else, it'll be perpetually
broken because there'll be something it omits.


Well.  Please apply following patch then:

http://archives.postgresql.org/pgsql-patches/2004-12/msg00228.php

Even if it is incomplete, the missing parts can be added later.
I see no reason to keep it from users.


There might be a use-case for DEALLOCATE ALL, but needs of poolers
aren't it.  I'd be inclined to vote against this unless someone can
point to a better use-case.


Ok, a non-pooler argument: prepared statements are supposed to be
garbage-collected by the user.  That's it.  There should be a friendly
way to get a clean state without the need for the user to specifically
keep track of what's allocated, or to do messy exception-handling
around PREPARE/DEALLOCATE.  (PREPARE OR REPLACE and DEALLOCATE IF EXISTS
would also lessen the pain.)

Then a pooler argument: there is one pooler where RandomJoe executes
queries, and another for a specific app where the subset of SQL it uses is
known.  I want to RESET only specific things in the app case.  So it would
be good if the RESETs for specific areas were available.

Also, the objections to Hans' patch give the impression that different
pooling solutions want a different RESET EVERYTHING, so again,
it would be good if RESETs for specific areas were available
and the all-encompassing RESET EVERYTHING just tied all the specific
RESETs together.

--
marko

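To make the pooler problem in this thread concrete, here is a toy model in Python. The `Session` class and its methods are invented for illustration; they are not PostgreSQL or pooler APIs, and `deallocate(None)` stands in for the proposed DEALLOCATE ALL (the NULL-name DeallocateStmt):

```python
class Session:
    """Toy model of a backend session's prepared-statement namespace."""
    def __init__(self):
        self.prepared = {}

    def prepare(self, name, query):
        # Mirrors PostgreSQL behaviour: re-PREPARE of an existing name is an error.
        if name in self.prepared:
            raise RuntimeError('prepared statement "%s" already exists' % name)
        self.prepared[name] = query

    def deallocate(self, name=None):
        # name=None models the proposed DEALLOCATE ALL.
        if name is None:
            self.prepared.clear()
        else:
            del self.prepared[name]

def hand_to_new_client(session):
    """What a pooler would run before reusing a pooled connection."""
    session.deallocate()   # DEALLOCATE ALL
    return session
```

Without the blanket reset, the second client's PREPARE of an already-used name fails; with it, the connection looks fresh. A full RESET SESSION would additionally have to cover GUCs, cursors, and so on, which is Tom's point above.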


[PATCHES] [PATCH] add CLUSTER table ORDER BY index

2007-03-27 Thread Holger Schurig
The following patch adds a new variant of the CLUSTER command. The old
variants are preserved, as suggested in the TODO entry.

Things I changed:

* The grammar
* psql help text
* psql tab-completion; it now favours CLUSTER table ORDER BY index
* two uses of CLUSTER in the regression tests, so that both the old and
  the new syntax get checked

Things to consider:

* not yet in the documentation
* psql should probably no longer emit CLUSTER index ON table


Index: src/doc/TODO
===
*** src.orig/doc/TODO   2007-03-27 21:18:01.0 +0200
--- src/doc/TODO2007-03-27 21:18:26.0 +0200
***
*** 624,631 
  
o %Add VERBOSE option to report tables as they are processed,
  like VACUUM VERBOSE
-   o Add more logical syntax CLUSTER table ORDER BY index; 
- support current syntax for backward compatibility
  
  
  * COPY
--- 624,629 
Index: src/doc/src/FAQ/TODO.html
===
*** src.orig/doc/src/FAQ/TODO.html  2007-03-27 21:18:01.0 +0200
--- src/doc/src/FAQ/TODO.html   2007-03-27 21:18:26.0 +0200
***
*** 558,565 
  /p
  /lili%Add VERBOSE option to report tables as they are processed,
like VACUUM VERBOSE
- /liliAdd more logical syntax CLUSTER table ORDER BY index; 
-   support current syntax for backward compatibility
/li/ul
/liliCOPY
ul
--- 558,563 
Index: src/src/backend/parser/gram.y
===
*** src.orig/src/backend/parser/gram.y  2007-03-27 21:18:01.0 +0200
--- src/src/backend/parser/gram.y   2007-03-27 21:18:26.0 +0200
***
*** 209,215 
  
  %type str   relation_name copy_file_name
database_name access_method_clause 
access_method attr_name
!   index_name name file_name
  
  %type list  func_name handler_name qual_Op qual_all_Op subquery_Op
opt_class opt_validator
--- 209,215 
  
  %type str   relation_name copy_file_name
database_name access_method_clause 
access_method attr_name
!   index_name name file_name opt_cluster_order_by
  
  %type list  func_name handler_name qual_Op qual_all_Op subquery_Op
opt_class opt_validator
***
*** 5327,5332 
--- 5327,5333 
   *
   *QUERY:
   *cluster index_name on qualified_name
+  *cluster qualified_name ORDER BY index_name
   *cluster qualified_name
   *cluster
   *
***
*** 5340,5350 
   n-indexname = $2;
   $$ = (Node*)n;
}
!   | CLUSTER qualified_name
{
   ClusterStmt *n = makeNode(ClusterStmt);
   n-relation = $2;
!  n-indexname = NULL;
   $$ = (Node*)n;
}
| CLUSTER
--- 5341,5351 
   n-indexname = $2;
   $$ = (Node*)n;
}
!   | CLUSTER qualified_name opt_cluster_order_by
{
   ClusterStmt *n = makeNode(ClusterStmt);
   n-relation = $2;
!  n-indexname = $3;
   $$ = (Node*)n;
}
| CLUSTER
***
*** 5356,5361 
--- 5357,5368 
}
;
  
+ opt_cluster_order_by:
+   ORDER BY index_name { $$ = $3; }
+   | /*EMPTY*/ { $$ = NULL; }
+   ;
+ 
+ 
  /*
   *
   *QUERY:
Index: src/src/bin/psql/sql_help.h
===
*** src.orig/src/bin/psql/sql_help.h2007-03-27 21:18:01.0 +0200
--- src/src/bin/psql/sql_help.h 2007-03-27 21:18:26.0 +0200
***
*** 119,125 
  
  { CLUSTER,
N_(cluster a table according to an index),
!   N_(CLUSTER indexname ON tablename\nCLUSTER tablename\nCLUSTER) },
  
  { COMMENT,
N_(define or change the comment of an object),
--- 119,125 
  
  { CLUSTER,
N_(cluster a table according to an index),
!   N_(CLUSTER indexname ON 

Re: [PATCHES] [pgsql-patches] O_DIRECT support for Windows

2007-03-27 Thread Magnus Hagander
IIRC, we're still waiting for performance numbers showing there exists a
win from this patch.

//Magnus

Bruce Momjian wrote:
 Magnus, where are we on this?
 
 ---
 
 Magnus Hagander wrote:
 We're ok with the alignment issues provided there is code added to reject
 O_DIRECT if the sector size is too large.

 We also said we need to see some performance numbers on the effect of
 the patch before it goes in.

 //Magnus


 Bruce Momjian wrote:
 So, do we want this patch?  Are we OK on WIN32 alignment issues?

 ---

 ITAGAKI Takahiro wrote:
 The attached is a patch to define O_DIRECT by ourselves on Windows,
 and to map O_DIRECT to FILE_FLAG_NO_BUFFERING.

 There will be consistency in our support between Windows and other OSes
 that have O_DIRECT. Also, there is the following comment that says, as I
 read it, that we should do so.
 | handle other flags? (eg FILE_FLAG_NO_BUFFERING/FILE_FLAG_WRITE_THROUGH)

 Is this worth doing? Do we need more performance reports for the change?

 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center
 [ Attachment, skipping... ]

 




[PATCHES] autovacuum: recheck logic

2007-03-27 Thread Alvaro Herrera
Hi,

This is the first non-trivial patch towards multiple autovacuum workers.

This patch adds the recheck logic to the autovacuum worker.  With
this, the worker first builds its table list and then rechecks pgstat
before vacuuming each table, to verify that no one has vacuumed the table
in the meantime.

In the current autovacuum world this only means that a worker will not
vacuum a table that a user has vacuumed manually.  As discussed, it will
be much more useful as soon as multiple workers are running
concurrently.

To do this, I separated the task of calculating autovacuum parameters
(freeze_min_age, vacuum cost limit and delay, etc) from the autovac
equation calculation (freeze_max_age, pg_class.reltuples, etc).

We now keep track of three lists during the initial pg_class scan:

1. tables that need vacuum or analyze, per equations
2. tables not in (1) that have toast tables
3. toast tables that need vacuum

Then we append those tables in (2) whose toast tables are in (3), to the
(1) list.  The rest are discarded.  So when we need to do the
rechecking, we need to process only those tables that actually needed
vacuuming.  With the previous coding, we would end up rechecking almost
all tables every time (to be exact, all tables that have a toast table).

The autovacuum parameters are only calculated in the second pass (the
rechecking).  The first pass only yields boolean parameters.

Unless there are objections I'll apply this tomorrow.
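The three-list pass described above can be condensed into a small sketch. This is Python for readability only; the patch itself works on pg_class tuples in C, and the dict-based catalog rows here are invented:

```python
def build_table_list(pg_class):
    """First pass over pg_class: decide which tables go on the work list.

    pg_class: iterable of dicts with keys
      oid, needs_vacuum (bool), toastrelid (0 if none), is_toast (bool).
    Returns the OIDs the worker will later recheck and vacuum.
    """
    need_vacuum = []           # (1) plain tables that need vacuum/analyze
    have_toast = []            # (2) tables not in (1) that own a toast table
    toast_need_vacuum = set()  # (3) toast tables that need vacuum

    for rel in pg_class:
        if rel["is_toast"]:
            if rel["needs_vacuum"]:
                toast_need_vacuum.add(rel["oid"])
        elif rel["needs_vacuum"]:
            need_vacuum.append(rel["oid"])
        elif rel["toastrelid"]:
            have_toast.append((rel["oid"], rel["toastrelid"]))

    # Append tables from (2) whose toast table is in (3); discard the rest,
    # so the recheck pass only ever visits tables that actually needed work.
    need_vacuum.extend(oid for oid, toast in have_toast
                       if toast in toast_need_vacuum)
    return need_vacuum
```

The point of the discard step is exactly the one made above: the recheck no longer has to revisit every table that merely has a toast table.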

-- 
Alvaro Herrerahttp://www.PlanetPostgreSQL.org
La naturaleza, tan frágil, tan expuesta a la muerte... y tan viva
Index: src/backend/postmaster/autovacuum.c
===
RCS file: /home/alvherre/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.39
diff -c -p -r1.39 autovacuum.c
*** src/backend/postmaster/autovacuum.c	27 Mar 2007 20:36:03 -	1.39
--- src/backend/postmaster/autovacuum.c	27 Mar 2007 20:43:31 -
*** typedef struct autovac_dbase
*** 91,96 
--- 91,103 
  	PgStat_StatDBEntry *ad_entry;
  } autovac_dbase;
  
+ /* struct to keep track of tables to vacuum and/or analyze, in 1st pass */
+ typedef struct av_relation
+ {
+ 	Oid		ar_relid;
+ 	Oid		ar_toastrelid;
+ } av_relation;
+ 
  /* struct to keep track of tables to vacuum and/or analyze, after rechecking */
  typedef struct autovac_table
  {
*** NON_EXEC_STATIC void AutoVacLauncherMain
*** 121,137 
  static void do_start_worker(void);
  static void do_autovacuum(Oid dbid);
  static List *autovac_get_database_list(void);
! static void test_rel_for_autovac(Oid relid, PgStat_StatTabEntry *tabentry,
! 	 Form_pg_class classForm,
! 	 Form_pg_autovacuum avForm,
! 	 List **vacuum_tables,
! 	 List **toast_table_ids);
  static void autovacuum_do_vac_analyze(Oid relid, bool dovacuum,
  		  bool doanalyze, int freeze_min_age);
  static HeapTuple get_pg_autovacuum_tuple_relid(Relation avRel, Oid relid);
  static PgStat_StatTabEntry *get_pgstat_tabentry_relid(Oid relid, bool isshared,
  		  PgStat_StatDBEntry *shared,
  		  PgStat_StatDBEntry *dbentry);
  static void autovac_report_activity(VacuumStmt *vacstmt, Oid relid);
  static void avl_sighup_handler(SIGNAL_ARGS);
  static void avlauncher_shutdown(SIGNAL_ARGS);
--- 128,152 
  static void do_start_worker(void);
  static void do_autovacuum(Oid dbid);
  static List *autovac_get_database_list(void);
! 
! static void relation_check_autovac(Oid relid, Form_pg_class classForm,
! 	   Form_pg_autovacuum avForm, PgStat_StatTabEntry *tabentry,
! 	   List **table_oids, List **table_toast_list,
! 	   List **toast_oids);
! static autovac_table *table_recheck_autovac(Oid relid,
! 			PgStat_StatDBEntry *shared,
! 			PgStat_StatDBEntry *dbentry);
  static void autovacuum_do_vac_analyze(Oid relid, bool dovacuum,
  		  bool doanalyze, int freeze_min_age);
  static HeapTuple get_pg_autovacuum_tuple_relid(Relation avRel, Oid relid);
  static PgStat_StatTabEntry *get_pgstat_tabentry_relid(Oid relid, bool isshared,
  		  PgStat_StatDBEntry *shared,
  		  PgStat_StatDBEntry *dbentry);
+ static void relation_needs_vacanalyze(Oid relid, Form_pg_autovacuum avForm,
+ 		  Form_pg_class classForm,
+ 		  PgStat_StatTabEntry *tabentry, bool *dovacuum,
+ 		  bool *doanalyze);
+ static HeapTuple get_pg_autovacuum_tuple_relid(Relation avRel, Oid relid);
  static void autovac_report_activity(VacuumStmt *vacstmt, Oid relid);
  static void avl_sighup_handler(SIGNAL_ARGS);
  static void avlauncher_shutdown(SIGNAL_ARGS);
*** do_autovacuum(Oid dbid)
*** 872,879 
  	HeapTuple	tuple;
  	HeapScanDesc relScan;
  	Form_pg_database dbForm;
! 	List	   *vacuum_tables = NIL;
! 	List	   *toast_table_ids = NIL;
  	ListCell   *cell;
  	PgStat_StatDBEntry *shared;
  	PgStat_StatDBEntry *dbentry;
--- 887,895 
  	HeapTuple	tuple;
  	HeapScanDesc relScan;
  	

Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-27 Thread Simon Riggs
On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote:

 Here's an update of a code to improve full page writes as proposed in
 
 http://archives.postgresql.org/pgsql-hackers/2007-01/msg01491.php
 and
 http://archives.postgresql.org/pgsql-patches/2007-01/msg00607.php
 
 Update includes some modification for error handling in archiver and
 restoration command.
 
 In the previous threads, I posted several evaluation and shown that we
 can keep all the full page writes needed for full XLOG crash recovery,
 while compressing archive log size considerably better than gzip, with
 less CPU consumption.  I've found no further objection for this proposal
 but still would like to hear comments/opinions/advices.

Koichi-san,

Looks interesting. I like the small amount of code to do this.

A few thoughts:

- Not sure why we need full_page_compress; why not just mark them
always? That harms no one. (Did someone else ask for that? If so, keep
it)

- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.

- I'd rather mark just the nonremovable blocks. But no real difference

- We definitely don't want a normal elog in an XLogInsert critical
section, especially at DEBUG1 level

- diff -c format is easier and the standard

- pg_logarchive and pg_logrestore should be called by a name that
reflects what they actually do. Possibly pg_compresslog and
pg_decompresslog etc.. I've not reviewed those programs, but:

- Not sure why we have to compress away page headers. "Touch as little as
you can" has always been my thinking with recovery code.

- I'm very uncomfortable with touching the LSN. Maybe I misunderstand?

- Have you thought about how pg_standby would integrate with this
option? Can you please?

- I'll do some docs for this after Freeze, if you'd like. I have some
other changes to make there, so I can do this at the same time.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PATCHES] [PATCH] add CLUSTER table ORDER BY index

2007-03-27 Thread Gregory Stark
Holger Schurig [EMAIL PROTECTED] writes:

 * psql tab-completion; it now favours CLUSTER table ORDER BY index

It occurs to me (sorry that I didn't think of this earlier) that if we're
going to use ORDER BY it really ought to take a list of columns. It would
be more consistent with what ORDER BY means in queries, and one day we may
want to add support for ordering by arbitrary lists of columns even if no
index exists.

It may be reasonable to allow both to coexist and just have an arbitrary
rule that an index of the specified name, if one exists, takes precedence
over any column. I've never seen anyone name their indexes in a way that
would conflict with a column anyway.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[PATCHES] autovacuum: multiple workers

2007-03-27 Thread Alvaro Herrera
Hi,

This is the patch to put multiple workers into autovacuum.  This patch
applies after the recheck patch I just posted.

The main change is to have an array of Worker structs in shared memory;
each worker checks the current table of all other Workers, and skips a
table that's being vacuumed by any of them.  It also rechecks the table
before vacuuming, which removes the problem of redundant vacuuming.

It also introduces the business of SIGUSR1 between workers and launcher.
The launcher keeps a database list in memory and schedules workers to
vacuum databases depending on that list.  The actual database selected
may differ from what was in the schedule; in that case, the list is
reconstructed.

There are two main FIXMEs in this code:

1. have the list reconstruction and scheduling be smarter so that
databases are not ganged together in the schedule.  The only difficulty
is keeping the sort order that the databases had.

2. have a way to clean up after failed workers filling up the Worker
array and thus starving other databases from vacuuming.  I don't really
know a way to do this that works in all cases.  The only idea I have so
far is that workers that started more than autovacuum_naptime seconds
ago are considered to have failed to start.


Neither of these is really minor, but I think they are solvable.
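A minimal sketch of the skip-and-recheck behaviour described above, in Python. The real patch keeps WorkerInfo structs in a shared-memory array; here they are modelled as a plain list of dicts, and `still_needs_vacuum` stands in for the pgstat recheck:

```python
def next_table(worklist, workers, my_slot, still_needs_vacuum):
    """Pick the next table for worker `my_slot` to vacuum.

    worklist: OIDs this worker planned to vacuum (from its first pass).
    workers: one dict {"tableoid": oid-or-0} per worker slot.
    still_needs_vacuum: callable rechecking pgstat just before vacuuming.
    """
    # Tables currently claimed by any *other* worker.
    claimed = {w["tableoid"] for i, w in enumerate(workers)
               if i != my_slot and w["tableoid"]}
    for oid in worklist:
        if oid in claimed:
            continue                       # another worker is on it: skip
        if not still_needs_vacuum(oid):
            continue                       # someone vacuumed it meanwhile
        workers[my_slot]["tableoid"] = oid  # claim it in "shared memory"
        return oid
    return None
```

The claim must be published before vacuuming starts so that concurrent workers see it; the recheck closes the window in which another worker (or a manual VACUUM) already handled the table.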

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
*** 14recheck/src/backend/postmaster/autovacuum.c	2007-03-27 16:43:31.0 -0400
--- 12vacuum/src/backend/postmaster/autovacuum.c	2007-03-27 17:40:19.0 -0400
***
*** 52,57 
--- 52,58 
  #include utils/syscache.h
  
  
+ static volatile sig_atomic_t got_SIGUSR1 = false;
  static volatile sig_atomic_t got_SIGHUP = false;
  static volatile sig_atomic_t avlauncher_shutdown_request = false;
  
***
*** 59,64 
--- 60,66 
   * GUC parameters
   */
  bool		autovacuum_start_daemon = false;
+ int			autovacuum_max_workers;
  int			autovacuum_naptime;
  int			autovacuum_vac_thresh;
  double		autovacuum_vac_scale;
***
*** 69,75 
  int			autovacuum_vac_cost_delay;
  int			autovacuum_vac_cost_limit;
  
! /* Flag to tell if we are in the autovacuum daemon process */
  static bool am_autovacuum_launcher = false;
  static bool am_autovacuum_worker = false;
  
--- 71,77 
  int			autovacuum_vac_cost_delay;
  int			autovacuum_vac_cost_limit;
  
! /* Flags to tell if we are in an autovacuum process */
  static bool am_autovacuum_launcher = false;
  static bool am_autovacuum_worker = false;
  
***
*** 86,91 
--- 88,94 
  typedef struct autovac_dbase
  {
  	Oid			ad_datid;
+ 	TimestampTz	ad_next_worker;
  	char	   *ad_name;
  	TransactionId ad_frozenxid;
  	PgStat_StatDBEntry *ad_entry;
***
*** 110,123 
  	int			at_vacuum_cost_limit;
  } autovac_table;
  
  typedef struct
  {
! 	Oid		process_db;			/* OID of database to process */
! 	int		worker_pid;			/* PID of the worker process, if any */
  } AutoVacuumShmemStruct;
  
  static AutoVacuumShmemStruct *AutoVacuumShmem;
  
  #ifdef EXEC_BACKEND
  static pid_t avlauncher_forkexec(void);
  static pid_t avworker_forkexec(void);
--- 113,158 
  	int			at_vacuum_cost_limit;
  } autovac_table;
  
+ /*-
+  * This struct holds information about a single worker's whereabouts.  We keep
+  * an array of these in shared memory, sized according to
+  * autovacuum_max_workers.
+  *
+  * wi_dboid		OID of the database this worker is supposed to work on
+  * wi_tableoid	OID of the table currently being vacuumed
+  * wi_workerpid	PID of the running worker, 0 if not yet started
+  * wi_finished	True when the worker is done and about to exit
+  *-
+  */
+ typedef struct
+ {
+ 	Oid			wi_dboid;
+ 	Oid			wi_tableoid;
+ 	int			wi_workerpid;
+ 	bool		wi_finished;
+ } WorkerInfo;
+ 
  typedef struct
  {
! 	pid_t		av_launcherpid;
! 	WorkerInfo	av_workers[1];
! 	/* VARIABLE LENGTH STRUCT */
  } AutoVacuumShmemStruct;
  
+ /* Macro to iterate over all workers.  Beware multiple evaluation of args! */
+ #define foreach_worker(_i, _worker) \
+ 	_worker = (WorkerInfo *) (AutoVacuumShmem + \
+ 			  offsetof(AutoVacuumShmemStruct, av_workers)); \
+ 	for (_i = 0; _i < autovacuum_max_workers; _i++, _worker += sizeof(WorkerInfo))
+ 
  static AutoVacuumShmemStruct *AutoVacuumShmem;
  
+ /* number of currently free worker slots; only valid in the launcher */
+ static int free_workers;
+ /* the database list in the launcher, and the context that contains it */
+ static Dllist *DatabaseList = NULL;
+ static MemoryContext DatabaseListCxt = NULL;
+ 
  #ifdef EXEC_BACKEND
  static pid_t avlauncher_forkexec(void);
  static pid_t avworker_forkexec(void);
***
*** 125,133 
  NON_EXEC_STATIC void AutoVacWorkerMain(int argc, char *argv[]);
  NON_EXEC_STATIC void AutoVacLauncherMain(int argc, char *argv[]);
  
! static void 

Re: [PATCHES] Patch for circular buffer in tuplestore to optimizemerge joins (v1)

2007-03-27 Thread Simon Riggs
On Tue, 2007-03-27 at 13:32 +0100, stark wrote:

 This patch implements a circular buffer in tuplestore which drops old tuples
 as they're no longer needed. It uses this for merge joins to avoid having to
 spill the tuplestore if no single value exceeds work_mem. It also is what's
 needed for both recursive query support and OLAP window functions (hence why
 it implements the more complex circular buffer rather than just moving the
 single tuple up to the head of the buffer).
 
 This was mostly already done by Simon, I just finished the logic in 
 tuplesort.c.

Cool. 

This item was previously discussed here:
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00096.php

(this changes tuplestore.c, not tuplesort.c, though it's clear in the
patch)

 This is actually not quite polished so I guess it's still a WIP but it's
 certainly ready to be reviewed. All that remains is polishing. If there's
 anything in there people object to now I would like to know. 

I guess a performance test would be a great eyecatcher here, but it does
seem otherwise complete. Any eager merge join testers out there?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com
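The circular-buffer idea (discard tuples once the merge join can no longer need to rescan back to them) can be sketched as follows. This is a Python illustration only; the actual code lives in tuplestore.c, works on serialized tuples, and spills to disk when work_mem is exceeded:

```python
from collections import deque

class CircularTupleStore:
    """Bounded replay buffer: tuples before the trim point are discarded,
    so memory only grows with the span the caller may still rescan, e.g.
    one group of equal merge-join keys."""
    def __init__(self):
        self.buf = deque()
        self.first = 0    # logical position of buf[0]

    def append(self, tup):
        self.buf.append(tup)

    def trim(self, pos):
        # Caller promises never to rescan before logical position `pos`.
        while self.first < pos:
            self.buf.popleft()
            self.first += 1

    def fetch(self, pos):
        assert pos >= self.first, "tuple already discarded"
        return self.buf[pos - self.first]
```

For a merge join, `trim` would be called whenever the outer side advances past a group of equal keys, since only the current group can ever be rescanned.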





Re: [PATCHES] autovacuum: multiple workers

2007-03-27 Thread Simon Riggs
On Tue, 2007-03-27 at 17:41 -0400, Alvaro Herrera wrote:

 The main change is to have an array of Worker structs in shared memory;
 each worker checks the current table of all other Workers, and skips a
 table that's being vacuumed by any of them.  It also rechecks the table
 before vacuuming, which removes the problem of redundant vacuuming.

Slightly OT: Personally, I'd like it if we added an array for all
special backends, with configurable behaviour. That way it would be
easier to have multiple copies of other backends of any flavour using
the same code, as well as adding others without cutting and pasting each
time. That part of the postmaster code has oozed sideways in the past
few years and seems in need of some love. (A former sinner repents).

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PATCHES] [PATCH] add CLUSTER table ORDER BY index

2007-03-27 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Holger Schurig [EMAIL PROTECTED] writes:
 * psql tab-completion; it now favours CLUSTER table ORDER BY index

 It occurs to me (sorry that I didn't think of this earlier) that if we're
 going to use ORDER BY it really ought to take a list of columns.

Surely you jest.  The point is to be ordered the same as the index, no?

regards, tom lane



[PATCHES] Small code clean-up

2007-03-27 Thread ITAGAKI Takahiro
Here are two small code clean-ups, in initdb and win32_shmem.

pg_char_to_encoding() was redundant in initdb, because
pg_valid_server_encoding() returns the same result if the encoding is valid.

Changes in win32_shmem suppress the following warnings.
| pg_shmem.c: In function `PGSharedMemoryCreate':
| pg_shmem.c:137: warning: long unsigned int format, Size arg (arg 2)
| pg_shmem.c:159: warning: long unsigned int format, Size arg (arg 2)


*** initdb.c.orig   Mon Mar 19 01:50:43 2007
--- initdb.cWed Mar 28 10:02:42 2007
*** get_encoding_id(char *encoding_name)
*** 709,716 
  
if (encoding_name && *encoding_name)
{
!   if ((enc = pg_char_to_encoding(encoding_name)) >= 0 &&
!   pg_valid_server_encoding(encoding_name) >= 0)
return encodingid_to_string(enc);
}
fprintf(stderr, _(%s: \%s\ is not a valid server encoding name\n),
--- 709,715 
  
if (encoding_name && *encoding_name)
{
!   if ((enc = pg_valid_server_encoding(encoding_name)) >= 0)
return encodingid_to_string(enc);
}
fprintf(stderr, _(%s: \%s\ is not a valid server encoding name\n),


*** win32_shmem.c.orig  Wed Mar 28 10:17:14 2007
--- win32_shmem.c   Wed Mar 28 10:16:36 2007
*** PGSharedMemoryCreate(Size size, bool mak
*** 136,142 
if (!hmap)
ereport(FATAL,
(errmsg(could not create shared memory 
segment: %lu, GetLastError()),
!errdetail(Failed system call was 
CreateFileMapping(size=%lu, name=%s), size, szShareMem)));
  
/*
 * If the segment already existed, CreateFileMapping() will return a
--- 136,142 
if (!hmap)
ereport(FATAL,
(errmsg(could not create shared memory 
segment: %lu, GetLastError()),
!errdetail(Failed system call was 
CreateFileMapping(size=%lu, name=%s), (unsigned long) size, szShareMem)));
  
/*
 * If the segment already existed, CreateFileMapping() will return a
*** PGSharedMemoryCreate(Size size, bool mak
*** 158,164 
if (!hmap)
ereport(FATAL,
(errmsg(could not create shared memory 
segment: %lu, GetLastError()),
!errdetail(Failed system call was 
CreateFileMapping(size=%lu, name=%s), size, szShareMem)));
  
if (GetLastError() == ERROR_ALREADY_EXISTS)
ereport(FATAL,
--- 158,164 
if (!hmap)
ereport(FATAL,
(errmsg(could not create shared memory 
segment: %lu, GetLastError()),
!errdetail(Failed system call was 
CreateFileMapping(size=%lu, name=%s), (unsigned long) size, szShareMem)));
  
if (GetLastError() == ERROR_ALREADY_EXISTS)
ereport(FATAL,


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-27 Thread Koichi Suzuki

Simon;

Thanks a lot for your comments and advice. I'd like to give some feedback.

Simon Riggs wrote:

On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote:


Here's an update of a code to improve full page writes as proposed in

http://archives.postgresql.org/pgsql-hackers/2007-01/msg01491.php
and
http://archives.postgresql.org/pgsql-patches/2007-01/msg00607.php

Update includes some modification for error handling in archiver and
restoration command.

In the previous threads, I posted several evaluation and shown that we
can keep all the full page writes needed for full XLOG crash recovery,
while compressing archive log size considerably better than gzip, with
less CPU consumption.  I've found no further objection for this proposal
but still would like to hear comments/opinions/advices.


Koichi-san,

Looks interesting. I like the small amount of code to do this.

A few thoughts:

- Not sure why we need full_page_compress; why not just mark them
always? That harms no one. (Did someone else ask for that? If so, keep
it)


No, no one asked to have a separate option.  There will be no harm in
doing so.  As written below, full page writes can be
categorized as follows:

1) Needed for crash recovery: the first update of each page after a
checkpoint.  These have to be kept in WAL.

2) Needed for archive recovery: page updates between pg_start_backup and
pg_stop_backup.  These have to be kept in the archive log.

3) For a log-shipping slave such as pg_standby: no full page writes are
needed for this purpose.

My proposal deals with 2).  So, if we mark each full page write, I'd
rather mark when it is needed.  We still need only one bit, because
case 3) does not need any mark.
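The three categories amount to a small classifier. The following Python sketch is purely illustrative (parameter names are invented, and in the real tool the decision is made per WAL record while compressing the log); it only fixes the idea of which recovery mode still needs a given full-page image:

```python
def classify_full_page_write(rec_lsn, redo_lsn, backup_start, backup_end):
    """Say which kind of recovery still needs this full-page image.

    rec_lsn: LSN of the record carrying the image.
    redo_lsn: redo pointer of the latest checkpoint.
    backup_start, backup_end: LSN window between pg_start_backup() and
      pg_stop_backup(); both None when no base backup is in progress.
    """
    if rec_lsn >= redo_lsn:
        return "crash-recovery"      # (1) must stay in WAL
    if backup_start is not None and backup_start <= rec_lsn <= backup_end:
        return "archive-recovery"    # (2) must stay in the archived log
    return "removable"               # (3) log-shipping standbys don't need it
```

Under this view, pg_compresslog only has to preserve category (2) images in the archive, since category (1) images remain in the live WAL anyway.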


- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.


The only thing I had to do is to create dummy full page writes to
maintain LSNs.  Full page writes are omitted in the archive log.  We have
to keep the LSNs the same as those in the original WAL.  In this case,
recovery has to read the logical log, not the dummy full page writes.  On
the other hand, if both a logical log and real full page writes are found
in a log record, the recovery has to use the real full page writes.


- I'd rather mark just the nonremovable blocks. But no real difference


That sounds nicer. According to the full page write categories above, we
can mark each full page write as needed for crash recovery or for archive
recovery. Please give some feedback on the categories above.



- We definitely don't want a normal elog in an XLogInsert critical
section, especially at DEBUG1 level


I agree. I'll remove the elog calls from critical sections.


- diff -c format is easier and the standard


I'll change the diff option.


- pg_logarchive and pg_logrestore should be called by a name that
reflects what they actually do. Possibly pg_compresslog and
pg_decompresslog etc.. I've not reviewed those programs, but:


I wasn't careful to base the command names on what they actually do.
I'll change the command names.




- Not sure why we have to compress away page headers. Touch as little as
you can has always been my thinking with recovery code.


Eliminating page headers improves the compression rate only slightly, by
a couple of percent.  To keep the code simpler, I'll drop this
compression.




- I'm very uncomfortable with touching the LSN. Maybe I misunderstand?


The reason we need pg_logrestore (or pg_decompresslog) is to keep the
LSNs the same as those in the original WAL. For this purpose, we need the
dummy full page writes. I don't like touching LSNs either, and this is
why pg_decompresslog needs some extra work: it eliminates the need for
many other changes in the recovery code.
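A sketch of the dummy-image idea, assuming (as described above) that a
record's LSN is determined by its byte position in the segment. The
function below is invented for illustration only and is not the patch's
actual code.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: the restore tool must put *something* of the
 * original length where a removed full-page image used to be, so that
 * every following record keeps its original byte offset -- and hence
 * its original LSN.  Recovery then reads the logical part of the
 * record and ignores the dummy image.
 */
static size_t
emit_dummy_fpw(uint8_t *out, size_t image_len)
{
    memset(out, 0, image_len);      /* content is irrelevant ... */
    return image_len;               /* ... only the length matters */
}
```

Because only the length matters, recovery itself needs almost no changes:
the offsets it computes from the reconstructed segment are identical to
those of the original WAL.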



- Have you thought about how pg_standby would integrate with this
option? Can you please?


Yes, I believe so. Since pg_standby never has to cope with partially
written pages, I believe you can omit all the full page writes. Of
course, as Tom Lane suggested in
http://archives.postgresql.org/pgsql-hackers/2007-02/msg00034.php
removing full page writes loses the chance to recover from
partial/inconsistent writes after a crash of the pg_standby machine. In
that case, we have to import a base backup and archived logs (with the
full page writes taken during the backup) to recover. (We have to import
them when the file system crashes anyway.) If that's okay, I believe
pg_compresslog/pg_decompresslog can be integrated with pg_standby.

Maybe we can work together to include pg_compresslog/pg_decompresslog in 
pg_standby.
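For concreteness, the two tools could be wired into archiving roughly as
below. The invocation syntax is an assumption for illustration only, not
a documented interface; %p and %f are the usual postgresql.conf
placeholders for the segment path and file name.

```shell
# Illustrative sketch only -- the tool arguments are assumptions.
# Compress full-page writes away while archiving a finished segment:
archive_command = 'pg_compresslog %p /mnt/archive/%f'

# Reconstruct the segment (with dummy full-page images, preserving
# LSNs) before handing it to recovery or to pg_standby:
restore_command = 'pg_decompresslog /mnt/archive/%f %p'
```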




- I'll do some docs for this after Freeze, if you'd like. I have some
other changes to make there, so I can do this at the same time.


I'll be looking forward to your write-up.

Best regards;

--
Koichi Suzuki
