Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-30 Thread Zdenek Kotala

Zoltan Boszormenyi wrote:



We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
with the new GCC without --enable-integer-datetimes and it fixed
the problem we experienced. It seems that my suspicion was right:
GCC-3.4.3 on Solaris 10/Sparc is buggy.



I tried the original S10 gcc (3.4.3) on two different machines with different 
kernel updates and both work fine. Based on our offlist communication 
and Tom's comment, it looks more like a problem in linking/loading. Maybe 
some library mismatch. I'm not able to say more without a core file.


Zdenek


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-30 Thread Zeugswetter Andreas ADI SD

 Without a switch, because both full page writes and the
 corresponding logical log are included in WAL, this will
 increase WAL size slightly (maybe about five percent or so).
 If everybody is happy with this, we don't need a switch.

Sorry, I still don't understand that. What is the corresponding logical
log?
It seems to me that a full page WAL record has enough info to produce a
dummy LSN WAL entry. So instead of just cutting the full page WAL record
you could replace it with a LSN WAL entry when archiving the log.

Then all that is needed is the one flag, no extra space?

Andreas


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Autovacuum vs statement_timeout

2007-03-30 Thread Heikki Linnakangas

Tom Lane wrote:

I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.
But I do not see anything in autovacuum.c that resets the variable.
Am I just being blind?  (Quite possible, as I'm tired and under the
weather.)

The thing that brought this to mind was the idea that Mark
Shuttleworth's open problem might be triggered in part by a statement
timeout interrupting autovacuum at an inopportune point --- some logs
he sent me offlist show that he is using statement_timeout ...


statement_timeout interrupts seem to go through the PG_CATCH block and 
clean up the entry from the vacuum cycle array as they should. But a 
SIGINT leading to a "terminating connection due to administrator 
command" error does not.


After the recent change in CVS HEAD, CREATE DATABASE tries to 
kill(SIGINT) any autovacuum process in the template database. That seems 
very dangerous now; it could easily leave stale entries in the cycle id 
array. However, it doesn't explain Mark Shuttleworth's problem, 
because the 8.2 behavior is to throw a "source database is being 
accessed by other users" error instead of killing autovacuum. Maybe 
there's something else killing autovacuum processes?


I think we need to add the xid of the vacuum transaction to the vacuum 
cycle array, and clean up orphaned entries in _bt_start_vacuum. We're 
going to have a hard time plugging every leak one-by-one otherwise.
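A minimal sketch of that bookkeeping, in Python with invented structure names (the real array lives in shared memory as C structs): each entry remembers the xid of the vacuum transaction that created it, and _bt_start_vacuum purges entries whose transaction is no longer running.

```python
# Sketch of the suggested cleanup: entries tagged with the creating
# vacuum's xid, orphans purged on the next _bt_start_vacuum call.
# All names here are illustrative, not the real PostgreSQL structures.

running_xids = {100, 105}          # stand-in for the proc array
vacuum_cycle_array = [
    {"relid": 16384, "cycleid": 7, "xid": 100},  # live vacuum
    {"relid": 16390, "cycleid": 3, "xid": 42},   # orphan: xid 42 is gone
]

def bt_start_vacuum(relid, cycleid, xid):
    # Drop entries left behind by killed vacuum processes, then register.
    vacuum_cycle_array[:] = [e for e in vacuum_cycle_array
                             if e["xid"] in running_xids]
    entry = {"relid": relid, "cycleid": cycleid, "xid": xid}
    vacuum_cycle_array.append(entry)
    return entry

bt_start_vacuum(16400, 8, 105)
assert all(e["xid"] in running_xids for e in vacuum_cycle_array)
assert len(vacuum_cycle_array) == 2   # orphan purged, new entry added
```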


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-30 Thread Simon Riggs
On Fri, 2007-03-30 at 10:22 +0200, Zeugswetter Andreas ADI SD wrote:
  Without a switch, because both full page writes and 
  corresponding logical log is included in WAL, this will 
  increase WAL size slightly 
  (maybe about five percent or so).   If everybody is happy 
  with this, we 
  don't need a switch.
 
 Sorry, I still don't understand that. What is the corresponding logical
 log?
 It seems to me that a full page WAL record has enough info to produce a
 dummy LSN WAL entry. So instead of just cutting the full page WAL record
 you could replace it with a LSN WAL entry when archiving the log.
 
 Then all that is needed is the one flag, no extra space?

The full page write is required for crash recovery, but that isn't
required during archive recovery because the base backup provides the
safe base. Archive recovery needs the normal xlog record, which in some
cases has been optimised away because the backup block is present, since
the full block already contains the changes.

If you want to remove the backup blocks, you need to put back the
information that was optimised away, otherwise you won't be able to do
the archive recovery correctly. Hence a slight increase in WAL volume to
allow it to be compressed does make sense.
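The scheme can be pictured as a filter over archived WAL: full-page images are dropped, while the logical change record that was kept alongside them (the ~5% extra volume discussed above) survives for archive recovery. This is an illustrative Python toy model only; the record fields are invented and bear no relation to the real xlog format.

```python
# Toy model of stripping backup blocks from archived WAL.
# Field names are illustrative, not PostgreSQL's real xlog layout.

def compress_wal(records):
    """Drop full-page images; keep the logical change needed for
    archive recovery to replay the operation."""
    out = []
    for rec in records:
        if rec.get("backup_block") is not None:
            out.append({"lsn": rec["lsn"], "logical": rec["logical"],
                        "backup_block": None})
        else:
            out.append(rec)
    return out

wal = [
    {"lsn": 1, "logical": "insert t1 (1)", "backup_block": None},
    {"lsn": 2, "logical": "insert t1 (2)", "backup_block": b"8kB page image"},
]
archived = compress_wal(wal)
assert all(r["backup_block"] is None for r in archived)
assert [r["logical"] for r in archived] == ["insert t1 (1)", "insert t1 (2)"]
```

Note the filter only works if the logical record was not optimised away in the first place, which is exactly why the proposal accepts the slight WAL size increase.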

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Minor changes to Recovery related code

2007-03-30 Thread Simon Riggs
I'd like to make the following changes to recovery related code over the
next few days/weeks. If anybody insists I do this by freeze or not at
all, then I'll submit patches for 1,3,4,5,10 before Saturday night. I'd
rather take a bit more time and do this in one drop and there are some
code dependencies between these changes and other patches from
Koichi-san and myself.

1. Current xlog should be archived at shutdown for smart shutdown
- check that archiver is active prior to shutdown checkpoint request
- if (shutdown checkpoint && XLogArchivingActive()) RequestXLogSwitch()
- for smart shutdown, have archiver complete its work before exiting

2. pg_stop_backup() should wait until all archive files are safely
archived before returning

3. Need a %r parameter for restore_command, to allow the restore command
to be passed the name of the file containing the last restartpoint. This
will allow the restore command to clean up old archive files more
safely/cleanly in Warm Standby operation.
- change pg_standby to accept the parameter and use %r rather than the -k
parameter
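As a sketch of what the %r parameter enables, a restore wrapper could prune every archive file that sorts before the restartpoint file name. This is an assumption-laden illustration in Python (file naming simplified; real WAL segment names are 24 hex digits and pg_standby is a C program):

```python
# Sketch of archive pruning driven by a %r-style restartpoint argument.
# WAL segment names sort lexicographically, so string comparison works.

def prune_archive(archive_files, restartpoint_file):
    keep = sorted(f for f in archive_files if f >= restartpoint_file)
    removed = sorted(f for f in archive_files if f < restartpoint_file)
    return keep, removed

files = ["000000010000000000000001", "000000010000000000000002",
         "000000010000000000000003"]
keep, removed = prune_archive(files, "000000010000000000000002")
assert removed == ["000000010000000000000001"]
assert keep == ["000000010000000000000002", "000000010000000000000003"]
```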

4. Add an option to pg_standby to have it complete all outstanding
archive files after it has been triggered, minimising data loss at the
slight expense of cut-over time

5. log_restartpoint option in recovery.conf
LOG each restartpoint, so we can understand whether restartable recovery
will be effective or not

6. refactor recovery.conf so that it uses a GUC-like parser

7. refactor all xlog _desc routines into one module, so these can be
more easily used by xlogviewer utility

8. give the xlogviewer utility a freshen-up so it can be part of the main
release, possibly including further refactoring of xlog.c

9. Another round of doc updates to highlight the use of pg_standby and
Koichi-san's work. I think Doug Knight might have some additional code
examples to include as well, from previous discussions.

10. Changes to ensure WAL-avoiding operations and hot backups cannot be
executed simultaneously. One of these two options, ISTM:

a) Following a change to archive_command while the server is running: log
the xid of the WAL-avoiding operations when they start and have
pg_start_backup() wait for those xids to complete before continuing.

b) Introduce a new parameter, archive_mode = on | off, that can only be
set at server start. If archive_mode = on then XLogArchivingActive();
archiving only takes place when archive_command is not ''. This allows
archive_command to be changed while the server is running, yet without any
danger from WAL-avoiding operations.

[7 & 8 would be complete by about 5-6 weeks from now. Others much
earlier]

Comments?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-30 Thread Zeugswetter Andreas ADI SD

 Archive recovery needs the 
 normal xlog record, which in some cases has been optimised 
 away because the backup block is present, since the full 
 block already contains the changes.

Aah, I didn't know that optimization exists.
I agree that removing that optimization is good/ok.

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Proposal: include PL/Proxy into core

2007-03-30 Thread Marko Kreen

PL/Proxy is a small PL whose goal is to allow creating
proxy functions that call actual functions in a remote database.
The basic design is:

Function body describes how to deduce the final database.  It's either

CONNECT 'connstr';   -- connect to exactly this db

or when partitioning is used:

-- partitions are described under that name
CLUSTER 'namestr';

-- calculate an int4 based on function parameters
-- and use that to pick a partition
RUN ON hashtext(username);


The actual function call info (arguments, result fields) is deduced
from looking at its own signature.

So a function foo(int4, text) returns setof text will result in
the query "select * from foo($1::int4, $2::text)" being executed.
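The dispatch described above can be sketched in a few lines of Python. This is a toy model: hashtext() here is a stand-in modulo operation, not PostgreSQL's real hash function, and the connstrings are invented.

```python
# Toy model of PL/Proxy dispatch: pick a partition from a hash of a
# parameter, then build the forwarded query from the proxy function's
# signature.

def run_on(partitions, hash_value):
    # RUN ON <expr>: the int4 result selects a partition.
    return partitions[hash_value % len(partitions)]

def remote_query(func_name, arg_types):
    # Build "select * from f($1::t1, $2::t2)" from the signature.
    args = ", ".join("$%d::%s" % (i + 1, t) for i, t in enumerate(arg_types))
    return "select * from %s(%s)" % (func_name, args)

partitions = ["dbname=part0", "dbname=part1", "dbname=part2", "dbname=part3"]
assert run_on(partitions, 10) == "dbname=part2"
assert remote_query("foo", ["int4", "text"]) == \
    "select * from foo($1::int4, $2::text)"
```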


Announcement with more examples:

 http://archives.postgresql.org/pgsql-announce/2007-03/msg5.php

Documentation:

 https://developer.skype.com/SkypeGarage/DbProjects/PlProxy

Patch:

 http://plproxy.projects.postgresql.org/plproxy_core.diff.gz


Now, why put it into core?

1) Much simpler replacement for various other clustering solutions
that try to cluster regular SQL.

2) Nicer replacement for dblink.

3) PLs need much more intimate knowledge of the PostgreSQL core
than regular modules.  The API for PLs has been changing every
major revision of PostgreSQL.

4) It promotes the db-access-thru-functions design to databases, which
has proven to be a killer feature of PostgreSQL.  In a sense it is
using PostgreSQL as an appserver which provides a fixed API via
functions for external users, but hides the internal layout from them,
so it can be changed invisibly to external users.

5) The language is ready feature-wise - there's no need for it to grow
into Remote PL/pgSQL, because all logic can be put into the remote function.


Some objections that may be:

1) It is not a universal solves-everything tool for remote access/clustering.

But those solves-everything tools have a very hard time maturing,
and will not be exactly simple.  Much better to have a simple
tool that works well.

2) You can't use it for all the things you can use dblink for.

PL/Proxy is easier to use for simple result fetching.  For complicated
access, using full-blown PLs (plperl, plpython) is better.  From that
point of view dblink is replaced.

3) It is possible for a PL to live outside the core; the pain is not that big.

Sure, it's possible.  We just feel that its usefulness : lines-of-code ratio
is very high, so it's worthy of being built into the PostgreSQL core,
thus also giving PostgreSQL the opportunity to boast being
clusterable out-of-the-box.

4) What about all the existing apps that don't access the database through
functions?

Those are the target for the solves-everything tools...

5) It is too new a product.

We think this is offset by the small scope of the task it takes on,
and it already works well in that scope.


--
marko

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-30 Thread Richard Huxton

Simon Riggs wrote:

On Fri, 2007-03-30 at 10:22 +0200, Zeugswetter Andreas ADI SD wrote:
Without a switch, because both full page writes and 
corresponding logical log is included in WAL, this will 
increase WAL size slightly 
(maybe about five percent or so).   If everybody is happy 
with this, we 
don't need a switch.

Sorry, I still don't understand that. What is the corresponding logical
log ?
It seems to me, that a full page WAL record has enough info to produce a

dummy LSN WAL entry. So instead of just cutting the full page WAL record
you could replace it with a LSN WAL entry when archiving the log.


Then all that is needed is the one flag, no extra space ?


The full page write is required for crash recovery, but that isn't
required during archive recovery because the base backup provides the
safe base. 


Is that always true? Could the backup not pick up a partially-written 
page? Assuming it's being written to as the backup is in progress. (We 
are talking about when disk blocks are smaller than PG blocks here, so 
can't guarantee an atomic write for a PG block?)


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Proposal: include PL/Proxy into core

2007-03-30 Thread Cédric Villemain
On Friday 30 March 2007 12:36, Marko Kreen wrote:
 Patch:

   http://plproxy.projects.postgresql.org/plproxy_core.diff.gz
Note a possible oversight in your makefile:

+ #REGRESS_OPTS 
= --dbname=$(PL_TESTDB) --load-language=plpgsql --load-language=plproxy
+ REGRESS_OPTS 
= --dbname=regression --load-language=plpgsql --load-language=plproxy

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposal: include PL/Proxy into core

2007-03-30 Thread Marko Kreen

On 3/30/07, Cédric Villemain [EMAIL PROTECTED] wrote:

On Friday 30 March 2007 12:36, Marko Kreen wrote:
 Patch:

   http://plproxy.projects.postgresql.org/plproxy_core.diff.gz
Note a possible oversight in your makefile:

+ #REGRESS_OPTS
= --dbname=$(PL_TESTDB) --load-language=plpgsql --load-language=plproxy
+ REGRESS_OPTS
= --dbname=regression --load-language=plpgsql --load-language=plproxy


Heh.  The problem is I had 'regression' hardwired into the
regression tests, so I could not use $(PL_TESTDB).

If the proposal is accepted and we want to always run the
PL/Proxy regression tests, there should be some dynamic way
of passing the main dbname and also connstrings for partitions
into the regression tests.

For now I thought it could stay as-is.  (Actually I forgot about that
change after I had done it :)

--
marko

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-30 Thread Simon Riggs
On Fri, 2007-03-30 at 11:27 +0100, Richard Huxton wrote:

 Is that always true? Could the backup not pick up a partially-written 
 page? Assuming it's being written to as the backup is in progress. (We 
 are talking about when disk blocks are smaller than PG blocks here, so 
 can't guarantee an atomic write for a PG block?)

Any page written during a backup has a backup block that would not be
removable by Koichi's tool, so yes, you'd still be safe.

i.e. between pg_start_backup() and pg_stop_backup() we always use full
page writes, even if you are running in full_page_writes=off mode.
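The rule Simon describes condenses to a small predicate, sketched here in Python. The parameter names are illustrative, not the actual xlog.c variables; the "first write since checkpoint" condition reflects the fact that full-page images are only taken on the first modification of a page after a checkpoint.

```python
# Sketch: a full-page image is attached on the first post-checkpoint
# write whenever full_page_writes is on OR a base backup is in progress,
# so pages copied mid-backup are always repairable from WAL.

def needs_full_page_image(full_page_writes, backup_in_progress,
                          first_write_since_checkpoint):
    if not first_write_since_checkpoint:
        return False
    return full_page_writes or backup_in_progress

assert needs_full_page_image(False, True, True)    # backup forces FPI
assert not needs_full_page_image(False, False, True)
assert needs_full_page_image(True, False, True)
```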

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] UPDATE using sub selects

2007-03-30 Thread NikhilS

Hi,

 No.  Offhand I think you'd either need to relax EXPR_SUBLINK to allow
 multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
 just like EXPR_SUBLINK except for allowing multiple output columns.
 The latter would probably be less likely to break other things...

Yeah, I was looking at EXPR_SUBLINK and its single-column use case and
arrived at the same conclusion that inventing a new sublink type would be
better.  It is indeed becoming a not-so-simple and narrow fix, as you had
mentioned earlier in your first response :)




I have invented a ROWEXPR_SUBLINK type that handles multiple output columns.
The trouble is that since eventually columns of the parent have to be part
of the query's targetList, I am sending the entire subquery as one of the
entries in that list, and the targetList gets populated with entries
dependent on the subquery much later via make_subplan.

This breaks code in rewriteTargetList (which expects every list entry to be
of type TargetEntry), and expand_targetlist (which expects the targets to be
present in attrno order; the entries added because of the subquery will not
be in order compared to normal SET colname = expr targets).

Is there a simpler way of doing things? Should I try generating a resjunk
TargetEntry in transformUpdateStmt and have its expr point to the subquery
and see if that works?

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-30 Thread Richard Huxton

Simon Riggs wrote:

On Fri, 2007-03-30 at 11:27 +0100, Richard Huxton wrote:

Is that always true? Could the backup not pick up a partially-written 
page? Assuming it's being written to as the backup is in progress. (We 
are talking about when disk blocks are smaller than PG blocks here, so 
can't guarantee an atomic write for a PG block?)


Any page written during a backup has a backup block that would not be
removable by Koichi's tool, so yes, you'd still be safe.

i.e. between pg_start_backup() and pg_stop_backup() we always use full
page writes, even if you are running in full_page_writes=off mode.


Ah, that's OK then.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Proposal: include PL/Proxy into core

2007-03-30 Thread Hannu Krosing
On Fri, 2007-03-30 at 13:36, Marko Kreen wrote:
 PL/Proxy is a small PL whose goal is to allow creating
 proxy functions that call actual functions in remote database.
 Basic design is:
 
 Function body describes how to deduce the final database.  It's either
 
  CONNECT 'connstr';   -- connect to exactly this db
 
 or when partitioning is used:

  -- partitions are described under that name
  CLUSTER 'namestr';
 
  -- calculate an int4 based on function parameters
  -- and use that to pick a partition
  RUN ON hashtext(username);
 
 
 The actual function call info (arguments, result fields) is deduced
 from looking at its own signature.
 
 So a function foo(int4, text) returns setof text will result in
 the query "select * from foo($1::int4, $2::text)" being executed.

 
 Announcement with more examples:
 
   http://archives.postgresql.org/pgsql-announce/2007-03/msg5.php
 
 Documentation:
 
   https://developer.skype.com/SkypeGarage/DbProjects/PlProxy
 
 Patch:
 
   http://plproxy.projects.postgresql.org/plproxy_core.diff.gz
 
 
 Now, why put it into core?
 
 1) Much simpler replacement for various other clustering solutions
 that try to cluster regular SQL.
 
 2) Nicer replacement for dblink.
 
 3) PLs need much more intimate knowledge of the PostgreSQL core
 than regular modules.  The API for PLs has been changing every
 major revision of PostgreSQL.
 
 4) It promotes the db-access-thru-functions design to databases, that
 has proven to be killer feature of PostgreSQL.  In a sense it is
 using PostgreSQL as appserver which provides fixed API via
 functions for external users, but hides internal layout from them,
 so it can be changed invisibly to external users.
 
 5) The language is ready feature-wise - theres no need for it to grow
 into Remote PLPGSQL, because all logic can be put into remote function.
 
 
 Some objections that may be:
 
 1) It is not a universal solves-everything tool for remote access/clustering.
 
 But those solves-everything tools have a very hard time maturing,
 and will not be exactly simple.  Much better to have a simple
 tool that works well.

The current pl/proxy proposed here for inclusion is already an almost
complete redesign and rewrite based on our experience of using the
initial version in production databases, so you can expect 2.x-level
robustness, maintainability and code cleanness from it.

 5) It is too new a product.
 
 We think this is offset by the small scope of the task it takes,
 and it already works well in that scope.

Also, it is actively used, serving thousands of requests per second in a
24/7 live environment, which means that it should be reasonably well
tested.

Together with our lightweight connection pooler
(https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer), pl/proxy
can be used to implement the vision of building a DB-bus over a
database farm of diverse PostgreSQL servers, as shown in slide 3 of
https://developer.skype.com/SkypeGarage/DbProjects/SkypePostgresqlWhitepaper

The connection pooler is not strictly needed and can be left out for
smaller configurations with maybe fewer than about 10 databases and/or
concurrent db connections.

(btw, the connection pooler's name PgBouncer comes from its initial focus
on bouncing around single-transaction db calls.)

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Proposal: Adding CP51932 encoding

2007-03-30 Thread Tatsuo Ishii
Hi,

Here is a proposal to add a client-side encoding, CP51932. CP51932 is
an encoding for Japanese, defined by Microsoft. It is similar to the
existing EUC_JP encoding but some characters are mapped
differently. Unfortunately, for most Windows apps EUC-JP means
CP51932. As a result, for example, if a web page is encoded with
EUC-JP, web browsers will encode users' input as CP51932 and save data
to a PostgreSQL database as CP51932, which is not what PostgreSQL
expects, of course. Adding a CP51932 <-> EUC_JP conversion should solve
the problem (CP51932 <-> UTF-8 and SJIS conversions should be added too).
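The shape of such a conversion can be sketched as a table-driven remap. The mapping entries below are placeholders, NOT real CP51932 code points; a real conversion would carry the full Microsoft-defined table.

```python
# Structure of a table-driven encoding conversion, as a sketch.
# The table contents are hypothetical; only characters that CP51932
# and EUC_JP map differently need entries, the rest pass through.

CP51932_TO_EUC_JP = {
    0xA1BD: 0xA1C1,   # hypothetical: one differently-mapped character
}

def convert(codepoints, table):
    # Remap the differing code points; pass identical ones through.
    return [table.get(cp, cp) for cp in codepoints]

data = [0x3042, 0xA1BD]
assert convert(data, CP51932_TO_EUC_JP) == [0x3042, 0xA1C1]
```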

If there's no objection, patches for current will be posted for
review (the patches have been developed by a Japanese developer, not me).

Comments and suggestions are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Autovacuum vs statement_timeout

2007-03-30 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 statement_timeout interrupts seem to go through the PG_CATCH-block and 
 clean up the entry from the vacuum cycle array as they should. But a 
 SIGINT leading to a "terminating connection due to administrator
 command" error does not.

Hm, that's an interesting thought, but there are no "terminating
connection" messages in Shuttleworth's logs either.  So we still lack
the right idea there.  (BTW it would be SIGTERM, not SIGINT.)

 I think we need to add the xid of the vacuum transaction in the vacuum 
 cycle array, and clean up orphaned entries in _bt_start_vacuum. We're 
 going to have a hard time plugging every leak one-by-one otherwise.

You're thinking too small --- what this thought actually suggests is
that PG_CATCH can't be used to clean up shared memory at all, and I
don't think we want to accept that.  (I see several other places already
where we assume we can do that.  We could convert each one into an
on_proc_exit cleanup operation, maybe, but that seems messy and not very
scalable.)  I'm thinking we may want to redesign elog(FATAL) processing
so that we escape out to the outer level before calling proc_exit,
thereby allowing CATCH blocks to run first.
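The control-flow change Tom describes can be sketched with exceptions in Python: instead of FATAL calling exit() on the spot, an error propagates outward so intervening PG_CATCH-style cleanup runs first, and only the outer level exits. The function names here are illustrative stand-ins.

```python
# Sketch: FATAL as an exception that unwinds through cleanup handlers
# before the process-exit decision is taken at the outer level.

class FatalError(Exception):
    pass

cleanup_ran = []

def btree_vacuum():
    try:
        # e.g. SIGTERM arrives mid-vacuum
        raise FatalError("terminating connection")
    except FatalError:
        # PG_CATCH-style cleanup of shared-memory state runs first...
        cleanup_ran.append("removed vacuum cycle entry")
        raise            # ...then the error keeps propagating

def outer_level():
    try:
        btree_vacuum()
    except FatalError:
        return "proc_exit"   # exit only after cleanup handlers ran

assert outer_level() == "proc_exit"
assert cleanup_ran == ["removed vacuum cycle entry"]
```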

Note for the archives: I've argued for some time that SIGTERM'ing
individual backends is an insufficiently tested code path to be exposed
as standard functionality.  Looks like that's still true.  This is not
a bug for database shutdown because we don't really care if we leave
perfectly clean shared memory behind --- it's only a bug if you try to
SIGTERM an individual vacuum process while leaving the system up.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 How about storing the snapshot which we used during planning in
 CachedPlanSource, if at least one index was seen unusable because
 its CREATE INDEX transaction was seen as in-progress ?

I'm getting tired of repeating this, but: the planner doesn't use a
snapshot.  System catalogs run on SnapshotNow.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee

Tom Lane wrote:

 I'm getting tired of repeating this, but: the planner doesn't use a
 snapshot.  System catalogs run on SnapshotNow.


I am really sorry if I sound foolish here. I am NOT suggesting
that we use snapshot to read system catalogs. I understand
that system catalogs run on SnapshotNow and all transactions,
irrespective of when they started, would see the changes to
system catalogs as soon as the transaction updating the
system catalog commits.

What I am suggesting is to use ActiveSnapshot (actually
Florian's idea) to decide whether the transaction that created
index was still running when we started. Isn't it the case that
some snapshot will be active when we plan ? The active
snapshot may change later in the same transaction if we
are running in read-committed mode, and we may need to
invalidate the plan.

Here is what I suggest to do in get_relation_info():

+    if (index->indcreatexid != InvalidTransactionId)
+    {
+        Assert(ActiveSnapshot);
+        if (XidInMVCCSnapshot(index->indcreatexid, ActiveSnapshot))
+        {
+            index_close(indexRelation, NoLock);
+            continue;
+        }
+        /*
+         * Otherwise the index is usable
+         */
+    }


Is there a problem with this ?

I really appreciate all the help I am receiving on this. But
there is hardly anything else that I can do than post my
thoughts and get feedback, until we find a clean solution :-(

Thanks,
Pavan


--


EnterpriseDBhttp://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Florian G. Pflug

Tom Lane wrote:

Pavan Deolasee [EMAIL PROTECTED] writes:

How about storing the snapshot which we used during planning in
CachedPlanSource, if at least one index was seen unusable because
its CREATE INDEX transaction was seen as in-progress ?


I'm getting tired of repeating this, but: the planner doesn't use a
snapshot.  System catalogs run on SnapshotNow.


But it would still do that - it would just compare the createxid of
the index against some snapshot, and the query would be replanned
if the cached result of this comparison differs from the one the
current snapshot yields.

It might well be that this won't work, because the planner is invoked
in situations where there is no active snapshot - I'm not sure if your 
comment refers to that case, or not.


greetings, Florian Pflug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [pgsql-www] Help: reading the source

2007-03-30 Thread Andrew Hammond

1) Wrong list, you want -hackers.
2) Did you have a specific question based on what you have read or are
you asking for suggested reading?

On 3/29/07, 李宁 [EMAIL PROTECTED] wrote:

Dear everyone,

   I'm a college student; I am currently working on PostgreSQL storage
management as my thesis topic. I wish I could get help here about where I
should begin. I am reading the book "Database System Implementation" and
reading the source code, mainly in storage/smgr. I hope someone can give
me some suggestions or instruction.

   Thank you!

   Sincerely yours,
   ninglee



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 What I am suggesting is to use ActiveSnapshot (actually
 Florian's idea) to decide whether the transaction that created
 index was still running when we started. Isn't it the case that
 some snapshot will be active when we plan ?

I do not think you can assume that the plan won't be used later with
some older snapshot.  Consider recursive plpgsql functions for a
counterexample: the inner occurrence might be the first to arrive at
a given line of the function, hence the first to plan it, yet when we
return to the outer instance we might revert to an older snapshot.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Oracle indemnifies PostgreSQL on its patents

2007-03-30 Thread Bruce Momjian
Here is a surprising article about how Oracle has made open source
projects, like PostgreSQL, safe from claims of infringing Oracle
patents:


http://www.cbronline.com/article_news.asp?guid=A0F5F220-5940-470D-8564-CEA7E2D2B954

Oracle, like IBM, Sony, RedHat, and Novell, is now a member of the Open
Invention Network, whose members all offer patent indemnification.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Florian G. Pflug

Tom Lane wrote:

Pavan Deolasee [EMAIL PROTECTED] writes:

What I am suggesting is to use ActiveSnapshot (actually
Florian's idea) to decide whether the transaction that created
index was still running when we started. Isn't it the case that
some snapshot will be active when we plan ?


I do not think you can assume that the plan won't be used later with
some older snapshot.  Consider recursive plpgsql functions for a
counterexample: the inner occurrence might be the first to arrive at
a given line of the function, hence the first to plan it, yet when we
return to the outer instance we might revert to an older snapshot.


So maybe we'd need to use the SerializableSnapshot created at the start
of each transaction for this check, and not the ActiveSnapshot? Could
that work?

What about doing
PREPARE myplan AS select ... ;
outside of a transaction? Will this be executed inside a transaction?
Is a query always planned upon its first execution, and not when
PREPARE is issued?

greetings, Florian Pflug



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee

On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote:




What about doing
PREPARE myplan select ... ;
outside of a transaction? Will this be execute inside a transaction?



I checked that. PREPARE runs with ActiveSnapshot set.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I do not think you can assume that the plan won't be used later with
 some older snapshot.

 So maybe we'd need to use the SerializableSnapshot created at the start
 of each transaction for this check, and not the ActiveSnapshot? Could
 that work?

That might work, but it doesn't seem to address the core objection:
there's no mechanism to cause the query to be replanned once the
snapshot is new enough, because no relcache inval will happen.  So
most likely existing backends will keep using old plans that don't
consider the index.
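The objection can be illustrated with a toy plan cache in Python (invented names, not the real CachedPlanSource machinery): without an invalidation event, the stale plan that ignores the index is returned forever, no matter how new the snapshot gets.

```python
# Toy model of the stale-plan problem: a cache keyed only on
# invalidation events never notices the index has become usable.

class CachedPlan:
    def __init__(self, uses_index):
        self.uses_index = uses_index
        self.invalidated = False

def get_plan(plan, index_now_usable):
    # Replan only when a relcache inval marked the plan invalid.
    if plan.invalidated:
        return CachedPlan(uses_index=index_now_usable)
    return plan

plan = CachedPlan(uses_index=False)   # planned while the index was unusable
later = get_plan(plan, index_now_usable=True)
assert later is plan and not later.uses_index   # index still ignored
```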

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Simon Riggs
On Fri, 2007-03-30 at 11:44 +0530, Pavan Deolasee wrote:
 ISTM that the run-another-transaction-afterwards idea would have same
 problem with plan invalidation. When the second transaction commits,
 the relcache invalidation event is generated. The event may get
 consumed 
 by other backends, but the index may still not be usable to them
 because
 their xid < xcreat. If no more relcache invalidation events are
 generated
 after that, the backends would continue to use the cached plan, even 
 if index becomes usable to them later. 

Sounds like we need to allow create index invalidation events to be
processed at the end of the current transaction in the *receiving*
backend. That way we don't need to do the run-another-transaction thing
and seems a helluva lot cleaner way of doing this.

Messages of SHAREDINVALRELCACHE_ID would be augmented by a boolean
deferred flag on the SharedInvalRelcacheMsg struct. Received messages
would be stored in a third kind of InvalidationList, then processed
during AtEOXact_Inval() whether the receiving transaction commits or
not. (see src/backend/utils/cache/inval.c)

Not sure how we'd know to *send* the message marked as deferred, but
seems like we can work that out also.

That seems to allow CIC to not have to wait until the end of time
either.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee

On 3/30/07, Tom Lane [EMAIL PROTECTED] wrote:




That might work, but it doesn't seem to address the core objection:
there's no mechanism to cause the query to be replanned once the
snapshot is new enough, because no relcache inval will happen.  So
most likely existing backends will keep using old plans that don't
consider the index.




Can't we store the snapshot (or maybe the transaction id) which was
used to plan the query in CachedPlanSource, if and only if at least one
index was seen unusable? In RevalidateCachedPlan() we then check if
the snapshot has changed and replan the query in that case.

That would make the index usable in the subsequent transactions
in the same session, though we may not be able to use the index
in the same transaction, even if it's running in read-committed mode.
Would that be acceptable?
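This check could be sketched in miniature like so (all names here are
hypothetical, chosen only for illustration; in reality the flag and xid
would live in CachedPlanSource and the test in RevalidateCachedPlan()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for the relevant bits of CachedPlanSource:
 * remember the xid of the transaction that planned the query, but only
 * when an unusable index was skipped during planning. */
typedef struct CachedPlanToy
{
    bool          saw_unusable_index;   /* did planning skip an index? */
    TransactionId planning_xid;         /* xid that built the plan */
} CachedPlanToy;

/* Sketch of the proposed revalidation test: replan as soon as we are in
 * a different top-level transaction than the one that skipped the index. */
static bool
plan_needs_replan(const CachedPlanToy *plan, TransactionId current_xid)
{
    return plan->saw_unusable_index && plan->planning_xid != current_xid;
}
```

Plans that never skipped an index would carry no xid and never trigger a
replan, which matches the "if and only if at least one index was seen
unusable" condition above.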

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

Tom Lane wrote:

I do not think you can assume that the plan won't be used later with
some older snapshot.



So maybe we'd need to use the SerializableSnapshot created at the start
of each transaction for this check, and not the ActiveSnapshot? Could
that work?


That might work, but it doesn't seem to address the core objection:
there's no mechanism to cause the query to be replanned once the
snapshot is new enough, because no relcache inval will happen.  So
most likely existing backends will keep using old plans that don't
consider the index.


Pavan suggested storing the IndexSnapshot in the cached plan, and to 
compare it to the IndexSnapshot when the query is executed.

If those two snapshots differ, the query would be replanned.

My idea was to store a list of xid's together with the cached plan that
are assumed to be uncommitted according to the IndexSnapshot. The query
is replanned if upon execution the IndexSnapshot assumes that one of
these xid's is committed.

Those two ideas seem to be mostly equivalent, mine seems to be a bit
more fine-grained, but at the cost of more work upon each query execution.

greetings, Florian Pflug



Re: [HACKERS] Autovacuum vs statement_timeout

2007-03-30 Thread Tom Lane
I wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 statement_timeout interrupts seem to go through the PG_CATCH-block and 
 clean up the entry from the vacuum cycle array as they should. But a 
 SIGINT leading to a terminating connection due to administrator 
 command error does not.

 Hm, that's an interesting thought, but there are no terminating
 connection messages in Shuttleworth's logs either.  So we still lack
 the right idea there.  (BTW it would be SIGTERM not SIGINT.)

Hold it ... stop the presses ... the reason we saw no terminating
connection messages was he was grepping his logs for lines containing
ERROR.  Once we look for FATAL too, there are a pile of 'em.  I'm not
100% convinced that any are from autovacuum processes, but clearly
*something* is throwing SIGTERM around with abandon in his test
environment.  So at this point your theory above looks like a plausible
mechanism for the vacuum cycle array to slowly fill up and eventually
make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to
a repeat vacuum attempt).

 I think we need to add the xid of the vacuum transaction in the vacuum 
 cycle array, and clean up orphaned entries in _bt_start_vacuum. We're 
 going to have a hard time plugging every leak one-by-one otherwise.

 You're thinking too small --- what this thought actually suggests is
 that PG_CATCH can't be used to clean up shared memory at all, and I
 don't think we want to accept that.  (I see several other places already
 where we assume we can do that.  We could convert each one into an
 on_proc_exit cleanup operation, maybe, but that seems messy and not very
 scalable.)  I'm thinking we may want to redesign elog(FATAL) processing
 so that we escape out to the outer level before calling proc_exit,
 thereby allowing CATCH blocks to run first.

I was hoping we could do that just as an 8.3 change, but it's now
starting to look like we might have to back-patch it, depending on how
much we care about surviving random SIGTERM attempts.  I'd like to wait
for some report from Mark about what's causing all the SIGTERMs before
we evaluate that.

regards, tom lane



Re: [HACKERS] Autovacuum vs statement_timeout

2007-03-30 Thread Alvaro Herrera
Tom Lane wrote:
 I wrote:
  Heikki Linnakangas [EMAIL PROTECTED] writes:
  statement_timeout interrupts seem to go through the PG_CATCH-block and 
  clean up the entry from the vacuum cycle array as they should. But a 
  SIGINT leading to a terminating connection due to administrator 
  command error does not.
 
  Hm, that's an interesting thought, but there are no terminating
  connection messages in Shuttleworth's logs either.  So we still lack
  the right idea there.  (BTW it would be SIGTERM not SIGINT.)
 
 Hold it ... stop the presses ... the reason we saw no terminating
 connection messages was he was grepping his logs for lines containing
 ERROR.  Once we look for FATAL too, there are a pile of 'em.  I'm not
 100% convinced that any are from autovacuum processes, but clearly
 *something* is throwing SIGTERM around with abandon in his test
 environment.  So at this point your theory above looks like a plausible
 mechanism for the vacuum cycle array to slowly fill up and eventually
 make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to
 a repeat vacuum attempt).

Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE
DATABASE; but what it does is send SIGINT, not SIGTERM.  Also, it's not
in 8.2.  SIGINT does terminate the autovac process however.

I haven't read the whole problem report completely, so I'm not sure this
has something to do or not.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee

On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote:




My idea was to store a list of xid's together with the cached plan that
are assumed to be uncommitted according to the IndexSnapshot. The query
is replanned if upon execution the IndexSnapshot assumes that one of
these xid's is committed.



Actually, if we are using Serializable Snapshot then there is no chance
to replan the query before the transaction completes and the next
transaction to start in the session must see the index and hence
we must replan. So it would be enough just to associate a transaction
id with the cached plan. If this xid is set and our transaction id is
different than that, we replan.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Autovacuum vs statement_timeout

2007-03-30 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE
 DATABASE; but what it does is send SIGINT, not SIGTERM.  Also, it's not
 in 8.2.  SIGINT does terminate the autovac process however.
 I haven't read the whole problem report completely, so I'm not sure this
 has something to do or not.

AFAICT, SIGINT should be okay, because it will lead to an ERROR not a
FATAL elog; so control should fall out through the CATCH block before
the autovacuum process quits.  The problem is with FATAL elogs.

Mark reports that the only FATAL lines in his logs are instances of
FATAL:  terminating connection due to administrator command
FATAL:  database launchpad_ftest does not exist
and the latter presumably isn't coming out from within the btree vacuum
code, so I don't see any other explanation for a FATAL exit than SIGTERM.

regards, tom lane



Re: [HACKERS] Proposal: include PL/Proxy into core

2007-03-30 Thread Tom Lane
Marko Kreen [EMAIL PROTECTED] writes:
 Now, why put it into core?

I don't think you have made a sufficient case for that.  I think it
should stay as an outside project for awhile and see what sort of
userbase it attracts.  If it becomes sufficiently popular I'd be
willing to consider adding it to core, but that remains to be seen.

We can barely keep up maintaining what's in core now --- we need to
be very strict about adding stuff that doesn't really have to be in
core, and this evidently doesn't, since you've got it working ...

regards, tom lane



Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Florian G. Pflug

Pavan Deolasee wrote:

On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote:


My idea was to store a list of xid's together with the cached plan that
are assumed to be uncommitted according to the IndexSnapshot. The query
is replanned if upon execution the IndexSnapshot assumes that one of
these xid's is committed.



Actually, if we are using Serializable Snapshot then there is no chance
to replan the query before the transaction completes and the next
transaction to start in the session must see the index and hence
we must replan. So it would be enough just to associate a transaction
id with the cached plan. If this xid is set and our transaction id is
different than that, we replan.


I believe this is true for the CREATE INDEX scenario. However, comparing
either the snapshot or the result of xid checks seems like it might
be useful for other things beside CREATE INDEX. I'm specifically 
thinking about TRUNCATE here - the create index + HOT problems sound
quite similar to the problems a non-exclusive-locking TRUNCATE would face.

greetings, Florian Pflug



Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 Actually, if we are using Serializable Snapshot then there is no chance
 to replan the query before the transaction completes and the next
 transaction to start in the session must see the index and hence
 we must replan. So it would be enough just to associate a transaction
 id with the cached plan. If this xid is set and our transaction id is
 different than that, we replan.

Hm.  So anytime we reject a potentially useful index as being not valid
yet, we mark the plan as only good for this top-level transaction?
That seems possibly workable --- in particular it doesn't get more
complicated as soon as you consider multiple such indexes.

regards, tom lane



Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Simon Riggs
On Fri, 2007-03-30 at 13:54 -0400, Tom Lane wrote:
 Pavan Deolasee [EMAIL PROTECTED] writes:
  Actually, if we are using Serializable Snapshot then there is no chance
  to replan the query before the transaction completes and the next
  transaction to start in the session must see the index and hence
  we must replan. So it would be enough just to associate a transaction
  id with the cached plan. If this xid is set and our transaction id is
  different than that, we replan.
 
 Hm.  So anytime we reject a potentially useful index as being not valid
 yet, we mark the plan as only good for this top-level transaction?
 That seems possibly workable --- in particular it doesn't get more
 complicated as soon as you consider multiple such indexes.

I like that because its specific in dealing with the exact issue we have
- it doesn't rely on many other things happening correctly.

...and it also seems to provide a new route to avoiding the CIC wait.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Proposal: include PL/Proxy into core

2007-03-30 Thread Josh Berkus
Hannu, Marko,

I, personally, think that it's worth talking about integrating these.  
However, the old versions were definitely NOT ready for integration, and the 
new versions went on the internet like a week ago.  Heck, I haven't even 
downloaded them yet.

Can we address these on the 8.4 timeline?  That will give the rest of us in 
the community time to download, try and debug the new SkyTools.  I know I'm 
planning on testing them and will know a lot more about your code/performance 
in a few months.  Is there a reason why getting PL/proxy into 8.3 is 
critical?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Arrays of Complex Types

2007-03-30 Thread David Fetter
On Fri, Mar 02, 2007 at 03:40:16PM -0800, David Fetter wrote:
 Folks,
 
 I'd like to take the TODO item that reads, Add support for arrays of
 complex types, but before I start patching, I'd like to see whether
 what I'm about to do makes any sense:

After several rounds of patches, it appears that it might be easier to
create a new typtype entry, which I'll tentatively call 'a' because it
seems a little fragile and a lot inelegant and hard to maintain to
have typtype='c' and typrelid=InvalidOid mean "this is an array of
complex types".  I'd like to see about making this new typtype
available for arrays of DOMAINs eventually, but that's not a
requirement right this instant.

What parts of the code would need a once-over?  

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
  Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate



Re: [HACKERS] Minor changes to Recovery related code

2007-03-30 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 I'd like to make the following changes to recovery related code over the
 next few days/weeks. If anybody insists I do this by freeze or not at
 all, then I'll submit patches for 1,3,4,5,10 before Saturday night. I'd
 rather take a bit more time and do this in one drop and there are some
 code dependencies between these changes and other patches from
 Koichi-san and myself.

Well, I've got a proposal for a pg_proc change that I haven't even started
coding yet, so personally I won't hold you to having the patch submitted
as long as the design is agreed to before feature freeze.  However:

 2. pg_stop_backup() should wait until all archive files are safely
 archived before returning

Not sure I agree with that one.  If it fails, you can't tell whether the
action is done and it failed while waiting for the archiver, or if you
need to redo it.

 6. refactor recovery.conf so that it uses a GUC-like parser

I would suggest that after feature freeze is not the time for code
beautification efforts like this.  (I rather doubt that it's worth doing
at all actually, but definitely not now.)

 7. refactor all xlog _desc routines into one module, so these can be
 more easily used by xlogviewer utility

Even more so.

 8. get xlogviewer utility a freshen-up so it can be part of main
 release, possibly including further refactoring of xlog.c

This is not happening for 8.3, either.

 10. Changes to ensure WAL-avoiding operations and hot backups cannot be
 executed simultaneously. One of these two options, ISTM:
 b) Introduce a new parameter, archive_mode = on | off that can only be
 set at server start. If archive_mode = on then XLogArchivingActive();
 archiving only takes place when archive_command is not ''. This allows
 archive_command to be changed while server running, yet without any
 danger from WAL-avoiding operations.

I think I'd go with (b) since a lot of people felt it should've been
like that from the beginning, and found the magic empty string
behavior confusing.

 [7  8 would be complete by about 5-6 weeks from now. Others much
 earlier]

We are hoping to go beta in less time than that.  While I'm willing to
cut a little slack, anything you can't submit before about mid-April
is not going to make it into 8.3.

regards, tom lane



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-30 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Any page written during a backup has a backup block that would not be
 removable by Koichi's tool, so yes, you'd still be safe.

How does it know not to do that?

regards, tom lane



Re: [HACKERS] UPDATE using sub selects

2007-03-30 Thread Tom Lane
NikhilS [EMAIL PROTECTED] writes:
 I have invented a ROWEXPR_SUBLINK type that handles multiple output columns.
 The trouble is that since eventually columns of the parents have to be part
 of the query's targetList, I am sending the entire subquery as one of the
 entries in that list and the targetList gets populated with entries
 dependent on the subquery much later via make_subplan.

 This breaks code in rewriteTargetList (which expects every list entry to be
 of type TargetEntry), and expand_targetlist (which expects the targets to be
 present in attrno order, the entries added because of the subquery will not
 be in order as compared to normal SET colname = expr targets).

Hmm.  That sounds like it would be a horrid mess.  You need to decouple
the execution of the subplan from the use of its outputs, apparently.
There is some precedent for this in the way that InitPlans are handled:
the result of the subplan is stored into a ParamList array entry that's
later referenced by a Param node in the parent's expression tree.  That
would generalize easily enough to setting more than one Param, but I'm
not clear on where you'd want to stick the subplan itself in the plan
tree, nor on what controls how often it needs to get evaluated.

An alternative approach is to put the subplan into the rangetable and
use Vars to reference its outputs.  Again it's not quite clear what
drives re-execution of the subplan.  It strikes me though that an
approach like this might also serve for SQL2003's LATERAL construct,
which'd be a nice thing to support.

regards, tom lane



Re: [HACKERS] Oracle indemnifies PostgreSQL on its patents

2007-03-30 Thread Josh Berkus
Bruce,

 Oracle, like IBM, Sony, RedHat, and Novell, is now a member of the Open
 Invention Network, whose members all offer patent indemnification.

Hey!  We could go back to using ARC!

;-)

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Arrays of Complex Types

2007-03-30 Thread Tom Lane
David Fetter [EMAIL PROTECTED] writes:
 After several rounds of patches, it appears that it might be easier to
 create a new typtype entry, which I'll tentatively call 'a' because it
 seems a little fragile and a lot inelegant and hard to maintain to
 have typtype='c' and typrelid=InvalidOid mean "this is an array of
 complex types".

Uh, wouldn't it be typtype = 'c' and typelem != 0 ?

 I'd like to see about making this new typtype
 available for arrays of DOMAINs eventually, but that's not a
 requirement right this instant.

Hmm.  It might not be a bad idea to switch to 'a' for arrays over
regular scalar types too.  Right now we have some klugy rules involving
looking at typlen to decide whether an array type is a normal array.
(There are also some special subscriptable types like name and point,
which should continue to not use 'a' because they are not general
purpose arrays.  So the 'a' marker wouldn't be entirely redundant with
typelem being nonzero, rather checking for 'a' would replace the places
where we test both typelem and typlen.)  OTOH this is a lot of hacking
for something that I'm not convinced is really needed.

Anyway, the point is that I dislike the idea of doing arrays for complex
types differently from those for scalars --- either both should use a
new typtype, or neither.  If you try to do it differently then you'll
have more complexity, not less, since there are a lot of places that
shouldn't need to care.  get_element_type() is an example.

 What parts of the code would need a once-over?  

A lot :-( ... probably every place that touches typtype or typelem would
need at least a look.  It'd be a good idea to take the opportunity to
start using macros for the values of typtype, as we do for relkind but
for some reason never adopted for typtype.

regards, tom lane



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-03-30 Thread Simon Riggs
On Fri, 2007-03-30 at 16:35 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Any page written during a backup has a backup block that would not be
  removable by Koichi's tool, so yes, you'd still be safe.
 
 How does it know not to do that?

Not sure what you mean, but I'll take a stab...

I originally questioned Koichi-san's request for a full_page_compress
parameter, which is how it would tell whether/not. After explanation, I
accepted the need for a parameter, but I think we're looking for a new
name for it.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Minor changes to Recovery related code

2007-03-30 Thread Simon Riggs
On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  I'd like to make the following changes to recovery related code over the
  next few days/weeks. If anybody insists I do this by freeze or not at
  all, then I'll submit patches for 1,3,4,5,10 before Saturday night. I'd
  rather take a bit more time and do this in one drop and there are some
  code dependencies between these changes and other patches from
  Koichi-san and myself.
 
 Well, I've got a proposal for a pg_proc change that I haven't even started
 coding yet, so personally I won't hold you to having the patch submitted
 as long as the design is agreed to before feature freeze.  However:

Cool

  2. pg_stop_backup() should wait until all archive files are safely
  archived before returning
 
 Not sure I agree with that one.  If it fails, you can't tell whether the
 action is done and it failed while waiting for the archiver, or if you
 need to redo it.

There's a slight delay between pg_stop_backup() completing and the
archiver doing its stuff. Currently if somebody does a -m fast straight
after the pg_stop_backup() the backup may be unusable.

We need a way to plug that small hole.

I suggest that pg_stop_backup() polls once per second until
pg_xlog/archive_status/LOG.ready disappears, in which case it ends
successfully. If it does this for more than 60 seconds it ends
successfully but produces a WARNING.
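That polling loop might look roughly like the sketch below. All names
and parameters here are made up for illustration; the real code would
run inside pg_stop_backup() against the .ready status file of the final
WAL segment, with a 1-second interval and a 60-second warning threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

/* Poll until ready_path disappears (segment archived), checking every
 * interval_secs.  After warn_after_secs, give up with a warning but let
 * the caller still report success, as proposed above.
 * Returns true if the file disappeared, false if we stopped waiting. */
static bool
wait_for_archive(const char *ready_path, int interval_secs, int warn_after_secs)
{
    struct stat st;
    int         waited = 0;

    while (stat(ready_path, &st) == 0)
    {
        if (waited >= warn_after_secs)
        {
            fprintf(stderr, "WARNING: %s still not archived after %d seconds\n",
                    ready_path, waited);
            return false;       /* caller still ends successfully */
        }
        sleep((unsigned) interval_secs);
        waited += interval_secs;
    }
    return true;                /* status file gone: segment archived */
}
```

A caller following the proposal would pass interval 1 and threshold 60,
and treat a false return as "succeeded with WARNING" rather than failure.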

  6. refactor recovery.conf so that it uses a GUC-like parser
 
 I would suggest that after feature freeze is not the time for code
 beautification efforts like this.  (I rather doubt that it's worth doing
 at all actually, but definitely not now.)

OK

  7. refactor all xlog _desc routines into one module, so these can be
  more easily used by xlogviewer utility
 
 Even more so.

OK

  8. get xlogviewer utility a freshen-up so it can be part of main
  release, possibly including further refactoring of xlog.c
 
 This is not happening for 8.3, either.

OK

  10. Changes to ensure WAL-avoiding operations and hot backups cannot be
  executed simultaneously. One of these two options, ISTM:
  b) Introduce a new parameter, archive_mode = on | off that can only be
  set at server start. If archive_mode = on then XLogArchivingActive();
  archiving only takes place when archive_command is not ''. This allows
  archive_command to be changed while server running, yet without any
  danger from WAL-avoiding operations.
 
 I think I'd go with (b) since a lot of people felt it should've been
 like that from the beginning, and found the magic empty string
 behavior confusing.

OK

 We are hoping to go beta in less time than that.  While I'm willing to
 cut a little slack, anything you can't submit before about mid-April
 is not going to make it into 8.3.

No probs, you struck a line through the more time consuming items. :-)

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Minor changes to Recovery related code

2007-03-30 Thread Florian G. Pflug

Simon Riggs wrote:

On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

2. pg_stop_backup() should wait until all archive files are safely
archived before returning

Not sure I agree with that one.  If it fails, you can't tell whether the
action is done and it failed while waiting for the archiver, or if you
need to redo it.


There's a slight delay between pg_stop_backup() completing and the
archiver doing its stuff. Currently if somebody does a -m fast straight
after the pg_stop_backup() the backup may be unusable.

We need a way to plug that small hole.

I suggest that pg_stop_backup() polls once per second until
pg_xlog/archive_status/LOG.ready disappears, in which case it ends
successfully. If it does this for more than 60 seconds it ends
successfully but produces a WARNING.


I fear that ending successfully despite having not archived all WALs
will make this feature less worthwhile. If a dba knows what he is
doing, he can code a perfectly safe backup script using 8.2 too.
He'll just have to check the current wal position after pg_stop_backup(),
(There is a function for that, right?), and wait until the corresponding
wal was archived.

In reality, however, I fear that most people will just create a script
that does 'echo select pg_stop_backup | psql' or something similar.
If they're a bit more careful, they will enable ON_ERROR_STOP, and check
the return value of psql. I believe that those are the people who would
really benefit from a pg_stop_backup() that waits for archiving to complete.
But they probably won't check for WARNINGs.

Maybe doing it the other way round would be an option?
pg_stop_backup() could wait for the archiver to complete forever, but
spit out a warning every 60 seconds or so ("WARNING: Still waiting
for wal archiving of wal ??? to complete"). If someone really wants
a 60-second timeout, he can just use statement_timeout.

Anyway, just my 0.02 eurocents, maybe I'm totally mistaken about the
postgresql dba's out there...

greetings, Florian Pflug



[HACKERS] Last minute mini-proposal (I know, I know) for PQexecf()

2007-03-30 Thread korryd
While cleaning up some pg_migrator code
(http://pgfoundry.org/projects/pg-migrator/) it occurred to me that a
typical libpq client application spends a lot of code constructing SQL
commands.  The code typically looks like this:

a) allocate enough room to hold the command 
b) sprintf( command, text, argument, argument, argument, ... )
c) PQexec( conn, command )
d) free( command )

In most cases, the amount of memory that you allocate in step a) is just
an educated guess.  It's typically more room than you need,
occasionally less room than you need (and you get a buffer overflow
exploit), and it's rarely maintained properly when you modify the
command text (or the argument list).
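For comparison, steps a) and b) can already be done without guessing by
asking vsnprintf() for the required length first; this is exactly the
per-command boilerplate that PQexecf() would fold away. In this hedged
sketch, format_command is a hypothetical helper (not libpq API), and the
PQexec() of step c) and free() of step d) are left to the caller:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build a command string sized exactly to fit: a C99 vsnprintf with a
 * NULL buffer returns the number of characters the output would need. */
static char *
format_command(const char *fmt, ...)
{
    va_list args;
    int     needed;
    char   *command;

    va_start(args, fmt);
    needed = vsnprintf(NULL, 0, fmt, args);    /* step a: exact size */
    va_end(args);
    if (needed < 0)
        return NULL;

    command = malloc((size_t) needed + 1);
    if (command == NULL)
        return NULL;

    va_start(args, fmt);
    vsnprintf(command, (size_t) needed + 1, fmt, args);   /* step b */
    va_end(args);

    return command;     /* caller runs it (step c) and free()s it (step d) */
}
```

Note this only removes the sizing guesswork; it does nothing about SQL
quoting of the interpolated arguments, which is a separate problem.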

I'd like to see a new variant on PQexec():

PGresult * PQexecf(PGconn *conn, const char *fmt, ...);

PQexecf() simply performs steps a, b, c, and d for you. And you call it
like this:

PQexecf( conn, text, argument, argument, argument, ... )

PQexecf() is just a wrapper around the already existing
createPQExpBuffer(), enlargePQExpBuffer(), printfPQExpBuffer(), and
PQexec() so it introduces no new code (other than assembling the
wrapper) and doesn't change any existing code.  PQexecf() is similar to
PQexecParams() but it much simpler to use (and should be very familiar
to C programmers).  PQexecf() is not intended as a replacement for
PQprepare() and PQexecPrepared() - you should use prepare/exec when you
want to execute a command many times.

I could eliminate a lot of client-side code if PQexecf() were available
- and the code that I could remove is the code that's most likely to be
buggy and least likely to be properly maintained.

I've thrown together an UNTESTED prototype (below), just to get the idea
across - you'll recognize that most of this code is identical to
printfPQExpBuffer().  In the prototype, I'm keeping a static PQExpBuffer
that grows to the hold the largest string ever required by the client
application - that part seems to be a point for discussion, but since
the detail is hidden in the implementation, we could adjust the code
later if necessary (without changing the interface).

Of course, I could include an implementation of PQexecf() in each of my
client applications if it were not available in libpq, but that would be
silly and I'd have to invent my own createPQExpBuffer() /
enlargePQExpBuffer() code since those functions are not an official part
of libpq (and won't even be available to a Win32 client application).

Is it just too late to even think about this for 8.3? (Bruce laughed at
me when I suggested the idea :-)  

   -- Korry
  [EMAIL PROTECTED]
  http://www.enterprisedb.com



PGresult *
PQexecf(PGconn *conn, const char *fmt, ...)
{
    static PQExpBuffer str;
    va_list     args;

    if (str == NULL)
        str = createPQExpBuffer();

    /*
     * Start each call with an empty buffer (the original prototype omitted
     * this reset, so successive calls would have appended to one another).
     */
    resetPQExpBuffer(str);

    for (;;)
    {
        /*
         * Try to format the given string into the available space; but if
         * there's hardly any space, don't bother trying, just fall through
         * to enlarge the buffer first.
         */
        if (str->maxlen > str->len + 16)
        {
            size_t      avail = str->maxlen - str->len - 1;
            int         nprinted;

            va_start(args, fmt);
            nprinted = vsnprintf(str->data + str->len, avail, fmt, args);
            va_end(args);

            /*
             * Note: some versions of vsnprintf return the number of chars
             * actually stored, but at least one returns -1 on failure. Be
             * conservative about believing whether the print worked.
             */
            if (nprinted >= 0 && nprinted < (int) avail - 1)
            {
                /* Success.  Note nprinted does not include trailing null. */
                str->len += nprinted;
                break;
            }
        }
        /* Double the buffer size and try again. */
        if (!enlargePQExpBuffer(str, str->maxlen))
            return PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR); /* out of memory */
    }

    return PQexec(conn, str->data);
}



Re: [HACKERS] Last minute mini-proposal (I know, I know) for PQexecf()

2007-03-30 Thread Tom Lane
[EMAIL PROTECTED] writes:
 I'd like to see a new variant on PQexec():
 PGresult * PQexecf(PGconn *conn, const char *fmt, ...);

Way too late for 8.3 --- if we were going to do something like this,
we should think first and program later.  In particular, blindly
adopting the sprintf format string definition doesn't seem very helpful.
The sorts of escapes I'd want to have are properly quoted SQL
identifier, properly quoted SQL literal, etc.  A large fraction of
what sprintf knows about is more or less irrelevant to the task of
creating SQL commands.

Also, how does this interact with parameterized or prepared commands?
If we wanted PQexecf we'd soon want PQexecParamsf, etc.  I don't think
we really want so much duplicate logic there --- it'd be better to
decouple the string-building functionality from the query-sending
functionality.  Probably better to consider something like
PQformatQuery() that passes back a malloc'd string.

regards, tom lane



Re: [HACKERS] [PATCHES] Full page writes improvement, code update

2007-03-30 Thread Koichi Suzuki

Simon;
Tom;

Koichi is writing.

Your question is how to determine which WAL records are generated between
pg_start_backup and pg_stop_backup; here's the answer.

XLogInsert() already has logic to determine whether the WAL record being
inserted falls between pg_start_backup and pg_stop_backup.   Currently it
is used to remove full page writes when full_page_writes=off.   We can use
this to mark WAL records.   We have one unused bit in the WAL record
header, the last bit of xl_info, where the upper four bits are used to
indicate the resource manager and three of the rest are used to
indicate the number of full page writes included in the record.

So in my proposal, this unused bit is used to mark that full page
writes must not be removed during offline optimization by pg_compresslog.

Sorry, I didn't have mailing list capability from home and have just
completed my subscription from home.   I had to create a new thread to
continue my post.  Sorry for the confusion.

Please refer to the original thread about this discussion.

Best Regards;

--
Koichi Suzuki



Re: [HACKERS] Last minute mini-proposal (I know, I know) for PQexecf()

2007-03-30 Thread Gregory Stark

Hm, my first thought was that you should just be using bind parameters instead
of interpolating variables directly into the query.

But the more I think about it the more I like your idea. It's true that using
parameters takes away most of the use cases for the kind of interface you
suggest. But there are still cases that remain. And in those cases it would be
possible to do it more cleanly and conveniently than with a stock sprintf.

In particular cases like when I want to insert one of a small number of
constants and want to be sure the planner plans and caches separate plans for
each value; or when I want to insert entirely different subexpressions
depending on some parameter; or most commonly of all I want to vary the order
of the ORDER BY expressions and still have every chance of using indexes.

Aside from the convenience I think it would be interesting from an
injection-safety point of view. We can offer a %-escape for strings with SQL
quoting and a separate %-escape for unquoted SQL text which is documented as
being the wrong thing to use for user-provided data. And we can ensure that
all escapes except the raw SQL escape are injection-safe.

But anything you provide you should provide both in PQexec form and PQprepare
form as well (and I suppose in PQexecParams form). This might seem pointless,
if you're interpolating some values why not interpolate them all? The answer
is that you quite often want to interpolate a few specific values, often
values that don't have many possible values and might affect the plan, but
definitely don't want to interpolate user-provided values that have many
possible values.

A typical example might be something like:

SELECT * 
  FROM invoices 
 WHERE customer_id = ? 
 ORDER BY { order_by_clauses[column_selected] }

You certainly don't want to plan a new query for every possible user, but
you don't mind caching 5 different plans for the five display columns
depending on which the user has clicked on.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Modifying TOAST thresholds

2007-03-30 Thread Christopher Browne
[EMAIL PROTECTED] (Tom Lane) wrote:
 I'm willing to do the code changes to separate TOAST_THRESHOLD from
 the toast chunk size, but I do not have the time or facilities to do
 any performance testing for different parameter choices.  Anyone
 want to work on that?

What have you got in mind there?  I might be able to do something,
assuming that it doesn't cause heartburn that I'll be offline April
6-14th.

Are we simply talking about having the option of #defining a different
threshold at which items get thrown out to TOAST?  Or one of the more
sophisticated options?  Or is the idea more simply that we might
consider having the default set somewhat lower than it is at present?

And I guess a good question is, what's going to get regarded as a
meaningful test?  I've got a couple of local test cases I could draw
from; unfortunately, the interaction with TOAST will more than likely
be pretty trivial, showing off "Yeah, cutting the threshold was a good
idea."  And that may not be fair to everyone's case.

[The good news is, of course, that if the end deliverable is a
single #define parameter that's used as the denominator to the
fraction, delivery during beta time is, indeed, quite trivial...]
-- 
output = (cbbrowne @ gmail.com)
http://linuxdatabases.info/info/slony.html
Outside of a dog,  a book is man's best friend. Inside  of a dog, it's
too dark to read. -Groucho Marx



Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee

On 3/31/07, Simon Riggs [EMAIL PROTECTED] wrote:


On Fri, 2007-03-30 at 13:54 -0400, Tom Lane wrote:


 Hm.  So anytime we reject a potentially useful index as being not valid
 yet, we mark the plan as only good for this top-level transaction?
 That seems possibly workable --- in particular it doesn't get more
 complicated as soon as you consider multiple such indexes.

I like that because its specific in dealing with the exact issue we have
- it doesn't rely on many other things happening correctly.



Ok. Cool. I will finish this work then.


...and it also seems to provide a new route to avoiding the CIC wait.


Yeah, though I would like to take that up later.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee

On 3/30/07, Tom Lane [EMAIL PROTECTED] wrote:



I do not think you can assume that the plan won't be used later with
some older snapshot.  Consider recursive plpgsql functions for a
counterexample: the inner occurrence might be the first to arrive at
a given line of the function, hence the first to plan it, yet when we
return to the outer instance we might revert to an older snapshot.



Thanks for making me aware of such scenarios. What it also means
is that a transaction may use an older snapshot after it created the
index. So to be on the safer side, we should not use an index created
in the same transaction if we saw HOT-updated RECENTLY_DEAD
or DELETE_IN_PROGRESS tuples while building the index (with HOT
we don't want to index these tuples). Not such a bad restriction to
live with. Maybe we can do something with command ids to solve
this once we get the basic stuff ready.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] UPDATE using sub selects

2007-03-30 Thread NikhilS

Hi,

On 3/31/07, Tom Lane [EMAIL PROTECTED] wrote:


NikhilS [EMAIL PROTECTED] writes:
 I have invented a ROWEXPR_SUBLINK type that handles multiple output
 columns.  The trouble is that since eventually columns of the parents
 have to be part of the query's targetList, I am sending the entire
 subquery as one of the entries in that list and the targetList gets
 populated with entries dependent on the subquery much later via
 make_subplan.

 This breaks code in rewriteTargetList (which expects every list entry
 to be of type TargetEntry), and expand_targetlist (which expects the
 targets to be present in attrno order; the entries added because of
 the subquery will not be in order compared to normal "SET colname =
 expr" targets).

Hmm.  That sounds like it would be a horrid mess.  You need to decouple
the execution of the subplan from the use of its outputs, apparently.
There is some precedent for this in the way that InitPlans are handled:
the result of the subplan is stored into a ParamList array entry that's
later referenced by a Param node in the parent's expression tree.  That
would generalize easily enough to setting more than one Param, but I'm
not clear on where you'd want to stick the subplan itself in the plan
tree, nor on what controls how often it needs to get evaluated.




Yes, I have tried this already. As you suspect, it seems that the subplan
does not get evaluated if it's not part of the targetList at all.

An alternative approach is to put the subplan into the rangetable and
use Vars to reference its outputs.  Again it's not quite clear what
drives re-execution of the subplan.  It strikes me though that an
approach like this might also serve for SQL2003's LATERAL construct,
which'd be a nice thing to support.



Ok, I will try this out.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com