Re: [HACKERS] Verbosity of Function Return Type Checks

2008-09-09 Thread Volkan YAZICI
On Mon, 8 Sep 2008, Alvaro Herrera [EMAIL PROTECTED] writes:
 Modified as you suggested. BTW, will there be a similar i18n scenario
 for dropped column you mentioned below?

 Yes, you need _() around those too.

For this purpose, I introduced a dropped_column_type variable in the
validate_tupdesc_compat() function:

  const char dropped_column_type[] = _("n/a (dropped column)");

And used it in the following fashion:

  OidIsValid(returned->attrs[i]->atttypid)
  ? format_type_be(returned->attrs[i]->atttypid)
  : dropped_column_type

 Done with format_type_be() usage.

 BTW you forgot to remove the quotes around the type names (I know I told
 you to add them but Tom gave the reasons why it's not needed) :-)

Done.

 Those are minor problems that are easily fixed.  However there's a
 larger issue that Tom mentioned earlier and I concur, which is that the
 caller is forming the primary error message and passing it down.  It
 gets a bit silly if you consider the ways the messages end up worded:

errmsg("returned record type does not match expected record type"));
errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
--- this is the case where it's OK

errmsg("wrong record type supplied in RETURN NEXT"));
errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
-- this is strange

errmsg("returned tuple structure does not match table of trigger event"));
errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
-- this is not OK

What we're trying to do is to avoid code duplication while checking the
returned tuple types against the expected tuple types. For this purpose we
implemented a generic function (validate_tupdesc_compat) to handle all
possible cases. But, IMHO, it's important to remember that there is no
perfect generic function that satisfies every possible case equally well.
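
For readers who don't want to dig through the diff below, here is a minimal
sketch of the kind of helper under discussion (a simplification, not the patch
itself; the attached diff is the authoritative version):

  static void
  validate_tupdesc_compat(TupleDesc expected, TupleDesc returned, const char *msg)
  {
      int     i;

      /* 'msg' is the caller-supplied, translatable primary message. */
      if (returned->natts != expected->natts)
          ereport(ERROR,
                  (errcode(ERRCODE_DATATYPE_MISMATCH),
                   errmsg("%s", _(msg)),
                   errdetail("Number of returned columns (%d) does not match "
                             "expected column count (%d).",
                             returned->natts, expected->natts)));

      for (i = 0; i < expected->natts; i++)
          if (returned->attrs[i]->atttypid != expected->attrs[i]->atttypid)
              ereport(ERROR,
                      (errcode(ERRCODE_DATATYPE_MISMATCH),
                       errmsg("%s", _(msg)),
                       errdetail("Returned type %s does not match expected type %s in column \"%s\".",
                                 OidIsValid(returned->attrs[i]->atttypid)
                                     ? format_type_be(returned->attrs[i]->atttypid)
                                     : _("n/a (dropped column)"),
                                 format_type_be(expected->attrs[i]->atttypid),
                                 NameStr(expected->attrs[i]->attname))));
  }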

 I've been thinking how to pass down the context information without
 feeding the complete string, but I don't find a way without doing
 message construction. Construction is to be avoided because it's a
 pain for translators.

 Maybe we should just use something generic like errmsg("mismatching
 record type") and have the caller pass two strings specifying what's
 the returned tuple and what's the expected, but I don't see how
 ...  (BTW this is worth fixing, because every case seems to have
 appeared independently without much thought as to other callsites with
 the same pattern.)

I considered the subject with identical constraints but couldn't come up
with a more rational solution. Nevertheless, I'm open to any suggestion.


Regards.

Index: src/pl/plpgsql/src/pl_exec.c
===
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.219
diff -c -r1.219 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c	1 Sep 2008 22:30:33 -	1.219
--- src/pl/plpgsql/src/pl_exec.c	9 Sep 2008 05:48:57 -
***
*** 188,194 
  	   Oid reqtype, int32 reqtypmod,
  	   bool isnull);
  static void exec_init_tuple_store(PLpgSQL_execstate *estate);
! static bool compatible_tupdesc(TupleDesc td1, TupleDesc td2);
  static void exec_set_found(PLpgSQL_execstate *estate, bool state);
  static void plpgsql_create_econtext(PLpgSQL_execstate *estate);
  static void free_var(PLpgSQL_var *var);
--- 188,195 
  	   Oid reqtype, int32 reqtypmod,
  	   bool isnull);
  static void exec_init_tuple_store(PLpgSQL_execstate *estate);
! static void validate_tupdesc_compat(TupleDesc expected, TupleDesc returned,
! 	const char *msg);
  static void exec_set_found(PLpgSQL_execstate *estate, bool state);
  static void plpgsql_create_econtext(PLpgSQL_execstate *estate);
  static void free_var(PLpgSQL_var *var);
***
*** 384,394 
  			{
  case TYPEFUNC_COMPOSITE:
  	/* got the expected result rowtype, now check it */
! 	if (estate.rettupdesc == NULL ||
! 		!compatible_tupdesc(estate.rettupdesc, tupdesc))
! 		ereport(ERROR,
! (errcode(ERRCODE_DATATYPE_MISMATCH),
!  errmsg("returned record type does not match expected record type")));
  	break;
  case TYPEFUNC_RECORD:
  
--- 385,392 
  			{
  case TYPEFUNC_COMPOSITE:
  	/* got the expected result rowtype, now check it */
! 	validate_tupdesc_compat(tupdesc, estate.rettupdesc,
! 			gettext_noop("returned record type does not match expected record type"));
  	break;
  case TYPEFUNC_RECORD:
  
***
*** 705,715 
  		rettup = NULL;
  	else
  	{
! 		if (!compatible_tupdesc(estate.rettupdesc,
! trigdata->tg_relation->rd_att))
! 			ereport(ERROR,
! 	(errcode(ERRCODE_DATATYPE_MISMATCH),
! 	 errmsg("returned tuple structure does not match table of trigger event")));
  		/* Copy tuple to upper executor memory */
  		rettup = SPI_copytuple((HeapTuple) DatumGetPointer(estate.retval));
  	}
--- 703,711 
  		rettup = NULL;
  	

Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread ITAGAKI Takahiro

Fujii Masao [EMAIL PROTECTED] wrote:

 3) Patch of introducing new background process which I've called
  WALSender. It takes charge of sending WAL to the slave.
 
  Now, I assume that WALSender also listens for the connection from
  the slave, i.e. only one sender process manages multiple slaves.

  The relation between WALSender and backend is 1:1. So,
  the communication mechanism between them can be simple.

I assume he means that only one backend communicates with the WAL sender
at a time. The communication is done while WALWriteLock is held,
so other backends wait for the communicating backend on WALWriteLock.
The WAL sender only needs to send one signal each time it sends WAL
buffers to the slave.

We could split the LWLock into WALWriterLock and WALSenderLock,
but the essential point is the same.
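
For illustration, the commit path could then look roughly like this (a sketch
only; WalSndRequestSend and WalSndWaitForAck are hypothetical names, not
existing functions):

  LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);

  XLogWrite(WriteRqst, false, false);    /* flush WAL locally, as today */
  WalSndRequestSend(WriteRqst.Write);    /* hypothetical: pass the send position to the WAL sender */
  WalSndWaitForAck();                    /* hypothetical: one signal back per send */

  LWLockRelease(WALWriteLock);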

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Jeff Davis
On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
 Thanks for the review.
 
The standard specifies that non-recursive WITH should be evaluated
once.
 
 What shall we do? I don't think there's an easy way to fix this. Maybe
 we should not allow a WITH clause without RECURSIVE?

My interpretation of 7.13: General Rules: 2.b is that it should be
single evaluation, even if RECURSIVE is present.

The previous discussion was here:

http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

The important arguments in the thread seemed to be:

1. People will generally expect single evaluation, so might be
disappointed if they can't use this feature for that purpose.

2. It's a spec violation in the case of volatile functions.

3. I think this is a must fix because of the point about volatile
functions --- changing it later will result in user-visible semantics
changes, so we have to get it right the first time.

I don't entirely agree with #3. It is user-visible, but only in the
sense that someone is depending on undocumented multiple-evaluation
behavior.

Tom Lane said that multiple evaluation is grounds for rejection:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php

Is there hope of correcting this before November?

 I will try to fix this. However, detecting whether the query is
 non-linear is not so easy.

If we don't allow mutual recursion, the only kind of non-linear
recursion that might exist would be multiple references to the same
recursive query name in a recursive query, is that correct?

  * DISTINCT should suppress duplicates:
  
with recursive foo(i) as
  (select distinct * from (values(1),(2)) t
  union all
  select distinct i+1 from foo where i < 10)
select * from foo;
  
    This outputs a lot of duplicates, but they should be suppressed
  according to the standard. This query is essentially the same as
  supporting UNION for recursive queries, so we should either fix both for
  8.4 or block both for consistency.
 
 I'm not sure if it's possible to fix this. Will look into it.
 

Can't we just reject queries with top-level DISTINCT, similar to how
UNION is rejected?

  * outer joins on a recursive reference should be blocked:
  
with recursive foo(i) as
  (values(1)
  union all
  select i+1 from foo left join (values(1)) t on (i=column1))
select * from foo;
  
Causes an infinite loop, but the standard says using an outer join
in this situation should be prohibited. This should be fixed for 8.4.
 
 Not an issue, I think.

Agreed, Andrew Gierth corrected me here.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PLUGINS Functionality in Win32 build scripts

2008-09-09 Thread MUHAMMAD ASIF


Hi,
  I am currently working at EnterpriseDB (www.enterprisedb.com). During the 
integration of pldebugger ( http://pgfoundry.org/projects/edb-debugger ) with 
Postgres on Windows, I faced the problem that plugins are not being copied to the 
lib/plugins directory. Plugins should be copied into (Installation 
dir)/lib/plugins to work properly.
  To solve this problem I added PLUGINS logic to the Windows Perl build 
scripts of PostgreSQL 8.3.3. It searches for the PLUGINS variable in the contrib 
Makefiles, processes PLUGINS like MODULES, and copies the generated plugin 
library to (Installation dir)/lib/plugins.
Please find the attached plugin.patch file. Thanks.

Best Regards,
Asif Naeem
www.enterprisedb.com


plugins.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Mon, 2008-09-08 at 17:40 -0400, Bruce Momjian wrote:
 Fujii Masao wrote:
  On Mon, Sep 8, 2008 at 8:44 PM, Markus Wanner [EMAIL PROTECTED] wrote:
  Merge into WAL writer?
  
   Uh.. that would mean you'd lose parallelism between WAL writing to disk
   and WAL shipping via network. That does not sound appealing to me.
  
  That depends on the order of WAL writing and WAL shipping.
  How about the following order?
  
  1. A backend writes WAL to disk.
  2. The backend wakes up WAL sender process and sleeps.
  3. WAL sender process does WAL shipping and wakes up the backend.
  4. The backend issues sync command.
 
 I am confused why this is considered so complicated.  Having individual
 backends doing the wal transfer to the slave is never going to work
 well.

Agreed.

 I figured we would have a single WAL streamer that continues advancing
 forward in the WAL file, streaming to the standby.  Backends would
 update a shared memory variable specifying how far they want the wal
 streamer to advance and send a signal to the wal streamer if necessary. 
 Backends would monitor another shared memory variable that specifies how
 far the wal streamer has advanced.

Yes. We should have a LogwrtRqst pointer and LogwrtResult pointer for
the send operation. The Write and Send operations can then continue
independently of one another. XLogInsert() cannot advance to a new page
while we are waiting to send or write. Notice that the Send process
might be the bottleneck - that is the price of synchronous replication.

Backends then wait
* not at all for asynch commit
* just for Write for local synch commit
* for both Write and Send for remote synch commit
(various additional options for what happens to confirm Send)

So normal backends neither write nor send. We have two dedicated
processes, one for write, one for send. We need to put an extra test
into WALWriter loop so that it will continue immediately (with no wait)
if there is an outstanding request for synchronous operation.

This gives us the Group Commit feature also, even if we are not using
replication. So we can drop the commit_delay stuff.

XLogBackgroundFlush() processes data a page at a time if it can. That may
not be the correct batch size for XLogBackgroundSend(), so we may need a
tunable for the MTU. Under heavy load we need the Write and Send to act
in a way to maximise throughput rather than minimise response time, as
we do now.

If wal_buffers overflows, we continue to hold WALInsertLock while we
wait for WALWriter and WALSender to complete.

We should increase default wal_buffers to 64.

After (or during) XLogInsert backends will sleep in a proc queue,
similar to LWlocks and protected by a spinlock. When preparing to
write/send the WAL process should read the proc at the *tail* of the
queue to see what the next LogwrtRqst should be. Then it performs its
action and wakes procs up starting with the head of the queue. We would
add LSN into PGPROC, so WAL processes can check whether the backend
should be woken. The LSN field can be accessed without spinlocks since
it is only ever set by the backend itself and only read while a backend
is sleeping. So we access spinlock, find tail, drop spinlock then read
LSN of the backend that (was) the tail.
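
A rough sketch of that wake-up pass (names such as WalQueue, waitLSN and
PerformWriteOrSend are hypothetical placeholders, not existing symbols):

  PGPROC     *tail, *proc, *wakeup = NULL;
  XLogRecPtr  WriteRqst;

  /* Read the newest waiter to learn the next request. */
  SpinLockAcquire(&WalQueue->mutex);
  tail = WalQueue->tail;
  SpinLockRelease(&WalQueue->mutex);

  WriteRqst = tail->waitLSN;        /* safe: only the owning backend sets it */
  PerformWriteOrSend(WriteRqst);    /* hypothetical write or send action */

  /* Unlink satisfied waiters from the head, then wake them outside the spinlock. */
  SpinLockAcquire(&WalQueue->mutex);
  while ((proc = WalQueue->head) != NULL &&
         XLByteLE(proc->waitLSN, WriteRqst))
  {
      WalQueue->head = proc->lwWaitLink;
      proc->lwWaitLink = wakeup;
      wakeup = proc;
  }
  SpinLockRelease(&WalQueue->mutex);

  while (wakeup != NULL)
  {
      proc = wakeup;
      wakeup = proc->lwWaitLink;
      PGSemaphoreUnlock(&proc->sem);
  }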

Another thought occurs that we might measure the time a Send takes and
specify a limit on how long we are prepared to wait for confirmation.
Limit=0 => asynchronous. Limit > 0 implies synchronous-up-to-the-limit.
This would give better user behaviour across a highly variable network
connection.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move src/tools/backend/ to wiki

2008-09-09 Thread Greg Smith

On Mon, 8 Sep 2008, Bruce Momjian wrote:

I don't think much changes at the flow chart level from release to 
release so it would be fine if it was just CVS HEAD. I also don't think 
many people do back-branch development.


It's easy with Mediawiki to both 1) see the old version for those 
situations and 2) copy the page to somewhere else to tag the one that 
goes along with a particular release.  If there is some version specific 
stuff there it's straightforward to do something along one of those two 
lines at the point it's needed.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]

2008-09-09 Thread Magnus Hagander
Alvaro Herrera wrote:
 Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 
 Hmm, I didn't recheck after Greg's patch, but in mine, it doesn't,
 because the location is saved as reset location and restored when the
 variable is reset.  It worked fine in all cases I tested.
 Hmm.  Actually, why is there a need to save and restore at all?  There
 can certainly never be more than one recorded config-file location per
 variable.  What about saying that each variable has one and only one
 filename/linenumber, but whether these are relevant to the current value
 is determined by whether the current value's source is S_FILE?
 
 Hmm, this does make the patch a lot simpler.  Attached.  (Magnus was
 visionary enough to put the correct test in the pg_settings definition.)

:-)

Yeah, it does look a lot simpler. And it certainly takes away the
pieces of code of mine that I was entirely unable to make work :-)


 I also dropped the change to set_config_option, and added a new routine
 to set the source file/line, as you suggested elsewhere.  The only
 problem is that we now have two bsearch calls for each option set in a
 config file ...  I don't want to change set_config_option just to be
 able to return the struct config_generic for this routine's sake ...
 Better ideas?  Is this OK as is?

Well, it's not like it's a performance critical path, is it? I think we
should be ok.


 I noticed some weird things when the config files contain errors, but I
 think it's outside this patch's scope.
 
 (I dropped the default stuff for now, as it doesn't seem that a
 consensus has been reached on that topic.)

This is one of the reasons I suggested keeping that one as a separate
patch in the first place. The other main reason being that once it gets
applied, you really want it to be two different revisions, to clearly
keep them apart ;-) I still think we should eventually get both in
there, but let's treat them as separate entities.

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some newbie questions

2008-09-09 Thread Greg Smith

On Sun, 7 Sep 2008, M2Y wrote:


Why does a replication solution need log shipping and why can't we just
ship the transaction statements to a standby node?


Here's one of the classic examples of why that doesn't work:

create table x (d decimal);
insert into x values (random());

If you execute those same statements on two different nodes, they will end 
up with different values for the random number and therefore the nodes 
won't match anymore.  A similar issue shows up if you use functions that 
check the current system time, which will be slightly different between the 
two:  even if the clocks are perfectly synced, by the time the standby 
receives the transaction it will be later than the original.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Heikki Linnakangas

Simon Riggs wrote:

This gives us the Group Commit feature also, even if we are not using
replication. So we can drop the commit_delay stuff.


Huh? How does that give us group commit?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]

2008-09-09 Thread Greg Smith

On Tue, 9 Sep 2008, Magnus Hagander wrote:


(I dropped the default stuff for now, as it doesn't seem that a
consensus has been reached on that topic.)


This is one of the reasons I suggested keeping that one as a separate
patch in the first place. The other main reason being that once it gets
applied, you really want it to be two different revisions, to clearly
keep them apart


This means some committer is going to have to make a second pass over the 
same section of code and do testing there more than once, that's a waste 
of time I was trying to avoid.  Also, any standalone patch I submit right 
now won't apply cleanly if the source file/line patch is committed.


If nobody cares about doing that work twice, I'll re-submit a separate 
patch once this one is resolved one way or another.  I hope you snagged 
the documentation update I added to your patch though.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] to_date() validation

2008-09-09 Thread Martijn van Oosterhout
On Mon, Sep 08, 2008 at 06:24:14PM +1000, Brendan Jurd wrote:
 On Sun, Sep 7, 2008 at 5:58 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:
  I'm just following this:
  http://wiki.postgresql.org/wiki/Reviewing_a_Patch so lets get started.
 
 
 Hi Alex.  Thanks for taking the time to review my patch.

I actually had a look at this patch also, though not as thoroughly as
Alex. There was one part that I had some thoughts about in 
from_char_parse_int_len:

!   /*
!    * We need to pull exactly the number of characters given in 'len'
!    * out of the string, and convert those.
!    */
snip
!   first = palloc(len + 1);
!   strncpy(first, init, len);
!   first[len] = '\0';

The use of palloc/pfree in this routine seems excessive. Does len have
an upper bound? If so a simple array will do it.

!   if (strlen(first) < len)

Here you check the length of the remaining string and complain if it's
too short, but:

snip
!   result = strtol(first, &last, 10);
!   *src += (last - first);

Here you do not note if we didn't convert the entire string. So it
seems you are allowed to provide too few characters, as long as it's
not the last field?

Other than that it looks like a good patch.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 12:24 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  This gives us the Group Commit feature also, even if we are not using
  replication. So we can drop the commit_delay stuff.
 
 Huh? How does that give us group commit?

Multiple backends waiting while we perform a write. Commits then happen
as a group (to WAL at least), hence Group Commit.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Heikki Linnakangas

Simon Riggs wrote:

Multiple backends waiting while we perform a write. Commits then happen
as a group (to WAL at least), hence Group Commit.


The problem with our current commit protocol is this:

1. Backend A inserts commit record A
2. Backend A starts to flush commit record A
3. Backend B inserts commit record B
4. Backend B waits until 2. finishes
5. Backend B starts to flush commit record B

Note that we already have the logic to flush all pending commit records 
at once. If there's also a backend C that inserts its commit record 
after step 2, B and C will be flushed at once:


1. Backend A inserts commit record A
2. Backend A starts to flush commit record A
3. Backend B inserts commit record B
4. Backend B waits until 2. finishes
5. Backend C inserts commit record C
6. Backend C waits until 2. finishes
7. Flush A finishes. Backend B starts to flush commit records A+B

The idea of group commit is to insert a small delay in backend A between 
steps 1 and 2, so that we can flush both commit records in one fsync:


1. Backend A inserts commit record A
2. Backend A waits
3. Backend B inserts commit record B
4. Backend B starts to flush commit records A + B

The tricky part is, how does A know if it should wait, and for how long? 
commit_delay sure isn't ideal, but AFAICS the log shipping proposal 
doesn't provide any solution to that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our CLUSTER implementation is pessimal

2008-09-09 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Yeah, I've been thinking about how to use the planner to do this.

 I thought the answer to that was going to be more or less call
 cost_sort() and cost_index() and compare the answers.

That was the way I was headed. But the idea of using the executor is looking
more and more attractive the more I think about it. It would solve this
problem as well as future-proof us against any other new features which could
speed up CLUSTER. 

In particular I'm thinking of people clustering on a covering index (which
isn't as uncommon as it sounds, if you have a covering index you probably do
want to cluster it -- consider many-to-many join tables). We should be able to
do an index-only scan which might be even faster than sorting.

Also, it would cleanly solve the expression index case which my current
solution can't solve (not without much larger changes to tuplesort because the
sort key isn't present in the sort tuple). Also, if your table is very bloated
it could be faster to do a bitmap index scan and sort. Or scan another newer
and better organized index instead of the index you're clustering.

Basically I feel like what I've done is reproduce a small part of the planner
and executor and it would be a cleaner and more general solution to just do it
properly.

 To do that it seems to me what we would need to do is add a function
 _pg_get_rawtuple_header() which returns the visibility information that HTSV
 needs. 

 You seem to be confusing use the planner with use the executor.
 All that we need here is a decision about which code path to take
 within CLUSTER.  We don't need to bring in boatloads of irrelevant
 infrastructure --- especially not infrastructure that's going to be
 fighting us every step of the way.  The executor isn't designed to
 return raw tuples and no magic function is going to change that.

Actually, thinking about it, couldn't we just add a new system column to do this?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread ITAGAKI Takahiro

Heikki Linnakangas [EMAIL PROTECTED] wrote:

 The tricky part is, how does A know if it should wait, and for how long? 
 commit_delay sure isn't ideal, but AFAICS the log shipping proposal 
 doesn't provide any solution to that.

They have no direct relation to each other,
but they need similar synchronization modules.

In log shipping, backends need to wait for the WAL sender's job,
and should wake up as fast as possible after the job is done.
It is similar to the requirement of group commit.

Signals and locking, borrowed from Postgres-R, are now being studied
for this purpose in log shipping, but I'm not sure they can
also be used for group commit.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] to_date() validation

2008-09-09 Thread Brendan Jurd
On Tue, Sep 9, 2008 at 7:29 PM, Martijn van Oosterhout
[EMAIL PROTECTED] wrote:
 I actually had a look at this patch also, though not as thoroughly as
 Alex. There was one part that I had some thoughts about in 
 from_char_parse_int_len:


Hi Martijn.  Thanks for your comments.

 The use of palloc/pfree in this routine seems excessive. Does len have
 upper bound? If so a simple array will do it.


There isn't any *theoretical* upper bound on len.  However, in
practice, with the current set of formatting nodes, the largest len
that will ever be passed to from_char_parse_int_len() is 6 (for the
microsecond 'US' node).

I suppose I could define a constant FORMATNODE_MAX_LEN, make it
something like 10 and just use that for copying the string, rather
than palloc().  I'll give it a try.
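
For illustration, the copy step would then reduce to something like this (a
sketch; FORMATNODE_MAX_LEN is the hypothetical constant mentioned above):

  #define FORMATNODE_MAX_LEN 10

  char    copy[FORMATNODE_MAX_LEN + 1];

  /* Copy at most 'len' characters into a fixed-size local buffer. */
  Assert(len <= FORMATNODE_MAX_LEN);
  strncpy(copy, init, len);
  copy[len] = '\0';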

 !   if (strlen(first) < len)

 Here you check the length of the remaining string and complain if it's
 too short, but:

 snip
 !   result = strtol(first, &last, 10);
 !   *src += (last - first);

 Here you do not note if we didn't convert the entire string. So it
 seems you are allowed to provide too few characters, as long as it's
 not the last field?

That's true.  The only way to hit that condition would be to provide a
semi-bogus value in a string with no separators between the numbers.
For example:

postgres=# SELECT to_date('20081o13', 'YYYYMMDD');
ERROR:  invalid value for DD in source string
DETAIL:  Value must be an integer.

What happens here is that strtol() happily consumes the '1' for the
format node MM, and as you point out it does not complain that it
expected to consume two characters and only got one.  On the next node
(DD), the function tries to start parsing an integer, but the first
character it encounters is 'o', so it freaks out.

Certainly the error message here should be more apropos, and tell the
user that the problem is in the MM node.  Blaming the DD node is pure
red herring.

However, if the mistake is at the end of the string, there is no error at all:

postgres=# SELECT to_date('2008101!', 'YYYYMMDD');
  to_date
------------
 2008-10-01

This is because the end of the string counts as a separator, so we
just run an unbounded strtol() on whatever characters remain in the
string.

As discussed in my response to Alex's review, I made the end of the
string a separator so that things like 'DD-MM-YYYY' would actually
work for years with more than four digits.

Now I'm wondering if that was the wrong way to go.  The case for years
with more than four digits is extremely narrow, and if somebody really
wanted to parse '01-01-20008' as 1 Jan 20,008 they could always use
the 'FM' modifier to get the behaviour they want ('DD-MM-FMYYYY').

I'll send in a new version which addresses these issues.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Fujii Masao
On Tue, Sep 9, 2008 at 5:11 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 Yes. We should have a LogwrtRqst pointer and LogwrtResult pointer for
 the send operation. The Write and Send operations can then continue
 independently of one another. XLogInsert() cannot advance to a new page
 while we are waiting to send or write.

Agreed.

To realize the various synchronous options, the Write and Send operations should
be treated separately. So, I'll introduce an XLogCtlSend structure which is shared
state data for WAL sending. XLogCtlInsert might need a new field, LogsndResult,
which indicates the byte position up to which we have already sent. As you say,
AdvanceXLInsertBuffer() must check both the position already written
(fsynced) and the position already sent. I'm doing the detailed design of this now :)
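
A minimal sketch of what that shared state could look like (purely
illustrative; the layout is an assumption modelled on the existing write-side
structures in xlog.c):

  /* Hypothetical shared state for WAL sending. */
  typedef struct XLogCtlSend
  {
      XLogRecPtr  LogsndResult;   /* last byte + 1 already sent to the slave */
      slock_t     send_lck;       /* protects LogsndResult */
  } XLogCtlSend;

  /*
   * AdvanceXLInsertBuffer() could then recycle a WAL buffer page only once
   * both the written (fsynced) position and LogsndResult have passed the
   * end of that page.
   */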

 Notice that the Send process
 might be the bottleneck - that is the price of synchronous replication.

Really? In the benchmark results of my prototype, the bottleneck is
still disk I/O.
The communication latency (between the master and the slave) is smaller than
the WAL write (fsync) latency. Of course, I assume that we use a decent network
like 1000BASE-T.

What makes the sender process the bottleneck?

 Backends then wait
 * not at all for asynch commit
 * just for Write for local synch commit
 * for both Write and Send for remote synch commit
 (various additional options for what happens to confirm Send)

I'd like to introduce a new parameter, synchronous_replication, which specifies
whether backends wait for the response from the WAL sender process. By
combining synchronous_commit and synchronous_replication, users can
choose various options.

 After (or during) XLogInsert backends will sleep in a proc queue,
 similar to LWlocks and protected by a spinlock. When preparing to
 write/send the WAL process should read the proc at the *tail* of the
 queue to see what the next LogwrtRqst should be. Then it performs its
 action and wakes procs up starting with the head of the queue. We would
 add LSN into PGPROC, so WAL processes can check whether the backend
 should be woken. The LSN field can be accessed without spinlocks since
 it is only ever set by the backend itself and only read while a backend
 is sleeping. So we access spinlock, find tail, drop spinlock then read
 LSN of the backend that (was) the tail.

You mean only the XLogInsert of a commit record, or every XLogInsert?
Anyway, ISTM that the response time gets worse :(

 Another thought occurs that we might measure the time a Send takes and
 specify a limit on how long we are prepared to wait for confirmation.
 Limit=0 = asynchronous. Limit  0 implies synchronous-up-to-the-limit.
 This would give better user behaviour across a highly variable network
 connection.

From the viewpoint of detecting a network failure, this feature is necessary.
When the network goes down, the WAL sender can be blocked until it detects
the network failure, i.e. the WAL sender keeps waiting for the response which
never comes. A timeout notification is necessary in order to detect a
network failure promptly.

regards

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Markus Wanner

Hi,

ITAGAKI Takahiro wrote:

Signals and locking, borrowed from Postgres-R, are now studied
for the purpose in the log shipping, but I'm not sure it can be
also used in the group commit.


Yeah. As Heikki points out, there is a completely orthogonal question 
WRT group commit: how does transaction A know if or how long it should 
wait for other transactions to file their WAL?


If we decide to do all of the WAL writing from a separate WAL writer 
process and let the backends communicate with it, then imessages might 
help again. But I currently don't think that's feasible.


Apart from possibly having similar IPC requirements, group commit and 
log shipping have not much in common and should be considered separate 
features.


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Markus Wanner

Hi,

Fujii Masao wrote:

Really? In the benchmark result of my prototype, the bottleneck is
still disk I/O.
The communication (between the master and the slave) latency is smaller than
WAL writing (fsyncing) one. Of course, I assume that we use not-poor network
like 1000BASE-T.


Sure. If you do WAL sending to standby and WAL writing to disk in 
parallel, only the slower one is relevant (in case you want to wait for 
both). If that happens to be the disk, you won't see any performance 
degradation compared to standalone operation.


If you want the standby to confirm having written (and flushed) the WAL 
to disk as well, that can't possibly be faster than the active node's 
local disk (assuming equally fast and busy disk subsystems).



I'd like to introduce new parameter synchronous_replication which specifies
whether backends waits for the response from WAL sender process. By
combining synchronous_commit and synchronous_replication, users can
choose various options.


Various config options have already been proposed. I personally don't 
think that helps us much. Instead, I'd prefer to see prototype code or 
at least concepts. We can juggle with the GUC variable names or other 
config options later on.



In the viewpoint of detection of a network failure, this feature is necessary.
When the network goes down, WAL sender can be blocked until it detects
the network failure, i.e. WAL sender keeps waiting for the response which
never comes. A timeout notification is necessary in order to detect a
network failure soon.


That's one of the areas I'm missing from the overall concept. I'm glad 
it comes up. You certainly realize, that such a timeout must be set high 
enough so as not to trigger false negatives every now and then? Or do 
you expect some sort of retry loop in case the link to the standby comes 
up again? How about multiple standby servers?


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 20:12 +0900, Fujii Masao wrote:

 What makes the sender process bottleneck?

In my experience, the Atlantic. But I guess the Pacific does it too. :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Heikki Linnakangas

Fujii Masao wrote:

What makes the sender process bottleneck?


The keyword here is "might". There are many possibilities, like:
- Slow network.
- Ridiculously fast disk. Like a RAM disk. If you have a synchronous 
slave you can fail over to, putting WAL on a RAM disk isn't that crazy.

- slower WAL disk on the slave.
etc.


Backends then wait
* not at all for asynch commit
* just for Write for local synch commit
* for both Write and Send for remote synch commit
(various additional options for what happens to confirm Send)


I'd like to introduce new parameter synchronous_replication which specifies
whether backends waits for the response from WAL sender process. By
combining synchronous_commit and synchronous_replication, users can
choose various options.


There's one thing I haven't figured out in this discussion. Does the 
write to the disk happen before or after the write to the slave? Can you 
guarantee that if a transaction is committed in the master, it's also 
committed in the slave, or vice versa?



Another thought occurs that we might measure the time a Send takes and
specify a limit on how long we are prepared to wait for confirmation.
Limit=0 => asynchronous. Limit > 0 implies synchronous-up-to-the-limit.
This would give better user behaviour across a highly variable network
connection.


In the viewpoint of detection of a network failure, this feature is necessary.
When the network goes down, WAL sender can be blocked until it detects
the network failure, i.e. WAL sender keeps waiting for the response which
never comes. A timeout notification is necessary in order to detect a
network failure soon.


Agreed. But what happens if you hit that timeout? Should we enforce that 
timeout within the server, or should we leave that to the external 
heartbeat system?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 12:54 +0300, Heikki Linnakangas wrote:
 Note that we already have the logic to flush all pending commit
 records at once.

But only when you can grab WALInsertLock when flushing. If you look at
the way I suggested, it does not rely upon that lock being available.

So it is both responsive in low write rate conditions and yet efficient
in high write rate conditions and does not require we specify a wait
time.

IMHO the idea of a wait time is a confusion that comes from using a
simple example (with respect). If we imagine the example slightly
differently you'll see a different answer: 

High write rate: A stream of commits comes so fast that by the time a
write completes there are always > 1 backends waiting to commit again.
In that case, there is never any need to wait because the arrival
pattern requires us to issue writes as quickly as we can.

Medium write rate: Commits occur relatively frequently, so that the mean
commits/flush is in the range 0.5 - 1. In this case, we can get better
I/O efficiency by introducing waits. But note that a wait is risky, and
at some point we may wait without another commit arriving. In this case,
if the disk can keep up with the write rate, why would we want to
improve I/O efficiency? There's no a priori way of calculating a useful
wait time, so waiting is always a risk. Why would we risk damage to our
response times when the disk can keep up with write rate? 

So for me, introducing a wait is something you might want to consider in
medium rate conditions. Anything more or less than that and a wait is
useless. So optimising for the case where the arrival rate is within a
certain fairly tight range seems not worthwhile.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 20:12 +0900, Fujii Masao wrote:

 I'd like to introduce new parameter synchronous_replication which specifies
 whether backends waits for the response from WAL sender process. By
 combining synchronous_commit and synchronous_replication, users can
 choose various options.

We already discussed that on -hackers. See Transaction Controlled
Robustness. But yes, something like that.

Please note the design mentions fsyncing after applying WAL. I'm sure
you're aware we don't fsync after *applying* WAL now, and I hope we
never do. You might want to fsync data to WAL files on the standby, but
that is a slightly different thing.

  After (or during) XLogInsert backends will sleep in a proc queue,
  similar to LWlocks and protected by a spinlock. When preparing to
  write/send the WAL process should read the proc at the *tail* of the
  queue to see what the next LogwrtRqst should be. Then it performs its
  action and wakes procs up starting with the head of the queue. We would
  add LSN into PGPROC, so WAL processes can check whether the backend
  should be woken. The LSN field can be accessed without spinlocks since
  it is only ever set by the backend itself and only read while a backend
  is sleeping. So we access spinlock, find tail, drop spinlock then read
  LSN of the backend that (was) the tail.
 
 You mean only XLogInsert treating commit record or every XLogInsert?

Just the commit records, when synchronous_commit = on.

 Anyway, ISTM that the response time get worse :(

No, because it would have had to wait in the queue for the WALWriteLock
while prior writes occur. 

If the WALWriter sleeps on a semaphore, it too can be nudged into action
at the appropriate time, so no need for a delay between backend
beginning to wait and WALWriter beginning to act. (Well, IPC delay
between two processes, so some, but that is balanced against efficiency
of Send).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 13:42 +0200, Markus Wanner wrote:

 How about multiple standby servers?

There are various ways for getting things to work with multiple servers.
I hope we can make this work with just a single standby before we try to
make it work on more. There are various options for synchronous and
asynchronous relay that will burden us if we try to consider all of that
in the remaining 7 weeks we have. So yes please, just not yet.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our CLUSTER implementation is pessimal

2008-09-09 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 In particular I'm thinking of people clustering on a covering index (which
 isn't as uncommon as it sounds, if you have a covering index you probably do
 want to cluster it -- consider many-to-many join tables). We should be able to
 do an index-only scan which might be even faster than sorting.

[ scratches head... ]  You need *all* the data from the heap.  Or by
covering index do you mean an index that contains the entire table
contents?  Doesn't really sound like a case we need to focus on; or
at least this version of clustering isn't what it needs, it wants an
implementation where the table and the index are the same thing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 08:24 -0400, Tom Lane wrote:
 Fujii Masao [EMAIL PROTECTED] writes:
  On Tue, Sep 9, 2008 at 5:11 PM, Simon Riggs [EMAIL PROTECTED] wrote:
  
  Yes. We should have a LogwrtRqst pointer and LogwrtResult pointer for
  the send operation. The Write and Send operations can then continue
  independently of one another. XLogInsert() cannot advance to a new page
  while we are waiting to send or write.
 
  Agreed.
 
 Agreed?  That last restriction is a deal-breaker.

OK, I should have said *if wal_buffers are full* XLogInsert() cannot
advance to a new page while we are waiting to send or write. So I don't
think it's a deal breaker.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Tom Lane
Fujii Masao [EMAIL PROTECTED] writes:
 On Tue, Sep 9, 2008 at 5:11 PM, Simon Riggs [EMAIL PROTECTED] wrote:
 
 Yes. We should have a LogwrtRqst pointer and LogwrtResult pointer for
 the send operation. The Write and Send operations can then continue
 independently of one another. XLogInsert() cannot advance to a new page
 while we are waiting to send or write.

 Agreed.

Agreed?  That last restriction is a deal-breaker.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] to_date() validation

2008-09-09 Thread Brendan Jurd
On Tue, Sep 9, 2008 at 9:04 PM, Brendan Jurd [EMAIL PROTECTED] wrote:
 On Tue, Sep 9, 2008 at 7:29 PM, Martijn van Oosterhout
 [EMAIL PROTECTED] wrote:
 The use of palloc/pfree in this routine seems excessive. Does len have
 upper bound? If so a simple array will do it.


 I suppose I could define a constant FORMATNODE_MAX_LEN, make it
 something like 10 and just use that for copying the string, rather
 than palloc().  I'll give it a try.


Turns out there was already a relevant constant defined in
formatting.c: DCH_MAX_ITEM_SIZ, set to 9.  So I just used that, +1 for
the trailing null.


 Here you do not note if we didn't convert the entire string. So it
 seems you are allowed to provide too few characters, as long as it's
 not the last field?

 That's true.  The only way to hit that condition would be to provide a
 semi-bogus value in a string with no separators between the numbers.

I've now plugged this hole.  I added the following error for
fixed-width inputs that are too short:

postgres=# SELECT to_date('200%1010', 'YYYYMMDD');
ERROR:  invalid value for YYYY in source string
DETAIL:  Field requires 4 characters, but only 3 could be parsed.
HINT:  If your source string is not fixed-width, try using the FM modifier.
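
Presumably the check behind that error is something along these lines (a
sketch with hypothetical variable names, not the patch text):

  result = strtol(copy, &last, 10);
  used = (int) (last - copy);

  /* A fixed-width field must consume exactly 'len' characters. */
  if (used < len)
      ereport(ERROR,
              (errcode(ERRCODE_INVALID_DATETIME_FORMAT),
               errmsg("invalid value for %s in source string", node->key->name),
               errdetail("Field requires %d characters, but only %d could be parsed.",
                         len, used),
               errhint("If your source string is not fixed-width, try using the FM modifier.")));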

I've attached a new version of the patch (version 3), as well as an
incremental patch to show the differences between versions 2 and 3.

Cheers,
BJ


to-date-validation-3.diff.gz
Description: GNU Zip compressed data


to-date-validation-2-to-3.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Tue, 2008-09-09 at 08:24 -0400, Tom Lane wrote:
 Agreed?  That last restriction is a deal-breaker.

 OK, I should have said *if wal_buffers are full* XLogInsert() cannot
 advance to a new page while we are waiting to send or write. So I don't
 think its a deal breaker.

Oh, OK, that's obvious --- there's no place to put more data.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Dimitri Fontaine
Hi,

Le mardi 09 septembre 2008, Heikki Linnakangas a écrit :
 The tricky part is, how does A know if it should wait, and for how long?
 commit_delay sure isn't ideal, but AFAICS the log shipping proposal
 doesn't provide any solution to that.

It might just be that I'm not understanding what it's all about, but it seems to me 
that with the WALSender process, A will wait, whatever happens, either until the WAL is 
sent to the slave or written to disk on the slave.

I naively read Simon's proposition as saying Group Commit is done with this new 
feature. A is already waiting (for some external event to complete), so why 
can't we use this to include some other transactions' commits in the 
local deal?

Regards,
-- 
dim




Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]

2008-09-09 Thread Alvaro Herrera
Greg Smith wrote:
 On Tue, 9 Sep 2008, Magnus Hagander wrote:

 (I dropped the default stuff for now, as it doesn't seem that a
 consensus has been reached on that topic.)

 This is one of the reasons I suggested keeping that one as a separate
 patch in the first place. The other main reason being that once it gets
 applied, you really want it to be two different revisions, to clearly
 keep them apart

 This means some committer is going to have to make a second pass over the 
 same section of code and do testing there more than once, that's a waste  
 of time I was trying to avoid.

Actually, this is done all the time.

 Also, any standalone patch I submit right now won't apply cleanly if
 the source file/line patch is committed.

You can always start from the patched version and use interdiff to
obtain a patch difference ...

 If nobody cares about doing that work twice, I'll re-submit a separate  
 patch once this one is resolved one way or another.  I hope you snagged  
 the documentation update I added to your patch though.

Yeah, I did.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reducing statistics write overhead

2008-09-09 Thread Martin Pihlak
Alvaro Herrera wrote:
 Attached is a patch that implements the described signalling.
 
 Are we keeping the idea that the reader sends a stat message when it
 needs to read stats?  What about the lossiness of the transport?
 

As the message is resent in the wait loop, the collector should receive
it sooner or later. And initial testing shows that it's not really easy to
make the collector lose messages.

I used a modified version of the patch to run a simple load test on a 4 core
amd64 Linux box:

- patch modified so that pgstat_send_inquiry() is sent only once - before the wait loop,
  so it times out if the message is lost.
- stats file bloated to ~ 5MB by creating 40k tables.
- 4 pgbench instances running: -c 500 -t 500
- 2 clients constantly pulling stats
- all cores near 100% busy, tx traffic over loopback ~ 200kB/sec.

Most of the stats requests required 1 or 2 file wait iterations (5ms sleep each).
Occasionally 3, but no timeouts (i.e. no lost messages). Maybe other platforms are
more sensitive, but I have none available for testing at the moment.

regards,
Martin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Markus Wanner

Hi,

Dimitri Fontaine wrote:
It might just be I'm not understanding what it's all about, but it seems to me 
with WALSender process A will wait, whatever happens, either until the WAL is 
sent to slave or written to disk on the slave.


..and it will still have to wait until the WAL is written to disk on the 
local node, as we do now. These are two different things to wait for. 
One is a network socket operation, the other is an fsync(). As these 
don't work together too well (blocking), you'd better run them in two 
different processes.


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Dimitri Fontaine
Le mardi 09 septembre 2008, Markus Wanner a écrit :
 ..and it will still have to wait until the WAL is written to disk on the
 local node, as we do now. These are two different things to wait for.
 One is a network socket operation, the other is an fsync(). As these
 don't work together too well (blocking), you better run that in two
 different processes.

Exactly the point. The process is now already waiting in all cases, so maybe 
we could just force it to wait for some WALSender signal before issuing the fsync() 
order, and then we have Group Commit.
I'm not sure this is a good idea at all, it's just the way I understand how 
adding the WALSender process to the mix could give us the Group Commit feature for free.

Regards,
-- 
dim




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 15:32 +0200, Dimitri Fontaine wrote:
 The process is now already waiting in all cases

If the WALWriter|Sender is available, it can begin the task immediately.
There is no need for it to wait if you want synchronous behaviour.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Robert Haas
 3. I think this is a must fix because of the point about volatile
 functions --- changing it later will result in user-visible semantics
 changes, so we have to get it right the first time.

 I don't entirely agree with #3. It is user-visible, but only in the
 sense that someone is depending on undocumented multiple-evaluation
 behavior.

What makes you think it's going to be undocumented?  Single versus
multiple evaluation is a key aspect of this feature and certainly
needs to be documented one way or the other.  I can't understand why
we would introduce a standard syntax with non-standard behavior, but
if we do, it certainly had better be mentioned in the documentation.

I think that the most likely result of a CTE implementation that
doesn't guarantee single evaluation is that people simply won't use
it.  But anyone who does will expect that their queries will return
the same results in release N and release N+1, for all values of N.
The only way that an incompatible change of this type won't break
people's applications is if they're not using the feature in the first
place, in which case there is no point in committing it anyway.

I wonder if the whole approach to this patch is backward.  Instead of
worrying about how to implement WITH RECURSIVE, maybe it would be
better to implement a really solid, spec-compliant version of WITH,
and add the RECURSIVE functionality in a later patch/release.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Markus Wanner

Hi,

ITAGAKI Takahiro wrote:

Signals and locking, borrowed from Postgres-R, are now studied
for the purpose in the log shipping,


Cool. Let me know if you have any questions WRT this imessages stuff.

Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Markus Wanner

Hi,

Dimitri Fontaine wrote:
Exactly the point. The process is already waiting in all cases anyway, so maybe 
we could just force it to wait for some WALSender signal before issuing the 
fsync(), and we would then have Group Commit.


A single process can only wait on either fsync() or on select(), but not 
on both concurrently, because both syscalls are blocking. So mixing 
these into a single process is an inherently bad idea due to lack of 
parallelism.


I fail to see how log shipping would ease or have any other impact on a 
Group Commit feature, which should clearly also work for stand alone 
servers, i.e. where there is no WAL sender process.


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Dimitri Fontaine
Le mardi 09 septembre 2008, Simon Riggs a écrit :
 If the WALWriter|Sender is available, it can begin the task immediately.
 There is no need for it to wait if you want synchronous behaviour.

Ok. Now I'm as lost as anyone with respect to how you get Group Commit :)
-- 
dim


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 16:05 +0200, Dimitri Fontaine wrote:
 Le mardi 09 septembre 2008, Simon Riggs a écrit :
  If the WALWriter|Sender is available, it can begin the task immediately.
  There is no need for it to wait if you want synchronous behaviour.
 
 Ok. Now I'm as lost as anyone with respect to how you get Group Commit :)

OK, sorry. Please read my reply to Heikki on a different subthread of this
topic; he asked me the same question.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Heikki Linnakangas

Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

On Tue, 2008-09-09 at 08:24 -0400, Tom Lane wrote:

Agreed?  That last restriction is a deal-breaker.



OK, I should have said *if wal_buffers are full* XLogInsert() cannot
advance to a new page while we are waiting to send or write. So I don't
think its a deal breaker.


Oh, OK, that's obvious --- there's no place to put more data.


Each WAL sender can keep at most one page locked at a time, right? So, 
that should never happen if wal_buffers > 1 + n_wal_senders.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Tatsuo Ishii
 On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
  Thanks for the review.
  
 The standard specifies that non-recursive WITH should be evaluated
 once.
  
  What shall we do? I don't think there's an easy way to fix this. Maybe
  we should not allow WITH clause without RECURSIVE?
 
 My interpretation of 7.13: General Rules: 2.b is that it should be
 single evaluation, even if RECURSIVE is present.
 
 The previous discussion was here:
 
 http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
 
 The important arguments in the thread seemed to be:
 
 1. People will generally expect single evaluation, so might be
 disappointed if they can't use this feature for that purpose.
 
 2. It's a spec violation in the case of volatile functions.
 
 3. I think this is a must fix because of the point about volatile
 functions --- changing it later will result in user-visible semantics
 changes, so we have to get it right the first time.
 
 I don't entirely agree with #3. It is user-visible, but only in the
 sense that someone is depending on undocumented multiple-evaluation
 behavior.
 
 Tom Lane said that multiple evaluation is grounds for rejection:
 http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php
 
 Is there hope of correcting this before November?

According to Tom, to implement single evaluation we need to make a big
infrastructure enhancement, which would likely slip the schedule for the 8.4
release, which Tom does not want.

So as long as Tom and other people think that this is a must-fix, there is
probably no hope.

Anyway I will continue to work on existing patches...
--
Tatsuo Ishii
SRA OSS, Inc. Japan

  I will try to fix this. However detecting the query being not a
  non-linear one is not so easy.
 
 If we don't allow mutual recursion, the only kind of non-linear
 recursion that might exist would be multiple references to the same
 recursive query name in a recursive query, is that correct?
 
   * DISTINCT should suppress duplicates:
   
 with recursive foo(i) as
   (select distinct * from (values(1),(2)) t
   union all
   select distinct i+1 from foo where i < 10)
 select * from foo;
   
 This outputs a lot of duplicates, but they should be suppressed
   according to the standard. This query is essentially the same as
   supporting UNION for recursive queries, so we should either fix both for
   8.4 or block both for consistency.
  
  I'm not sure if it's possible to fix this. Will look into.
  
 
 Can't we just reject queries with top-level DISTINCT, similar to how
 UNION is rejected?
 
   * outer joins on a recursive reference should be blocked:
   
 with recursive foo(i) as
   (values(1)
   union all
   select i+1 from foo left join (values(1)) t on (i=column1))
 select * from foo;
   
 Causes an infinite loop, but the standard says using an outer join
 in this situation should be prohibited. This should be fixed for 8.4.
  
  Not an issue, I think.
 
 Agreed, Andrew Gierth corrected me here.
 
 Regards,
   Jeff Davis
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Pavel Stehule
Hello

2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]:
 On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
  Thanks for the review.
 
 The standard specifies that non-recursive WITH should be evaluated
 once.
 
  What shall we do? I don't think there's an easy way to fix this. Maybe
  we should not allow WITH clause without RECURSIVE?

 My interpretation of 7.13: General Rules: 2.b is that it should be
 single evaluation, even if RECURSIVE is present.

 The previous discussion was here:

 http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

 The important arguments in the thread seemed to be:

 1. People will generally expect single evaluation, so might be
 disappointed if they can't use this feature for that purpose.

 2. It's a spec violation in the case of volatile functions.

 3. I think this is a must fix because of the point about volatile
 functions --- changing it later will result in user-visible semantics
 changes, so we have to get it right the first time.

 I don't entirely agree with #3. It is user-visible, but only in the
 sense that someone is depending on undocumented multiple-evaluation
 behavior.

 Tom Lane said that multiple evaluation is grounds for rejection:
 http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php

 Is there hope of correcting this before November?

 According to Tom, to implement single evaluation we need to make big
 infrastructure enhancement which is likely slip the schedule for 8.4
 release which Tom does not want.

Why? Why not just use materialisation?


 So as long as Tom and other people think that is a must fix, there
 seems no hope probably.

 Anyway I will continue to work on existing patches...
 --

I would like to see your patch in core early. I am working on grouping sets
and I cannot finish my patch before yours is committed.

Regards
Pavel Stehule

 Tatsuo Ishii
 SRA OSS, Inc. Japan

  I will try to fix this. However detecting the query being not a
  non-linear one is not so easy.

 If we don't allow mutual recursion, the only kind of non-linear
 recursion that might exist would be multiple references to the same
 recursive query name in a recursive query, is that correct?

   * DISTINCT should suppress duplicates:
  
 with recursive foo(i) as
   (select distinct * from (values(1),(2)) t
   union all
   select distinct i+1 from foo where i < 10)
 select * from foo;
  
 This outputs a lot of duplicates, but they should be suppressed
   according to the standard. This query is essentially the same as
   supporting UNION for recursive queries, so we should either fix both for
   8.4 or block both for consistency.
 
  I'm not sure if it's possible to fix this. Will look into.
 

 Can't we just reject queries with top-level DISTINCT, similar to how
 UNION is rejected?

   * outer joins on a recursive reference should be blocked:
  
 with recursive foo(i) as
   (values(1)
   union all
   select i+1 from foo left join (values(1)) t on (i=column1))
 select * from foo;
  
 Causes an infinite loop, but the standard says using an outer join
 in this situation should be prohibited. This should be fixed for 8.4.
 
  Not an issue, I think.

 Agreed, Andrew Gierth corrected me here.

 Regards,
   Jeff Davis


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Tatsuo Ishii
 Hello
 
 2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]:
  On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
   Thanks for the review.
  
  The standard specifies that non-recursive WITH should be evaluated
  once.
  
   What shall we do? I don't think there's an easy way to fix this. Maybe
   we should not allow WITH clause without RECURSIVE?
 
  My interpretation of 7.13: General Rules: 2.b is that it should be
  single evaluation, even if RECURSIVE is present.
 
  The previous discussion was here:
 
  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
 
  The important arguments in the thread seemed to be:
 
  1. People will generally expect single evaluation, so might be
  disappointed if they can't use this feature for that purpose.
 
  2. It's a spec violation in the case of volatile functions.
 
  3. I think this is a must fix because of the point about volatile
  functions --- changing it later will result in user-visible semantics
  changes, so we have to get it right the first time.
 
  I don't entirely agree with #3. It is user-visible, but only in the
  sense that someone is depending on undocumented multiple-evaluation
  behavior.
 
  Tom Lane said that multiple evaluation is grounds for rejection:
  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php
 
  Is there hope of correcting this before November?
 
  According to Tom, to implement single evaluation we need to make big
  infrastructure enhancement which is likely slip the schedule for 8.4
  release which Tom does not want.
 
 why? why don't use a materialisation?

See:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

 
  So as long as Tom and other people think that is a must fix, there
  seems no hope probably.
 
  Anyway I will continue to work on existing patches...
  --
 
 I would to see your patch in core early. I am working on grouping sets
 and I cannot finish my patch before your patch will be commited.
 
 Regards
 Pavel Stehule
 
  Tatsuo Ishii
  SRA OSS, Inc. Japan
 
   I will try to fix this. However detecting the query being not a
   non-linear one is not so easy.
 
  If we don't allow mutual recursion, the only kind of non-linear
  recursion that might exist would be multiple references to the same
  recursive query name in a recursive query, is that correct?
 
* DISTINCT should suppress duplicates:
   
  with recursive foo(i) as
(select distinct * from (values(1),(2)) t
union all
select distinct i+1 from foo where i < 10)
  select * from foo;
   
  This outputs a lot of duplicates, but they should be suppressed
according to the standard. This query is essentially the same as
supporting UNION for recursive queries, so we should either fix both 
for
8.4 or block both for consistency.
  
   I'm not sure if it's possible to fix this. Will look into.
  
 
  Can't we just reject queries with top-level DISTINCT, similar to how
  UNION is rejected?
 
* outer joins on a recursive reference should be blocked:
   
  with recursive foo(i) as
(values(1)
union all
select i+1 from foo left join (values(1)) t on (i=column1))
  select * from foo;
   
  Causes an infinite loop, but the standard says using an outer join
  in this situation should be prohibited. This should be fixed for 8.4.
  
   Not an issue, I think.
 
  Agreed, Andrew Gierth corrected me here.
 
  Regards,
Jeff Davis
 
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 17:17 +0300, Heikki Linnakangas wrote:
 Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
  On Tue, 2008-09-09 at 08:24 -0400, Tom Lane wrote:
  Agreed?  That last restriction is a deal-breaker.
  
  OK, I should have said *if wal_buffers are full* XLogInsert() cannot
  advance to a new page while we are waiting to send or write. So I don't
  think its a deal breaker.
  
  Oh, OK, that's obvious --- there's no place to put more data.
 
 Each WAL sender can keep at most one page locked at a time, right? So, 
 that should never happen if wal_buffers > 1 + n_wal_senders.

Don't understand. I am referring to the logic at the top of
AdvanceXLInsertBuffer(). We would need to wait for all people reading
the contents of wal_buffers. 

Currently, there is no page locking on the WAL buffers, though I have
suggested some for increasing XLogInsert() performance.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Tatsuo Ishii
  * Aggregates allowed:
  
with recursive foo(i) as
  (values(1)
  union all
  select max(i)+1 from foo where i < 10)
select * from foo;
  
Aggregates should be blocked according to the standard.
Also, causes an infinite loop. This should be fixed for 8.4.
 
 I will try to fix this.

We already reject:

select max(i) from foo where i < 10

But max(i)+1 seems to slip through the check. I looked into this and found
that the patch tries to detect the case before analyzing (see
parser/parse_cte.c), which is not the right thing to do, I think.

I think we could detect the case by adding more checking in
parseCheckAggregates():

/*
 * Check if there's an aggregate function in a recursive term.
 */
foreach(l, qry->rtable)
{
    RangeTblEntry *rte = (RangeTblEntry *) lfirst(l);

    if (qry->hasAggs && rte->rtekind == RTE_RECURSIVE &&
        rte->self_reference)
    {
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("aggregate functions in a recursive term not allowed")));
    }
}

What do you think?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbosity of Function Return Type Checks

2008-09-09 Thread Alvaro Herrera
Volkan YAZICI wrote:
 On Mon, 8 Sep 2008, Alvaro Herrera [EMAIL PROTECTED] writes:
  Modified as you suggested. BTW, will there be a similar i18n scenario
  for dropped column you mentioned below?
 
  Yes, you need _() around those too.
 
 For this purpose, I introduced a dropped_column_type variable in
 validate_tupdesc_compat() function:
 
   const char dropped_column_type[] = _("n/a (dropped column)");
 
 And used this in below fashion:
 
   OidIsValid(returned->attrs[i]->atttypid)
   ? format_type_be(returned->attrs[i]->atttypid)
   : dropped_column_type

I changed it to gettext_noop("the text") and _(dropped_column_type) in the
errdetail, and committed it.

I'd still like to have a better way to word the message, and maybe have
this error appear in a regression test somewhere at least once ...

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Heikki Linnakangas

Simon Riggs wrote:

Don't understand. I am referring to the logic at the top of
AdvanceXLInsertBuffer(). We would need to wait for all people reading
the contents of wal_buffers. 


Oh, I see.

If a slave falls behind, how does it catch up? I guess you're saying 
that it can't fall behind, because the master will block before that 
happens. Also in asynchronous replication? And what about when the slave 
is first set up, and needs to catch up with the master?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Simon Riggs

On Tue, 2008-09-09 at 18:26 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  Don't understand. I am referring to the logic at the top of
  AdvanceXLInsertBuffer(). We would need to wait for all people reading
  the contents of wal_buffers. 
 
 Oh, I see.
 
 If a slave falls behind, how does it catch up? 

That is the right question.

 I guess you're saying 
 that it can't fall behind, because the master will block before that 
 happens. Also in asynchronous replication? 

Yes, it can fall behind in async mode. sysadmin must not let it.

 And what about when the slave 
 is first set up, and needs to catch up with the master?

We need an initial joining mode while they match speed. We must allow
for the case where the standby has been recycled, or the network has
been down for a medium-long period of time.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Pavel Stehule
2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]:
 Hello

 2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]:
  On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
   Thanks for the review.
  
  The standard specifies that non-recursive WITH should be evaluated
  once.
  
    What shall we do? I don't think there's an easy way to fix this. Maybe
    we should not allow WITH clause without RECURSIVE?
 
  My interpretation of 7.13: General Rules: 2.b is that it should be
  single evaluation, even if RECURSIVE is present.
 
  The previous discussion was here:
 
  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
 

Maybe I am blind, but I didn't find any reason why materialisation isn't usable.

Regards
Pavel

  The important arguments in the thread seemed to be:
 
  1. People will generally expect single evaluation, so might be
  disappointed if they can't use this feature for that purpose.
 
  2. It's a spec violation in the case of volatile functions.
 
  3. I think this is a must fix because of the point about volatile
  functions --- changing it later will result in user-visible semantics
  changes, so we have to get it right the first time.
 
  I don't entirely agree with #3. It is user-visible, but only in the
  sense that someone is depending on undocumented multiple-evaluation
  behavior.
 
  Tom Lane said that multiple evaluation is grounds for rejection:
  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php
 
  Is there hope of correcting this before November?
 
  According to Tom, to implement single evaluation we need to make big
  infrastructure enhancement which is likely slip the schedule for 8.4
  release which Tom does not want.

 why? why don't use a materialisation?

 See:
 http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

 
  So as long as Tom and other people think that is a must fix, there
  seems no hope probably.
 
  Anyway I will continue to work on existing patches...
  --

 I would to see your patch in core early. I am working on grouping sets
 and I cannot finish my patch before your patch will be commited.

 Regards
 Pavel Stehule

  Tatsuo Ishii
  SRA OSS, Inc. Japan
 
   I will try to fix this. However detecting the query being not a
   non-linear one is not so easy.
 
  If we don't allow mutual recursion, the only kind of non-linear
  recursion that might exist would be multiple references to the same
  recursive query name in a recursive query, is that correct?
 
* DISTINCT should suppress duplicates:
   
  with recursive foo(i) as
(select distinct * from (values(1),(2)) t
union all
select distinct i+1 from foo where i < 10)
  select * from foo;
   
  This outputs a lot of duplicates, but they should be suppressed
according to the standard. This query is essentially the same as
supporting UNION for recursive queries, so we should either fix both 
for
8.4 or block both for consistency.
  
   I'm not sure if it's possible to fix this. Will look into.
  
 
  Can't we just reject queries with top-level DISTINCT, similar to how
  UNION is rejected?
 
* outer joins on a recursive reference should be blocked:
   
  with recursive foo(i) as
(values(1)
union all
select i+1 from foo left join (values(1)) t on (i=column1))
  select * from foo;
   
  Causes an infinite loop, but the standard says using an outer join
  in this situation should be prohibited. This should be fixed for 
8.4.
  
   Not an issue, I think.
 
  Agreed, Andrew Gierth corrected me here.
 
  Regards,
Jeff Davis
 
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Robert Haas
  My interpretation of 7.13: General Rules: 2.b is that it should be
  single evaluation, even if RECURSIVE is present.
 
  The previous discussion was here:
 
  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

 I am  blind, I didn't find any reason, why materialisation isn't useable.

I believe it's because of these two (closely related) problems:

# The basic
# implementation clearly ought to be to dump the result of the subquery
# into a tuplestore and then have the upper level read out from that.
# However, we don't have any infrastructure for having multiple
# upper-level RTEs reference the same tuplestore.  (Perhaps the InitPlan
# infrastructure could be enhanced in that direction, but it's not ready
# for non-scalar outputs today.)  Also, I think we'd have to teach
# tuplestore how to support multiple readout cursors.  For example,
# consider
#   WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ...
# If the planner chooses to do the join as a nested loop then each
# Scan node needs to keep track of its own place in the tuplestore,
# concurrently with the other node having a different place.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-09 Thread Hitoshi Harada
 Also, the current implementation has only one type of plan, which uses a sort
 operation. It should be optimized by repositioning the windows and/or
 using a hash table.

 I would like to see some performance test results also. It would be good
 to know whether they are fast/slow etc.. It will definitely help the
 case for inclusion if they are faster than alternative multi-statement
 approaches to solving the basic data access problems.


Just for the record, I attach the results of the tests I ran today. As you can
see, the current window function implementation is faster than the
sort-based self-join and slower than the hashagg-based self-join.

This test was run on Red Hat Linux ES3, Xeon 2.13 GHz, with 100,000 rows of 2
integer columns. I wrote a simple Perl script that invokes psql from the shell,
so the numbers include that invocation overhead.


test0    test1    test2    test3     test4     test5

689.502  416.633  257.970  1195.294  954.318   1204.292
687.254  447.676  256.629  1075.342  949.711   1154.754
700.602  421.818  260.742  1105.680  926.462   1203.012
736.594  476.388  334.310  1157.818  978.861   1199.944
676.572  418.782  270.270  1060.900  909.474   1175.079
687.260  428.564  257.032  1069.013  1045.387  1275.988
700.252  429.289  263.216  1074.749  1018.968  1273.965
719.478  445.218  258.464  1087.932  1015.744  1273.637
694.865  453.737  261.286  1065.229  1039.941  1262.208
685.756  430.169  258.017  1124.795  1102.055  1297.603

697.81   436.83   267.79   1101.68   994.09    1232.05

test0   SELECT sum(amount) OVER (PARTITION BY sector) FROM bench1;
test1   SELECT amount FROM bench1 ORDER BY sector;
test2   SELECT sum(amount) FROM bench1 GROUP BY sector;
test3   SELECT id, amount - avg(amount) OVER (PARTITION BY sector) FROM bench1;
test4   SELECT id, amount - avg FROM bench1 INNER JOIN (SELECT sector,
        avg(amount) FROM bench1 GROUP BY sector) t USING (sector);
test5   SET enable_hashagg TO off; SELECT id, amount - avg FROM bench1
        INNER JOIN (SELECT sector, avg(amount) FROM bench1 GROUP BY sector) t
        USING (sector);

I'll include this test in my docs later.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Pavel Stehule
2008/9/9 Robert Haas [EMAIL PROTECTED]:
  My interpretation of 7.13: General Rules: 2.b is that it should be
  single evaluation, even if RECURSIVE is present.
 
  The previous discussion was here:
 
  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

 I am  blind, I didn't find any reason, why materialisation isn't useable.

 I believe it's because of these two (closely related) problems:

 # The basic
 # implementation clearly ought to be to dump the result of the subquery
 # into a tuplestore and then have the upper level read out from that.
 # However, we don't have any infrastructure for having multiple
 # upper-level RTEs reference the same tuplestore.  (Perhaps the InitPlan
 # infrastructure could be enhanced in that direction, but it's not ready
 # for non-scalar outputs today.)  Also, I think we'd have to teach
 # tuplestore how to support multiple readout cursors.  For example,
 # consider
 #   WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ...
 # If the planner chooses to do the join as a nested loop then each
 # Scan node needs to keep track of its own place in the tuplestore,
 # concurrently with the other node having a different place.


Hmm. I have to solve a similar problem in grouping sets :( etc.

SELECT ... FROM ... GROUP BY GROUPING SETS (a, b)

is almost the same as

WITH foo AS (SELECT ... FROM ...) SELECT ... FROM foo GROUP BY a UNION ALL
SELECT ... FROM foo GROUP BY b;
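
To make the analogy concrete, one hypothetical reading (the table and column
names are invented here, and GROUPING SETS itself is part of the pending
grouping sets patch, not of current sources):

  SELECT a, b, sum(x) FROM t GROUP BY GROUPING SETS ((a), (b));
  -- would behave roughly like:
  WITH foo AS (SELECT a, b, x FROM t)
  SELECT a, NULL AS b, sum(x) FROM foo GROUP BY a
  UNION ALL
  SELECT NULL AS a, b, sum(x) FROM foo GROUP BY b;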

Regards
Pavel





 ...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-09 Thread Peter Eisentraut

Tom Lane wrote:

Greg Stark [EMAIL PROTECTED] writes:

On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane [EMAIL PROTECTED] wrote:

But of course case insensitivity isn't going to fix that example for you.
So we're right back at the question of where we should draw the line in
trying to accept variant input.



Well it's not a perfect precedent but for example, dd accepts:



G   (2^30)
M   (2^20)
k   (2^10)
K   (2^10)
Kb  (10^3)
MB  (10^6)
GB  (10^9)
b   (512)


Hmm.  I could get behind a proposal to allow single-letter abbreviations
if it could be made to work across the board,


The SQL standard actually specifies something about that.  You can 
define the length of large object types (CLOB and BLOB) with multipliers 
K, M, and G, as in


CREATE TABLE foo ( bar BLOB(5 M) );

These multipliers are case insensitive, of course.  (And they are
1024-based, FWIW.)


So I could imagine that we generalize this approach to make these 
multipliers available in other positions.


This would have definitional problems of its own, however.  If you 
interpret K, M, and G strictly as unit-less multipliers, then


SET shared_buffers = 2 G

would mean

SET shared_buffers = 2 * 1073741824

meaning

SET shared_buffers = 2147483648

which is not the same thing as the current

SET shared_buffers = '2GB'

This also affects the solution to another GUC units complaint that the 
quotes are annoying, which I support.
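
As a small illustration of that complaint (using work_mem here simply because
it is a parameter that can be changed with SET):

  SET work_mem = '64MB';   -- accepted
  SET work_mem = 64MB;     -- rejected; in SET the unit has to sit inside quotes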


We could possibly settle some of these arguments if we could redefine 
all memory parameters to use one byte as base unit, and then allow some 
ambiguity and unit omission from there.  But that would probably cause 
much havoc, so we are stuck with a certain degree of inconsistency anyhow.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Jeff Davis
On Tue, 2008-09-09 at 18:51 +0200, Pavel Stehule wrote:
 hmm. I solve similar problem in grouping sets :( etc
 

How did you solve it?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Pavel Stehule
2008/9/9 Jeff Davis [EMAIL PROTECTED]:
 On Tue, 2008-09-09 at 18:51 +0200, Pavel Stehule wrote:
 hmm. I solve similar problem in grouping sets :( etc


I have a special executor node - feeder - which holds one tuple, and other
nodes read from it. It's usable for hash aggregates.

Pavel

The plan looks something like:

grouping sets
  --> seq scan ...
  --> hash agg
        --> feeder
  --> hash agg
        --> feeder




 How did you solve it?

 Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_regress inputdir

2008-09-09 Thread Alvaro Herrera
Peter Eisentraut wrote:

 There is interest among packagers to run the regression tests or other  
 tests after the build.  The Red Hat RPMs have shipped a postgresql-test  
 package for years with a hacked-up makefile that will probably overwrite  
 random files that it shouldn't in /usr/lib.  So I would rather be in  
 favor of coming up with a solution that would make this work rather than  
 removing the options.  The solution would probably be adding another  
 option to place the generated files, but the exact behavior would need  
 to be worked out.

Hmm.  I took a look at the RPM makefiles and patches, and it doesn't
seem like changing this part of pg_regress would solve anything.  The
RPM changes are about shared libraries, whereas this is just about
moving the generated files (equivalent to those in the sql directory).

For an example of the hacked-up makefiles and stuff, see here:
https://projects.commandprompt.com/public/pgcore/browser/rpm/redhat/8.3/postgresql/F-9/

The relevant files are Makefile.regress, postgresql-test.patch, and
postgresql-8.3.spec (lines 440ff).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Potential Join Performance Issue

2008-09-09 Thread Lawrence, Ramon
PostgreSQL development community:

 

Our research group has been using the PostgreSQL code base to test new
join algorithms.  During testing, we noticed that the planner is not
pushing down projections to the outer relation in a hash join.  Although
this makes sense for in-memory (1 batch) joins, for joins larger than
memory (such as for TPC-H DSS), this causes the system to perform
significantly more disk I/Os when reading/writing batches of the outer
relation.

 

A simple solution is to add a single line of code to
src/backend/optimizer/plan/createplan.c after line 1771:

 

disuse_physical_tlist(outer_plan, best_path->jpath.outerjoinpath);

 

This will always force the projection on the outer relation.

 

A more complicated modification alternative is to add a state variable
to allow the planner to know how many batches the hash join expects and
only push down the projection if it is greater than one.  However,
pushing the projection on the outer relation is almost always the best
choice as it eliminates unneeded attributes for operators above the hash
join in the plan and will be robust in the case of poor estimates.

 

We have been testing using TPC-H scale factor 1 GB.  A sample query that
demonstrates the behavior is:

 

SELECT c_custkey, c_name, o_orderkey, o_orderdate

FROM Customer, Orders

WHERE c_custkey = o_custkey

 

Note that EXPLAIN on this query will indicate that the projection is
performed on the outer relation even though it is not done.  We found
the difference by modifying our code to track tuples and bytes output to
disk, but it also can be detected by watching the size of the temporary
files produced during the join.

 

Sincerely,

 

Dr. Ramon Lawrence

Assistant Professor, Department of Computer Science, University of
British Columbia Okanagan

http://people.ok.ubc.ca/rlawrenc/

E-mail: [EMAIL PROTECTED]

 



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Zeugswetter Andreas OSB sIT

  Don't understand. I am referring to the logic at the top of
  AdvanceXLInsertBuffer(). We would need to wait for all people reading
  the contents of wal_buffers.

 Oh, I see.

 If a slave falls behind, how does it catch up? I guess you're saying
 that it can't fall behind, because the master will block before that
 happens. Also in asynchronous replication? And what about
 when the slave
 is first set up, and needs to catch up with the master?

I think the WAL sender needs the ability to read the WAL files directly.
In cases where it falls behind, or has just started, it needs to be able to
catch up. So it seems we either need to copy the WAL buffer into local memory
before sending, or lock the WAL buffer until the send has finished.
Useful network timeouts are in the >= 5-10 sec range (even for a GbE LAN), so I
don't think locking WAL buffers is feasible. Thus the WAL sender needs to copy
(the needed portion of) the current WAL buffer before sending, or use an async
send that returns immediately once the buffer is copied into the network stack.

When the WAL sender is ready to continue, it either still finds the next WAL
buffer (or the rest of the current buffer), or it needs to fall back to Plan B
and read the WAL files again. A sync client could still wait for the replicate,
even if local WAL has already advanced massively. The checkpointer would need
the LSN info from the WAL senders so as not to recycle any still-needed WAL
files, although in that case it might be time to declare the replicate broken.

Ideally the WAL sender also knows whether a client is waiting, so it can decide
to send part of a buffer. The WAL sender should wake and act whenever a network
packet's worth of WAL is ready, regardless of commits; whatever send size seems
appropriate here (might be one WAL page). The WAL sender should only need to
expect a response when it has sent a commit record, ideally only if a client is
waiting (and at least once in a while, for every log switch).

All in all a useful streamer seems like a lot of work.

Andreas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: plan invalidation vs stored procedures

2008-09-09 Thread Tom Lane
Martin Pihlak [EMAIL PROTECTED] writes:
 Yes, creating a new message type was a bit short sighted -- attached is a 
 patch
 that uses syscache invalidation messages instead. This also adds additional
 tupleId field to SharedInvalCatcacheMsg. This is used to identify the
 invalidated tuple in PROC messages, for now others still pass InvalidOid.

Applied after rather heavy revision.  Aside from the gripes I had
yesterday, I found out on closer inspection that the patch did things
all wrong for the case of a not-fully-planned cache item.  I ended up
discarding the existing code for that and instead using the planner
machinery to extract dependencies of a parsed querytree.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Keeping creation time of objects

2008-09-09 Thread Devrim GÜNDÜZ
Hi,

AFAICS, PostgreSQL does not keep information about when a table, database,
sequence, etc. was created. We cannot get that info even from the OS, since
CLUSTER or VACUUM FULL may change the metadata of the corresponding
relfilenode.

Does anyone think that adding a timestamp column to pg_class would add
overhead? To me, it looks fairly easy to set that value while executing
CREATE XXX, but does anyone see a corner case?

Regards, 
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-09 Thread Robert Haas
 This would have definitional problems of its own, however.  If you interpret
 K, M, and G strictly as unit-less multipliers, then

 SET shared_buffers = 2 G

I don't think it would be a good idea to make them unit-less, for
exactly the reasons you mention.

 We could possibly settle some of these arguments if we could redefine all
 memory parameters to use one byte as base unit, and then allow some
 ambiguity and unit omission from there.  But that would probably cause much
 havoc, so we are stuck with a certain degree of inconsistency anyhow.

A good start might be to always OUTPUT memory parameters using the
same base unit.

portal=# show shared_buffers;
 shared_buffers

 24MB
(1 row)

portal=# show temp_buffers;
 temp_buffers
--
 1024
(1 row)

Kilobytes seems like the most reasonable choice, because we definitely
have variables where you would want to set a value less than 1
megabyte, and I doubt we have (or will ever need) any where the
granularity is finer than 1 kilobyte.

Beyond that, how about moving in the direction of deprecating
unit-less settings altogether?  In other words, if you want 1024
shared buffers, you should be saying 8192kB or 8MB rather than 1024.
We could issue a WARNING for 8.4 and eventually move to rejecting that
syntax altogether.  That gets everything into the same base unit
without ever changing the semantics of any particular value.
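
For example (temp_buffers is used here because it can be changed with SET; its
implicit unit is one 8 kB buffer when no unit is given, so the two forms below
mean quite different things):

  SET temp_buffers = 1024;       -- 1024 buffers, i.e. 8MB
  SET temp_buffers = '1024kB';   -- 1MB, i.e. 128 buffers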

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Tom Lane
Devrim =?ISO-8859-1?Q?G=DCND=DCZ?= [EMAIL PROTECTED] writes:
 AFAICS, PostgreSQL is not keeping info about when a table, database,
 sequence, etc was created. We cannot get that info even from OS, since
 CLUSTER or VACUUM FULL may change the metadata of corresponding
 relfilenode.

 Does anyone think that adding a timestamp column to pg_class would bring
 an overhead?

There isn't sufficient support for such a feature.  In any case, why
would creation time (as opposed to any other time, eg last schema
modification, last data modification, yadda yadda) be especially
significant?  Would you expect it to be preserved over dump/restore?
How about every other object type in the system?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread daveg
On Tue, Sep 09, 2008 at 03:36:19PM -0400, Tom Lane wrote:
 Devrim =?ISO-8859-1?Q?G=DCND=DCZ?= [EMAIL PROTECTED] writes:
  AFAICS, PostgreSQL is not keeping info about when a table, database,
  sequence, etc was created. We cannot get that info even from OS, since
  CLUSTER or VACUUM FULL may change the metadata of corresponding
  relfilenode.
 
  Does anyone think that adding a timestamp column to pg_class would bring
  an overhead?
 
 There isn't sufficient support for such a feature.  In any case, why
 would creation time (as opposed to any other time, eg last schema
 modification, last data modification, yadda yadda) be especially
 significant?  Would you expect it to be preserved over dump/restore?
 How about every other object type in the system?

I'd be very interested in seeing a last schema modification time for pg_class
objects. I don't care about it being preserved over dump and restore as my
use case is more to find out when a table was created with a view to finding
out if it is still needed. So the question I'm looking to answer is "when did
that get here?"

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Devrim GÜNDÜZ
On Tue, 2008-09-09 at 15:36 -0400, Tom Lane wrote:
 why would creation time (as opposed to any other time, eg last schema
 modification, last data modification, yadda yadda) be especially
 significant? 

Hmm, those would be cool, too.

Seriously, I believe we can get the last data modification time from the
filesystem (if it keeps it, of course), but we cannot get the creation time --
that's why I am talking about the creation time.

It would be useful when a DBA is not sure whether (s)he created the object at
a known time, or whether it was restored from backups correctly.

 Would you expect it to be preserved over dump/restore?

No. If we are talking about creation time, then it means we should not
preserve it, IMHO.

 How about every other object type in the system?

I'm talking about every object.

Regards,
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Robert Haas
 There isn't sufficient support for such a feature.

It sounds like a useful feature to me.

 In any case, why
 would creation time (as opposed to any other time, eg last schema
 modification, last data modification, yadda yadda) be especially
 significant?

Those would be nice to have too, but last data modification is
doubtless too expensive to compute and keep up to date.

 Would you expect it to be preserved over dump/restore?

Definitely not.  Then it wouldn't really be the creation time, would it?

 How about every other object type in the system?

Good idea.  I wouldn't bother for things that are intended to be
ephemeral, but having this for, say, functions, would be nice.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Hannu Krosing
On Tue, 2008-09-09 at 12:40 -0700, daveg wrote:

 I'd be very interested in seeing a last schema modification time for pg_class
 objects. I don't care about it being preserved over dump and restore as my
 use case is more to find out when a table was created with a view to finding
 out if it is still needed.

Isn't it easier to find out if it is still needed by looking if it is
still used, say from pg_stat_user_tables ?
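
For example, a rough check along these lines (it assumes the stats collector
is on, and that idx_scan can be NULL for tables without indexes):

  SELECT relname
  FROM pg_stat_user_tables
  WHERE seq_scan = 0 AND COALESCE(idx_scan, 0) = 0;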

-
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-09 Thread Tom Lane
Robert Haas [EMAIL PROTECTED] writes:
 I am  blind, I didn't find any reason, why materialisation isn't useable.

 I believe it's because of these two (closely related) problems:

 # The basic
 # implementation clearly ought to be to dump the result of the subquery
 # into a tuplestore and then have the upper level read out from that.
 # However, we don't have any infrastructure for having multiple
 # upper-level RTEs reference the same tuplestore.  (Perhaps the InitPlan
 # infrastructure could be enhanced in that direction, but it's not ready
 # for non-scalar outputs today.)  Also, I think we'd have to teach
 # tuplestore how to support multiple readout cursors.  For example,
 # consider
 # WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ...
 # If the planner chooses to do the join as a nested loop then each
 # Scan node needs to keep track of its own place in the tuplestore,
 # concurrently with the other node having a different place.

The amount of new code needed for that seems a pittance compared to the
size of the patch already, so I'm not seeing why Tatsuo-san considers
it infeasible.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread daveg
On Tue, Sep 09, 2008 at 11:03:56PM +0300, Hannu Krosing wrote:
 On Tue, 2008-09-09 at 12:40 -0700, daveg wrote:
 
  I'd be very interested in seeing a last schema modification time for 
  pg_class
  objects. I don't care about it being preserved over dump and restore as my
  use case is more to find out when a table was created with a view to finding
  out if it is still needed.
 
 Isn't it easier to find out if it is still needed by looking if it is
 still used, say from pg_stat_user_tables ?

Except that pg_dump will access it and make it look used. Also, this does
not work for functions, views etc.

It seems to me to be pretty simple to put an abstime or timestamp column
on the major catalog tables and update it when the row is updated. A mod
time is probably more useful than a create time.

-dg
 

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Andrew Dunstan



daveg wrote:

On Tue, Sep 09, 2008 at 11:03:56PM +0300, Hannu Krosing wrote:
  

On Tue, 2008-09-09 at 12:40 -0700, daveg wrote:



I'd be very interested in seeing a last schema modification time for pg_class
objects. I don't care about it being preserved over dump and restore as my
use case is more to find out when a table was created with a view to finding
out if it is still needed.
  

Isn't it easier to find out if it is still needed by looking if it is
still used, say from pg_stat_user_tables ?



Except that pg_dump will access it and make it look used. Also, this does
not work for functions, views etc.

It seems to me to be pretty simple to put an abstime or timestamp column
on the major catalog tables and update it when the row is updated. A mod
time is more useful probably than a create time.


  


I must say I'm suspicious of this whole proposal. It looks a whole lot 
like data creeping into metadata.


We already have the ability to log just DDL statements, although that's 
somewhat incomplete in that it doesn't track DDL performed by functions.
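
For instance, a superuser can turn that facility on with plain SQL
(cluster-wide it would normally be set in postgresql.conf instead):

  SET log_statement = 'ddl';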


Can someone please give a good, concrete use case for this stuff? "Might 
be nice to have" doesn't cut it, I'm afraid. In particular, I'd like to 
know why logging statements won't do the trick here.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Devrim GÜNDÜZ
Hi Andrew,

On Tue, 2008-09-09 at 16:22 -0400, Andrew Dunstan wrote:
 I'd like to know why logging statements won't do the trick here.

It is not on by default, logs are rotated, and may be lost, etc.

Regards,
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-09 Thread Hannu Krosing
On Thu, 2008-09-04 at 11:51 -0400, Andrew Sullivan wrote:
 On Thu, Sep 04, 2008 at 07:01:18AM -0700, Steve Atkins wrote:
  Settings in postgresql.conf are currently case-insensitive. Except
  for the units.
 
 And, of course, filenames when you are using a case-sensitive
 filesystem.  Because these are things that are defined by some
 convention other than the ones the PGDG made up.  Since units fall
 into that category, it seems to me that we're stuck with using
 external conventions.
 
  one right now. If the answer to that is something along the lines
  of we don't support megaabits for shared_buffers, and never will because
  nobody in their right mind would ever intend to use megabits
  to set their shared buffer size... that's a useful datapoint when
  it comes to designing for usability.
 
 And you are going to establish this worldwide convention on what
 someone in right mind would do how, exactly?  For instance, I think
 nobody in right mind would use KB to mean kilobytes.  

Except those following JEDEC standards ;)

from http://en.wikipedia.org/wiki/JEDEC_memory_standards

---
JEDEC Standard 100B.01[1] defines the prefix to units of semiconductor
storage capacity as follows:

  * kilo (K): A multiplier equal to 1024 (2^10).
  * mega (M): A multiplier equal to 1 048 576 (2^20 or K^2, where K =
1024).
  * giga (G): A multiplier equal to 1 073 741 824 (2^30 or K^3, where
K = 1024).
---

I'm not sure if this applies only to storage _capacity_ or also to stuff
stored using said capacity.


but in general I think it is easier to train millions of DBAs to use kB
than to achieve consensus about what everybody assumes on this list,
so I also give +1 to working on a config checker instead.

--
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-09 Thread Greg Smith

On Tue, 9 Sep 2008, Robert Haas wrote:


A good start might be to always OUTPUT memory parameters using the
same base unit.


SHOW gives output that matches what you input.  If you want to see things 
with consistent units, look at pg_settings:


# select name,unit,setting,current_setting(name) from pg_settings where 
name='shared_buffers' or name='temp_buffers';


      name      | unit | setting | current_setting
----------------+------+---------+-----------------
 shared_buffers | 8kB  | 131072  | 1GB
 temp_buffers   | 8kB  | 1024    | 1024


Beyond that, how about moving in the direction of deprecating
unit-less settings altogether?  In other words, if you want 1024
shared buffers, you should be saying 8192kB or 8MB rather than 1024.
We could issue a WARNING for 8.4 and eventually move to rejecting that
syntax altogether.  That gets everything into the same base unit
without ever changing the semantics of any particular value.


Deprecating setting the value directly in its base unit, so that all the 
memory parameters are specified as a number of bytes, would sidestep a lot 
of the issues people complain about.  What I would like to see (but don't 
have nearly enough time to argue in support of considering the resistance 
to change here) is that this syntax:


shared_buffers=1024

Would assume the user meant 1024 *bytes*, with the server silently 
rounding that up to the nearest 8k block.  Then the whole issue of "do 
they mean bits or bytes?" goes away, because no one would ever have to 
include the B.  That paves the way for making it easy to support 
case-insensitive values without pedantic confusion.  As I see it these 
should all give you the same thing:


shared_buffers=16777216
shared_buffers=16384k
shared_buffers=16384K
shared_buffers=16M
shared_buffers=16m

Because that's what would make life easiest for those configuring the 
software.
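
As a minimal sketch of that behaviour (purely illustrative, not the actual 
GUC parsing code; the function name is made up), a bare number is taken as 
bytes and the unit letter is matched case-insensitively:

#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative only: parse a memory setting and return bytes, or -1 on an
 * unrecognised unit.  A bare number means bytes; the first letter after the
 * digits selects the multiplier, case-insensitively; anything after it (such
 * as a trailing "B" or "b") is simply ignored in this sketch. */
static int64_t
parse_memory_bytes(const char *value)
{
	char	   *end;
	int64_t		n = strtoll(value, &end, 10);

	switch (tolower((unsigned char) *end))
	{
		case '\0':
			return n;					/* bare number: bytes */
		case 'k':
			return n * 1024;
		case 'm':
			return n * 1024 * 1024;
		case 'g':
			return n * 1024 * 1024 * 1024;
		default:
			return -1;
	}
}

With that, 16777216, 16384k, 16384K, 16M and 16m all come out as the same 
number of bytes.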


Since this suggestion will inevitably lead to cries of befouled backward 
compatibility, what I've been doing instead of participating in this 
thread is working on a tool that will make it easy to convert old 
postgresql.conf files to support a new version.  Then the tool can convert 
all the places someone uses the old syntax into the new.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread David Fetter
On Tue, Sep 09, 2008 at 10:20:00PM +0300, Devrim GUNDUZ wrote:
 Hi,
 
 AFAICS, PostgreSQL is not keeping info about when a table, database,
 sequence, etc was created.  We cannot get that info even from OS,
 since CLUSTER or VACUUM FULL may change the metadata of
 corresponding relfilenode.

When people aren't keeping track of their DDL, that is very strictly a
process problem on their end.  When people are shooting themselves in
the foot, it's a great disservice to market Kevlar shoes to them.

 Does anyone think that adding a timestamp column to pg_class would
 bring an overhead?  For me, it looks a bit easy to add that value
 while calling CREATE XXX, but does anyone see a corner case?

As above, I am making a case for never attempting any such thing,
and instead helping people understand that a casual attitude about
their DDL will result in cascading--usually catastrophic--failures.

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



[HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-09 Thread Tom Lane
Over in that TPC-H thread, I was bemoaning once again the never-finished
support for SQL-spec interval literals.  I decided to go look at exactly
how unfinished it was, and it turns out that it's actually pretty close.
Hence the attached proposed patch ;-)

The main gating factor is that coerce_type doesn't want to pass typmod
through to the datatype input function when converting a literal
constant.  This is necessary for certain types like char and varchar
but loses badly for interval.  I have a feeling that we changed that
behavior after Tom Lockhart left the project, which may mean that
interval wasn't quite as broken when he left it as it is today.
Anyway, the attached patch simply hardwires a special case for INTERVAL.
Given that this is reflective of a special case in the standard, and
that there's no very good reason for anyone else to design a datatype
that acts this way, I don't feel too bad about such a hack; but has
anyone got a better idea?

After that it's just a matter of getting DecodeInterval to do the
right things; and it turns out that about half the logic for SQL-spec
input syntax was there already.  Almost the only thing I had to change
was the code to decide what a plain integer field at the right end of
the input means.

The patch includes regression test changes that illustrate what it does.
I am not sure about some of the corner cases --- anyone want to see if
their understanding of the spec for interval string is different?
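
For illustration, these are the kinds of SQL-spec literals the patch is aimed 
at (the expected results in the comments reflect a reading of the spec; the 
regression tests are the authority on the corner cases):

SELECT INTERVAL '1 2:03:04' DAY TO SECOND;   -- expected: 1 day 02:03:04
SELECT INTERVAL '2:03' HOUR TO MINUTE;       -- expected: 02:03:00
SELECT INTERVAL '5' MINUTE;                  -- expected: 00:05:00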

There is still some unfinished business if anyone wants to make it
really exactly 100% spec compliant.  In particular the spec seems to
allow a minus sign *outside* the string literal, and if I'm reading it
right, a precision spec in combination with field restrictions ought to
look like INTERVAL '...' DAY TO SECOND(3) not INTERVAL(3) '...' DAY TO
SECOND.  However, for these you'll get a syntax error instead of
silently wrong answers if you try to use the other syntax, so it's not
quite as pernicious as the matters addressed here.

regards, tom lane

Index: src/backend/parser/parse_coerce.c
===
RCS file: /cvsroot/pgsql/src/backend/parser/parse_coerce.c,v
retrieving revision 2.166
diff -c -r2.166 parse_coerce.c
*** src/backend/parser/parse_coerce.c   1 Sep 2008 20:42:44 -   2.166
--- src/backend/parser/parse_coerce.c   9 Sep 2008 23:47:59 -
***************
*** 179,184 ****
--- 179,185 ----
  		Const	   *newcon = makeNode(Const);
  		Oid			baseTypeId;
  		int32		baseTypeMod;
+ 		int32		inputTypeMod;
  		Type		targetType;
  		ParseCallbackState pcbstate;
  
***************
*** 190,202 ****
  		 * what we want here.  The needed check will be applied properly
  		 * inside coerce_to_domain().
  		 */
! 		baseTypeMod = -1;
  		baseTypeId = getBaseTypeAndTypmod(targetTypeId, &baseTypeMod);
  
  		targetType = typeidType(baseTypeId);
  
  		newcon->consttype = baseTypeId;
! 		newcon->consttypmod = -1;
  		newcon->constlen = typeLen(targetType);
  		newcon->constbyval = typeByVal(targetType);
  		newcon->constisnull = con->constisnull;
--- 191,217 ----
  		 * what we want here.  The needed check will be applied properly
  		 * inside coerce_to_domain().
  		 */
! 		baseTypeMod = targetTypeMod;
  		baseTypeId = getBaseTypeAndTypmod(targetTypeId, &baseTypeMod);
  
+ 		/*
+ 		 * For most types we pass typmod -1 to the input routine, because
+ 		 * existing input routines follow implicit-coercion semantics for
+ 		 * length checks, which is not always what we want here.  Any length
+ 		 * constraint will be applied later by our caller.  An exception
+ 		 * however is the INTERVAL type, for which we *must* pass the typmod
+ 		 * or it won't be able to obey the bizarre SQL-spec input rules.
+ 		 * (Ugly as sin, but so is this part of the spec...)
+ 		 */
+ 		if (baseTypeId == INTERVALOID)
+ 			inputTypeMod = baseTypeMod;
+ 		else
+ 			inputTypeMod = -1;
+ 
  		targetType = typeidType(baseTypeId);
  
  		newcon->consttype = baseTypeId;
! 		newcon->consttypmod = inputTypeMod;
  		newcon->constlen = typeLen(targetType);
  		newcon->constbyval = typeByVal(targetType);
  		newcon->constisnull = con->constisnull;
***************
*** 215,234 ****
  		setup_parser_errposition_callback(&pcbstate, pstate, con->location);
  
  		/*
- 		 * We pass typmod -1 to the input routine, primarily because existing
- 

Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Gregory Williamson
Andrew Dunstan wrote:
...
 
 Can someone please give a good, concrete use case for this stuff? "Might 
 be nice to have" doesn't cut it, I'm afraid. In particular, I'd like to 
 know why logging statements won't do the trick here.
 

Please pardon the kibbitzer intrusion ... 

Informix has this feature and I've often yearned for it in PostgreSQL (although 
it is low on my personal priorities). The typical use case I've run into is working 
on legacy databases where the original DBA is gone or senile (a self-deprecating 
reference, not to be applied to anyone on this list) and I need to make sense 
of a muddle of similarly named tables or functions with the same structure but 
different row counts or variant codings. The logs have long since been offlined 
to gosh knows where, or lost -- we're talking 5 or more years of activity -- and 
even the scripts may be suspect (the checked-in script might refer to an original 
table but the DBA made on-the-fly changes), or some other DBA-like creature did 
things without proper procedures being followed.

Having that date has been critical to resolving questions of which table came 
in which order. It also gives a time window to use when checking old emails, 
archives, etc. for more information. 

Last update of data seems prohibitively expensive; if a user wants that, a 
trigger and a second table could well do it. Last DDL modification ... I could 
see the use, but my old workhorse doesn't offer it, so it never occurred to me 
to want it. Until now. ;-)

This request is adding metadata, I agree. But with my vague understanding, 
adding a date or timestamp for table creation wouldn't be much bloat, and since 
it would only be written at creation it seems low overhead. 

But maybe only bad DBAs need it. Or good DBAs who inherit systems from bad 
ones?

Sorry for the crufty posting -- my web client has recently deteriorated in 
terms of message formatting.

Greg Williamson
Senior DBA
DigitalGlobe

Confidentiality Notice: This e-mail message, including any attachments, is for 
the sole use of the intended recipient(s) and may contain confidential and 
privileged information and must be protected in accordance with those 
provisions. Any unauthorized review, use, disclosure or distribution is 
prohibited. If you are not the intended recipient, please contact the sender by 
reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-09 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 What I would like to see (but don't 
 have nearly enough time to argue in support of considering the resistance 
 to change here) is that this syntax:

 shared_buffers=1024

 Would assume the user meant 1024 *bytes*, with the server silently 
 rounding that up to the nearest 8k block.  Then the whole issue of "do 
 they mean bits or bytes?" goes away, because no one would ever have to 
 include the B.

How do you come to that conclusion?  Leaving off the unit entirely
certainly doesn't make the user's intent clearer.

There's also a pretty serious compatibility problem, which is that
settings that had always worked before would suddenly be completely
broken (off by a factor of 8192 qualifies as broken in my book).

I think that if we wanted to change anything here, we'd have to
*require* a unit spec on unit-affected parameters, at least for a period
of several releases.  Otherwise the confusion would be horrendous.

 That paves the way for making it easy to support 
 case-insensitive values without pedantic confusion.

Again, you're just making this up.  It doesn't make anything clearer.

regards, tom lane



Re: [HACKERS] [PATCHES] to_date() validation

2008-09-09 Thread Alex Hunsaker
On Mon, Sep 8, 2008 at 2:24 AM, Brendan Jurd [EMAIL PROTECTED] wrote:
 HEAD actually gets this one wrong; in defiance of the documentation it
 returns 2000-09-07.  So, it seems to me that the patch shifts the
 behaviour in the right direction.

 Barring actually teaching the code that some nodes (like ) can
 take an open-ended number of characters, while others (like MM) must
 take an exact number of characters, I'm not sure what can be done to
 improve this.  Perhaps something for a later patch?

Sounds good to me, and I would probably argue that things like MM should
not be hard coded to take only 2 chars...
But then again, to play devil's advocate, I can just as easily do things
like to_char(...) + '30 months'::interval;



Re: [HACKERS] pg_regress inputdir

2008-09-09 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Peter Eisentraut wrote:
 There is interest among packagers to run the regression tests or other  
 tests after the build.  The Red Hat RPMs have shipped a postgresql-test  
 package for years with a hacked-up makefile that will probably overwrite  
 random files that it shouldn't in /usr/lib.  So I would rather be in  
 favor of coming up with a solution that would make this work rather than  
 removing the options.  The solution would probably be adding another  
 option to place the generated files, but the exact behavior would need  
 to be worked out.

 Hmm.  I took a look at the RPM makefiles and patches, and it doesn't
 seem like changing this part of pg_regress would solve anything.

Well, it would be interesting if it were possible for an unprivileged
user to run postgresql-test.  That would mean arranging for the tests to
not write anything in the regression source directory, but only in some
user-private directory; ie, keeping the modifiable and non-modifiable
files separate.  Which I think is what Peter is getting at above.

However, at least for Red Hat I don't think I could use such a feature
if I had it :-(.  You'll note that Makefile.regress has to fool around
with SELinux labeling, which I think isn't possible for any old random
user.  That's not something that could be avoided if we had a pg_regress
that was careful about modifiable vs non-modifiable files, because the
restriction is actually enforced against the installed postgresql
binaries.

regards, tom lane



Re: [HACKERS] [PATCHES] to_date() validation

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 6:46 AM, Brendan Jurd [EMAIL PROTECTED] wrote:
 On Tue, Sep 9, 2008 at 9:04 PM, Brendan Jurd [EMAIL PROTECTED] wrote:
 On Tue, Sep 9, 2008 at 7:29 PM, Martijn van Oosterhout
 [EMAIL PROTECTED] wrote:
 The use of palloc/pfree in this routine seems excessive. Does len have
 upper bound? If so a simple array will do it.



Good catch Martijn!

 I suppose I could define a constant FORMATNODE_MAX_LEN, make it
 something like 10 and just use that for copying the string, rather
 than palloc().  I'll give it a try.


 Turns out there was already a relevant constant defined in
 formatting.c: DCH_MAX_ITEM_SIZ, set to 9.  So I just used that, +1 for
 the trailing null.

Cool.
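
A minimal, self-contained sketch of what that change amounts to (only
DCH_MAX_ITEM_SIZ is from formatting.c; the helper name and the defensive
clamp are illustrative):

#include <string.h>

#define DCH_MAX_ITEM_SIZ 9		/* longest format item, per formatting.c */

/* Copy one format node's worth of source characters into a caller-supplied
 * buffer of at least DCH_MAX_ITEM_SIZ + 1 bytes, instead of palloc/pfree. */
static void
copy_node_chars(char *dst, const char *src, int len)
{
	if (len > DCH_MAX_ITEM_SIZ)
		len = DCH_MAX_ITEM_SIZ;
	memcpy(dst, src, len);
	dst[len] = '\0';
}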


 Here you do not note if we didn't convert the entire string. So it
 seems you are allowed to provide too few characters, as long as it's
 not the last field?

 That's true.  The only way to hit that condition would be to provide a
 semi-bogus value in a string with no separators between the numbers.

 I've now plugged this hole.  I added the following error for
 fixed-width inputs that are too short:

 postgres=# SELECT to_date('200%1010', 'MMDD');
 ERROR:  invalid value for  in source string
 DETAIL:  Field requires 4 characters, but only 3 could be parsed.
 HINT:  If your source string is not fixed-width, try using the FM modifier.

I think that's a big improvement.

 I've attached a new version of the patch (version 3), as well as an
 incremental patch to show the differences between versions 2 and 3.

I looked it over, looks good to me!

 Cheers,
 BJ




Re: [HACKERS] Potential Join Performance Issue

2008-09-09 Thread Tom Lane
Lawrence, Ramon [EMAIL PROTECTED] writes:
 Our research group has been using the PostgreSQL code base to test new
 join algorithms.  During testing, we noticed that the planner is not
 pushing down projections to the outer relation in a hash join.  Although
 this makes sense for in-memory (1 batch) joins, for joins larger than
 memory (such as for TPC-H DSS), this causes the system to perform
 significantly more disk I/Os when reading/writing batches of the outer
 relation.

Hm.  The proposed patch seems a bit brute-force, since it loses the
benefit of the physical-tlist optimization even if the relations are
certainly too small to require batching.

 A more complicated modification alternative is to add a state variable
 to allow the planner to know how many batches the hash join expects and
 only push down the projection if it is greater than one.  However,
 pushing the projection on the outer relation is almost always the best
 choice as it eliminates unneeded attributes for operators above the hash
 join in the plan and will be robust in the case of poor estimates.

Nonetheless, I'm inclined to do it that way.  The "robust in the case of
poor estimates" argument doesn't convince me, because the incremental
cost isn't *that* large if we get it wrong; and the other argument is
just bogus because we don't do physical tlists at or above joins anyhow.
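
A purely illustrative sketch of the decision being described (none of these
names come from the planner code): keep the cheap physical tlist when the
hash join is expected to fit in one batch, and only project the outer
relation down to the needed columns when batching, and hence extra disk I/O,
is expected.

#include <stdbool.h>

/* Illustrative stand-in for the planner's estimate of hash batches. */
typedef struct HashJoinEstimate
{
	int			expected_batches;
} HashJoinEstimate;

/* Return true if the outer relation's projection should be pushed down. */
static bool
should_project_outer(const HashJoinEstimate *est)
{
	return est->expected_batches > 1;
}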

regards, tom lane



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-09 Thread Greg Smith

On Tue, 9 Sep 2008, Tom Lane wrote:


How do you come to that conclusion?  Leaving off the unit entirely
certainly doesn't make the user's intent clearer.


Same way I do all my conclusions in this area--talking to people in the 
field regularly who've never configured a postgresql.conf before.  I 
highly recommend it for a fresh view.


Here's how this works every time I go through it.  When you first 
encounter someone who is new to PostgreSQL, after they find out 
shared_buffers is a memory allocation they presume it's in bytes until 
they find out otherwise.  And then they're slightly annoyed that a) if 
they accidentally don't include a unit all the values get way bigger 
because of some backward compatibility nonsense they don't care about and 
b) that it's case sensitive.  Since some of the arguments against (b) (Mb 
could mean megabits!) diminish if the recommended practice is to just 
keep the multiplier in there, so the byte part of the unit is optional and 
not used by the default postgresql.conf, that seems the most reasonable 
way to proceed to me.


Unlike things like network speed where it's more complicated, memory is 
measured in bytes unless there's a different multiplier attached by most 
people.  Greg Stark already made this same observation yesterday: "But 
the point is that yes, people expect to type 100M or 1G and have that 
work. Plenty of us do it all the time with dd or other tools already."


I don't actually expect any adjustment here but was getting a little bored 
watching everyone replay 
http://archives.postgresql.org/pgsql-hackers/2006-07/msg01229.php with 
barely any changes from the first time.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-09 Thread Robert Haas
 A good start might be to always OUTPUT memory parameters using the
 same base unit.
 SHOW gives output that matches what you input.

Not for me it doesn't.

portal=# show temp_buffers;
 temp_buffers
--
 1024
(1 row)

portal=# set temp_buffers = '16MB';
SET
portal=# show temp_buffers;
 temp_buffers
--
 2048
(1 row)
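
The number SHOW prints is in the parameter's base unit (8kB buffers, per
pg_settings), so 2048 buffers * 8kB is the 16MB that was set.  A quick way
to check, as a sketch:

-- temp_buffers is stored in 8kB units; 2048 * 8 / 1024 = 16 (MB)
SELECT name, setting::bigint * 8 / 1024 AS megabytes
  FROM pg_settings
 WHERE name = 'temp_buffers';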

 Deprecating setting the value directly in its base unit, so that all the
 memory parameters are specified as a number of bytes, would sidestep a lot
 of the issues people complain about.  What I would like to see (but don't
 have nearly enough time to argue in support of considering the resistance to
 change here) is that this syntax:

 shared_buffers=1024

Silly me, perhaps, but in the absence of a unit, I would assume that a
number in this context refers to a number of BUFFERS.  I'd be willing
to bet money that if you showed this line to a bunch of people who
knew nothing about postgresql but were reasonably computer-savvy,
you'd get a lot more people who thought that was a number of buffers
than who thought it was a number of bytes.

Compounding the problem, of course, is the fact that, right now,
that's exactly what it does mean.  While it's probably acceptable to
change the semantics of postgresql.conf in such a way that this syntax
is no longer accepted, or generates a warning, it's almost certainly a
bad idea to change it to silently mean something that is four orders
of magnitude different from what it currently means.

 Would assume the user meant 1024 *bytes*, with the server silently rounding
 that up to the nearest 8k block.  Then the whole issue of "do they mean bits
 or bytes?" goes away, because no one would ever have to include the B.

I don't believe this issue exists in the first place, because anyone
who would specify the size of their shared buffer pool in bits should
have their head examined.  But if, hypothetically, someone were
confused, I can't imagine that taking away the B is somehow going to
be more clear.

...Robert



Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Robert Haas
 When people aren't keeping track of their DDL, that is very strictly a
 process problem on their end.  When people are shooting themselves in
 the foot, it's a great disservice to market Kevlar shoes to them.

I can't believe anyone is going to stop tracking their DDL because,
ooh goody, now we have pg_class.creation_time.  They will look at it and
say either "oh, this is nice" or "oh, this is useless" and go on about
their business.

I try pretty hard not to shoot myself in the foot.  But if someone
comes up to me and offers me some shoes that have the same cost,
appearance, comfort-level, and durability as regular shoes but are
slightly more bullet resistant, should I refuse them on principle?
Why?

...Robert



Re: [HACKERS] Keeping creation time of objects

2008-09-09 Thread Andrew Dunstan



Robert Haas wrote:

I try pretty hard not to shoot myself in the foot.  But if someone
comes up to me and offers me some shoes that have the same cost,
appearance, comfort-level, and durability as regular shoes but are
slightly more bullet resistant, should I refuse them on principle?
Why?


  


The premise is false. Nothing is cost free. Every feature adds to code 
complexity, and has to be maintained.


I am still quite unconvinced by any of the justifications advanced so 
far for this feature.


cheers

andrew



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Fujii Masao
On Tue, Sep 9, 2008 at 8:38 PM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:
 There's one thing I haven't figured out in this discussion. Does the write
 to the disk happen before or after the write to the slave? Can you guarantee
 that if a transaction is committed in the master, it's also committed in the
 slave, or vice versa?

We can guarantee that a transaction is committed in both the master and
the slave if we wait until one has fsynced the WAL to disk and the other
holds it in memory or on disk. If one of them fails, the other can continue
service; if both fail, the node that wrote the WAL can continue service.
In neither case is a transaction lost.

 Agreed. But what happens if you hit that timeout?

The master continues service stand-alone when it hits that timeout. The slave,
on the other hand, waits for an order from the sysadmin or the clustering
software, and then either exits or becomes the master.

 Should we enforce that
 timeout within the server, or should we leave that to the external heartbeat
 system?

Within the server. Not all users run such an external system, and it's not
simple for an external system to make the master continue stand-alone.
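
As a minimal sketch of an in-server timeout (assuming a connected socket fd;
this is not the actual WAL-sender code), the master waits for the standby's
acknowledgement but gives up after timeout_ms rather than blocking forever:

#include <poll.h>

/* Returns 1 if the ACK is ready to read, 0 on timeout, -1 on poll() error. */
static int
wait_for_ack(int fd, int timeout_ms)
{
	struct pollfd pfd = {.fd = fd, .events = POLLIN};
	int			rc = poll(&pfd, 1, timeout_ms);

	if (rc > 0)
		return 1;				/* data available: read and check the ACK */
	if (rc == 0)
		return 0;				/* timed out: treat the slave as unreachable */
	return -1;
}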

regards

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-09 Thread Fujii Masao
On Tue, Sep 9, 2008 at 8:42 PM, Markus Wanner [EMAIL PROTECTED] wrote:
 In the viewpoint of detection of a network failure, this feature is
 necessary.
 When the network goes down, WAL sender can be blocked until it detects
 the network failure, i.e. WAL sender keeps waiting for the response which
 never comes. A timeout notification is necessary in order to detect a
 network failure soon.

 That's one of the areas I'm missing from the overall concept. I'm glad it
 comes up. You certainly realize, that such a timeout must be set high enough
 so as not to trigger false negatives every now and then?

Yes.
And, as you know, there is a trade-off between falsely detecting a network
failure and how long the WAL sender stays blocked.

I'll provide not only that timeout but also a keepalive for the connection
between the master and the slave. I expect the keepalive to ease that trade-off.
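
For flavour, the existing TCP keepalive GUCs show the kind of knobs involved,
though the replication keepalive described here would be its own mechanism;
the values below are purely illustrative:

# postgresql.conf (illustrative values)
tcp_keepalives_idle = 30        # seconds of idle before keepalive probes start
tcp_keepalives_interval = 10    # seconds between keepalive probes
tcp_keepalives_count = 3        # lost probes before the connection is considered dead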

regards

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
