Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Jeff Davis
On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
> Thanks for the review.
> 
> >   The standard specifies that non-recursive WITH should be evaluated
> >   once.
> 
> What shall we do? I don't think there's an easy way to fix this. Maybe
> we should not allow a WITH clause without RECURSIVE?

My interpretation of 7.13: General Rules: 2.b is that it should be
single evaluation, even if RECURSIVE is present.

The previous discussion was here:

http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

The important arguments in the thread seemed to be:

1. People will generally expect single evaluation, so might be
disappointed if they can't use this feature for that purpose.

2. It's a spec violation in the case of volatile functions.

3. "I think this is a "must fix" because of the point about volatile
functions --- changing it later will result in user-visible semantics
changes, so we have to get it right the first time."

I don't entirely agree with #3. It is user-visible, but only in the
sense that someone is depending on undocumented multiple-evaluation
behavior.

Tom Lane said that multiple evaluation is grounds for rejection:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php

Is there hope of correcting this before November?

> I will try to fix this. However, detecting whether the query is
> non-linear is not so easy.

If we don't allow mutual recursion, the only kind of non-linear
recursion that might exist would be multiple references to the same
recursive query name in a recursive query. Is that correct?

> > * DISTINCT should suppress duplicates:
> > 
> >   with recursive foo(i) as
> > (select distinct * from (values(1),(2)) t
> > union all
> > select distinct i+1 from foo where i < 10)
> >   select * from foo;
> > 
> >   This outputs a lot of duplicates, but they should be suppressed
> > according to the standard. This query is essentially the same as
> > supporting UNION for recursive queries, so we should either fix both for
> > 8.4 or block both for consistency.
> 
> I'm not sure if it's possible to fix this. Will look into it.
> 

Can't we just reject queries with top-level DISTINCT, similar to how
UNION is rejected?

> > * outer joins on a recursive reference should be blocked:
> > 
> >   with recursive foo(i) as
> > (values(1)
> > union all
> > select i+1 from foo left join (values(1)) t on (i=column1))
> >   select * from foo;
> > 
> >   Causes an infinite loop, but the standard says using an outer join
> >   in this situation should be prohibited. This should be fixed for 8.4.
> 
> Not an issue, I think.

Agreed, Andrew Gierth corrected me here.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread ITAGAKI Takahiro

"Fujii Masao" <[EMAIL PROTECTED]> wrote:

> 3) Patch introducing a new background process which I've called
>  WALSender. It takes charge of sending WAL to the slave.
> 
>  Now, I assume that WALSender also listens for connections from
>  the slave, i.e. only one sender process manages multiple slaves.

>  The relation between WALSender and a backend is 1:1, so
>  the communication mechanism between them can be simple.

I assume he means that only one backend communicates with the WAL sender
at a time. The communication is done while WALWriteLock is held,
so other backends wait on WALWriteLock for the communicating backend.
The WAL sender only needs to send one signal each time it sends WAL
buffers to the slave.

We could split the LWLock into WALWriterLock and WALSenderLock,
but the essential point is the same.
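The serialization described above can be sketched roughly as follows (Python threading, illustrative only; the names are invented stand-ins for WALWriteLock, the WAL sender process, and the signal between them): backends queue up on a single lock, and whichever backend holds it hands its WAL to the sender and waits for exactly one acknowledgement signal.

```python
import queue
import threading

wal_write_lock = threading.Lock()   # stands in for WALWriteLock
send_requests = queue.Queue()       # backend -> WAL sender channel
shipped = []                        # WAL records the "slave" has received

def wal_sender():
    # Dedicated sender: ships each buffer, then signals the waiting backend.
    while True:
        buf, ack = send_requests.get()
        if buf is None:             # shutdown request
            break
        shipped.append(buf)         # pretend we sent it to the slave
        ack.set()                   # one signal per shipped buffer

def backend_commit(record):
    with wal_write_lock:            # other backends block here, as on the LWLock
        ack = threading.Event()
        send_requests.put((record, ack))
        ack.wait()                  # synchronous: wait until WAL is shipped

sender = threading.Thread(target=wal_sender)
sender.start()
backends = [threading.Thread(target=backend_commit, args=("commit-%d" % i,))
            for i in range(5)]
for t in backends:
    t.start()
for t in backends:
    t.join()
send_requests.put((None, None))     # tell the sender to exit
sender.join()
```

Because the lock serializes the backends, the sender only ever talks to one backend at a time, which is what keeps the 1:1 communication mechanism simple.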

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Verbosity of Function Return Type Checks

2008-09-08 Thread Volkan YAZICI
On Mon, 8 Sep 2008, Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> Modified as you suggested. BTW, will there be a similar i18n scenario
>> for "dropped column" you mentioned below?
>
> Yes, you need _() around those too.

For this purpose, I introduced a dropped_column_type variable in the
validate_tupdesc_compat() function:

  const char *dropped_column_type = _("n/a (dropped column)");

And used it in the following fashion:

  OidIsValid(returned->attrs[i]->atttypid)
  ? format_type_be(returned->attrs[i]->atttypid)
  : dropped_column_type

>> Done with format_type_be() usage.
>
> BTW you forgot to remove the quotes around the type names (I know I told
> you to add them but Tom gave the reasons why it's not needed) :-)

Done.

> Those are minor problems that are easily fixed.  However there's a
> larger issue that Tom mentioned earlier and I concur, which is that the
> caller is forming the primary error message and passing it down.  It
> gets a bit silly if you consider the ways the messages end up worded:
>
>errmsg("returned record type does not match expected record type"));
>errdetail("Returned type \"%s\" does not match expected type \"%s\" in 
> column \"%s\".",
>---> this is the case where it's OK
>
>errmsg("wrong record type supplied in RETURN NEXT"));
>errdetail("Returned type \"%s\" does not match expected type \"%s\" in 
> column \"%s\".",
>--> this is strange
>
>errmsg("returned tuple structure does not match table of trigger event"));
>errdetail("Returned type \"%s\" does not match expected type \"%s\" in 
> column \"%s\".",
>--> this is not OK

What we're trying to do is avoid code duplication while checking the
returned tuple types against the expected tuple types. For this purpose we
implemented a generic function (validate_tupdesc_compat) to handle all
possible cases. But, IMHO, it's important to remember that no generic
function can satisfy all possible callers equally well.

> I've been thinking how to pass down the context information without
> feeding the complete string, but I don't find a way without doing
> message construction. Construction is to be avoided because it's a
> pain for translators.
>
> Maybe we should just use something generic like errmsg("mismatching
> record type") and have the caller pass two strings specifying what's
> the "returned" tuple and what's the "expected", but I don't see how
> ...  (BTW this is worth fixing, because every case seems to have
> appeared independently without much thought as to other callsites with
> the same pattern.)

I considered the subject with identical constraints but couldn't come up
with a more rational solution. Nevertheless, I'm open to any suggestion.
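The pattern under discussion can be sketched loosely like this (Python, illustrative only; the catalog entries and French translations are invented, and this is a model of the gettext convention rather than actual PostgreSQL code): the caller passes a complete, translatable primary message down to the generic checker, and the checker always emits one fixed detail format. Whole sentences serve as catalog keys, so translators never see concatenated fragments.

```python
CATALOG = {  # stand-in for a gettext message catalog (translations invented)
    "returned record type does not match expected record type":
        "le type d'enregistrement renvoyé ne correspond pas au type attendu",
    "Returned type %s does not match expected type %s in column %s.":
        "Le type renvoyé %s ne correspond pas au type attendu %s (colonne %s).",
}

def _(msg):
    # Stand-in for gettext's _(): look up the whole sentence as a key.
    return CATALOG.get(msg, msg)

def gettext_noop(msg):
    # Mark a string for extraction at the call site; translate later.
    return msg

def validate_tupdesc_compat(expected, returned, msg):
    # Generic checker: the primary message is chosen by the caller,
    # the detail format is fixed here, so both translate cleanly.
    for col, (exp, ret) in enumerate(zip(expected, returned), start=1):
        if exp != ret:
            raise TypeError("%s\nDETAIL: %s" % (
                _(msg),
                _("Returned type %s does not match expected type %s in column %s.")
                % (ret, exp, col)))

# Matching tuple descriptors pass silently.
validate_tupdesc_compat(
    ["integer", "text"], ["integer", "text"],
    gettext_noop("returned record type does not match expected record type"))
```

The awkwardness Alvaro points out remains visible in this model: the fixed detail sentence fits some primary messages better than others, which is exactly the wording mismatch quoted above.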


Regards.

Index: src/pl/plpgsql/src/pl_exec.c
===
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.219
diff -c -r1.219 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c	1 Sep 2008 22:30:33 -	1.219
--- src/pl/plpgsql/src/pl_exec.c	9 Sep 2008 05:48:57 -
***
*** 188,194 
  	   Oid reqtype, int32 reqtypmod,
  	   bool isnull);
  static void exec_init_tuple_store(PLpgSQL_execstate *estate);
! static bool compatible_tupdesc(TupleDesc td1, TupleDesc td2);
  static void exec_set_found(PLpgSQL_execstate *estate, bool state);
  static void plpgsql_create_econtext(PLpgSQL_execstate *estate);
  static void free_var(PLpgSQL_var *var);
--- 188,195 
  	   Oid reqtype, int32 reqtypmod,
  	   bool isnull);
  static void exec_init_tuple_store(PLpgSQL_execstate *estate);
! static void validate_tupdesc_compat(TupleDesc expected, TupleDesc returned,
! 	const char *msg);
  static void exec_set_found(PLpgSQL_execstate *estate, bool state);
  static void plpgsql_create_econtext(PLpgSQL_execstate *estate);
  static void free_var(PLpgSQL_var *var);
***
*** 384,394 
  			{
  case TYPEFUNC_COMPOSITE:
  	/* got the expected result rowtype, now check it */
! 	if (estate.rettupdesc == NULL ||
! 		!compatible_tupdesc(estate.rettupdesc, tupdesc))
! 		ereport(ERROR,
! (errcode(ERRCODE_DATATYPE_MISMATCH),
!  errmsg("returned record type does not match expected record type")));
  	break;
  case TYPEFUNC_RECORD:
  
--- 385,392 
  			{
  case TYPEFUNC_COMPOSITE:
  	/* got the expected result rowtype, now check it */
! 	validate_tupdesc_compat(tupdesc, estate.rettupdesc,
! 			gettext_noop("returned record type does not match expected record type"));
  	break;
  case TYPEFUNC_RECORD:
  
***
*** 705,715 
  		rettup = NULL;
  	else
  	{
! 		if (!compatible_tupdesc(estate.rettupdesc,
! trigdata->tg_relation->rd_att))
! 			ereport(ERROR,
! 	(errcode(ERRCODE_DATATYPE_MISMATCH),
! 	 errmsg("returned tuple structure does not match table of trigger event")));
  		/* Copy tuple to upper executor memory */
  		rettup = SPI_co

Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Fujii Masao
Hi,

I looked at some comments about synchronous replication and understood
that the community consensus is that sync replication should be
added via core patches rather than hooks and plug-ins. If my understanding
is right, I will change my development plan so that sync replication
can be put into core.

But I don't think every feature should be put into core. Of course, the
high-availability features (like clustering, automatic failover, etc.) are
outside postgres itself. A user who wants a whole HA solution based on sync
replication must integrate postgres with clustering software like heartbeat.

WAL sending should be put into core. But I'd like to separate WAL
receiving from core and provide it as a new contrib tool, because
some users want to use sync replication only for WAL
streaming. They don't want to start postgres on the slave. Of course,
the slave can still replay WAL by using pg_standby together with the WAL
receiver tool I'd like to provide as a new contrib tool. I think a patch
against the recovery code is not necessary.

The development items are listed below:

1) Patch around XLogWrite.
 It enables a backend to wake up the WAL sender process at
 COMMIT time.

2) Patch for the communication between a backend and the WAL
 sender process.
 There were some discussions about this topic. Now, I've decided to
 adopt the imessages approach proposed by Markus.

3) Patch introducing a new background process which I've called
 WALSender. It takes charge of sending WAL to the slave.

 Now, I assume that WALSender also listens for connections from
 the slave, i.e. only one sender process manages multiple slaves.
 The relation between WALSender and a backend is 1:1, so
 the communication mechanism between them can be simple.
 As another idea, I could introduce a new listener process and fork a new
 WALSender for every slave. Which architecture is better? Or
 should the postmaster also listen for connections from the slave?

4) New contrib tool which I've called WALReceiver. It takes charge
of receiving WAL from the master and writing it to disk on the slave.
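A minimal sketch of what such a receiver tool might do (Python, illustrative only; the 4-byte length-prefix framing is an assumption for the sketch, not an actual protocol): read WAL chunks from the master's stream and append them to a file on the slave, where a tool like pg_standby could later replay them.

```python
import io
import struct
import tempfile

def receive_wal(stream, wal_path):
    # Append length-prefixed WAL chunks from the stream to the WAL file.
    with open(wal_path, "ab") as wal:
        while True:
            header = stream.read(4)
            if len(header) < 4:          # master closed the connection
                break
            (length,) = struct.unpack("!I", header)
            wal.write(stream.read(length))
            wal.flush()                  # durable before any acknowledgement

def frame(chunk):
    # Master-side helper: prefix a WAL chunk with its big-endian length.
    return struct.pack("!I", len(chunk)) + chunk

# Simulate a master sending two WAL chunks over a socket-like stream.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
stream = io.BytesIO(frame(b"wal-record-1") + frame(b"wal-record-2"))
receive_wal(stream, tmp.name)
```

Flushing before acknowledging is the part that matters for synchronous semantics: the master should not report a commit durable on the slave until the bytes are on the slave's disk.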

I will submit these patches and the tool by the November CommitFest at the latest.

Any comment welcome!

best regards

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]

2008-09-08 Thread Greg Smith

On Mon, 8 Sep 2008, Alvaro Herrera wrote:


(I dropped the "default" stuff for now, as it doesn't seem that a
consensus has been reached on that topic.)


I have multiple GUC-related projects that are all stalled waiting for that 
capability to be added.  The only thing there wasn't clear consensus on 
was exactly what the name for it should be, and there I really don't care. 
I made the argument for why I named it the way I did, but if it gets 
committed with a less friendly name (like boot_val) I won't complain.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Robert Haas
>> * Single Evaluation:
>>
>>   with
>> foo(i) as (select random() as i)
>>   select * from foo union all select * from foo;
>>i
>>   ---
>>0.233165248762816
>> 0.62126633618027
>>   (2 rows)
>>
>>   The standard specifies that non-recursive WITH should be evaluated
>>   once.
>
> What shall we do? I don't think there's an easy way to fix this. Maybe
> we should not allow a WITH clause without RECURSIVE?

ISTM that kind of misses the point.  Even if it's WITH RECURSIVE
rather than simply WITH, one wouldn't expect multiple evaluations of
any non-recursive portion of the query.

...Robert



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Tatsuo Ishii
Thanks for the review.

[I dropped [EMAIL PROTECTED] from the Cc list since he has left our
company and the email address is being deleted.]

I'm going to look into the issues which seem to be bugs (of course, if
you know what to fix, patches are always welcome :-).

> These are my initial comments about the Common Table Expressions (CTE)
> patch, also known as WITH [RECURSIVE]. These comments are based on the
> patch here:
> 
> http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php
> 
> This is a semantically complex feature, and the standard is fairly
> complex as well. So I'm approaching this by making my own
> interpretations from the standard first (I included my interpretations
> and section references at the end of this email) and comparing to the
> behavior of the patch.
> 
> The following examples may be inconsistent with the standard. Some
> have already been mentioned, and I don't think they all need to be
> fixed for 8.4, but I mention them here for completeness.
> 
> * Mutual Recursion:
> 
>   with recursive
> foo(i) as (values(1) union all select i+1 from bar where i < 10),
> bar(i) as (values(1) union all select i+1 from foo where i < 10)
>   select * from foo;
>   ERROR:  mutual recursive call is not supported
> 
>   The standard allows mutual recursion.

The discussion seems to agree to leave this for post-8.4.

> * Single Evaluation:
> 
>   with
> foo(i) as (select random() as i)
>   select * from foo union all select * from foo;
>i
>   ---
>0.233165248762816
> 0.62126633618027
>   (2 rows)
> 
>   The standard specifies that non-recursive WITH should be evaluated
>   once.

What shall we do? I don't think there's an easy way to fix this, as Tom
suggested. Maybe we should not allow a WITH clause without RECURSIVE for
8.4?

> * RHS only:
> 
>   with recursive
> foo(i) as (select i+1 from foo where i < 10 union all values(1))
>   select * from foo;
>   ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
> recursive query
> 
>   The standard does not require that the recursive term be on the RHS.

The discussion seems to agree to leave this for post-8.4.

> * UNION ALL only:
> 
>   with recursive
> foo(i) as (values(1) union select i+1 from foo where i < 10)
>   select * from foo;
>   ERROR:  non-recursive term and recursive term must be combined with
> UNION ALL
> 
>   The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT
>   (when the recursive term is not on the RHS of the EXCEPT).

The discussion seems to agree to leave this for post-8.4.

> * Binary recursion and subselect strangeness:
> 
>   with recursive foo(i) as
> (values (1)
> union all
> select * from
>   (select i+1 from foo where i < 10
>   union all
>   select i+1 from foo where i < X) t)
>   select * from foo;
> 
>   Produces 10 rows of output regardless of what "X" is. This should be
> fixed for 8.4.
>   Also, this is non-linear recursion, which the standard seems to
> disallow.

I will try to fix this. However, detecting whether the query is
non-linear is not so easy.

> * Multiple recursive references:
> 
>   with recursive foo(i) as
> (values (1)
> union all
> select i+1 from foo where i < 10
> union all
> select i+1 from foo where i < 20)
>   select * from foo;
>   ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
> recursive query
> 
>   If we're going to allow non-linear recursion (which the standard
>   does not), this seems like it should be a valid case.

I will try to disallow this.

> * Strange result with except:
> 
>   with recursive foo(i) as
> (values (1)
> union all
> select * from
> (select i+1 from foo where i < 10
> except
> select i+1 from foo where i < 5) t)
>   select * from foo;
>   ERROR:  table "foo" has 0 columns available but 1 columns specified
> 
>   This query works if you replace "except" with "union". This should be
> fixed for 8.4.

I will try to fix this.

> * Aggregates allowed:
> 
>   with recursive foo(i) as
> (values(1)
> union all
> select max(i)+1 from foo where i < 10)
>   select * from foo;
> 
>   Aggregates should be blocked according to the standard.
>   Also, causes an infinite loop. This should be fixed for 8.4.

I will try to fix this.
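Why the aggregate example loops forever can be sketched as follows (Python, illustrative; it assumes the recursive term sees only the previous iteration's rows, as in a working-table implementation): an aggregate without GROUP BY returns exactly one row even over empty input — max() of nothing is NULL — so the working table never becomes empty and the iteration never stops.

```python
def step(working_table):
    # One iteration of: select max(i)+1 from foo where i < 10,
    # evaluated over the previous iteration's rows (the working table).
    qualifying = [i for i in working_table if i is not None and i < 10]
    m = max(qualifying) if qualifying else None   # SQL: max() over empty -> NULL
    # The aggregate yields one row regardless; NULL + 1 -> NULL.
    return [m + 1 if m is not None else None]

working = [1]
for _ in range(1000):
    working = step(working)
    assert working, "working table never empties"  # i.e. the loop never ends
```

Once the working table reaches {10}, the WHERE clause filters everything out, and from then on every iteration produces a single NULL row, forever — which is one reason the standard blocks aggregates in the recursive term.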

> * DISTINCT should suppress duplicates:
> 
>   with recursive foo(i) as
> (select distinct * from (values(1),(2)) t
> union all
> select distinct i+1 from foo where i < 10)
>   select * from foo;
> 
>   This outputs a lot of duplicates, but they should be suppressed
> according to the standard. This query is essentially the same as
> supporting UNION for recursive queries, so we should either fix both for
> 8.4 or block both for consistency.

I'm not sure if it's possible to fix this. Will look into it.

> * outer joins on a recursive reference should be blocked:
> 
>   with recursive foo(i) as
> (values(1)
> union all
> select i+1 from foo left join (values(1)) t on (i=column1))
>   select * from foo;
> 
>   Causes an infinite loop, but the standard says using an outer join
>   in this situation should be prohibited. This should be fixed for 8.4.

Not an issue, I think.


Re: [HACKERS] SQL standard question about Common Table Expressions

2008-09-08 Thread Jeff Davis
On Tue, 2008-09-09 at 01:45 +0100, Andrew Gierth wrote:
> The "contains" language in the spec is tricky. And I think there's some
> issue here with the spec confusing <query expression> and <query
> expression body>; some of the language refers to "If a <query expression>
> AQEk not marked as recursive is immediately contained in a <query
> expression body>" which is not possible as the syntax production for
> <query expression body> does not contain <query expression>.

Now that you point that out, I think that is a mistake in the spec. Your
version makes a lot more sense.

Thank you for going into the gory details of the algorithm, there were a
few other things tripping me up that you clarified.

I was going crazy yesterday trying to piece together what is supposed to
happen for, e.g., using INTERSECT to connect the recursive and the
non-recursive parts, and this answers it. I'm not suggesting that we
require support for INTERSECT, but my curiosity got the best of me.
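The spec's iterative evaluation can be sketched compactly (Python, illustrative; it simplifies 7.13's General Rules by re-evaluating the whole query-expression body with the recursive name bound to all rows found so far, until a fixpoint is reached — using sets means duplicates are removed, i.e. UNION rather than UNION ALL semantics):

```python
def evaluate_recursive(body):
    # Iterate the body to a fixpoint, feeding back the accumulated rows.
    rows = set()
    while True:
        new_rows = body(rows)
        if new_rows <= rows:        # fixpoint: nothing new was produced
            return rows
        rows |= new_rows

# foo(i) as (values(1) union select i+1 from foo where i < 5)
foo = evaluate_recursive(
    lambda foo: {1} | {i + 1 for i in foo if i < 5})
assert sorted(foo) == [1, 2, 3, 4, 5]
```

Because the body is an arbitrary function of the accumulated rows, the same scheme accommodates other set operations (INTERSECT, EXCEPT with the recursive term on the LHS) in principle — termination is what becomes delicate.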

Regards,
Jeff Davis




Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Joshua Drake
On Mon, 8 Sep 2008 10:32:40 -0400
Alvaro Herrera <[EMAIL PROTECTED]> wrote:

> Gregory Stark wrote:
> > "Greg Sabino Mullane" <[EMAIL PROTECTED]> writes:
> > 
> > > Tom Lane wrote:
> > >> My vote is to reject the patch and work on a config checker.
> > >
> > > +1

+1

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate





Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Ron Mayer

Marko Kreen wrote:

Thirdly, please don't use the "standard units" argument unless you plan to
propose the use of "KiB, MiB, GiB" at the same time.


In defense of standard units: if the postgres docs say
"Postgres will round up to the nearest power of 2",
kB and MB seem very clear to me.  If we want to silently
accept other abbreviations and acronyms too (mb, mIb, whatever),
that doesn't bother me, though.




Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]

2008-09-08 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:

> > Hmm, I didn't recheck after Greg's patch, but in mine, it doesn't,
> > because the location is saved as "reset location" and restored when the
> > variable is reset.  It worked fine in all cases I tested.
> 
> Hmm.  Actually, why is there a need to save and restore at all?  There
> can certainly never be more than one recorded config-file location per
> variable.  What about saying that each variable has one and only one
> filename/linenumber, but whether these are relevant to the current value
> is determined by whether the current value's source is S_FILE?

Hmm, this does make the patch a lot simpler.  Attached.  (Magnus was
visionary enough to put the correct test in the pg_settings definition.)

I also dropped the change to set_config_option, and added a new routine
to set the source file/line, as you suggested elsewhere.  The only
problem is that we now have two bsearch calls for each option set in a
config file ...  I don't want to change set_config_option just to be
able to return the struct config_generic for this routine's sake ...
Better ideas?  Is this OK as is?

I noticed some weird things when the config files contain errors, but I
think it's outside this patch's scope.

(I dropped the "default" stuff for now, as it doesn't seem that a
consensus has been reached on that topic.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Index: doc/src/sgml/catalogs.sgml
===
RCS file: /home/alvherre/Code/cvs/pgsql/doc/src/sgml/catalogs.sgml,v
retrieving revision 2.172
diff -c -p -r2.172 catalogs.sgml
*** doc/src/sgml/catalogs.sgml	30 Jul 2008 17:05:04 -	2.172
--- doc/src/sgml/catalogs.sgml	9 Sep 2008 02:42:14 -
***
*** 6414,6419 
--- 6414,6433 
Allowed values in enum parameters (NULL for non-enum
values)
   
+  
+   sourcefile
+   text
+   Input file the current value was set from (NULL for values set in
+   sources other than configuration files).  Helpful when using
+   configuration include directives.
+  
+  
+   sourceline
+   text
+   Line number within the sourcefile the current value was set 
+   from (NULL for values set in sources other than configuration files)
+   
+  
  
 

Index: src/backend/utils/misc/guc-file.l
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/utils/misc/guc-file.l,v
retrieving revision 1.56
diff -c -p -r1.56 guc-file.l
*** src/backend/utils/misc/guc-file.l	22 Aug 2008 00:20:40 -	1.56
--- src/backend/utils/misc/guc-file.l	9 Sep 2008 02:09:28 -
*** struct name_value_pair
*** 39,44 
--- 39,46 
  {
  	char   *name;
  	char   *value;
+ 	char	   *filename;
+ 	int			sourceline;
  	struct name_value_pair *next;
  };
  
*** ProcessConfigFile(GucContext context)
*** 307,314 
  	/* If we got here all the options checked out okay, so apply them. */
  	for (item = head; item; item = item->next)
  	{
! 		set_config_option(item->name, item->value, context,
! 		  PGC_S_FILE, GUC_ACTION_SET, true);
  	}
  
  	/* Remember when we last successfully loaded the config file. */
--- 309,320 
  	/* If we got here all the options checked out okay, so apply them. */
  	for (item = head; item; item = item->next)
  	{
! 		if (set_config_option(item->name, item->value, context,
! 			   	 PGC_S_FILE, GUC_ACTION_SET, true))
! 		{
! 			set_config_sourcefile(item->name, item->filename,
!   item->sourceline);
! 		}
  	}
  
  	/* Remember when we last successfully loaded the config file. */
*** ParseConfigFile(const char *config_file,
*** 483,488 
--- 489,496 
  pfree(item->value);
  item->name = opt_name;
  item->value = opt_value;
+ item->filename = pstrdup(config_file);
+ item->sourceline = ConfigFileLineno-1;
  			}
  			else
  			{
*** ParseConfigFile(const char *config_file,
*** 490,495 
--- 498,505 
  item = palloc(sizeof *item);
  item->name = opt_name;
  item->value = opt_value;
+ item->filename = pstrdup(config_file);
+ item->sourceline = ConfigFileLineno-1;
  item->next = *head_p;
  *head_p = item;
  if (*tail_p == NULL)
*** ParseConfigFile(const char *config_file,
*** 502,507 
--- 512,519 
  			item = palloc(sizeof *item);
  			item->name = opt_name;
  			item->value = opt_value;
+ 			item->filename = pstrdup(config_file);
+ 			item->sourceline = ConfigFileLineno-1;
  			item->next = NULL;
  			if (*head_p == NULL)
  *head_p = item;
*** free_name_value_list(struct name_value_p
*** 553,558 
--- 565,571 
  
  		pfree(item->name);
  		pfree(item->value);
+ 		pfree(

[HACKERS] [Patch Review] Copy column storage parameters on CREATE TABLE LIKE/INHERITS

2008-09-08 Thread Stephen Frost
Greetings,

  I've reviewed the patch here:
  http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

  and am happy to report that it looks to be in good order.  It has
  documentation and regression updates, is in context diff format,
  patches against current CVS with only some minor fuzz, and appears to
  work as advertised.  I looked over the patch and could easily follow
  what was going on, did some additional testing beyond the regression
  tests, and reviewed the documentation changes.

  My only comment on this patch is that the documentation updates might
  include a link back to Section 53.2 for more information, and/or to
  the SET STORAGE portion of the alter table/alter column command
  documentation, since the only other 'storage' reference in create
  table's documentation is about table storage parameters.
  
Thanks!

Stephen




Re: [HACKERS] sql2008 diff sql2003

2008-09-08 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera <[EMAIL PROTECTED]> writes:

 >> so it's like this:
 >> 
 >> select * from foo order by bar offset 5 rows fetch first 10 rows only;

 Alvaro> Oh, I see -- it's just a cumbersome way to have our LIMIT
 Alvaro> clause.  What's the "ONLY" for?

It seems to be just a mandatory noise word.

-- 
Andrew.



Re: [HACKERS] sql2008 diff sql2003

2008-09-08 Thread Alvaro Herrera
Andrew Gierth wrote:

>  Alvaro> This means we have to support stuff like
> 
>  Alvaro> declare foo cursor for select * from lists;
>  Alvaro> select * from (fetch first from foo) as bar;
> 
> No, that's wrong.

[...]

> so it's like this:
> 
> select * from foo order by bar offset 5 rows fetch first 10 rows only;

Oh, I see -- it's just a cumbersome way to have our LIMIT clause.
What's the "ONLY" for?


> (nothing that I can see assigns any semantics to FIRST vs NEXT, they seem
> to do the same thing)

I was actually thinking that you'd be able to open a cursor and then
invoke the query repeatedly.  With FETCH FIRST it wouldn't work, because
every repetition would get the same rows, but with FETCH NEXT it'd give
you a paginated query.

It seems a lot less useful this way.
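For comparison, the two behaviors Alvaro contrasts might look like this (a sketch only; `foo` and `bar` are the placeholder names from the thread, and the cursor form uses PostgreSQL's existing FETCH-from-cursor syntax rather than anything the standard mandates):

```sql
-- SQL:2008 form: a one-shot limit, equivalent to LIMIT 10 OFFSET 5.
-- FIRST and NEXT appear to be interchangeable here.
SELECT * FROM foo ORDER BY bar
OFFSET 5 ROWS FETCH FIRST 10 ROWS ONLY;

-- The pagination Alvaro had in mind needs cursor-style state,
-- which PostgreSQL already provides via FETCH on an open cursor:
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM foo ORDER BY bar;
FETCH FORWARD 10 FROM c;  -- page 1
FETCH FORWARD 10 FROM c;  -- page 2
COMMIT;
```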

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 48' 55.3", W 73º 15' 24.7"
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)



Re: [HACKERS] SQL standard question about Common Table Expressions

2008-09-08 Thread Andrew Gierth
> "Jeff" == Jeff Davis <[EMAIL PROTECTED]> writes:

 Jeff> I am looking into the SQL standard to try to determine
 Jeff> precisely how the CTE feature should behave.

 Jeff> Taking a simple case like:

 Jeff>   with recursive
 Jeff> foo(i) as
 Jeff>   (values(1)
 Jeff>   union all
 Jeff>   select i+1 from foo where i < 5)
 Jeff>   select * from foo;

 Jeff> And looking at the SQL standard 200n 7.13: General Rules: 2.c,
 Jeff> it provides an algorithm for evaluating the recursive query.

 Jeff> In this algorithm, AQEk is a <query expression>. Syntactically,
 Jeff> I only see two <query expression>s, and one is the entire
 Jeff> query. The other is: "(values(1) union all select i+1 from foo
 Jeff> where i < 5)", so I'll assume that AQEk must be equal to that*.

The "contains" language in the spec is tricky. And I think there's some
issue here with the spec confusing  and ; some of the language refers to "If a 
AQEk not marked as recursive is immediately contained in a " which is not possible as the syntax production for
 does not contain .

Working through the spec, we have here only one WLE and therefore only
one partition. So WLE1 is the only element, and WQN1 is "foo" and
WQE1 is "(values(1) union all select i+1 from foo where i < 5)". The
way I suspect it's _intended_ to work is like this (following the order
of steps in the spec):

AQE1 = "values(1) union all select i+1 from foo where i < 5"
AQE2 = "values (1)"
AQE3 = "select i+1 from foo where i < 5"

AQE1 and AQE3 are recursive but AQE2 is not.

AQE1 is associated with WQE1 and therefore RT1 and WT1 are the result
and working tables for WQN1.

AQE2 is not marked as recursive and is contained in a recursive union,
therefore it is marked iteration-ignorable.

None of the AQEn suppress duplicates.

Let RT2 and WT2 be the result of AQE2, so both now contain (1)

AQE2 now becomes "TABLE RTN2" (and hence AQE1/WQE1 are now "TABLE RTN2
union all select i+1 from foo where i < 5")

Evaluate WQE1:

  RT1 and WT1 become (1)
  RT2/WT2 unchanged
  RT3 and WT3 remain empty

AQE2 is iteration ignorable, so RT2 is emptied.

WT1 and WT2 are nonempty, so:

associate "foo" with WT1

evaluate WQE1:

  WT1 becomes (2)
  WT2 is empty
  WT3 becomes (2)

  RT1 becomes (RT1 UNION ALL WT1) i.e. (1),(2)
  RT3 becomes (RT3 UNION ALL WT3) i.e. (1),(2)

WT1 and WT2 are nonempty, so:

associate "foo" with WT1

evaluate WQE1:

  WT1 becomes (3)
  WT2 is empty
  WT3 becomes (3)

  RT1 becomes (RT1 UNION ALL WT1) i.e. (1),(2),(3)
  RT3 becomes (RT3 UNION ALL WT3) i.e. (1),(2),(3)

etc.

[The main differences between this and the logic used by the existing
patch are the fact that the spec's version is keeping around many more
working and result tables in order to do the duplicate-suppression logic,
which requires knowing not just whether you've already produced a given
row, but also _which branch of the query_ produced it. Within the
constraints of the patch, it's not possible to trigger the duplicate
elimination, and therefore RT2/WT2 and RT3/WT3 become redundant; also all
non-recursive clauses are iteration-ignorable.]
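The iteration above can be compressed into a small simulation. Here is a rough Python sketch restricted to the single linear WQE / UNION ALL case the patch actually supports, so the extra result/working tables and duplicate suppression drop out; all names are illustrative:

```python
# Spec-style evaluation of Jeff's example:
#   with recursive foo(i) as
#     (values(1) union all select i+1 from foo where i < 5)
#   select * from foo;
def evaluate_recursive_union(anchor_rows, step):
    result = list(anchor_rows)   # RT1: the accumulated result table
    working = list(anchor_rows)  # WT1: rows produced by the last pass
    while working:
        # each pass sees only the previous working table as "foo"
        working = step(working)
        result.extend(working)   # RT1 := RT1 UNION ALL WT1
    return result

rows = evaluate_recursive_union(
    anchor_rows=[1],                                # "values(1)"
    step=lambda wt: [i + 1 for i in wt if i < 5],   # "select i+1 from foo where i < 5"
)
print(rows)  # [1, 2, 3, 4, 5]
```

Because UNION ALL never discards rows, only the most recent working table is needed; the spec's per-branch RT/WT bookkeeping matters only when duplicate elimination is in play.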

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] Verbosity of Function Return Type Checks

2008-09-08 Thread Alvaro Herrera
Volkan YAZICI wrote:
> On Fri, 05 Sep 2008, Tom Lane <[EMAIL PROTECTED]> writes:

> > at the call sites, and then
> >
> > errmsg("%s", _(msg))
> >
> > when throwing the error.
> 
> Modified as you suggested. BTW, will there be a similar i18n scenario
> for "dropped column" you mentioned below?

Yes, you need _() around those too.

> Done with format_type_be() usage.

BTW you forgot to remove the quotes around the type names (I know I told
you to add them but Tom gave the reasons why it's not needed) :-)

Those are minor problems that are easily fixed.  However there's a
larger issue that Tom mentioned earlier and I concur, which is that the
caller is forming the primary error message and passing it down.  It
gets a bit silly if you consider the ways the messages end up worded:

   errmsg("returned record type does not match expected record type"));
   errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
   ---> this is the case where it's OK

   errmsg("wrong record type supplied in RETURN NEXT"));
   errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
   --> this is strange

   errmsg("returned tuple structure does not match table of trigger event"));
   errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
   --> this is not OK

I've been thinking how to pass down the context information without
feeding the complete string, but I don't find a way without doing
message construction.  Construction is to be avoided because it's a pain
for translators.

Maybe we should just use something generic like errmsg("mismatching record type")
and have the caller pass two strings specifying what's the "returned"
tuple and what's the "expected", but I don't see how ...  (BTW this is
worth fixing, because every case seems to have appeared independently
without much thought as to other callsites with the same pattern.)

-- 
Alvaro Herrera                 http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Alvaro Herrera
Martin Pihlak wrote:
> Tom Lane wrote:
> > Hmm.  With the timestamp in the file, ISTM that we could put all the
> > intelligence on the reader side.  Reader checks file, sends message if

> Attached is a patch that implements the described signalling.

Are we keeping the idea that the reader sends a stat message when it
needs to read stats?  What about the lossiness of the transport?

-- 
Alvaro Herrera                 http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Bruce Momjian
Markus Wanner wrote:
> Hi,
> 
> Bruce Momjian wrote:
> > Backends would
> > update a shared memory variable specifying how far they want the wal
> > streamer to advance and send a signal to the wal streamer if necessary. 
> > Backends would monitor another shared memory variable that specifies how
> > far the wal streamer has advanced.
> 
> That sounds like WAL needs to be written to disk, before it can be sent 
> to the standby. Except maybe with some sort of mmap'ing the WAL.

Well, WAL is either on disk or in the wal_buffers in shared memory ---
in either case, a WAL streamer can get to it.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Markus Wanner

Hi,

Bruce Momjian wrote:
> Backends would
> update a shared memory variable specifying how far they want the wal
> streamer to advance and send a signal to the wal streamer if necessary. 
> Backends would monitor another shared memory variable that specifies how
> far the wal streamer has advanced.

That sounds like WAL needs to be written to disk, before it can be sent 
to the standby. Except maybe with some sort of mmap'ing the WAL.


Regards

Markus Wanner




Re: [HACKERS] Move src/tools/backend/ to wiki

2008-09-08 Thread Bruce Momjian
Alvaro Herrera wrote:
> So I just noticed that we have a description of the Pg innards in the
> sourcecode, complete with a flowchart and all, at src/tools/backend.
> I had already seen this graph years ago; what shocked me the most was
> finding out that there's a pointer to it in the Developer's FAQ, in a
> question that's absolutely unrelated to it.
> 
> So I went ahead and moved its mention to a separate question, where it
> has a lot more visibility.  (I also added an URL to anoncvs, where
> people can see it more readily.)
> 
> What I'm wondering right now is what's the value of having this stuff
> in the source code at all.  Why don't we move it to the Wiki and remove
> it from the repo?  Would anybody object to that idea?  Having it on CVS
> means that nobody is able to improve the text, which is kind of sad
> because it's been neglected for the last 3 years.

I have no problem having it moved to the wiki, though it does have
HTML imagemap elements.  I don't think much changes at the flow chart
level from release to release, so it would be fine if it was just CVS HEAD.
I also don't think many people do back-branch development.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Bruce Momjian
Zdenek Kotala wrote:
> You mentioned data types, but it is not a problem. You can easily extend data 
> type attribute about version information and call correct in/out functions. 
> Or 
> use different Oid for new data type version. There are more possible easy 
> solutions for data types. And for conversion You can use ALTER TABLE command.
> Main idea is keep data in all format in a relation. This approach should use 
> also for integer/float datetime problem.

This kind of code structure scares me that our system will become so
complex that it will hinder our ability to continue making improvements.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Jeff Davis
On Mon, 2008-09-08 at 22:53 +0100, Andrew Gierth wrote:
> Yes, you've misinterpreted. When the spec says that a query "shall
> not" do such-and-such, it means that a conforming client isn't allowed
> to do that, it does NOT mean that the server is required to produce an
> error when it sees it.
> 

Interesting. Thanks for the clear reference.

So, we either need to reject it or define some implementation-dependent
behavior, right?

The patch currently rejects HAVING, which means that it's a little
difficult to use an aggregate effectively. For consistency, I lean toward
rejecting aggregates if we reject HAVING; if we allow aggregates, we
should allow HAVING as well.

>  Jeff> I agree that's kind of a funny requirement. But that's pretty
>  Jeff> typical of the SQL standard. If DB2 or SQL Server follow the
>  Jeff> standard here, we should, too. If not, it's open for discussion.
> 
> MSSQL does not:

Thanks again for a reference.

If MSSQL rejects SELECT DISTINCT for a recursive <query expression>,
then I think we should reject it. Right now the patch allows it but
provides a result that is inconsistent with the standard.

If we reject SELECT DISTINCT for recursive queries now, we can always
meet the standard later, or decide that the standard behavior is too
likely to cause confusion and just continue to block it.

> Ideally we should. DB2 appears to (other than OFFSET which it doesn't
> seem to have at all). But it's not at all clear that it would be either
> useful or easy to do so.

Ok. If it's easy to support it should probably be done.

As an aside, you seem to be an expert with the standard, have you had a
chance to look at the question I asked earlier?:

http://archives.postgresql.org/pgsql-hackers/2008-09/msg00487.php

Regards,
Jeff Davis




Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Andrew Gierth
> "Jeff" == Jeff Davis <[EMAIL PROTECTED]> writes:

 Jeff> Aggregates should be blocked according to the standard.
 Jeff> Also, causes an infinite loop. This should be fixed for 8.4.

 >> Does the standard require anywhere that non-conforming statements
 >> must be diagnosed? (seems impractical, since it would forbid
 >> extensions)

 Jeff> 2.g.iii.4.B explicitly says aggregates should be rejected,
 Jeff> unless I have misinterpreted.

Yes, you've misinterpreted. When the spec says that a query "shall
not" do such-and-such, it means that a conforming client isn't allowed
to do that, it does NOT mean that the server is required to produce an
error when it sees it.

Chapter and verse on this is given in the Framework doc, at 6.3.3.2:

  In the Syntax Rules, the term "shall" defines conditions that are
  required to be true of syntactically conforming SQL language. When such
  conditions depend on the contents of one or more schemas, they are
  required to be true just before the actions specified by the General
  Rules are performed. The treatment of language that does not conform to
  the SQL Formats and Syntax Rules is implementation-dependent.  If any
  condition required by Syntax Rules is not satisfied when the evaluation
  of Access or General Rules is attempted and the implementation is
  neither processing non-conforming SQL language nor processing
  conforming SQL language in a non-conforming manner, then an exception
  condition is raised: "syntax error or access rule violation".

Including an aggregate violates a "shall" in a syntax rule, therefore the
query is non-conforming, therefore the server can either process it in an
implementation-dependent manner or reject it with an exception.

 >> Yeah, though the standard's use of DISTINCT in this way is something
 >> of a violation of the POLA.

 Jeff> I agree that's kind of a funny requirement. But that's pretty
 Jeff> typical of the SQL standard. If DB2 or SQL Server follow the
 Jeff> standard here, we should, too. If not, it's open for discussion.

MSSQL does not:

"The following items are not allowed in the CTE_query_definition of a
 recursive member:

* SELECT DISTINCT
* GROUP BY
* HAVING
* Scalar aggregation
* TOP
* LEFT, RIGHT, OUTER JOIN (INNER JOIN is allowed)
* Subqueries
* A hint applied to a recursive reference to a CTE inside a
  CTE_query_definition.
"

For DB2 the docs do not appear to specify either way; they don't seem to
forbid the use of SELECT DISTINCT inside a recursive CTE, but neither do
they seem to mention any unusual effect of including it.

 Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursive
 Jeff> queries. The standard does not seem to say that these should be
 Jeff> rejected.
 
 >> Note that supporting those in subqueries (including CTEs) is a
 >> separate optional feature of the standard.

 Jeff> I don't feel strongly about this either way, but I prefer that we
 Jeff> are consistent when possible. We do support these things in a
 Jeff> subquery, so shouldn't we support them in all subqueries?

Ideally we should. DB2 appears to (other than OFFSET which it doesn't
seem to have at all). But it's not at all clear that it would be either
useful or easy to do so.

-- 
Andrew.



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Bruce Momjian
Fujii Masao wrote:
> On Mon, Sep 8, 2008 at 8:44 PM, Markus Wanner <[EMAIL PROTECTED]> wrote:
> >>Merge into WAL writer?
> >
> > Uh.. that would mean you'd loose parallelism between WAL writing to disk and
> > WAL shipping via network. That does not sound appealing to me.
> 
> That depends on the order of WAL writing and WAL shipping.
> How about the following order?
> 
> 1. A backend writes WAL to disk.
> 2. The backend wakes up WAL sender process and sleeps.
> 3. WAL sender process does WAL shipping and wakes up the backend.
> 4. The backend issues sync command.

I am confused why this is considered so complicated.  Having individual
backends doing the wal transfer to the slave is never going to work
well.

I figured we would have a single WAL streamer that continues advancing
forward in the WAL file, streaming to the standby.  Backends would
update a shared memory variable specifying how far they want the wal
streamer to advance and send a signal to the wal streamer if necessary. 
Backends would monitor another shared memory variable that specifies how
far the wal streamer has advanced.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Martin Pihlak
Tom Lane wrote:
> Hmm.  With the timestamp in the file, ISTM that we could put all the
> intelligence on the reader side.  Reader checks file, sends message if
... snip ...
> remember the file timestamp it last wrote out.  There are various ways
> you could design it but what comes to mind here is for readers to send
> a timestamp defined as minimum acceptable file timestamp (ie, their
> current clock time less whatever slop they want to allow).  Then,
> when the collector gets a request with that timestamp <= its last
> write timestamp, it knows it need not do anything.

Attached is a patch that implements the described signalling. Additionally
following non-related changes have been made:
1. fopen/fclose replaced with AllocateFile/FreeFile
2. pgstat_report_stat() now also checks if there are functions to report
before returning. Previous behavior was to return immediately if no table
stats were available for reporting.

> responding to a new live write request.  It's sort of annoying that
> the readers have to sleep for an arbitrary interval though.  If we could
> get rid of that part...
> 
It is, but I guess it's unavoidable if we have to wait for the file to be
written. I'll try to do some load testing tomorrow, to see if something
nasty comes out.

regards,
Martin

Index: src/backend/postmaster/pgstat.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.181
diff -c -r1.181 pgstat.c
*** src/backend/postmaster/pgstat.c	25 Aug 2008 18:55:43 -	1.181
--- src/backend/postmaster/pgstat.c	8 Sep 2008 21:00:17 -
***
*** 85,90 
--- 85,97 
  #define PGSTAT_SELECT_TIMEOUT	2		/* How often to check for postmaster
  		 * death; in seconds. */
  
+ #define PGSTAT_POLL_RETRY_DELAY	5		/* How long to pause between statistics
+ 		 * update requests; in milliseconds. */
+ 
+ #define PGSTAT_POLL_TIME		5000	/* How long are we allowed to poll for 
+ 		 * the stats file; in milliseconds. */
+ 
+ #define PGSTAT_POLL_LOOP_COUNT	(PGSTAT_POLL_TIME / PGSTAT_POLL_RETRY_DELAY)
  
  /* --
   * The initial size hints for the hash tables used in the collector.
***
*** 159,164 
--- 166,177 
  static HTAB *pgStatFunctions = NULL;
  
  /*
+  * Indicates if backend has some function stats which it hasn't yet
+  * sent to the collector.
+  */
+ static bool have_function_stats = false;
+ 
+ /*
   * Tuple insertion/deletion counts for an open transaction can't be propagated
   * into PgStat_TableStatus counters until we know if it is going to commit
   * or abort.  Hence, we keep these counts in per-subxact structs that live
***
*** 201,208 
   */
  static PgStat_GlobalStats globalStats;
  
  static volatile bool need_exit = false;
- static volatile bool need_statwrite = false;
  static volatile bool got_SIGHUP = false;
  
  /*
--- 214,225 
   */
  static PgStat_GlobalStats globalStats;
  
+ /* Last time the collector wrote the stats file */
+ static TimestampTz last_statwrite;
+ /* Latest statistics request from backend */
+ static TimestampTz last_statrequest;
+ 
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
  
  /*
***
*** 223,229 
  
  NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]);
  static void pgstat_exit(SIGNAL_ARGS);
- static void force_statwrite(SIGNAL_ARGS);
  static void pgstat_beshutdown_hook(int code, Datum arg);
  static void pgstat_sighup_handler(SIGNAL_ARGS);
  
--- 240,245 
***
*** 254,259 
--- 270,276 
  static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
  static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
  static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
+ static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len);
  
  
  /* 
***
*** 655,663 
  	int			i;
  
  	/* Don't expend a clock check if nothing to do */
! 	/* Note: we ignore pending function stats in this test ... OK? */
! 	if (pgStatTabList == NULL ||
! 		pgStatTabList->tsa_used == 0)
  		return;
  
  	/*
--- 672,679 
  	int			i;
  
  	/* Don't expend a clock check if nothing to do */
! 	if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0)
! 		&& !have_function_stats)
  		return;
  
  	/*
***
*** 823,828 
--- 839,846 
  	if (msg.m_nentries > 0)
  		pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
  	msg.m_nentries * sizeof(PgStat_FunctionEntry));
+ 
+ 	have_function_stats = false;
  }
  
  
***
*** 1341,1346 
--- 1359,1367 
  		fs->f_numcalls++;
  	fs->f_time = f_total;
  	INSTR_TIME_ADD(fs->f_time_self, f_self);
+ 
+ 	/* indicate that we have something to upload */
+ 	have_function_stats = true;
  }
  
  
***
*** 2463,2468 
--- 2484,2505 
  	hdr->

Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Jeff Davis
On Mon, 2008-09-08 at 18:08 +0100, Andrew Gierth wrote:
>  Jeff> * Mutual Recursion:
> 
> This limitation isn't at all uncommon in other implementations; DB2
> docs for example say:

As with some other things in my list, this doesn't need to be supported
in 8.4. I just wanted to lay out my interpretation of the standard, and
places that we might (currently) fall short of it.

The fact that other DBMSs don't support mutual recursion is a good
indication that it's not important immediately.

>  Jeff>   The standard does not require that the recursive term be on
>  Jeff>   the RHS.
> 
> Again, the standard may not, but existing implementations do:
> 

Again, I don't think we need this for 8.4.

However, I think it's probably more important than mutual recursion.

>  Jeff> * UNION ALL only:
> 
>  Jeff>   with recursive
>  Jeff> foo(i) as (values(1) union select i+1 from foo where i < 10)
>  Jeff>   select * from foo;
>  Jeff>   ERROR:  non-recursive term and recursive term must be combined with
>  Jeff> UNION ALL
> 
>  Jeff>   The standard seems to allow UNION ALL, UNION, INTERSECT, and
>  Jeff>   EXCEPT (when the recursive term is not on the RHS of the
>  Jeff>   EXCEPT).
> 
> Again, existing implementations disagree. See above for DB2, and for
> MSSQL:
> 

And again, I agree that it's not important for 8.4.

At some point we need to determine what the goalposts are though. Are we
copying existing implementations, or are we implementing the standard?

>  Jeff>   Produces 10 rows of output regardless of what "X" is. This
>  Jeff> should be fixed for 8.4.  Also, this is non-linear recursion,
>  Jeff> which the standard seems to disallow.
> 
> That looks like it should be disallowed somehow.

Agreed. I think it should just throw an error, probably.

> [snip * Strange result with except: which looks like a bug]
> 
>  Jeff> * Aggregates allowed: which
> 
>  Jeff>   with recursive foo(i) as
>  Jeff> (values(1)
>  Jeff> union all
>  Jeff> select max(i)+1 from foo where i < 10)
>  Jeff>   select * from foo;
> 
>  Jeff>   Aggregates should be blocked according to the standard.
>  Jeff>   Also, causes an infinite loop. This should be fixed for 8.4.
> 
> Does the standard require anywhere that non-conforming statements must
> be diagnosed? (seems impractical, since it would forbid extensions)
> 

2.g.iii.4.B explicitly says aggregates should be rejected, unless I have
misinterpreted.

>  
> Yeah, though the standard's use of DISTINCT in this way is something
> of a violation of the POLA.
> 

I agree that's kind of a funny requirement. But that's pretty typical of
the SQL standard. If DB2 or SQL Server follow the standard here, we
should, too. If not, it's open for discussion.

> No. This has already been discussed; it's neither possible nor desirable
> to diagnose all cases which can result in infinite loops, and there are
> important types of queries which would be unnecessarily forbidden.

I didn't say we should forbid all infinite loops. But we should forbid
ones that the standard tells us to forbid.

> Besides, you've misread the spec here: it prohibits the recursive
> reference ONLY on the nullable side of the join. You cite:
> 

Thank you for the correction. It does properly reject the outer joins
that the standard says should be rejected.

>  Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursive
>  Jeff> queries. The standard does not seem to say that these should be
>  Jeff> rejected.
> 
> Note that supporting those in subqueries (including CTEs) is a separate
> optional feature of the standard.
> 

I don't feel strongly about this either way, but I prefer that we are
consistent when possible. We do support these things in a subquery, so
shouldn't we support them in all subqueries?

Regards,
Jeff Davis





Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Jeff Davis
On Mon, 2008-09-08 at 21:13 +0100, Gregory Stark wrote:
> Jeff Davis <[EMAIL PROTECTED]> writes:
> 
> > * Mutual Recursion:
> >
> >   with recursive
> > foo(i) as (values(1) union all select i+1 from bar where i < 10),
> > bar(i) as (values(1) union all select i+1 from foo where i < 10)
> >   select * from foo;
> >   ERROR:  mutual recursive call is not supported
> >
> >   The standard allows mutual recursion.
> 
> This seems to be a point of confusion. I originally read the standard and
> concluded that mutual recursion was an optional feature. Itagaki-san showed me
> a copy of the spec where it seemed there was a clear blanket prohibition on
> mutually recursive queries and in fact anything but simple linearly expandable
> queries. I wonder if there are different versions of the spec floating around
> on this point.
> 
> Take a second look at your spec and read on to where it defines "linear" and
> "expandable". If it doesn't define those terms then it's definitely different
> from what I read. If it does, read on to see what it does with them. The main
> reason to define them appeared to be to use them to say that supporting mutual
> recursion is not required.

I think we're reading the same version of the spec. I'm reading 200n.

My interpretation (Syntax Rule 2.h) is that expandable is only used to
determine whether it can contain a <search or cycle clause>.

That being said, I don't think it should be a requirement for 8.4. The
CTE patch is an important feature, and we shouldn't hold it up over
something comparatively obscure like mutual recursion. As Andrew Gierth
cited, other systems don't support it anyway. It should be kept in mind
for future work though.

Regards,
Jeff Davis




Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Gregory Stark

Jeff Davis <[EMAIL PROTECTED]> writes:

> * Mutual Recursion:
>
>   with recursive
> foo(i) as (values(1) union all select i+1 from bar where i < 10),
> bar(i) as (values(1) union all select i+1 from foo where i < 10)
>   select * from foo;
>   ERROR:  mutual recursive call is not supported
>
>   The standard allows mutual recursion.

This seems to be a point of confusion. I originally read the standard and
concluded that mutual recursion was an optional feature. Itagaki-san showed me
a copy of the spec where it seemed there was a clear blanket prohibition on
mutually recursive queries and in fact anything but simple linearly expandable
queries. I wonder if there are different versions of the spec floating around
on this point.

Take a second look at your spec and read on to where it defines "linear" and
"expandable". If it doesn't define those terms then it's definitely different
from what I read. If it does, read on to see what it does with them. The main
reason to define them appeared to be to use them to say that supporting mutual
recursion is not required.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



[HACKERS] A few thoughts on the plan inval extension patch

2008-09-08 Thread Tom Lane
I've been looking at
http://archives.postgresql.org/message-id/[EMAIL PROTECTED]
and am not very happy with it.  One big problem is that it has the
parser collecting the list of functions referenced by a plan, which
is quite bogus --- consider functions introduced into the tree during
rewrite, or by means of an inlining operation.  The other thing I
did not care for was adding an OID to SharedInvalCatcacheMsg.  That
means bloating the size of sinval messages from four words to five,
which is not a good thing from a performance perspective.

What would make a lot more sense from the perspective of the sinval
code is to use the cache entry itempointer to identify the function
(or other object) being invalidated.  That's what the cache itself
uses and what's already being shipped in inval messages.  We would
have to modify the call signature for syscache callback functions
to pass them an ItemPointer, but that doesn't seem like a problem.

The main downside of this approach is that some part of the planner
would have to collect a list of TIDs of pg_proc entries for every
function used in a plan, which means an additional syscache lookup
(unless this could be piggybacked on some other existing lookup,
but I see no very good candidate).  The function is almost certainly in
cache at this point, so it's not a huge penalty, but it's annoying.

One idea that I had was to ameliorate that by not bothering to look up
built-in functions (which for this purpose we could define as those with
OIDs less than 10000), on the assumption that they will never be
dropped.  Certainly the vast majority of functions referenced in a
typical plan tree would be built-ins, and so this should get us down to
the point where the extra lookups aren't a problem.  The subsequent
savings in list copying and comparisons in PlanCacheCallback are
attractive too.
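The scheme Tom describes can be sketched in miniature. The sketch below is a
hypothetical model (all names are invented, and the built-in OID threshold is
an assumption), not the actual plancache code:

```python
# Hypothetical sketch of the idea above: a plan cache that records only
# non-built-in function IDs, so invalidation traffic for built-ins never
# forces a replan.

BUILTIN_OID_THRESHOLD = 10000  # assumption: built-ins live below this OID

class CachedPlan:
    def __init__(self, query, func_oids):
        self.query = query
        # Skip built-ins: they are assumed never to be dropped or redefined.
        self.func_oids = {oid for oid in func_oids
                          if oid >= BUILTIN_OID_THRESHOLD}
        self.valid = True

class PlanCache:
    def __init__(self):
        self.plans = []

    def add(self, plan):
        self.plans.append(plan)

    def on_function_invalidated(self, oid):
        # Analogue of a syscache invalidation callback: only plans that
        # actually track the changed function are marked for replan.
        for plan in self.plans:
            if oid in plan.func_oids:
                plan.valid = False

cache = PlanCache()
p1 = CachedPlan("select myfunc(x) from t", {16384, 141})  # user func + builtin
p2 = CachedPlan("select upper(y) from t", {141})          # builtin only
cache.add(p1)
cache.add(p2)

cache.on_function_invalidated(141)    # built-in: never tracked, no replans
cache.on_function_invalidated(16384)  # user function dropped/redefined
```

The savings come from `p2`-style plans tracking an empty set, so most sinval
messages touch nothing.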

Barring objections, I'll go see about whacking the patch into this
form.

regards, tom lane



Re: [HACKERS] Fast REVERSE() function?

2008-09-08 Thread Chris Browne
[EMAIL PROTECTED] (hubert depesz lubaczewski) writes:
> On Mon, Sep 08, 2008 at 11:20:18AM -0400, Chris Browne wrote:
>> I've got a case where I need to reverse strings, and find that, oddly
>> enough, there isn't a C-based reverse() function.
>> A search turns up pl/pgsql and SQL implementations:
>
> just for completeness - there are also pl/perl and C versions freely
> available:
> http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/
> (pl/perl)
> http://blog.frosties.org/post/2007/08/28/Fonction-reverse-C-avec-PostgreSQL
> (c)

I hadn't thought about the Unicode issue (mentioned elsewhere in the
thread); that's a good reason why the method I mentioned *wouldn't* be
a good one!

I'm NOT interested in pl/perl as an option; building and deploying all
of Perl is a mighty expensive way to get *ONE* function (and I don't
think that fundamentally changes if it's 10 functions!).

In the long run, I'd be keen on there being a REVERSE function
available in pg_catalog, which is why I'm asking about the C version,
as that would be the way to put it into the "core."
-- 
"cbbrowne","@","linuxdatabases.info"
http://www3.sympatico.ca/cbbrowne/sap.html
DSK: STAN.K; ML EXIT -- FILE NOT FOUND



Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-08 Thread David Rowley
Heikki Linnakangas wrote:
> David Rowley wrote:
> Thanks for all the reviews and suggestions.

> David, could you re-run the performance tests you ran earlier? I'm 
> curious to know what impact switching to the simpler loop for 1-byte 
> pattern had.

Sure: http://www.unixbeast.com/~fat/8.3_test_v1.4.xls

I tested with 8.3.3 and applied the patch updated by Tom.

The thing that surprised me was that string_to_array didn't perform as well.
I expected single-character searches to perform a little better. I can't
think why it would be slower now.

The strpos() test for the single-char needle is taking 1.2% longer. I guess
that makes a little sense, as we're now doing a few more length tests in this
case, but then we've eliminated the final strncmp() once a match is found.
The strncmp() would have had to check for NULs, check that it hadn't exceeded
the compare length, and check that the characters actually match. I figured
that would have made up for the 1.2%. Seems not.
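For readers following along, the family of algorithms under discussion can be
sketched as follows: a generic Boyer-Moore-Horspool search with a plain scan
as the fast path for one-byte needles. This is only an illustration of the
trade-off being measured, not the actual strpos() patch:

```python
# Boyer-Moore-Horspool sketch with a single-byte fast path (illustrative
# only; the real patch works on PostgreSQL's internal text representation).

def bmh_find(haystack: bytes, needle: bytes) -> int:
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m == 1:
        # Fast path for one-byte patterns: a plain scan avoids the
        # skip-table setup cost entirely.
        return haystack.find(needle)
    # Bad-character table: how far we may shift when the last byte of the
    # current window is a given byte value.
    skip = [m] * 256
    for i in range(m - 1):
        skip[needle[i]] = m - 1 - i
    i = 0
    while i <= n - m:
        if haystack[i:i + m] == needle:
            return i
        i += skip[haystack[i + m - 1]]
    return -1
```

The longer the needle, the larger the typical shift, which is where
Boyer-Moore-style searching wins over a naive scan.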

David.



-Original Message-
From: Heikki Linnakangas [mailto:[EMAIL PROTECTED] 
Sent: 08 September 2008 08:50
To: David Rowley
Cc: 'Tom Lane'; 'Peter Eisentraut'; pgsql-hackers@postgresql.org
Subject: Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching
(First time hacker)


-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Korry Douglas

>> Settings in postgresql.conf are currently case-insensitive. Except
>> for the units.
>
> And, of course, filenames when you are using a case-sensitive
> filesystem.  Because these are things that are defined by some
> convention other than the ones the PGDG made up.  Since units fall
> into that category, it seems to me that we're stuck with using
> external conventions.



Just a thought... since there are very few (if any) places where a  
user would specify a variable in terms of bits (kilobits, megabits,  
gigabits), we could make the units case-insensitive and assume that  
kb, gb, and mb (and all case variants) refer to some number of bytes.   
If a user wants to specify a variable in terms of bits, he would have  
to spell out the units completely, as in "4 gigabits", "20 kilobits",  
or "16 megabits".
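Korry's rule could look something like the parser sketch below. It is only an
illustration of the proposal (case-insensitive byte units, spelled-out bit
units); the unit tables and the decimal-bits-to-bytes conversion are
assumptions, not anything PostgreSQL actually does:

```python
# Sketch of the proposed rule: short units (kb/mb/gb, any case) always mean
# bytes; bit quantities must be spelled out in full.
import re

BYTE_UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3}
BIT_UNITS = {"kilobits": 1000 // 8,          # assumption: decimal bits
             "megabits": 1000**2 // 8,
             "gigabits": 1000**3 // 8}

def parse_size(text: str) -> int:
    """Return the setting's size in bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]+)\s*", text)
    if not m:
        raise ValueError(f"cannot parse {text!r}")
    value, unit = int(m.group(1)), m.group(2).lower()
    if unit in BYTE_UNITS:       # KB, kb, Kb ... all mean bytes
        return value * BYTE_UNITS[unit]
    if unit in BIT_UNITS:        # bits only when spelled out completely
        return value * BIT_UNITS[unit]
    raise ValueError(f"unrecognized unit {unit!r}")
```

Note that an ambiguous input like '1024b' is simply rejected rather than
guessed at, which sidesteps the blocks-vs-bytes question raised elsewhere in
the thread.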



-- Korry




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Simon Riggs

On Mon, 2008-09-08 at 19:19 +0900, ITAGAKI Takahiro wrote:
> Bruce Momjian <[EMAIL PROTECTED]> wrote:
> 
> > > > b) Use new background process as WALSender
> > > > 
> > > >This idea needs background-process hook which enables users
> > > >to define new background processes
> 
> > I think starting/stopping a process for each WAL send is too much
> > overhead.
> 
> Yes, of course it is slower. But I guess it is the only way to share one
> socket among all backends. Postgres is not a multi-threaded architecture,
> so each backend would need a dedicated connection to send WAL buffers.
> 300 backends would require 300 connections to each slave... that's not good at all.

So... don't have individual backends do the sending. Have them wait
while somebody else does it for them.

> > It sounds like Fujii-san is basically saying they can only get the hooks
> > done for 8.4, not the actual solution.
> 
> No! He has an actual solution in his prototype ;-)

The usual thing if you have a WIP patch you're not sure of is to post
the patch for feedback. 

If you guys aren't going to post any code to the project then I'm not
clear why it's being discussed here. Is this a community project or a
private project? 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] For what should pg_stop_backup wait?

2008-09-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
>> So thinking we should test XLogArchiveCheckDone() for both
>> stopxlogfilename and history file and then stat for the stop WAL file:

> This seems better.

Somehow I missed the 5-Apr patch that introduced this bogosity.
Next time you make a fundamental change in the behavior of a function,
how about updating its comment to match?

After studying it for a minute, I think that XLogArchiveCheckDone no
longer even has a clearly explainable purpose; it needs to be split
into two functions with two different behaviors.  I plan to revert
XLogArchiveCheckDone to its previous behavior and introduce
XLogArchiveIsBusy (flipping the sense of the result) to use in
pg_stop_backup.

regards, tom lane



Re: [HACKERS] [Review] pgbench duration option

2008-09-08 Thread Brendan Jurd
On Mon, Sep 8, 2008 at 6:59 PM, ITAGAKI Takahiro
<[EMAIL PROTECTED]> wrote:
> Here is a revised version of the pgbench duration patch.
> I merged some comments from Brendan and gnari.
>

The changes look good.  I tried out the new v3 patch and didn't
encounter any problems.

One last minor quibble - I think the wording in the documentation is
still a little bit awkward:

   In the first place, never believe any test that runs
   for only a few seconds.  Use the -t or -T setting enough
   to make the run last at least a few minutes, so as to average out noise.

This reads better IMHO if you simply omit the word "enough".

Cheers,
BJ



Re: [HACKERS] [PATCHES] to_date() validation

2008-09-08 Thread Brendan Jurd
On Mon, Sep 8, 2008 at 6:24 PM, Brendan Jurd <[EMAIL PROTECTED]> wrote:
> On Sun, Sep 7, 2008 at 5:58 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
>> Code review: one minor nit
>> *** a/src/backend/utils/adt/formatting.c
>> --- b/src/backend/utils/adt/formatting.c
>> ***
>> *** 781,787  static const KeyWord DCH_keywords[] = {
>>{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},
>>
>>/* last */
>> !   {NULL, 0, 0, 0}
>>  };
>>
>>  /* --
>> --- 781,787 
>>{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},
>>
>>/* last */
>> !   {NULL, 0, 0, 0, 0}
>>  };
>
> Ah, thank you for picking that up.  I will correct this and send in a
> new patch version in the next 24 hours.
>

New version attached (and link added to wiki).

Cheers,
BJ


to-date-validation-2.diff.gz
Description: GNU Zip compressed data



Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Andrew Gierth
> "Jeff" == Jeff Davis <[EMAIL PROTECTED]> writes:

 Jeff> * Mutual Recursion:

This limitation isn't at all uncommon in other implementations; DB2
docs for example say:

"If more than one common table expression is defined in the same
statement, cyclic references between the common table expressions are
not permitted. A cyclic reference occurs when two common table
expressions dt1 and dt2 are created such that dt1 refers to dt2 and
dt2 refers to dt1."

http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.langref2.doc/rbafzmst295.htm

MSSQL's documentation is less clear, but it also appears not to allow
mutual recursion (not allowing forward references to CTEs).

"A CTE can reference itself and previously defined CTEs in the same
WITH clause. Forward referencing is not allowed."

http://msdn.microsoft.com/en-us/library/ms175972.aspx

 Jeff> * RHS only:

 Jeff>   with recursive
 Jeff> foo(i) as (select i+1 from foo where i < 10 union all values(1))
 Jeff>   select * from foo;
 Jeff>   ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
 Jeff> recursive query

 Jeff>   The standard does not require that the recursive term be on
 Jeff>   the RHS.

Again, the standard may not, but existing implementations do:

MSSQL:

"The recursive CTE definition must contain at least two CTE query
definitions, an anchor member and a recursive member. Multiple anchor
members and recursive members can be defined; however, all anchor
member query definitions must be put before the first recursive member
definition. All CTE query definitions are anchor members unless they
reference the CTE itself. "

DB2:

"The following restrictions apply to a recursive common-table-expression:
* A list of column-names must be specified following the
  table-identifier.
* The UNION ALL set operator must be specified.
* The first fullselect of the first union (the initialization
  fullselect) must not include a reference to the
  common-table-expression itself in any FROM clause.
"

 Jeff> * UNION ALL only:

 Jeff>   with recursive
 Jeff> foo(i) as (values(1) union select i+1 from foo where i < 10)
 Jeff>   select * from foo;
 Jeff>   ERROR:  non-recursive term and recursive term must be combined with
 Jeff> UNION ALL

 Jeff>   The standard seems to allow UNION ALL, UNION, INTERSECT, and
 Jeff>   EXCEPT (when the recursive term is not on the RHS of the
 Jeff>   EXCEPT).

Again, existing implementations disagree. See above for DB2, and for
MSSQL:

"Anchor members must be combined by one of these set operators: UNION
ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator
allowed between the last anchor member and first recursive member, and
when combining multiple recursive members."

 Jeff> * Binary recursion and subselect strangeness:

 Jeff>   with recursive foo(i) as
 Jeff> (values (1)
 Jeff> union all
 Jeff> select * from
 Jeff>   (select i+1 from foo where i < 10
 Jeff>   union all
 Jeff>   select i+1 from foo where i < X) t)
 Jeff>   select * from foo;

 Jeff>   Produces 10 rows of output regardless of what "X" is. This
 Jeff> should be fixed for 8.4.  Also, this is non-linear recursion,
 Jeff> which the standard seems to disallow.

That looks like it should be disallowed somehow.

 Jeff> * Multiple recursive references:
 [snip]

 Jeff>   If we're going to allow non-linear recursion (which the
 Jeff>   standard does not), this seems like it should be a valid
 Jeff>   case.

We probably shouldn't allow it (as above).

[snip * Strange result with except: which looks like a bug]

 Jeff> * Aggregates allowed: which

 Jeff>   with recursive foo(i) as
 Jeff> (values(1)
 Jeff> union all
 Jeff> select max(i)+1 from foo where i < 10)
 Jeff>   select * from foo;

 Jeff>   Aggregates should be blocked according to the standard.
 Jeff>   Also, causes an infinite loop. This should be fixed for 8.4.

Does the standard require anywhere that non-conforming statements must
be diagnosed? (seems impractical, since it would forbid extensions)

 Jeff> * DISTINCT should suppress duplicates:

 Jeff>   with recursive foo(i) as
 Jeff> (select distinct * from (values(1),(2)) t
 Jeff> union all
 Jeff> select distinct i+1 from foo where i < 10)
 Jeff>   select * from foo;

 Jeff>   This outputs a lot of duplicates, but they should be
 Jeff> suppressed according to the standard. This query is essentially
 Jeff> the same as supporting UNION for recursive queries, so we
 Jeff> should either fix both for 8.4 or block both for consistency.

Yeah, though the standard's use of DISTINCT in this way is something
of a violation of the POLA.

 Jeff> * outer joins on a recursive reference should be blocked:

 Jeff>   with recursive foo(i) as
 Jeff> (values(1)
 Jeff> union all
 Jeff> select i+1 from foo left join (values(1)) t on (i=column1))
 Jeff>   select * from foo;

 Jeff>   Causes an infinite loop, but the standard says using an outer
 Jeff>   join in this situation should be prohibited. This should be
 Jeff>   fixed for 8.4.

Re: [HACKERS] sql2008 diff sql2003

2008-09-08 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera <[EMAIL PROTECTED]> writes:

 Alvaro> Wow, this is really horrid:

 Alvaro>  # F856 through F859: FETCH FIRST clause in subqueries,
 Alvaro>  views, and query expressions. The SQL:2008 syntax for
 Alvaro>  restricting the rows of a result set is FETCH FIRST, rather
 Alvaro>  than Microsoft SQL Server’s SELECT TOP N equivalent which
 Alvaro>  SQL Anywhere supports presently.

 Alvaro> This means we have to support stuff like

 Alvaro> declare foo cursor for select * from lists;
 Alvaro> select * from (fetch first from foo) as bar;

No, that's wrong.

The new syntax is:

<query expression> ::=
  [ <with clause> ] <query expression body>
  [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]

<result offset clause> ::=
  OFFSET <offset row count> { ROW | ROWS }

<fetch first clause> ::=
  FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } ONLY

so it's like this:

select * from foo order by bar offset 5 rows fetch first 10 rows only;

(nothing that I can see assigns any semantics to FIRST vs NEXT, they seem
to do the same thing)

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] sql2008 diff sql2003

2008-09-08 Thread Alvaro Herrera
Pavel Stehule wrote:
> Hello
> 
> I found one usefull article
> http://iablog.sybase.com/paulley/2008/07/sql2008-now-an-approved-iso-international-standard/

Wow, this is really horrid:

#  F856 through F859: FETCH FIRST clause in subqueries, views,
and query expressions. The SQL:2008 syntax for restricting the
rows of a result set is FETCH FIRST, rather than Microsoft SQL
Server’s SELECT TOP N equivalent which SQL Anywhere supports
presently. 

This means we have to support stuff like

declare foo cursor for select * from lists;
select * from (fetch first from foo) as bar;

(I wonder why they didn't use FETCH NEXT instead.  As is, it seems a bit
cumbersome to use.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] For what should pg_stop_backup wait?

2008-09-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> You sound like you're in the middle of doing this yourself. Or would you
> like me to do that?

Yeah, done and committed.

regards, tom lane



[HACKERS] sql2008 diff sql2003

2008-09-08 Thread Pavel Stehule
Hello

I found one usefull article
http://iablog.sybase.com/paulley/2008/07/sql2008-now-an-approved-iso-international-standard/

Regards
Pavel Stehule



Re: [HACKERS] Fast REVERSE() function?

2008-09-08 Thread Pavel Stehule
2008/9/8 Andrew Dunstan <[EMAIL PROTECTED]>:
>
>
> Mario Weilguni wrote:
>>>
>>> (Aside: presumably we could walk thru the string destructively,
>>> in-place, swapping bytes; I think that would be theoretically
>>> quickest...)
>>>
>>
> >> Hmmm... I guess it will not work for UTF-8 or any other multibyte charset
>>
>
> Yes, quite.

orafce contains multibyte (UTF8) reverse function.

Pavel

>
> Perl's reverse might work with UTF8 - I've never tried.
>
> cheers
>
> andrew
>



Re: [HACKERS] For what should pg_stop_backup wait?

2008-09-08 Thread Simon Riggs

On Mon, 2008-09-08 at 11:57 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> >> So thinking we should test XLogArchiveCheckDone() for both
> >> stopxlogfilename and history file and then stat for the stop WAL file:
> 
> > This seems better.
> 
> Somehow I missed the 5-Apr patch that introduced this bogosity.
> Next time you make a fundamental change in the behavior of a function,
> how about updating its comment to match?

It felt OK when I did it, because of the clearly named boolean. But I
accept your point and will look to improve my code comments in future.

> After studying it for a minute, I think that XLogArchiveCheckDone no
> longer even has a clearly explainable purpose; it needs to be split
> into two functions with two different behaviors.  I plan to revert
> XLogArchiveCheckDone to its previous behavior and introduce
> XLogArchiveIsBusy (flipping the sense of the result) to use in
> pg_stop_backup.

You sound like you're in the middle of doing this yourself. Or would you
like me to do that?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Fast REVERSE() function?

2008-09-08 Thread Andrew Dunstan



Mario Weilguni wrote:
>> (Aside: presumably we could walk thru the string destructively,
>> in-place, swapping bytes; I think that would be theoretically
>> quickest...)
>
> Hmmm... I guess it will not work for UTF-8 or any other multibyte charset

Yes, quite.

Perl's reverse might work with UTF8 - I've never tried.

cheers

andrew



Re: [HACKERS] Fast REVERSE() function?

2008-09-08 Thread Pavel Stehule
Hello

2008/9/8 Mario Weilguni <[EMAIL PROTECTED]>:
>> (Aside: presumably we could walk thru the string destructively,
>> in-place, swapping bytes; I think that would be theoretically
>> quickest...)
>
> Hmmm... I guess it will not work for UTF-8 or any other multibyte charset
>

it isn't a problem, but I am not sure if ANSI SQL has this function.

Regards
Pavel Stehule





Re: [HACKERS] Fast REVERSE() function?

2008-09-08 Thread hubert depesz lubaczewski
On Mon, Sep 08, 2008 at 11:20:18AM -0400, Chris Browne wrote:
> I've got a case where I need to reverse strings, and find that, oddly
> enough, there isn't a C-based reverse() function.
> A search turns up pl/pgsql and SQL implementations:

just for completeness - there are also pl/perl and C versions freely
available:
http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/
(pl/perl)
http://blog.frosties.org/post/2007/08/28/Fonction-reverse-C-avec-PostgreSQL
(c)

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007



Re: [HACKERS] Fast REVERSE() function?

2008-09-08 Thread Mario Weilguni
> (Aside: presumably we could walk thru the string destructively,
> in-place, swapping bytes; I think that would be theoretically
> quickest...)

Hmmm... I guess it will not work for UTF-8 or any other multibyte charset



[HACKERS] Fast REVERSE() function?

2008-09-08 Thread Chris Browne
I've got a case where I need to reverse strings, and find that, oddly
enough, there isn't a C-based reverse() function.

A search turns up pl/pgsql and SQL implementations:

create or replace function reverse_string(text) returns text as $$
DECLARE
reversed_string text;
incoming alias for $1;
BEGIN
reversed_string := '';
for i in reverse char_length(incoming)..1 loop
  reversed_string = reversed_string || substring(incoming from i for 1);
end loop;
return reversed_string;
END $$
language plpgsql;

CREATE OR REPLACE FUNCTION reverse(TEXT) RETURNS TEXT AS $$
  SELECT array_to_string(
    ARRAY(SELECT substring($1, s.i, 1)
          FROM generate_series(length($1), 1, -1) AS s(i)),
    '');
$$ LANGUAGE SQL IMMUTABLE;

Unfortunately, neither is particularly fast.  This should be
"blinding-quick" in C, in comparison; reversing a set of bytes should
be able to be done mighty quick!

(Aside: presumably we could walk thru the string destructively,
in-place, swapping bytes; I think that would be theoretically
quickest...)
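As the replies elsewhere in the thread point out, byte-wise reversal is only
safe for single-byte encodings. A minimal Python sketch of the pitfall
(illustrative only, not the proposed C implementation):

```python
# Byte-wise reversal is fast but corrupts multibyte UTF-8 sequences;
# reversing whole code points is the safe variant.

def reverse_bytes(s: str) -> str:
    # Naive byte reversal -- fine for ASCII, wrong for multibyte UTF-8.
    return s.encode("utf-8")[::-1].decode("utf-8", errors="replace")

def reverse_codepoints(s: str) -> str:
    # Reverse by code point; multibyte sequences stay intact.
    return s[::-1]

assert reverse_bytes("abc") == "cba"         # ASCII: both approaches agree
assert reverse_codepoints("für") == "rüf"    # multibyte character survives
assert reverse_bytes("für") != "rüf"         # byte reversal mangles it
```

(Even code-point reversal is not fully general, since combining characters
form multi-code-point graphemes, but it handles the common multibyte case.)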

I could probably add this in as an SPI() function; is there a good
reason to try to avoid doing so?
-- 
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/sgml.html
"Consistency  is  the  single  most important  aspect  of  *ideology.*
Reality is not nearly so consistent." - <[EMAIL PROTECTED]>



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>>> It's good as a joke, but what if the user says '1024b'?  Does it mean
>>> 1024 blocks or one kilobyte?  If blocks, what size are we talking, the
>>> usual 512 bytes, or our BLCKSZ?
>
>> For what guc would you find a unit of posix-style "blocks" relevant?!
>
> The point isn't whether it's relevant, the point is that there's a
> fairly serious doubt as to what the user thought it meant.

My point was that we don't accept any form of "b" today and nobody has
suggested we should.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



[HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-08 Thread Jeff Davis

These are my initial comments about the Common Table Expressions (CTE)
patch, also known as WITH [RECURSIVE]. These comments are based on the
patch here:

http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php

This is a semantically complex feature, and the standard is fairly
complex as well. So I'm approaching this by making my own
interpretations from the standard first (I included my interpretations
and section references at the end of this email) and comparing to the
behavior of the patch.

The following examples may be inconsistent with the standard. Some
have already been mentioned, and I don't think they all need to be
fixed for 8.4, but I mention them here for completeness.

* Mutual Recursion:

  with recursive
foo(i) as (values(1) union all select i+1 from bar where i < 10),
bar(i) as (values(1) union all select i+1 from foo where i < 10)
  select * from foo;
  ERROR:  mutual recursive call is not supported

  The standard allows mutual recursion.

* Single Evaluation:

  with
foo(i) as (select random() as i)
  select * from foo union all select * from foo;
           i
  -------------------
   0.233165248762816
    0.62126633618027
  (2 rows)

  The standard specifies that non-recursive WITH should be evaluated
  once.

* RHS only:

  with recursive
foo(i) as (select i+1 from foo where i < 10 union all values(1))
  select * from foo;
  ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
recursive query

  The standard does not require that the recursive term be on the RHS.

* UNION ALL only:

  with recursive
foo(i) as (values(1) union select i+1 from foo where i < 10)
  select * from foo;
  ERROR:  non-recursive term and recursive term must be combined with
UNION ALL

  The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT
  (when the recursive term is not on the RHS of the EXCEPT).

* Binary recursion and subselect strangeness:

  with recursive foo(i) as
(values (1)
union all
select * from
  (select i+1 from foo where i < 10
  union all
  select i+1 from foo where i < X) t)
  select * from foo;

  Produces 10 rows of output regardless of what "X" is. This should be
fixed for 8.4.
  Also, this is non-linear recursion, which the standard seems to
disallow.

* Multiple recursive references:

  with recursive foo(i) as
(values (1)
union all
select i+1 from foo where i < 10
union all
select i+1 from foo where i < 20)
  select * from foo;
  ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
recursive query

  If we're going to allow non-linear recursion (which the standard
  does not), this seems like it should be a valid case.

* Strange result with except:

  with recursive foo(i) as
(values (1)
union all
select * from
(select i+1 from foo where i < 10
except
select i+1 from foo where i < 5) t)
  select * from foo;
  ERROR:  table "foo" has 0 columns available but 1 columns specified

  This query works if you replace "except" with "union". This should be
fixed for 8.4.

* Aggregates allowed:

  with recursive foo(i) as
(values(1)
union all
select max(i)+1 from foo where i < 10)
  select * from foo;

  Aggregates should be blocked according to the standard.
  Also, causes an infinite loop. This should be fixed for 8.4.

* DISTINCT should suppress duplicates:

  with recursive foo(i) as
(select distinct * from (values(1),(2)) t
union all
select distinct i+1 from foo where i < 10)
  select * from foo;

  This outputs a lot of duplicates, but they should be suppressed
according to the standard. This query is essentially the same as
supporting UNION for recursive queries, so we should either fix both for
8.4 or block both for consistency.
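The connection between DISTINCT and UNION here can be shown with a toy
working-table evaluator (a deliberately simplified model of recursive-query
evaluation, not the patch's executor):

```python
# Model of recursive evaluation with and without duplicate suppression,
# mirroring the foo(i) example above: anchor rows seed a working table,
# and the recursive term is applied until no new rows appear.

def recursive_eval(anchor, step, dedup):
    result = list(anchor)
    seen = set(anchor)
    working = list(anchor)
    while working:
        new_rows = [r for w in working for r in step(w)]
        if dedup:
            # UNION / DISTINCT semantics: drop rows already produced,
            # which also shrinks the working table and aids termination.
            new_rows = [r for r in new_rows if r not in seen]
            seen.update(new_rows)
        result.extend(new_rows)
        working = new_rows
    return result

step = lambda i: [i + 1] if i < 10 else []

with_dup = recursive_eval([1, 2], step, dedup=False)  # UNION ALL: duplicates
no_dup = recursive_eval([1, 2], step, dedup=True)     # UNION: each row once
```

With duplicate suppression the two anchor chains collapse into one pass over
1..10, which is exactly the behavior the standard's DISTINCT rule asks for.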

* outer joins on a recursive reference should be blocked:

  with recursive foo(i) as
(values(1)
union all
select i+1 from foo left join (values(1)) t on (i=column1))
  select * from foo;

  Causes an infinite loop, but the standard says using an outer join
  in this situation should be prohibited. This should be fixed for 8.4.

* ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The
standard does not seem to say that these should be rejected.


The following are my interpretations of relevant parts of the SQL
standard (200n), and the associated sections. These are only my
interpretation, so let me know if you interpret the standard
differently.

Non-linear recursion forbidden:
  7.13: Syntax Rules: 2.g.ii
My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] may be the
same <query name>.
  7.13: Syntax Rules: 2.g.iv

EXCEPT can't be used for recursive queries if a recursive reference
appears on the RHS:
  7.13: Syntax Rules: 2.g.iii.1

INTERSECT ALL/EXCEPT ALL can't be used for recursive queries:
  7.13: Syntax Rules: 2.g.iii.5

recursive references must appear in FROM clause:
  7.13: Syntax Rules: 2.g.iii.3
My interpretation of this rule is that it does not allow a

Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Markus Wanner

Hi,

Fujii Masao wrote:
> 1. A backend writes WAL to disk.
> 2. The backend wakes up WAL sender process and sleeps.
> 3. WAL sender process does WAL shipping and wakes up the backend.
> 4. The backend issues sync command.

Right, that would work. But still, the WAL writer process would block
during writing WAL blocks.

Are there compelling reasons for using the existing WAL writer process,
as opposed to introducing a new process?

> The timing of the process's receiving a signal is dependent on the
> scheduler of the kernel.

Sure, so are pipes or shmem queues.

> The scheduler does not always handle a signal immediately.

What exactly are you proposing to use instead of signals? Semaphores are
pretty inconvenient when trying to wake up arbitrary processes or in
conjunction with listening on sockets via select(), for example.

See src/backend/replication/manager.c from Postgres-R for a working
implementation of such a process using select() and signaling.
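The select()-based pattern being referred to can be sketched with a wakeup
pipe: the waiting process multiplexes wakeup requests with its other
descriptors in a single select() call. A minimal POSIX-only sketch (not
Postgres-R's actual code):

```python
# A pipe doubles as a wakeup channel: writing a byte makes the read end
# readable, so select() returns immediately instead of waiting on
# signal-delivery timing.

import os
import select

def make_waker():
    r, w = os.pipe()
    os.set_blocking(r, False)   # drain without blocking
    return r, w

def wake(w):
    os.write(w, b"x")           # any byte wakes the waiting process

def wait_for_work(r, timeout=1.0):
    # Real code would also list the replication sockets here, handling
    # wakeups and network traffic in one select() call.
    ready, _, _ = select.select([r], [], [], timeout)
    if r in ready:
        os.read(r, 4096)        # drain accumulated wakeup bytes
        return True
    return False                # timed out; no work was signaled

r, w = make_waker()
wake(w)
assert wait_for_work(r) is True
assert wait_for_work(r, timeout=0.01) is False
```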


Regards

Markus Wanner




Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> It's good as a joke, but what if the user says '1024b'?  Does it mean
>> 1024 blocks or one kilobyte?  If blocks, what size are we talking, the
>> usual 512 bytes, or our BLCKSZ?

> For what guc would you find a unit of posix-style "blocks" relevant?!

The point isn't whether it's relevant, the point is that there's a
fairly serious doubt as to what the user thought it meant.

regards, tom lane



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Gregory Stark

Alvaro Herrera <[EMAIL PROTECTED]> writes:

> It's good as a joke, but what if the user says '1024b'?  Does it mean
> 1024 blocks or one kilobyte?  If blocks, what size are we talking, the
> usual 512 bytes, or our BLCKSZ?

For what guc would you find a unit of posix-style "blocks" relevant?!

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Alvaro Herrera
Gregory Stark wrote:
> "Greg Sabino Mullane" <[EMAIL PROTECTED]> writes:
> 
> > Tom Lane wrote:
> >> My vote is to reject the patch and work on a config checker.
> >
> > +1
> >
> >> postgres=# set work_mem = '1g';
> >> ERROR: invalid value for parameter "work_mem": "1g"
> >
> > Perhaps this would be a great place for a HINT listing all
> > valid inputs, if not there already?
> 
> It is; I paraphrased it in my original message as:
> 
> HINT:  It's perfectly clear what you want but I'm going to refuse to do
>it until you type it exactly as I say: "GB"

It's good as a joke, but what if the user says '1024b'?  Does it mean
1024 blocks or one kilobyte?  If blocks, what size are we talking, the
usual 512 bytes, or our BLCKSZ?

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Gregory Stark
"Greg Sabino Mullane" <[EMAIL PROTECTED]> writes:

> Tom Lane wrote:
>> My vote is to reject the patch and work on a config checker.
>
> +1
>
>> postgres=# set work_mem = '1g';
>> ERROR: invalid value for parameter "work_mem": "1g"
>
> Perhaps this would be a great place for a HINT listing all
> valid inputs, if not there already?

It is; I paraphrased it in my original message as:

HINT:  It's perfectly clear what you want but I'm going to refuse to do
   it until you type it exactly as I say: "GB"


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Alvaro Herrera
Greg Sabino Mullane wrote:

> Tom Lane wrote:

> > postgres=# set work_mem = '1g';
> > ERROR: invalid value for parameter "work_mem": "1g"
> 
> Perhaps this would be a great place for a HINT listing all
> valid inputs, if not there already?

alvherre=# set work_mem = '1g';
ERROR:  invalid value for parameter "work_mem": "1g"
HINT:  Valid units for this parameter are "kB", "MB", and "GB".


-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Greg Sabino Mullane



Tom Lane wrote:
> My vote is to reject the patch and work on a config checker.

+1

> postgres=# set work_mem = '1g';
> ERROR: invalid value for parameter "work_mem": "1g"

Perhaps this would be a great place for a HINT listing all
valid inputs, if not there already?

--
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200809081014
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8






Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Tom Lane
"Greg Stark" <[EMAIL PROTECTED]> writes:
> On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> But of course case insensitivity isn't going to fix that example for you.
>> So we're right back at the question of where we should draw the line in
>> trying to accept variant input.

> Well it's not a perfect precedent but for example, dd accepts:

> G(2^30)
> M(2^20)
> k (2^10)
> K(2^10)
> Kb  (10^3)
> MB (10^6)
> GB (10^9)
> b(512)

Hmm.  I could get behind a proposal to allow single-letter abbreviations
if it could be made to work across the board, but what about the time
units?  "ms" vs "min" is the sticky part there...

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Tom Lane
Martin Pihlak <[EMAIL PROTECTED]> writes:
> Attached is a patch which adds a timestamp to pgstat.stat file header,
> backend_read_statsfile uses this to determine if the file is fresh.
> During the wait loop, the stats request message is retransmitted to
> compensate for possible loss of message(s).

> The collector writes the file at most once per PGSTAT_STAT_INTERVAL;
> currently no extra custom timeout can be passed in the message. This can
> of course be added if the need arises.

Hmm.  With the timestamp in the file, ISTM that we could put all the
intelligence on the reader side.  Reader checks file, sends message if
it's too stale.  The collector just writes when it's told to, no
filtering.  In this design, rate limiting relies on the readers to not
be unreasonable in how old a file they'll accept; and there's no problem
with different readers having different requirements.

A possible problem with this idea is that two readers might send request
messages at about the same time, resulting in two writes where there
need have been only one.  However I think that could be fixed if we add
a little more information to the request messages and have the collector
remember the file timestamp it last wrote out.  There are various ways
you could design it but what comes to mind here is for readers to send
a timestamp defined as minimum acceptable file timestamp (ie, their
current clock time less whatever slop they want to allow).  Then,
when the collector gets a request with that timestamp <= its last
write timestamp, it knows it need not do anything.

With signaling like that, there's no excess writes *and* no delay in
responding to a new live write request.  It's sort of annoying that
the readers have to sleep for an arbitrary interval though.  If we could
get rid of that part...

regards, tom lane



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Fujii Masao
On Mon, Sep 8, 2008 at 8:44 PM, Markus Wanner <[EMAIL PROTECTED]> wrote:
>>Merge into WAL writer?
>
> Uh.. that would mean you'd lose parallelism between WAL writing to disk and
> WAL shipping via network. That does not sound appealing to me.

That depends on the order of WAL writing and WAL shipping.
How about the following order?

1. A backend writes WAL to disk.
2. The backend wakes up WAL sender process and sleeps.
3. WAL sender process does WAL shipping and wakes up the backend.
4. The backend issues sync command.

>> I guess we could invent a new semaphore-like primitive at the same layer
>> as LWLocks using spinlock and PGPROC directly...
>
> Sure, but in what way would that differ from what I do with imessages?

Performance ;)

When a process receives a signal depends on the kernel's scheduler, which
does not always deliver a signal immediately.

Regards

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread tomas

On Mon, Sep 08, 2008 at 02:18:55PM +0100, Greg Stark wrote:
> On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> > But of course case insensitivity isn't going to fix that example for you.
> > So we're right back at the question of where we should draw the line in
> > trying to accept variant input.
> 
> Well it's not a perfect precedent but for example, dd accepts:
> 
> G(2^30)
> M(2^20)
> k (2^10)
> K(2^10)
> Kb  (10^3)
> MB (10^6)
> GB (10^9)
> b(512)

Heh. But it doesn't take Gb or gB or gb, which was one of your express
wishes. So -- hm.

Regards
-- tomás



Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Zdenek Kotala

Tom Lane wrote:

Zdenek Kotala <[EMAIL PROTECTED]> writes:
Another idea is to create backward-compatible AMs and put them into a separate
library. If these AMs also work with the old page structure, then there should
be no reason for reindexing or index page conversion after upgrade.


I don't think that'd be real workable.  It would require duplicating all
the entries for that AM in pg_opfamily, pg_amop, etc.  Which we could do
for the built-in entries, I suppose, but what happens to user-defined
operator classes?


When the catalog upgrade is performed directly, user-defined operator classes 
should stay in the catalog. But the question is what happens with regproc 
records, and whether all functions will be compatible with a new server ... It 
suggests that we need a stable API for operator and data type implementations. 
Any datatype that uses only this API could then be used on new PostgreSQL 
versions without recompilation.



At least for the index changes proposed so far for 8.4, it seems to me
that the best solution is to mark affected indexes as not "indisvalid"
and require a post-conversion REINDEX to fix 'em.  Obviously a better
solution would be nice later, but we have to avoid putting huge amounts
of work into noncritical problems, else the whole feature is just not
going to get finished.


Agree.

Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Greg Stark
On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> But of course case insensitivity isn't going to fix that example for you.
> So we're right back at the question of where we should draw the line in
> trying to accept variant input.

Well it's not a perfect precedent but for example, dd accepts:

G(2^30)
M(2^20)
k (2^10)
K(2^10)
Kb  (10^3)
MB (10^6)
GB (10^9)
b(512)

I think we're all agreed we want to ignore the KiB crap and make all
our units base 2. And I don't think using "b" for block makes sense for
us. But the point is that yes, people expect to type "100M" or "1G"
and have that work. Plenty of us do it all the time with dd or other
tools already.

-- 
greg



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Martin Pihlak
Tom Lane wrote:
> Timestamp within the file is certainly better than trying to rely on
> filesystem timestamps.  I doubt 1 sec resolution is good enough, and
> I'd also be worried about issues like clock skew between the
> postmaster's time and the filesystem's time.
> 

Attached is a patch which adds a timestamp to pgstat.stat file header,
backend_read_statsfile uses this to determine if the file is fresh.
During the wait loop, the stats request message is retransmitted to
compensate for possible loss of message(s).

The collector writes the file at most once per PGSTAT_STAT_INTERVAL;
currently no extra custom timeout can be passed in the message. This can
of course be added if the need arises.

regards,
Martin


Index: src/backend/postmaster/pgstat.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.181
diff -c -r1.181 pgstat.c
*** src/backend/postmaster/pgstat.c	25 Aug 2008 18:55:43 -	1.181
--- src/backend/postmaster/pgstat.c	8 Sep 2008 12:17:21 -
***
*** 85,90 
--- 85,94 
  #define PGSTAT_SELECT_TIMEOUT	2		/* How often to check for postmaster
  		 * death; in seconds. */
  
+ #define PGSTAT_POLL_RETRY_DELAY	10		/* How long to pause between statistics
+ 		 * update requests; in milliseconds. */
+ 
+ #define PGSTAT_POLL_RETRIES		500		/* How many times to repeat stats inquiry */
  
  /* --
   * The initial size hints for the hash tables used in the collector.
***
*** 201,206 
--- 205,215 
   */
  static PgStat_GlobalStats globalStats;
  
+ /* Last time the collector wrote the stats file */
+ static TimestampTz last_statwrite;
+ /* Last time a backend requested a new file */
+ static TimestampTz last_statrequest;
+ 
  static volatile bool need_exit = false;
  static volatile bool need_statwrite = false;
  static volatile bool got_SIGHUP = false;
***
*** 223,229 
  
  NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]);
  static void pgstat_exit(SIGNAL_ARGS);
- static void force_statwrite(SIGNAL_ARGS);
  static void pgstat_beshutdown_hook(int code, Datum arg);
  static void pgstat_sighup_handler(SIGNAL_ARGS);
  
--- 232,237 
***
*** 254,259 
--- 262,268 
  static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
  static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
  static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
+ static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len);
  
  
  /* 
***
*** 2463,2468 
--- 2472,2491 
  	hdr->m_type = mtype;
  }
  
+ /* --
+  * pgstat_send_inquiry() -
+  *
+  *		Notify collector that we need fresh data.
+  * --
+  */
+ static void
+ pgstat_send_inquiry(void)
+ {
+ 	PgStat_MsgInquiry msg;
+ 
+ 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
+ 	pgstat_send(&msg, sizeof(msg));
+ }
  
  /* --
   * pgstat_send() -
***
*** 2538,2545 
  NON_EXEC_STATIC void
  PgstatCollectorMain(int argc, char *argv[])
  {
- 	struct itimerval write_timeout;
- 	bool		need_timer = false;
  	int			len;
  	PgStat_Msg	msg;
  
--- 2561,2566 
***
*** 2571,2583 
  
  	/*
  	 * Ignore all signals usually bound to some action in the postmaster,
! 	 * except SIGQUIT and SIGALRM.
  	 */
  	pqsignal(SIGHUP, pgstat_sighup_handler);
  	pqsignal(SIGINT, SIG_IGN);
  	pqsignal(SIGTERM, SIG_IGN);
  	pqsignal(SIGQUIT, pgstat_exit);
- 	pqsignal(SIGALRM, force_statwrite);
  	pqsignal(SIGPIPE, SIG_IGN);
  	pqsignal(SIGUSR1, SIG_IGN);
  	pqsignal(SIGUSR2, SIG_IGN);
--- 2592,2603 
  
  	/*
  	 * Ignore all signals usually bound to some action in the postmaster,
! 	 * except SIGQUIT 
  	 */
  	pqsignal(SIGHUP, pgstat_sighup_handler);
  	pqsignal(SIGINT, SIG_IGN);
  	pqsignal(SIGTERM, SIG_IGN);
  	pqsignal(SIGQUIT, pgstat_exit);
  	pqsignal(SIGPIPE, SIG_IGN);
  	pqsignal(SIGUSR1, SIG_IGN);
  	pqsignal(SIGUSR2, SIG_IGN);
***
*** 2598,2608 
  	 */
  	need_statwrite = true;
  
- 	/* Preset the delay between status file writes */
- 	MemSet(&write_timeout, 0, sizeof(struct itimerval));
- 	write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
- 	write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
- 
  	/*
  	 * Read in an existing statistics stats file or initialize the stats to
  	 * zero.
--- 2618,2623 
***
*** 2649,2668 
  		}
  
  		/*
! 		 * If time to write the stats file, do so.	Note that the alarm
! 		 * interrupt isn't re-enabled immediately, but only after we next
! 		 * receive a stats message; so no cycles are wasted when there is
! 		 * nothing going on.
  		 */
  		if (need_statwrite)
  		{
  			/* Check for postmaster death; if so we'll write file below */
  			if (!PostmasterIsAlive(true))
  break;
  
! 			pgstat_

Re: [HACKERS] Some newbie questions

2008-09-08 Thread Alvaro Herrera
M2Y wrote:

> On Sep 7, 11:52 pm, [EMAIL PROTECTED] (Shane Ambler) wrote:
> > > What is a good way to start understanding backend(postgres) code? Is
> > > there any documentation available especially for developers?

> > > What is commit log and why it is needed?
> >
> > To achieve ACID (Atomic, Consistent, Isolatable, Durable)
> > The changes needed to complete a transaction are saved to the commit log
> > and flushed to disk, then the data files are changed. If the power goes
> > out during the data file modifications the commit log can be used to
> > complete the changes without losing any data.
> 
> This, I think, is the transaction log, or XLog. My question is about the
> CLog, in which there are two bits for each transaction denoting its
> status. Since there is the XLog, from which we can determine what changes
> to redo and undo, what is the need for this CLog?

That's correct -- what Shane is describing is the transaction log
(usually known here as WAL).  However, this xlog is write-only (except in
the case of a crash); clog is read-write, and must be fast to query
since it's used very frequently to determine visibility of each tuple.
Perhaps what you need to read is the chapter on our MVCC implementation,
which relies heavily on clog.

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> I'm all for using the correct acronyms in all messages and documentation. What
> I find annoying is the:

> postgres=# set work_mem = '1g';
> ERROR: invalid value for parameter "work_mem": "1g"

But of course case insensitivity isn't going to fix that example for you.
So we're right back at the question of where we should draw the line in
trying to accept variant input.

regards, tom lane



Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Zdenek Kotala wrote:

Heikki Linnakangas wrote:
Relation forks didn't change anything inside relation files, so no 
scanning of relations is required because of that. Neither will the 
FSM rewrite. Not sure about DSM yet.


Does that mean that if you "inject" an old data file after the catalog 
upgrade, the FSM will work without any problem?


Yes. You'll need to construct an FSM, but it doesn't necessarily need to 
reflect the reality. You could just fill it with zeros, meaning that 
there's no free space anywhere, and let the next vacuum fill it with 
real information. Or you could read the old pg_fsm.cache file and fill 
the new FSM accordingly.


I think a zeroed FSM is good, because new items should not be added onto old 
pages.

Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Magnus Hagander
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> I'd also be worried about issues like clock skew between the
>>> postmaster's time and the filesystem's time.
> 
>> Can that even happen on a local filesystem? I guess you could put the
>> file on NFS though, but that seems to be.. eh. sub-optimal.. in more
>> than one way..
> 
> Lots of people use NAS/SAN type setups.

For NAS it could happen, but certainly not for SAN, no? SANs have all
the filesystem processing in the kernel of the server...

I still agree that it's not a good thing to rely on, though :-)

//Magnus




Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Zdenek Kotala wrote:

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
In fact, I don't think there's any low-level data format changes yet 
between 8.3 and 8.4, so this would be a comparatively easy release 
to implement upgrade-in-place. There's just the catalog changes, but 
AFAICS nothing that would require scanning through relations.


After a quick scan of the catversion.h changelog (which hopefully covers
any such changes): we changed sequences incompatibly, we changed hash
indexes incompatibly (even without the pending patch that would change
their contents beyond recognition), and Teodor did some stuff to GIN
indexes that might or might not represent an on-disk format change,
you'd have to ask him.  We also whacked around the sort order of
bpchar_pattern_ops btree indexes.


Hmm, it seems that reindex is the only good answer for all these changes.


Isn't that exactly what we want to avoid with upgrade-in-place? As long 
as the conversion can be done page-at-a-time, without consulting other 
pages, we can do it when the page is read in.


Yes, but I meant what we can do for 8.4.

Zdenek



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Gregory Stark
Simon Riggs <[EMAIL PROTECTED]> writes:

> Peter's objection is reasonable, as far as most people have replied.
> Marko's proposal is also reasonable to most people, since they do not
> wish fat fingers to cause any amount of downtime. ISTM that if you've
> done this, you appreciate the feature, if not it seems less important.

My objection isn't down-time at all, it's the insultingly user-hostile
attitude. I normally am setting work_mem by hand in a psql session and *every*
time I do it I swear at Postgres for being annoyingly pedantic here.

I'm all for using the correct acronyms in all messages and documentation. What
I find annoying is the:

postgres=# set work_mem = '1g';
ERROR: invalid value for parameter "work_mem": "1g"
HINT:  It's perfectly clear what you want but I'm going to refuse to do
   it until you type it exactly as I say: "GB"

> * Marko should change patch to put WARNINGs in place so people know they
> got it wrong

That's only slightly less insulting than an error.

> * we make sure the case is always shown correctly in all other aspects
> of Postgres server and docs (no relaxation at all there)

I believe we're already fairly stringent about this as we should be.

> * in the longer term, we look for the solution to be a config checker

I don't think a config checker directly addresses the same problem. I never
set work_mem in a config and it still annoys the hell out of me.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [HACKERS] Our CLUSTER implementation is pessimal

2008-09-08 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Yeah, I've been thinking about how to use the planner to do this.

I thought the answer to that was going to be more or less "call
cost_sort() and cost_index() and compare the answers".

> To do that it seems to me what we would need to do is add a function
> _pg_get_rawtuple_header() which returns the visibility information that HTSV
> needs. 

You seem to be confusing "use the planner" with "use the executor".
All that we need here is a decision about which code path to take
within CLUSTER.  We don't need to bring in boatloads of irrelevant
infrastructure --- especially not infrastructure that's going to be
fighting us every step of the way.  The executor isn't designed to
return raw tuples and no magic function is going to change that.

regards, tom lane



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Peter's objection is reasonable, as far as most people have replied.
> Marko's proposal is also reasonable to most people, since they do not
> wish fat fingers to cause any amount of downtime. ISTM that if you've
> done this, you appreciate the feature, if not it seems less important.

I really think that the claim that this will "save downtime" is a
ridiculous argument.  On that basis we should, for example, be looking
for a nearest match for any misspelled variable name.  The fact of the
matter is that a configuration validator is a far better answer to any
such worries than trying to accept bad/questionable input.

> So my recommendation to everybody is
> * we allow case insensitive matches of units in postgresql.conf
> * Marko should change patch to put WARNINGs in place so people know they
> got it wrong
> * we make sure the case is always shown correctly in all other aspects
> of Postgres server and docs (no relaxation at all there)
> * in the longer term, we look for the solution to be a config checker

My vote is to reject the patch and work on a config checker.

regards, tom lane



Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> The bpchar_pattern_ops change you mentioned must be this one:
>> A not-immediately-obvious incompatibility is that the sort order within
>> bpchar_pattern_ops indexes changes --- it had been identical to plain
>> strcmp, but is now trailing-blank-insensitive.  This will impact
>> in-place upgrades, if those ever happen.

Yup.

> The way I read that, bpchar_pattern_ops just became less sensitive. Some 
> values are now considered equal that weren't before, and thus can now be 
> stored in any order. That's not an incompatible change, right?

No, consider 'abc^I' vs 'abc ' (^I denoting a tab character).  These are
unequal in either case, but the sort order has flipped.

regards, tom lane



Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> Another idea is to create backward compatible AM and put them into separate 
> library. If these AM will work also with old page structure then there should
> not be reason for reindexing or index page conversion after upgrade.

I don't think that'd be real workable.  It would require duplicating all
the entries for that AM in pg_opfamily, pg_amop, etc.  Which we could do
for the built-in entries, I suppose, but what happens to user-defined
operator classes?

At least for the index changes proposed so far for 8.4, it seems to me
that the best solution is to mark affected indexes as not "indisvalid"
and require a post-conversion REINDEX to fix 'em.  Obviously a better
solution would be nice later, but we have to avoid putting huge amounts
of work into noncritical problems, else the whole feature is just not
going to get finished.

regards, tom lane



Re: [HACKERS] Some newbie questions

2008-09-08 Thread Tom Lane
M2Y <[EMAIL PROTECTED]> writes:
> On Sep 7, 11:52 pm, [EMAIL PROTECTED] (Shane Ambler) wrote:
>> Most of the developer info is within comments in the code itself.
>> Another place to start is http://www.postgresql.org/developer/coding
>> 
> I have seen this link. But I am looking (or hoping) for a design doc
> or technical doc which details what is happening under the hood, as it
> would save a lot of time in catching up with the mainstream.

Well, you should certainly not neglect
http://developer.postgresql.org/pgdocs/postgres/internals.html

Also note that many subtrees of the source code contain README files
with assorted overview material.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Tom Lane
Magnus Hagander <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I'd also be worried about issues like clock skew between the
>> postmaster's time and the filesystem's time.

> Can that even happen on a local filesystem? I guess you could put the
> file on NFS though, but that seems to be.. eh. sub-optimal.. in more
> than one way..

Lots of people use NAS/SAN type setups.

regards, tom lane



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Markus Wanner

Hi,

ITAGAKI Takahiro wrote:

  1. Is the process-switching approach the best way to share one socket?
Both Postgres-R and the log-shipping prototype use that approach now.
May I assume there is no objection here?


I don't see any appealing alternative. The postmaster certainly 
shouldn't need to worry about any such socket for replication. Threading 
falls pretty flat for Postgres. So the socket must be held by one of the 
child processes of the Postmaster.



  2. If 1 is reasonable, how should we add a new WAL sender process?
Just add a new process using a core-patch?


Seems feasible to me, yes.


Merge into WAL writer?


Uh.. that would mean you'd lose parallelism between WAL writing to disk 
and WAL shipping via network. That does not sound appealing to me.



Consider framework to add any of user-defined auxiliary process?


What for? What do you miss in the existing framework?


  3. If 1 is reasonable, what should we use for the process-switching
 primitive?
Postgres-R uses signals and locking and the log-shipping prototype
uses multi-threads and POSIX message queues now.


AFAIK message queues are problematic WRT portability. At least Postgres 
doesn't currently use them and introducing dependencies on those might 
lead to problems, but I'm not sure. Others certainly know more about 
issues involved.


A multi-threaded approach is certainly out of bounds, at least within 
the Postgres core code.



Signals and locking are a possible choice for 3, but I want to use a better
approach if there is one. Faster is always better.


I think the approach can reach better throughput than POSIX message 
queues or Unix pipes, because of the mentioned savings in copying 
between system and application memory. However, that hasn't been proven 
yet.



I guess we could invent a new semaphore-like primitive at the same layer
as LWLocks using spinlock and PGPROC directly...


Sure, but in what way would that differ from what I do with imessages?

Regards

Markus Wanner




Re: [GENERAL] [HACKERS] New shapshot RPMs (Sep 7 2008) are ready for testing

2008-09-08 Thread Devrim GÜNDÜZ
Hi,

On Sun, 2008-09-07 at 13:39 -0400, Andrew Dunstan wrote:

> The point I was making is that for 8.4, unless you specifically 
> configure with --disable-integer-datetimes, it is enabled by default on 
> any platform that can support it. We committed that change on 30 March 
> here: http://archives.postgresql.org/pgsql-committers/2008-03/msg00550.php

You are right, and I overlooked the actual macro.

I now "fixed" the macro by changing its behavior to use the
--disable-integer-datetimes mode if not defined.

Cheers, 
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org




Re: [HACKERS] Our CLUSTER implementation is pessimal

2008-09-08 Thread Gregory Stark

Heikki Linnakangas <[EMAIL PROTECTED]> writes:

> Simon Riggs wrote:
>> I think there needs to be an option to force this to do either sorts or
>> indexscans. 
>
> If we use the planner, "set enable_indexscan =off" or "set enable_sort=off"
> ought to work.

Yeah, I've been thinking about how to use the planner to do this. It seems to
me it would be a much better solution because it would allow us to take
advantage of other access paths as well (such as other indexes) and even new
features that don't currently exist (index-only scans...).

To do that it seems to me what we would need to do is add a function
_pg_get_rawtuple_header() which returns the visibility information that HTSV
needs. 

Then we need to construct an SPI query like

 SELECT _pg_get_rawtuple_header(), * FROM tab ORDER BY col1, col2, col3, ...

For each tuple we'll have to deform it, reform it using the new tuple
descriptor with just the columns (excluding the header), and pass that to
the heap rewrite module, passing the header separately.

Heap rewrite would have to call HTSV on just the header (with the same hack I
put in for this patch to allow passing InvalidBuffer to HTSV).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread ITAGAKI Takahiro

Markus Wanner <[EMAIL PROTECTED]> wrote:

> ITAGAKI Takahiro wrote:
> > Are there any better idea to share one socket connection between
> > backends (and bgwriter)?
>
> I fear I'm repeating myself, but I've had the same problem for 
> Postgres-R and solved it with an internal message passing infrastructure 
> which I've simply called imessages. It requires only standard Postgres 
> shared memory, signals and locking and should thus be pretty portable.

Imessages serve as a useful reference, but they address only one detail
of the issue. I can break the issue down into three parts:

  1. Is process-switching approach the best way to share one socket?
Both Postgres-R and the log-shipping prototype use the approach now.
May I assume there is no objection here?

  2. If 1 is reasonable, how should we add a new WAL sender process?
Just add a new process using a core-patch?
Merge into WAL writer?
Consider framework to add any of user-defined auxiliary process?

  3. If 1 is reasonable, what should we use for the process-switching
 primitive?
Postgres-R uses signals and locking, and the log-shipping prototype
uses multiple threads and POSIX message queues now.

Signals and locking are a possible choice for 3, but I want to use a
better approach if there is one. Faster is always better.

I guess we could invent a new semaphore-like primitive at the same layer
as LWLocks using spinlock and PGPROC directly...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Our CLUSTER implementation is pessimal

2008-09-08 Thread Simon Riggs

On Mon, 2008-09-08 at 13:52 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I think there needs to be an option to force this to do either sorts or
> > indexscans. 
> 
> If we use the planner, "set enable_indexscan =off" or "set 
> enable_sort=off" ought to work.

Agreed - as long as that is explicitly in the docs.

I'm wondering whether we should put a limit on the size of each temp
tablespace. This change will cause old admin jobs to break on disks that
aren't big enough for the new way of doing it.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-08 Thread Simon Riggs

On Wed, 2008-09-03 at 11:58 +0300, Asko Oja wrote:
> On Wed, Sep 3, 2008 at 11:20 AM, Heikki Linnakangas
> <[EMAIL PROTECTED]> wrote:
> Marko Kreen wrote:
> On 9/2/08, Peter Eisentraut <[EMAIL PROTECTED]> wrote:
> Marko Kreen wrote:
> In the meantime, here is simple patch
> for case-insensivity.
>  You might be able to talk me into accepting
> various unambiguous, common
> alternative spellings of various units.  But
> for instance allowing MB and Mb
> to mean the same thing is insane. 
> 
> How would the docs for that look like?  And anyway,
> what is wrong with
> Mb for megabytes?
> From infamous wikipedia: A megabit is a unit of information or
> computer storage, abbreviated Mbit (or Mb).
> To me playing with case of acronyms and even depending on it seems
> more insane. It would make much more sense to have case insensitive
> set of acronyms and (thanks Tom for pointing out) some sanity checks
> when configuration is loaded to notify user when wrong ones are used
> for some context.

There is a patch on the CommitFest, so we must either accept it or
reject it (with/without comments).

Judging from the replies, most people find Peter's objection reasonable.
Marko's proposal is also reasonable to most people, since they do not
want fat fingers to cause any amount of downtime. ISTM that if you've
been bitten by this yourself, you appreciate the feature; if not, it
seems less important.

I would point out that Marko's patch is about what values get accepted
in postgresql.conf, not how the units are represented when you perform
SHOW, look at pg_settings or read the docs. So Marko's proposal does not
ignore the important distinction between units in *all* places, just at
the time of input. With that in mind, the proposal seems to be
acceptable to the majority, based upon my assessment of the comments.

In terms of the patch, ISTM that relaxing the comparison logic also
appears to relax the warnings given, and that is not acceptable given
the concerns raised.

So my recommendation to everybody is:
* we allow case-insensitive matches of units in postgresql.conf
* Marko should change the patch to put WARNINGs in place so people know
they got it wrong
* we make sure the case is always shown correctly in all other aspects
of the Postgres server and docs (no relaxation at all there)
* in the longer term, we look for the solution to be a config checker
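A minimal sketch of the first two recommendations, i.e. accept a case-insensitive unit match but warn about the non-canonical spelling. The unit table, function names, and warning text are invented for illustration; the real logic lives in guc.c and differs in detail:

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Illustrative memory-unit table; canonical spellings as shown in docs. */
static const struct { const char *name; long multiplier; } units[] = {
    {"kB", 1024L},
    {"MB", 1024L * 1024},
    {"GB", 1024L * 1024 * 1024},
};

/* Returns the multiplier, or -1 if the unit is unknown.  An exact-case
 * match is preferred; an inexact-case match is accepted but flagged via
 * *warned so the caller can emit a WARNING. */
static long unit_multiplier(const char *unit, int *warned) {
    *warned = 0;
    for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++)
        if (strcmp(unit, units[i].name) == 0)
            return units[i].multiplier;
    for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++)
        if (strcasecmp(unit, units[i].name) == 0) {
            *warned = 1;   /* accepted, but the spelling is non-canonical */
            fprintf(stderr, "WARNING: unit \"%s\" interpreted as \"%s\"\n",
                    unit, units[i].name);
            return units[i].multiplier;
        }
    return -1;
}
```

With this shape, "Mb" in postgresql.conf would still load (as megabytes) but produce a warning, which addresses both the fat-finger and the MB-vs-Mb ambiguity concerns.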

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Our CLUSTER implementation is pessimal

2008-09-08 Thread Heikki Linnakangas

Simon Riggs wrote:

I think there needs to be an option to force this to do either sorts or
indexscans. 


If we use the planner, "set enable_indexscan =off" or "set 
enable_sort=off" ought to work.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread Markus Wanner

Hi,

ITAGAKI Takahiro wrote:

Are there any better ideas to share one socket connection among
backends (and the bgwriter)? The connections would be established after
fork() from the postmaster, and there could be two or more of them.
This is one of the most complicated parts of synchronous log shipping.
A process-switching approach like b) is just one idea for it.


I fear I'm repeating myself, but I've had the same problem for 
Postgres-R and solved it with an internal message passing infrastructure 
which I've simply called imessages. It requires only standard Postgres 
shared memory, signals and locking and should thus be pretty portable.


In simple benchmarks, it's not quite as efficient as unix pipes, but 
doesn't require as many file descriptors, is independent of the 
parent-child relations of processes, maintains message borders, and is 
more portable (I hope). It could certainly be improved WRT efficiency 
and could theoretically even beat Unix pipes, because it involves less 
copying of data and fewer syscalls.


It has not been reviewed or commented on much; I'd still appreciate that.

Regards

Markus Wanner




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-08 Thread ITAGAKI Takahiro

Bruce Momjian <[EMAIL PROTECTED]> wrote:

> > > b) Use new background process as WALSender
> > > 
> > >This idea needs background-process hook which enables users
> > >to define new background processes

> I think starting/stopping a process for each WAL send is too much
> overhead.

Yes, it is slow, of course. But I guess it is the only way to share one
socket among all backends. Postgres is not a multi-threaded architecture,
so otherwise each backend would need a dedicated connection to send WAL
buffers. 300 backends would require 300 connections to each slave... not
good at all.

> It sounds like Fujii-san is basically saying they can only get the hooks
> done for 8.4, not the actual solution.

No! He has an actual solution in his prototype ;-)
It is very similar to b), and the overhead was not so bad.
It's not clean enough to be part of Postgres, though.

Are there any better ideas to share one socket connection among
backends (and the bgwriter)? The connections would be established after
fork() from the postmaster, and there could be two or more of them.
This is one of the most complicated parts of synchronous log shipping.
A process-switching approach like b) is just one idea for it.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] NDirectFileRead and Write

2008-09-08 Thread ITAGAKI Takahiro

"Hitoshi Harada" <[EMAIL PROTECTED]> wrote:

> so I guess all of these variables should be defined together but
> actually you put the two in buffile.h while the others in
> buf_internals.h. Is there clear reason for that?

That's because buffile.c includes buffile.h, but not buf_internals.h.
NDirectFileRead/Write are counters for BufFile in the patch
and don't depend on bufmgr.

It might be cleaner if we had something like storage/bufstats.h
and declared all the counter variables there, but that seems like overkill.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Heikki Linnakangas

Zdenek Kotala wrote:

Tom Lane napsal(a):

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
In fact, I don't think there's any low-level data format changes yet 
between 8.3 and 8.4, so this would be a comparatively easy release to 
implement upgrade-in-place. There's just the catalog changes, but 
AFAICS nothing that would require scanning through relations.


After a quick scan of the catversion.h changelog (which hopefully covers
any such changes): we changed sequences incompatibly, we changed hash
indexes incompatibly (even without the pending patch that would change
their contents beyond recognition), and Teodor did some stuff to GIN
indexes that might or might not represent an on-disk format change,
you'd have to ask him.  We also whacked around the sort order of
bpchar_pattern_ops btree indexes.


Hmm, it seems that reindex is the only good answer to all these changes. 


Isn't that exactly what we want to avoid with upgrade-in-place? As long 
as the conversion can be done page-at-a-time, without consulting other 
pages, we can do it when the page is read in.


I'm not sure what the GIN changes were, but I didn't see any changes to 
the page layout at a quick glance.


The bpchar_pattern_ops change you mentioned must be this one:

A not-immediately-obvious incompatibility is that the sort order within
bpchar_pattern_ops indexes changes --- it had been identical to plain
strcmp, but is now trailing-blank-insensitive.  This will impact
in-place upgrades, if those ever happen.


The way I read that, bpchar_pattern_ops just became less sensitive. Some 
values are now considered equal that weren't before, and thus can now be 
stored in any order. That's not an incompatible change, right?
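To make the "less sensitive" point concrete, here is a standalone sketch of a trailing-blank-insensitive comparison of the kind described, where 'ab' and 'ab  ' compare equal and may therefore be stored in either order in the index. This is an illustration only, not the actual bpcharcmp/pattern-ops code:

```c
#include <string.h>

/* Compare two strings while ignoring trailing blanks, so 'ab' and
 * 'ab  ' are equal.  Returns <0, 0, >0 in the usual strcmp style. */
static int bpchar_cmp_ignore_trailing_blanks(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    while (la > 0 && a[la - 1] == ' ') la--;   /* trim trailing blanks */
    while (lb > 0 && b[lb - 1] == ' ') lb--;
    size_t n = la < lb ? la : lb;
    int c = memcmp(a, b, n);
    if (c != 0)
        return c;
    return (la > lb) - (la < lb);   /* shorter trimmed string sorts first */
}
```

Under plain strcmp those two values would have a definite order, which is exactly the sort-order difference the commit message warns about for in-place upgrades.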



Sequence should be converted during catalog conversion.


Agreed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Heikki Linnakangas

Zdenek Kotala wrote:

Heikki Linnakangas napsal(a):
Relation forks didn't change anything inside relation files, so no 
scanning of relations is required because of that. Neither will the 
FSM rewrite. Not sure about DSM yet.


Does that mean that if you "inject" an old data file after the catalog
upgrade, the FSM will work without any problem?


Yes. You'll need to construct an FSM, but it doesn't necessarily need to 
reflect reality. You could just fill it with zeros, meaning that 
there's no free space anywhere, and let the next vacuum fill it with 
real information. Or you could read the old pg_fsm.cache file and fill 
the new FSM accordingly.



PS: I plan to review FSM this week.


Thanks!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [Review] pgbench duration option

2008-09-08 Thread ITAGAKI Takahiro
Here is a revised version of the pgbench duration patch.
I merged some comments from Brendan and gnari.

"Brendan Jurd" <[EMAIL PROTECTED]> wrote:

> >> Wouldn't this be better written as:
> >>   if ((duration > 0 && timer_exceeded) || st->cnt >= nxacts)
> >
> > sorry, but these do not lok as the same thing to me.
> > gnari

I used this condition expression here:
   if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



pgbench-duration_v3.patch
Description: Binary data



Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Tom Lane wrote:

I didn't see anything that looked like an immediate change in user table
contents, unless they used the "name" type; but what of relation forks?


Relation forks didn't change anything inside relation files, so no 
scanning of relations is required because of that. Neither will the FSM 
rewrite. Not sure about DSM yet.




Does that mean that if you "inject" an old data file after the catalog upgrade,
the FSM will work without any problem?


Zdenek

PS: I plan to review FSM this week.



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Zdenek Kotala

Tom Lane napsal(a):

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
In fact, I don't think there's any low-level data format changes yet 
between 8.3 and 8.4, so this would be a comparatively easy release to 
implement upgrade-in-place. There's just the catalog changes, but AFAICS 
nothing that would require scanning through relations.


After a quick scan of the catversion.h changelog (which hopefully covers
any such changes): we changed sequences incompatibly, we changed hash
indexes incompatibly (even without the pending patch that would change
their contents beyond recognition), and Teodor did some stuff to GIN
indexes that might or might not represent an on-disk format change,
you'd have to ask him.  We also whacked around the sort order of
bpchar_pattern_ops btree indexes.


Hmm, it seems that reindex is the only good answer to all these changes. 
Sequences should be converted during catalog conversion.


Another idea is to create backward-compatible AMs and put them into a separate 
library. If these AMs also work with the old page structure, then there should 
be no reason for reindexing or index page conversion after upgrade.


Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [PATCHES] to_date() validation

2008-09-08 Thread Brendan Jurd
On Sun, Sep 7, 2008 at 5:58 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> Im just following this:
> http://wiki.postgresql.org/wiki/Reviewing_a_Patch so lets get started.
>

Hi Alex.  Thanks for taking the time to review my patch.

> Feature test: Everything seems to work as advertised. However, before,
> most fields parsed via sscanf() limited the max length of the input.  Extra
> digits were just silently ignored; now they get added to the result:
>
> patched:
> # SELECT to_timestamp('1', 'HHMM');
> to_timestamp
> --
>  0009-03-19 11:00:00-06:59:56
>
> 8.3.3:
> # SELECT to_timestamp('1', 'HHMM');
>   to_timestamp
> ---
>  0001-11-01 11:00:00-07 BC
>

It's an interesting point.  I had considered what would happen if the
input string was too short, but hadn't given much thought to what
would happen if it was too long.

In your example case, it isn't clear whether the user wanted to
specify 11 hours and 11 months (and the final '1' is just junk), or if
they really wanted to specify 11 hours and 111 months.

But consider a case like the following:

# SELECT to_date('07-09-20008', 'DD-MM-YYYY');

The documentation says that 'YYYY' is "4 and more digits", so you have
to assume that the user meant to say the year 20,008.  That's why the
code in the patch parses all the remaining digits in the input string
on the final node.

HEAD actually gets this one wrong; in defiance of the documentation it
returns 2000-09-07.  So, it seems to me that the patch shifts the
behaviour in the right direction.

Barring actually teaching the code that some nodes (like YYYY) can
take an open-ended number of characters, while others (like MM) must
take an exact number of characters, I'm not sure what can be done to
improve this.  Perhaps something for a later patch?
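The fixed-width vs. open-ended distinction can be illustrated with plain sscanf() field widths. This is a standalone sketch, not the formatting.c node machinery; the function name is invented. In a 'DD-MM-YYYY'-style template, day and month take exactly two digits while the year greedily consumes all remaining digits:

```c
#include <stdio.h>

/* Parse a "DD-MM-YYYY"-style string: %2d caps day and month at two
 * digits each, while the final %d is open-ended, so '07-09-20008'
 * yields the year 20008 rather than 2000 plus a junk digit. */
static int parse_dmy(const char *s, int *d, int *m, int *y) {
    return sscanf(s, "%2d-%2d-%d", d, m, y) == 3;
}
```

This matches the documented "4 and more digits" reading of the year field, while a fixed-width "%4d" for the year would reproduce HEAD's 2000-09-07 behaviour.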

> Code review: one minor nit
> *** a/src/backend/utils/adt/formatting.c
> --- b/src/backend/utils/adt/formatting.c
> ***
> *** 781,787  static const KeyWord DCH_keywords[] = {
>{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},
>
>/* last */
> !   {NULL, 0, 0, 0}
>  };
>
>  /* --
> --- 781,787 
>{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},
>
>/* last */
> !   {NULL, 0, 0, 0, 0}
>  };

Ah, thank you for picking that up.  I will correct this and send in a
new patch version in the next 24 hours.

Cheers,
BJ



Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Zdenek Kotala

Bruce Momjian napsal(a):

Heikki Linnakangas wrote:

Bruce Momjian wrote:

As far as the page not fitting after conversion, what about some user
command that will convert an entire table to the new format if page
expansion fails.

VACUUM?

Having to run a manual command defeats the purpose somewhat, though. 
Especially if you have no way of knowing on what tables it needs to be 
run on.


My assumption is that the page not fitting would be a rare case so
requiring something like vacuum to fix it would be OK.


It is 1-2% of records per heap. I assume it is more for B-tree.


What I don't want to do it to add lots of complexity to the code just to
handle the page expansion case, when such a case is rare and perhaps can
be fixed by a vacuum.


Unfortunately, it is not so rare. Only heap pages on the 32-bit x86 platform 
(4-byte max alignment) are no problem, but all index pages are affected.




In fact, I don't think there's any low-level data format changes yet 
between 8.3 and 8.4, so this would be a comparatively easy release to 
implement upgrade-in-place. There's just the catalog changes, but AFAICS 
nothing that would require scanning through relations.


Yep.


I have not tested it just now, but the pg_upgrade.sh script worked fine in May 
without any modification for the 8.3->8.4 conversion.


Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Zdenek Kotala

Bruce Momjian napsal(a):


As far as the page not fitting after conversion, what about some user
command that will convert an entire table to the new format if page
expansion fails.


Keep in mind that there are more kinds of pages. Heap is easy, but each index 
AM has its own specifics :(. A better approach is to move the tuple to a new 
page and invalidate all related table indexes. A following reindex then 
automatically converts the whole table.


After putting all those together, how large a patch are we talking 
about, and what's the performance penalty then? How much of all that 
needs to be in core, and how much can live in a pgfoundry project or an 
extra binary in src/bin or contrib? I realize that none of us have a 
crystal ball, and one has to start somewhere, but I feel uneasy 
committing to an approach until we have a full plan.


Yes, another very good point.

I am ready to focus on these issues for 8.4;  all this needs to be
fleshed out, perhaps on a wiki.  As a starting point, what would be
really nice is to start a wiki that lists all data format changes for
every major release.


As Greg mentioned in his mail, the wiki page is already there. Unfortunately, I 
did not have time to put up-to-date information there. I'm going to do so soon.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] Prototype: In-place upgrade v02

2008-09-08 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:

Heikki Linnakangas napsal(a):

The patch seems to be missing the new htup.c file.


I'm sorry. I attached a new version which is synchronized with current
head. I would like to add some more comments as well.

1) The patch also contains changes which were discussed during the July 
commit fest. - PageGetTempPage modification suggested by Tom

- another hash.h backward compatible cleanup


It might be a good idea to split that into a separate patch. The sheer 
size of this patch is quite daunting, even though the bulk of it is 
straightforward search&replace.


Yes, I will do it.

2) I added a tuplimits.h header file which contains the tuple limits for 
the different access methods. It is not finished yet, but the idea is to 
keep all limits in one file and easily add limits for different page 
layout versions - for example, replacing static computation with dynamic 
computation based on the relation (maxtuplesize could be stored in 
pg_class for each relation).


I also need this header because I ran into a header dependency cycle.

3) I already sent Page API performance result in 
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php


I replaced the call sequence PageGetItemId, PageGetItem with the 
PageGetIndexTuple and PageGetHeapTuple functions. That is the main 
difference in this patch. PageGetHeapTuple fills t_ver in HeapTuple to 
identify the correct tuple header version.


It would be good to mention that the page API (and tuple API) 
implementation is only a prototype without any performance optimization.


You mentioned 5% performance degradation in that thread. What test case 
was that? What would be a worst-case scanario, and how bad is it?


Paul van den Bogaart tested a long-running OLTP workload on it. He used the iGen test.

5% is a pretty hefty price, especially when it's paid by not only 
upgraded installations, but also freshly initialized clusters. I think 
you'll need to pursue those performance optimizations.


5% is the worst-case scenario. The current version is not optimized; it is 
written for easy debugging and (D)tracing. The page header structures are very 
similar, and we can easily remove the switches for most attributes and replace 
functions with macros or inline functions.


4) This patch contains more topics for decision. First is general if 
this approach is acceptable.


I don't like the invasiveness of this approach. It's pretty invasive 
already, and ISTM you'll need similar switch-case handling of all data 
types that have changed the internal representation as well.


I agree in general. But, for example, the new page API is not so invasive, and 
in my opinion it should be implemented (with or without multiversion support), 
because it cleans up the code. HeapTuple processing is easy too, but 
unfortunately it requires a lot of modifications in many places. I was 
surprised how many pieces of code access HeapTupleHeader directly and do not 
use the HeapTuple data structure. I think we should reach a conclusion on the 
recommended usage of HeapTupleHeader and HeapTuple. Most of the changes in the 
code are like replacing HeapTupleHeaderGetXmax(tuple->t_data) with 
HeapTupleGetXmax(tuple) and so on. I think it should be cleaned up anyway.


You mentioned data types, but that is not a problem. You can easily extend the 
data type attributes with version information and call the correct in/out 
functions, or use a different Oid for each new data type version. There are 
several easy solutions possible for data types, and for conversion you can use 
the ALTER TABLE command. The main idea is to keep data of all format versions 
in a relation. This approach could also be used for the integer/float datetime 
problem.


We've talked about this before, so you'll remember that I favor the 
approach of converting the page format, a page at a time, as pages are 
read in. I grant you that there are non-trivial issues with that as 
well, like when the converted data takes more space and doesn't fit in 
the page anymore.


I like conversion on read too, because it is easy, but there are more problems.

The page that no longer fits is one of them. Other problems are with indexes. 
For example, a hash index stores a bitmap in a page, and that is not marked 
anywhere; only the hash AM knows which pages contain this kind of data. It is 
probably impossible to convert such a page during a read. :(


I wonder if we could go with some sort of hybrid approach? Convert the 
whole page when it's read in, but if it doesn't fit, fall back to 
tricks like loosening the alignment requirements on platforms that can 
handle non-aligned data, or supporting a special truncated page header 
without the pd_tli and pd_prune_xid fields. Just a thought; not sure how 
feasible those particular tricks are, but something along those lines..


OK, I have backup idea :-). Stay tuned :-)

All in all, though. I find it a bit hard to see the big picture. For 
upgrade-in-place, what are all the pieces that we need? To keep this 
concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3 
-> 8.4? That's fine with me as well, but le

Re: [HACKERS] Some newbie questions

2008-09-08 Thread M2Y
Thanks Shane for your response...

On Sep 7, 11:52 pm, [EMAIL PROTECTED] (Shane Ambler) wrote:
> > What is a good way to start understanding backend(postgres) code? Is
> > there any documentation available especially for developers?
>
> Most of the developer info is within comments in the code itself.
> Another place to start is http://www.postgresql.org/developer/coding
>
I have seen this link, but I am looking (or hoping) for a design doc
or technical doc which details what is happening under the hood, as it
would save a lot of time in catching up with the main codebase.

> > What is commit log and why it is needed?
>
> To achieve ACID (Atomic, Consistent, Isolatable, Durable)
> The changes needed to complete a transaction are saved to the commit log
> and flushed to disk, then the data files are changed. If the power goes
> out during the data file modifications the commit log can be used to
> complete the changes without losing any data.

This, I think, is the transaction log, or XLog. My question is about the CLog,
in which there are two bits for each transaction denoting the status of
that transaction. Since there is the XLog, from which we can determine
what changes we have to redo and undo, what is the need for the CLog?
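For context, the two-bits-per-transaction layout the question refers to can be sketched as a packed array. The status codes below mirror the general idea (0 = in progress, 1 = committed, 2 = aborted, 3 = sub-committed), but the array-backed version and function names are illustrative only; the real clog is a paged SLRU structure on disk. The short answer to "why both" is that the clog lets a visibility check answer "did xid N commit?" in O(1), whereas the sequential xlog would have to be scanned or replayed:

```c
#include <stdint.h>

/* Four 2-bit transaction status slots per byte. */
#define XACTS_PER_BYTE 4

/* Record the status (0..3) of transaction 'xid'. */
static void clog_set(uint8_t *clog, uint32_t xid, unsigned status) {
    unsigned shift = (xid % XACTS_PER_BYTE) * 2;
    uint8_t *b = &clog[xid / XACTS_PER_BYTE];
    *b = (uint8_t)((*b & ~(3u << shift)) | ((status & 3u) << shift));
}

/* Look up the status of transaction 'xid' in O(1). */
static unsigned clog_get(const uint8_t *clog, uint32_t xid) {
    return (clog[xid / XACTS_PER_BYTE] >> ((xid % XACTS_PER_BYTE) * 2)) & 3u;
}
```

A billion transactions fit in 256 MB at this density, which is why a compact per-xid status table is kept separately from the WAL.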

>
> > Why does a replication solution need log shipping and why cant we
> > just ship the transaction statements to a standby node?
>
> Depends on what you wish to achieve. They are two ways to a similar
> solution.
> Log shipping is part of the core code with plans to make the duplicate
> server be able to satisfy select queries.
> Statement based replication is offered by other options such as slony.
>
> Each has advantages and disadvantages. Transaction logs are part of
> normal operation and can be copied to another server in the background
> without adding load or delays to the master server.
>
> Statement based replication has added complexity of waiting for the
> slaves to duplicate the transaction and handling errors from a slave
> applying the transaction. They also tend to have restrictions when it
> comes to replicating DDL changes - implemented as triggers run from
> INSERT/UPDATE not from CREATE/ALTER TABLE.

I agree. Assuming that both master and backup are running the same
version of the server and both are in sync, why can't we just send the
command statements to the standby in the main backend loop (before
parsing) and let the standby ignore SELECT-like statements?

I am a beginner... please forgive my ignorance and provide some
clarity so that I can understand the system better.

Thanks,
Srinivas



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Magnus Hagander
Tom Lane wrote:
> Martin Pihlak <[EMAIL PROTECTED]> writes:
>> I had also previously experimented with stat() based polling but ran into
>> the same issues - no portable high resolution timestamp on files. I guess
>> stat() is unusable unless we can live with 1 second update interval for the
>> stats (eg. backend reads the file if it is within 1 second of the request).
> 
>> One alternative is to include a timestamp in the stats file header - the
>> backend can then wait on that -- check the timestamp, sleep, resend the
>> request, loop. Not particularly elegant, but easy to implement. Would this
>> be acceptable?
> 
> Timestamp within the file is certainly better than trying to rely on
> filesystem timestamps.  I doubt 1 sec resolution is good enough, and

We'd need half a second resolution just to keep up with the level we
have *now*, don't we?

> I'd also be worried about issues like clock skew between the
> postmaster's time and the filesystem's time.

Can that even happen on a local filesystem? I guess you could put the
file on NFS though, but that seems to be.. eh. sub-optimal.. in more
than one way..
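(The check/sleep/resend loop Martin proposes upthread could be sketched
roughly as below. The header layout, file name, and the resend hook are
assumptions for illustration, not pgstat's actual format:)

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stats file header: a timestamp written by the collector,
 * so readers judge freshness from file contents rather than filesystem
 * timestamps (sidestepping resolution and clock-skew issues). */
typedef struct StatsFileHeader {
    long write_time_us;         /* collector's clock, in microseconds */
} StatsFileHeader;

/* Poll until the stats file is at least as new as our request time,
 * re-requesting a rewrite each round; give up after max_tries polls. */
static int wait_for_fresh_stats(const char *path, long request_time_us,
                                int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        FILE *fp = fopen(path, "rb");
        if (fp) {
            StatsFileHeader hdr;
            size_t n = fread(&hdr, sizeof(hdr), 1, fp);
            fclose(fp);
            if (n == 1 && hdr.write_time_us >= request_time_us)
                return 1;       /* snapshot is fresh enough */
        }
        /* resend_inquiry(); -- hypothetical: nudge the collector */
        usleep(10000);          /* 10 ms between polls */
    }
    return 0;                   /* timed out waiting for a rewrite */
}
```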

//Magnus



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Magnus Hagander
Martin Pihlak wrote:
> Magnus Hagander wrote:
>> I wrote a patch for this some time back, that was actually applied.
>> Turns out it didn't work, and I ran out of time to fix it, so it was
>> backed out again. And then I forgot about it :-) If you look through the
>> cvs history of pgstat you should be able to find it - maybe it can give
>> you some further ideas.
> 
> Got it - this was 1.126. Looks very familiar indeed :)

:-)


> I had also previously experimented with stat() based polling but ran into
> the same issues - no portable high resolution timestamp on files. I guess
> stat() is unusable unless we can live with 1 second update interval for the
> stats (eg. backend reads the file if it is within 1 second of the request).

If the filesystem has inodes, noticing that the inode changes should
usually be enough, no? Since we always create a new file and rename it
into place, there is no way that inode can be reused right away. But it
does require that the stat info is not cached somewhere in between...


//Magnus




Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-08 Thread Heikki Linnakangas

David Rowley wrote:
> Thanks for all the reviews and suggestions.

David, could you re-run the performance tests you ran earlier? I'm
curious to know what impact switching to the simpler loop for the
1-byte pattern case had.
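(For context, the skip-table family of searches the patch implements can
be sketched with Boyer-Moore-Horspool; this is an illustration of the
technique, not the patch's code. Note how a 1-byte pattern degenerates
to a skip of 1 at every position, i.e. a plain scan, which is why a
simpler dedicated loop makes sense for that case:)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool: after each mismatch, skip ahead based on the
 * last occurrence in the pattern of the text byte aligned with the
 * pattern's final position. Returns a pointer to the first match, or
 * NULL if the pattern does not occur. */
static const char *bmh_search(const char *text, size_t n,
                              const char *pat, size_t m)
{
    size_t skip[256];

    if (m == 0 || m > n)
        return NULL;

    /* Default skip: the whole pattern length. */
    for (size_t i = 0; i < 256; i++)
        skip[i] = m;
    /* Bytes that occur in the pattern (except the last) skip less. */
    for (size_t i = 0; i + 1 < m; i++)
        skip[(unsigned char) pat[i]] = m - 1 - i;

    for (size_t pos = 0; pos + m <= n; ) {
        if (memcmp(text + pos, pat, m) == 0)
            return text + pos;
        pos += skip[(unsigned char) text[pos + m - 1]];
    }
    return NULL;
}
```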


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


