Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Daniel Farina
On Mon, Oct 1, 2012 at 12:57 AM, Magnus Hagander  wrote:
> Can we please expose the internal hash id of the statements in
> pg_stat_statements?
>
> I know there were discussions about it earlier, and it wasn't done, with
> the argument that it wouldn't be stable between releases (IIRC). I think we
> can live with that drawback, assuming of course that we document this
> properly.
>
> I've now run into multiple customer installations where it would be
> very useful to have. The use case is mainly storing snapshots of the
> pg_stat_statements output over time and analyzing those. Weird things
> happen for example when the query text is the same, but the hash is
> different (which can happen for example when a table is dropped and
> recreated). And even without that, in order to do anything useful with
> it, you end up hashing the query text anyway - so using the already
> existing hash would be easier and more useful.

I have a similar problem; however, I am not sure the generated hash is
ideal.  Putting aside the mechanical, versioning, shutdown/stats-file
issues, etc. given in the main branch of the thread, I also have this
feeling that it is not what I want.  Consider the following case:

SELECT * FROM users WHERE id = ?

[... time passes; the pg_stat_statements entry for this query is evicted
and later reintroduced ...]

SELECT * FROM users WHERE id = ?

In the intervening time, an equivalent hash could still be evicted and
reintroduced and the statistics silently reset, and that'll befuddle
principled tools.  This is worse than merely less-useful, because it
can lead to drastic underestimations that otherwise pass inspection.

Instead, I think it makes sense to assign a number -- arbitrarily, but
uniquely -- to the generation of a new row in pg_stat_statements, and,
on the flip side, whenever a row is retired its number should be
eliminated, practically, forever.  This way re-introductions between
two samplings of pg_stat_statements cannot be confused for a
contiguously maintained statistic on a query.

-- 
fdr




[HACKERS] PQping command line tool

2012-10-02 Thread Phil Sorber
I was wondering recently if there was any command line tool that
utilized PQping() or PQpingParams(). I searched the code and couldn't
find anything, and was wondering if there would be any interest in having
something like this included. I wrote something for my purposes of
performing a health check that also supports Nagios-style status
output. It's probably convenient for scripting purposes as well. It's
not currently ready for submission to a commitfest, but if there is
interest I would clean it up so that it is.
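
(For illustration only -- not the tool described above -- the core of such
a check is tiny; a minimal sketch of a PQping()-based health check with
Nagios-style exit codes:)

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(int argc, char **argv)
{
    /* conninfo string comes from the command line; empty means defaults */
    const char *conninfo = (argc > 1) ? argv[1] : "";

    switch (PQping(conninfo))
    {
        case PQPING_OK:
            printf("OK - server is accepting connections\n");
            return 0;           /* Nagios: OK */
        case PQPING_REJECT:
            printf("WARNING - server is alive but rejecting connections\n");
            return 1;           /* Nagios: WARNING */
        case PQPING_NO_RESPONSE:
            printf("CRITICAL - no response from server\n");
            return 2;           /* Nagios: CRITICAL */
        case PQPING_NO_ATTEMPT:
        default:
            printf("UNKNOWN - connection not attempted (bad parameters?)\n");
            return 3;           /* Nagios: UNKNOWN */
    }
}

(Build with something like "cc ping_check.c -lpq"; the file name and the
exact message wording are arbitrary.)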




Re: [HACKERS] [9.1] 2 bugs with extensions

2012-10-02 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of lun oct 01 04:44:28 -0300 2012:
> Hi,
> 
> Alvaro Herrera  writes:
> >> Same for 9.2, attached. master needs yet another patch because of the
> >> recent headers reorg, it seems, that's for another day though.
> >
> > No, just remove the RELKIND_UNCATALOGUED case in that switch.
> 
> Oh. As in the attached? :)

That seems to work, yes.

While testing this out I noticed that this silently does nothing:

SELECT pg_catalog.pg_extension_config_dump('my_config', NULL);

i.e. the table is not marked as configuration but no error or warning
appears, either.  This seems rather surprising.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_upgrade does not completely honor --new-port

2012-10-02 Thread Bruce Momjian

Applied to head and 9.2.

---

On Wed, Sep 26, 2012 at 10:06:50PM -0400, Bruce Momjian wrote:
> On Tue, Sep 25, 2012 at 05:36:54PM +0300, Devrim Gunduz wrote:
> > 
> > Hi,
> > 
> > I just performed a test upgrade from 9.1 to 9.2, and used --new-port
> > variable. However, the analyze_new_cluster.sh does not include the new
> > port, thus when I run it, it fails. Any chance to add the port number to
> > the script?
> 
> Well, the reason people normally use the port number is to do a live
> check, but obviously when the script is created it isn't doing a check. 
> I am worried that if I do embed the port number in there, then if they
> change the port after the upgrade, they now can't use the script.  I
> assume users would have PGPORT set before running the script, no?
> 
> > Also, is it worth to add the value specified in --new-bindir as a prefix
> > to vacuumdb command in the same script?
> 
> Wow, I never thought of adding a path to those scripts, but it certainly
> makes sense.  I have created the attached patch which does this.
> 
> -- 
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
> 
>   + It's impossible for everything to be true. +

> diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
> index bed10f8..2785eb7 100644
> --- a/contrib/pg_upgrade/check.c
> +++ b/contrib/pg_upgrade/check.c
> @@ -477,7 +477,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
>   ECHO_QUOTE, ECHO_QUOTE);
>   fprintf(script, "echo %sthis script and run:%s\n",
>   ECHO_QUOTE, ECHO_QUOTE);
> - fprintf(script, "echo %svacuumdb --all %s%s\n", ECHO_QUOTE,
> + fprintf(script, "echo %s\"%s/vacuumdb\" --all %s%s\n", ECHO_QUOTE, new_cluster.bindir,
>   /* Did we copy the free space files? */
>   (GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
>   "--analyze-only" : "--analyze", ECHO_QUOTE);
> @@ -498,7 +498,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
>   ECHO_QUOTE, ECHO_QUOTE);
>   fprintf(script, "echo %s--%s\n",
>   ECHO_QUOTE, ECHO_QUOTE);
> - fprintf(script, "vacuumdb --all --analyze-only\n");
> + fprintf(script, "\"%s/vacuumdb\" --all --analyze-only\n", new_cluster.bindir);
>   fprintf(script, "echo%s\n", ECHO_BLANK);
>   fprintf(script, "echo %sThe server is now available with minimal optimizer statistics.%s\n",
>   ECHO_QUOTE, ECHO_QUOTE);
> @@ -519,7 +519,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
>   ECHO_QUOTE, ECHO_QUOTE);
>   fprintf(script, "echo %s---%s\n",
>   ECHO_QUOTE, ECHO_QUOTE);
> - fprintf(script, "vacuumdb --all --analyze-only\n");
> + fprintf(script, "\"%s/vacuumdb\" --all --analyze-only\n", new_cluster.bindir);
>   fprintf(script, "echo%s\n\n", ECHO_BLANK);
>  
>  #ifndef WIN32
> @@ -532,7 +532,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
>   ECHO_QUOTE, ECHO_QUOTE);
>   fprintf(script, "echo %s-%s\n",
>   ECHO_QUOTE, ECHO_QUOTE);
> - fprintf(script, "vacuumdb --all %s\n",
> + fprintf(script, "\"%s/vacuumdb\" --all %s\n", new_cluster.bindir,
>   /* Did we copy the free space files? */
>   (GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
>   "--analyze-only" : "--analyze");



-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Stirling Newberry
This case points to a weakness in many programming languages: not having a 
clear iff (if and only if) versus if construction. 

The sane use case for create schema foo if not exists is for building 
a database dynamically, where any of several points may be the first to put a 
table in a schema, and schemas should not be created if there are no objects. 
The create/search/drop design pattern has its own problems.

Thus the construction should default to one behavior, and have an option for 
the second.

e.g.

create schema foo if not exists (will not be done if foo existed)
create schema foo if not exists FORCE (will be done even if foo existed)

This would even allow for mixed e.g. 

create schema foo if not exists (tables that should be created once and not 
again)
FORCE (objects the routine will add even if the schema already exists)




On Oct 2, 2012, at 6:33 PM, Fabrízio de Royes Mello wrote:

> 
> 2012/10/2 Alvaro Herrera 
> Excerpts from Andrew Dunstan's message of mar oct 02 17:24:38 -0300 2012:
> >
> > On 10/02/2012 03:48 PM, Tom Lane wrote:
> > > Alvaro Herrera  writes:
> 
> > >> Well, if that's the rationale then you end up with no schema foo at all
> > >> (i.e. both die), which seems even more surprising (though I admit it has
> > >> the advantage of being a simple rule to document.)
> > > I think we should just disallow putting any contained objects in the
> > > statement when IF NOT EXISTS is used.  It's simple to understand, simple
> > > to document and implement, and I think it covers all the sane use-cases
> > > anyway.
> >
> > I thought we'd already agreed on this.
> 
> Well, it's not what the latest proposed patch implements.
> 
> 
> You're right... the latest proposed patch doesn't implement it.
> 
> I'll change the patch and send it soon...
> 
> Regards,
> 
> -- 
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Blog sobre TI: http://fabriziomello.blogspot.com
> >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
> 



Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Fabrízio de Royes Mello
2012/10/2 Alvaro Herrera 

> Excerpts from Andrew Dunstan's message of mar oct 02 17:24:38 -0300 2012:
> >
> > On 10/02/2012 03:48 PM, Tom Lane wrote:
> > > Alvaro Herrera  writes:
>
> > >> Well, if that's the rationale then you end up with no schema foo at
> all
> > >> (i.e. both die), which seems even more surprising (though I admit it
> has
> > >> the advantage of being a simple rule to document.)
> > > I think we should just disallow putting any contained objects in the
> > > statement when IF NOT EXISTS is used.  It's simple to understand,
> simple
> > > to document and implement, and I think it covers all the sane use-cases
> > > anyway.
> >
> > I thought we'd already agreed on this.
>
> Well, it's not what the latest proposed patch implements.
>
>
You're right... the latest proposed patch doesn't implement it.

I'll change the patch and send it soon...

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Martijn van Oosterhout
On Tue, Oct 02, 2012 at 12:58:15PM -0400, Stephen Frost wrote:
> > I simply do not understand objections to the proposal. Have I missed 
> > something?
> 
> It was my impression that the concern is the stability of the hash value
> and ensuring that tools which operate on it don't mistakenly lump two
> different queries into one because they had the same hash value (caused
> by a change in our hashing algorithm or input into it over time, eg a
> point release).  I was hoping to address that to allow this proposal to
> move forward..

That makes no sense though. The moment you talk about "hash" you
consider the possibility of lumping together things that aren't the
same.  Any tools using these hashes must have realised this.

Fortunately, the statistics are better than the birthday paradox. The
chance that the two most important queries in your system end up
having the same hash is minuscule.

As mentioned elsewhere, a system with more than 10,000 different
queries sounds rare to me (once you exclude query parameters, of course).

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera
Excerpts from Andrew Dunstan's message of mar oct 02 17:24:38 -0300 2012:
> 
> On 10/02/2012 03:48 PM, Tom Lane wrote:
> > Alvaro Herrera  writes:

> >> Well, if that's the rationale then you end up with no schema foo at all
> >> (i.e. both die), which seems even more surprising (though I admit it has
> >> the advantage of being a simple rule to document.)
> > I think we should just disallow putting any contained objects in the
> > statement when IF NOT EXISTS is used.  It's simple to understand, simple
> > to document and implement, and I think it covers all the sane use-cases
> > anyway.
> 
> I thought we'd already agreed on this.

Well, it's not what the latest proposed patch implements.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Andrew Dunstan


On 10/02/2012 03:48 PM, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Excerpts from David E. Wheeler's message of mar oct 02 16:16:37 -0300 2012:
>>> On Oct 2, 2012, at 12:08 PM, Alvaro Herrera  wrote:
>>>> create schema if not exists foo create table first (a int);
>>>> create schema if not exists foo create table second (a int);
>>>>
>>>> As far as I can see, with the patch as it currently stands, you would
>>>> end up with only table "first" in the schema, which seems very
>>>> surprising to me.
>>>
>>> Yeah, I think the second should die. CINE should only work if there are no
>>> other objects created as part of the statement, IMHO.
>>
>> Well, if that's the rationale then you end up with no schema foo at all
>> (i.e. both die), which seems even more surprising (though I admit it has
>> the advantage of being a simple rule to document.)
>
> I think we should just disallow putting any contained objects in the
> statement when IF NOT EXISTS is used.  It's simple to understand, simple
> to document and implement, and I think it covers all the sane use-cases
> anyway.




I thought we'd already agreed on this.

+1.

cheers

andrew





Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread David E. Wheeler
On Oct 2, 2012, at 12:48 PM, Tom Lane  wrote:

> I think we should just disallow putting any contained objects in the
> statement when IF NOT EXISTS is used.  It's simple to understand, simple
> to document and implement, and I think it covers all the sane use-cases
> anyway.

+1

David





Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera
Excerpts from David E. Wheeler's message of mar oct 02 16:37:30 -0300 2012:
> On Oct 2, 2012, at 12:30 PM, Alvaro Herrera  wrote:
> 
> > How about this for precedent:
> > 
> > mkdir -p /tmp/foo/bar
> > mkdir -p /tmp/foo/baz
> > 
> > In this case you end up with directory "foo" and at least two subdirs in
> > it, bar and baz.  This works even if /tmp/foo existed previously and
> > even if there was some other stuff in it.
> 
> Well, what about this, then?
> 
> create schema if not exists foo create table second (a int);
> create schema if not exists foo create table second (b int);

Yes, exactly -- what about this case?  This is precisely the reason we
don't have CREATE TABLE IF NOT EXISTS.

I don't know what the best answer is.  Most people seem to think that
the answer ought to be that you end up with a single column second.a,
and the second command errors out.

So if you do this:

create schema if not exists foo create table first (a int);
create schema if not exists foo create table first (b int),
create table second (a int);

you end up with *only* the first table, because the second command
errors out when the first table is observed to exist.

Now, what if you were to do this instead:

create schema if not exists foo
   create table if not exists first (a int);
create schema if not exists foo
   create table if not exists first (b int),
   create table if not exists second (a int);

Then you end up with first.a and second.a.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from David E. Wheeler's message of mar oct 02 16:16:37 -0300 2012:
>> On Oct 2, 2012, at 12:08 PM, Alvaro Herrera  wrote:
>>> create schema if not exists foo create table first (a int);
>>> create schema if not exists foo create table second (a int);
>>>
>>> As far as I can see, with the patch as it currently stands, you would
>>> end up with only table "first" in the schema, which seems very
>>> surprising to me.

>> Yeah, I think the second should die. CINE should only work if there are no 
>> other objects created as part of the statement, IMHO.

> Well, if that's the rationale then you end up with no schema foo at all
> (i.e. both die), which seems even more surprising (though I admit it has
> the advantage of being a simple rule to document.)

I think we should just disallow putting any contained objects in the
statement when IF NOT EXISTS is used.  It's simple to understand, simple
to document and implement, and I think it covers all the sane use-cases
anyway.

regards, tom lane




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread David E. Wheeler
On Oct 2, 2012, at 12:30 PM, Alvaro Herrera  wrote:

> How about this for precedent:
> 
> mkdir -p /tmp/foo/bar
> mkdir -p /tmp/foo/baz
> 
> In this case you end up with directory "foo" and at least two subdirs in
> it, bar and baz.  This works even if /tmp/foo existed previously and
> even if there was some other stuff in it.

Well, what about this, then?

create schema if not exists foo create table second (a int);
create schema if not exists foo create table second (b int);

David





Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera
Excerpts from David E. Wheeler's message of mar oct 02 16:16:37 -0300 2012:
> On Oct 2, 2012, at 12:08 PM, Alvaro Herrera  wrote:
> 
> > create schema if not exists foo create table first (a int);
> > create schema if not exists foo create table second (a int);
> > 
> > 
> > As far as I can see, with the patch as it currently stands, you would
> > end up with only table "first" in the schema, which seems very
> > surprising to me.
> 
> Yeah, I think the second should die. CINE should only work if there are no 
> other objects created as part of the statement, IMHO.

Well, if that's the rationale then you end up with no schema foo at all
(i.e. both die), which seems even more surprising (though I admit it has
the advantage of being a simple rule to document.)

How about this for precedent:

mkdir -p /tmp/foo/bar
mkdir -p /tmp/foo/baz

In this case you end up with directory "foo" and at least two subdirs in
it, bar and baz.  This works even if /tmp/foo existed previously and
even if there was some other stuff in it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread David E. Wheeler
On Oct 2, 2012, at 12:08 PM, Alvaro Herrera  wrote:

> create schema if not exists foo create table first (a int);
> create schema if not exists foo create table second (a int);
> 
> 
> As far as I can see, with the patch as it currently stands, you would
> end up with only table "first" in the schema, which seems very
> surprising to me.

Yeah, I think the second should die. CINE should only work if there are no 
other objects created as part of the statement, IMHO.

Best,

David





Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?

2012-10-02 Thread David Johnston
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
> ow...@postgresql.org] On Behalf Of Noah Misch
> Sent: Tuesday, October 02, 2012 3:02 PM
> To: Craig Ringer
> Cc: PostgreSQL Hackers
> Subject: Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?
> 
> On Tue, Aug 21, 2012 at 02:31:29PM +0800, Craig Ringer wrote:
> > It'd really help if REVOKE consistently raised warnings when it didn't
> > actually revoke anything.
> 
> +1
> 
> This will invite the same mixed feelings as the CREATE x IF NOT EXISTS
> notices, but I think it's worthwhile.
> 
> > Even better, a special case for REVOKEs on objects that only have
> > owner and public permissions could say:
> >
> > WARNING: REVOKE didn't remove any permissions for user . This
> >  has default permissions, so there were no GRANTs
> > for user  to revoke. See the documentation for REVOKE for more
> > information.
> 
> The extra aid from saying those particular things is not clear to me.
> 
> It might be overkill, but we could report any other roles indirectly
> conveying access to the named role.
> 

Having been bitten by this myself I do see the value in such a warning.  It
is not uncommon for someone using REVOKE to believe they are installing a
block instead of removing an allowance; especially as it interacts with
default permissions.

That said, and this is an off-the-cuff thought, the entire UI for
permissions, and its treatment in the documentation, seems to be fact
oriented.  The system is well documented but actually getting up to speed to
learn and use it is still a matter of reading the documentation and figuring
out how everything fits together.  I haven't given it that much thought but
I am curious if others are of the same opinion.

IOW, this proposal is an attempt to fix a symptom without addressing the
root cause.

Food for thought.

David J.







Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera
Excerpts from Volker Grabsch's message of sáb sep 29 06:32:13 -0300 2012:
> Dickson S. Guedes schrieb:
> > - https://commitfest.postgresql.org/action/patch_view?id=907
> > 
> > The patch is small and implements a new syntax to CREATE SCHEMA
> > that allow the creation of a schema be skipped when IF NOT EXISTS is
> > used.
> >
> > [...]
> >
> > - Should this patch implements others INEs like ADD COLUMN IF NOT EXISTS?
> 
> If there's still a chance to improve the patch, I'd love to see
> the following INEs implemented. Several real-world database
> upgrade scripts would benefit from those:

I don't see that this patch is responsible for such new commands.  If
you want them, feel free to submit separate patches for them (or have
someone else do it for you).  But see the thread starting at
http://archives.postgresql.org/message-id/467881.79137.qm%40web27104.mail.ukl.yahoo.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 2 October 2012 17:58, Stephen Frost  wrote:
> Right, and that's all I'm trying to address here- how do we provide a
> value for a given query which can be relied upon by outside sources,
> even in the face of a point release which changes what our internal hash
> value for a given query is.

I don't know of a way. Presumably, we'd hope to avoid this, and would
look for alternatives to anything that would necessitate bumping
PGSS_FILE_HEADER, while not going so far as to let pg_stat_statements
contort things in the core system. If I was aware of a case where this
would have come up had pg_stat_statements fingerprinting been around
at the time, perhaps I could give a better answer than that.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera

The fundamental issue with this patch hasn't been answered sufficiently,
I think.  Consider the following sequence of commands:

create schema if not exists foo create table first (a int);
create schema if not exists foo create table second (a int);


As far as I can see, with the patch as it currently stands, you would
end up with only table "first" in the schema, which seems very
surprising to me.

I think this needs more thought, and in any case it needs more
comprehensive regression test and documentation (i.e. at least the
examples ought to explain what would happen in such cases).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?

2012-10-02 Thread Noah Misch
On Tue, Aug 21, 2012 at 02:31:29PM +0800, Craig Ringer wrote:
> It'd really help if REVOKE consistently raised warnings when it didn't  
> actually revoke anything.

+1

This will invite the same mixed feelings as the CREATE x IF NOT EXISTS
notices, but I think it's worthwhile.

> Even better, a special case for REVOKEs on objects that only have owner  
> and public permissions could say:
>
> WARNING: REVOKE didn't remove any permissions for user . This
>  has default permissions, so there were no GRANTs for user  to
> revoke. See the documentation for REVOKE for more information.

The extra aid from saying those particular things is not clear to me.

It might be overkill, but we could report any other roles indirectly conveying
access to the named role.




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 2 October 2012 18:16, Tom Lane  wrote:
> 1. Why isn't something like md5() on the reported query text an equally
> good solution for users who want a query hash?

Because that does not uniquely identify the entry. The very first
thing that the docs say on search_path is "Qualified names are tedious
to write, and it's often best not to wire a particular schema name
into applications anyway". Presumably, the reason it's best not to
wire schema names into apps is because it might be useful to modify
search_path in a way that dynamically made the same queries in some
application reference what are technically distinct relations. If
anyone does this, and it seems likely that many do for various
reasons, they will be out of luck when using some kind of
pg_stat_statements aggregation.

This was the behaviour that I intended for pg_stat_statements all
along, and I think it's better than a solution that matches query
strings.

> 2. If people are going to accumulate stats on queries over a long period
> of time, is a 32-bit hash really good enough for the purpose?  If I'm
> doing the math right, the chance of collision is already greater than 1%
> at 1 queries, and rises to about 70% for 10 queries; see
> http://en.wikipedia.org/wiki/Birthday_paradox
> We discussed this issue and decided it was okay for pg_stat_statements's
> internal hash table, but it's not at all clear to me that it's sensible
> to use 32-bit hashes for external accumulation of query stats.

Well, forgive me for pointing this out, but I did propose that the
hash be a 64-bit value (which would have necessitated adapting
hash_any() to produce 64-bit values), but you rejected the proposal. I
arrived at the same probability for a collision as you did and posted
it to the list, in the discussion shortly after the normalisation stuff
was committed.

A more sensible way of assessing the risk of a collision would be to
try and come up with the probability of a collision that someone
actually ends up caring about, which is considerably less than the 1%
for 10,000 entries. I'm not being glib - people are very used to the
idea that aggregating information on query costs is a lossy process.
Prior to 9.2, the only way execution costs could reasonably be
measured at the query granularity on a busy system was to set
log_min_duration_statement to something like 1 second.

I am also unconvinced by the idea that aggregating historical data
(with the same hash value) in a separate application is likely to make
the collision situation appreciably worse. People are going to be
using something like an RRD circular buffer to aggregate the
information, and I can't see anyone caring about detailed information
that is more than a couple of weeks in the past. The point of
aggregation isn't to store more queries, it's to construct time-series
data from snapshots. Besides, do most applications really even have
more than 10,000 distinct queries?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Prevent restored WAL files from being archived again Re: Unnecessary WAL archiving after failover

2012-10-02 Thread Fujii Masao
On Wed, Oct 3, 2012 at 3:11 AM, Simon Riggs  wrote:
> On 2 October 2012 19:06, Fujii Masao  wrote:
>> On Sat, Aug 11, 2012 at 2:19 AM, Fujii Masao  wrote:
>>> On Thu, Aug 9, 2012 at 8:08 AM, Simon Riggs  wrote:
 On 29 July 2012 16:01, Fujii Masao  wrote:

> Attached patch changes the startup process so that it creates .done file
> whenever WAL file is successfully restored, whether archive mode is
> enabled or not. The restored WAL files will not be archived again because
> of .done file.

 The proposed patch works, for archiving only, but I don't like the
 code. It's a partial refactoring of existing code.

 I prefer to go for a full re-factoring version for HEAD, and a zero
 refactoring version for 9.2 since we're deep into beta.
>>
>> Isn't it time to push the full re-factoring version to HEAD? If there is no
>> such version yet, what about pushing the zero refactoring version for now?
>
> If you send a rebased patch, I'll review,

Okay. Will do. The patch needs to be revised to correspond with the recent
split of xlog.c.

> but its not high on my radar
> right now unless you can explain why it should be higher.

It may not be high, but I'm just worried that we are likely to forget to
apply that change to HEAD if we postpone it further.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Prevent restored WAL files from being archived again Re: Unnecessary WAL archiving after failover

2012-10-02 Thread Simon Riggs
On 2 October 2012 19:06, Fujii Masao  wrote:
> On Sat, Aug 11, 2012 at 2:19 AM, Fujii Masao  wrote:
>> On Thu, Aug 9, 2012 at 8:08 AM, Simon Riggs  wrote:
>>> On 29 July 2012 16:01, Fujii Masao  wrote:
>>>
 Attached patch changes the startup process so that it creates .done file
 whenever WAL file is successfully restored, whether archive mode is
 enabled or not. The restored WAL files will not be archived again because
 of .done file.
>>>
>>> The proposed patch works, for archiving only, but I don't like the
>>> code. It's a partial refactoring of existing code.
>>>
>>> I prefer to go for a full re-factoring version for HEAD, and a zero
>>> refactoring version for 9.2 since we're deep into beta.
>
> Isn't it time to push the full re-factoring version to HEAD? If there is no
> such version yet, what about pushing the zero refactoring version for now?

If you send a rebased patch, I'll review, but it's not high on my radar
right now unless you can explain why it should be higher.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] review: pgbench - aggregation of info written into log

2012-10-02 Thread Pavel Stehule
Hello

I did a review of the pgbench patch:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00737.php

* the patch applies cleanly

* I had a problem with the docs

make[1]: Entering directory `/home/pavel/src/postgresql/doc/src/sgml'
openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-c /usr/share/sgml/docbook/dsssl-stylesheets/catalog -d stylesheet.dsl
-t sgml -i output-html -V html-index postgres.sgml
openjade:pgbench.sgml:767:8:E: document type does not allow element
"TITLE" here; missing one of "ABSTRACT", "AUTHORBLURB", "MSGSET",
"CALLOUTLIST", "ITEMIZEDLIST", "ORDEREDLIST", "SEGMENTEDLIST",
"VARIABLELIST", "CAUTION", "IMPORTANT", "NOTE", "TIP", "WARNING",
"FORMALPARA", "BLOCKQUOTE", "EQUATION", "EXAMPLE", "FIGURE", "TABLE",
"PROCEDURE", "SIDEBAR", "QANDASET", "REFSECT3" start-tag
make[1]: *** [HTML.index] Error 1
make[1]: *** Deleting file `HTML.index'
make[1]: Leaving directory `/home/pavel/src/postgresql/doc/src/sgml

* pgbench compiles without warnings

* there are a few issues in the source code

+   
+   /* should we aggregate the results or not? */
+   if (use_log_agg)
+   {
+   
+   /* are we still in the same interval? if yes, accumulate the
+    * values (print them otherwise) */
+   if (agg->start_time + use_log_agg >= INSTR_TIME_GET_DOUBLE(now))
+   {
+   

* the real time range for aggregation is dynamic (pgbench is not a
realtime application), so I am not sure the following constraint makes
sense

 +  if ((duration > 0) && (use_log_agg > 0) && (duration % use_log_agg != 0)) {
+   fprintf(stderr, "duration (%d) must be a multiple of aggregation interval (%d)\n", duration, use_log_agg);
+   exit(1);
+   }

* it doesn't flush the last aggregated data (as mentioned by Tomas:
"Also, I've realized the last interval may not be logged at all - I'll
take a look into this in the next version of the patch"), and that can
be significant for higher values of "use_log_agg"; a generic sketch of
one way to handle the final interval appears after this list

* the variable name "use_log_agg" is not good - maybe "log_agg_interval"?
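
(As a point of reference for the flushing issue above, here is a generic,
purely illustrative sketch -- not the patch under review, and all names are
invented -- of per-interval aggregation that also emits the final, possibly
partial, interval:)

#include <stdio.h>

typedef struct
{
    long        start_time;     /* start of the current interval (seconds) */
    long        cnt;            /* transactions seen in this interval */
    double      sum_latency;    /* accumulated latency (ms) */
} AggState;

static void
agg_emit(AggState *agg)
{
    if (agg->cnt > 0)
        printf("%ld %ld %.3f\n", agg->start_time, agg->cnt,
               agg->sum_latency / agg->cnt);
}

static void
agg_add(AggState *agg, int interval, long now, double latency_ms)
{
    /* close out any intervals that have already elapsed */
    while (now >= agg->start_time + interval)
    {
        agg_emit(agg);
        agg->start_time += interval;
        agg->cnt = 0;
        agg->sum_latency = 0.0;
    }
    agg->cnt++;
    agg->sum_latency += latency_ms;
}

int
main(void)
{
    AggState    agg = {0, 0, 0.0};
    int         interval = 5;

    agg_add(&agg, interval, 1, 2.0);
    agg_add(&agg, interval, 3, 4.0);
    agg_add(&agg, interval, 7, 6.0);    /* crosses into a new interval */

    agg_emit(&agg);                     /* flush the last, partial interval */
    return 0;
}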

Regards

Pavel Stehule




Re: [HACKERS] Prevent restored WAL files from being archived again Re: Unnecessary WAL archiving after failover

2012-10-02 Thread Fujii Masao
On Sat, Aug 11, 2012 at 2:19 AM, Fujii Masao  wrote:
> On Thu, Aug 9, 2012 at 8:08 AM, Simon Riggs  wrote:
>> On 29 July 2012 16:01, Fujii Masao  wrote:
>>
>>> Attached patch changes the startup process so that it creates .done file
>>> whenever WAL file is successfully restored, whether archive mode is
>>> enabled or not. The restored WAL files will not be archived again because
>>> of .done file.
>>
>> The proposed patch works, for archiving only, but I don't like the
>> code. It's a partial refactoring of existing code.
>>
>> I prefer to go for a full re-factoring version for HEAD, and a zero
>> refactoring version for 9.2 since we're deep into beta.

Isn't it time to push the full re-factoring version to HEAD? If there is no
such version yet, what about pushing the zero refactoring version for now?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Incorrect behaviour when using a GiST index on points

2012-10-02 Thread Noah Misch
On Tue, Aug 28, 2012 at 05:04:09PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane  wrote:
> >> There's also the big-picture question of whether we should just get rid
> >> of fuzzy comparisons in the geometric types instead of trying to hack
> >> indexes to work around them.
> 
> > +1 for that approach, but only if I don't have to do the work.

I agree in the abstract; why should a point (position on a 2D plane) compare
fuzzily while a float8 (position on a 1D number line) does not?  But ...

> > Otherwise, +1 for doing the simplest thing that we're sure will
> > eliminate wrong answers.
> 
> What we're forced to speculate about here is how many applications out
> there are relying on fuzzy comparison to get answers they like, versus
> how many are getting answers they don't like because of it.  The fact
> that the underlying storage is float8 not numeric suggests there are
> probably some cases where fuzzy is helpful.

... yes.  Having never used these types in practice, I won't venture a guess.
Anyone else?

> I've never cared for the particulars of the way the fuzzy comparisons
> are done, in any case: using an absolute rather than relative error
> threshold is wrong according to every numerical analysis principle
> I know.

Definitely.
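
(To make the absolute-versus-relative distinction concrete, a small
standalone sketch; the absolute comparison mirrors the EPSILON-style
FPeq() approach used by the geometric code, while the relative variant
is just one illustrative alternative, not a proposal:)

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

#define EPSILON 1.0E-06          /* absolute threshold, as in geo_decls.h */

static bool
fpeq_absolute(double a, double b)
{
    return fabs(a - b) <= EPSILON;
}

static bool
fpeq_relative(double a, double b)
{
    /* tolerance scales with the larger operand */
    return fabs(a - b) <= EPSILON * fmax(fabs(a), fabs(b));
}

int
main(void)
{
    /* tiny coordinates: the absolute test lumps distinct values together */
    printf("small: absolute=%d relative=%d\n",
           fpeq_absolute(1e-7, 9e-7), fpeq_relative(1e-7, 9e-7));

    /* huge coordinates: the absolute test never matches, the relative one can */
    printf("large: absolute=%d relative=%d\n",
           fpeq_absolute(1e12, 1e12 + 0.001), fpeq_relative(1e12, 1e12 + 0.001));
    return 0;
}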

> The long and the short of it is that it will probably take a significant
> investment of work to make something that's clearly better.  If that
> weren't the case, we'd have done something long ago.

In any event, I think we should entertain a patch to make the GiST operator
class methods bug-compatible with corresponding operators.  Even if we decide
to change operator behavior in HEAD, the back branches could use it.

Thanks,
nm




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Tom Lane
Stephen Frost  writes:
> * Peter Geoghegan (pe...@2ndquadrant.com) wrote:
>> I simply do not understand objections to the proposal. Have I missed 
>> something?

> It was my impression that the concern is the stability of the hash value
> and ensuring that tools which operate on it don't mistakenly lump two
> different queries into one because they had the same hash value (caused
> by a change in our hashing algorithm or input into it over time, eg a
> point release).  I was hoping to address that to allow this proposal to
> move forward..

I think there are at least two questions that ought to be answered:

1. Why isn't something like md5() on the reported query text an equally
good solution for users who want a query hash?

2. If people are going to accumulate stats on queries over a long period
of time, is a 32-bit hash really good enough for the purpose?  If I'm
doing the math right, the chance of collision is already greater than 1%
at 10,000 queries, and rises to about 70% for 100,000 queries; see
http://en.wikipedia.org/wiki/Birthday_paradox
We discussed this issue and decided it was okay for pg_stat_statements's
internal hash table, but it's not at all clear to me that it's sensible
to use 32-bit hashes for external accumulation of query stats.
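
(As a rough check of those figures, assuming a uniformly distributed
32-bit hash and the usual birthday approximation
p ~= 1 - exp(-k*(k-1)/(2*2^32)), a few lines of throwaway code reproduce
them:)

#include <math.h>
#include <stdio.h>

int
main(void)
{
    const double space = 4294967296.0;          /* 2^32 possible hash values */
    const double counts[] = {10000.0, 100000.0};

    for (int i = 0; i < 2; i++)
    {
        double  k = counts[i];
        double  p = 1.0 - exp(-k * (k - 1.0) / (2.0 * space));

        /* prints roughly 1.2% and 68.8% */
        printf("%8.0f distinct queries -> collision probability %.1f%%\n",
               k, 100.0 * p);
    }
    return 0;
}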

regards, tom lane




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Stephen Frost
* Peter Geoghegan (pe...@2ndquadrant.com) wrote:
> On 1 October 2012 18:05, Stephen Frost  wrote:
> > You're going to have to help me here, 'cause I don't see how there can
> > be duplicates if we include the PGSS_FILE_HEADER as part of the hash,
> > unless we're planning to keep PGSS_FILE_HEADER constant while we change
> > what the hash value is for a given query, yet that goes against the
> > assumptions that were laid out, aiui.
> 
> Well, they wouldn't be duplicates if you think that the fact that one
> query was executed before some point release and another after ought
> to differentiate queries. I do not.

This would only be if we happened to change what hash was generated for
a given query during such a point release, where I share your feeling
that it ought to be quite rare.  I'm not suggesting we do this for every
point release...

> By invalidate, I mean that when we go to open the saved file, if the
> header doesn't match, the file is considered corrupt, and we simply
> log that the file could not be read, before unlinking it. This would
> be necessary in the unlikely event of there being some substantive
> change in the representation of query trees in a point release. I am
> not aware of any precedent for this, though Tom said that there was
> one.

Right, and that's all I'm trying to address here- how do we provide a
value for a given query which can be relied upon by outside sources,
even in the face of a point release which changes what our internal hash
value for a given query is.

> I don't want to get too hung up on what we'd do if this problem
> actually occurred, because that isn't what this thread is about.

[...]

> I simply do not understand objections to the proposal. Have I missed 
> something?

It was my impression that the concern is the stability of the hash value
and ensuring that tools which operate on it don't mistakenly lump two
different queries into one because they had the same hash value (caused
by a change in our hashing algorithm or input into it over time, eg a
point release).  I was hoping to address that to allow this proposal to
move forward..

Thanks,

Stephen




Re: [HACKERS] xmalloc => pg_malloc

2012-10-02 Thread Andres Freund
On Tuesday, October 02, 2012 06:30:33 PM Tom Lane wrote:
> Andres Freund  writes:
> >> pg_calloc  (randomly different API for pg_malloc0)
> > 
> > Do we need this?
> 
> I thought about getting rid of it, but there are some dozens of calls
> scattered across several files, so I wasn't sure it was worth it.
I always found calloc to be a pointless API but by now I have learned what it 
means so I don't have a strong opinion.


> > I wonder whether the same set of functions should also be available in
> > the backend with ereport(EC_OUT_OF_MEMORY, ...) behaviour as well.
> 
> In the backend, you almost always ought to be using palloc instead.
> The only places where it's really appropriate to be using malloc
> directly are where you don't want an error thrown for out-of-memory.
> So I think providing these in the backend would do little except to
> encourage bad programming.
We seem to have 100+ usages of malloc() anyway. Several of those are in helper 
libraries like regex/* though. A quick grep shows most of the others to be 
valid in my opinion. Mostly it's allocating memory which is never deallocated 
but too big to unconditionally allocate.

The quick grep showed that at least src/backend/utils/misc/ps_status.c doesn't 
properly check the return value. Others (mctx.c) use Asserts...

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] xmalloc => pg_malloc

2012-10-02 Thread Phil Sorber
On Tue, Oct 2, 2012 at 12:30 PM, Tom Lane  wrote:
> Andres Freund  writes:
>>> pg_calloc  (randomly different API for pg_malloc0)
>
>> Do we need this?
>
> I thought about getting rid of it, but there are some dozens of calls
> scattered across several files, so I wasn't sure it was worth it.
> Anybody else have an opinion?

I think having more than 1 function that does the same thing is
generally a bad idea. It sounds like it is going to cause confusion
and provide no real benefit.

>
>> I wonder whether the same set of functions should also be available in the
>> backend with ereport(EC_OUT_OF_MEMORY, ...) behaviour as well.
>
> In the backend, you almost always ought to be using palloc instead.
> The only places where it's really appropriate to be using malloc
> directly are where you don't want an error thrown for out-of-memory.
> So I think providing these in the backend would do little except to
> encourage bad programming.
>
> regards, tom lane
>
>




Re: [HACKERS] xmalloc => pg_malloc

2012-10-02 Thread Tom Lane
Andres Freund  writes:
>> pg_calloc  (randomly different API for pg_malloc0)

> Do we need this?

I thought about getting rid of it, but there are some dozens of calls
scattered across several files, so I wasn't sure it was worth it.
Anybody else have an opinion?

> I wonder whether the same set of functions should also be available in the 
> backend with ereport(EC_OUT_OF_MEMORY, ...) behaviour as well.

In the backend, you almost always ought to be using palloc instead.
The only places where it's really appropriate to be using malloc
directly are where you don't want an error thrown for out-of-memory.
So I think providing these in the backend would do little except to
encourage bad programming.

regards, tom lane




Re: [HACKERS] xmalloc => pg_malloc

2012-10-02 Thread Bruce Momjian
On Tue, Oct  2, 2012 at 12:02:13PM -0400, Tom Lane wrote:
> While looking around to fix the pg_malloc(0) issue, I noticed that
> various other pieces of code such as pg_basebackup have essentially
> identical functions, except they're called xmalloc().  I propose to
> standardize all these things on this set of names:
> 
>   pg_malloc
>   pg_malloc0  (for malloc-and-zero behavior)
>   pg_calloc   (randomly different API for pg_malloc0)
>   pg_realloc
>   pg_free
>   pg_strdup
> 
> Any objections?

Please standardize.  I am totally confused by the many memory
implementations we have.  (Pg_upgrade uses pg_malloc().)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] xmalloc => pg_malloc

2012-10-02 Thread Andres Freund
On Tuesday, October 02, 2012 06:02:13 PM Tom Lane wrote:
> While looking around to fix the pg_malloc(0) issue, I noticed that
> various other pieces of code such as pg_basebackup have essentially
> identical functions, except they're called xmalloc().  I propose to
> standardize all these things on this set of names:
> 
>   pg_malloc
>   pg_malloc0  (for malloc-and-zero behavior)
>   pg_calloc   (randomly different API for pg_malloc0)
Do we need this?

>   pg_realloc
>   pg_free
>   pg_strdup
I wonder whether the same set of functions should also be available in the 
backend with ereport(EC_OUT_OF_MEMORY, ...) behaviour as well. As noted before 
there are quite a few malloc() calls around. Not all of them should be 
replaced, but I think quite a few could be.

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] xmalloc => pg_malloc

2012-10-02 Thread Tom Lane
While looking around to fix the pg_malloc(0) issue, I noticed that
various other pieces of code such as pg_basebackup have essentially
identical functions, except they're called xmalloc().  I propose to
standardize all these things on this set of names:

pg_malloc
pg_malloc0  (for malloc-and-zero behavior)
pg_calloc   (randomly different API for pg_malloc0)
pg_realloc
pg_free
pg_strdup

Any objections?
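
(For reference, these helpers are all small variations on the same few
lines; a minimal sketch of the common pattern -- not any particular file's
actual definition, and the pg_malloc(0) handling mentioned above is
omitted here:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void *
pg_malloc(size_t size)
{
    void   *result = malloc(size);

    if (result == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return result;
}

void *
pg_malloc0(size_t size)
{
    void   *result = pg_malloc(size);

    memset(result, 0, size);
    return result;
}

char *
pg_strdup(const char *s)
{
    char   *result = strdup(s);

    if (result == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return result;
}

int
main(void)
{
    char   *copy = pg_strdup("hello");
    int    *zeroed = pg_malloc0(16 * sizeof(int));

    printf("%s %d\n", copy, zeroed[0]);
    free(copy);
    free(zeroed);
    return 0;
}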

regards, tom lane




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-10-02 Thread Bruce Momjian
On Tue, Sep 25, 2012 at 09:10:33AM -0400, Bruce Momjian wrote:
> > lc_collate cluster values do not match: old "zh_CN.utf8", new "zh_CN.UTF-8"
> > Failure, exiting
> > 
> > zh_CN.utf8 is provided by the installer and zh_CN.UTF-8 is my system
> > default.
> 
> OK, this tells us that the canonicalization code used in initdb is not
> going to help us in pg_upgrade, at least not on your system, and not on
> mine.
> 
> I think we should apply the patch that fixes the TOAST problem with
> information_schema, and the patch that outputs the old/new values for
> easier debugging.  Other than that, I don't know what else we can do
> except to ignore dashes when comparing locale names, which I am told is
> unacceptable.

Based on this great bug report and submitter leg-work, I have applied
three patches to pg_upgrade in head and 9.2, all attached:

*  try to get the canonical locale names, and report old/new values on mismatch
*  update query to skip toast tables for system objects
*  improve error reporting when the object counts don't match

None of these bugs caused pg_upgrade to produce an incorrect upgraded
cluster, so I am not going to panic and try to force them into 9.1,
which probably isn't being used by many people anymore anyway.

I think this closes this report.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
new file mode 100644
index 74b13e7..9d08f41
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
*** get_rel_infos(ClusterInfo *cluster, DbIn
*** 269,302 
  	 */
  
  	snprintf(query, sizeof(query),
! 			 "SELECT c.oid, n.nspname, c.relname, "
! 			 "	c.relfilenode, c.reltablespace, %s "
  			 "FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n "
  			 "	   ON c.relnamespace = n.oid "
! 			 "  LEFT OUTER JOIN pg_catalog.pg_tablespace t "
! 			 "	   ON c.reltablespace = t.oid "
! 			 "WHERE relkind IN ('r','t', 'i'%s) AND "
  	/* exclude possible orphaned temp tables */
  			 "  ((n.nspname !~ '^pg_temp_' AND "
  			 "n.nspname !~ '^pg_toast_temp_' AND "
! 			 "n.nspname NOT IN ('pg_catalog', 'information_schema', 'binary_upgrade') AND "
  			 "	  c.oid >= %u) "
  			 "  OR (n.nspname = 'pg_catalog' AND "
! 	"relname IN ('pg_largeobject', 'pg_largeobject_loid_pn_index'%s) )) "
! 	/* we preserve pg_class.oid so we sort by it to match old/new */
! 			 "ORDER BY 1;",
! 	/* 9.2 removed the spclocation column */
! 			 (GET_MAJOR_VERSION(cluster->major_version) <= 901) ?
! 			 "t.spclocation" : "pg_catalog.pg_tablespace_location(t.oid) AS spclocation",
  	/* see the comment at the top of old_8_3_create_sequence_script() */
  			 (GET_MAJOR_VERSION(old_cluster.major_version) <= 803) ?
  			 "" : ", 'S'",
- 	/* this oid allows us to skip system toast tables */
  			 FirstNormalObjectId,
  	/* does pg_largeobject_metadata need to be migrated? */
  			 (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) ?
  	"" : ", 'pg_largeobject_metadata', 'pg_largeobject_metadata_oid_index'");
  
  	res = executeQueryOrDie(conn, "%s", query);
  
  	ntups = PQntuples(res);
--- 269,327 
  	 */
  
  	snprintf(query, sizeof(query),
! 			 "CREATE TEMPORARY TABLE info_rels (reloid) AS SELECT c.oid "
  			 "FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n "
  			 "	   ON c.relnamespace = n.oid "
! 			 "WHERE relkind IN ('r', 'i'%s) AND "
  	/* exclude possible orphaned temp tables */
  			 "  ((n.nspname !~ '^pg_temp_' AND "
  			 "n.nspname !~ '^pg_toast_temp_' AND "
! 	/* skip pg_toast because toast index have relkind == 'i', not 't' */
! 			 "n.nspname NOT IN ('pg_catalog', 'information_schema', "
! 			 "		'binary_upgrade', 'pg_toast') AND "
  			 "	  c.oid >= %u) "
  			 "  OR (n.nspname = 'pg_catalog' AND "
! 	"relname IN ('pg_largeobject', 'pg_largeobject_loid_pn_index'%s) ));",
  	/* see the comment at the top of old_8_3_create_sequence_script() */
  			 (GET_MAJOR_VERSION(old_cluster.major_version) <= 803) ?
  			 "" : ", 'S'",
  			 FirstNormalObjectId,
  	/* does pg_largeobject_metadata need to be migrated? */
  			 (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) ?
  	"" : ", 'pg_largeobject_metadata', 'pg_largeobject_metadata_oid_index'");
  
+ 	PQclear(executeQueryOrDie(conn, "%s", query));
+ 
+ 	/*
+ 	 *	Get TOAST tables and indexes;  we have to gather the TOAST tables in
+ 	 *	later steps because we can't schema-qualify TOAST tables.
+ 	 */
+ 	PQclear(executeQueryOrDie(conn,
+ 			  "INSERT INTO info_rels "
+ 			  "SELECT reltoastrelid "
+ 			  "FROM info_rels i JOIN pg_catalog.pg_class c "
+ 			  "		ON i.reloid = c.oid"));
+ 	PQclear(executeQueryOrDie(conn,
+ 			  "INSERT INTO info_rels "
+ 			  "SELECT reltoastidxid "
+ 			  "FROM info_rels i JOIN pg_catalog.pg_class c "
+ 			  "		ON i.reloid = c.oid"));
+ 
+ 	snprintf(que

Re: [HACKERS] Doc patch to note which system catalogs have oids

2012-10-02 Thread Karl O. Pinc
On 09/25/2012 12:28:13 AM, Karl O. Pinc wrote:
> On 09/23/2012 08:57:45 PM, Karl O. Pinc wrote:
> 
> > The attached patch documents the oid column of those
> > system catalogs having an oid.
> 
> Don't use the first version of this patch (oid_doc.patch)
> without discarding the last hunk.  The last hunk
> introduces an error by duplicating the
> documentation of the pg_roles.oid column.
> 
> (I am reluctant to post a revised version
> since there's already 3 versions floating
> around representing 2 different approaches
> and no consensus as to which approach, if any, should
> be taken.)

I have figured out how to use the commitfest
pages to track what should be reviewed.
Submitting a corrected version of the very
first patch, which treats the oids as
regular columns.

I am now submitting patches to the commitfest
for review.  (I'm not sure how I missed this.)

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f999190..babb11c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -427,6 +427,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   amname
   name
   
@@ -683,6 +690,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   amopfamily
   oid
   pg_opfamily.oid
@@ -819,6 +833,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   amprocfamily
   oid
   pg_opfamily.oid
@@ -902,6 +923,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   adrelid
   oid
   pg_class.oid
@@ -1257,6 +1285,14 @@
 
 
 
+
+ 
+  oid
+  oid
+  
+  Row identifier
+ 
+
  
   rolname
   name
@@ -1462,6 +1498,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   castsource
   oid
   pg_type.oid
@@ -1577,6 +1620,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   relname
   name
   
@@ -1984,6 +2034,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   conname
   name
   
@@ -2250,6 +2307,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   collname
   name
   
@@ -2350,6 +2414,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   conname
   name
   
@@ -2443,6 +2514,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   datname
   name
   
@@ -2652,6 +2730,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   defaclrole
   oid
   pg_authid.oid
@@ -3005,6 +3090,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   enumtypid
   oid
   pg_type.oid
@@ -3078,6 +3170,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   extname
   name
   
@@ -3174,6 +3273,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   fdwname
   name
   
@@ -3266,6 +3372,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   srvname
   name
   
@@ -3675,6 +3788,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   lanname
   name
   
@@ -3875,6 +3995,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   lomowner
   oid
   pg_authid.oid
@@ -3927,6 +4054,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   nspname
   name
   
@@ -3995,6 +4129,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   opcmethod
   oid
   pg_am.oid
@@ -4093,6 +4234,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   oprname
   name
   
@@ -4243,6 +4391,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   opfmethod
   oid
   pg_am.oid
@@ -4427,6 +4582,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   proname
   name
   
@@ -4819,6 +4981,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   rulename
   name
   
@@ -5488,6 +5657,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   spcname
   name
   
@@ -5556,6 +5732,13 @@
 
 
  
+  oid
+  oid
+  
+  Row identifier
+ 
+
+ 
   tgrelid
   oid
   pg_class.oid
@@ -5741,6 +5924,13 @@
 
 
  
+  oid
+  oid

Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Magnus Hagander
On Oct 2, 2012 5:04 PM, "Euler Taveira"  wrote:
>
> On 02-10-2012 10:15, Peter Geoghegan wrote:
> > There are other, similar tools that exist in proprietary databases.
> > They expose a hash value, which is subject to exactly the same caveats
> > as our own. They explicitly encourage the type of aggregation by
> > third-party tools that I anticipate will happen with
> > pg_stat_statements. I want to facilitate this, but I'm confident that
> > the use of (dbid, userid, querytext) as a "candidate key" by these
> > tools is going to cause confusion, in subtle ways. By using the hash
> > value, those tools are exactly as well off as pg_stat_statements is,
> > which is to say, as well off as possible.
> >
> It depends on how you will use this information. If it is for a rapid
> analysis, it is fine to use a hash because the object internals won't
change
> (I hope not). However, if you want to analyze query statistics for a long
> period of time (say a few months) or your environment is distributed, you
> can't use the hash. There isn't a perfect solution but I see both cases
being
> useful for analysis tools.

Having the hash available in no way prevents you from using the query
string as your key. Not having the hash certainly prevents you from using
it.

/Magnus


Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Euler Taveira
On 02-10-2012 10:15, Peter Geoghegan wrote:
> There are other, similar tools that exist in proprietary databases.
> They expose a hash value, which is subject to exactly the same caveats
> as our own. They explicitly encourage the type of aggregation by
> third-party tools that I anticipate will happen with
> pg_stat_statements. I want to facilitate this, but I'm confident that
> the use of (dbid, userid, querytext) as a "candidate key" by these
> tools is going to cause confusion, in subtle ways. By using the hash
> value, those tools are exactly as well off as pg_stat_statements is,
> which is to say, as well off as possible.
> 
It depends on how you will use this information. If it is for a rapid
analysis, it is fine to use a hash because the object internals won't change
(I hope not). However, if you want to analyze query statistics for a long
period of time (say a few months) or your environment is distributed, you
can't use the hash. There isn't a perfect solution but I see both cases being
useful for analysis tools.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Missing OID define

2012-10-02 Thread Phil Sorber
Thom Brown and I were doing some hacking the other day and came across
this missing define. We argued over who was going to send the patch in
and I lost. So here it is.
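
Judging from the attachment name, the define in question is presumably the
OID macro for the uuid type in src/include/catalog/pg_type.h, next to the
other built-in *OID macros. A one-line, hypothetical sketch of that kind of
addition (the attached diff is the authoritative change):

/* hypothetical sketch only; the real patch is in pg_type_uuid_oid.diff */
#define UUIDOID 2950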


pg_type_uuid_oid.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] MVCC and large objects

2012-10-02 Thread Scott Corscadden
While I'm referring to the 8.4 release, it's a slightly more general question. 
We're moving away from using pg_largeobject as fast as we can, and while doing 
so I'm seeing interesting behaviour. I'd like to confirm that this is 
expected, or perhaps I don't have tuning parameters set quite correctly. I 
believe we have standard autovacuum running.

* Table Foo has an oid "data" column, points to a valid BLOB. We modified this 
table to also have a "datafilepath" column.
* We wrote a throttleable "copier" process which walks rows, reads the BLOB 
data out to a file and then UPDATEs the "datafilepath" column with where it 
wrote it. We did not alter the BLOB data in any way. When asked for byte data, 
the higher level code will first return it from the datafilepath if it's there, 
and fall back on the lo otherwise. 
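
A minimal libpq sketch of what one step of such a copier might look like; the
table, column, id value, and OID below are illustrative only, not the actual
code. The point is that the blob itself is only read (exported), never written:

#include <stdio.h>
#include <libpq-fe.h>

/* Illustrative sketch: export one blob to a file, then record where it went. */
static int
export_one(PGconn *conn, const char *row_id, unsigned int lobj_oid,
           const char *path)
{
	PGresult   *res;
	const char *params[2];

	if (lo_export(conn, lobj_oid, path) != 1)	/* read the blob out to a file */
	{
		fprintf(stderr, "lo_export failed: %s", PQerrorMessage(conn));
		return -1;
	}

	params[0] = path;
	params[1] = row_id;
	res = PQexecParams(conn,
					   "UPDATE foo SET datafilepath = $1 WHERE id = $2",
					   2, NULL, params, NULL, NULL, 0);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
		fprintf(stderr, "UPDATE failed: %s", PQerrorMessage(conn));
	PQclear(res);
	return 0;
}

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");	/* connection info from environment */

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}
	export_one(conn, "42", 16393, "/tmp/blob_42.dat");	/* illustrative values */
	PQfinish(conn);
	return 0;
}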

While the above was working away, we nearly missed the fact that the 
"public.pg_largeobject" table seemed to be growing commensurately with what we 
were exporting! As we were doing this while the primary disk was nearly out of 
space, it was fortunate that I could pause this work. We were able to move the 
entire system and it's now continuing along, but my question:

Is this expected? I'm a little surprised. My theory is that MVCC seems to be 
including the referenced pg_largeobject data as part of the row, and even though 
we're not updating the BLOB at all, a snapshot is getting created. *Is this 
expected*?

Many thanks - single-link RTFM answers welcome, I have seen the MVCC through 
pictures, and I get it - just not how a BLOB fits into MVCC here.

./scc

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 1 October 2012 18:05, Stephen Frost  wrote:
> * Peter Geoghegan (pe...@2ndquadrant.com) wrote:
>> That won't really help matters. There'd still be duplicate entries,
>> from before and after the change, even if we make it immediately
>> obvious which is which. The only reasonable solution in that scenario
>> is to bump PGSS_FILE_HEADER, which will cause all existing entries to
>> be invalidated.
>
> You're going to have to help me here, 'cause I don't see how there can
> be duplicates if we include the PGSS_FILE_HEADER as part of the hash,
> unless we're planning to keep PGSS_FILE_HEADER constant while we change
> what the hash value is for a given query, yet that goes against the
> assumptions that were laid out, aiui.

Well, they wouldn't be duplicates if you think that the fact that one
query was executed before some point release and another after ought
to differentiate queries. I do not.

> If there's a change that results in a given query X no longer hashing to
> a value A, we need to change PGSS_FILE_HEADER to invalidate statistics
> which were collected for value A (or else we risk an independent query Y
> hashing to value A and ending up with completely invalid stats..).
> Provided we apply that consistently and don't reuse PGSS_FILE_HEADER
> values along the way, a combination of PGSS_FILE_HEADER and the hash
> value for a given query should be unique over time.
>
> We do need to document that the hash value for a given query may
> change..

By invalidate, I mean that when we go to open the saved file, if the
header doesn't match, the file is considered corrupt, and we simply
log that the file could not be read, before unlinking it. This would
be necessary in the unlikely event of there being some substantive
change in the representation of query trees in a point release. I am
not aware of any precedent for this, though Tom said that there was
one.
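
As a concrete model of that invalidation step, here is a small self-contained
sketch -- illustrative file name and magic value, not pg_stat_statements
internals: on startup, a header mismatch means the file is treated as
unreadable, logged, and unlinked, so no stale entries survive.

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative stand-in for the compiled-in file-format constant. */
#define FILE_HEADER_MAGIC 0x20121002

int
main(void)
{
	const char *path = "saved_stats.tmp";	/* illustrative path */
	uint32_t	header = 0;
	FILE	   *f = fopen(path, "rb");

	if (f == NULL)
		return 0;				/* nothing saved: start with empty statistics */

	if (fread(&header, sizeof(header), 1, f) != 1 ||
		header != FILE_HEADER_MAGIC)
	{
		/* considered corrupt: log it, unlink it, start empty */
		fprintf(stderr, "could not read statistics file \"%s\"; discarding it\n",
				path);
		fclose(f);
		unlink(path);
		return 0;
	}
	/* ... otherwise the saved entries would be loaded here ... */
	fclose(f);
	return 0;
}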

Tom: Could you please describe the precedent you had in mind? I would
like to better understand this risk.

I don't want to get too hung up on what we'd do if this problem
actually occurred, because that isn't what this thread is about.
Exposing the hash is a completely unrelated question, except that to
do so might make pg_stat_statements (including its limitations) better
understood. In my opinion, the various log parsing tools have taught
people to think about this in the wrong way - the query string is just
a representation of a query (and an imperfect one at that).

There are other, similar tools that exist in proprietary databases.
They expose a hash value, which is subject to exactly the same caveats
as our own. They explicitly encourage the type of aggregation by
third-party tools that I anticipate will happen with
pg_stat_statements. I want to facilitate this, but I'm confident that
the use of (dbid, userid, querytext) as a "candidate key" by these
tools is going to cause confusion, in subtle ways. By using the hash
value, those tools are exactly as well off as pg_stat_statements is,
which is to say, as well off as possible.

I simply do not understand objections to the proposal. Have I missed something?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]

2012-10-02 Thread Pavel Stehule
2012/10/1 Heikki Linnakangas :
> On 27.09.2012 23:58, Pavel Stehule wrote:
>>
>> Hello
>>
>> I have reduced this patch to just a solution of the plpgsql (SPI) problem.
>>
>> A correct generic solution needs a discussion about enhancing our libpq
>> interface - we can get the number of returned rows, but we should be able
>> to get the number of processed rows too. Users currently have to parse that
>> number out of the completionTag, but this problem is not too pressing and
>> usually not a blocker, because anybody can process the completionTag themselves.
>>
>> So this issue is primarily about PL/pgSQL, where this information is not
>> available. There is a precedent - CREATE TABLE AS SELECT (thanks for the
>> example :)) - and COPY should have the same impact on ROW_COUNT as that
>> statement (the latest patch implements it).
>
>
> The comment where CREATE AS SELECT does this says:
>
>> /*
>>  * CREATE TABLE AS is a messy special case for historical
>>  * reasons.  We must set _SPI_current->processed even though
>>  * the tuples weren't returned to the caller, and we must
>>  * return a special result code if the statement was spelled
>>  * SELECT INTO.
>>  */
>
>
> That doesn't sound like a very good precedent that we should copy to COPY. I
> don't have a problem with this approach in general, though. But if we're
> going to make it normal behavior, rather than an ugly historical kluge, that
> utility statements can return a row count without returning the tuples, we
> should document that. The above comment ought to be changed, and the manual
> section about SPI_execute needs to mention the possibility that
> SPI_processed != 0, but SPI_tuptable == NULL.
>
> Regarding the patch itself:
>
>> +   else if (IsA(stmt, CopyStmt))
>> +   {
>> +       /*
>> +        * usually utility statements doesn't return a number
>> +        * of processed rows, but COPY does it.
>> +        */
>> +       Assert(strncmp(completionTag, "COPY  ", 5) == 0);
>> +       _SPI_current->processed = strtoul(completionTag + 5,
>> +                                         NULL, 10);
>> +   }
>>     else
>>         res = SPI_OK_UTILITY;
>
>
> 'res' is not set for a CopyStmt, and there's an extra space in "COPY  " in
> the Assert.
>

fixed patch
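
For reference, a rough sketch of how the corrected branch might look once both
review comments are addressed (res being set, and the tag comparison tidied) --
the attached copy-processed-rows-simple2.patch is the authoritative version:

	else if (IsA(stmt, CopyStmt))
	{
		/*
		 * Utility statements usually don't return a number of processed
		 * rows, but COPY does, so pick it out of the completion tag.
		 */
		Assert(strncmp(completionTag, "COPY ", 5) == 0);
		_SPI_current->processed = strtoul(completionTag + 5, NULL, 10);
		res = SPI_OK_UTILITY;
	}
	else
		res = SPI_OK_UTILITY;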

Regards

Pavel Stehule


> - Heikki


copy-processed-rows-simple2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small LDAP error message change

2012-10-02 Thread Magnus Hagander
On Tue, Oct 2, 2012 at 2:57 AM, Peter Eisentraut  wrote:
> I'm proposing to make the attached change to some LDAP error messages.
> Aside from fixing a pluralization issue, I want to separate fact (search
> resulted in != 1 result) from interpretation (LDAP user does not exist,
> and that's a problem).

Looks good to me, +1.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-02 Thread Heikki Linnakangas

On 02.10.2012 10:36, Amit kapila wrote:

On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:

So let's think how this should ideally work from a user's point of view.
I think there should be just two settings: walsender_timeout and
walreceiver_timeout. walsender_timeout specifies how long a walsender
will keep a connection open if it doesn't hear from the walreceiver, and
walreceiver_timeout is the same for walreceiver. The system should
figure out itself how often to send keepalive messages so that those
timeouts are not reached.


By this, it implies that we should remove wal_receiver_status_interval. 
Currently it is also used
for the reply message to data sent by the sender, which reports up to what point 
the receiver has flushed. So if we remove this variable,
the receiver might start sending that message sooner than required.
Is that okay behavior?


I guess we should keep that setting, then, so that you can get status 
updates more often than would be required for heartbeat purposes.



In walsender, after half of walsender_timeout has elapsed and we haven't
received anything from the client, the walsender process should send a
"ping" message to the client. Whenever the client receives a Ping, it
replies. The walreceiver does the same; when half of walreceiver_timeout
has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
resets the timer in both ends, regardless of which side initiated it, so
if e.g walsender_timeout<  walreceiver_timeout, the client will never
have to initiate a Ping message, because walsender will always reach the
walsender_timeout/2 point first and initiate the heartbeat message.


Just to clarify, the walsender should reset its timer after it gets a reply from 
the receiver to the message it sent.


Right.


The walreceiver should reset its timer after sending a reply to a heartbeat message.
Similarly, the timers will be reset when the receiver has sent the 
heartbeat message.


walreceiver should reset the timer when it *receives* any message from 
walsender. If it sends the reply right away, I guess that's the same 
thing, but I'd phrase it so that it's the reception of a message from 
the other end that resets the timer.



The Ping/Pong messages don't necessarily need to be new message types,
we can use the message types we currently have, perhaps with an
additional flag attached to them, to request the other side to reply
immediately.


Can't we make the decision to send a reply immediately based on the message type, 
since these message types will be unique?

To clarify my understanding,
1. the heartbeat message from the walsender side will be the keepalive message ('k') 
and from the walreceiver side it will be the Hot Standby feedback message ('h').
2. the reply message from the walreceiver side will be the current reply message ('r').


Yep. I wonder why we need separate message types for Hot Standby Feedback 
'h' and Reply 'r', though. It seems it would be simpler to have just one 
message type that includes all the fields from both messages.



3. currently there is no reply kind of message from the walsender, so do we need to 
introduce a new message for it, or can we use some existing message?
If new, do we need to send any additional information along with it? For 
existing messages, can we use the keepalive message itself as the reply message, but 
with an additional byte to indicate that it is a reply?


Hmm, I think I'd prefer to use the existing Keepalive message 'k', with 
an additional flag.
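
To make the scheme concrete, here is a small self-contained model (plain C
with illustrative names and a simple millisecond clock -- not walsender or
walreceiver code): each side pings after half its timeout of silence, any
message received from the peer resets the timer, and the connection is
dropped after the full timeout.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
	int64_t		last_heard_ms;	/* when we last received anything from the peer */
	int64_t		timeout_ms;		/* walsender_timeout or walreceiver_timeout */
	bool		ping_sent;		/* have we already asked the peer to reply? */
} HeartbeatState;

/* Any message from the peer resets the timer on the receiving side. */
static void
heartbeat_message_received(HeartbeatState *hb, int64_t now_ms)
{
	hb->last_heard_ms = now_ms;
	hb->ping_sent = false;
}

/* Called periodically; returns true when the connection should be dropped. */
static bool
heartbeat_tick(HeartbeatState *hb, int64_t now_ms)
{
	if (now_ms - hb->last_heard_ms >= hb->timeout_ms)
		return true;			/* peer silent for the full timeout: give up */

	if (!hb->ping_sent && now_ms - hb->last_heard_ms >= hb->timeout_ms / 2)
	{
		printf("t=%lld ms: send keepalive, request immediate reply\n",
			   (long long) now_ms);
		hb->ping_sent = true;	/* one ping per quiet period is enough */
	}
	return false;
}

int
main(void)
{
	HeartbeatState hb;
	int64_t		t;

	hb.timeout_ms = 60000;				/* say, a 60 second timeout */
	heartbeat_message_received(&hb, 0);	/* connection established at t = 0 */

	for (t = 0; t <= 70000; t += 10000)
	{
		if (heartbeat_tick(&hb, t))
		{
			printf("t=%lld ms: replication timeout reached, closing connection\n",
				   (long long) t);
			break;
		}
	}
	return 0;
}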


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] date_in and buffer overrun

2012-10-02 Thread Heikki Linnakangas

On 02.10.2012 01:30, Hitoshi Harada wrote:

It seems date_in() has a risk of buffer overrun.  If the input is '.',
it sets field[0] to the beginning of workbuf and goes into
DecodeDate().  This function checks for null termination at the head of
the string, but it can go beyond the end of the string inside the first loop
and overwrite some bytes with zeros.  The worst scenario we've seen is an
overwrite of the stack frame, in which the compiler rearranged the
memory allocation of local variables in date_in() and workbuf is at a
lower address than field.

I tried to attach a patch file but failed somehow, so I paste the fix here.
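
To illustrate the class of bug being described, here is a stripped-down,
self-contained example; it is not the pasted fix and not the actual
date_in()/DecodeDate() code. The end-of-string test in the inner skip loop is
what keeps a lone separator like "." from walking past the terminator.

#include <ctype.h>
#include <stdio.h>

#define MAXFIELDS 8

/*
 * Illustrative tokenizer.  Without the *str != '\0' test in the inner skip
 * loop, the scan would run past the end of the buffer on an input consisting
 * only of separators, and the later *str++ = '\0' could then overwrite bytes
 * beyond it -- the kind of overrun reported above.
 */
static int
split_fields(char *str, char **field)
{
	int			nf = 0;

	while (*str != '\0' && nf < MAXFIELDS)
	{
		/* skip field separators, but never run past the end of the string */
		while (*str != '\0' && !isalnum((unsigned char) *str))
			str++;
		if (*str == '\0')
			break;				/* nothing but separators: stop cleanly */

		field[nf++] = str;
		while (isalnum((unsigned char) *str))
			str++;
		if (*str != '\0')
			*str++ = '\0';
	}
	return nf;
}

int
main(void)
{
	char		buf[] = ".";	/* the input that triggered the report */
	char	   *field[MAXFIELDS];

	printf("%d field(s)\n", split_fields(buf, field));	/* prints "0 field(s)" */
	return 0;
}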


Thanks, applied to master and all supported back-branches.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch for removng unused targets

2012-10-02 Thread Alexander Korotkov
Hi!

The attached patch removes unused targets that are needed only for ORDER BY
when the data already comes in the right order. It introduces a resorderbyonly
flag on TargetEntry which indicates that the entry is used only for the ORDER BY
clause. If the data comes in the right order, such entries are removed in the
grouping_planner function.

This is my first patch touching the planner, so I may well have done it the
wrong way, but I think it is a worthwhile optimization and I would appreciate
directions on how to rework the patch.

We actually ran into the need for this optimization when ranking full-text
search results with a GIN index (that work isn't published yet; a prototype
will follow soon). But here is a synthetic example illustrating the benefit
of the patch.

CREATE OR REPLACE FUNCTION slow_func(x float8, y float8) RETURNS float8 AS
$$
BEGIN
PERFORM pg_sleep(0.01);
RETURN x + y;
END;
$$ IMMUTABLE LANGUAGE plpgsql;

CREATE TABLE test AS (SELECT random() AS x, random() AS y FROM
generate_series(1,1000));
CREATE INDEX test_idx ON test(slow_func(x,y));

Without patch:

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY
slow_func(x,y) LIMIT 10;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.09 rows=10 width=16) (actual time=11.344..103.443 rows=10 loops=1)
   Output: x, y, (slow_func(x, y))
   ->  Index Scan using test_idx on public.test  (cost=0.00..309.25 rows=1000 width=16) (actual time=11.341..103.422 rows=10 loops=1)
         Output: x, y, slow_func(x, y)
 Total runtime: 103.524 ms
(5 rows)

With patch:

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY
slow_func(x,y) LIMIT 10;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.09 rows=10 width=16) (actual time=0.062..0.093 rows=10 loops=1)
   Output: x, y
   ->  Index Scan using test_idx on public.test  (cost=0.00..309.25 rows=1000 width=16) (actual time=0.058..0.085 rows=10 loops=1)
         Output: x, y
 Total runtime: 0.164 ms
(5 rows)

--
With best regards,
Alexander Korotkov.


unused-targets-1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-02 Thread Amit kapila

On Monday, October 01, 2012 8:36 PM Robert Haas wrote:
On Mon, Oct 1, 2012 at 6:38 AM, Heikki Linnakangas
 wrote:
>> Hmm, I think we need to step back a bit. I've never liked the way
>> replication_timeout works, where it's the user's responsibility to set
>> wal_receiver_status_interval < replication_timeout. It's not very
>> user-friendly. I'd rather not copy that same design to this walreceiver
>> timeout. If there's two different timeouts like that, it's even worse,
>> because it's easy to confuse the two.

> I agree, but also note that wal_receiver_status_interval serves
> another user-visible purpose as well.

By the above, do you mean that wal_receiver_status_interval is used for the reply 
to data sent by the server, to indicate up to what point the receiver has flushed 
the data, or something else?

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-02 Thread Amit kapila
On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
On 21.09.2012 14:18, Amit kapila wrote:
> On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote:
> On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila  wrote:
>
 Approach-2 :
 Provide a variable wal_send_status_interval, such that if this is 0, then
 the current behavior would prevail and if it's non-zero then a KeepAlive
 message would be sent at most after that time.
 The modified code of WALSendLoop will be as follows:
>
> 
 Which way do you think is better, or do you have any other idea of how to handle it?
>
>>> I think #2 is better because it's more intuitive to a user.
>
>> Please find a patch attached for implementation of Approach-2.


>So let's think how this should ideally work from a user's point of view.
>I think there should be just two settings: walsender_timeout and
>walreceiver_timeout. walsender_timeout specifies how long a walsender
>will keep a connection open if it doesn't hear from the walreceiver, and
>walreceiver_timeout is the same for walreceiver. The system should
>figure out itself how often to send keepalive messages so that those
>timeouts are not reached.

By this, it implies that we should remove wal_receiver_status_interval. 
Currently it is also used
for the reply message to data sent by the sender, which reports up to what point 
the receiver has flushed. So if we remove this variable,
the receiver might start sending that message sooner than required. 
Is that okay behavior? 

>In walsender, after half of walsender_timeout has elapsed and we haven't
>received anything from the client, the walsender process should send a
>"ping" message to the client. Whenever the client receives a Ping, it
>replies. The walreceiver does the same; when half of walreceiver_timeout
>has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
>resets the timer in both ends, regardless of which side initiated it, so
>if e.g walsender_timeout < walreceiver_timeout, the client will never
>have to initiate a Ping message, because walsender will always reach the
>walsender_timeout/2 point first and initiate the heartbeat message.

Just to clarify, the walsender should reset its timer after it gets a reply from 
the receiver to the message it sent.
The walreceiver should reset its timer after sending a reply to a heartbeat message. 
Similarly, the timers will be reset when the receiver has sent the heartbeat message.

>The Ping/Pong messages don't necessarily need to be new message types,
>we can use the message types we currently have, perhaps with an
>additional flag attached to them, to request the other side to reply
>immediately.

Can't we make the decision to send a reply immediately based on the message type, 
since these message types will be unique?

To clarify my understanding, 
1. the heartbeat message from the walsender side will be the keepalive message ('k') 
and from the walreceiver side it will be the Hot Standby feedback message ('h').
2. the reply message from the walreceiver side will be the current reply message ('r').
3. currently there is no reply kind of message from the walsender, so do we need to 
introduce a new message for it, or can we use some existing message?
If new, do we need to send any additional information along with it? For 
existing messages, can we use the keepalive message itself as the reply message, but 
with an additional byte to indicate that it is a reply?

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visual Studio 2012 RC

2012-10-02 Thread Noah Misch
On Fri, Sep 14, 2012 at 01:26:30AM +0200, Brar Piening wrote:
> Heikki Linnakangas wrote:
>> I don't think we can realistically support VS2012 until Microsoft  
>> releases the gratis Visual Studio Express 2012 for Windows Desktop.
>
> As they've released now I've updated my Patch with docs and ask for review.

I used this patch to build 64-bit PostgreSQL under Windows 7 Professional,
using Visual Studio 2012 Express for Windows Desktop.  I did not provide a
config.pl, so this build used the defaults -- in particular, it did not
exercise linking to optional external projects like perl and libxml2.  A
"vcregress check" passed.  The build emitted no warnings beyond those seen on
buildfarm member "chough" (VS 2008 Express).

My build log filled 8.8 MiB, a large increase from the 432 KiB of the "chough"
build log.  This isn't strictly a problem, but do you happen to have ideas for
curbing the noise?

I find no functional problems with the patch, but some comment updates and
other trivia need attention.  The patch itself was reversed; it applied
cleanly with "patch -R".  I regenerated it in the usual direction for the
portions I quote below.

  src/tools/msvc/MSBuildProject.pm:4:# Package that encapsulates a MSBuild 
(Visual C++ 2010) project file

Say something like "Visual C++ 2010 or greater".

  src/tools/msvc/VSObjectFactory.pm:93:# we use nmake as it has existed for a 
long time and still exists in visual studio 2010

Likewise, modify this comment to be open-ended.  If nmake ever disappears,
we'll be updating this code and can see to change the comment.

> *** a/doc/src/sgml/install-windows.sgml
> --- b/doc/src/sgml/install-windows.sgml
> ***
> *** 22,28 
> Microsoft tools is to install a supported version of the
> Microsoft Windows SDK and use the included
> compiler. It is also possible to build with the full
> !   Microsoft Visual C++ 2005, 2008 or 2010. In 
> some cases
> that requires the installation of the Windows 
> SDK
> in addition to the compiler.
>
> --- 22,28 
> Microsoft tools is to install a supported version of the
> Microsoft Windows SDK and use the included
> compiler. It is also possible to build with the full
> !   Microsoft Visual C++ 2005, 2008, 2010 or 2012. 
> In some cases
> that requires the installation of the Windows 
> SDK
> in addition to the compiler.
>

I think this paragraph should be more like the one in the next patch hunk:
call out Visual Studio 2012 Express for Windows Desktop and Windows SDK 7.1 as
the main recommendations.  Perhaps even demote the SDK entirely and just
recommend VS 2012.  It'd be odd to recommend only a past-version tool if a
current-version tool works just as well.

> ***
> *** 77,90 
> Visual Studio Express or some versions of the
> Microsoft Windows SDK. If you do not already 
> have a
> Visual Studio environment set up, the easiest
> !   way is to use the compilers in the Windows SDK,
> !   which is a free download from Microsoft.
>
>   
>
> PostgreSQL is known to support compilation using the compilers shipped 
> with
> Visual Studio 2005 to
> !   Visual Studio 2010 (including Express 
> editions),
> as well as standalone Windows SDK releases 6.0 to 7.1.
> 64-bit PostgreSQL builds are only supported with
> Microsoft Windows SDK version 6.0a and above or
> --- 77,91 
> Visual Studio Express or some versions of the
> Microsoft Windows SDK. If you do not already 
> have a
> Visual Studio environment set up, the easiest
> !   ways are to use the compilers in the Windows 
> SDK

I would write "Windows SDK 7.1" here and remove the parenthesized bit.
There's a later mention of support for older versions.

> !   (<= 7.1) or those from Visual Studio Express 2012 for Windows
> !   Desktop, which are both free downloads from Microsoft.

>
>   
>
> PostgreSQL is known to support compilation using the compilers shipped 
> with
> Visual Studio 2005 to
> !   Visual Studio 2012 (including Express 
> editions),
> as well as standalone Windows SDK releases 6.0 to 7.1.
> 64-bit PostgreSQL builds are only supported with
> Microsoft Windows SDK version 6.0a and above or
> ***
> *** 157,162  $ENV{PATH}=$ENV{PATH} . ';c:\some\where\bison\bin';
> --- 158,165 
> If you install the Windows SDK
> including the Visual C++ Compilers,
> you don't need Visual Studio to build.
> +   Note that as of Version 8.0a the Windows SDK no longer ships with a
> +   complete command-line build environment.

The part ending here looks like this:


 Microsoft Windows SDK
 
  It is recommended that you upgrade to the latest supported version
  of the Microsoft Windows SDK (currently
  version 7.1), available for download from
  http://www.microsoft.com/downloads/.
 
 
  You must always include the
  Windows Headers and Libraries pa