Re: [HACKERS] Installation of xpath (read xml on postgres)

2012-10-02 Thread Heikki Linnakangas

On 29.09.2012 15:53, Dariel Nicolas De Jesus Medrnao wrote:

I need to install the function in PostgreSQL to read XML.
The function is xpath; I cannot find the installation guide
for the function.

Do I need to get something?

Please explain to me how to do the installation.


It's built-in, no need to install anything. See user manual: 
http://www.postgresql.org/docs/9.2/static/functions-xml.html#FUNCTIONS-XML-PROCESSING
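
For reference, a minimal usage example (assuming the server was built with libxml support; the XML value here is invented for illustration):

```sql
-- xpath(xpath_expression, xml_value) returns an array of matching nodes.
SELECT xpath('/book/title/text()',
             '<book><title>PostgreSQL Manual</title></book>'::xml);
-- returns {"PostgreSQL Manual"}
```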


PS. This mailing list is for discussion of development of PostgreSQL, 
please use pgsql-general or pgsql-novice for this kind of questions in 
the future.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visual Studio 2012 RC

2012-10-02 Thread Noah Misch
On Fri, Sep 14, 2012 at 01:26:30AM +0200, Brar Piening wrote:
 Heikki Linnakangas wrote:
 I don't think we can realistically support VS2012 until Microsoft  
 releases the gratis Visual Studio Express 2012 for Windows Desktop.

 As they've now released it, I've updated my patch with docs and ask for review.

I used this patch to build 64-bit PostgreSQL under Windows 7 Professional,
using Visual Studio 2012 Express for Windows Desktop.  I did not provide a
config.pl, so this build used the defaults -- in particular, it did not
exercise linking to optional external projects like perl and libxml2.  A
vcregress check passed.  The build emitted no warnings beyond those seen on
buildfarm member chough (VS 2008 Express).

My build log filled 8.8 MiB, a large increase from the 432 KiB of the chough
build log.  This isn't strictly a problem, but do you happen to have ideas for
curbing the noise?

I find no functional problems with the patch, but some comment updates and
other trivia need attention.  The patch itself was reversed; it applied
cleanly with patch -R.  I regenerated it in the usual direction for the
portions I quote below.

  src/tools/msvc/MSBuildProject.pm:4:# Package that encapsulates a MSBuild 
(Visual C++ 2010) project file

Say something like "Visual C++ 2010 or greater".

  src/tools/msvc/VSObjectFactory.pm:93:# we use nmake as it has existed for a 
long time and still exists in visual studio 2010

Likewise, modify this comment to be open-ended.  If nmake ever disappears,
we'll be updating this code and can see to change the comment.

 *** a/doc/src/sgml/install-windows.sgml
 --- b/doc/src/sgml/install-windows.sgml
 ***************
 *** 22,28 ****
       Microsoft tools is to install a supported version of the
       <productname>Microsoft Windows SDK</productname> and use the included
       compiler. It is also possible to build with the full
 !     <productname>Microsoft Visual C++ 2005, 2008 or 2010</productname>. In some cases
       that requires the installation of the <productname>Windows SDK</productname>
       in addition to the compiler.
      </para>
 --- 22,28 ----
       Microsoft tools is to install a supported version of the
       <productname>Microsoft Windows SDK</productname> and use the included
       compiler. It is also possible to build with the full
 !     <productname>Microsoft Visual C++ 2005, 2008, 2010 or 2012</productname>. In some cases
       that requires the installation of the <productname>Windows SDK</productname>
       in addition to the compiler.
      </para>

I think this paragraph should be more like the one in the next patch hunk:
call out Visual Studio 2012 Express for Windows Desktop and Windows SDK 7.1 as
the main recommendations.  Perhaps even demote the SDK entirely and just
recommend VS 2012.  It'd be odd to recommend only a past-version tool if a
current-version tool works just as well.

 ***************
 *** 77,90 ****
       <productname>Visual Studio Express</productname> or some versions of the
       <productname>Microsoft Windows SDK</productname>. If you do not already have a
       <productname>Visual Studio</productname> environment set up, the easiest
 !     way is to use the compilers in the <productname>Windows SDK</productname>,
 !     which is a free download from Microsoft.
      </para>
 
      <para>
       PostgreSQL is known to support compilation using the compilers shipped with
       <productname>Visual Studio 2005</productname> to
 !     <productname>Visual Studio 2010</productname> (including Express editions),
       as well as standalone Windows SDK releases 6.0 to 7.1.
       64-bit PostgreSQL builds are only supported with
       <productname>Microsoft Windows SDK</productname> version 6.0a and above or
 --- 77,91 ----
       <productname>Visual Studio Express</productname> or some versions of the
       <productname>Microsoft Windows SDK</productname>. If you do not already have a
       <productname>Visual Studio</productname> environment set up, the easiest
 !     ways are to use the compilers in the <productname>Windows SDK</productname>

I would write "Windows SDK 7.1" here and remove the parenthesized bit.
There's a later mention of support for older versions.

 !     (>= 7.1) or those from <productname>Visual Studio Express 2012 for Windows
 !     Desktop</productname>, which are both free downloads from Microsoft.
      </para>
 
      <para>
       PostgreSQL is known to support compilation using the compilers shipped with
       <productname>Visual Studio 2005</productname> to
 !     <productname>Visual Studio 2012</productname> (including Express editions),
       as well as standalone Windows SDK releases 6.0 to 7.1.
       64-bit PostgreSQL builds are only supported with
       <productname>Microsoft Windows SDK</productname> version 6.0a and above or
 ***************
 *** 157,162 **** $ENV{PATH}=$ENV{PATH} . ';c:\some\where\bison\bin';
 --- 158,165 ----
       If you install the <productname>Windows SDK</productname>
       including the <application>Visual C++ Compilers</application>,
       you don't need <productname>Visual Studio</productname> to build.
 +

[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-02 Thread Amit kapila
On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
On 21.09.2012 14:18, Amit kapila wrote:
 On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote:
On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila amit.kap...@huawei.com wrote:

 Approach-2 :
 Provide a variable wal_send_status_interval, such that if this is 0, then
 the current behavior would prevail, and if it's non-zero then a KeepAlive
 message would be sent at most after that time.
 The modified code of WALSendLoop will be as follows:

 snip
 Which way do you think is better, or do you have any other idea to handle this?

 I think #2 is better because it's more intuitive to a user.

 Please find a patch attached for implementation of Approach-2.


So let's think how this should ideally work from a user's point of view.
I think there should be just two settings: walsender_timeout and
walreceiver_timeout. walsender_timeout specifies how long a walsender
will keep a connection open if it doesn't hear from the walreceiver, and
walreceiver_timeout is the same for walreceiver. The system should
figure out itself how often to send keepalive messages so that those
timeouts are not reached.

By this it implies that we should remove wal_receiver_status_interval.
Currently it is also used
in case of the reply message for data sent by the sender, which contains up to
what point the receiver has flushed. So if we remove this variable,
the receiver might start sending that message sooner than required.
Is that okay behavior?

In walsender, after half of walsender_timeout has elapsed and we haven't
received anything from the client, the walsender process should send a
ping message to the client. Whenever the client receives a Ping, it
replies. The walreceiver does the same; when half of walreceiver_timeout
has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
resets the timer in both ends, regardless of which side initiated it, so
if e.g. walsender_timeout < walreceiver_timeout, the client will never
have to initiate a Ping message, because walsender will always reach the
walsender_timeout/2 point first and initiate the heartbeat message.
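The scheme described above can be sketched as a toy simulation (plain Python, not PostgreSQL code; all names and numbers here are invented for illustration):

```python
# Toy model of the proposed keepalive scheme: each peer pings when half of
# its own timeout has elapsed without traffic, and a completed ping/reply
# round trip resets the idle clocks on both ends.

def simulate(walsender_timeout, walreceiver_timeout, duration, step=1):
    """Return the list of ping initiators over `duration` time units."""
    sender_idle = receiver_idle = 0
    initiators = []
    t = 0
    while t < duration:
        t += step
        sender_idle += step
        receiver_idle += step
        if sender_idle >= walsender_timeout / 2:
            initiators.append("walsender")     # sender pings, receiver replies
            sender_idle = receiver_idle = 0    # round trip resets both ends
        elif receiver_idle >= walreceiver_timeout / 2:
            initiators.append("walreceiver")
            sender_idle = receiver_idle = 0
    return initiators

# With walsender_timeout < walreceiver_timeout, the sender always reaches
# its half-timeout point first, so the receiver never initiates a ping.
pings = simulate(walsender_timeout=30, walreceiver_timeout=60, duration=120)
print(set(pings))  # {'walsender'}
```

This matches the quoted claim: whichever side has the smaller timeout ends up initiating every heartbeat.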

Just to clarify, walsender should reset its timer after it gets a reply from the receiver
to the message it sent.
walreceiver should reset its timer after sending a reply to the heartbeat message.
Similarly, the timers will be reset when the receiver sends the heartbeat message.

The Ping/Pong messages don't necessarily need to be new message types,
we can use the message types we currently have, perhaps with an
additional flag attached to them, to request the other side to reply
immediately.

Can't we make the decision to send a reply immediately based on the message type,
because these message types will be unique?

To clarify my understanding, 
1. the heartbeat message from the walsender side will be the keepalive message ('k') 
and from the walreceiver side it will be the Hot Standby feedback message ('h').
2. the reply message from the walreceiver side will be the current reply message ('r').
3. currently there is no reply kind of message from walsender, so do we need to 
introduce one new message for it, or can we use some existing message?
If new, do we need to send any additional information along with it? For 
existing messages, can we use the keepalive message itself as the reply message, but 
with an additional byte
to indicate it is a reply?

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-02 Thread Amit kapila

On Monday, October 01, 2012 8:36 PM Robert Haas wrote:
On Mon, Oct 1, 2012 at 6:38 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm, I think we need to step back a bit. I've never liked the way
 replication_timeout works, where it's the user's responsibility to set
 wal_receiver_status_interval < replication_timeout. It's not very
 user-friendly. I'd rather not copy that same design to this walreceiver
 timeout. If there's two different timeouts like that, it's even worse,
 because it's easy to confuse the two.

 I agree, but also note that wal_receiver_status_interval serves
 another user-visible purpose as well.

By the above, do you mean that wal_receiver_status_interval is used for the reply 
to data sent by the server, to indicate up to what point the receiver has flushed data, or 
something else?

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch for removng unused targets

2012-10-02 Thread Alexander Korotkov
Hi!

Attached patch removes unused targets which are used only for ORDER BY when
data already comes in the right order. It introduces a resorderbyonly flag of
TargetEntry which indicates that the entry is used only for the ORDER BY clause. If
data comes in the right order then such entries are removed in the grouping_planner
function.

This is my first patch on the planner. Probably I did it in the wrong way, but I
think it is a worthwhile optimization, and you could give me direction to
rework the patch.

Actually we met the need for this optimization when ranking full-text search in
a GIN index (it isn't published yet; will post a prototype soon). But here is
a synthetic example illustrating the benefit from the patch.

CREATE OR REPLACE FUNCTION slow_func(x float8, y float8) RETURNS float8 AS
$$
BEGIN
PERFORM pg_sleep(0.01);
RETURN x + y;
END;
$$ IMMUTABLE LANGUAGE plpgsql;

CREATE TABLE test AS (SELECT random() AS x, random() AS y FROM
generate_series(1,1000));
CREATE INDEX test_idx ON test(slow_func(x,y));

Without patch:

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY
slow_func(x,y) LIMIT 10;
  QUERY PLAN


--
 Limit  (cost=0.00..3.09 rows=10 width=16) (actual time=11.344..103.443
rows=10 loops=1)
   Output: x, y, (slow_func(x, y))
   ->  Index Scan using test_idx on public.test  (cost=0.00..309.25
rows=1000 width=16) (actual time=11.341..103.422 rows=10 loops=1)
 Output: x, y, slow_func(x, y)
 Total runtime: 103.524 ms
(5 rows)

With patch:

test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY
slow_func(x,y) LIMIT 10;
QUERY PLAN


---
 Limit  (cost=0.00..3.09 rows=10 width=16) (actual time=0.062..0.093
rows=10 loops=1)
   Output: x, y
   ->  Index Scan using test_idx on public.test  (cost=0.00..309.25
rows=1000 width=16) (actual time=0.058..0.085 rows=10 loops=1)
 Output: x, y
 Total runtime: 0.164 ms
(5 rows)

--
With best regards,
Alexander Korotkov.


unused-targets-1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] date_in and buffer overrun

2012-10-02 Thread Heikki Linnakangas

On 02.10.2012 01:30, Hitoshi Harada wrote:

It seems date_in() has a risk of buffer overrun.  If the input is '.',
it sets field[0] to the beginning of workbuf and goes into
DecodeDate().  This function checks null-termination of the head of
string, but it can go beyond the end of string inside the first loop
and replace some bytes with zero.  The worst scenario we've seen is
overwrite of the stack frame, in which the compiler rearranged the
memory allocation of local variables in date_in() and work_buf is at
lower address than field.

I tried to attach a patch file but failed somehow, so I paste the fix here.


Thanks, applied to master and all supported back-branches.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-02 Thread Heikki Linnakangas

On 02.10.2012 10:36, Amit kapila wrote:

On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:

So let's think how this should ideally work from a user's point of view.
I think there should be just two settings: walsender_timeout and
walreceiver_timeout. walsender_timeout specifies how long a walsender
will keep a connection open if it doesn't hear from the walreceiver, and
walreceiver_timeout is the same for walreceiver. The system should
figure out itself how often to send keepalive messages so that those
timeouts are not reached.


By this it implies that we should remove wal_receiver_status_interval. 
Currently it is also used
in case of the reply message for data sent by the sender, which contains up to
what point the receiver has flushed. So if we remove this variable,
the receiver might start sending that message sooner than required.
Is that okay behavior?


I guess we should keep that setting, then, so that you can get status 
updates more often than would be required for heartbeat purposes.



In walsender, after half of walsender_timeout has elapsed and we haven't
received anything from the client, the walsender process should send a
ping message to the client. Whenever the client receives a Ping, it
replies. The walreceiver does the same; when half of walreceiver_timeout
has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
resets the timer in both ends, regardless of which side initiated it, so
if e.g. walsender_timeout < walreceiver_timeout, the client will never
have to initiate a Ping message, because walsender will always reach the
walsender_timeout/2 point first and initiate the heartbeat message.


Just to clarify, walsender should reset timer after it gets reply from receiver 
of the message it sent.


Right.


walreceiver should reset its timer after sending a reply to the heartbeat message.
 Similarly, the timers will be reset when the receiver sends the 
heartbeat message.


walreceiver should reset the timer when it *receives* any message from 
walsender. If it sends the reply right away, I guess that's the same 
thing, but I'd phrase it so that it's the reception of a message from 
the other end that resets the timer.



The Ping/Pong messages don't necessarily need to be new message types,
we can use the message types we currently have, perhaps with an
additional flag attached to them, to request the other side to reply
immediately.


Can't we make the decision to send a reply immediately based on the message type, 
because these message types will be unique?

To clarify my understanding,
1. the heartbeat message from the walsender side will be the keepalive message ('k') 
and from the walreceiver side it will be the Hot Standby feedback message ('h').
2. the reply message from the walreceiver side will be the current reply message ('r').


Yep. I wonder why we need separate message types for Hot Standby Feedback 
'h' and Reply 'r', though. Seems it would be simpler to have just one 
message type that includes all the fields from both messages.



3. currently there is no reply kind of message from walsender, so do we need to 
introduce one new message for it, or can we use some existing message?
 If new, do we need to send any additional information along with it? For 
existing messages, can we use the keepalive message itself as the reply message, but 
with an additional byte
 to indicate it is a reply?


Hmm, I think I'd prefer to use the existing Keepalive message 'k', with 
an additional flag.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small LDAP error message change

2012-10-02 Thread Magnus Hagander
On Tue, Oct 2, 2012 at 2:57 AM, Peter Eisentraut pete...@gmx.net wrote:
 I'm proposing to make the attached change to some LDAP error messages.
 Aside from fixing a pluralization issue, I want to separate fact (search
 resulted in != 1 result) from interpretation (LDAP user does not exist,
 and that's a problem).

Looks good to me, +1.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]

2012-10-02 Thread Pavel Stehule
2012/10/1 Heikki Linnakangas hlinnakan...@vmware.com:
 On 27.09.2012 23:58, Pavel Stehule wrote:

 Hello

 I reduced this patch to just the plpgsql (SPI) problem solution.

 A correct generic solution needs a discussion about enhancing our libpq
 interface - we can get the number of returned rows, but we should be able to
 get the number of processed rows too. Users have to parse this number
 from the completionTag, but this problem is not too hot and usually is
 not a blocker, because anybody can usually process the completionTag.

 So this issue is primarily for PL/pgSQL, where this information is not
 available. There is a precedent - CREATE TABLE AS SELECT (thanks for the sample
 :)), and COPY should have the same impact on ROW_COUNT as that
 statement (the last patch implements it).
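
If the patch were applied, the PL/pgSQL side might look like this sketch (the table name, file path handling, and function name are invented for illustration; a non-zero ROW_COUNT after COPY is exactly what the patch is meant to enable):

```sql
-- Sketch: read COPY's processed-row count from PL/pgSQL,
-- the same way it already works for CREATE TABLE AS.
CREATE OR REPLACE FUNCTION load_data(path text) RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    EXECUTE format('COPY mytable FROM %L', path);
    GET DIAGNOSTICS n = ROW_COUNT;   -- non-zero only with the patch
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```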


 The comment where CREATE AS SELECT does this says:

 /*
  * CREATE TABLE AS is a messy special case for historical
  * reasons.  We must set _SPI_current-processed even though
  * the tuples weren't returned to the caller, and we must
  * return a special result code if the statement was spelled
  * SELECT INTO.
  */


 That doesn't sound like a very good precedent that we should copy to COPY. I
 don't have a problem with this approach in general, though. But if we're
 going to make it normal behavior, rather than an ugly historical kluge, that
 utility statements can return a row count without returning the tuples, we
 should document that. The above comment ought to be changed, and the manual
 section about SPI_execute needs to mention the possibility that
 SPI_processed != 0, but SPI_tuptable == NULL.

 Regarding the patch itself:

 +   else if (IsA(stmt, CopyStmt))
 +   {
 +       /*
 +        * usually utility statements doesn't return a number
 +        * of processed rows, but COPY does it.
 +        */
 +       Assert(strncmp(completionTag, "COPY  ", 5) == 0);
 +       _SPI_current->processed = strtoul(completionTag + 5,
 +                                         NULL, 10);
 +   }
     else
         res = SPI_OK_UTILITY;


 'res' is not set for a CopyStmt, and there's an extra space in "COPY  " in
 the Assert.


fixed patch

Regards

Pavel Stehule


 - Heikki


copy-processed-rows-simple2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 1 October 2012 18:05, Stephen Frost sfr...@snowman.net wrote:
 * Peter Geoghegan (pe...@2ndquadrant.com) wrote:
 That won't really help matters. There'd still be duplicate entries,
 from before and after the change, even if we make it immediately
 obvious which is which. The only reasonable solution in that scenario
 is to bump PGSS_FILE_HEADER, which will cause all existing entries to
 be invalidated.

 You're going to have to help me here, 'cause I don't see how there can
 be duplicates if we include the PGSS_FILE_HEADER as part of the hash,
 unless we're planning to keep PGSS_FILE_HEADER constant while we change
 what the hash value is for a given query, yet that goes against the
 assumptions that were laid out, aiui.

Well, they wouldn't be duplicates if you think that the fact that one
query was executed before some point release and another after ought
to differentiate queries. I do not.

 If there's a change that results in a given query X no longer hashing to
 a value A, we need to change PGSS_FILE_HEADER to invalidate statistics
 which were collected for value A (or else we risk an independent query Y
 hashing to value A and ending up with completely invalid stats..).
 Provided we apply that consistently and don't reuse PGSS_FILE_HEADER
 values along the way, a combination of PGSS_FILE_HEADER and the hash
 value for a given query should be unique over time.

 We do need to document that the hash value for a given query may
 change..

By invalidate, I mean that when we go to open the saved file, if the
header doesn't match, the file is considered corrupt, and we simply
log that the file could not be read, before unlinking it. This would
be necessary in the unlikely event of there being some substantive
change in the representation of query trees in a point release. I am
not aware of any precedent for this, though Tom said that there was
one.

Tom: Could you please describe the precedent you had in mind? I would
like to better understand this risk.

I don't want to get too hung up on what we'd do if this problem
actually occurred, because that isn't what this thread is about.
Exposing the hash is a completely unrelated question, except that to
do so might make pg_stat_statements (including its limitations) better
understood. In my opinion, the various log parsing tools have taught
people to think about this in the wrong way - the query string is just
a representation of a query (and an imperfect one at that).

There are other, similar tools that exist in proprietary databases.
They expose a hash value, which is subject to exactly the same caveats
as our own. They explicitly encourage the type of aggregation by
third-party tools that I anticipate will happen with
pg_stat_statements. I want to facilitate this, but I'm confident that
the use of (dbid, userid, querytext) as a candidate key by these
tools is going to cause confusion, in subtle ways. By using the hash
value, those tools are exactly as well off as pg_stat_statements is,
which is to say, as well off as possible.
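
As a toy sketch of that aggregation argument (invented data and hash values; this is not the pg_stat_statements API): the same query text can denote different queries, e.g. an unqualified table name resolving to different tables under different search_path settings, which the internal hash distinguishes but a text key conflates.

```python
# Two stats entries with identical query text but different normalized
# hashes ("t" resolving to schema_a.t vs schema_b.t). Keying on the text
# wrongly merges them; keying on the hash keeps them distinct.
from collections import defaultdict

snapshots = [
    {"dbid": 1, "userid": 10, "query": "SELECT * FROM t",
     "queryid": 0x1111, "calls": 3},   # "t" = schema_a.t
    {"dbid": 1, "userid": 10, "query": "SELECT * FROM t",
     "queryid": 0x2222, "calls": 5},   # "t" = schema_b.t
]

by_text = defaultdict(int)
by_hash = defaultdict(int)
for s in snapshots:
    by_text[(s["dbid"], s["userid"], s["query"])] += s["calls"]
    by_hash[(s["dbid"], s["userid"], s["queryid"])] += s["calls"]

print(len(by_text), len(by_hash))  # 1 2 -> the text key conflates two queries
```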

I simply do not understand objections to the proposal. Have I missed something?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] MVCC and large objects

2012-10-02 Thread Scott Corscadden
While I'm referring to the 8.4 release, it's a slightly more general question. 
We're moving away from using pg_largeobject as fast as we can, and while doing 
so I'm seeing interesting behaviour. I'd like to confirm that this is 
expected, or perhaps I don't have tuning parameters set quite correctly. I 
believe we have standard autovacuum running.

* Table Foo has an oid data column, points to a valid BLOB. We modified this 
table to also have a datafilepath column.
* We wrote a throttleable copier process which walks rows, reads the BLOB 
data out to a file and then UPDATEs the datafilepath column with where it 
wrote it. We did not alter the BLOB data in any way. When asked for byte data, 
the higher level code will first return it from the datafilepath if it's there, 
and fall back on the lo otherwise. 

While the above was working away, we nearly missed the fact that the 
public.pg_largeobject table seemed to be growing commensurate with what we 
were exporting! As we were doing this as the primary disk was nearly out of 
space, it was fortunate I could pause this work. We were able to move the 
entire system and it's now continuing along, but my question:

Is this expected? I'm a little surprised. My theory is that MVCC seems to be 
including the pg_largeobject referenced as a part of the row, and even though 
we're not updating the BLOB at all, a snapshot is getting created. *Is this 
expected*?

Many thanks - single-link RTFM answers welcome, I have seen the MVCC through 
pictures, and I get it - just not how a BLOB fits into MVCC here.

./scc

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Missing OID define

2012-10-02 Thread Phil Sorber
Thom Brown and I were doing some hacking the other day and came across
this missing define. We argued over who was going to send the patch in
and I lost. So here it is.


pg_type_uuid_oid.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Euler Taveira
On 02-10-2012 10:15, Peter Geoghegan wrote:
 There are other, similar tools that exist in proprietary databases.
 They expose a hash value, which is subject to exactly the same caveats
 as our own. They explicitly encourage the type of aggregation by
 third-party tools that I anticipate will happen with
 pg_stat_statements. I want to facilitate this, but I'm confident that
 the use of (dbid, userid, querytext) as a candidate key by these
 tools is going to cause confusion, in subtle ways. By using the hash
 value, those tools are exactly as well off as pg_stat_statements is,
 which is to say, as well off as possible.
 
It depends on how you will use this information. If it is for a rapid
analysis, it is fine to use a hash because the object internals won't change
(I hope not). However, if you want to analyze query statistics for a long
period of time (say a few months) or your environment is distributed, you
can't use the hash. There isn't a perfect solution but I see both cases being
useful for analysis tools.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Magnus Hagander
On Oct 2, 2012 5:04 PM, Euler Taveira eu...@timbira.com wrote:

 On 02-10-2012 10:15, Peter Geoghegan wrote:
  There are other, similar tools that exist in proprietary databases.
  They expose a hash value, which is subject to exactly the same caveats
  as our own. They explicitly encourage the type of aggregation by
  third-party tools that I anticipate will happen with
  pg_stat_statements. I want to facilitate this, but I'm confident that
  the use of (dbid, userid, querytext) as a candidate key by these
  tools is going to cause confusion, in subtle ways. By using the hash
  value, those tools are exactly as well off as pg_stat_statements is,
  which is to say, as well off as possible.
 
 It depends on how you will use this information. If it is for a rapid
 analysis, it is fine to use a hash because the object internals won't
change
 (I hope not). However, if you want to analyze query statistics for a long
 period of time (say a few months) or your environment is distributed, you
 can't use the hash. There isn't a perfect solution but I see both cases
being
 useful for analysis tools.

Having the hash available in no way prevents you from using the query
string as your key. Not having the hash certainly prevents you from using
it.

/Magnus


Re: [HACKERS] Doc patch to note which system catalogs have oids

2012-10-02 Thread Karl O. Pinc
On 09/25/2012 12:28:13 AM, Karl O. Pinc wrote:
 On 09/23/2012 08:57:45 PM, Karl O. Pinc wrote:
 
  The attached patch documents the oid column of those
  system catalogs having an oid.
 
 Don't use the first version of this patch (oid_doc.patch)
 without discarding the last hunk.  The last hunk
 introduces an error by duplicating the
 documentation of the pg_roles.oid column.
 
 (I am reluctant to post a revised version
 since there's already 3 versions floating
 around representing 2 different approaches
 and no consensus as to which approach, if any, should
 be taken.)

I have figured out how to use the commitfest
pages to track what should be reviewed.
Submitting a corrected version of the very
first patch, which treats the oids as
regular columns.

I am now submitting patches to the commitfest
for review.  (I'm not sure how I missed this.)

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f999190..babb11c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -427,6 +427,13 @@
 <tbody>
 
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>amname</structfield></entry>
   <entry><type>name</type></entry>
   <entry></entry>
@@ -683,6 +690,13 @@
 <tbody>
 
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>amopfamily</structfield></entry>
   <entry><type>oid</type></entry>
   <entry><literal><link linkend="catalog-pg-opfamily"><structname>pg_opfamily</structname></link>.oid</literal></entry>
@@ -819,6 +833,13 @@
 <tbody>
 
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>amprocfamily</structfield></entry>
   <entry><type>oid</type></entry>
   <entry><literal><link linkend="catalog-pg-opfamily"><structname>pg_opfamily</structname></link>.oid</literal></entry>
@@ -902,6 +923,13 @@
 
 <tbody>
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>adrelid</structfield></entry>
   <entry><type>oid</type></entry>
   <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
@@ -1257,6 +1285,14 @@
 </thead>
 
 <tbody>
+
+ <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
  <row>
   <entry><structfield>rolname</structfield></entry>
   <entry><type>name</type></entry>
@@ -1462,6 +1498,13 @@
 
 <tbody>
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>castsource</structfield></entry>
   <entry><type>oid</type></entry>
   <entry><literal><link linkend="catalog-pg-type"><structname>pg_type</structname></link>.oid</literal></entry>
@@ -1577,6 +1620,13 @@
 
 <tbody>
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>relname</structfield></entry>
   <entry><type>name</type></entry>
   <entry></entry>
@@ -1984,6 +2034,13 @@
 
 <tbody>
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>conname</structfield></entry>
   <entry><type>name</type></entry>
   <entry></entry>
@@ -2250,6 +2307,13 @@
 
 <tbody>
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>collname</structfield></entry>
   <entry><type>name</type></entry>
   <entry></entry>
@@ -2350,6 +2414,13 @@
 
 <tbody>
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>conname</structfield></entry>
   <entry><type>name</type></entry>
   <entry></entry>
@@ -2443,6 +2514,13 @@
 
 <tbody>
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>datname</structfield></entry>
   <entry><type>name</type></entry>
   <entry></entry>
@@ -2652,6 +2730,13 @@
 
 <tbody>
  <row>
+  <entry><structfield>oid</structfield></entry>
+  <entry><type>oid</type></entry>
+  <entry></entry>
+  <entry>Row identifier</entry>
+ </row>
+
+ <row>
   <entry><structfield>defaclrole</structfield></entry>
   

Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-10-02 Thread Bruce Momjian
On Tue, Sep 25, 2012 at 09:10:33AM -0400, Bruce Momjian wrote:
  lc_collate cluster values do not match: old zh_CN.utf8, new zh_CN.UTF-8
  Failure, exiting
  
  zh_CN.utf8 is provided by the installer and zh_CN.UTF-8 is my system
  default.
 
 OK, this tells us that the canonicalization code used in initdb is not
 going to help us in pg_upgrade, at least not on your system, and not on
 mine.
 
 I think we should apply the patch that fixes the TOAST problem with
 information_schema, and the patch that outputs the old/new values for
 easier debugging.  Other than that, I don't know what else we can do
 except to ignore dashes when comparing locale names, which I am told is
 unacceptable.

Based on this great bug report and submitter leg-work, I have applied
three patches to pg_upgrade in head and 9.2, all attached:

*  try to get the canonical locale names, and report old/new values on mismatch
*  update query to skip toast tables for system objects
*  improve error reporting when the object counts don't match

None of these bugs caused pg_upgrade to produce an incorrect upgraded
cluster, so I am not going to panic and try to force them into 9.1,
which probably isn't being used by many people anymore anyway.

I think this closes this report.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
new file mode 100644
index 74b13e7..9d08f41
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
*** get_rel_infos(ClusterInfo *cluster, DbIn
*** 269,302 ****
  	 */
  
  	snprintf(query, sizeof(query),
! 			 "SELECT c.oid, n.nspname, c.relname, "
! 			 "	c.relfilenode, c.reltablespace, %s "
  			 "FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n "
  			 "	   ON c.relnamespace = n.oid "
! 			 "  LEFT OUTER JOIN pg_catalog.pg_tablespace t "
! 			 "	   ON c.reltablespace = t.oid "
! 			 "WHERE relkind IN ('r','t', 'i'%s) AND "
  	/* exclude possible orphaned temp tables */
  			 "  ((n.nspname !~ '^pg_temp_' AND "
  			 "    n.nspname !~ '^pg_toast_temp_' AND "
! 			 "    n.nspname NOT IN ('pg_catalog', 'information_schema', 'binary_upgrade') AND "
  			 "	  c.oid >= %u) "
  			 "  OR (n.nspname = 'pg_catalog' AND "
! 			 "	relname IN ('pg_largeobject', 'pg_largeobject_loid_pn_index'%s) )) "
! 	/* we preserve pg_class.oid so we sort by it to match old/new */
! 			 "ORDER BY 1;",
! 	/* 9.2 removed the spclocation column */
! 			 (GET_MAJOR_VERSION(cluster->major_version) <= 901) ?
! 			 "t.spclocation" : "pg_catalog.pg_tablespace_location(t.oid) AS spclocation",
  	/* see the comment at the top of old_8_3_create_sequence_script() */
  			 (GET_MAJOR_VERSION(old_cluster.major_version) <= 803) ?
  			 "" : ", 'S'",
- 	/* this oid allows us to skip system toast tables */
  			 FirstNormalObjectId,
  	/* does pg_largeobject_metadata need to be migrated? */
  			 (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) ?
  			 "" : ", 'pg_largeobject_metadata', 'pg_largeobject_metadata_oid_index'");
  
  	res = executeQueryOrDie(conn, "%s", query);
  
  	ntups = PQntuples(res);
--- 269,327 ----
  	 */
  
  	snprintf(query, sizeof(query),
! 			 "CREATE TEMPORARY TABLE info_rels (reloid) AS SELECT c.oid "
  			 "FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n "
  			 "	   ON c.relnamespace = n.oid "
! 			 "WHERE relkind IN ('r', 'i'%s) AND "
  	/* exclude possible orphaned temp tables */
  			 "  ((n.nspname !~ '^pg_temp_' AND "
  			 "    n.nspname !~ '^pg_toast_temp_' AND "
! 	/* skip pg_toast because toast index have relkind == 'i', not 't' */
! 			 "    n.nspname NOT IN ('pg_catalog', 'information_schema', "
! 			 "			'binary_upgrade', 'pg_toast') AND "
  			 "	  c.oid >= %u) "
  			 "  OR (n.nspname = 'pg_catalog' AND "
! 			 "	relname IN ('pg_largeobject', 'pg_largeobject_loid_pn_index'%s) ));",
  	/* see the comment at the top of old_8_3_create_sequence_script() */
  			 (GET_MAJOR_VERSION(old_cluster.major_version) <= 803) ?
  			 "" : ", 'S'",
  			 FirstNormalObjectId,
  	/* does pg_largeobject_metadata need to be migrated? */
  			 (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) ?
  			 "" : ", 'pg_largeobject_metadata', 'pg_largeobject_metadata_oid_index'");
  
+ 	PQclear(executeQueryOrDie(conn, "%s", query));
+ 
+ 	/*
+ 	 *	Get TOAST tables and indexes;  we have to gather the TOAST tables in
+ 	 *	later steps because we can't schema-qualify TOAST tables.
+ 	 */
+ 	PQclear(executeQueryOrDie(conn,
+ 			  "INSERT INTO info_rels "
+ 			  "SELECT reltoastrelid "
+ 			  "FROM info_rels i JOIN pg_catalog.pg_class c "
+ 			  "		ON i.reloid = c.oid"));
+ 	PQclear(executeQueryOrDie(conn,
+ 			  "INSERT INTO info_rels "
+ 			  "SELECT reltoastidxid "
+ 			  "FROM info_rels i JOIN pg_catalog.pg_class c "
+ 			  "		ON i.reloid = c.oid"));
+ 
+ 	snprintf(query, sizeof(query),
+ 			 "SELECT c.oid, n.nspname, c.relname, "
+ 			 "	c.relfilenode, c.reltablespace, %s "

[HACKERS] xmalloc = pg_malloc

2012-10-02 Thread Tom Lane
While looking around to fix the pg_malloc(0) issue, I noticed that
various other pieces of code such as pg_basebackup have essentially
identical functions, except they're called xmalloc().  I propose to
standardize all these things on this set of names:

pg_malloc
pg_malloc0  (for malloc-and-zero behavior)
pg_calloc   (randomly different API for pg_malloc0)
pg_realloc
pg_free
pg_strdup

Any objections?

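For readers following along, the wrapper pattern being consolidated looks roughly like the following. This is an illustrative sketch with assumed bodies, not the actual sources of any of the frontend programs:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Sketch of the exit-on-failure allocation wrappers used by frontend
 * programs; names follow the proposal above, bodies are illustrative. */
void *
pg_malloc(size_t size)
{
	void	   *p;

	/* avoid unportable behavior of malloc(0) */
	if (size == 0)
		size = 1;
	p = malloc(size);
	if (p == NULL)
	{
		fprintf(stderr, "out of memory\n");
		exit(1);
	}
	return p;
}

/* malloc-and-zero variant */
void *
pg_malloc0(size_t size)
{
	void	   *p = pg_malloc(size);

	memset(p, 0, size);
	return p;
}

char *
pg_strdup(const char *s)
{
	char	   *result = pg_malloc(strlen(s) + 1);

	strcpy(result, s);
	return result;
}
```

In this scheme a pg_calloc(n, size) would be equivalent to pg_malloc0(n * size) with the argument split differently, which is the "randomly different API" noted above.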
regards, tom lane




Re: [HACKERS] xmalloc = pg_malloc

2012-10-02 Thread Andres Freund
On Tuesday, October 02, 2012 06:02:13 PM Tom Lane wrote:
 While looking around to fix the pg_malloc(0) issue, I noticed that
 various other pieces of code such as pg_basebackup have essentially
 identical functions, except they're called xmalloc().  I propose to
 standardize all these things on this set of names:
 
   pg_malloc
   pg_malloc0  (for malloc-and-zero behavior)
   pg_calloc   (randomly different API for pg_malloc0)
Do we need this?

   pg_realloc
   pg_free
   pg_strdup
I wonder whether the same set of functions should also be available in the 
backend with ereport(EC_OUT_OF_MEMORY, ...) behaviour as well. As noted before,
there are quite a few malloc() calls around. Not all of them should be replaced,
but I think quite a few could be.

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] xmalloc = pg_malloc

2012-10-02 Thread Bruce Momjian
On Tue, Oct  2, 2012 at 12:02:13PM -0400, Tom Lane wrote:
 While looking around to fix the pg_malloc(0) issue, I noticed that
 various other pieces of code such as pg_basebackup have essentially
 identical functions, except they're called xmalloc().  I propose to
 standardize all these things on this set of names:
 
   pg_malloc
   pg_malloc0  (for malloc-and-zero behavior)
   pg_calloc   (randomly different API for pg_malloc0)
   pg_realloc
   pg_free
   pg_strdup
 
 Any objections?

Please standardize.  I am totally confused by the many memory
implementations we have.  (Pg_upgrade uses pg_malloc().)

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] xmalloc = pg_malloc

2012-10-02 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 pg_calloc(randomly different API for pg_malloc0)

 Do we need this?

I thought about getting rid of it, but there are some dozens of calls
scattered across several files, so I wasn't sure it was worth it.
Anybody else have an opinion?

 I wonder whether the same set of functions should also be available in the 
 backend with ereport(EC_OUT_OF_MEMORY, ...) behaviour as well.

In the backend, you almost always ought to be using palloc instead.
The only places where it's really appropriate to be using malloc
directly are where you don't want an error thrown for out-of-memory.
So I think providing these in the backend would do little except to
encourage bad programming.

regards, tom lane




Re: [HACKERS] xmalloc = pg_malloc

2012-10-02 Thread Phil Sorber
On Tue, Oct 2, 2012 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 pg_calloc(randomly different API for pg_malloc0)

 Do we need this?

 I thought about getting rid of it, but there are some dozens of calls
 scattered across several files, so I wasn't sure it was worth it.
 Anybody else have an opinion?

I think having more than 1 function that does the same thing is
generally a bad idea. It sounds like it is going to cause confusion
and provide no real benefit.


 I wonder whether the same set of functions should also be available in the
 backend with ereport(EC_OUT_OF_MEMORY, ...) behaviour as well.

 In the backend, you almost always ought to be using palloc instead.
 The only places where it's really appropriate to be using malloc
 directly are where you don't want an error thrown for out-of-memory.
 So I think providing these in the backend would do little except to
 encourage bad programming.

 regards, tom lane






Re: [HACKERS] xmalloc = pg_malloc

2012-10-02 Thread Andres Freund
On Tuesday, October 02, 2012 06:30:33 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  pg_calloc  (randomly different API for pg_malloc0)
  
  Do we need this?
 
 I thought about getting rid of it, but there are some dozens of calls
 scattered across several files, so I wasn't sure it was worth it.
I always found calloc to be a pointless API but by now I have learned what it 
means so I don't have a strong opinion.


  I wonder whether the same set of functions should also be available in
  the backend with ereport(EC_OUT_OF_MEMORY, ...) behaviour as well.
 
 In the backend, you almost always ought to be using palloc instead.
 The only places where it's really appropriate to be using malloc
 directly are where you don't want an error thrown for out-of-memory.
 So I think providing these in the backend would do little except to
 encourage bad programming.
We seem to have 100+ usages of malloc() anyway. Several of those are in helper 
libraries like regex/* though. A quick grep shows most of the others to be 
valid in my opinion. Mostly it's allocating memory which is never deallocated
but too big to be unconditionally allocated.

The quick grep showed that at least src/backend/utils/misc/ps_status.c doesn't 
properly check the return value. Others (mctx.c) use Asserts...

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Stephen Frost
* Peter Geoghegan (pe...@2ndquadrant.com) wrote:
 On 1 October 2012 18:05, Stephen Frost sfr...@snowman.net wrote:
  You're going to have to help me here, 'cause I don't see how there can
  be duplicates if we include the PGSS_FILE_HEADER as part of the hash,
  unless we're planning to keep PGSS_FILE_HEADER constant while we change
  what the hash value is for a given query, yet that goes against the
  assumptions that were laid out, aiui.
 
 Well, they wouldn't be duplicates if you think that the fact that one
 query was executed before some point release and another after ought
 to differentiate queries. I do not.

This would only be if we happened to change what hash was generated for
a given query during such a point release, where I share your feeling
that it ought to be quite rare.  I'm not suggesting we do this for every
point release...

 By invalidate, I mean that when we go to open the saved file, if the
 header doesn't match, the file is considered corrupt, and we simply
 log that the file could not be read, before unlinking it. This would
 be necessary in the unlikely event of there being some substantive
 change in the representation of query trees in a point release. I am
 not aware of any precedent for this, though Tom said that there was
 one.

Right, and that's all I'm trying to address here- how do we provide a
value for a given query which can be relied upon by outside sources,
even in the face of a point release which changes what our internal hash
value for a given query is.

 I don't want to get too hung up on what we'd do if this problem
 actually occurred, because that isn't what this thread is about.

[...]

 I simply do not understand objections to the proposal. Have I missed 
 something?

It was my impression that the concern is the stability of the hash value
and ensuring that tools which operate on it don't mistakenly lump two
different queries into one because they had the same hash value (caused
by a change in our hashing algorithm or input into it over time, eg a
point release).  I was hoping to address that to allow this proposal to
move forward..

Thanks,

Stephen




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Peter Geoghegan (pe...@2ndquadrant.com) wrote:
 I simply do not understand objections to the proposal. Have I missed 
 something?

 It was my impression that the concern is the stability of the hash value
 and ensuring that tools which operate on it don't mistakenly lump two
 different queries into one because they had the same hash value (caused
 by a change in our hashing algorithm or input into it over time, eg a
 point release).  I was hoping to address that to allow this proposal to
 move forward..

I think there are at least two questions that ought to be answered:

1. Why isn't something like md5() on the reported query text an equally
good solution for users who want a query hash?

2. If people are going to accumulate stats on queries over a long period
of time, is a 32-bit hash really good enough for the purpose?  If I'm
doing the math right, the chance of collision is already greater than 1%
at 10000 queries, and rises to about 70% for 100000 queries; see
http://en.wikipedia.org/wiki/Birthday_paradox
We discussed this issue and decided it was okay for pg_stat_statements's
internal hash table, but it's not at all clear to me that it's sensible
to use 32-bit hashes for external accumulation of query stats.
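The figures can be checked with the exact birthday-paradox computation. The sketch below is illustrative only (collision_probability is not a PostgreSQL function):

```c
#include <assert.h>

/* Exact birthday-paradox probability that at least two of k uniformly
 * random hash values collide when drawn from n buckets, computed as
 * 1 - prod_{i=1}^{k-1} (1 - i/n).  Illustrative helper only. */
double
collision_probability(long k, double n)
{
	double		p_none = 1.0;	/* probability of no collision so far */
	long		i;

	for (i = 1; i < k; i++)
		p_none *= 1.0 - (double) i / n;
	return 1.0 - p_none;
}
```

With n = 2^32 this gives about 1.2% for 10000 values and about 69% for 100000 values, in line with the figures quoted above.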

regards, tom lane




Re: [HACKERS] Incorrect behaviour when using a GiST index on points

2012-10-02 Thread Noah Misch
On Tue, Aug 28, 2012 at 05:04:09PM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  There's also the big-picture question of whether we should just get rid
  of fuzzy comparisons in the geometric types instead of trying to hack
  indexes to work around them.
 
  +1 for that approach, but only if I don't have to do the work.

I agree in the abstract; why should a point (position on a 2D plane) compare
fuzzily while a float8 (position on a 1D number line) does not?  But ...

  Otherwise, +1 for doing the simplest thing that we're sure will
  eliminate wrong answers.
 
 What we're forced to speculate about here is how many applications out
 there are relying on fuzzy comparison to get answers they like, versus
 how many are getting answers they don't like because of it.  The fact
 that the underlying storage is float8 not numeric suggests there are
 probably some cases where fuzzy is helpful.

... yes.  Having never used these types in practice, I won't venture a guess.
Anyone else?

 I've never cared for the particulars of the way the fuzzy comparisons
 are done, in any case: using an absolute rather than relative error
 threshold is wrong according to every numerical analysis principle
 I know.

Definitely.
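To make the absolute-versus-relative point concrete, here is a minimal sketch (not PostgreSQL code; the fixed EPSILON mirrors the geometric types' style of comparison, and the relative variant is what numerical-analysis practice suggests):

```c
#include <assert.h>

#define EPSILON 1.0E-06			/* fixed absolute threshold */

double
fp_abs(double x)
{
	return x < 0.0 ? -x : x;
}

/* absolute-threshold fuzzy equality, geometric-types style */
int
fp_eq_absolute(double a, double b)
{
	return fp_abs(a - b) <= EPSILON;
}

/* relative-error equality: tolerance scales with operand magnitude */
int
fp_eq_relative(double a, double b, double rel_tol)
{
	double		scale = fp_abs(a) > fp_abs(b) ? fp_abs(a) : fp_abs(b);

	return fp_abs(a - b) <= rel_tol * scale;
}
```

At small magnitudes the absolute test conflates clearly distinct values (1e-7 and 2e-7 compare equal), while at large magnitudes it can never report equality, even for values that differ only by rounding noise.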

 The long and the short of it is that it will probably take a significant
 investment of work to make something that's clearly better.  If that
 weren't the case, we'd have done something long ago.

In any event, I think we should entertain a patch to make the GiST operator
class methods bug-compatible with corresponding operators.  Even if we decide
to change operator behavior in HEAD, the back branches could use it.

Thanks,
nm




Re: [HACKERS] Prevent restored WAL files from being archived again Re: Unnecessary WAL archiving after failover

2012-10-02 Thread Fujii Masao
On Sat, Aug 11, 2012 at 2:19 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 8:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 29 July 2012 16:01, Fujii Masao masao.fu...@gmail.com wrote:

 Attached patch changes the startup process so that it creates .done file
 whenever WAL file is successfully restored, whether archive mode is
 enabled or not. The restored WAL files will not be archived again because
 of .done file.

 The proposed patch works, for archiving only, but I don't like the
 code. It's a partial refactoring of existing code.

 I prefer to go for a full re-factoring version for HEAD, and a zero
 refactoring version for 9.2 since we're deep into beta.

Isn't it time to push the full re-factoring version to HEAD? If there is no
such version yet, what about pushing the zero refactoring version for now?

Regards,

-- 
Fujii Masao




[HACKERS] review: pgbench - aggregation of info written into log

2012-10-02 Thread Pavel Stehule
Hello

I did review of pgbench patch
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00737.php

* the patch applies cleanly

* I had a problem with doc

make[1]: Entering directory `/home/pavel/src/postgresql/doc/src/sgml'
openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-c /usr/share/sgml/docbook/dsssl-stylesheets/catalog -d stylesheet.dsl
-t sgml -i output-html -V html-index postgres.sgml
openjade:pgbench.sgml:767:8:E: document type does not allow element
"TITLE" here; missing one of "ABSTRACT", "AUTHORBLURB", "MSGSET",
"CALLOUTLIST", "ITEMIZEDLIST", "ORDEREDLIST", "SEGMENTEDLIST",
"VARIABLELIST", "CAUTION", "IMPORTANT", "NOTE", "TIP", "WARNING",
"FORMALPARA", "BLOCKQUOTE", "EQUATION", "EXAMPLE", "FIGURE", "TABLE",
"PROCEDURE", "SIDEBAR", "QANDASET", "REFSECT3" start-tag
make[1]: *** [HTML.index] Error 1
make[1]: *** Deleting file `HTML.index'
make[1]: Leaving directory `/home/pavel/src/postgresql/doc/src/sgml'

* pgbench is compiled without warnings

* there is a few issues in source code

+   
+   /* should we aggregate the results or not? */
+   if (use_log_agg)
+   {
+   
+   /* are we still in the same interval? if yes, accumulate the
+    * values (print them otherwise) */
+   if (agg->start_time + use_log_agg >= INSTR_TIME_GET_DOUBLE(now))
+   {
+   

* the real time range for aggregation is dynamic (pgbench is not a
realtime application), so I am not sure the following constraint makes
sense

 +  if ((duration > 0) && (use_log_agg > 0) && (duration % use_log_agg != 0)) {
+   fprintf(stderr, "duration (%d) must be a multiple of aggregation interval (%d)\n", duration, use_log_agg);
+   exit(1);
+   }

* it doesn't flush the last aggregated data (as mentioned by Tomas:
"Also, I've realized the last interval may not be logged at all - I'll
take a look into this in the next version of the patch."). And it can
be significant for higher values of use_log_agg

* the name of the variable use_log_agg is not good - maybe log_agg_interval?
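For reference, the accumulate-or-flush structure under review can be sketched as follows (names such as AggVals and agg_record are illustrative, not the patch's actual identifiers):

```c
#include <assert.h>

/* Illustrative sketch of per-interval aggregation: keep accumulating
 * while the current time is still inside the interval that started at
 * start_time; once the boundary is crossed, the accumulated values
 * would be logged and the counters reset. */
typedef struct
{
	long		start_time;		/* start of the current interval (s) */
	int			cnt;			/* transactions seen in this interval */
} AggVals;

/* record one transaction at time `now`; returns 1 if an interval
 * boundary was crossed (i.e. a log line would have been emitted) */
int
agg_record(AggVals *agg, int interval, long now)
{
	int			flushed = 0;

	if (now >= agg->start_time + interval)
	{
		flushed = 1;			/* here the aggregated line is printed */
		while (agg->start_time + interval <= now)
			agg->start_time += interval;
		agg->cnt = 0;
	}
	agg->cnt++;
	return flushed;
}
```

Note how a run that ends mid-interval never crosses the boundary, which is exactly the "last interval may not be logged" issue raised above.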

Regards

Pavel Stehule




Re: [HACKERS] Prevent restored WAL files from being archived again Re: Unnecessary WAL archiving after failover

2012-10-02 Thread Simon Riggs
On 2 October 2012 19:06, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Aug 11, 2012 at 2:19 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 8:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 29 July 2012 16:01, Fujii Masao masao.fu...@gmail.com wrote:

 Attached patch changes the startup process so that it creates .done file
 whenever WAL file is successfully restored, whether archive mode is
 enabled or not. The restored WAL files will not be archived again because
 of .done file.

 The proposed patch works, for archiving only, but I don't like the
 code. It's a partial refactoring of existing code.

 I prefer to go for a full re-factoring version for HEAD, and a zero
 refactoring version for 9.2 since we're deep into beta.

 Isn't it time to push the full re-factoring version to HEAD? If there is no
 such version yet, what about pushing the zero refactoring version for now?

If you send a rebased patch, I'll review, but its not high on my radar
right now unless you can explain why it should be higher.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Prevent restored WAL files from being archived again Re: Unnecessary WAL archiving after failover

2012-10-02 Thread Fujii Masao
On Wed, Oct 3, 2012 at 3:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 2 October 2012 19:06, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Aug 11, 2012 at 2:19 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 8:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 29 July 2012 16:01, Fujii Masao masao.fu...@gmail.com wrote:

 Attached patch changes the startup process so that it creates .done file
 whenever WAL file is successfully restored, whether archive mode is
 enabled or not. The restored WAL files will not be archived again because
 of .done file.

 The proposed patch works, for archiving only, but I don't like the
 code. It's a partial refactoring of existing code.

 I prefer to go for a full re-factoring version for HEAD, and a zero
 refactoring version for 9.2 since we're deep into beta.

 Isn't it time to push the full re-factoring version to HEAD? If there is no
 such version yet, what about pushing the zero refactoring version for now?

 If you send a rebased patch, I'll review,

Okay. Will do. The patch needs to be revised to correspond with the recent
split of xlog.c.

 but its not high on my radar
 right now unless you can explain why it should be higher.

It may not be high, but I'm just worried that we are likely to forget to
apply that change to HEAD if we postpone it further.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 2 October 2012 18:16, Tom Lane t...@sss.pgh.pa.us wrote:
 1. Why isn't something like md5() on the reported query text an equally
 good solution for users who want a query hash?

Because that does not uniquely identify the entry. The very first
thing that the docs say on search_path is "Qualified names are tedious
to write, and it's often best not to wire a particular schema name
into applications anyway." Presumably, the reason it's best not to
wire schema names into apps is because it might be useful to modify
search_path in a way that dynamically made the same queries in some
application reference what are technically distinct relations. If
anyone does this, and it seems likely that many do for various
reasons, they will be out of luck when using some kind of
pg_stat_statements aggregation.

This was the behaviour that I intended for pg_stat_statements all
along, and I think it's better than a solution that matches query
strings.

 2. If people are going to accumulate stats on queries over a long period
 of time, is a 32-bit hash really good enough for the purpose?  If I'm
 doing the math right, the chance of collision is already greater than 1%
 at 1 queries, and rises to about 70% for 10 queries; see
 http://en.wikipedia.org/wiki/Birthday_paradox
 We discussed this issue and decided it was okay for pg_stat_statements's
 internal hash table, but it's not at all clear to me that it's sensible
 to use 32-bit hashes for external accumulation of query stats.

Well, forgive me for pointing this out, but I did propose that the
hash be a 64-bit value (which would have necessitated adapting
hash_any() to produce 64-bit values), but you rejected the proposal. I
arrived at the same probability for a collision as you did and posted
it to the list, in discussion shortly after the normalisation stuff
was committed.

A more sensible way of assessing the risk of a collision would be to
try and come up with the probability of a collision that someone
actually ends up caring about, which is considerably less than the 1%
for 10,000 entries. I'm not being glib - people are very used to the
idea that aggregating information on query costs is a lossy process.
Prior to 9.2, the only way execution costs could reasonably be
measured at the query granularity on a busy system was to set
log_min_duration_statement to something like 1 second.

I am also unconvinced by the idea that aggregating historical data
(with the same hash value) in a separate application is likely to make
the collision situation appreciably worse. People are going to be
using something like an RRD circular buffer to aggregate the
information, and I can't see anyone caring about detailed information
that is more than a couple of weeks in the past. The point of
aggregation isn't to store more queries, it's to construct time-series
data from snapshots. Besides, do most applications really even have
more than 10,000 distinct queries?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?

2012-10-02 Thread Noah Misch
On Tue, Aug 21, 2012 at 02:31:29PM +0800, Craig Ringer wrote:
 It'd really help if REVOKE consistently raised warnings when it didn't  
 actually revoke anything.

+1

This will invite the same mixed feelings as the CREATE x IF NOT EXISTS
notices, but I think it's worthwhile.

 Even better, a special case for REVOKEs on objects that only have owner  
 and public permissions could say:

 WARNING: REVOKE didn't remove any permissions for user "blah". This
 table/db/whatever has default permissions, so there were no GRANTs
 for user "blah" to revoke. See the documentation for REVOKE for more
 information.

The extra aid from saying those particular things is not clear to me.

It might be overkill, but we could report any other roles indirectly conveying
access to the named role.




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera

The fundamental issue with this patch hasn't been answered sufficiently,
I think.  Consider the following sequence of commands:

create schema if not exists foo create table first (a int);
create schema if not exists foo create table second (a int);


As far as I can see, with the patch as it currently stands, you would
end up with only table first in the schema, which seems very
surprising to me.

I think this needs more thought, and in any case it needs more
comprehensive regression test and documentation (i.e. at least the
examples ought to explain what would happen in such cases).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 2 October 2012 17:58, Stephen Frost sfr...@snowman.net wrote:
 Right, and that's all I'm trying to address here- how do we provide a
 value for a given query which can be relied upon by outside sources,
 even in the face of a point release which changes what our internal hash
 value for a given query is.

I don't know of a way. Presumably, we'd hope to avoid this, and would
look for alternatives to anything that would necessitate bumping
PGSS_FILE_HEADER, while not going so far as to let pg_stat_statements
contort things in the core system. If I was aware of a case where this
would have come up had pg_stat_statements fingerprinting been around
at the time, perhaps I could give a better answer than that.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera
Excerpts from Volker Grabsch's message of Sat Sep 29 06:32:13 -0300 2012:
 Dickson S. Guedes schrieb:
  - https://commitfest.postgresql.org/action/patch_view?id=907
  
  The patch is small and implements a new syntax to CREATE SCHEMA
  that allow the creation of a schema be skipped when IF NOT EXISTS is
  used.
 
  [...]
 
  - Should this patch implements others INEs like ADD COLUMN IF NOT EXISTS?
 
 If there's still a chance to improve the patch, I'd love to see
 the following INEs implemented. Several real-world database
 upgrade scripts would benefit from those:

I don't see that this patch is responsible for such new commands.  If
you want them, feel free to submit separate patches for them (or have
someone else do it for you).  But see the thread starting at
http://archives.postgresql.org/message-id/467881.79137.qm%40web27104.mail.ukl.yahoo.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?

2012-10-02 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Noah Misch
 Sent: Tuesday, October 02, 2012 3:02 PM
 To: Craig Ringer
 Cc: PostgreSQL Hackers
 Subject: Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?
 
 On Tue, Aug 21, 2012 at 02:31:29PM +0800, Craig Ringer wrote:
  It'd really help if REVOKE consistently raised warnings when it didn't
  actually revoke anything.
 
 +1
 
 This will invite the same mixed feelings as the CREATE x IF NOT EXISTS
 notices, but I think it's worthwhile.
 
  Even better, a special case for REVOKEs on objects that only have
  owner and public permissions could say:
 
  WARNING: REVOKE didn't remove any permissions for user blah. This
  table/db/whatever has default permissions, so there were no GRANTs
  for user blah to revoke. See the documentation for REVOKE for more
  information.
 
 The extra aid from saying those particular things is not clear to me.
 
 It might be overkill, but we could report any other roles indirectly
conveying
 access to the named role.
 

Having been bitten by this myself I do see the value in such a warning.  It
is not uncommon for someone using REVOKE to believe they are installing a
block instead of removing an allowance; especially as it interacts with
default permissions.

That said, and this is an off-the-cuff thought, the entire UI for
permissions, and its treatment in the documentation, seems to be fact
oriented.  The system is well documented but actually getting up to speed to
learn and use it is still a matter of reading the documentation and figuring
out how everything fits together.  I haven't given it that much thought but
I am curious if others are of the same opinion.

IOW, this proposal is an attempt to fix a symptom without addressing the
root cause.

Food for thought.

David J.







Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread David E. Wheeler
On Oct 2, 2012, at 12:08 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 create schema if not exists foo create table first (a int);
 create schema if not exists foo create table second (a int);
 
 
 As far as I can see, with the patch as it currently stands, you would
 end up with only table first in the schema, which seems very
 surprising to me.

Yeah, I think the second should die. CINE should only work if there are no 
other objects created as part of the statement, IMHO.

Best,

David





Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera
Excerpts from David E. Wheeler's message of mar oct 02 16:16:37 -0300 2012:
 On Oct 2, 2012, at 12:08 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 
  create schema if not exists foo create table first (a int);
  create schema if not exists foo create table second (a int);
  
  
  As far as I can see, with the patch as it currently stands, you would
  end up with only table first in the schema, which seems very
  surprising to me.
 
 Yeah, I think the second should die. CINE should only work if there are no 
 other objects created as part of the statement, IMHO.

Well, if that's the rationale then you end up with no schema foo at all
(i.e. both die), which seems even more surprising (though I admit it has
the advantage of being a simple rule to document.)

How about call this for precedent:

mkdir -p /tmp/foo/bar
mkdir -p /tmp/foo/baz

In this case you end up with directory foo and at least two subdirs in
it, bar and baz.  This works even if /tmp/foo existed previously and
even if there was some other stuff in it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread David E. Wheeler
On Oct 2, 2012, at 12:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 How about call this for precedent:
 
 mkdir -p /tmp/foo/bar
 mkdir -p /tmp/foo/baz
 
 In this case you end up with directory foo and at least two subdirs in
 it, bar and baz.  This works even if /tmp/foo existed previously and
 even if there was some other stuff in it.

Well, what about this, then?

create schema if not exists foo create table second (a int);
create schema if not exists foo create table second (b int);

David





Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Excerpts from David E. Wheeler's message of mar oct 02 16:16:37 -0300 2012:
 On Oct 2, 2012, at 12:08 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 create schema if not exists foo create table first (a int);
 create schema if not exists foo create table second (a int);

 As far as I can see, with the patch as it currently stands, you would
 end up with only table first in the schema, which seems very
 surprising to me.

 Yeah, I think the second should die. CINE should only work if there are no 
 other objects created as part of the statement, IMHO.

 Well, if that's the rationale then you end up with no schema foo at all
 (i.e. both die), which seems even more surprising (though I admit it has
 the advantage of being a simple rule to document.)

I think we should just disallow putting any contained objects in the
statement when IF NOT EXISTS is used.  It's simple to understand, simple
to document and implement, and I think it covers all the sane use-cases
anyway.

regards, tom lane




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera
Excerpts from David E. Wheeler's message of mar oct 02 16:37:30 -0300 2012:
 On Oct 2, 2012, at 12:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 
  How about call this for precedent:
  
  mkdir -p /tmp/foo/bar
  mkdir -p /tmp/foo/baz
  
  In this case you end up with directory foo and at least two subdirs in
  it, bar and baz.  This works even if /tmp/foo existed previously and
  even if there was some other stuff in it.
 
 Well, what about this, then?
 
 create schema if not exists foo create table second (a int);
 create schema if not exists foo create table second (b int);

Yes, exactly -- what about this case?  This is precisely the reason we
don't have CREATE TABLE IF NOT EXISTS.

I don't know what the best answer is.  Most people seem to think that
the answer ought to be that you end up with a single column second.a,
and the second command errors out.

So if you do this:

create schema if not exists foo create table first (a int);
create schema if not exists foo create table first (b int),
create table second (a int);

you end up with *only* the first table, because the second command
errors out when the first table is observed to exist.

Now, what if you were to do this instead:

create schema if not exists foo
   create table if not exists first (a int);
create schema if not exists foo
   create table if not exists first (b int),
   create table if not exists second (a int);

Then you end up with first.a and second.a.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread David E. Wheeler
On Oct 2, 2012, at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I think we should just disallow putting any contained objects in the
 statement when IF NOT EXISTS is used.  It's simple to understand, simple
 to document and implement, and I think it covers all the sane use-cases
 anyway.

+1

David





Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Andrew Dunstan


On 10/02/2012 03:48 PM, Tom Lane wrote:

Alvaro Herrera alvhe...@2ndquadrant.com writes:

Excerpts from David E. Wheeler's message of mar oct 02 16:16:37 -0300 2012:

On Oct 2, 2012, at 12:08 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

create schema if not exists foo create table first (a int);
create schema if not exists foo create table second (a int);

As far as I can see, with the patch as it currently stands, you would
end up with only table first in the schema, which seems very
surprising to me.

Yeah, I think the second should die. CINE should only work if there are no 
other objects created as part of the statement, IMHO.

Well, if that's the rationale then you end up with no schema foo at all
(i.e. both die), which seems even more surprising (though I admit it has
the advantage of being a simple rule to document.)

I think we should just disallow putting any contained objects in the
statement when IF NOT EXISTS is used.  It's simple to understand, simple
to document and implement, and I think it covers all the sane use-cases
anyway.




I thought we'd already agreed on this.

+1.

cheers

andrew





Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Alvaro Herrera
Excerpts from Andrew Dunstan's message of mar oct 02 17:24:38 -0300 2012:
 
 On 10/02/2012 03:48 PM, Tom Lane wrote:
  Alvaro Herrera alvhe...@2ndquadrant.com writes:

  Well, if that's the rationale then you end up with no schema foo at all
  (i.e. both die), which seems even more surprising (though I admit it has
  the advantage of being a simple rule to document.)
  I think we should just disallow putting any contained objects in the
  statement when IF NOT EXISTS is used.  It's simple to understand, simple
  to document and implement, and I think it covers all the sane use-cases
  anyway.
 
 I thought we'd already agreed on this.

Well, it's not what the latest proposed patch implements.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Martijn van Oosterhout
On Tue, Oct 02, 2012 at 12:58:15PM -0400, Stephen Frost wrote:
  I simply do not understand objections to the proposal. Have I missed 
  something?
 
 It was my impression that the concern is the stability of the hash value
 and ensuring that tools which operate on it don't mistakenly lump two
 different queries into one because they had the same hash value (caused
 by a change in our hashing algorithm or input into it over time, eg a
 point release).  I was hoping to address that to allow this proposal to
 move forward..

That makes no sense, though. The moment you talk about a hash you
accept the possibility of lumping together things that aren't the
same.  Any tools using these hashes must have realised this.

Fortunately, the statistics are better than the birthday paradox
suggests. The chance that the two most important queries in your
system end up having the same hash is minuscule.

As mentioned elsewhere, a system with more than 10,000 different
queries sounds rare to me (once you exclude query parameters, of course).
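That claim is easy to sanity-check with the standard birthday-bound approximation. A rough sketch (the hash widths are assumptions for illustration: 32 bits, which matches the query hash width in the pg_stat_statements of the day, and 64 bits for comparison):

```python
import math

def collision_probability(n_queries: int, hash_bits: int) -> float:
    """Birthday-bound approximation: probability that at least two of
    n_queries distinct queries share a value in a hash_bits-bit space."""
    space = 2.0 ** hash_bits
    return 1.0 - math.exp(-n_queries * (n_queries - 1) / (2.0 * space))

# ~10,000 distinct queries in a 32-bit hash space: a bit over 1%
print(f"32-bit: {collision_probability(10_000, 32):.2%}")
# the same workload in a 64-bit hash space: effectively zero
print(f"64-bit: {collision_probability(10_000, 64):.2e}")
```

So at 32 bits a collision somewhere in a large workload is plausible, but the chance of a collision between any two *specific* queries remains tiny, which is the point being made above.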

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Fabrízio de Royes Mello
2012/10/2 Alvaro Herrera alvhe...@2ndquadrant.com

 Excerpts from Andrew Dunstan's message of mar oct 02 17:24:38 -0300 2012:
 
  On 10/02/2012 03:48 PM, Tom Lane wrote:
   Alvaro Herrera alvhe...@2ndquadrant.com writes:

   Well, if that's the rationale then you end up with no schema foo at
 all
   (i.e. both die), which seems even more surprising (though I admit it
 has
   the advantage of being a simple rule to document.)
   I think we should just disallow putting any contained objects in the
   statement when IF NOT EXISTS is used.  It's simple to understand,
 simple
   to document and implement, and I think it covers all the sane use-cases
   anyway.
 
  I thought we'd already agreed on this.

 Well, it's not what the latest proposed patch implements.


You're right... the latest proposed patch doesn't implement it.

I'll change the patch and send soon...

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS

2012-10-02 Thread Stirling Newberry
This case points to a weakness in many programming languages: not having a
clear iff (if and only if) versus if construction.

The sane use case for "create schema foo if not exists object" is building
a database dynamically, where several code paths may be the first to put a
table in a schema, and schemas should not be created if there are no
objects. The create/search/drop design pattern has its own problems.

Thus the construction should default to one behavior, and have an option for
the second.

e.g.

create schema foo if not exists (will not be done if foo existed)
create schema foo if not exists FORCE (will be done even if foo existed)

This would even allow for mixed e.g. 

create schema foo if not exists (tables that should be created once and not
again)
FORCE (objects a routine will add even if the schema exists)




On Oct 2, 2012, at 6:33 PM, Fabrízio de Royes Mello wrote:

 
 2012/10/2 Alvaro Herrera alvhe...@2ndquadrant.com
 Excerpts from Andrew Dunstan's message of mar oct 02 17:24:38 -0300 2012:
 
  On 10/02/2012 03:48 PM, Tom Lane wrote:
   Alvaro Herrera alvhe...@2ndquadrant.com writes:
 
   Well, if that's the rationale then you end up with no schema foo at all
   (i.e. both die), which seems even more surprising (though I admit it has
   the advantage of being a simple rule to document.)
   I think we should just disallow putting any contained objects in the
   statement when IF NOT EXISTS is used.  It's simple to understand, simple
   to document and implement, and I think it covers all the sane use-cases
   anyway.
 
  I thought we'd already agreed on this.
 
 Well, it's not what the latest proposed patch implements.
 
 
  You're right... the latest proposed patch doesn't implement it.
 
 I'll change the patch and send soon...
 
 Regards,
 
 -- 
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Blog sobre TI: http://fabriziomello.blogspot.com
  Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
  Twitter: http://twitter.com/fabriziomello
 



Re: [HACKERS] pg_upgrade does not completely honor --new-port

2012-10-02 Thread Bruce Momjian

Applied to head and 9.2.

---

On Wed, Sep 26, 2012 at 10:06:50PM -0400, Bruce Momjian wrote:
 On Tue, Sep 25, 2012 at 05:36:54PM +0300, Devrim Gunduz wrote:
  
  Hi,
  
  I just performed a test upgrade from 9.1 to 9.2, and used --new-port
  variable. However, the analyze_new_cluster.sh does not include the new
  port, thus when I run it, it fails. Any chance to add the port number to
  the script?
 
 Well, the reason people normally use the port number is to do a live
 check, but obviously when the script is created it isn't doing a check. 
 I am worried that if I do embed the port number in there, then if they
 change the port after the upgrade, they now can't use the script.  I
 assume users would have PGPORT set before running the script, no?
 
  Also, is it worth to add the value specified in --new-bindir as a prefix
  to vacuumdb command in the same script?
 
 Wow, I never thought of adding a path to those scripts, but it certainly
 makes sense.  I have created the attached patch which does this.
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +

 diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
 index bed10f8..2785eb7 100644
 --- a/contrib/pg_upgrade/check.c
 +++ b/contrib/pg_upgrade/check.c
 @@ -477,7 +477,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
  			ECHO_QUOTE, ECHO_QUOTE);
  	fprintf(script, "echo %sthis script and run:%s\n",
  			ECHO_QUOTE, ECHO_QUOTE);
 -	fprintf(script, "echo %svacuumdb --all %s%s\n", ECHO_QUOTE,
 +	fprintf(script, "echo %s\"%s/vacuumdb\" --all %s%s\n", ECHO_QUOTE, new_cluster.bindir,
  	/* Did we copy the free space files? */
  			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
  			"--analyze-only" : "--analyze", ECHO_QUOTE);
 @@ -498,7 +498,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
  			ECHO_QUOTE, ECHO_QUOTE);
  	fprintf(script, "echo %s--%s\n",
  			ECHO_QUOTE, ECHO_QUOTE);
 -	fprintf(script, "vacuumdb --all --analyze-only\n");
 +	fprintf(script, "\"%s/vacuumdb\" --all --analyze-only\n", new_cluster.bindir);
  	fprintf(script, "echo%s\n", ECHO_BLANK);
  	fprintf(script, "echo %sThe server is now available with minimal optimizer statistics.%s\n",
  			ECHO_QUOTE, ECHO_QUOTE);
 @@ -519,7 +519,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
  			ECHO_QUOTE, ECHO_QUOTE);
  	fprintf(script, "echo %s---%s\n",
  			ECHO_QUOTE, ECHO_QUOTE);
 -	fprintf(script, "vacuumdb --all --analyze-only\n");
 +	fprintf(script, "\"%s/vacuumdb\" --all --analyze-only\n", new_cluster.bindir);
  	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
  #ifndef WIN32
 @@ -532,7 +532,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
  			ECHO_QUOTE, ECHO_QUOTE);
  	fprintf(script, "echo %s-%s\n",
  			ECHO_QUOTE, ECHO_QUOTE);
 -	fprintf(script, "vacuumdb --all %s\n",
 +	fprintf(script, "\"%s/vacuumdb\" --all %s\n", new_cluster.bindir,
  	/* Did we copy the free space files? */
  			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
  			"--analyze-only" : "--analyze");

 


-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [9.1] 2 bugs with extensions

2012-10-02 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of lun oct 01 04:44:28 -0300 2012:
 Hi,
 
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Same for 9.2, attached. master needs yet another patch because of the
  recent headers reorg, it seems, that's for another day though.
 
  No, just remove the RELKIND_UNCATALOGUED case in that switch.
 
 Oh. As in the attached? :)

That seems to work, yes.

While testing this out I noticed that this silently does nothing:

SELECT pg_catalog.pg_extension_config_dump('my_config', NULL);

i.e. the table is not marked as configuration but no error or warning
appears, either.  This seems rather surprising.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] PQping command line tool

2012-10-02 Thread Phil Sorber
I was wondering recently if there was any command line tool that
utilized PQping() or PQpingParams(). I searched the code and couldn't
find anything and was wondering if there was any interest to have
something like this included? I wrote something for my purposes of
performing a health check that also supports nagios style status
output. It's probably convenient for scripting purposes as well. It's
not currently ready for submission to a commitfest, but if there was
an interest I would clean it up so that it would be.




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Daniel Farina
On Mon, Oct 1, 2012 at 12:57 AM, Magnus Hagander mag...@hagander.net wrote:
 Can we please expose the internal hash id of the statements in
 pg_stat_statements?

 I know there was discussions about it earlier, and it wasn't done with
 an argument of it not being stable between releases (IIRC). I think we
 can live with that drawback, assuming of course that we document this
 properly.

 I've now run into multiple customer installations where it would be
 very useful to have. The usecase is mainly storing snapshots of the
 pg_stat_statements output over time and analyzing those. Weird things
 happen for example when the query text is the same, but the hash is
 different (which can happen for example when a table is dropped and
 recreated). And even without that, in order to do anything useful with
 it, you end up hashing the query text anyway - so using the already
 existing hash would be easier and more useful.

I have a similar problem; however, I am not sure the hash generated
is ideal.  Putting aside the mechanical, versioning, and
shutdown/stats-file issues raised in the main branch of the thread,
I also have the feeling that it is not what I want.
Consider the following case:

SELECT * FROM users WHERE id = ?

this query isn't seen for a while

SELECT * FROM users WHERE id = ?

In the intervening time, an equivalent hash could still be evicted and
reintroduced and the statistics silently reset, and that'll befuddle
principled tools.  This is worse than merely less-useful, because it
can lead to drastic underestimations that otherwise pass inspection.

Instead, I think it makes sense to assign a number -- arbitrarily, but
uniquely -- to the generation of a new row in pg_stat_statements, and,
on the flip side, whenever a row is retired its number should be
eliminated, practically, for-ever.  This way re-introductions between
two samplings of pg_stat_statements cannot be confused for a
contiguously maintained statistic on a query.
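A minimal sketch of the consumer side of that idea (hypothetical data and column shape, not anything pg_stat_statements actually exposes): keying snapshots by (hash, generation) instead of hash alone keeps an evicted-and-reintroduced entry from silently corrupting deltas.

```python
# Hypothetical monitoring-side sketch: snapshots are dicts keyed by
# (query_hash, generation), where the generation id changes every time
# an entry is evicted and later re-created.

def delta_calls(prev: dict, curr: dict) -> dict:
    """Per-entry call deltas between two snapshots. A key absent from
    prev means the entry is new (or was evicted and re-introduced),
    so its counter starts from zero instead of being subtracted."""
    return {key: calls - prev.get(key, 0) for key, calls in curr.items()}

# Snapshot 1: the entry existed as generation 1 with 100 calls,
# then was evicted from the hash table.
prev = {("hash_a", 1): 100}
# Snapshot 2: the same query text re-entered as generation 2.
curr = {("hash_a", 2): 5}

# Keyed by hash alone the delta would be 5 - 100 = -95; with the
# generation in the key, the re-introduction is visible.
print(delta_calls(prev, curr))  # {('hash_a', 2): 5}
```

With hash-only keys the tool would either report a negative delta or quietly under-count; the generation makes the reset explicit.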

-- 
fdr

