Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)

2013-06-21 Thread KONDO Mitsumasa

Hi Fabien,

I am sending you my review results and a refactoring patch. I think your patch 
adds useful functionality that many people will surely want to use! I hope that 
my review comments are helpful for your patch.



* 1. Use consistent words and variable names in the source code and SGML document.
The new patch is more readable for users and developers when it follows the 
wording and variable names already used in the existing source code. For example, 
SECONDS vs. NUM, etc.


* 2. Make the output format of the results more readable.
I think that the output format should be simple and intuitive. Your patch's 
output format is not very easy to read. So I changed it to a simpler format and 
added average latency. My proposed format is easier both to read and to process 
programmatically.



[mitsu-ko@localhost postgresql]$ bin/pgbench -T10 -P5 -c2 -j2
starting vacuum...end.
5.0 s [thread 1]: tps = 1015.576032, AverageLatency(ms) = 0.000984663
5.0 s [thread 0]: tps = 1032.580794, AverageLatency(ms) = 0.000968447
10.0 s [thread 0]: tps = 1129.591189, AverageLatency(ms) = 0.000885276
10.0 s [thread 1]: tps = 1126.267776, AverageLatency(ms) = 0.000887888


However, preferences about output format (design) differ from person to 
person :-). If you prefer another format, feel free to change it.



* 3. The thread name in the output format is not necessary.
I do not see why the thread name is displayed in each progress line; I think it 
is not needed. The output should stay simple even with many threads. My image 
of it is here:



[mitsu-ko@localhost postgresql]$ bin/pgbench -T10 -P5 -c2 -j2
starting vacuum...end.
5.0 s: tps = 2030.576032, AverageLatency(ms) = 0.000984663
10.0 s : tps = 2250.591189, AverageLatency(ms) = 0.000885276


This output format is simpler and more intuitive. If you need per-thread 
results, please tell us the reason.



* 4. The progress time slips.
When I ran this patch for a long time, I found that the progress time slips. An 
image of this problem is here:


 [mitsu-ko@localhost postgresql]$ bin/pgbench -T10 -P5 -c2
 starting vacuum...end.
 5.1 s   : tps = 2030.576032, AverageLatency(ms) = 0.000984663
 10.2 s : tps = 2250.591189, AverageLatency(ms) = 0.000885276

There is a problem in the method of calculating the progress time. It needs to 
be fixed to report correctly, or the displayed time format could be changed to 
something like '13:00:00'. If you choose the latter format, it will fit well 
with the PostgreSQL log and contrib modules such as pg_stat_statements.



Best regards,
--
Mitsumasa KONDO
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 1303217..32805ea 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -164,6 +164,7 @@ bool		use_log;			/* log transaction latencies to a file */
 bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
  * transactions */
+int			progress = 0;   /* thread progress report every this seconds */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -354,6 +355,8 @@ usage(void)
		  "protocol for submitting queries to server (default: simple)\n"
		 "-n   do not run VACUUM before tests\n"
		 "-N   do not update tables \"pgbench_tellers\" and \"pgbench_branches\"\n"
+		 "-P NUM, --progress NUM\n"
+		 "show thread progress report about every NUM seconds\n"
		 "-r   report average latency per command\n"
		 "-s NUM   report this scale factor in output\n"
		 "-S   perform SELECT-only transactions\n"
@@ -2115,6 +2118,7 @@ main(int argc, char **argv)
 		{"unlogged-tables", no_argument, &unlogged_tables, 1},
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
+		{"progress", required_argument, NULL, 'P'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -2181,7 +2185,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2336,6 +2340,16 @@ main(int argc, char **argv)
 	exit(1);
 }
 break;
+			case 'P':
+				progress = atoi(optarg);
+				if (progress <= 0)
+				{
+					fprintf(stderr,
+							"thread progress delay (-P) must not be negative (%s)\n",
+							optarg);
+					exit(1);
+				}
+				break;
 			case 0:
 /* This covers long options which take no argument. */
 break;
@@ -2712,6 +2726,10 @@ threadRun(void *arg)
 	int			nstate = thread->nstate;
 	int			remains = nstate;		/* number of remaining clients */
 	int			i;
+	/* for reporting progress in benchmark result */
+	int64		start_report = INSTR_TIME_GET_MICROSEC(thread->start_time);
+	int64		last_report = start_report;
+	int64		last_count = 0;
 
 	AggVals		aggs;
 
@@ -2875,6 

Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-06-21 Thread Dean Rasheed
On 21 June 2013 06:54, David Fetter da...@fetter.org wrote:
 For example SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file

 The spec is pretty specific about the all or none nature of naming
 in the AS clause...unless we can figure out a way of getting around it
 somehow.

We already support and document the ability to provide a subset of the
columns in the alias. I didn't realise that was beyond the spec, but
since we already have it...


 I'm weighing in on the side of a name that's like ?columnN? elsewhere
 in the code, i.e. one that couldn't sanely be an actual column name,
 whether in table, view, or SRF.

I don't think we need to be overly concerned with coming up with a
unique name for the column. Duplicate column names are fine, except if
the user wants to refer to them elsewhere in the query, in which case
they need to provide aliases to distinguish them, otherwise the query
won't be accepted.

I'd be happy if you just replaced ?column? with ordinality in a
couple of places in your original patch.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Hitoshi Harada
On Thu, Jun 20, 2013 at 3:40 PM, MauMau maumau...@gmail.com wrote:

  Here, reliable means that the database server is certainly shut
 down when pg_ctl returns, not telling a lie that I shut down the
 server processes for you, so you do not have to be worried that some
 postgres process might still remain and write to disk.  I suppose
 reliable shutdown is crucial especially in HA cluster.  If pg_ctl
 stop -mi gets stuck forever when there is an unkillable process (in
 what situations does this happen? OS bug, or NFS hard mount?), I
 think the DBA has to notice this situation from the unfinished
 pg_ctl, investigate the cause, and take corrective action.


 So you're suggesting that keeping postmaster up is a useful sign that
 the shutdown is not going well?  I'm not really sure about this.  What
 do others think?


 I think you are right, and there is no harm in leaving postgres processes
 in unkillable state.  I'd like to leave the decision to you and/or others.


+1 for leaving processes, not waiting for long (or possibly forever;
remember that not all people are running postgres on such cluster ware).  I'm
sure some users rely on the current behavior of immediate shutdown.  If
someone wants to ensure that processes are finished when pg_ctl returns,
isn't that fast shutdown, not immediate shutdown?  To me the current
immediate shutdown is reliable, in that it returns control back to the
terminal without fail, after killing the postmaster at least.

Thanks,
-- 
Hitoshi Harada


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-06-21 Thread Dean Rasheed
On 21 June 2013 08:02, Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 21 June 2013 06:54, David Fetter da...@fetter.org wrote:
 For example SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS file

 The spec is pretty specific about the all or none nature of naming
 in the AS clause...unless we can figure out a way of getting around it
 somehow.

 We already support and document the ability to provide a subset of the
 columns in the alias. I didn't realise that was beyond the spec, but
 since we already have it...


 I'm weighing in on the side of a name that's like ?columnN? elsewhere
 in the code, i.e. one that couldn't sanely be an actual column name,
 whether in table, view, or SRF.

 I don't think we need to be overly concerned with coming up with a
 unique name for the column. Duplicate column names are fine, except if
 the user wants to refer to them elsewhere in the query, in which case
 they need to provide aliases to distinguish them, otherwise the query
 won't be accepted.

 I'd be happy if you just replaced ?column? with ordinality in a
 couple of places in your original patch.


Expanding on that, I think it's perfectly acceptable to allow
potentially duplicate column names in this context. For the majority
of simple queries the user wouldn't have to (and wouldn't feel
compelled to) supply aliases. Where there was ambiguity they would be
forced to do so, but people are already used to that.

If I write a simple query that selects from a single unnest() with or
without ordinality, I probably won't want to supply aliases.

If I select from 2 unnest()'s *without* ordinality, I already have to
provide aliases.

If I select from 2 SRFs with ordinality, I won't be too
surprised if I am also forced to provide aliases.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frontend/backend protocol improvements proposal (request).

2013-06-21 Thread Albe Laurenz
Dmitriy Igrishin wrote:
 Sent: Thursday, June 20, 2013 5:09 PM
 To: PostgreSQL Hackers
 Subject: [HACKERS] Frontend/backend protocol improvements proposal (request).
 
 Hackers,
 
 While developing a C++ client library for Postgres I felt lack of extra
 information in command tags in the CommandComplete (B) message
 for the following commands:
   PREPARE;
   DEALLOCATE;
   DECLARE;
   CLOSE;
   LISTEN;
   UNLISTEN;
   SET;
   RESET.
 Namely, for example, users of my library can prepare statements by using
 protocol directly or via PREPARE command. Since the protocol does not
 support prepared statement deallocation, I wrote a wrapper over DEALLOCATE
 command. The library knows about all prepared statements and
 invalidates them automatically when user performs deallocate() wrapper.
 But users can go with DEALLOCATE command directly and in these cases
 I need to query the database to get the list of currently prepared statements
 whenever CommandComplete message with DEALLOCATE command tag
 is consumed. Moreover, I need to do it *synchronously* and this breaks
 asynchronous API.
 I propose to include name of the object in the CommandComplete (B)
 message for the above commands.

That would be a change in the protocol, so it's not likely to happen
soon.  There is a page where proposed changes to the wire protocol
are collected: http://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes

It seems like bad design to me to keep a list of prepared statements
on the client side when it is already kept on the server side
(accessible with the pg_prepared_statements view).

What's wrong with the following:
If the user wants to deallocate an individual prepared statement,
just send DEALLOCATE <statement name> to the server.  If the
statement does not exist, the server will return an error.
If the user wants to deallocate all statements, just send
DEALLOCATE ALL.
Why do you need to track prepared statements on the client side?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Albe Laurenz
I want to draw attention to this thread on -general:
camq5dgq4sujpbht2-9xlapasvknul2-bb0cpyci2fp+pfsf...@mail.gmail.com

Would you concur that this is a bug?

The fine manual says about CASE:

  If the condition's result is true, the value of the CASE expression
  is the result that follows the condition, and the remainder of the
  CASE expression is not processed.

But:

test=> CREATE FUNCTION zero() RETURNS integer IMMUTABLE LANGUAGE SQL AS 'SELECT 
0';
CREATE FUNCTION
test=> SELECT CASE WHEN (SELECT zero()) = 0 THEN -1 ELSE 1/zero() END;
ERROR:  division by zero

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-06-21 Thread KONDO Mitsumasa

Hi,

I took results of my separate patches and original PG.

* Result of DBT-2
              | TPS      90%tile    Average  Maximum
------------------------------------------------------
original_0.7  | 3474.62  18.348328  5.739    36.977713
original_1.0  | 3469.03  18.637865  5.842    41.754421
fsync         | 3525.03  13.872711  5.382    28.062947
write         | 3465.96  19.653667  5.804    40.664066
fsync + write | 3564.94  16.31922   5.1      34.530766

 - 'original_*' is unpatched PG 9.2.4 with the indicated checkpoint_completion_target.
 - For the patched versions of postgres, checkpoint_completion_target is set to 0.7.
 - 'write' has the write patch applied, and 'fsync' has the fsync patch applied.
 - 'fsync + write' has both patches applied.


* Investigation of result
 - A large checkpoint_completion_target, in both original PG and the write
patch, results in slow transaction latency in the benchmark, because slowly
written pages cause long fsync I/O in the final phase of the checkpoint.
 - The fsync patch affects the latency of each file's fsync. Back-to-back
fsyncs of the files cause slow latency, so it is good for latency when the
fsync stage of the checkpoint sleeps after a slow fsync I/O.
 - The combination of the fsync and write patches seemed to improve TPS. I
think the write patch disturbs transactions doing full-page-write WAL writes
less than original (plain) PG does.


I will send you a more detailed investigation and results next week. I will 
also take pgbench results. If you are interested in other benchmark results or 
postgres parameters, please tell me.


Best Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frontend/backend protocol improvements proposal (request).

2013-06-21 Thread Dmitriy Igrishin
2013/6/21 Albe Laurenz laurenz.a...@wien.gv.at

 Dmitriy Igrishin wrote:
  Sent: Thursday, June 20, 2013 5:09 PM
  To: PostgreSQL Hackers
  Subject: [HACKERS] Frontend/backend protocol improvements proposal
 (request).
 
  Hackers,
 
  While developing a C++ client library for Postgres I felt lack of extra
  information in command tags in the CommandComplete (B) message
  for the following commands:
PREPARE;
DEALLOCATE;
DECLARE;
CLOSE;
LISTEN;
UNLISTEN;
SET;
RESET.
  Namely, for example, users of my library can prepare statements by using
  protocol directly or via PREPARE command. Since the protocol does not
  support prepared statement deallocation, I wrote a wrapper over
 DEALLOCATE
  command. The library knows about all prepared statements and
  invalidates them automatically when user performs deallocate() wrapper.
  But users can go with DEALLOCATE command directly and in these cases
  I need to query the database to get the list of currently prepared
 statements
  whenever CommandComplete message with DEALLOCATE command tag
  is consumed. Moreover, I need to do it *synchronously* and this breaks
  asynchronous API.
  I propose to include name of the object in the CommandComplete (B)
  message for the above commands.

 That would be a change in the protocol, so it's not likely to happen
 soon.  There is a page where proposed changes to the wire protocol
 are collected: http://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes

Well, even if this proposal moves to the TODO, it would be nice.



 It seems like bad design to me to keep a list of prepared statements
 on the client side when it is already kept on the server side
 (accessible with the pg_prepared_statements view).


 What's wrong with the following:
 If the user wants to deallocate an individual prepared statement,
 just send DEALLOCATE statement name to the server.  If the
 statement does not exist, the server will return an error.
 If the user wants to deallocate all statements, just send
 DEALLOCATE ALL.
 Why do you need to track prepared statements on the client side?

Nothing wrong if the user wants to deal with scary and cumbersome code.
As library author, I want to help people make things simpler.
To understand me, please look at the pseudo C++ code below.

// A class designed to work with prepared statements
class Prepared_statement {
public:
  // Methods to generate a Bind message, like
  Prepared_statement* bind(Position, Value);
  // ... and more
  // Methods to send Execute message, like
  void execute();
  void execute_async();
};

class Connection {
public:
  // many stuff ...
  void close();

  Prepared_statement* prepare(Name, Query);
  void prepare_async(Statement);

  // Make yet another instance of prepared statement.
  Prepared_statement* prepared_statement(Name);

  // etc.
};

The Connection class is a factory for Prepared_statement instances.
As you can see, Connection::prepare() returns a new instance of a
*synchronously* prepared statement. Next, the user can bind values
and execute the statement, like this:

void f(Connection* cn)
{
  // Prepare unnamed statement and execute it.
  cn->prepare("SELECT $1::text")->bind(0, "Albe")->execute();
  // PS: don't worry about the absence of delete; we are using smart pointers :-)
}

But there is a another possible case:

void f(Connection* cn)
{
  Prepared_statement* ps = cn->prepare("SELECT $1::text");
  cn->close(); // THIS SHOULD invalidate all Prepared_statement instances ...
  ps->bind(0, "Albe"); // ... to throw the exception here
}

Moreover, consider:

void f(Connection* cn)
{
  Prepared_statement* ps1 = cn->prepare("ps1", "SELECT $1::text");
  cn->deallocate(ps1); // THIS invalidates ps1 object...
  ps1->bind(0, "Albe"); // ... to throw the exception here

  Prepared_statement* ps2 = cn->prepare("ps2", "SELECT $1::text");
  cn->perform("DEALLOCATE ps2"); // THIS SHOULD ALSO invalidate ps2 object...
  ps2->bind(0, "Albe"); // ... to throw the exception here
}

In the latter case, when the user deallocates a named prepared statement
directly, the implementation of Connection can invalidate the prepared
statement (ps2) by parsing the CommandComplete command tag to get its name.

And please note that the user can send DEALLOCATE asynchronously. There are
only two ways to get the name of the prepared statement (or another session
object):
  1) Parse the SQL command which the user attempts to send;
  2) Just get it from the CommandComplete command tag.

I believe that 1) is a 100% bad idea.

PS: this C++11 library is not publicly available yet, but I hope it will
be this year.

-- 
// Dmitriy.


Re: [HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Andres Freund
Hi,


On 2013-06-21 08:16:22 +, Albe Laurenz wrote:
 I want to draw attention to this thread on -general:
 camq5dgq4sujpbht2-9xlapasvknul2-bb0cpyci2fp+pfsf...@mail.gmail.com

There's also a bug reported for it:
#8237: e1uovmc-0007ft...@wrigleys.postgresql.org

 Would you concur that this is a bug?

Yes, I think it's pretty clearly a bug - Tom doesn't seem to think so,
though. If we can agree it is, the fix outlined over on -bugs seems
easy enough to implement...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]

2013-06-21 Thread Dean Rasheed
On 21 June 2013 05:01, David Fetter da...@fetter.org wrote:
 What tests do you think should be there that aren't?


I think I expected to see more tests related to some of the specific
code changes, such as

CREATE TABLE t AS SELECT * FROM generate_series(1,10) t(x);

-- Should fail (filter can't be used for non-aggregates)
SELECT abs(x) FILTER (WHERE x < 5) FROM t;

-- Should fail (filter clause can't contain aggregates)
SELECT array_agg(x) FILTER (WHERE x < AVG(x)) FROM t;

-- OK (aggregate in filter sub-query)
SELECT array_agg(x) FILTER (WHERE x < (SELECT AVG(x) FROM t)) FROM t;

-- OK (trivial sub-query)
SELECT array_agg(x) FILTER (WHERE (SELECT x < 5)) FROM t;

-- OK
SELECT array_agg(x) FILTER (WHERE (SELECT x < (SELECT AVG(x) FROM t))) FROM t;

-- OK
SELECT array_agg(x) FILTER (WHERE (SELECT (SELECT t.x < AVG(t2.x) FROM
t as t2))) FROM t;

-- Should fail
SELECT array_agg(x) FILTER (WHERE (SELECT (SELECT 5 < AVG(t.x) FROM t
as t2))) FROM t;

-- OK (filtered aggregate in window context)
SELECT x, AVG(x) OVER(ORDER BY x), AVG(x) FILTER (WHERE x < 5)
OVER(ORDER BY x) FROM t;

-- Should fail (non-aggregate window function with filter)
SELECT x, rank() OVER(ORDER BY x), rank() FILTER (WHERE x < 5)
OVER(ORDER BY x) FROM t;

-- View definition test
CREATE VIEW v AS SELECT array_agg(x) FILTER (WHERE (SELECT (SELECT t.x
< AVG(t2.x) FROM t as t2))) FROM t;
\d+ v

etc...

You could probably dream up better examples --- I just felt that the
current tests don't give much coverage of the code changes.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]

2013-06-21 Thread Dean Rasheed
On 21 June 2013 06:16, David Fetter da...@fetter.org wrote:
 On Fri, Jun 21, 2013 at 12:10:25AM -0400, Alvaro Herrera wrote:
 David Fetter escribió:
  On Thu, Jun 20, 2013 at 08:59:27PM +0100, Dean Rasheed wrote:

   In my testing of sub-queries in the FILTER clause (an extension to the
   spec), I was able to produce the following error:
 
  Per the spec,
 
  B) A <filter clause> shall not contain a <query expression>, a <window 
  function>, or an outer reference.

 If this is not allowed, I think there should be a clearer error message.
 What Dean showed is an internal (not user-facing) error message.

 Please find attached a patch which allows subqueries in the FILTER
 clause and adds regression testing for same.


Nice!

I should have pointed out that it was already working for some
sub-queries, just not all. It's good to see that it was basically just
a one-line fix, because I think it would have been a shame to not
support sub-queries.

I hope to take another look at it over the weekend.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Andres Freund
On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
 Please find an updated patch. The regression test rules have been
 updated, and all the comments are addressed.
 
 On Tue, Jun 18, 2013 at 6:35 PM, Andres Freund and...@2ndquadrant.com wrote:
  Hi,
 
  On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
  diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
  index c381f11..3a6342c 100644
  --- a/contrib/pg_upgrade/info.c
  +++ b/contrib/pg_upgrade/info.c
  @@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
  INSERT INTO 
  info_rels 
  SELECT 
  reltoastrelid 
  FROM info_rels i 
  JOIN pg_catalog.pg_class c 
  -ON 
  i.reloid = c.oid));
  +ON 
  i.reloid = c.oid 
  +AND 
  c.reltoastrelid != %u, InvalidOid));
PQclear(executeQueryOrDie(conn,
  INSERT INTO 
  info_rels 
  -   SELECT 
  reltoastidxid 
  -   FROM info_rels i 
  JOIN pg_catalog.pg_class c 
  -ON 
  i.reloid = c.oid));
  +   SELECT indexrelid 
  
  +   FROM pg_index 
  +   WHERE indrelid IN 
  (SELECT reltoastrelid 
  + FROM 
  pg_class 
  + WHERE 
  oid >= %u 
  +AND 
  reltoastrelid != %u),
  +   
  FirstNormalObjectId, InvalidOid));
 
  What's the idea behind the >= here?
 It is here to avoid fetching the toast relations of system tables. But
 I see your point, the inner query fetching the toast OIDs should do a
 join on the existing info_rels and not try to do a join on a plain
 pg_index, so changed this way.

I'd also rather not introduce knowledge about FirstNormalObjectId into
client applications... But you fixed it already.


/* Clean up. */
heap_freetuple(reltup1);
  @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relationtoastrel;
  - Oid toastidx;
charNewToastName[NAMEDATALEN];
  + ListCell   *lc;
  + int count = 0;
 
toastrel = 
  relation_open(newrel->rd_rel->reltoastrelid,
 
  AccessShareLock);
  - toastidx = toastrel->rd_rel->reltoastidxid;
  + RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
 
/* rename the toast table ... */
  @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	RenameRelationInternal(newrel->rd_rel->reltoastrelid,
   
  NewToastName, true);
 
  - /* ... and its index too */
  - snprintf(NewToastName, NAMEDATALEN, 
  "pg_toast_%u_index",
  -  OIDOldHeap);
  - RenameRelationInternal(toastidx,
  -
  NewToastName, true);
  + /* ... and its indexes too */
  + foreach(lc, toastrel->rd_indexlist)
  + {
  + /*
  +  * The first index keeps the former toast 
  name and the
  +  * following entries have a suffix appended.
  +  */
  + if (count == 0)
  + snprintf(NewToastName, NAMEDATALEN, 
  "pg_toast_%u_index",
  +  OIDOldHeap);
  + else
  + snprintf(NewToastName, NAMEDATALEN, 
  "pg_toast_%u_index_%d",
  +  OIDOldHeap, count);
  + RenameRelationInternal(lfirst_oid(lc),
  +

Re: [HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Albe Laurenz
Andres Freund wrote:
 On 2013-06-21 08:16:22 +, Albe Laurenz wrote:
  I want to draw attention to this thread on -general:
  camq5dgq4sujpbht2-9xlapasvknul2-bb0cpyci2fp+pfsf...@mail.gmail.com
 
 There's also a bug reported for it:
 #8237: e1uovmc-0007ft...@wrigleys.postgresql.org
 
  Would you concur that this is a bug?
 
 Yes, I think it's pretty clearly a bug - Tom doesn't seem to think so
 though. If we can agree it is, the fix outlined over on -bugs seems to
 be easily enough implemented...

I think that it is surprising behaviour.

If fixing the behaviour is undesirable, at least the documentation
should be fixed.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2013-06-21 Thread Dimitri Fontaine
Hi,

Peter Eisentraut pete...@gmx.net writes:
 I think the suggested emacs configuration snippets in
 src/tools/editors/emacs.samples no longer represent current best
 practices.  I have come up with some newer things that I'd like to
 propose for review.

Thanks for doing that!

 First, I propose adding a .dir-locals.el file to the top-level directory
 with basic emacs settings.  These get applied automatically.  This
 especially covers the particular tab and indentation settings that
 PostgreSQL uses.  With this, casual developers will not need to modify
 any of their emacs settings.

I've tested that on a new git clone and with the `emacs -q` command so
as not to load any of my local setup. While the indentation seemed ok,
the placement of the comments seems way off:

Compare what you see using those commands:

emacs -q src/backend/commands/extension.c
emacs -q -l ../emacs.samples src/backend/commands/extension.c

(When using Mac OS X, you might have to replace the 'emacs' binary
 location with /Applications/Emacs.app/Contents/MacOS/Emacs.)

I did also test on doc/src/sgml/extend.sgml and some Makefile, only with
using the emacs.samples file content though.

 With that, emacs.samples can be shrunk significantly.  The only real
 reason to keep is that that c-offsets-alist and (more dubiously)
 sgml-basic-offset cannot be set from .dir-locals.el because they are not
 safe.  I have also removed many of the redundant examples and settled
 on a hook-based solution.

A couple of notes about your emacs.sample file:

  - Name the lambda used in the hook for easier removing / reference

  - A fresh git clone will create a directory named postgres, so I did
change your /postgresql/ regex to /postgres/ in my attached version

 I think together this setup would be significantly simpler and more
 practical.

Agreed.
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

;; -*- mode: emacs-lisp -*-

;; This file contains code to set up Emacs to edit PostgreSQL source
;; code.  Copy these snippets into your .emacs file or equivalent, or
;; use load-file to load this file directly.
;;
;; Note also that there is a .dir-locals.el file at the top of the
;; PostgreSQL source tree, which contains many of the settings shown
;; here.  So for light editing, you might not need any additional
;; Emacs configuration.


;;; C files

;; Style that matches the formatting used by
;; src/tools/pgindent/pgindent.  Many extension projects also use this
;; style.
(c-add-style "postgresql"
             '("bsd"
               (c-basic-offset . 4)
               (c-offsets-alist . ((case-label . +)))
               (fill-column . 79)
               (indent-tabs-mode . t)
               (tab-width . 4)))

(add-hook 'c-mode-hook
          (defun postgresql-c-mode-hook ()
            (when (string-match "/postgres/" buffer-file-name)
              (c-set-style "postgresql"))))


;;; documentation files

(add-hook 'sgml-mode-hook
          (defun postgresql-sgml-mode-hook ()
            (when (string-match "/postgres/" buffer-file-name)
              (setq fill-column 79)
              (setq indent-tabs-mode nil)
              (setq sgml-basic-offset 1))))


;;; Makefiles

;; use GNU make mode instead of plain make mode
(add-to-list 'auto-mode-alist '("/postgres/.*Makefile.*" . makefile-gmake-mode))
(add-to-list 'auto-mode-alist '("/postgres/.*\\.mk\\'" . makefile-gmake-mode))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] refresh materialized view concurrently

2013-06-21 Thread Hitoshi Harada
On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote:

 Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for
 9.4 CF1.  The goal of this patch is to allow a refresh without
 interfering with concurrent reads, using transactional semantics.


I spent a few hours to review the patch.

As far as I can tell, the overall approach is as follows.

- create a new temp heap as the non-concurrent path does, but with ExclusiveLock on
the matview, so that readers aren't blocked
- with this temp table and the matview, query a FULL JOIN and extract the
difference between the original matview and the temp heap (via SPI)
- this operation requires a unique index, for performance reasons (or
correctness reasons too)
- run ANALYZE on this diff table
- run UPDATE, INSERT and DELETE via SPI, to do the merge
- close these temp heaps

If I don't miss something, the requirement for the CONCURRENTLY option is
to allow simple SELECT readers to read the matview concurrently while the
view is populating the new data, while INSERT/UPDATE/DELETE and SELECT FOR
UPDATE/SHARE are still blocked.  So, I wonder why it is not possible just
to acquire ExclusiveLock on the matview while populating the data and swap
the relfile by taking a short AccessExclusiveLock.  This lock escalation is
no deadlock hazard, I suppose, because concurrent operations would block
each other at the point ExclusiveLock is acquired, and ExclusiveLock
conflicts with AccessExclusiveLock.  Then you don't need the complicated SPI
logic or the unique key index dependency.

Assuming I'm asking something wrong and going for the current approach,
here's what I've found in the patch.

- I'm not sure if the unique key index requirement is reasonable or not,
because users only add the CONCURRENTLY option and are not asking for merging
or incremental update.
- This could be an overflow in the diffname buffer.
+ quoteRelationName(tempname, tempRel);
+ strcpy(diffname, tempname);
+ strcpy(diffname + strlen(diffname) - 1, "_2\"");
- As others pointed out, quoteOneName can be replaced with quote_identifier
- This looks harmless, but I wonder if you need to change relkind.

*** 665,672  make_new_heap(Oid OIDOldHeap, Oid NewTableSpace)
OldHeap->rd_rel->relowner,
OldHeapDesc,
NIL,
!   OldHeap->rd_rel->relkind,
!   OldHeap->rd_rel->relpersistence,
false,
RelationIsMapped(OldHeap),
true,
--- 680,687 
OldHeap->rd_rel->relowner,
OldHeapDesc,
NIL,
!   RELKIND_RELATION,
!   relpersistence,
false,
RelationIsMapped(OldHeap),
true,

Since OldHeap->rd_rel->relkind has been working with 'm', too, not only 'r'?
- I found the two additional parameters on make_new_heap ugly, but couldn't
come up with a better solution.  Maybe we can pass the Relation of the old
heap to the function instead of the Oid..

That's about it from me.


Thanks,
-- 
Hitoshi Harada


Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-21 Thread Hitoshi Harada
On Thu, Jun 20, 2013 at 7:24 PM, Craig Ringer cr...@2ndquadrant.com wrote:

 I've missed this feature more than once, and am curious about whether
 any more recent changes may have made it cleaner to tackle this, or
 whether consensus can be formed on adding the new entries to btree's
 opclass to avoid the undesirable explicit lookups of the '+' and '-'
 operators.




As far as I know, later development didn't add anything to help this
conversation.  I initially thought range types or KNN-GiST would add
something, but they turned out to be something else, far from this.  On the
other hand, if this makes it, it'll also open the door to range PARTITION BY
for the CREATE TABLE command, so the impact will be bigger than you may think.

I also later found that we are missing not only the notion of '+' or '-', but
also the notion of a 'zero value' in our catalog.  Per spec, RANGE BETWEEN
needs to raise an ERROR if the offset value is negative, but that is not
always easy if you think about interval or numeric types, as opposed to the
int64 used in ROWS BETWEEN.

Thanks,
-- 
Hitoshi Harada


Re: [HACKERS] Config reload/restart preview

2013-06-21 Thread Thom Brown
On 21 June 2013 05:47, Gurjeet Singh gurj...@singh.im wrote:

 On Thu, Jun 20, 2013 at 9:33 AM, Magnus Hagander mag...@hagander.net wrote:

 On Thu, Jun 20, 2013 at 2:54 PM, Dimitri Fontaine
 dimi...@2ndquadrant.fr wrote:
  Magnus Hagander mag...@hagander.net writes:
  Should we have a way of previewing changes that would be applied if we
  reloaded/restarted the server?
 
  Yes, we should.
 
  +1
 
  This would go well with something I started working on some time ago
  (but haven't actually gotten far on at all), which is the ability for
  pg_ctl to be able to give feedback at all. Meaning a pg_ctl reload
  should also be able to tell you which parameters were changed, without
  having to go to the log. Obviously that's almost exactly the same
  feature.
 
  It could probably connect to the server and issue the SQL command to
  reload, and that one could probably get enhanced to return modified
  variable as NOTICE, or be run with the right client_min_message:
 
  SELECT pg_reload_conf();
 
  The pg_ctl client would then have to know to display the messages sent
  back by the server.

 The problem with that is that now you must *always* have your system
 set up to allow the postgres user to log in in pg_hba.conf or it
 fails.

 But yes, one option would be to use SQL instead of opening a socket.
 Maybe that's a better idea - have pg_ctl try to use that if available,
 and if not send a regular signal and not try to collect the output.


 I started working on it yesterday after Thom proposed this idea internally
 at EDB. The discussion until now doesn't seem to be hostile to a SQL
 interface, so attached is a hack attempt, which hopefully will serve at
 least as a POC. A sample session is shown below, while changing a few
 values in postgresql.conf files.

 The central idea is to use the SIGHUP processing function to do the work
 for us and report potential changes via DEBUG2, instead of having to write
 a new parsing engine. The (GUC-nesting + PGC_S_TEST) trick is nice to have
 since it prevents the current session from adopting the values that are
 different in the conf file. This approach is susceptible to the fact that
 the connected superuser may have its GUC values picked up from
 user/database/session level settings (ALTER USER/DATABASE .. SET ; or SET
 param TO val;).

 $ pgsql
 Expanded display is used automatically.
 psql (9.4devel)
 Type "help" for help.

 postgres=# show work_mem;
  work_mem
 --
  1MB
 (1 row)

 postgres=# set client_min_messages = debug2;
 SET
 postgres=# select pg_test_reload_conf();
 DEBUG:  parameter "work_mem" changed to "70MB"
  pg_test_reload_conf
 -
  t
 (1 row)

 postgres=# show work_mem;
  work_mem
 --
  1MB
 (1 row)

 postgres=# select pg_test_reload_conf();
 DEBUG:  parameter "shared_buffers" cannot be changed without restarting
 the server
 DEBUG:  configuration file
 /home/gurjeet/dev/pgdbuilds/report_guc_chanege_pre_reload/db/data/postgresql.conf
 contains errors; unaffected changes were applied
  pg_test_reload_conf
 -
  t
 (1 row)

 postgres=# select pg_test_reload_conf();
 DEBUG:  parameter "log_min_messages" removed from configuration file,
 reset to default
  pg_test_reload_conf
 -
  t
 (1 row)


Thanks for taking a look at this Gurjeet.  This seems to be a step towards
it, but there are a few issues:

1) As you say, if the superuser has role-level settings, it will provide
incorrect feedback.

2) Settings that require a restart don't tell you what they are currently
set to and what they would be set to.  I'm not sure the currently-logged
text provided by a SIGHUP is necessarily appropriate for such a feature.

3) I'd expect that a DBA who goes editing postgresql.conf wouldn't then go
logging in to the database to run this command, if they even knew it
existed.  Whereas they would typically be familiar with
"/etc/init.d/postgresql action" or "pg_ctl action".  Now if the SQL
function were a medium to achieve this, that would be fine, but I'm not
clear on what would be required technically to achieve that kind of
intuitive, admin-friendly interface.  Or maybe there's a better method of
checking the config that I haven't considered.
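
A preview of this kind boils down to diffing the file on disk against the running settings. A toy sketch (hypothetical helper, nothing like pg_ctl's actual code, assuming a simple "name = value" conf format):

```python
def preview_reload(current, new_file_text):
    """Toy config-reload preview: report which settings a reload of a
    simple "name = value" conf file would change, without applying them.
    `current` models the running server's settings as a dict."""
    new = {}
    for line in new_file_text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blanks
        if "=" in line:
            name, _, value = line.partition("=")
            new[name.strip()] = value.strip()
    messages = []
    for name, value in sorted(new.items()):     # changed or newly set
        if current.get(name) != value:
            messages.append('parameter "%s" changed to "%s"' % (name, value))
    for name in sorted(set(current) - set(new)):  # dropped from the file
        messages.append('parameter "%s" removed from configuration file, '
                        'reset to default' % name)
    return messages

running = {"work_mem": "1MB", "log_min_messages": "debug2"}
conf = "work_mem = 70MB\nshared_buffers = 256MB\n"
msgs = preview_reload(running, conf)
```

Real settings can of course come from many sources (defaults, ALTER ROLE, the command line), which is exactly the complication discussed above.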

-- 
Thom


[HACKERS] Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Andres Freund
On 2013-06-20 22:36:45 -0400, Alvaro Herrera wrote:
 Noah Misch escribió:
  On Thu, Jun 20, 2013 at 12:33:25PM -0400, Alvaro Herrera wrote:
   MauMau escribió:
Here, reliable means that the database server is certainly shut
down when pg_ctl returns, not telling a lie that I shut down the
server processes for you, so you do not have to be worried that some
postgres process might still remain and write to disk.  I suppose
reliable shutdown is crucial especially in HA cluster.  If pg_ctl
stop -mi gets stuck forever when there is an unkillable process (in
what situations does this happen? OS bug, or NFS hard mount?), I
think the DBA has to notice this situation from the unfinished
pg_ctl, investigate the cause, and take corrective action.
   
   So you're suggesting that keeping postmaster up is a useful sign that
   the shutdown is not going well?  I'm not really sure about this.  What
   do others think?
  
  It would be valuable for "pg_ctl -w -m immediate stop" to have the property
  that a subsequent start attempt will not fail due to the presence of some
  backend still attached to shared memory.  (Maybe that's true anyway or can be
  achieved a better way; I have not investigated.)
 
 Well, the only case where a process that's been SIGKILLed does not go
 away, as far as I know, is when it is in some uninterruptible sleep due
 to in-kernel operations that get stuck.  Personally I have never seen
 this happen in any other case than some network filesystem getting
 disconnected, or a disk that doesn't respond.  And whenever the
 filesystem starts to respond again, the process gets out of its sleep
 only to die due to the signal.

Those are the situations in which it takes a really long time, yes. But
there can be timing issues involved. Consider a backend that's currently
stuck in a write() because it hit the dirtying limit.  Say you have a
postgres cluster that's currently slowing down to a crawl because it's
overloaded and hitting the dirty limit. Somebody very well might just
want to restart it with -m immediate. In that case a delay of a second
or two, till enough dirty memory has been written out that write() can
continue, is enough for the postmaster to start up again, try to attach
to shared memory, and find the shared memory still in use.
I don't really see any argument for *not* waiting. Sure, it might take a
bit longer till it's shut down, but if it didn't wait, that will cause
problems down the road.
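
The "send SIGKILL once, then wait for every child before exiting" pattern argued for here can be sketched in a few lines (a toy model, not postmaster code; `os.fork` stands in for backend processes):

```python
import os
import signal
import time

def kill_and_wait(pids):
    """Send SIGKILL to each child once, then block until every child has
    actually been reaped.  The parent does not go away until all children
    are truly gone, so a subsequent start cannot find stale processes
    still attached to shared resources."""
    for pid in pids:
        os.kill(pid, signal.SIGKILL)
    for pid in pids:
        os.waitpid(pid, 0)          # blocks until this child is reaped

pids = []
for _ in range(3):
    pid = os.fork()
    if pid == 0:                    # child: just sleep until killed
        time.sleep(60)
        os._exit(0)
    pids.append(pid)

kill_and_wait(pids)
# at this point every child has been reaped; none can linger behind us
```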

 If we leave postmaster running after SIGKILLing its children, the only
 thing we can do is have it continue to SIGKILL processes continuously
 every few seconds until they die (or just sit around doing nothing until
 they all die).  I don't think this will have a different effect than
 postmaster going away trusting the first SIGKILL to do its job
 eventually.

I think it should just wait till all its child processes are dead. No
need to repeat sending the signals - as you say, that won't help.



What we could do to improve the robustness a bit - at least on Linux -
is to call prctl(PR_SET_PDEATHSIG, SIGKILL), which would cause children to
be killed if the postmaster goes away...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] refresh materialized view concurrently

2013-06-21 Thread Hitoshi Harada
On Fri, Jun 21, 2013 at 2:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote:




 On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote:

 Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for
 9.4 CF1.  The goal of this patch is to allow a refresh without
 interfering with concurrent reads, using transactional semantics.


 I spent a few hours to review the patch.


Oh, BTW, though it is not part of this patch, but I came across this.

! /*
!  * We're not using materialized views in the system catalogs.
!  */
  Assert(!IsSystemRelation(matviewRel));

Of course we don't have a built-in matview on a system catalog, but it is
possible to create one with allow_system_table_mods=true, so the Assert
doesn't look good to me.

Thanks,
-- 
Hitoshi Harada


Re: [HACKERS] refresh materialized view concurrently

2013-06-21 Thread Andres Freund
On 2013-06-21 02:43:23 -0700, Hitoshi Harada wrote:
 On Fri, Jun 21, 2013 at 2:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
 
 
 
 
  On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote:
 
  Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for
  9.4 CF1.  The goal of this patch is to allow a refresh without
  interfering with concurrent reads, using transactional semantics.
 
 
  I spent a few hours to review the patch.
 
 
 Oh, BTW, though it is not part of this patch, but I came across this.
 
 ! /*
 !  * We're not using materialized views in the system catalogs.
 !  */
   Assert(!IsSystemRelation(matviewRel));
 
 Of course we don't have builtin matview on system catalog, but it is
 possible to create such one by allow_system_table_mods=true, so Assert
 doesn't look like good to me.

Anybody using allow_system_table_mods gets to keep the pieces. There are
so many ways to break just about everything using it that I don't
think worrying much makes sense.
If you want you could replace that with an elog(), but...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Add session_preload_libraries configuration parameter

2013-06-21 Thread Dimitri Fontaine
Hi,

Peter Eisentraut pete...@gmx.net writes:
 This is like shared_preload_libraries except that it takes effect at
 backend start and can be changed without a full postmaster restart.  It
 is like local_preload_libraries except that it is still only settable by
 a superuser.  This can be a better way to load modules such as
 auto_explain.

I had a pretty hard time getting my head around that new one, and I'm not
sure I totally did. The important things to me are:

  - No need to manually copy the lib into the plugin directory,
  - ALTER ROLE support.

So basically it's a very good solution for auto_explain and any other
module you want to load eagerly but not for everyone and when not using
shared memory.

I have a feeling that something simpler could be made, but I will have
to continue thinking about it.

I found it strange that these two paragraphs read differently while saying
the same thing:

 +preloaded at connection start.  This parameter cannot be changed 
 after
 +the start of a particular session.  If a specified library is not

 +The parameter value only takes effect at the start of the connection.
 +Subsequent changes have no effect.  If a specified library is not

I will review the code in more detail; I wanted to get back on the context
where this patch is useful first, and try to better understand the
trade-offs involved.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Pavel Stehule
2013/6/21 Andres Freund and...@2ndquadrant.com:
 Hi,


 On 2013-06-21 08:16:22 +, Albe Laurenz wrote:
 I want to draw attention to this thread on -general:
 camq5dgq4sujpbht2-9xlapasvknul2-bb0cpyci2fp+pfsf...@mail.gmail.com

 There's also a bug reported for it:
 #8237: e1uovmc-0007ft...@wrigleys.postgresql.org

 Would you concur that this is a bug?

 Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
 though. If we can agree it is, the fix outlined over on -bugs seems to
 be easily enough implemented...

The fix is not easy :-( You would have to catch any possible exception,
which means doing this kind of optimization under subtransactions; otherwise
you should not do this kind of optimization at all.

Pavel


 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services






Re: [HACKERS] Patch for removng unused targets

2013-06-21 Thread Hitoshi Harada
On Thu, Jun 20, 2013 at 12:19 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
 wrote:

  From: Hitoshi Harada [mailto:umi.tan...@gmail.com]

  I guess the patch works fine, but what I'm saying is it might be limited
  to small use cases.  Another instance of this that I can think of is the
  ORDER BY clause of window specifications, which you may want to remove
  from the target list as well, in addition to the ORDER BY of the query.
  It will just not be removed by this approach, simply because it is
  looking at only parse->sortClause.  Certainly you can add more rules to
  the new function to look at the window specification, but then I'm not
  sure what we are missing.

 Yeah, I thought about the extension to the window ORDER BY case, too.  But
 I'm not sure it's worth complicating the code, considering that the
 objective of this optimization is to improve full-text search related
 things, if I understand correctly, though general solutions would be
 desirable as you mentioned.


Ah, I see the use case now.  Makes sense.


  So, as it stands it doesn't have a critical issue, but a more generalized
  approach would be desirable.  That said, I don't have a strong objection
  to the current patch, and am just posting one thought to see if others
  may have the same opinion.

 OK.  I'll also wait for others' comments.  For review, an updated version
 of the patch is attached, which fixed the bug using the approach that
 directly uses the clause information in the parse tree.



I tried several ways but I couldn't find big problems.  Small typo:
s/rejunk/resjunk/

I defer to the committer.


Thanks,
-- 
Hitoshi Harada


Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-21 Thread Craig Ringer
On 06/21/2013 05:32 PM, Hitoshi Harada wrote:

 I also later found that we are missing not only notion of '+' or '-',
 but also notion of 'zero value' in our catalog.  Per spec, RANGE BETWEEN
 needs to detect ERROR if the offset value is negative, but it is not
 always easy if you think about interval, numeric types as opposed to
 int64 used in ROWS BETWEEN.

Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That
should make sense for any type in which the concept of zero makes sense.
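
The abs() trick carries over to any host language with ordered types. A quick check (using timedelta as a stand-in for interval and Decimal for numeric; my own illustration, not thread code):

```python
from datetime import timedelta
from decimal import Decimal

def is_non_negative(val):
    """The `val = abs(val)` test from above: a value is non-negative
    exactly when it equals its own absolute value, for any type that
    defines both comparison and abs() -- ints, Decimals, and even
    interval-like timedeltas."""
    return val == abs(val)

checks = [
    is_non_negative(5),                    # positive int
    is_non_negative(-5),                   # negative int
    is_non_negative(Decimal("0")),         # zero is not negative
    is_non_negative(timedelta(hours=-1)),  # negative "interval"
    is_non_negative(timedelta(0)),         # zero "interval"
]
```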

Thanks for the warning on that issue.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] trgm regex index peculiarity

2013-06-21 Thread Erik Rijkers
On Fri, June 21, 2013 05:25, Tom Lane wrote:
 Erik Rijkers e...@xs4all.nl writes:
 In a 112 MB test table (containing random generated text) with a trgm index 
 (gin_trgm_ops), I consistently get these
 timings:
 select txt from azjunk6 where txt ~ '^abcd';
130 ms
 select txt from azjunk6
 where txt ~ 'abcd' and substr(txt,1,4) = 'abcd';
3 ms

 Hm, could you provide a self-contained test case?


yes, sorry.   I tested on a 1M row table:

#!/bin/sh

# create table:
for power in 6;
do
  table=azjunk${power}
  index=${table}_trgm_re_idx
  perl -E'
  sub ss{ join"",@_[ map{rand @_} 1 .. shift ] };
  say(ss(80,"a".."g"," ","h".."m"," ","n".."s"," ","t".."z")) for 1 .. 1e'${power} \
  | psql -aqXc "
drop table if exists $table;
create table $table(txt text);
copy $table from stdin;";
  echo "set session maintenance_work_mem='1GB';
create index $index on $table using gin (txt gin_trgm_ops);
analyze $table;" | psql -qtAX;
done

# test:
echo "
\\timing on
explain analyze select txt from azjunk6 where txt ~ '^abcd';  -- slow (140 ms)
explain analyze select txt from azjunk6 where txt ~ 'abcd' and substr(txt,1,4)
= 'abcd'; -- fast (5 ms)
" | psql -Xqa







Re: [HACKERS] Patch for removng unused targets

2013-06-21 Thread Etsuro Fujita
 From: Hitoshi Harada [mailto:umi.tan...@gmail.com]

 I tried several ways but I couldn't find big problems.  Small typo:
 s/rejunk/resjunk/

Thank you for the review.  Attached is an updated version of the patch.

Thanks,

Best regards,
Etsuro Fujita


unused-targets-20130621.patch
Description: Binary data



Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-21 Thread Hitoshi Harada
On Fri, Jun 21, 2013 at 3:20 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 06/21/2013 05:32 PM, Hitoshi Harada wrote:

  I also later found that we are missing not only notion of '+' or '-',
  but also notion of 'zero value' in our catalog.  Per spec, RANGE BETWEEN
  needs to detect ERROR if the offset value is negative, but it is not
  always easy if you think about interval, numeric types as opposed to
  int64 used in ROWS BETWEEN.

 Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That
 should make sense for any type in which the concept of zero makes sense.


 Yeah, I mean, it needs to know if the offset is negative or not by testing
it against zero.  So we need a zero value or an is_negative function for
each type.

Thanks,
-- 
Hitoshi Harada


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Michael Paquier
On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
/* Clean up. */
heap_freetuple(reltup1);
  @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
  if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relationtoastrel;
  - Oid toastidx;
charNewToastName[NAMEDATALEN];
  + ListCell   *lc;
  + int count = 0;
 
toastrel = 
  relation_open(newrel->rd_rel->reltoastrelid,
 
  AccessShareLock);
  - toastidx = toastrel->rd_rel->reltoastidxid;
  + RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
 
/* rename the toast table ... */
  @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,

  RenameRelationInternal(newrel->rd_rel->reltoastrelid,
   
  NewToastName, true);
 
  - /* ... and its index too */
  - snprintf(NewToastName, NAMEDATALEN, 
  pg_toast_%u_index,
  -  OIDOldHeap);
  - RenameRelationInternal(toastidx,
  -
  NewToastName, true);
  + /* ... and its indexes too */
  + foreach(lc, toastrel->rd_indexlist)
  + {
  + /*
  +  * The first index keeps the former toast 
  name and the
  +  * following entries have a suffix appended.
  +  */
  + if (count == 0)
  + snprintf(NewToastName, NAMEDATALEN, 
  pg_toast_%u_index,
  +  OIDOldHeap);
  + else
  + snprintf(NewToastName, NAMEDATALEN, 
  pg_toast_%u_index_%d,
  +  OIDOldHeap, count);
  + RenameRelationInternal(lfirst_oid(lc),
  +
  NewToastName, true);
  + count++;
  + }
}
relation_close(newrel, NoLock);
}
 
  Is it actually possible to get here with multiple toast indexes?
 Actually it is possible. finish_heap_swap is also called, for example,
 in ALTER TABLE when rewriting the table (phase 3), so I think it is
 better to protect this code path this way.

 But why would we copy invalid toast indexes over to the new relation?
 Shouldn't the new relation have been freshly built in the previous
 steps?
What do you think about that? Would using only the first valid index be enough?


  diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
  index 8ac2549..31309ed 100644
  --- a/src/include/utils/relcache.h
  +++ b/src/include/utils/relcache.h
  @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
   typedef Relation *RelationPtr;
 
   /*
  + * RelationGetIndexListIfValid
  + * Get index list of relation without recomputing it.
  + */
  +#define RelationGetIndexListIfValid(rel) \
  +do { \
  + if (rel->rd_indexvalid == 0) \
  + RelationGetIndexList(rel); \
  +} while(0)
 
  Isn't this function misnamed and should be
  RelationGetIndexListIfInValid?
 When naming that, I had more in mind: get the list of indexes if it
 is already there. That looks more intuitive to my mind.

 I can't follow. RelationGetIndexListIfValid() doesn't return
 anything. And it doesn't do anything if the list is already valid. It
 only does something iff the list currently is invalid.
In this case RelationGetIndexListIfInvalid?
--
Michael




Re: [HACKERS] trgm regex index peculiarity

2013-06-21 Thread Alexander Korotkov
On Fri, Jun 21, 2013 at 2:40 PM, Erik Rijkers e...@xs4all.nl wrote:

 On Fri, June 21, 2013 05:25, Tom Lane wrote:
  Erik Rijkers e...@xs4all.nl writes:
  In a 112 MB test table (containing random generated text) with a trgm
 index (gin_trgm_ops), I consistently get these
  timings:
  select txt from azjunk6 where txt ~ '^abcd';
 130 ms
  select txt from azjunk6
  where txt ~ 'abcd' and substr(txt,1,4) = 'abcd';
 3 ms
 
  Hm, could you provide a self-contained test case?
 

 yes, sorry.   I tested on a 1M row table:

 #!/bin/sh

 # create table:
 for power in 6;
 do
   table=azjunk${power}
   index=${table}_trgm_re_idx
   perl -E'
   sub ss{ join,@_[ map{rand @_} 1 .. shift ] };
   say(ss(80,a..g, ,h..m, ,n..s, ,t..z)) for 1 ..
 1e'${power}; \
   | psql -aqXc 
 drop table if exists $table;
 create table $table(txt text);
 copy $table from stdin;;
   echo set session maintenance_work_mem='1GB';
 create index $index on $table using gin (txt gin_trgm_ops);
 analyze $table; | psql -qtAX;
 done

 # test:
 echo 
 \\timing on
 explain analyze select txt from azjunk6 where txt ~ '^abcd';  -- slow (140
 ms)
 explain analyze select txt from azjunk6 where txt ~ 'abcd' and
 substr(txt,1,4) = 'abcd'; -- fast (5 ms)
  | psql -Xqa


Regex '^abcd' will be expanded into trigrams '__a', '_ab', 'abc' and 'bcd'.
However, the trigram '__a' is much more frequent than '_ab', which in turn is
much more frequent than 'abc' and 'bcd'. Omitting the ^ leads to omitting
'__a' and '_ab', and that gives the significant speedup.
The problem is that the trigram regex engine doesn't know that '__a' and '_ab'
are more frequent than other trigrams, because we don't know their
distribution. However, the simple assumption that the blank character (in the
pg_trgm meaning) is much more frequent than others seems to be true for most
cases.
The attached patch implements this assumption. It introduces a
BLANK_COLOR_SIZE macro, which is used in selectColorTrigrams as the count of
characters in COLOR_BLANK. Another change in this patch is the split of
MAX_TRGM_COUNT into WISH_TRGM_SIZE and MAX_TRGM_SIZE. The idea is to keep
trying to remove color trigrams from the graph even when it fits into
MAX_TRGM_SIZE, because we intend to scan fewer posting lists/trees.
Comments are not fixed yet, coming soon.
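
The expansion rule described above can be modeled in a few lines (a toy model of the padding behavior, not pg_trgm's actual C code; '_' stands for the blank character):

```python
def prefix_trigrams(prefix, blank="_"):
    """Toy model of how an anchored prefix expands to trigrams: the word
    is padded with two blanks on the left (the anchored start of the
    string), and no trailing blank is added because the string continues
    after the prefix."""
    padded = blank * 2 + prefix
    # slide a 3-character window over the padded prefix
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

trigrams = prefix_trigrams("abcd")
# -> ['__a', '_ab', 'abc', 'bcd']: '__a' and '_ab' are the frequent,
# blank-heavy trigrams that make the '^abcd' index scan slow
```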

--
With best regards,
Alexander Korotkov.


trgm_regex_optimize.1.patch
Description: Binary data



Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls

2013-06-21 Thread Robert Haas
On Fri, Jun 21, 2013 at 12:18 AM, Jeff Davis pg...@j-davis.com wrote:
 On Thu, 2013-06-20 at 10:03 -0400, Robert Haas wrote:
 I think the question is whether this feature is really worth adding
 new reserved keywords for.  I have a hard time saying we shouldn't
 support something that's part of the SQL standard, but personally,
 it's not something I've seen come up prior to this thread.

 What's the next step here?

Well, ideally, some other people weigh in on the value of the feature
vs. the pain of reserving the keywords.

 The feature sounds useful to me.

...and there's one person with an opinion now!   :-)

The other question here is - do we actually have the grammar right?
As in, is this actually the syntax we're supposed to be implementing?
It looks different from what's shown here, where the IGNORE NULLS is
inside the function's parentheses, rather than afterwards:

http://rwijk.blogspot.com/2010/06/simulating-laglead-ignore-nulls.html

IBM seems to think it's legal either inside or outside the parentheses:

http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.sqls.doc%2Fids_sqs_2594.htm

Regardless of what syntax we settle on, we should also make sure that
the conflict is intrinsic to the grammar and can't be factored out, as
Tom suggested upthread.  It's not obvious to me what the actual
ambiguity is here.  If you've seen "select lag(num,0)" and the
lookahead token is "respect", what's the problem?  It sort of looks
like it could be a column label, but not even unreserved keywords can
be column labels, so that's not it.  Probably deserves a bit more
investigation...

 If the grammar is unacceptable, does
 someone have an alternative idea, like using new function names instead
 of grammar? If so, what are reasonable names to use?

We could just add additional, optional Boolean argument to the
existing functions.  It's non-standard, but we avoid adding keywords.
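
Whatever syntax wins, the intended semantics are simple to state. A sketch of LAG ... IGNORE NULLS over a list (my own illustration of the extra-boolean-argument idea, not any proposed patch code):

```python
def lag_ignore_nulls(values, offset=1):
    """Sketch of LAG(x, offset) ... IGNORE NULLS semantics: for each row,
    return the offset-th preceding non-NULL value, or None when there
    aren't enough preceding non-NULL values."""
    result, seen = [], []           # seen = non-null values so far
    for v in values:
        result.append(seen[-offset] if len(seen) >= offset else None)
        if v is not None:
            seen.append(v)          # NULLs never enter the lookback window
    return result

rows = [10, None, None, 20, 30]
lagged = lag_ignore_nulls(rows)
# -> [None, 10, 10, 10, 20]: the NULL rows are skipped when looking back,
# unlike plain LAG, which would return the literal previous row
```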

 Also, I think someone mentioned this already, but what about
 first_value() and last_value()? Shouldn't we do those at the same time?

Not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread MauMau

From: Alvaro Herrera alvhe...@2ndquadrant.com

MauMau escribió:


One concern is that umount would fail in such a situation because
postgres has some open files on the filesystem, which is on the
shared disk in case of traditional HA cluster.


See my reply to Noah.  If postmaster stays around, would this be any
different?  I don't think so.


I'll consider this a bit and respond as a reply to Andres-san's mail.



Actually, in further testing I noticed that the fast-path you introduced
in BackendCleanup (or was it HandleChildCrash?) in the immediate
shutdown case caused postmaster to fail to clean up properly after
sending the SIGKILL signal, so I had to remove that as well.  Was there
any reason for that?


You are talking about the code below at the beginning of HandleChildCrash(), 
aren't you?


/* Do nothing if the child terminated due to immediate shutdown */
if (Shutdown == ImmediateShutdown)
 return;

If my memory is correct, this was necessary; without this, 
HandleChildCrash() or LogChildExit() left extra messages in the server log.


LOG:  %s (PID %d) exited with exit code %d
LOG:  terminating any other active server processes

These messages are output because postmaster sends SIGQUIT to its children 
and waits for them to terminate.  The children exit with non-zero status when 
they receive SIGQUIT.



Regards
MauMau





Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-21 Thread Robert Haas
On Wed, Jun 19, 2013 at 2:42 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
 number of transactions actually processed: 301921
 Just a thought before spending too much time on this subtle issue.

 The patch worked reasonably for 301900 transactions in your above run, and
 the last few ones, fewer than the number of clients, show strange latency
 figures, which suggests that something is amiss in some corner case when
 pgbench is stopping. However, the point of pgbench is to test a steady
 state, not to achieve the cleanest stop at the end of a run.

 So my question is: should this issue be a blocker wrt to the feature?

I think so.  If it doesn't get fixed now, it's not likely to get fixed
later.  And the fact that nobody understands why it's happening is
kinda worrisome...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] trgm regex index peculiarity

2013-06-21 Thread Erik Rijkers
On Fri, June 21, 2013 15:11, Alexander Korotkov wrote:
 On Fri, Jun 21, 2013 at 2:40 PM, Erik Rijkers e...@xs4all.nl wrote:

 On Fri, June 21, 2013 05:25, Tom Lane wrote:
  Erik Rijkers e...@xs4all.nl writes:
  In a 112 MB test table (containing random generated text) with a trgm
 index (gin_trgm_ops), I consistently get these
  timings:
  select txt from azjunk6 where txt ~ '^abcd';
 130 ms
  select txt from azjunk6
  where txt ~ 'abcd' and substr(txt,1,4) = 'abcd';
 3 ms
 

 Regex '^abcd' will be expanded into trigrams '__a', '_ab', 'abc' and 'bcd'.
 However, trigram '__a' is much more frequent than '_ab', which in turn is
 much more frequent than 'abc' and 'bcd'. Omitting ^ leads to omitting
 '__a' and '_ab', and that gives the significant speedup.

 [trgm_regex_optimize.1.patch ]

Yes, that fixes the problem, thanks.

Erik Rijkers






Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Andres Freund
On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
 On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
   @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
   Is it actually possible to get here with multiple toast indexes?
  Actually it is possible. finish_heap_swap is also called, for example,
  by ALTER TABLE when rewriting the table (phase 3), so I think it is
  better to protect this code path this way.
 
  But why would we copy invalid toast indexes over to the new relation?
  Shouldn't the new relation have been freshly built in the previous
  steps?
 What do you think about that? Would using only the first valid index be
 enough?

What I am thinking about is the following: When we rewrite a relation,
we build a completely new toast relation. Which will only have one
index, right? So I don't see how this could be correct if we deal
with multiple indexes. In fact, the current patch's swap_relation_files
throws an error if there are multiple ones around.


   diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
   index 8ac2549..31309ed 100644
   --- a/src/include/utils/relcache.h
   +++ b/src/include/utils/relcache.h
   @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
  
/*
   + * RelationGetIndexListIfValid
   + * Get index list of relation without recomputing it.
   + */
   +#define RelationGetIndexListIfValid(rel) \
   +do { \
   + if (rel->rd_indexvalid == 0) \
   + RelationGetIndexList(rel); \
   +} while(0)
  
   Isn't this function misnamed and should be
   RelationGetIndexListIfInValid?
  When naming it, what I had in mind was: get the list of indexes if it
  is already there. That seemed more intuitive to me.
 
  I can't follow. RelationGetIndexListIfValid() doesn't return
  anything. And it doesn't do anything if the list is already valid. It
  only does something iff the list currently is invalid.
 In this case RelationGetIndexListIfInvalid?

Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?

Hm. Looking at how this is currently used - I am afraid it's not
correct... the reason RelationGetIndexList() returns a copy is that
cache invalidations will throw away that list. And you do index_open()
while iterating over it, which will accept invalidation messages.
Maybe it's better to try using RelationGetIndexList directly and measure
whether that has a measurable impact?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Noah Misch
On Fri, Jun 21, 2013 at 09:20:21AM +, Albe Laurenz wrote:
 Andres Freund wrote:
  Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
  though. If we can agree it is, the fix outlined over on -bugs seems to
  be easily enough implemented...

If you refer to this:

On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
 So it seems we need to stop processing after finding a single WHEN
 that's not const? Does anybody have a better idea?

eval_const_expressions() is not just an optimization: it performs mandatory
transformations such as the conversion of named-argument function calls to
positional function calls.  Even if you could skip it, queries with expensive
constant expressions would notice the performance loss.  The situations helped
by a change like this are too marginal to accept that cost.

Would it work to run eval_const_expressions() lazily on THEN clauses?  The
plan-time eval_const_expressions() would not descend to them.  The first
ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
before proceeding.

 I think that it is surprising behaviour.

That's about how I characterize it, too.

I question whether real applications care.  It's important to have CASE usable
for avoiding data-dependent errors, but what's the use of including in your
query an expression that can do nothing but throw an error?  Does anyone have
a real-world example?  Perhaps some generated-query scenario.

That being said, if we discover a simple-enough fix that performs well, we may
as well incorporate it.

 If fixing the behaviour is undesirable, at least the documentation
 should be fixed.

A brief documentation mention sounds fine.  Perhaps add a paragraph on
constant folding in general and reference that from the CASE page.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread MauMau

From: Alvaro Herrera alvhe...@2ndquadrant.com

Actually, I think it would be cleaner to have a new state in pmState,
namely PM_IMMED_SHUTDOWN which is entered when we send SIGQUIT.  When
we're in this state, postmaster is only waiting for the timeout to
expire; and when it does, it sends SIGKILL and exits.  Pretty much the
same you have, except that instead of checking AbortStartTime we check
the pmState variable.


Are you suggesting simplifying the following part in ServerLoop()?  I 
welcome the idea if this condition becomes simpler.  However, I cannot 
imagine how.


 if (AbortStartTime > 0 &&  /* SIGKILL only once */
  (Shutdown == ImmediateShutdown || (FatalError && !SendStop)) &&
  now - AbortStartTime >= 10)
 {
  SignalAllChildren(SIGKILL);
  AbortStartTime = 0;
 }

I did think of adding a new state to pmState (which might be the same as 
your idea).
But I refrained from doing so, because pmState already has many states.  I 
was afraid adding a new pmState value for this bug fix would complicate the 
state management (e.g. state transitions in PostmasterStateMachine()).  In 
addition, I felt PM_WAIT_BACKENDS was appropriate because postmaster is 
actually waiting for backends to terminate after sending SIGQUIT.  The state 
name is natural.


I don't have strong objection to your idea if it makes the code cleaner and 
more understandable.  Thank you very much.


Regards
MauMau





Re: [HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Andres Freund
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
 On Fri, Jun 21, 2013 at 09:20:21AM +, Albe Laurenz wrote:
  Andres Freund wrote:
   Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
   though. If we can agree it is, the fix outlined over on -bugs seems to
   be easily enough implemented...
 
 If you refer to this:
 
 On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
  So it seems we need to stop processing after finding a single WHEN
  that's not const? Does anybody have a better idea?
 
 eval_const_expressions() is not just an optimization: it performs mandatory
 transformations such as the conversion of named-argument function calls to
 positional function calls.

Ah yes. Forgot about that... Scrap that. Although it surely isn't nice
that all that is done in a function called eval_const_expressions()...

 Even if you could skip it, queries with expensive
 constant expressions would notice the performance loss.  The situations helped
 by a change like this are too marginal to accept that cost.

I have to say, that argument doesn't excite me much. It's not like we
don't want to do the constant expression evaluation at all anymore. Just
not inside CASE WHEN blocks which already are barring some optimizations
anyway...

 Would it work to run eval_const_expressions() lazily on THEN clauses?  The
 plan-time eval_const_expressions() would not descend to them.  The first
 ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
 before proceeding.

Ugh. Doesn't sound nice. I don't have any better ideas than to actually
split eval_const_expressions into one function that does the necessary
things like canonicalization of AND/OR and one that actually evaluates
expressions inside though.
So maybe that's the way to go :/

  I think that it is surprising behaviour.

 That's about how I characterize it, too.

 I question whether real applications care.  It's important to have CASE usable
 for avoiding data-dependent errors, but what's the use of including in your
 query an expression that can do nothing but throw an error?  Does anyone have
 a real-world example?  Perhaps some generated-query scenario.

It doesn't need to be an actual constant. Something that evaluates to
the value at plan time is enough:
PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
EXECUTE foo(0);

That example will most likely only crash in 9.2+ because it will
replan it with the actual parameter values in place. But you could have
the same in earlier versions, e.g. using PQexecParams(), but that's
harder to demonstrate.

Now, that example only crashes because one place uses (SELECT $1) and
the other doesn't, but...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Andres Freund
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
 That being said, if we discover a simple-enough fix that performs well, we may
 as well incorporate it.

What about passing another parameter down eval_const_expressions_mutator
(which is static, so changing the API isn't a problem) that basically
tells us whether we actually should evaluate expressions or just perform
the transformation?
There seem to be basically a couple of places where we call dangerous
things:
* simplify_function (via evaluate_function/evaluate_expr), which is
  called in a bunch of places
* evaluate_expr, which is called directly for T_ArrayExpr and
  T_ArrayCoerceExpr

All places I've inspected so far need to deal with simplify_function
returning NULL anyway, so that seems like a viable fix.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Pavel Stehule
2013/6/21 Andres Freund and...@2ndquadrant.com:
 On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
 On Fri, Jun 21, 2013 at 09:20:21AM +, Albe Laurenz wrote:
  Andres Freund wrote:
   Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
   though. If we can agree it is, the fix outlined over on -bugs seems to
   be easily enough implemented...

 If you refer to this:

 On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
  So it seems we need to stop processing after finding a single WHEN
  that's not const? Does anybody have a better idea?

 eval_const_expressions() is not just an optimization: it performs mandatory
 transformations such as the conversion of named-argument function calls to
 positional function calls.

 Ah yes. Forgot about that... Scrap that. Although it surely isn't nice
 that all that is done in a function called eval_const_expressions()...

 Even if you could skip it, queries with expensive
 constant expressions would notice the performance loss.  The situations 
 helped
 by a change like this are too marginal to accept that cost.

Yes, I dislike it too - then we would have inconsistent behavior of
constants between CASE and other statements.

We should do this without any performance loss, if we make changes in
this area.

Regards

Pavel


 I have to say, that argument doesn't excite me much. It's not like we
 don't want to do the constant expression evaluation at all anymore. Just
 not inside CASE WHEN blocks which already are barring some optimizations
 anyway...

 Would it work to run eval_const_expressions() lazily on THEN clauses?  The
 plan-time eval_const_expressions() would not descend to them.  The first
 ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
 before proceeding.

 Ugh. Doesn't sound nice. I don't have any better ideas than to actually
 split eval_const_expressions into one function that does the necessary
 things like canonicalization of AND/OR and one that actually evaluates
 expressions inside though.
 So maybe that's the way to go :/

  I think that it is surprising behaviour.

 That's about how I characterize it, too.

 I question whether real applications care.  It's important to have CASE 
 usable
 for avoiding data-dependent errors, but what's the use of including in your
 query an expression that can do nothing but throw an error?  Does anyone have
 a real-world example?  Perhaps some generated-query scenario.

 It doesn't need to be an actual constant. Something that evaluates to
 the value at plan time is enough:
 PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
 EXECUTE foo(0);

 That example will most likely only crash in 9.2+ because it will
 replan it with the actual parameter values in place. But you could have
 the same in earlier versions, e.g. using PQexecParams(), but that's
 harder to demonstrate.

 Now, that example only crashes because one place uses (SELECT $1) and
 the other doesn't, but...

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services






Re: [HACKERS] Possible bug in CASE evaluation

2013-06-21 Thread Noah Misch
On Fri, Jun 21, 2013 at 04:12:32PM +0200, Andres Freund wrote:
 On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
  On Fri, Jun 21, 2013 at 09:20:21AM +, Albe Laurenz wrote:

  Even if you could skip it, queries with expensive
  constant expressions would notice the performance loss.  The situations 
  helped
  by a change like this are too marginal to accept that cost.
 
 I have to say, that argument doesn't excite me much. It's not like we
 don't want to do the constant expression evaluation at all anymore. Just
 not inside CASE WHEN blocks which already are barring some optimizations
 anyway...

Sure, it's a narrow loss.  Before introducing a new narrow loss to fix an
existing one, we should consider which loss hurts more.  Offhand, I sympathize
with the folks who would lose performance more than with the folks who want to
write the sorts of expressions under consideration.

  Would it work to run eval_const_expressions() lazily on THEN clauses?  The
  plan-time eval_const_expressions() would not descend to them.  The first
  ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
  before proceeding.
 
 Ugh. Doesn't sound nice.

Would you elaborate?

  I question whether real applications care.  It's important to have CASE 
  usable
  for avoiding data-dependent errors, but what's the use of including in your
  query an expression that can do nothing but throw an error?  Does anyone 
  have
  a real-world example?  Perhaps some generated-query scenario.
 
 It doesn't need to be an actual constant. Something that evaluates to
 the value at plan time is enough:
 PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
 EXECUTE foo(0);

 Now, that example only crashes because one place uses (SELECT $1) and
 the other doesn't, but...

Not the real-world I was hoping for, but fair enough.

"Crash" in this context means "raise an error", right?

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-06-21 Thread Robert Haas
On Wed, Jun 19, 2013 at 1:59 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 6/7/13 12:14 AM, Amit Kapila wrote:
 I will change the patch as per below syntax if there are no objections:

  ALTER SYSTEM SET configuration_parameter {TO | =} {value, | 'value'};

 I do like using ALTER SYSTEM in general, but now that I think about it,
 the 9.3 feature to create global settings in pg_db_role_setting would
 also have been a candidate for exactly that same syntax if it had been
 available.  In fact, if we do add ALTER SYSTEM, it might make sense to
 recast that feature into that syntax.

 It might be clearer to do something like ALTER SYSTEM SET EXTERNAL FILE
 or something like that.  It's only a small syntax change, so don't worry
 about it too much, but let's keep thinking about it.

I think that anything we want to add should either go before SET or
after the value, such as:

ALTER SYSTEM SET configuration_parameter = 'value' IN FILE 'myconf.conf';
ALTER SYSTEM CONFIGURATION FILE SET configuration_parameter = 'value';
ALTER SYSTEM CONFIGURATION FILE 'myconf.conf' SET
configuration_parameter = 'value';

I agree we shouldn't back ourselves into a syntactic corner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add visibility map information to pg_freespace.

2013-06-21 Thread Robert Haas
On Wed, Jun 19, 2013 at 11:26 PM, Satoshi Nagayasu sn...@uptime.jp wrote:
 - pageinspect provides several functions for debugging purposes.
 - pg_freespace provides a view for monitoring purposes.
 - pgstattuple provides several functions for collecting
   specific table/index statistics.

I think we should be careful to think about upgrade considerations
when adding this functionality.  Bumping the module version number to
add new functions is less likely to break things for users than
changing the return value of an existing SRF.  Maybe that's too far
down in the weeds to worry about, but it's a thought - especially for
pg_freespace, where there's no real efficiency benefit to have the
same function look at the FSM and the VM anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Robert Haas
On Thu, Jun 20, 2013 at 12:33 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 I will go with 5 seconds, then.

I'm uncomfortable with this whole concept, and particularly with such
a short timeout.  On a very busy system, things can take a LOT longer
than we think they should; it can take 30 seconds or more just to get
a prompt back from a shell command.  5 seconds is the blink of an eye.

More generally, what do we think the point is of sending SIGQUIT
rather than SIGKILL in the first place, and why does that point cease
to be valid after 5 seconds?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls

2013-06-21 Thread Jeff Davis
On Fri, 2013-06-21 at 09:21 -0400, Robert Haas wrote:
 The other question here is - do we actually have the grammar right?
 As in, is this actually the syntax we're supposed to be implementing?
 It looks different from what's shown here, where the IGNORE NULLS is
 inside the function's parentheses, rather than afterwards:
 
 http://rwijk.blogspot.com/2010/06/simulating-laglead-ignore-nulls.html
 
 IBM seems to think it's legal either inside or outside the parentheses:
 
 http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.sqls.doc%2Fids_sqs_2594.htm

The spec seems pretty clear that it falls outside of the parentheses,
unless I am missing something.

 Regardless of what syntax we settle on, we should also make sure that
 the conflict is intrinsic to the grammar and can't be factored out, as
 Tom suggested upthread.  It's not obvious to me what the actual
 ambiguity is here.  If you've seen "select lag(num,0)" and the
 lookahead token is "respect", what's the problem?  It sort of looks
 like it could be a column label, but not even unreserved keywords can
 be column labels, so that's not it.  Probably deserves a bit more
 investigation...

I think the problem is when the function is used as a table function;
e.g.:

  SELECT * FROM generate_series(1,10) respect;

 We could just add additional, optional Boolean argument to the
 existing functions.  It's non-standard, but we avoid adding keywords.

I thought about that, but it is awkward because the argument would have
to be constant (or, if not, it would be quite strange).

Regards,
Jeff Davis






Re: [HACKERS] Why can't I use windowing functions over ordered aggregates?

2013-06-21 Thread Cédric Villemain
On Friday, June 21, 2013 at 03:32:33, Josh Berkus wrote:
 Hackers,
 
 So, I can create a custom aggregate first and do this:
 
 SELECT first(val order by ts desc) ...
 
 And I can do this:
 
 SELECT first_value(val) OVER (order by ts desc)
 
 ... but I can't do this:
 
 SELECT first_value(val order by ts desc)
 
 ... even though under the hood, it's the exact same operation.

First I'm not sure it is the same, in a window frame you have the notion of 
peer-rows (when you use ORDER BY).

And also, first_value is a *window* function, not a simple aggregate 
function...

See this example:
# create table foo (i int, t timestamptz);
# insert into foo select n, now() from generate_series(1,10) g(n);
# select i, first_value(i) over (order by t desc) from foo;
# select i, first_value(i) over (order by t desc ROWS between 0 PRECEDING and 
UNBOUNDED FOLLOWING) from foo;

What do you expect SELECT first(val order by ts desc) to output ?

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




[HACKERS] Hardware donation

2013-06-21 Thread Jim Nasby

We've got some recently decommissioned servers and Enova is willing to donate 2 
of them to the community.

There's nothing terribly spectacular about the servers except for memory. We 
have one 512G server available and the other would be either 192G or 96G. I 
know that folks already have access to machines with a lot of cores, but I 
haven't seen reports of large memory machines.

CPU details vary but we're only looking at 20ish cores (though AFAIK they're 
all 4 socket servers if that matters).

Local drives are nothing fancy (though some might possibly be SSD).
--
Jim Nasby, Lead Data Architect
(512) 569-9461 (primary) (512) 579-9024 (backup)




Re: [HACKERS] Hardware donation

2013-06-21 Thread Atri Sharma
On Fri, Jun 21, 2013 at 10:18 PM, Jim Nasby jna...@enova.com wrote:
 We've got some recently decommissioned servers and Enova is willing to
 donate 2 of them to the community.

 There's nothing terribly spectacular about the servers except for memory. We
 have one 512G server available and the other would be either 192G or 96G. I
 know that folks already have access to machines with a lot of cores, but I
 haven't seen reports of large memory machines.

 CPU details vary but we're only looking at 20ish cores (though AFAIK they're
 all 4 socket servers if that matters).

 Local drives are nothing fancy (though some might possibly be SSD).

Woot!

--
Regards,

Atri
l'apprenant




Re: [HACKERS] Hardware donation

2013-06-21 Thread Jim Nasby

I stand corrected... we don't have a 512G server available. We do have plenty 
of 192G and 96G servers though if 2 of those would be of use.

Sorry for the noise.

On 6/21/13 11:48 AM, Jim Nasby wrote:

We've got some recently decommissioned servers and Enova is willing to donate 2 
of them to the community.

There's nothing terribly spectacular about the servers except for memory. We 
have one 512G server available and the other would be either 192G or 96G. I 
know that folks already have access to machines with a lot of cores, but I 
haven't seen reports of large memory machines.

CPU details vary but we're only looking at 20ish cores (though AFAIK they're 
all 4 socket servers if that matters).

Local drives are nothing fancy (though some might possibly be SSD).


--
Jim Nasby, Lead Data Architect
(512) 569-9461 (primary) (512) 579-9024 (backup)




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-06-21 Thread Fujii Masao
On Sat, Jun 22, 2013 at 12:11 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 19, 2013 at 1:59 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 6/7/13 12:14 AM, Amit Kapila wrote:
 I will change the patch as per below syntax if there are no objections:

  ALTER SYSTEM SET configuration_parameter {TO | =} {value, | 'value'};

 I do like using ALTER SYSTEM in general, but now that I think about it,
 the 9.3 feature to create global settings in pg_db_role_setting would
 also have been a candidate for exactly that same syntax if it had been
 available.  In fact, if we do add ALTER SYSTEM, it might make sense to
 recast that feature into that syntax.

 It might be clearer to do something like ALTER SYSTEM SET EXTERNAL FILE
 or something like that.  It's only a small syntax change, so don't worry
 about it too much, but let's keep thinking about it.

 I think that anything we want to add should either go before SET or
 after the value, such as:

 ALTER SYSTEM SET configuration_parameter = 'value' IN FILE 'myconf.conf';
 ALTER SYSTEM CONFIGURATION FILE SET configuration_parameter = 'value';
 ALTER SYSTEM CONFIGURATION FILE 'myconf.conf' SET
 configuration_parameter = 'value';

 I agree we shouldn't back ourselves into a syntactic corner.

Maybe this idea has already been discussed, but why don't we just
add a function like update_config_file(parameter, value)? We can
commit the core of the patch with that function API first, and then
discuss the syntax and add another API like ALTER SYSTEM later.

We already have the set_config() function to set a parameter,
so adding a function for this feature should not be a surprise.
If the name of the function is, for example, update_conf_file,
most users would find it intuitive even if SIGHUP is not sent
automatically and immediately. We wouldn't need to emit a message
like 'This setting change will not be loaded until SIGHUP'.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Why can't I use windowing functions over ordered aggregates?

2013-06-21 Thread David Johnston
Cédric Villemain wrote
 And also, first_value is a *window* function, not a simple aggregate 
 function...

Per the documentation any aggregate function can be used with a WINDOW
declaration.  The logical question is why are window aggregates special so
that the reverse cannot be true?  In other words why is not every function
simply defined as a normal aggregate that can be used in both contexts?


 See this example:
 # create table foo (i int, t timestamptz);
 # insert into foo select n, now() from generate_series(1,10) g(n);
 # select i, first_value(i) over (order by t desc) from foo;
 # select i, first_value(i) over (order by t desc ROWS between 0 PRECEDING
 and 
 UNBOUNDED FOLLOWING) from foo;
 
 What do you expect SELECT first(val order by ts desc) to output ? 

Undefined due to incorrect specificity of the ORDER BY definition.  The
window version has the same issue.

The window aggregates should simply treat the entire input set as the
relevant frame - basically the same output as would result from
(simplistically):

SELECT window_agg(...)
FROM (
SELECT id, window_agg(...) OVER (ORDER BY id ASC)  ORDER BY id ASC
) agg
ORDER BY id DESC LIMIT 1

Admittedly this really only makes sense for first_value, last_value, and
nth_value; the other window aggregates can return valid values but to have
meaning they really need to be output in a windowing context.

David J.









Re: [HACKERS] Why can't I use windowing functions over ordered aggregates?

2013-06-21 Thread Josh Berkus
Cedric,

 See this example:
 # create table foo (i int, t timestamptz);
 # insert into foo select n, now() from generate_series(1,10) g(n);
 # select i, first_value(i) over (order by t desc) from foo;
 # select i, first_value(i) over (order by t desc ROWS between 0 PRECEDING and 
 UNBOUNDED FOLLOWING) from foo;
 
 What do you expect SELECT first(val order by ts desc) to output ?
 

Ah, right, I see what you mean.  Yeah, I was doing queries without peer
rows, so it looked the same to me, and it uses some of the same
machinery.  But of course it's not completely the same.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Review [was Re: [HACKERS] MD5 aggregate]

2013-06-21 Thread David Fetter
On Mon, Jun 17, 2013 at 11:34:52AM +0100, Dean Rasheed wrote:
 On 15 June 2013 10:22, Dean Rasheed dean.a.rash...@gmail.com wrote:
  There seem to be 2 separate directions that this could go, which
  really meet different requirements:
 
  1). Produce an unordered sum for SQL to compare 2 tables regardless of
  the order in which they are scanned. A possible approach to this might
  be something like an aggregate
 
  md5_total(text/bytea) returns text
 
  that returns the sum of the md5 values of each input value, treating
  each md5 value as an unsigned 128-bit integer, and then producing the
  hexadecimal representation of the final sum. This should out-perform a
  solution based on numeric addition, and in typical cases, the result
  wouldn't be much longer than a regular md5 sum, and so would be easy
  to eyeball for differences.
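To make the order-independence concrete, here is a rough Python model of the
proposed aggregate (illustrative only, not the patch's actual C
implementation, which also differs in how it renders the final sum):

```python
import hashlib

def md5_total(rows):
    # Sum each row's MD5 digest as an unsigned 128-bit integer; addition
    # is commutative, so the result does not depend on row order.
    total = 0
    for row in rows:
        total += int.from_bytes(hashlib.md5(row.encode()).digest(), "big")
    return format(total, "x")

# Same rows in a different order give the same "sum of hashes".
print(md5_total(["a", "b", "c"]) == md5_total(["c", "a", "b"]))  # True
```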
 
 
 I've been playing around with the idea of an aggregate that computes
 the sum of the md5 hashes of each of its inputs, which I've called
 md5_total() for now, although I'm not particularly wedded to that
 name. Comparing it with md5_agg() on a 100M row table (see attached
 test script) produces interesting results:
 
 SELECT md5_agg(foo.*::text)
   FROM (SELECT * FROM foo ORDER BY id) foo;
 
  50bc42127fb9b028c9708248f835ed8f
 
 Time: 92960.021 ms
 
 SELECT md5_total(foo.*::text) FROM foo;
 
  02faea7fafee4d253fc94cfae031afc43c03479c
 
 Time: 96190.343 ms
 
 Unlike md5_agg(), it is no longer a true MD5 sum (for one thing, its
 result is longer) but it seems like it would be very useful for
 quickly comparing data in SQL, since its value is not dependent on the
 row-order making it easier to use and better performing if there is no
 usable index for ordering.
 
 Note, however, that if there is an index that can be used for
 ordering, the performance is not necessarily better than md5_agg(), as
 this example shows. There is a small additional overhead per row for
 initialising the MD5 sums, and adding the results to the total, but I
 think the biggest factor is that md5_total() is processing more data.
 The reason is that MD5 works on 64-byte blocks, so the total amount of
 data going through the core MD5 algorithm is each row's size
 rounded up to a multiple of 64. In this simple case it ends up
 processing around 1.5 times as much data:
 
 SELECT sum(length(foo.*::text)) AS md5_agg,
sum(((length(foo.*::text)+63)/64)*64) AS md5_total FROM foo;
 
   md5_agg   |  md5_total
 +-
  8103815438 | 12799909248
 
 although of course that overhead won't be as large on wider tables,
 and even in this case the overall performance is still on a par with
 md5_agg().
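That ~1.5x ratio can be reproduced from the row lengths alone; a simplified
model (which, like the query above, ignores MD5's own >= 9 bytes of
padding-plus-length overhead per message):

```python
def md5_core_bytes(row_len):
    # When each row is hashed separately, it is padded up to the next
    # 64-byte block boundary before going through the MD5 core.
    return ((row_len + 63) // 64) * 64

row_lengths = [81, 81, 81]  # the test table averages ~81 bytes per row
agg_bytes = sum(row_lengths)  # md5_agg: one running hash over all rows
total_bytes = sum(md5_core_bytes(n) for n in row_lengths)  # md5_total
print(round(total_bytes / agg_bytes, 2))  # 1.58, matching the ~1.5x above
```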
 
 ISTM that both aggregates are potentially useful in different
 situations. I would probably typically use md5_total() because of its
 simplicity/order-independence and consistent performance, but
 md5_agg() might also be useful when comparing with external data.
 
 Regards,
 Dean

Submission review:

Is the patch in a patch format which has context? (eg: context diff format)

Yes.

Does it apply cleanly to the current git master?

Yes.

Does it include reasonable tests, necessary doc patches, etc? 

Yes.

Usability review:

Does the patch actually implement that?

Yes.

Do we want that?

Enough do.

Do we already have it?

No.

Does it follow SQL spec, or the community-agreed behavior?

No, and yes, respectively.

Does it include pg_dump support (if applicable)?

N/A

Are there dangers?

Patch changes the MD5 implementation, which could
theoretically result in backward incompatibility if the
changes are not 100% backward-compatible.

Have all the bases been covered? 

Yes.

Feature test:

Does the feature work as advertised?

Yes.

Are there corner cases the author has failed to consider?

Not that I've found so far.

Are there any assertion failures or crashes?

No.

Performance review (skills needed: ability to time performance)

Does the patch slow down simple tests?

Yes.  For an MD5 checksum of some random data, here are
results from master:

shackle@postgres:5493=# WITH t1 AS (SELECT 
string_agg(chr(floor(255*random()+1)::int),'') AS a FROM 
generate_series(1,1)),
postgres-# t2 AS (SELECT a FROM t1 CROSS JOIN 
generate_series(1,1))
postgres-# select md5(a::text) FROM t2;
shackle@postgres:5493=# \timing 
Timing is on.
shackle@postgres:5493=# \g
Time: 955.393 ms
shackle@postgres:5493=# \g
Time: 950.909 ms
shackle@postgres:5493=# \g
Time: 947.955 ms
shackle@postgres:5493=# \g
Time: 946.193 ms
shackle@postgres:5493=# \g
Time: 950.591 ms
shackle@postgres:5493=# \g
Time: 952.346 ms
  

[HACKERS] Unaccent performance

2013-06-21 Thread Thom Brown
Hi,

The unaccent extension is great, especially with its customisability, but
it's not always easy to recommend.  I witnessed a customer using no less
than 56 nested replace functions in an SQL function.  I looked to see how
much this can be mitigated by unaccent.  It turns out that not all the
characters they were replacing can be replaced by unaccent, either because
they replace more than 1 character at a time, or the character they're
replacing, for some reason, isn't processed by unaccent, even with a custom
rules file.

So there were 20 characters I could identify that they were replacing.  I
made a custom rules file and compared its performance to the
difficult-to-manage set of nested replace calls.

CREATE OR REPLACE FUNCTION public.myunaccent(sometext text)
 RETURNS text
 LANGUAGE sql
 IMMUTABLE
AS $function$
SELECT
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(sometext,'ą','a'),'Ą','A'),'ă','a'),'Ă','A'),'ā','a'),'Ā','A'),'æ','a'),'å','a'),'ä','a'),'ã','a'),'â','a'),'á','a'),'à','a'),'Æ','A'),'Å','A'),'Ä','A'),'Ã','A'),'Â','A'),'Á','A'),'À','A')
;
$function$

postgres=# SELECT myunaccent(sometext::text) FROM (SELECT
'ÀÁÂÃÄÅÆàáâãäåæĀāĂ㥹' sometext FROM generate_series(1,100)) x OFFSET
99 LIMIT 1;
  myunaccent
--
 AAAaaaAaAaAa
(1 row)

Time: 726.282 ms
postgres=# SELECT unaccent(sometext::text) FROM (SELECT
'ÀÁÂÃÄÅÆàáâãäåæĀāĂ㥹' sometext FROM generate_series(1,100)) x OFFSET
99 LIMIT 1;
   unaccent
--
 AAAaaaAaAaAa
(1 row)

Time: 3305.252 ms

The timings are actually pretty much the same even if I introduce 187
nested replace calls, one for each line in the unaccent.rules file (187
characters). But with unaccent, the same character set increases to 7418.526
ms with the same type of query as above.  That's 10 times more expensive.
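For what it's worth, single-character substitutions like these can be
modeled as one table-driven pass over the string instead of repeated
rescans, in the same spirit as unaccent's rules lookup; an illustrative
Python sketch with an abbreviated character table:

```python
# One-pass substitution: each character is looked up once in a table,
# instead of rescanning the whole string for every nested replace().
pairs = {"À": "A", "Á": "A", "Â": "A", "à": "a", "á": "a", "â": "a"}
table = str.maketrans(pairs)

print("ÀÁÂàáâ".translate(table))  # AAAaaa
```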

Is there a way to boost the performance to make its adoption more palatable?

-- 
Thom


Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls

2013-06-21 Thread Robert Haas
On Fri, Jun 21, 2013 at 11:33 AM, Jeff Davis pg...@j-davis.com wrote:
 Regardless of what syntax we settle on, we should also make sure that
 the conflict is intrinsic to the grammar and can't be factored out, as
 Tom suggested upthread.  It's not obvious to me what the actual
 ambiguity is here.  If you've seen select lag(num,0) and the
 lookahead token is respect, what's the problem?  It sort of looks
 like it could be a column label, but not even unreserved keywords can
 be column labels, so that's not it.  Probably deserves a bit more
 investigation...

 I think the problem is when the function is used as a table function;
 e.g.:

   SELECT * FROM generate_series(1,10) respect;

Ah ha.  Well, there's probably not much help for that.  Disallowing
keywords as table aliases would be a cure worse than the disease, I
think.  I suppose the good news is that there probably aren't many
people using RESPECT as a column name, but I have a feeling that we're
almost certain to get complaints about reserving IGNORE.  I think that
will have to be quoted as a PL/pgsql variable name as well.  :-(

 We could just add additional, optional Boolean argument to the
 existing functions.  It's non-standard, but we avoid adding keywords.

 I thought about that, but it is awkward because the argument would have
 to be constant (or, if not, it would be quite strange).

True... but e.g. string_agg() has the same issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-06-21 Thread Robert Haas
On Fri, Jun 21, 2013 at 12:56 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Jun 22, 2013 at 12:11 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 19, 2013 at 1:59 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 6/7/13 12:14 AM, Amit Kapila wrote:
 I will change the patch as per below syntax if there are no objections:

  ALTER SYSTEM SET configuration_parameter {TO | =} {value, | 'value'};

 I do like using ALTER SYSTEM in general, but now that I think about it,
 the 9.3 feature to create global settings in pg_db_role_setting would
 also have been a candidate for exactly that same syntax if it had been
 available.  In fact, if we do add ALTER SYSTEM, it might make sense to
 recast that feature into that syntax.

 It might be clearer to do something like ALTER SYSTEM SET EXTERNAL FILE
 or something like that.  It's only a small syntax change, so don't worry
 about it too much, but let's keep thinking about it.

 I think that anything we want to add should either go before SET or
 after the value, such as:

 ALTER SYSTEM SET configuration_parameter = 'value' IN FILE 'myconf.conf';
 ALTER SYSTEM CONFIGURATION FILE SET configuration_parameter = 'file';
 ALTER SYSTEM CONFIGURATION FILE 'myconf.conf' SET
 configuration_parameter = 'file';

 I agree we shouldn't back ourselves into a syntactic corner.

 Maybe this idea has been already discussed, but why don't we just
 add the function like update_config_file(parameter, value)? We can
 commit the core of the patch with that function API first, and then
 we can discuss the syntax and add another API like ALTER SYSTEM
 later.

 We now have set_config() function to set the parameter,
 so adding the function for this feature should not be a surprise.
 If the name of the function is, for example, update_conf_file,
 most users would think that it's intuitive even if SIGHUP is not
 automatically sent immediately. We don't need to emit
 the message like 'This setting change will not be loaded until SIGHUP'.

I could certainly support that plan.  I'm satisfied with the ALTER
SYSTEM syntax and feel that might find other plausible uses, but
functional notation would be OK with me, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-06-21 Thread Robert Haas
On Thu, Jun 20, 2013 at 12:18 AM, Amit Kapila amit.kap...@huawei.com wrote:
 auto.conf - 1 vote (Josh)
 system.auto.conf - 1 vote (Josh)
 postgresql.auto.conf - 2 votes (Zoltan, Amit)
 persistent.auto.conf - 0 votes
 generated_by_server.conf - 1 vote (Peter E)
 system.conf - 1 vote (Magnus)
 alter_system.conf - 1 vote (Alvaro)

 I had consolidated the list, so that it would be helpful for committer to
 decide among proposed names for this file.

I'll also vote for postgresql.auto.conf.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hardware donation

2013-06-21 Thread Josh Berkus
On 06/21/2013 09:48 AM, Jim Nasby wrote:
 We've got some recently decommissioned servers and Enova is willing to
 donate 2 of them to the community.
 
 There's nothing terribly spectacular about the servers except for
 memory. We have one 512G server available and the other would be either
 192G or 96G. I know that folks already have access to machines with a
 lot of cores, but I haven't seen reports of large memory machines.
 
 CPU details vary but we're only looking at 20ish cores (though AFAIK
 they're all 4 socket servers if that matters).
 
 Local drives are nothing fancy (though some might possibly be SSD).

I'm sure we could use these for the performance test farm.  If we need
to replace some of the drives, the community has money for that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Hardware donation

2013-06-21 Thread Joshua D. Drake


On 06/21/2013 09:48 AM, Jim Nasby wrote:


We've got some recently decommissioned servers and Enova is willing to
donate 2 of them to the community.

There's nothing terribly spectacular about the servers except for
memory. We have one 512G server available and the other would be either
192G or 96G. I know that folks already have access to machines with a
lot of cores, but I haven't seen reports of large memory machines.

CPU details vary but we're only looking at 20ish cores (though AFAIK
they're all 4 socket servers if that matters).

Local drives are nothing fancy (though some might possibly be SSD).


A couple 192G machines to put in the performance lab would be nice, 
especially if they have SSD.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats




Re: [HACKERS] XLogInsert scaling, revisited

2013-06-21 Thread Jeff Janes
On Tue, Jun 18, 2013 at 11:28 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 18.06.2013 21:17, Jeff Janes wrote:

 Hi Heikki,

 I am getting conflicts applying version 22 of this patch to 9.4dev.  Could
 you rebase?


 Here you go.


I think I'm getting an undetected deadlock between the checkpointer and a
user process running a TRUNCATE command.

This is the checkpointer:

#0  0x003a73eeaf37 in semop () from /lib64/libc.so.6
#1  0x005ff847 in PGSemaphoreLock (sema=0x7f8c0a4eb730,
interruptOK=0 '\000') at pg_sema.c:415
#2  0x004b0abf in WaitOnSlot (upto=416178159648) at xlog.c:1775
#3  WaitXLogInsertionsToFinish (upto=416178159648) at xlog.c:2086
#4  0x004b657a in CopyXLogRecordToWAL (write_len=32, isLogSwitch=1
'\001', rdata=0x0, StartPos=<value optimized out>, EndPos=416192397312)
at xlog.c:1389
#5  0x004b6fb2 in XLogInsert (rmid=0 '\000', info=<value optimized
out>, rdata=0x7fff0020) at xlog.c:1209
#6  0x004b7644 in RequestXLogSwitch () at xlog.c:8748
#7  0x00611be3 in CheckArchiveTimeout () at checkpointer.c:622
#8  0x00611d97 in CheckpointWriteDelay (flags=<value optimized
out>, progress=<value optimized out>) at checkpointer.c:705
#9  0x0062ec5a in BufferSync (flags=64) at bufmgr.c:1322
#10 CheckPointBuffers (flags=64) at bufmgr.c:1828
#11 0x004b1393 in CheckPointGuts (checkPointRedo=416178159592,
flags=64) at xlog.c:8365
#12 0x004b8ff3 in CreateCheckPoint (flags=64) at xlog.c:8148
#13 0x006121c3 in CheckpointerMain () at checkpointer.c:502
#14 0x004c4c4a in AuxiliaryProcessMain (argc=2,
argv=0x7fff21c4a5d0) at bootstrap.c:439
#15 0x0060a68c in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:4954
#16 0x0060d1ea in reaper (postgres_signal_arg=<value optimized
out>) at postmaster.c:2571
#17 signal handler called
#18 0x003a73ee14d3 in __select_nocancel () from /lib64/libc.so.6
#19 0x0060efee in ServerLoop (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1537
#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1246
#21 0x005ad4e0 in main (argc=3, argv=0x179fd00) at main.c:196


And this is the TRUNCATE command.

#0  0x003a73eeaf37 in semop () from /lib64/libc.so.6
#1  0x005ff847 in PGSemaphoreLock (sema=0x7f8c0a4ea8d0,
interruptOK=0 '\000') at pg_sema.c:415
#2  0x004b002d in WALInsertSlotAcquireOne (slotno=-1) at xlog.c:1667
#3  0x004b6d5d in XLogInsert (rmid=0 '\000', info=<value optimized
out>, rdata=0x7fff21c4a5e0) at xlog.c:1115
#4  0x004b8abc in XLogPutNextOid (nextOid=67198981) at xlog.c:8704
#5  0x004a3bc2 in GetNewObjectId () at varsup.c:495
#6  0x004c5195 in GetNewRelFileNode (reltablespace=<value optimized
out>, pg_class=0x0, relpersistence=<value optimized out>) at catalog.c:437
#7  0x0070f52d in RelationSetNewRelfilenode
(relation=0x7f8c019cb2a0, freezeXid=2144367079, minmulti=1) at
relcache.c:2758
#8  0x0055de61 in ExecuteTruncate (stmt=<value optimized out>) at
tablecmds.c:1163
#9  0x00656080 in standard_ProcessUtility (parsetree=0x2058900,
queryString=<value optimized out>, context=<value optimized out>,
params=0x0,
dest=<value optimized out>, completionTag=<value optimized out>) at
utility.c:552
#10 0x00652a87 in PortalRunUtility (portal=0x17bf510,
utilityStmt=0x2058900, isTopLevel=1 '\001', dest=0x2058c40,
completionTag=0x7fff21c4a9a0 "")
at pquery.c:1187
#11 0x006539fd in PortalRunMulti (portal=0x17bf510, isTopLevel=1
'\001', dest=0x2058c40, altdest=0x2058c40, completionTag=0x7fff21c4a9a0 "")
at pquery.c:1318
#12 0x006540b3 in PortalRun (portal=0x17bf510,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x2058c40,
altdest=0x2058c40,
completionTag=0x7fff21c4a9a0 "") at pquery.c:816
#13 0x00650944 in exec_simple_query (query_string=0x2057e90
"truncate foo") at postgres.c:1048
#14 0x00651fe9 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x1fc9e98 "jjanes", username=<value
optimized out>)
at postgres.c:3985
#15 0x0060f80b in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:3987
#16 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3676
#17 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1577
#18 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1246
#19 0x005ad4e0 in main (argc=3, argv=0x179fd00) at main.c:196

This is using the same testing harness as in the last round of this patch.

Is there a way for me to dump the list of held/waiting lwlocks from gdb?

Cheers,

Jeff


Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 More generally, what do we think the point is of sending SIGQUIT
 rather than SIGKILL in the first place, and why does that point cease
 to be valid after 5 seconds?

Well, mostly it's about telling the client we're committing hara-kiri.
Without that, there's no very good reason to run quickdie() at all.

A practical issue with starting to send SIGKILL ourselves is that we
will no longer be able to reflexively diagnose "server process died
on signal 9" as "the linux OOM killer got you".  I'm not at all
convinced that the cases where SIGQUIT doesn't work are sufficiently
common to justify losing that property.

regards, tom lane




Re: [HACKERS] Hardware donation

2013-06-21 Thread Jim Nasby

On 6/21/13 1:45 PM, Josh Berkus wrote:

On 06/21/2013 09:48 AM, Jim Nasby wrote:

We've got some recently decommissioned servers and Enova is willing to
donate 2 of them to the community.

There's nothing terribly spectacular about the servers except for
memory. We have one 512G server available and the other would be either
192G or 96G. I know that folks already have access to machines with a
lot of cores, but I haven't seen reports of large memory machines.

CPU details vary but we're only looking at 20ish cores (though AFAIK
they're all 4 socket servers if that matters).

Local drives are nothing fancy (though some might possibly be SSD).


I'm sure we could use these for the performance test farm.  If we need
to replace some of the drives, the community has money for that.


We might actually have some spare SSDs floating around; I'm checking. We're 
also thinking we might be able to get at least one of these up to 256G by 
swapping memory around.

Am I correct that the most valuable thing to the community is the large
memory size?

Who can be point of contact from the community to arrange shipping, etc?
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Hardware donation

2013-06-21 Thread Mark Wong
On Fri, Jun 21, 2013 at 12:03 PM, Jim Nasby j...@nasby.net wrote:
 On 6/21/13 1:45 PM, Josh Berkus wrote:

 On 06/21/2013 09:48 AM, Jim Nasby wrote:

 We've got some recently decommissioned servers and Enova is willing to
 donate 2 of them to the community.

 There's nothing terribly spectacular about the servers except for
 memory. We have one 512G server available and the other would be either
 192G or 96G. I know that folks already have access to machines with a
 lot of cores, but I haven't seen reports of large memory machines.

 CPU details vary but we're only looking at 20ish cores (though AFAIK
 they're all 4 socket servers if that matters).

 Local drives are nothing fancy (though some might possibly be SSD).


 I'm sure we could use these for the performance test farm.  If we need
 to replace some of the drives, the community has money for that.


 We might actually have some spare SSDs floating around; I'm checking. We're
 also thinking we might be able to get at least one of these up to 256G by
 swapping memory around.

 Am I correct that the most valuable thing to the community is the large
 memory size?

Yeah, I believe it's memory and storage.

 Who can be point of contact from the community to arrange shipping, etc?

I can be.

Regards,
Mark




Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-21 Thread ian link
Forgive my ignorance, but I don't entirely understand the problem. What
do '+' and '-' refer to, exactly?
Thanks!


On Fri, Jun 21, 2013 at 4:35 AM, Hitoshi Harada umi.tan...@gmail.com wrote:




 On Fri, Jun 21, 2013 at 3:20 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 06/21/2013 05:32 PM, Hitoshi Harada wrote:

  I also later found that we are missing not only notion of '+' or '-',
  but also notion of 'zero value' in our catalog.  Per spec, RANGE BETWEEN
  needs to detect ERROR if the offset value is negative, but it is not
  always easy if you think about interval, numeric types as opposed to
  int64 used in ROWS BETWEEN.

 Zero can be tested for with `val = (@ val)`, i.e. `val = abs(val)`. That
 should make sense for any type in which the concept of zero makes sense.


 Yeah, I mean, it needs to know if offset is negative or not by testing
 with zero.  So we need zero value or is_negative function for each type.
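Sketching Craig's type-agnostic test in Python (the real implementation
would go through each datatype's abs/@ operator rather than a built-in):

```python
def check_frame_offset(val):
    # Per the SQL spec, a RANGE BETWEEN offset must not be negative.
    # Testing val == abs(val) detects negativity without needing a
    # per-type "zero" constant in the catalog.
    if val != abs(val):
        raise ValueError("frame offset must not be negative")
    return val

check_frame_offset(5)    # fine for ints
check_frame_offset(2.5)  # works for any type supporting abs()
```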

 Thanks,
 --
 Hitoshi Harada



Re: [HACKERS] GIN improvements part2: fast scan

2013-06-21 Thread Heikki Linnakangas

On 19.06.2013 11:56, Alexander Korotkov wrote:

On Wed, Jun 19, 2013 at 12:49 PM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:


On 19.06.2013 11:30, Alexander Korotkov wrote:


On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas
hlinnakan...@vmware.com   wrote:

  On 18.06.2013 23:59, Alexander Korotkov wrote:


I would like to illustrate that on an example. Imagine you have the
fulltext query rare_term & frequent_term. The frequent term has a large
posting tree while the rare term has only a small posting list containing
iptr1, iptr2 and iptr3. At first we get iptr1 from the posting list of the
rare term; then we would like to check whether we have to scan the part of
the frequent term's posting tree where iptr < iptr1. So we call
pre_consistent([false, true]), because we know that the rare term is not
present for iptr < iptr2. pre_consistent returns false, so we can start
scanning the frequent term's posting tree from iptr1. Similarly we can
skip the gaps between iptr1 and iptr2, iptr2 and iptr3, and from iptr3 to
the maximum possible pointer.



Thanks, now I understand the rare-term & frequent-term problem. Couldn't
you do that with the existing consistent function? I don't see why you
need the new pre-consistent function for this.



In the case of two entries I can. But in the case of n entries things
become more complicated. Imagine you have a term_1 & term_2 & ... & term_n
query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is consistent function you have to call it with following check
arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
..
i.e. you have to call it 2^(n-1) times. But if you know the query specific
(i.e. in opclass) it's typically easy to calculate exactly what we need in
single pass. That's why I introduced pre_consistent.



Hmm. So how does that work with the pre-consistent function? Don't you
need to call that 2^(n-1)-1 times as well?



I call pre-consistent once with [false, true, true, ..., true].
Pre-consistent knows that each true passed to it could be false positive.
So, if it returns false it guarantees that consistent will be false for all
possible combinations.
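A toy Python model of that contrast for an AND-style query (not the actual
opclass code):

```python
from itertools import product

def consistent(check):
    # Toy opclass for a "term_1 & term_2 & ... & term_n" query:
    # the row matches only if every entry is present.
    return all(check)

def pre_consistent(check):
    # Entries passed as True may be false positives; a False result must
    # therefore hold for every way of downgrading those Trues to False.
    # For an AND query, one definite False already settles it.
    return all(check)

n = 4
known = (False,)  # term_1 is known to be absent at this item pointer
# Without pre_consistent: try every combination of the n-1 unknowns,
# i.e. 2^(n-1) consistent() calls.
exhaustive = any(consistent(known + rest)
                 for rest in product([False, True], repeat=n - 1))
# With pre_consistent: a single call, unknowns optimistically set to True.
single = pre_consistent(known + (True,) * (n - 1))
print(exhaustive, single)  # False False: safe to skip this item pointer
```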


Ok, I see.

I spent some time pondering this. I'd like to find a way to do something 
about this without requiring another user-defined function. A couple of 
observations:


1. The profile of that rare-term & frequent-term query, without any
patch, looks like this:


28,55%  postgres   ginCompareItemPointers
19,36%  postgres   keyGetItem
15,20%  postgres   scanGetItem
 7,75%  postgres   checkcondition_gin
 6,25%  postgres   gin_tsquery_consistent
 4,34%  postgres   TS_execute
 3,85%  postgres   callConsistentFn
 3,64%  postgres   FunctionCall8Coll
 3,19%  postgres   check_stack_depth
 2,60%  postgres   entryGetNextItem
 1,35%  postgres   entryGetItem
 1,25%  postgres   MemoryContextReset
 1,12%  postgres   MemoryContextSwitchTo
 0,31%  libc-2.17.so   __memcpy_ssse3_back
 0,24%  postgres   collectMatchesForHeapRow

I was quite surprised by seeing ginCompareItemPointers at the top. It 
turns out that it's relatively expensive to do comparisons in the format 
we keep item pointers, packed in 6 bytes, in 3 int16s. I hacked together 
a patch to convert ItemPointers into uint64s, when dealing with them in 
memory. That helped quite a bit.
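The packing trick can be sketched as follows (layout assumed from
ItemPointerData's 32-bit block number and 16-bit offset number):

```python
def pack_item_pointer(block, offset):
    # Combine the 32-bit block number and 16-bit offset number into one
    # integer, so a comparison becomes a single 64-bit compare instead
    # of up to three int16 comparisons.
    return (block << 16) | offset

# Ordering is preserved: a higher block always wins, then the offset.
print(pack_item_pointer(2, 1) > pack_item_pointer(1, 0xFFFF))  # True
```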


Another important thing in the above profile is that calling 
consistent-function is taking up a lot of resources. And in the example 
test case you gave, it's called with the same arguments every time. 
Caching the result of consistent-function would be a big win.


I wrote a quick patch to do that caching, and together with the 
itempointer hack, I was able to halve the runtime of that test case. 
That's impressive, we probably should pursue that low-hanging fruit, but 
it's still slower than your fast scan patch by a factor of 100x. So 
clearly we do need an algorithmic improvement here, along the lines of 
your patch, or something similar.


2. There's one trick we could do even without the pre-consistent 
function, that would help the particular test case you gave. Suppose 
that you have two search terms A and B.  If you have just called 
consistent on a row that matched term A, but not term B, you can skip 
any subsequent rows in the scan that match A but not B. That means that 
you can skip over to the next row that matches B. This is essentially 
the same thing you do with the pre-consistent function, it's just a 
degenerate case of it. That helps as long as the search contains only 
one frequent term, but if it contains multiple, then you have to still 
stop at every row that matches more than one of the frequent terms.


3. I'm still not totally convinced that we shouldn't just build the 

Re: Review [was Re: [HACKERS] MD5 aggregate]

2013-06-21 Thread David Fetter
On Fri, Jun 21, 2013 at 10:48:35AM -0700, David Fetter wrote:
 On Mon, Jun 17, 2013 at 11:34:52AM +0100, Dean Rasheed wrote:
  On 15 June 2013 10:22, Dean Rasheed dean.a.rash...@gmail.com wrote:
   There seem to be 2 separate directions that this could go, which
   really meet different requirements:
  
   1). Produce an unordered sum for SQL to compare 2 tables regardless of
   the order in which they are scanned. A possible approach to this might
   be something like an aggregate
  
   md5_total(text/bytea) returns text
  
   that returns the sum of the md5 values of each input value, treating
   each md5 value as an unsigned 128-bit integer, and then producing the
   hexadecimal representation of the final sum. This should out-perform a
   solution based on numeric addition, and in typical cases, the result
   wouldn't be much longer than a regular md5 sum, and so would be easy
   to eyeball for differences.
  
  
  I've been playing around with the idea of an aggregate that computes
  the sum of the md5 hashes of each of its inputs, which I've called
  md5_total() for now, although I'm not particularly wedded to that
  name. Comparing it with md5_agg() on a 100M row table (see attached
  test script) produces interesting results:
  
  SELECT md5_agg(foo.*::text)
FROM (SELECT * FROM foo ORDER BY id) foo;
  
   50bc42127fb9b028c9708248f835ed8f
  
  Time: 92960.021 ms
  
  SELECT md5_total(foo.*::text) FROM foo;
  
   02faea7fafee4d253fc94cfae031afc43c03479c
  
  Time: 96190.343 ms
  
  Unlike md5_agg(), it is no longer a true MD5 sum (for one thing, its
  result is longer) but it seems like it would be very useful for
  quickly comparing data in SQL, since its value is not dependent on the
  row-order making it easier to use and better performing if there is no
  usable index for ordering.
  
  Note, however, that if there is an index that can be used for
  ordering, the performance is not necessarily better than md5_agg(), as
  this example shows. There is a small additional overhead per row for
  initialising the MD5 sums, and adding the results to the total, but I
  think the biggest factor is that md5_total() is processing more data.
  The reason is that MD5 works on 64-byte blocks, so the total amount of
  data going through the core MD5 algorithm is each row's size is
  rounded up to a multiple of 64. In this simple case it ends up
  processing around 1.5 times as much data:
  
  SELECT sum(length(foo.*::text)) AS md5_agg,
 sum(((length(foo.*::text)+63)/64)*64) AS md5_total FROM foo;
  
md5_agg   |  md5_total
  +-
   8103815438 | 12799909248
  
  although of course that overhead won't be as large on wider tables,
  and even in this case the overall performance is still on a par with
  md5_agg().
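The per-row rounding can be sketched in Python (`padded_size` is a hypothetical helper; it uses the same approximation as the query above and ignores MD5's own finalization padding):

```python
def padded_size(row_len: int, block: int = 64) -> int:
    # round a row's text length up to a whole number of MD5 blocks
    return ((row_len + block - 1) // block) * block

rows = [10, 64, 65, 100]                          # example per-row text lengths
total_raw = sum(rows)                             # what md5_agg() hashes overall
total_padded = sum(padded_size(n) for n in rows)  # what md5_total() hashes
assert (total_raw, total_padded) == (239, 384)
```

Narrow rows hashed individually thus cost noticeably more than the same bytes fed through a single MD5 stream.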
  
  ISTM that both aggregates are potentially useful in different
  situations. I would probably typically use md5_total() because of its
  simplicity/order-independence and consistent performance, but
  md5_agg() might also be useful when comparing with external data.
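As a rough illustration of the order-independence property, summing per-row MD5 digests as integers gives the same result for any row order (this is only a sketch of the idea; the aggregate's actual internals may differ):

```python
import hashlib

def md5_sum_total(rows):
    # sum each row's 128-bit MD5 digest as an integer; addition is
    # commutative, so the row order cannot affect the result
    total = sum(int(hashlib.md5(r.encode()).hexdigest(), 16) for r in rows)
    return format(total, 'x')

a = md5_sum_total(["row1", "row2", "row3"])
b = md5_sum_total(["row3", "row1", "row2"])
assert a == b  # same rows, any order -> same checksum
```

Note that, as the email says, the sum can exceed 128 bits, so the result is longer than a regular MD5 hex string.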
  
  Regards,
  Dean

 Performance review (skills needed: ability to time performance)
 
 Does the patch slow down simple tests?
 
 Yes.  For an MD5 checksum of some random data, here are
 results from master:
 
 shackle@postgres:5493=# WITH t1 AS (SELECT 
 string_agg(chr(floor(255*random()+1)::int),'') AS a FROM 
 generate_series(1,1)),
 postgres-# t2 AS (SELECT a FROM t1 CROSS JOIN 
 generate_series(1,1))
 postgres-# select md5(a::text) FROM t2;
 shackle@postgres:5493=# \timing 
 Timing is on.
 shackle@postgres:5493=# \g
 Time: 955.393 ms
 shackle@postgres:5493=# \g
 Time: 950.909 ms
 shackle@postgres:5493=# \g
 Time: 947.955 ms
 shackle@postgres:5493=# \g
 Time: 946.193 ms
 shackle@postgres:5493=# \g
 Time: 950.591 ms
 shackle@postgres:5493=# \g
 Time: 952.346 ms
 shackle@postgres:5493=# \g
 Time: 948.623 ms
 shackle@postgres:5493=# \g
 Time: 939.819 ms
 
 and here from master + the patch:
 
 WITH t1 AS (SELECT string_agg(chr(floor(255*random()+1)::int),'') 
 AS a FROM generate_series(1,1)),
 t2 AS (SELECT a FROM t1 CROSS JOIN generate_series(1,1))
 select md5(a::text) FROM t2;
 Time: 1129.178 ms
 shackle@postgres:5494=# \g
 Time: 1134.013 ms
 shackle@postgres:5494=# \g
 Time: 1124.387 ms
 shackle@postgres:5494=# \g
 Time: 1119.733 ms
 shackle@postgres:5494=# \g
 Time: 1104.906 ms
 shackle@postgres:5494=# \g
 Time: 1137.055 ms
 shackle@postgres:5494=# \g
 Time: 1128.977 ms
 shackle@postgres:5494=# \g

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Robert Haas
On Fri, Jun 21, 2013 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 More generally, what do we think the point is of sending SIGQUIT
 rather than SIGKILL in the first place, and why does that point cease
 to be valid after 5 seconds?

 Well, mostly it's about telling the client we're committing hara-kiri.
 Without that, there's no very good reason to run quickdie() at all.

That's what I thought, too.  It seems to me that if we think that's
important, then it's important even if it takes more than 5 seconds
for some reason.

 A practical issue with starting to send SIGKILL ourselves is that we
 will no longer be able to reflexively diagnose "server process died
 on signal 9" as "the Linux OOM killer got you".  I'm not at all
 convinced that the cases where SIGQUIT doesn't work are sufficiently
 common to justify losing that property.

I'm not, either.  Maybe this question will provoke many indignant
responses, but who in their right mind even uses immediate shutdown on
a production server with any regularity?  The shutdown checkpoint is
sometimes painfully long, but do you really want to run recovery just
to avoid it?  And in the rare case where an immediate shutdown fails
to work, what's wrong with killall -9 postgres?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hardware donation

2013-06-21 Thread Josh Berkus

 Who can be point of contact from the community to arrange shipping, etc?
 
 I can be.

And I'll handle the tax credit once the servers are received.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Alvaro Herrera
Andres Freund escribió:
 On 2013-06-20 22:36:45 -0400, Alvaro Herrera wrote:

  If we leave postmaster running after SIGKILLing its children, the only
  thing we can do is have it continue to SIGKILL processes continuously
  every few seconds until they die (or just sit around doing nothing until
  they all die).  I don't think this will have a different effect than
  postmaster going away trusting the first SIGKILL to do its job
  eventually.
 
 I think it should just wait till all its child processes are dead. No
 need to repeat sending the signals - as you say, that won't help.

OK, I can buy that.  So postmaster stays around waiting in ServerLoop
until all children are gone; and if any persists for whatever reason,
well, tough.

 What we could do to improve the robustness a bit - at least on linux -
 is to prctl(PR_SET_PDEATHSIG, SIGKILL) which would cause children to be
 killed if the postmaster goes away...

This is an interesting idea (even if it has no relationship to the patch
at hand).

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Christopher Browne
The case where I wanted routine shutdown immediate (and I'm not sure I
ever actually got it) was when we were using IBM HA/CMP, where I wanted a
terminate with a fair bit of prejudice.

If we know we want to switch right away now, immediate seemed pretty much
right.  I was fine with interrupting user sessions, and there wasn't as
much going on in the way of system background stuff back then.

I wasn't keen on waiting on much of anything.  The background writer ought
to be keeping things from being too desperately out of date.

If there's stuff worth waiting a few seconds for, I'm all ears.

But if I have to wait arbitrarily long, colour me unhappy.

If I have to distinguish, myself, between a checkpoint nearly done flushing
and a backend that's stuck waiting forlornly for filesystem access, I'm
inclined to kill -9 and hope recovery doesn't take *too* long on the next
node...

If shutting a server down in an emergency situation requires a DBA to look
in, as opposed to init.d doing its thing, I think that's pretty much the
same problem too.


[HACKERS] Fixed Cardinality estimation with equality predicates between column of the same table

2013-06-21 Thread desmodemone
Hi all,
  I see a strange behavior (to me, at least) on 9.2 (it seems the same on
9.1 and 9.3) of the optimizer on queries like this:

/* create a table with random data and 2 rows */

create table test1 ( id int not null primary key, state1 int not null
default 0, state2 int not null default 0, state3 int not null default 0 );

 insert into test1 (id, state1, state2, state3) select i,
(random()*3)::int, (random())::int, (random()*100)::int from
generate_series (1, 2) as gs(i) ;

analyze test1 ;

/* between  same columns  */

explain  select * from test1 where state1=state1 ;
QUERY PLAN
--
 Seq Scan on test1  (cost=0.00..359.00 rows=100 width=16)
   Filter: (state1 = state1)
(2 rows)

test3=# explain  select * from test1 where state2=state2 ;
QUERY PLAN
--
 Seq Scan on test1  (cost=0.00..359.00 rows=100 width=16)
   Filter: (state2 = state2)
(2 rows)

/* between different columns of same table  */

test3=# explain  select * from test1 where state1=state2 ;
QUERY PLAN
--
 Seq Scan on test1  (cost=0.00..359.00 rows=100 width=16)
   Filter: (state1 = state2)
(2 rows)

===

/* create a table with random data and 10 rows to verify  */

create table test2 ( id int not null primary key, state1 int not null
default 0, state2 int not null default 0, state3 int not null default 0 );

 insert into test2 (id, state1, state2, state3) select i,
(random()*3)::int, (random())::int, (random()*100)::int from
generate_series (1, 10) as gs(i) ;

test3=#  analyze  test2 ;
ANALYZE
test3=# explain  select * from test2 where state1=state3;
QUERY PLAN
---
 Seq Scan on test2  (cost=0.00..1791.00 rows=500 width=16)
   Filter: (state1 = state3)
(2 rows)

test3=# explain  select * from test2 where state1=state2;
QUERY PLAN
---
 Seq Scan on test2  (cost=0.00..1791.00 rows=500 width=16)
   Filter: (state1 = state2)
(2 rows)

test3=# explain  select * from test2 where state1=state1;
QUERY PLAN
---
 Seq Scan on test2  (cost=0.00..1791.00 rows=500 width=16)
   Filter: (state1 = state1)
(2 rows)



It seems to always be 0.5% of the rows, and it seems independent of the
type of data you have in the row:

/* add a column with a constant value, named c3 */

 alter table test1 add c3 int default 1 ;
ALTER TABLE

analyze test1 ;
ANALYZE

explain  select * from test1  where state1=c3;
QUERY PLAN
--
 Seq Scan on test1  (cost=0.00..378.00 rows=100 width=20)
   Filter: (state1 = c3)
(2 rows)

/* add a column with a constant value, named c3 */

 alter table test2 add c3 int default 1 ;
ALTER TABLE
 analyze test2 ;
ANALYZE
 explain  select * from test2  where state1=c3;
QUERY PLAN
---
 Seq Scan on test2  (cost=0.00..1887.00 rows=500 width=20)
   Filter: (state1 = c3)
(2 rows)

/* add another constant column */

test3=# alter table test2 add c4 int default 1 ;
ALTER TABLE
test3=# analyze test2 ;
ANALYZE
test3=# explain  select * from test2  where c3=c4 ;
QUERY PLAN
---
 Seq Scan on test2  (cost=0.00..1887.00 rows=500 width=24)
   Filter: (c3 = c4)

Obviously the statistics are OK:



Always 0.5%.
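For what it's worth, 0.5% matches DEFAULT_EQ_SEL (0.005), the planner's fallback selectivity for an equality clause it cannot estimate from statistics, which is what a column-to-column comparison within one table falls back to. A quick arithmetic sketch, with the table sizes inferred from the row estimates in the plans above:

```python
# DEFAULT_EQ_SEL is the planner's fallback selectivity for an equality
# predicate it has no statistics-based estimate for.
DEFAULT_EQ_SEL = 0.005

def estimated_rows(reltuples: int) -> int:
    # planner's row estimate = table size * selectivity
    return round(reltuples * DEFAULT_EQ_SEL)

# Table sizes below are inferred from the plans (rows=100 and rows=500),
# not stated directly in the garbled table-creation statements.
assert estimated_rows(20_000) == 100    # matches the test1 plans
assert estimated_rows(100_000) == 500   # matches the test2 plans
```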

Greetings

Matteo


Re: [HACKERS] Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Andres Freund escribió:
 What we could do to improve the robustness a bit - at least on linux -
 is to prctl(PR_SET_PDEATHSIG, SIGKILL) which would cause children to be
 killed if the postmaster goes away...

 This is an interesting idea (even if it has no relationship to the patch
 at hand).

The traditional theory has been that that would be less robust, not
more so.  Child backends are (mostly) able to carry out queries whether
or not the postmaster is around.  True, you can't make new connections,
but how does killing the existing children make that better?

regards, tom lane




Re: [HACKERS] Unaccent performance

2013-06-21 Thread Thom Brown
On 21 June 2013 19:04, Thom Brown t...@linux.com wrote:

 Hi,

 The unaccent extension is great, especially with its customisability, but
 it's not always easy to recommend.  I witnessed a customer using no less
 than 56 nested replace functions in an SQL function.  I looked to see how
 much this can be mitigated by unaccent.  It turns out that not all the
 characters they were replacing can be replaced by unaccent, either because
 they replace more than 1 character at a time, or the character they're
 replacing, for some reason, isn't processed by unaccent, even with a custom
 rules file.

 So there were 20 characters I could identify that they were replacing.  I
 made a custom rules file and compared its performance to the
 difficult-to-manage set of nested replace calls.

 CREATE OR REPLACE FUNCTION public.myunaccent(sometext text)
  RETURNS text
  LANGUAGE sql
  IMMUTABLE
 AS $function$
 SELECT
 replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(sometext,'ą','a'),'Ą','A'),'ă','a'),'Ă','A'),'ā','a'),'Ā','A'),'æ','a'),'å','a'),'ä','a'),'ã','a'),'â','a'),'á','a'),'à','a'),'Æ','A'),'Å','A'),'Ä','A'),'Ã','A'),'Â','A'),'Á','A'),'À','A')
 ;
 $function$

 postgres=# SELECT myunaccent(sometext::text) FROM (SELECT
 'ÀÁÂÃÄÅÆàáâãäåæĀāĂ㥹' sometext FROM generate_series(1,100)) x OFFSET
 99 LIMIT 1;
   myunaccent
 --
  AAAaaaAaAaAa
 (1 row)

 Time: 726.282 ms
 postgres=# SELECT unaccent(sometext::text) FROM (SELECT
 'ÀÁÂÃÄÅÆàáâãäåæĀāĂ㥹' sometext FROM generate_series(1,100)) x OFFSET
 99 LIMIT 1;
unaccent
 --
  AAAaaaAaAaAa
 (1 row)

 Time: 3305.252 ms

 The timings are actually pretty much the same even if I introduce 187
 nested replace calls for every line in the unaccent.rules file for 187
 characters.  But the same character set with unaccent increases to 7418.526
 ms with the same type of query as above.  That's 10 times more expensive.

 Is there a way to boost the performance to make its adoption more
 palatable?
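In Python terms, the difference between a single translation pass and a chain of replace() calls looks like this (a sketch using only a subset of the characters above; ACCENT_MAP and unaccent_chain are illustrative names):

```python
# One translation table stands in for the whole chain of nested
# replace() calls, for 1-to-1 character mappings.
ACCENT_MAP = str.maketrans({
    'À': 'A', 'Á': 'A', 'Â': 'A', 'Ã': 'A', 'Ä': 'A', 'Å': 'A',
    'à': 'a', 'á': 'a', 'â': 'a', 'ã': 'a', 'ä': 'a', 'å': 'a',
})

def unaccent_chain(s: str) -> str:
    # equivalent of the nested replace() calls, one pass per character
    for src, dst in ACCENT_MAP.items():   # keys are code points
        s = s.replace(chr(src), dst)
    return s

text = 'ÀÁÂÃÄÅàáâãäå'
assert text.translate(ACCENT_MAP) == unaccent_chain(text) == 'AAAAAAaaaaaa'
```

Both produce the same output; the translate-style approach just scans the string once instead of once per character mapping.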


Another test passing in a string of 10 characters gives the following
timings:

unaccent: 240619.395 ms
myunaccent: 785.505 ms

I guess this must indicate that unaccent is processing all rows, and
myunaccent is only being run on the 1 select row?  I can't account for
myunaccent always being almost the same duration regardless of string
length otherwise.  This is probably an incorrect assessment of performance.

Another test inserting long text strings into a text column of a table
100,000 times, then updating another column to have that unaccented value
using both methods:

unaccent: 3867.306 ms
myunaccent: 43611.732 ms

So I guess this complaint about performance is all just noise.

However, pushing that pointless complaint to one side, I would like to have
the ability to have unaccent support more characters that it doesn't
currently seem to support, such as bullet points, ellipses etc., and also
more than 1 character being replaced.  Naturally these aren't appropriate
to fall under the unaccent function itself, but the rules file is good
starting point.  It would be a bit like translate, except it would use a
rules file instead of providing strings of single characters to convert.

So say we wanted "(trademark)" to be converted into "™", just as an
example, or ";" to ".".  We can't do that with unaccent, but in order to
avoid a huge list of replace functions, a function like unaccent, with a
few adaptations, would solve the problem.

e.g.:

SELECT transform(my_custom_dictionary, 'Commodore Amiga(trademark);')

would return

Commodore Amiga™.

This would ideally somehow cater for replacing tabs and spaces too.
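A minimal sketch of such a transform, assuming a hypothetical rules mapping loaded from a custom dictionary (the rule entries here are invented for illustration, not a real unaccent rules file):

```python
# Rules map source strings (possibly multi-character) to replacements.
# Longest sources are applied first, so '(trademark)' wins over any
# shorter rule that might overlap with it.
RULES = {
    '(trademark)': '\u2122',  # ™
    ';': '.',
    '\t': ' ',
}

def transform(rules, text):
    for src in sorted(rules, key=len, reverse=True):
        text = text.replace(src, rules[src])
    return text

print(transform(RULES, 'Commodore Amiga(trademark);'))
# -> Commodore Amiga™.
```

A real implementation would presumably compile the rules once per dictionary, but the rules-file-driven shape is the point.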

-- 
Thom


Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls

2013-06-21 Thread Troels Nielsen
Hello all

I've been examining PostgreSQL to gain a greater understanding
of RDBMS. (Thanks for a nice, very educational system!)

In the process I've been looking into a few problems and the
complications of this patch appeared relatively uninvolved, so I
tried to look for a solution.

I found the following:

The grammar conflict appears to be because of ambiguities in:
1. table_ref (used exclusively in FROM clauses)
2. index_elem (used exclusively in INDEX creation statements).

Now, this doesn't seem to make much sense, as AFAICT window functions
are explicitly disallowed in these contexts (transformWindowFuncCall
will yield errors, and I can't really wrap my head around what a
window function call would mean there).

I therefore propose a simple rearrangement of the grammar,
syntactically disallowing window functions in the outer part of those
contexts (a_expr's inside can't and shouldn't be done much about)
which will allow both RESPECT and IGNORE to become unreserved
keywords, without doing any lexer hacking or abusing the grammar.

I've attached a patch which will add RESPECT NULLS and IGNORE NULLS to
the grammar in the right place. Also the window frame options are set
but nothing more, so this patch needs to be merged with Nicholas White's
original patch.

One problem I see with this approach to the grammar is that the
error messages will change when putting window functions in any of the
forbidden places. The new error messages are I think worse and less
specific than the old ones. I suppose that can be fixed though, or
maybe the problem isn't so severe.

Example of old error message:
window functions are not allowed in functions in FROM

New error message:
syntax error at or near OVER



in addition I think the original patch as it stands has a few
problems that I haven't seen discussed:

1. The result of the current patch using lead

create table test_table (
   id serial,
   val integer);
insert into test_table (val) select * from unnest(ARRAY[1,2,3,4,NULL, NULL,
NULL, 5, 6, 7]);

select val, lead(val, 2) ignore nulls over (order by id) from test_table;
 val | lead
-----+------
   1 |    3
   2 |    4
   3 |    4
   4 |    4
     |    4
     |    5
     |    6
   5 |    7
   6 |    7
   7 |    7
(10 rows)

I would expect it to output:

select val, lead(val, 2) ignore nulls over (order by id) from test_table;
 val | lead
-----+------
   1 |    3
   2 |    4
   3 |    5
   4 |    6
     |    6
     |    6
     |    6
   5 |    7
   6 |
   7 |
(10 rows)

That is: skip two rows forward not counting null rows.
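The expected semantics can be written down as a small reference implementation (a hypothetical helper for illustration, not pgbench or PostgreSQL code): from each row, scan forward and return the offset-th non-null value, skipping null rows entirely.

```python
def lead_ignore_nulls(vals, offset):
    # reference semantics for lead(val, offset) IGNORE NULLS
    out = []
    for i in range(len(vals)):
        seen = 0
        result = None
        for v in vals[i + 1:]:
            if v is not None:
                seen += 1
                if seen == offset:
                    result = v
                    break
        out.append(result)
    return out

vals = [1, 2, 3, 4, None, None, None, 5, 6, 7]
print(lead_ignore_nulls(vals, 2))
# -> [3, 4, 5, 6, 6, 6, 6, 7, None, None]
```

This reproduces the expected table above, including the trailing NULLs where fewer than two non-null values remain ahead.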

The lag behavior works well as far as I can see.

2. It would be nice if an error was given when ignore nulls was used
on a function for which it had no effect. Perhaps this should be up to
the different window function themselves to do though.

Apart from those points I think the original patch is nice and provides a
functionality
that's definitely nice to have.

Kind Regards
Troels Nielsen


On Fri, Jun 21, 2013 at 8:11 PM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Jun 21, 2013 at 11:33 AM, Jeff Davis pg...@j-davis.com wrote:
  Regardless of what syntax we settle on, we should also make sure that
  the conflict is intrinsic to the grammar and can't be factored out, as
  Tom suggested upthread.  It's not obvious to me what the actual
  ambiguity is here.  If you've seen select lag(num,0) and the
  lookahead token is respect, what's the problem?  It sort of looks
  like it could be a column label, but not even unreserved keywords can
  be column labels, so that's not it.  Probably deserves a bit more
  investigation...
 
  I think the problem is when the function is used as a table function;
  e.g.:
 
SELECT * FROM generate_series(1,10) respect;

 Ah ha.  Well, there's probably not much help for that.  Disallowing
 keywords as table aliases would be a cure worse than the disease, I
 think.  I suppose the good news is that there probably aren't many
 people using RESPECT as a column name, but I have a feeling that we're
 almost certain to get complaints about reserving IGNORE.  I think that
 will have to be quoted as a PL/pgsql variable name as well.  :-(

  We could just add additional, optional Boolean argument to the
  existing functions.  It's non-standard, but we avoid adding keywords.
 
  I thought about that, but it is awkward because the argument would have
  to be constant (or, if not, it would be quite strange).

 True... but e.g. string_agg() has the same issue.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company





respect_nulls_and_ignore_nulls_grammar.patch
Description: Binary data



Re: [HACKERS] Fixed Cardinality estimation with equality predicates between column of the same table

2013-06-21 Thread Josh Berkus
On 06/21/2013 02:33 PM, desmodemone wrote:
 Hi all,
   I see a strange behavior ( for me ) on 9.2 (but seems the same on
 9.1 and 9.3)  of the optimizer on query like that :
 

Matteo, I just posted this on -performance.  See Tom's answer.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Robert Haas
On Fri, Jun 21, 2013 at 5:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Andres Freund escribió:
 What we could do to improve the robustness a bit - at least on linux -
 is to prctl(PR_SET_PDEATHSIG, SIGKILL) which would cause children to be
 killed if the postmaster goes away...

 This is an interesting idea (even if it has no relationship to the patch
 at hand).

 The traditional theory has been that that would be less robust, not
 more so.  Child backends are (mostly) able to carry out queries whether
 or not the postmaster is around.

I think that's the Tom Lane theory.  The Robert Haas theory is that if
the postmaster has died, there's no reason to suppose that it hasn't
corrupted shared memory on the way down, or that the system isn't
otherwise heavily fuxxored in some way.

 True, you can't make new connections,
 but how does killing the existing children make that better?

It allows you to start a new postmaster in a timely fashion, instead
of waiting for an idle connection that may not ever terminate without
operator intervention.

Even if it were possible to start a new postmaster that attached to
the existing shared memory segment and began spawning new children, I
think I'd be heavily in favor of killing the old ones off first and
doing a full system reset just for safety.  But it isn't, so what you
get is a crippled system that never recovers without operator
intervention.  And note that I'm not talking about pg_ctl restart;
that fails because the children have the shmem segment still attached
and the postmaster, which is the only thing listed in the PID file, is
already dead.  I'm talking about killall -9 postgres, at least.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Implementing incremental backup

2013-06-21 Thread Jehan-Guillaume (ioguix) de Rorthais
On 20/06/2013 03:25, Tatsuo Ishii wrote:
 On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org wrote:
 On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
[...]

 The only bottleneck here, is WAL archiving. This assumes you can
 afford WAL archiving at least to a local filesystem, and that the WAL
 compressor is able to cope with WAL bandwidth. But I have no reason to
 think you'd be able to cope with dirty-map updates anyway if you were
 saturating the WAL compressor, as the compressor is more efficient on
 amortized cost per transaction than the dirty-map approach.
 
 Thank you for detailed explanation. I will think more about this.

Just for the record, I had been mulling over this idea for a number of
months. I even talked about it with Dimitri Fontaine some weeks ago
over some beers :)

My idea came from a customer during a training session, who explained to
me the difference between differential and incremental backups in Oracle.

My approach would have been to create a standalone tool (say
pg_walaggregate) which takes a bunch of WAL from archives and merge them
in a single big file, keeping only the very last version of each page
after aggregating all their changes. The resulting file, aggregating all
the changes from given WAL files would be the differential backup.

A differential backup resulting from a bunch of WAL between W1 and Wn
would allow recovering to the time of Wn much faster than replaying all
the WALs between W1 and Wn, and would save a lot of space.
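The aggregation step could be sketched like this (the record format is invented for illustration; real WAL records are far more complex than full-page images keyed by relation and block):

```python
# Sketch of the pg_walaggregate idea: stream (relation, block, image)
# changes from a range of WAL files and keep only the last version of
# each page, since a later full-page image supersedes earlier ones.
def aggregate_wal(records):
    last_image = {}
    for rel, block, image in records:   # records arrive in WAL (time) order
        last_image[(rel, block)] = image
    return last_image

records = [
    ('t1', 0, b'v1'), ('t1', 1, b'v1'),
    ('t1', 0, b'v2'),                    # page 0 rewritten later
]
print(aggregate_wal(records))
# -> {('t1', 0): b'v2', ('t1', 1): b'v1'}
```

Replaying the aggregated file then touches each page once, which is where the speed and space savings would come from.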

I was hoping to find some time to dig around this idea, but as the
subject rose here, then here are my 2¢!

Cheers,
-- 
Jehan-Guillaume (ioguix) de Rorthais




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Michael Paquier
OK let's finalize this patch first. I'll try to send an updated patch
within today.

On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
 On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
   @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid 
   OIDNewHeap,
   Is it actually possible to get here with multiple toast indexes?
  Actually it is possible. finish_heap_swap is also called for example
  in ALTER TABLE where rewriting the table (phase 3), so I think it is
  better to protect this code path this way.
 
  But why would we copy invalid toast indexes over to the new relation?
  Shouldn't the new relation have been freshly built in the previous
  steps?
 What do you think about that? Would using only the first valid index 
 be enough?

 What I am thinking about is the following: When we rewrite a relation,
 we build a completely new toast relation. Which will only have one
 index, right? So I don't see how this could could be correct if we deal
 with multiple indexes. In fact, the current patch's swap_relation_files
 throws an error if there are multiple ones around.
Yes, OK. Let me have a look at the code of CLUSTER more in details
before giving a precise answer, but I'll try to remove that renaming
part.  Btw, I'd like to add an assertion in the code at least to
prevent wrong use of this code path.

   diff --git a/src/include/utils/relcache.h 
   b/src/include/utils/relcache.h
   index 8ac2549..31309ed 100644
   --- a/src/include/utils/relcache.h
   +++ b/src/include/utils/relcache.h
   @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
  
/*
   + * RelationGetIndexListIfValid
   + * Get index list of relation without recomputing it.
   + */
   +#define RelationGetIndexListIfValid(rel) \
   +do { \
    + if (rel->rd_indexvalid == 0) \
   + RelationGetIndexList(rel); \
   +} while(0)
  
   Isn't this function misnamed and should be
   RelationGetIndexListIfInValid?
  When naming that; I had more in mind: get the list of indexes if it
  is already there. It looks more intuitive to my mind.
 
  I can't follow. RelationGetIndexListIfValid() doesn't return
  anything. And it doesn't do anything if the list is already valid. It
  only does something iff the list currently is invalid.
 In this case RelationGetIndexListIfInvalid?

 Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?

 Hm. Looking at how this is currently used - I am afraid it's not
 correct... the reason RelationGetIndexList() returns a copy is that
 cache invalidations will throw away that list. And you do index_open()
 while iterating over it which will accept invalidation messages.
 Maybe it's better to try using RelationGetIndexList directly and measure
 whether that has a measurable impact?
Yes, I was wondering about potential memory leak that list_copy could
introduce in tuptoaster.c when doing a bulk insert, that's the only
reason why I added this macro.
--
Michael




[HACKERS] problem with commitfest redirection

2013-06-21 Thread Martín Marqués

Whenever I try to see the patch from this commit, it never loads:

https://commitfest.postgresql.org/action/patch_view?id=1129

Some problem there? I can see other patches, from other commits.

--
Martín Marqués    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread MauMau

From: Robert Haas robertmh...@gmail.com
On Thu, Jun 20, 2013 at 12:33 PM, Alvaro Herrera

alvhe...@2ndquadrant.com wrote:

I will go with 5 seconds, then.


I'm uncomfortable with this whole concept, and particularly with such
a short timeout.  On a very busy system, things can take a LOT longer
than we think they should; it can take 30 seconds or more just to get
a prompt back from a shell command.  5 seconds is the blink of an eye.


I'm comfortable with 5 seconds.  We are talking about the interval between 
sending SIGQUIT to the children and then sending SIGKILL to them.  In most 
situations, the backends should terminate immediately.  However, as I said a 
few months ago, the ereport() call in quickdie() can deadlock indefinitely. 
This is a PostgreSQL bug.  In addition, Tom san was concerned that some 
PLs (PL/Perl or PL/Python?) block SIGQUIT while executing the UDF, so they 
may not be able to respond to the immediate shutdown request.


What DBAs want from pg_ctl stop -mi is to shut down the database server as 
quickly as possible.  So I think 5 seconds is reasonable.
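The escalation being discussed can be sketched in Python, using SIGTERM in place of SIGQUIT (this only illustrates the timeout logic; it is not postmaster code, and the child deliberately ignores the first signal to force the escalation, standing in for a stuck quickdie()):

```python
import subprocess, sys, time

# Spawn a child that ignores the polite termination signal.
child = subprocess.Popen([sys.executable, '-c',
    'import signal, time;'
    'signal.signal(signal.SIGTERM, signal.SIG_IGN);'
    'time.sleep(60)'])
time.sleep(0.5)                       # give the child time to install its handler

child.terminate()                     # polite request (SIGQUIT in postgres)
grace = 2                             # the 5-second grace period, shortened here
try:
    child.wait(timeout=grace)
except subprocess.TimeoutExpired:
    child.kill()                      # escalate to SIGKILL
    child.wait()

print(child.returncode)               # negative signal number on POSIX
```

The open question in the thread is only how long `grace` should be, and whether the postmaster should be the one doing the escalation at all.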


Regards
MauMau





Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread MauMau

From: Robert Haas robertmh...@gmail.com

On Fri, Jun 21, 2013 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:

More generally, what do we think the point is of sending SIGQUIT
rather than SIGKILL in the first place, and why does that point cease
to be valid after 5 seconds?


Well, mostly it's about telling the client we're committing hara-kiri.
Without that, there's no very good reason to run quickdie() at all.


That's what I thought, too.  It seems to me that if we think that's
important, then it's important even if it takes more than 5 seconds
for some reason.


I guess Tom san is saying we should be as kind as possible to the client, 
and so try to notify the client of the shutdown.  Not complete kindness: 
because the DBA requested immediate shutdown by running pg_ctl stop -mi, 
the top priority is to shut down the database server as quickly as 
possible.  The idea here is to be friendly to the client for as long as 
the DBA can stand.  That's the 5 seconds.




A practical issue with starting to send SIGKILL ourselves is that we
will no longer be able to reflexively diagnose server process died
on signal 9 as the linux OOM killer got you.  I'm not at all
convinced that the cases where SIGQUIT doesn't work are sufficiently
common to justify losing that property.


I'm not, either.  Maybe this question will provoke many indignant
responses, but who in their right mind even uses immediate shutdown on
a production server with any regularity?  The shutdown checkpoint is
sometimes painfully long, but do you really want to run recovery just
to avoid it?  And in the rare case where an immediate shutdown fails
to work, what's wrong with killall -9 postgres?


Checkpoint is irrelevant here because we are discussing immediate shutdown. 
Some problems with killall -9 postgres are:


1. It's not available on Windows.
2. It may kill other database server instances running on the same machine.

Regards
MauMau






Re: [HACKERS] problem with commitfest redirection

2013-06-21 Thread Jaime Casanova
On Fri, Jun 21, 2013 at 8:56 PM, Martín Marqués mar...@2ndquadrant.com wrote:
 When ever I try to see the patch from this commit it never loads:

 https://commitfest.postgresql.org/action/patch_view?id=1129

 Some problem there? I can see other patches, from other commits.


Yes, the URL is wrong.  The right URL is
http://www.postgresql.org/message-id/CAFjNrYuh=4Vwnv=2n7cj0jjuwc4hool1epxsoflj6s19u02...@mail.gmail.com


--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problem with commitfest redirection

2013-06-21 Thread Martín Marqués

El 21/06/13 23:47, Jaime Casanova escribió:

On Fri, Jun 21, 2013 at 8:56 PM, Martín Marqués mar...@2ndquadrant.com wrote:

Whenever I try to see the patch from this commitfest entry, it never loads:

https://commitfest.postgresql.org/action/patch_view?id=1129

Some problem there? I can see other patches, from other commits.



Yes, the URL is wrong. The right URL is
http://www.postgresql.org/message-id/CAFjNrYuh=4Vwnv=2n7cj0jjuwc4hool1epxsoflj6s19u02...@mail.gmail.com


Yes, I found that out later. Could there be other URLs like that?

--
Martín Marqués                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jun 21, 2013 at 5:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The traditional theory has been that that would be less robust, not
 more so.  Child backends are (mostly) able to carry out queries whether
 or not the postmaster is around.

 I think that's the Tom Lane theory.  The Robert Haas theory is that if
 the postmaster has died, there's no reason to suppose that it hasn't
 corrupted shared memory on the way down, or that the system isn't
 otherwise heavily fuxxored in some way.

Eh?  The postmaster does its level best never to touch shared memory
(after initialization anyway).

 True, you can't make new connections,
 but how does killing the existing children make that better?

 It allows you to start a new postmaster in a timely fashion, instead
 of waiting for an idle connection that may not ever terminate without
 operator intervention.

There may be something in that argument, but I find the other one
completely bogus.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-21 Thread Craig Ringer
On 06/22/2013 03:30 AM, ian link wrote:
 Forgive my ignorance, but I don't entirely understand the problem. What
 does '+' and '-' refer to exactly?

Consider 'RANGE 4.5 PRECEDING'.

You need to be able to test whether, for the current row 'b', any given
row 'a' is within the range (b - 4.5) < a <= b. Not 100% sure about the
< vs <= boundaries, but that's irrelevant for the example.

To test that, you have to be able to do two things: you have to be able
to test whether one value is greater than another, and you have to be
able to add or subtract a constant from one of the values.

Right now, the b-tree access method provides information on the ordering
operators < <= = >= >, which is half the answer. But these
don't give any concept of *distance* - you can test ordinality but not
cardinality.

To implement the 'different by 4.5' part, you have to be able to add 4.5
to one value or subtract it from the other.

The obvious way to do that is to look up the function that implements
the '+' or '-' operator, and do:

((OPERATOR(+))(a, 4.5)) > b AND (a <= b)

or

((OPERATOR(-))(b, 4.5)) < a AND (a <= b);
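The frame computation those two equivalent predicates implement can be sketched in plain Python. This is a hedged illustration, not PostgreSQL source: the function name is invented, and it uses the SQL-standard inclusive lower boundary (b - offset) <= a <= b, one of the two boundary choices mentioned above. The point of the sketch is the comment on `lower`: the comparisons come from the b-tree opclass, but the subtraction is exactly the operation no existing opclass names.

```python
def range_preceding_frame(values, current_index, offset):
    """Rows in the window frame for RANGE <offset> PRECEDING ...
    CURRENT ROW: every value 'a' with (b - offset) <= a <= b, where
    'b' is the current row's ordering value.  'values' is assumed
    sorted ascending, as the window's ORDER BY guarantees."""
    b = values[current_index]
    lower = b - offset   # the '-' operation the b-tree opclass can't name
    return [a for a in values if lower <= a <= b]
```

For example, with ordering values [1.0, 3.0, 5.0, 7.5, 8.0] and the current row at 8.0, RANGE 4.5 PRECEDING keeps the rows at 5.0, 7.5 and 8.0.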

The problem outlined by Tom in prior discussion about this is that
PostgreSQL tries really hard not to assume that particular operator
names mean particular things. Rather than knowing that + is always
an operator that adds two values together; is transitive, symmetric and
reflexive, PostgreSQL requires that you define an *operator class* that
names the operator that has those properties.

Or at least, it does for less-than, less-than-or-equals, equals,
greater-than-or-equals, greater-than, and not-equals as part of the
b-tree operator class, which *usually* defines these operators as
< <= = >= > <>, but you could use any operator names you wanted if
you really liked.

Right now (as far as I know) there's no operator class that lets you
identify operators for addition and subtraction in a similar way. So
it's necessary to either add such an operator class (in which case
support has to be added for it for every type), extend the existing
b-tree operator class to provide the info, or blindly assume that '+'
and '-' are always addition and subtraction.

For an example of why such assumptions are a bad idea, consider matrix
multiplication. Normally, a * b = b * a, but this isn't true for
multiplication of matrices. Similarly, if someone defined a '+' operator
as an alias for string concatenation ('||'), we'd be totally wrong to
assume we could use that for doing range-offset windowing.

So. Yeah. Operator classes required, unless we're going to change the
rules and make certain operator names special in PostgreSQL, so that
if you implement them they *must* have certain properties. This seems
like a pretty poor reason to add such a big change.

I hope this explanation (a) is actually correct and (b) is helpful.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Michael Paquier
On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hm. Looking at how this is currently used - I am afraid it's not
 correct... the reason RelationGetIndexList() returns a copy is that
 cache invalidations will throw away that list. And you do index_open()
 while iterating over it which will accept invalidation messages.
 Maybe it's better to try using RelationGetIndexList directly and measure
 whether that has a measurable impact?
Looking at the comments of RelationGetIndexList in relcache.c, the
patch's approach is actually correct: in the event of a shared cache
invalidation, rd_indexvalid is set to 0 when the index list is reset,
so the index list gets recomputed even in the case of shared-memory
invalidation.
--
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-06-21 Thread Amit kapila
On Friday, June 21, 2013 11:48 PM Robert Haas wrote:
On Thu, Jun 20, 2013 at 12:18 AM, Amit Kapila amit.kap...@huawei.com wrote:
 Auto.conf - 1 Vote (Josh)
 System.auto.conf - 1 Vote (Josh)
 Postgresql.auto.conf - 2 Votes (Zoltan, Amit)
 Persistent.auto.conf - 0 Votes
 generated_by_server.conf - 1 Vote (Peter E)
 System.conf - 1 Vote (Magnus)
 alter_system.conf - 1 Vote (Alvaro)

 I had consolidated the list, so that it would be helpful for committer to
 decide among proposed names for this file.

 I'll also vote for postgresql.auto.conf.

  Thanks to all of you for suggesting meaningful names. I will change the name 
of the file to postgresql.auto.conf.
  Kindly let me know if there is any objection to it.

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-06-21 Thread Amit kapila
On Friday, June 21, 2013 11:43 PM Robert Haas wrote:
On Fri, Jun 21, 2013 at 12:56 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Jun 22, 2013 at 12:11 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 19, 2013 at 1:59 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 6/7/13 12:14 AM, Amit Kapila wrote:
 I will change the patch as per below syntax if there are no objections:

  ALTER SYSTEM SET configuration_parameter {TO | =} {value, | 'value'};

 I do like using ALTER SYSTEM in general, but now that I think about it,
 the 9.3 feature to create global settings in pg_db_role_setting would
 also have been a candidate for exactly that same syntax if it had been
 available.  In fact, if we do add ALTER SYSTEM, it might make sense to
 recast that feature into that syntax.

 It might be clearer to do something like ALTER SYSTEM SET EXTERNAL FILE
 or something like that.  It's only a small syntax change, so don't worry
 about it too much, but let's keep thinking about it.

 I think that anything we want to add should either go before SET or
 after the value, such as:

 ALTER SYSTEM SET configuration_parameter = 'value' IN FILE 'myconf.conf';
 ALTER SYSTEM CONFIGURATION FILE SET configuration_parameter = 'value';
 ALTER SYSTEM CONFIGURATION FILE 'myconf.conf' SET
 configuration_parameter = 'value';

 I agree we shouldn't back ourselves into a syntactic corner.

 Maybe this idea has been already discussed, but why don't we just
 add the function like update_config_file(parameter, value)? We can
 commit the core of the patch with that function API first, and then
 we can discuss the syntax and add another API like ALTER SYSTEM
 later.

 We now have set_config() function to set the parameter,
 so adding the function for this feature should not be a surprise.
 If the name of the function is, for example, update_conf_file,
 most users would think that it's intuitive even if SIGHUP is not
 automatically sent immediately. We don't need to emit
 the message like 'This setting change will not be loaded until SIGHUP'.

 I could certainly support that plan.  I'm satisfied with the ALTER
 SYSTEM syntax and feel that might find other plausible uses, but
 functional notation would be OK with me, too.


I can update the patch to have a function as suggested by Fujii-san, if it can 
be decided that this will be sufficient for the first committed version.
OTOH we already have consensus on the ALTER SYSTEM syntax, apart from a few 
extra keywords to make it more meaningful/extendible. 
I think we can go with the current syntax (ALTER SYSTEM SET ...) and make that 
behavior the default for writing to the auto file.
In future we can extend it with other keywords depending on usage.
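Whatever surface syntax wins, the mechanics underneath are the same: replace-or-append a `name = 'value'` line in the auto file, and install the result atomically so a crash never leaves a torn file. The sketch below is a hedged, plain-Python illustration of that idea — the function name is invented, and it is not the actual patch's code.

```python
import os
import tempfile

def write_auto_conf(path, name, value):
    """Replace-or-append `name = 'value'` in a postgresql.auto.conf
    style file.  The new contents are written to a temp file in the
    same directory and renamed into place, so readers never observe
    a partially written file."""
    kept = []
    if os.path.exists(path):
        with open(path) as f:
            # drop any existing line that sets this parameter
            kept = [line for line in f
                    if line.split("=", 1)[0].strip() != name]
    kept.append("%s = '%s'\n" % (name, value))
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.writelines(kept)
    os.replace(tmp, path)  # atomic rename on POSIX
```

Setting the same parameter twice leaves exactly one line with the latest value, which is the behavior a user would expect from repeated ALTER SYSTEM SET calls.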

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers