Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Magnus Hagander
On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
/*
 * Override any inconsistent requests. Note that this is a change
 * of behaviour in 9.5; prior to this we simply ignored a request
 * to pause if hot_standby = off, which was surprising behaviour.
 */
if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
    recoveryTargetActionSet &&
    standbyState == STANDBY_DISABLED)
    recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

 While it's easy enough to fix I rather dislike the whole intent here
 though. *Silently* switching the mode of operation in a rather
 significant way seems like a bad idea to me. At the very least we need
 to emit a LOG message about this; but I think it'd be much better to
 error out instead.

 9.5's behaviour was already quite surprising. But changing things to a
 different surprising behaviour seems like a bad idea.


+1. Especially for sensitive operations like this, having
predictable-behavior-or-error is usually the best choice.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Andres Freund
Hi,

On 2015-03-15 17:59:39 +0200, Itai wrote:
 Thanks for the quick response!! :)
 But I don't get it... isn't:
 if (SRF_IS_FIRSTCALL()){
 }
 the iterator's one-time initialization block, where I set up the data to be
 iterated upon?
 
 Can you please elaborate on how should I fix this? I'm probably missing 
 something basic...

if (SRF_IS_FIRSTCALL())
{
 length = 4000;
 base_num = 120300;
 list = (NumberList *)palloc(sizeof(NumberList));
 list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);

You allocate memory in the per call context here.

 list->length = length;
 i = 0;
 for (; i < length; i++)
 {
  num = (Number *)palloc(sizeof(Number));
  num->value = base_num + i;
  num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
  list->pp_numbers[i] = num;
 }
 ereport(INFO, (errmsg("--- data source ---")));
 i = 0;
 for (; i < length; i++)
 {
  ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
  ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));
 }
 
 funcctx = SRF_FIRSTCALL_INIT();
 oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

Because you only switch the memory context here. Move this up, to the
beginning of the SRF_IS_FIRSTCALL block. Before the palloc()s.
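
A minimal sketch of that reordering (reusing the names from the posted
function); every palloc() whose result must survive across calls happens
after the switch into the multi-call context:

if (SRF_IS_FIRSTCALL())
{
 funcctx = SRF_FIRSTCALL_INIT();
 /* switch first, so the allocations below live across calls */
 oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

 list = (NumberList *)palloc(sizeof(NumberList));
 list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
 /* ... fill list->pp_numbers and list->length as before ... */

 funcctx->user_fctx = list;
 funcctx->max_calls = list->length;
 MemoryContextSwitchTo(oldcontext);
}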

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2015-03-15 Thread Andres Freund
On 2014-12-08 02:18:11 +0100, Petr Jelinek wrote:
 On 08/12/14 02:06, Petr Jelinek wrote:
 On 08/12/14 02:03, Michael Paquier wrote:
 On Mon, Dec 8, 2014 at 9:59 AM, Petr Jelinek p...@2ndquadrant.com
 wrote:
 Ok this patch does that, along with the rename to
 recovery_target_action and
 addition to the recovery.conf.sample.
 This needs a rebase as at least da71632 and b8e33a8 are conflicting.
 
 
 Simon actually already committed something similar, so no need.
 
 
 ...except for the removal of pause_at_recovery_target it seems, so I
 attached just that

I intend to push this one unless somebody protests soon.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Andres Freund
On 2015-03-15 14:51:46 +0100, Magnus Hagander wrote:
 On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund and...@2ndquadrant.com
 wrote:
 
  On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
 /*
  * Override any inconsistent requests. Note that this is a change
  * of behaviour in 9.5; prior to this we simply ignored a request
  * to pause if hot_standby = off, which was surprising behaviour.
  */
 if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
     recoveryTargetActionSet &&
     standbyState == STANDBY_DISABLED)
     recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
 
  While it's easy enough to fix I rather dislike the whole intent here
  though. *Silently* switching the mode of operation in a rather
  significant way seems like a bad idea to me. At the very least we need
  to emit a LOG message about this; but I think it'd be much better to
  error out instead.
 
  9.5's behaviour was already quite surprising. But changing things to a
  different surprising behaviour seems like a bad idea.
 
 
 +1. Especially for sensitive operations like this, having
 predictable-behavior-or-error is usually the best choice.

Yea.

Looking further, it's even worse right now. We'll change the target to
shutdown when hot_standby = off, but iff it was set in the config
file. But the default value is (and was, although configured
differently) documented to be 'pause'; so if it's not configured
explicitly we still will promote.  At least I can't read that out of the
docs.

Personally I think we just should change the default to 'shutdown' for
all cases. That makes documentation and behaviour less surprising. And
makes experimenting less dangerous, since you can just start again.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-03-15 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 another variant, hopefully better than the previous one. We can introduce a
 new long option --strict. With this option active, every pattern specified
 by the -t option has to identify exactly one table. The same could be used
 for any other pattern that should exist - schemas. Initial implementation in
 attachment.

I think this design is seriously broken.  If I have '-t foo*' the code
should not prevent that from matching multiple tables.  What would the use
case for such a restriction be?

What would make sense to me is one or both of these ideas:

* require a match for a wildcard-free -t switch

* require at least one (not exactly one) match for a wildcarded -t
  switch.

Neither of those is what you wrote, though.

If we implemented the second one of these, it would have to be controlled
by a new switch, because there are plausible use cases for wildcards that
sometimes don't match anything (not to mention backwards compatibility).
There might be a reasonable argument for the first one being the
default behavior, though; I'm not sure if we could get away with that
from a compatibility perspective.
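
A sketch of those two rules in C; count_matches(), has_wildcard() and
fail() are hypothetical placeholders here, not pg_dump's actual API:

static void
check_table_pattern(const char *pattern, bool require_wildcard_match)
{
    int     nmatches = count_matches(pattern);  /* hypothetical helper */

    /* first idea: a wildcard-free -t pattern must match something */
    if (!has_wildcard(pattern) && nmatches == 0)
        fail("no matching table for \"%s\"", pattern);

    /* second idea, behind a new switch: a wildcarded pattern must
     * match at least one table */
    else if (require_wildcard_match && nmatches == 0)
        fail("no table matches wildcard pattern \"%s\"", pattern);
}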

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Crash on SRF execution

2015-03-15 Thread Itai
Hi
 
I'm attempting to program a simple SRF function but it constantly crashes 
(details and code below).
 
Any idea why?
 
Thanks!
 
-Itai
-
Environment
-
Ubuntu: 14.04.2 (server)
PG Ver: PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 
4.8.2-19ubuntu1) 4.8.2, 64-bit
Install: deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main
Dev: postgresql-server-dev-9.4 (9.4.1-1.pgdg14.04+1)
-
Execution
-
select * from pg_srf();
INFO:  --- data source ---
INFO:  value: 120300
INFO:  is_even: 1
INFO:  value: 120301
INFO:  is_even: 0
...
INFO:  value: 1203003998
INFO:  is_even: 1
INFO:  value: 1203003999
INFO:  is_even: 0
INFO:  --- data context ---
INFO:  call_cntr: 0
INFO:  value: 120300
INFO:  is_even: 1
INFO:  call_cntr: 1
INFO:  value: 120301
INFO:  is_even: 0
INFO:  call_cntr: 2
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
-
pg_srf.h
-
#ifndef PGSRF_H
#define PGSRF_H
#include "fmgr.h"
// using int as bool due to a different issue (one prob. at a time)
typedef struct Number_tag
{
 int value;
 int is_even;
} Number;
typedef struct NumberList_tag
{
 int length;
 Number ** pp_numbers;
} NumberList;
extern Datum pg_srf(PG_FUNCTION_ARGS);
#endif
-
pg_srf.c
-
#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "funcapi.h"
#include "pg_srf.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
PG_FUNCTION_INFO_V1(pg_srf);
Datum
pg_srf(PG_FUNCTION_ARGS)
{
 int call_cntr, i, length, base_num;
 Number * num;
 NumberList * list;
 HeapTuple rettuple;
 FuncCallContext *funcctx;
 MemoryContext oldcontext;
 if (SRF_IS_FIRSTCALL())
 {
  length = 4000;
  base_num = 120300;
  list = (NumberList *)palloc(sizeof(NumberList));
  list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
  list->length = length;
  i = 0;
  for (; i < length; i++)
  {
   num = (Number *)palloc(sizeof(Number));
   num->value = base_num + i;
   num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
   list->pp_numbers[i] = num;
  }
  ereport(INFO, (errmsg("--- data source ---")));
  i = 0;
  for (; i < length; i++)
  {
   ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
   ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));
  }

  funcctx = SRF_FIRSTCALL_INIT();
  oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
  funcctx->user_fctx = list;
  funcctx->max_calls = list->length;
  if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) !=
      TYPEFUNC_COMPOSITE)
   ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
           errmsg("check if sql function definition returns SETOF record")));

  BlessTupleDesc(funcctx->tuple_desc);
  MemoryContextSwitchTo(oldcontext);
 }
 funcctx = SRF_PERCALL_SETUP();
 list = funcctx->user_fctx;
 call_cntr = funcctx->call_cntr;
 if (call_cntr < funcctx->max_calls)
 {
  Datum retvals[2];
  bool retnulls[2];

  if (call_cntr == 0)
  {
   ereport(INFO, (errmsg("--- data context ---")));
  }
  ereport(INFO, (errmsg("call_cntr: %d", call_cntr)));
  ereport(INFO, (errmsg("value: %d", list->pp_numbers[call_cntr]->value)));
  retvals[0] = Int32GetDatum(list->pp_numbers[call_cntr]->value);
  ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[call_cntr]->is_even)));
  retvals[1] = Int32GetDatum(list->pp_numbers[call_cntr]->is_even);
  retnulls[0] = false;
  retnulls[1] = false;
  rettuple = heap_form_tuple(funcctx->tuple_desc, retvals, retnulls);
  SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(rettuple));
 }
 else
 {
  SRF_RETURN_DONE(funcctx);
 }
}
-
Makefile
-
MODULES = pg_srf
OBJS = pg_srf.o
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
-
SQL
-
CREATE OR REPLACE FUNCTION
  pg_srf(OUT value integer, OUT is_even integer)
RETURNS 
 SETOF record
AS
  'pg_srf.so', 'pg_srf'
LANGUAGE
  C
STRICT
IMMUTABLE;
  

Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Tom Lane
Itai it...@outlook.com writes:
 I'm attempting to program a simple SRF function but it constantly crashes 
 (details and code below).
 Any idea why?

Looks like you're pallocing some stuff in the calling context (ie, a
short-lived context) during the first execution and expecting it to
still be there in later executions.  You'd need to allocate those
data structures in the multi_call_memory_ctx instead.
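
For instance (a sketch reusing the names from the posted function), the
list header could be allocated explicitly in the long-lived context:

 funcctx = SRF_FIRSTCALL_INIT();
 list = (NumberList *) MemoryContextAlloc(funcctx->multi_call_memory_ctx,
                                          sizeof(NumberList));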

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Petr Jelinek

On 15/03/15 14:51, Magnus Hagander wrote:

On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund and...@2ndquadrant.com
mailto:and...@2ndquadrant.com wrote:

On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
/*
 * Override any inconsistent requests. Note that this is a change
 * of behaviour in 9.5; prior to this we simply ignored a request
 * to pause if hot_standby = off, which was surprising behaviour.
 */
if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
    recoveryTargetActionSet &&
    standbyState == STANDBY_DISABLED)
    recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

While it's easy enough to fix I rather dislike the whole intent here
though. *Silently* switching the mode of operation in a rather
significant way seems like a bad idea to me. At the very least we need
to emit a LOG message about this; but I think it'd be much better to
error out instead.

9.5's behaviour was already quite surprising. But changing things to a
different surprising behaviour seems like a bad idea.


+1. Especially for sensitive operations like this, having
predictable-behavior-or-error is usually the best choice.



Thinking about it again now, it does seem that ignoring a user setting 
because it's in conflict with another user setting is a bad idea, and I 
think we generally throw errors in those cases.


So +1 from me also.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized

2015-03-15 Thread Andres Freund
On 2015-03-02 18:45:18 +0100, Andres Freund wrote:
 On 2015-03-02 19:23:56 +0200, Heikki Linnakangas wrote:
  Pass the prepared XID as yet another argument to XactEmitCommitRecord, and
  have XactEmitCommitRecord emit the xl_xact_commit_prepared part of the
  record too. It might even make sense to handle the prepared XID like all the
  other optional fields and add an xinfo flag for it.
 
 That's what I meant by "non simple". Not a fan of teaching xact.c even
 more about twophase's dealings than it already knows.

I made an effort to show how horrible it would look. Turns out it's
not that bad ;). Far from pretty, especially in xlog.c, but
acceptable. I think treating them more similarly actually might open the
road for reducing the difference further at some point.

The attached patch does pretty much what you suggested above and tries
to address all the points previously made. I plan to push this fairly
soon; unless somebody has further input.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 7557d0afa860c37a1992240b54e2bf3710296fc7 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sun, 15 Mar 2015 14:42:09 +0100
Subject: [PATCH] Merge the various forms of transaction commit & abort
 records.

Since 465883b0a two versions of commit records have existed. A compact
version that was used when no cache invalidations, smgr unlinks and
similar were needed, and a full version that could deal with all
that. Additionally the full version was embedded into twophase commit
records.

That resulted in a measurable reduction in the size of the logged WAL in
some workloads. But more recently additions like logical decoding, which
e.g. needs information about the database something was executed on,
made it applicable in fewer situations. The static split generally made
it hard to expand the commit record, because concerns over the size made
it hard to add anything to the compact version.

Additionally it's not particularly pretty to have twophase.c insert
RM_XACT records.

Rejigger things so that the commit and abort records only have one form
each, including the twophase equivalents. The presence of the various
optional (in the sense of not being in every record) pieces is indicated
by bits in the 'xinfo' flag.  That flag previously was not included in
compact commit records. To prevent an increase in size due to its
presence, it's only included if necessary; signalled by a bit in the
xl_info bits available for xact.c, similar to heapam.c's
XLOG_HEAP_OPMASK/XLOG_HEAP_INIT_PAGE.

Twophase commit/aborts are now the same as their normal
counterparts. The original transaction's xid is included in an optional
data field.

This means that commit records generally are smaller, except in the case
of a transaction with subtransactions, but no other special cases; the
increase there is four bytes, which seems acceptable given that the more
common case of not having subtransactions shrank.  The savings are
especially measurable for twophase commits, which previously always used
the full version; but will in practice only infrequently have required
that.

The motivation for this work is not the space savings and
deduplication though; it's that it makes it easier to extend commit
records with additional information. That's just a few lines of code
now; without impacting the common case where that information is not
needed.

Discussion: 20150220152150.gd4...@awork2.anarazel.de,
235610.92468.qm%40web29004.mail.ird.yahoo.com

Reviewed-By: Heikki Linnakangas, Simon Riggs
---
 src/backend/access/rmgrdesc/xactdesc.c   | 245 +++-
 src/backend/access/transam/twophase.c|  59 +---
 src/backend/access/transam/xact.c| 476 +++
 src/backend/access/transam/xlog.c| 126 
 src/backend/replication/logical/decode.c | 133 +++--
 src/include/access/xact.h| 215 ++
 src/include/access/xlog_internal.h   |   2 +-
 7 files changed, 749 insertions(+), 507 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 3e87978..b036b6d 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -14,53 +14,188 @@
  */
 #include "postgres.h"
 
+#include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
 #include "storage/sinval.h"
 #include "utils/timestamp.h"
 
+/*
+ * Parse the WAL format of xact commit and abort records into an easier to
+ * understand format.
+ *
+ * These routines are in xactdesc.c because they're accessed in backend (when
+ * replaying WAL) and frontend (pg_xlogdump) code. This file is the only xact
+ * specific one shared between both. They're complicated enough that
+ * duplication would be bothersome.
+ */
+
+void
+ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit 

Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Itai
Thanks for the quick response!! :)
But I don't get it... isn't:
if (SRF_IS_FIRSTCALL()){
}
the iterator's one-time initialization block, where I set up the data to be
iterated upon?

Can you please elaborate on how should I fix this? I'm probably missing 
something basic...

 Date: Sun, 15 Mar 2015 16:50:27 +0100
 From: and...@2ndquadrant.com
 To: it...@outlook.com
 CC: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Crash on SRF execution
 
 Hi,
 
 On 2015-03-15 17:40:11 +0200, Itai wrote:
  I'm attempting to program a simple SRF function but it constantly crashes 
  (details and code below).
   
  Any idea why?
 
   if (SRF_IS_FIRSTCALL())
   {
 length = 4000;
 base_num = 120300;
 list = (NumberList *)palloc(sizeof(NumberList));
 list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
 list->length = length;
 i = 0;
 for (; i < length; i++)
 {
  num = (Number *)palloc(sizeof(Number));
  num->value = base_num + i;
  num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
  list->pp_numbers[i] = num;
 }
 ereport(INFO, (errmsg("--- data source ---")));
 i = 0;
 for (; i < length; i++)
 {
  ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
  ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));
 }

 funcctx = SRF_FIRSTCALL_INIT();
 oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
 funcctx->user_fctx = list;
 funcctx->max_calls = list->length;
 if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) !=
     TYPEFUNC_COMPOSITE)
  ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
          errmsg("check if sql function definition returns SETOF record")));

 BlessTupleDesc(funcctx->tuple_desc);
 MemoryContextSwitchTo(oldcontext);
  }
 
 The palloc() for list above is in the per call memory context, but you
 use it across several calls. You should allocate it in the multi call
 context you use some lines below.
 
 Greetings,
 
 Andres Freund
 
 -- 
  Andres Freund   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
  

Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-15 Thread Andres Freund
Hi,

On 2015-03-03 11:04:30 -0800, Josh Berkus wrote:
 Attached is version D, which incorporates the above two changes, but NOT
 a general unit comment cleanup of postgresql.conf, which needs to be an
 entirely different patch.

Pushed!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2015-03-15 Thread Andres Freund
Hi,

On 2014-12-08 02:18:11 +0100, Petr Jelinek wrote:
 ...except for the removal of pause_at_recovery_target it seems, so I
 attached just that

Pushed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Andres Freund
Hi,

On 2015-03-15 17:40:11 +0200, Itai wrote:
 I'm attempting to program a simple SRF function but it constantly crashes 
 (details and code below).
  
 Any idea why?

  if (SRF_IS_FIRSTCALL())
  {
   length = 4000;
   base_num = 120300;
   list = (NumberList *)palloc(sizeof(NumberList));
   list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
   list->length = length;
   i = 0;
   for (; i < length; i++)
   {
    num = (Number *)palloc(sizeof(Number));
    num->value = base_num + i;
    num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
    list->pp_numbers[i] = num;
   }
   ereport(INFO, (errmsg("--- data source ---")));
   i = 0;
   for (; i < length; i++)
   {
    ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
    ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));
   }

   funcctx = SRF_FIRSTCALL_INIT();
   oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
   funcctx->user_fctx = list;
   funcctx->max_calls = list->length;
   if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) !=
       TYPEFUNC_COMPOSITE)
    ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
            errmsg("check if sql function definition returns SETOF record")));

   BlessTupleDesc(funcctx->tuple_desc);
   MemoryContextSwitchTo(oldcontext);
  }

The palloc() for list above is in the per call memory context, but you
use it across several calls. You should allocate it in the multi call
context you use some lines below.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Itai



Fantastic! That solved this problem.

However, I still get a crash if I change is_even to bool:

 num->is_even = ((base_num + i) % 2 == 0) ? true : false;
 retvals[1] = BoolGetDatum(list->pp_numbers[call_cntr]->is_even);

 CREATE OR REPLACE FUNCTION
   pg_srf(OUT value integer, OUT is_even bit)
 RETURNS
   SETOF record
 AS
   'pg_srf.so', 'pg_srf'
 LANGUAGE
   C
 STRICT
 IMMUTABLE;
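
A plausible source of the new crash (an assumption; this excerpt contains
no diagnosis): bit is a variable-length SQL type, while BoolGetDatum()
produces a pass-by-value Datum, so heap_form_tuple() would dereference the
bool as if it were a varlena pointer. Declaring the output column with the
type that matches the C code avoids the mismatch:

/* C side returns a bool Datum ... */
retvals[1] = BoolGetDatum(list->pp_numbers[call_cntr]->is_even);

/* ... so the SQL declaration should use boolean, not bit (assumption):
 *
 *   CREATE OR REPLACE FUNCTION
 *     pg_srf(OUT value integer, OUT is_even boolean)
 *   RETURNS SETOF record ...
 */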


 
 Date: Sun, 15 Mar 2015 17:03:38 +0100
 From: and...@2ndquadrant.com
 To: it...@outlook.com
 CC: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Crash on SRF execution
 
 Hi,
 
 On 2015-03-15 17:59:39 +0200, Itai wrote:
  Thanks for the quick response!! :)
  But I don't get it... isn't:
  if (SRF_IS_FIRSTCALL()){
  }
  the iterator one time
  initialization block where I setup the data to be iterated
  upon? 
  
  Can you please elaborate on how should I fix this? I'm probably missing 
  something basic...
 
 if (SRF_IS_FIRSTCALL())
 {
  length = 4000;
  base_num = 120300;
  list = (NumberList *)palloc(sizeof(NumberList));
  list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
 
 You allocate memory in the per call context here.
 
  list->length = length;
  i = 0;
  for (; i < length; i++)
  {
   num = (Number *)palloc(sizeof(Number));
   num->value = base_num + i;
   num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
   list->pp_numbers[i] = num;
  }
  ereport(INFO, (errmsg("--- data source ---")));
  i = 0;
  for (; i < length; i++)
  {
   ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
   ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));
  }
 
  funcctx = SRF_FIRSTCALL_INIT();
  oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
 
 Because you only switch the memory context here. Move this up, to the
 beginning of the SRF_IS_FIRSTCALL block. Before the palloc()s.
 
 Greetings,
 
 Andres Freund
 
 -- 
  Andres Freund   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
  

Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Magnus Hagander
On Sun, Mar 15, 2015 at 3:16 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2015-03-15 14:51:46 +0100, Magnus Hagander wrote:
  On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund and...@2ndquadrant.com
  wrote:
 
   On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
  /*
   * Override any inconsistent requests. Note that this is a change
   * of behaviour in 9.5; prior to this we simply ignored a request
   * to pause if hot_standby = off, which was surprising behaviour.
   */
  if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
      recoveryTargetActionSet &&
      standbyState == STANDBY_DISABLED)
      recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
  
   While it's easy enough to fix I rather dislike the whole intent here
   though. *Silently* switching the mode of operation in a rather
   significant way seems like a bad idea to me. At the very least we need
   to emit a LOG message about this; but I think it'd be much better to
   error out instead.
  
   9.5's behaviour was already quite surprising. But changing things to a
   different surprising behaviour seems like a bad idea.
  
 
  +1. Especially for sensitive operations like this, having
  predictable-behavior-or-error is usually the best choice.

 Yea.

 Looking further, it's even worse right now. We'll change the target to
 shutdown when hot_standby = off, but iff it was set in the config
 file. But the default value is (and was, although configured
 differently) documented to be 'pause'; so if it's not configured
 explicitly we still will promote.  At least I can't read that out of the
 docs.

 Personally I think we just should change the default to 'shutdown' for
 all cases. That makes documentation and behaviour less surprising. And
 makes experimenting less dangerous, since you can just start again.



+1. These things need to be clear. Given the consequences of getting it
wrong, surprising behavior can be quite dangerous.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Andres Freund
On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
   /*
    * Override any inconsistent requests. Note that this is a change
    * of behaviour in 9.5; prior to this we simply ignored a request
    * to pause if hot_standby = off, which was surprising behaviour.
    */
   if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
       recoveryTargetActionSet &&
       standbyState == STANDBY_DISABLED)
       recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

While it's easy enough to fix I rather dislike the whole intent here
though. *Silently* switching the mode of operation in a rather
significant way seems like a bad idea to me. At the very least we need
to emit a LOG message about this; but I think it'd be much better to
error out instead.

9.5's behaviour was already quite surprising. But changing things to a
different surprising behaviour seems like a bad idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized

2015-03-15 Thread Andres Freund
On 2015-03-15 16:30:16 +0100, Andres Freund wrote:
 The attached patch does pretty much what you suggested above and tries
 to address all the point previously made. I plan to push this fairly
 soon; unless somebody has further input.

Pushed, after fixing two typos in decode.c I introduced while cleaning
up. This might not yet be perfect, but there seems little point in
waiting further.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-15 Thread Fabien COELHO

Hello Tomas,


attached is a patch implementing merging of pgbench logs. These logs are
written by each thread, so with N threads you get N files with names

   pgbench_log.PID
   pgbench_log.PID.1
   ...
   pgbench_log.PID.N

Before analyzing these logs, these files need to be combined. I usually
ended up writing ad-hoc scripts doing that, lost them, wrote them
again, and so on, over and over again.


I've looked at the patch. Although I think that such a feature is somehow 
desirable... I have two issues with it: ISTM that


(1) it does not belong to pgbench as such

(2) even if, the implementation is not right

About (1):

I think that this functionality as implemented does not belong to 
pgbench, and really belongs to an external script, which happens not to 
be readily available, which is a shame. PostgreSQL should probably provide 
such a script, or make it easy to find.


It could belong to pgbench if pgbench was *generating* the merged log file 
directly. However I'm not sure this can be done cleanly with the 
multi-process thread emulation (IN PASSING: I think pgbench should really 
get rid of that emulation, because I do not know of a system that lacks 
threads nowadays yet would require a state-of-the-art pgbench, and the 
emulation significantly complexifies the code by making things uselessly 
difficult, needing to be implemented twice, or not provided in some cases).


Another issue raised by your patch is that the log format may be improved, 
say by providing a one-field timestamp at the beginning of the line.



About (2):

In the implementation you reimplement a partial merge sort by parsing each 
line of each file and merging it with the current result over and over. 
ISTM that an implementation should read all files in parallel and merge 
them in one pass. The current IO complexity is in p²n where it should be 
simply pn... do not use it with a significant number of threads and many 
lines... Ok, the generated files are likely to be kept in cache, but 
nevertheless.
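
A sketch of such a one-pass merge (illustrative code, not the patch's;
assumptions: each per-thread file is sorted by a leading numeric timestamp
and lines fit into a fixed buffer):

#include <math.h>
#include <stdio.h>

#define NFILES  8       /* p: number of per-thread log files */
#define MAXLINE 1024

/* Read the next line of fp into buf and return its leading timestamp,
 * or HUGE_VAL once the file is exhausted (or a line is unparsable). */
static double
next_ts(FILE *fp, char *buf)
{
    double  t;

    if (fgets(buf, MAXLINE, fp) != NULL && sscanf(buf, "%lf", &t) == 1)
        return t;
    return HUGE_VAL;
}

/* One-pass p-way merge: every line is read and written exactly once, so
 * the I/O cost is pn; a heap over the p pending lines would further cut
 * the per-line comparison cost from O(p) to O(log p). */
static void
merge_logs(FILE *in[NFILES], FILE *out)
{
    char    line[NFILES][MAXLINE];
    double  ts[NFILES];
    int     i;

    for (i = 0; i < NFILES; i++)
        ts[i] = next_ts(in[i], line[i]);

    for (;;)
    {
        int     min = 0;

        for (i = 1; i < NFILES; i++)
            if (ts[i] < ts[min])
                min = i;
        if (ts[min] == HUGE_VAL)
            break;              /* all inputs exhausted */
        fputs(line[min], out);
        ts[min] = next_ts(in[min], line[min]);
    }
}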


Also there is plenty of copy-paste where the two files are scanned and then 
reprinted in all the different formats. The same logic is implemented 
twice, once for simple and once for aggregated logs. This means that 
updating or extending the log format later on would require modifying these 
scans and prints in many places.


For aggregates, some data in the output may be special values (NaN/-/...); 
I am not sure how the implementation would behave in such cases. As lines 
that do not match are silently ignored, the merged result would just be 
non-significant - should it rather be an error? Try it with a low rate, for 
instance.


It seems that system function calls are not tested for errors.


The other disadvantage of the external scripts is that you have to pass
all the info about the logs (whether the logs are aggregated, whther
there's throttling, etc.).


I think that is another argument to make a better format, with a 
timestamp ahead? Also, ISTM that it only needs to know whether it is 
merging aggregate or simple logs, no more; the other information can be 
inferred from the number of fields on the line.



So integrating this into pgbench directly seems like a better approach,
and the attached patch implements that.


You guessed that I disagree. Note that this is only my own opinion.

In summary, I think that:

(a) providing a clean script would be nice,

(b) if we could get rid of the thread emulation, pgbench could generate 
the merged logs directly and simply, and the option could be provided 
then.


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assessing parallel-safety

2015-03-15 Thread Noah Misch
On Thu, Mar 12, 2015 at 11:21:37AM -0400, Robert Haas wrote:
 On Thu, Feb 19, 2015 at 1:19 AM, Noah Misch n...@leadboat.com wrote:
  Rereading my previous message, I failed to make the bottom line clear: I
  recommend marking eqsel etc. PROPARALLEL_UNSAFE but *not* checking an
  estimator's proparallel before calling it in the planner.
 
 But what do these functions do that is actually unsafe?

They call the oprcode function of any operator naming them as an estimator.
Users can add operators that use eqsel() as an estimator, and we have no bound
on what those operators' oprcode can do.  (In practice, parallel-unsafe
operators using eqsel() as an estimator will be rare.)

  Hmm.  Altogether prohibiting XLogInsert() in parallel mode seems
  unwise, because it would foreclose heap_page_prune_opt() in workers.
  I realize there's separate conversation about whether pruning during
  SELECT queries is good policy, but in the interested of separating
  mechanism from policy, and in the sure knowledge that allowing at
  least some writes in parallel mode is certainly going to be something
  people will want, it seems better to look into propagating
  XactLastRecEnd.
 
  Good points; that works for me.
 
 The key design decision here seems to be this: How eagerly do we need
 to synchronize XactLastRecEnd?  What exactly goes wrong if it isn't
 synchronized?  For example, if the value were absolutely critical in
 all circumstances, one could imagine storing a shared XactLastRecEnd
 in shared memory.  This doesn't appear to be the case: the main
 requirement is that we definitely need an up-to-date value at commit
 time.  Also, at abort time, we don't really need the value for anything
 critical, but it's worth kicking the WAL writer so that any
 accumulated WAL gets flushed.

Exactly.  At entry to RecordTransactionCommit(), XactLastRecEnd must point
past the end of all records whose replay is required to satisfy the user's
expectation of transaction durability.  At other times, harm from its value
being wrong is negligible.  I do suggest adding a comment to its definition
explaining when one can rely on it.

 Here's an incremental patch - which I will incorporate into the
 parallel mode patch if it seems about right to you - that tries to
 tackle all this.

I reviewed this.  Minor things:

 @@ -73,10 +74,15 @@ typedef struct FixedParallelState
   /* Entrypoint for parallel workers. */
   parallel_worker_main_type   entrypoint;
  
 - /* Track whether workers have attached. */
 + /* Mutex protects remaining fiedlds. */

Typo.

 @@ -2564,10 +2578,19 @@ AbortTransaction(void)
* worker, skip this; the user backend must be the one to write the 
 abort
* record.
*/
 - if (parallel)
 - latestXid = InvalidTransactionId;
 - else
 + if (!parallel)
   latestXid = RecordTransactionAbort(false);
 + else
 + {
 + latestXid = InvalidTransactionId;
 +
 + /*
 +  * Since the parallel master won't get our value of 
 XactLastRecEnd in this
 +  * case, we nudge WAL-writer ourselves in this case.  See 
 related comments in
 +  * RecordTransactionAbort for why this matters.
 +  */
 + XLogSetAsyncXactLSN(XactLastRecEnd);

RecordTransactionAbort() skips this for subtransaction aborts.  I would omit
it here, because a parallel worker abort is, in this respect, more like a
subtransaction abort than like a top-level transaction abort.


While studying README.parallel from parallel-mode-v7.patch to understand the
concept of worker commit/abort, this part gave me the wrong idea:

 +Transaction Integration
 +===
...
 +Transaction commit or abort requires careful coordination between backends.
 +Each backend has its own resource owners: buffer pins, catcache or relcache
 +reference counts, tuple descriptors, and so on are managed separately by each
 +backend, and each backend is separately responsible for releasing such
 +resources.  Generally, the commit or abort of a parallel worker is much like
 +a top-transaction commit or abort, but there are a few exceptions.  Most
 +importantly:
 +
 +  - No commit or abort record is written; the initiating backend is
 +responsible for this.
 +
 +  - Cleanup of pg_temp namespaces is not done.  The initiating backend is
 +responsible for this, too.
 +
 +The master kills off all remaining workers as part of commit or abort
 +processing.  It must not only kill the workers but wait for them to actually

I took this to mean that workers normally persist until the master commits a
transaction.  Rather, their lifecycle is like the lifecycle of a buffer pin.
When an error interrupts a parallel operation, transaction abort destroys
workers.  Otherwise, the code driving the specific parallel task destroys them
as early as is practical.  (Strictly to catch bugs, transaction commit does
detect and destroy leaked workers.)

 

Re: [HACKERS] ranlib bleating about dirmod.o being empty

2015-03-15 Thread Michael Paquier
On Sun, Mar 15, 2015 at 1:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm getting rather tired of reading these warning messages in OS X builds:

 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib:
  file: libpgport.a(dirmod.o) has no symbols
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib:
  file: libpgport_srv.a(dirmod_srv.o) has no symbols

 The reason for this warning is that dirmod.o contains no functions that
 aren't Windows-specific.  As such, it seems like putting it into the
 list of always compiled port modules is really a mistake.  Any
 objections to switching it to be built only on Windows?

+1. Thanks for changing that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

2015-03-15 Thread Julien Tachoires
On 15/03/2015 04:34, Andreas Karlsson wrote:
 On 03/15/2015 04:25 AM, Andreas Karlsson wrote:
 Nice. You will also want to apply the attached patch which fixes support
 for the --no-tablespaces flag.
 
 Just realized that --no-tablespaces need to be fixed for pg_restore too.

Indeed, after taking a look at the pg_restore case, I would say it won't be
so easy.
Will try to fix it.

--
Julien


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-15 Thread Fabien COELHO



Firstly, the fact that pgbench produces one file per thread is awkward.


I agree, but I think it is due to the multi-process thread emulation: if 
you have real threads, you can do a simple fprintf, possibly with some 
mutex, and you're done. There is really nothing to do to implement this 
feature.


Secondly, a separate tool (even if provided with pgbench) would require
passing detailed info about the log format - whether it's aggregated or
not, throttling or not, 


Hmmm.


It could belong to pgbench if pgbench was *generating* the merged
log file directly. [...]


I considered this approach, i.e. adding another 'collector' thread
receiving results from all the other threads, but decided not to do
that. That would require a fair amount of inter-process communication,
locking etc. and might affect the measurements


I agree that inter-process stuff should be avoided. This is not what I had 
in mind. I was thinking of fprintf on the same file handler by different 
threads.
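
Under real threads that could be as small as this sketch (assuming POSIX
threads; the shared FILE * is opened once by the main thread, and the
fields written are illustrative, not pgbench's exact log format):

#include <pthread.h>
#include <stdio.h>

static FILE *shared_log;        /* opened once, before the threads start */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* every worker thread logs through this; the mutex keeps lines whole */
static void
log_transaction(int client_id, long timestamp_us, double latency_us)
{
    pthread_mutex_lock(&log_lock);
    fprintf(shared_log, "%d %ld %.3f\n", client_id, timestamp_us, latency_us);
    pthread_mutex_unlock(&log_lock);
}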



Another issue raised by your patch is that the log format may be
improved, say by providing a one-field timestamp at the beginning of
the line.


I don't see how this is relevant to this patch?


For the simple log format, all the parsing needed would be for the 
timestamp, and the remainder would just be text to pass along, no need to 
%d %f... whatever.



The current IO complexity is in p²n where it should be simply pn...


Maybe. Implementing a 2-way merge sort was the simplest solution at the
moment, and I don't think this really matters because the I/O generated
by the benchmark itself is usually much higher than this.


If you do not do an n-way merge, you could do a 2-way merge on a binary 
tree so that the IO complexity would be p.log(p).n (I think), and not p²n.



For aggregates, some data in the output may be special values
NaN/-/..., [...]

You mean the aggregated log? I can't think of a way to get there such
values - can you provide a plausible scenario how that could happen?


Possibly I'm wrong. Ok, I tried it:

 sh ./pgbench --rate=0.5 --aggregate-interval=1 --log
 sh cat pgbench_log.12671
 1426445236 1 5034 25341156 5034 5034 687 471969 687 687
 1426445237 0 0 0 0 0 0 0 0 0
 1426445238 0 0 0 0 0 0 0 0 0
 1426445239 1 8708 75829264 8708 8708 2063 4255969 2063 2063
 ...

Good news, I could not generate strange values. Just when there are 0 
transactions possibly some care should be taken when combining values.



It seems that system function calls are not tested for errors.


That's true, but that's how the rest of pgbench code is written.


Hmmm. This is not entirely true, there are *some* checks if you look 
carefully :-)


   if ((fd = fopen(filename, "r")) == NULL) ...
   if ((fp = popen(command, "r")) == NULL) ...
   nsocks = select(...)
   if (nsocks < 0) ...


So integrating this into pgbench directly seems like a better approach,
and the attached patch implements that.


You guessed that I disagree. Note that this is only my own opinion.

In summary, I think that:

(a) providing a clean script would be nice,

(b) if we could get rid of the thread emulation, pgbench could
generate the merged logs directly and simply, and the option could
be provided then.


That however is not the goal of this patch.


Sure. My point is that the amount of code you write to implement this 
merge stuff is due to this feature. Without it, the patch would probably 
need 10 lines of code. Moreover, the way it is implemented requires scanning 
and reprinting, which means more work in many places to update the format 
later.



The thread emulation is there for a reason,


My opinion is that it *was* there for a reason. Whether it makes sense 
today to still have it, maintain it, and have to write such heavy code for 
a trivial feature just because of it is another matter.



and I certainly am not going to work on eliminating it
(not sure that's even possible).


I wish it will be:-)

I would suggest this, which I would support: implement the feature the 
simple way (aka fprintf, maybe with a mutex) when compiled with threads, and 
generate an error ("feature not available with process-based thread 
emulation") when compiled with processes. This way we avoid a lot of heavy 
code to maintain in the future, and you still get the feature within 
pgbench. There are already some things which are not the same with thread 
emulation because they would have been tiring to implement for it for very 
little benefit.


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Itai Dor-On
Thanks Tom.

Assuming the SRF had a parameter, would this be a correct approach
(considering the iterative model) to bail-out early?

if (SRF_IS_FIRSTCALL())
{
    int i;

    if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) !=
        TYPEFUNC_COMPOSITE)
    {
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("Check if sql function definition returns SETOF record")));
        return;
    }

    if (PG_ARGISNULL(0))
    {
        ereport(ERROR,
                (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
                 errmsg("Null value not allowed for ...")));
        return;
    }

    if ((i = PG_GETARG_INT32(0)) != 'WHATEVER')
    {
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("Invalid value for ...")));
        return;
    }

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Sunday, March 15, 2015 5:50 PM
To: Itai
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Crash on SRF execution

Itai it...@outlook.com writes:
 I'm attempting to program a simple SRF function but it constantly crashes
(details and code below).
 Any idea why?

Looks like you're pallocing some stuff in the calling context (ie, a
short-lived context) during the first execution and expecting it to still be
there in later executions.  You'd need to allocate those data structures in
the multi_call_memory_ctx instead.

regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cygwin vs win32 in configure

2015-03-15 Thread Tom Lane
Buildfarm member brolga seems unhappy with my commit 91f4a5a that
restricted port/dirmod.c to being built only on Windows.  Evidently
we need to build it when $PORTNAME = cygwin as well, which is fine;
but I'm a bit astonished that none of the other stuff inserted in the
"# Win32 support" stanza beginning at configure.in line 1440 is needed
for cygwin builds.  Is that really correct, or am I misunderstanding
what's happening here?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-15 Thread Petr Jelinek

On 14/03/15 04:17, Andreas Karlsson wrote:

On 03/13/2015 10:22 PM, Peter Geoghegan wrote:

On Thu, Mar 12, 2015 at 6:23 PM, Andreas Karlsson andr...@proxel.se
wrote:

/*
 * Integer data types use Numeric accumulators to share code and avoid risk
 * of overflow.  To speed up aggregation 128-bit integer accumulators are
 * used instead where sum(X) or sum(X*X) fit into 128-bits, and there is
 * platform support.
 *
 * For int2 and int4 inputs sum(X) will fit into a 64-bit accumulator, hence
 * we use faster special-purpose accumulator routines for SUM and AVG of
 * these datatypes.
 */

#ifdef HAVE_INT128
typedef struct Int128AggState


Not quite. Refer to the 128-bit integer accumulators as
"special-purpose accumulator routines" instead. Then, in the case of
the extant 64-bit accumulators, refer to them by the shorthand
"integer accumulators". Otherwise it's the wrong way around.


I disagree. The term "integer accumulators" is confusing. Right now I do
not have any better ideas for how to write that comment, so I submit the
next version of the patch with the comment as I wrote it above. Feel
free to come up with a better wording, my English is not always up to
par when writing technical texts.



I don't like the term "integer accumulators" either, as integer is 
platform specific. I would phrase it like this:


/*
 * Integer data types in general use Numeric accumulators to share code
 * and avoid risk of overflow.
 *
 * However for performance reasons some of the optimized special-purpose
 * accumulator routines are used when possible.
 *
 * On platforms with 128-bit integer support, the 128-bit routines will be
 * used when sum(X) or sum(X*X) fit into 128-bit.
 *
 * For int2 and int4 inputs, the N and sum(X) fit into 64-bit so the 64-bit
 * accumulators will be used for SUM and AVG of these data types.
 */

It's almost the same thing as you wrote but the 128 bit and 64 bit 
optimizations are put on the same level of optimized routines.
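
For reference, a sketch of the 128-bit transition state those comments
describe (the field layout here is illustrative, not necessarily the
patch's exact definition):

#ifdef HAVE_INT128
typedef struct Int128AggState
{
    bool    calcSumX2;  /* calculate sum(X*X)? */
    int64   N;          /* count of processed numbers */
    int128  sumX;       /* sum of processed numbers */
    int128  sumX2;      /* sum of squares of processed numbers */
} Int128AggState;
#endif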


But this is nitpicking at this point, I am happy with the patch as it 
stands right now.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-15 Thread Tomas Vondra
On 15.3.2015 11:22, Fabien COELHO wrote:

 I've looked at the patch. Although I think that such a feature is
 somehow desirable... I have two issues with it: ISTM that
 
 (1) it does not belong to pgbench as such
 
 (2) even if, the implementation is not right
 
 About (1):
 
 I think that this functionnality as implemented does not belong to 
 pgbench, and does really belong to an external script, which happen 
 not to be readily available, which is a shame. PostgreSQL should 
 probably provide such a script, or make it easy to find.

I disagree, for two main reasons.

Firstly, the fact that pgbench produces one file per thread is awkward.
It was implemented like this most likely because it was the simplest
solution after adding -j in 9.0, but I don't remember if I ever needed
a separate log from a single thread.

Secondly, a separate tool (even if provided with pgbench) would require
passing detailed info about the log format - whether it's aggregated or
not, throttling or not, 

Those are the two main reasons why I think this should be implemented as
another option in pgbench.


 It could belong to pgbench if pgbench was *generating* the merged
 log file directly. However I'm not sure this can be done cleanly for
 the multi-process thread emulation (IN PASSING: which I think this 
 should really get rid of because I do not know what system does not 
 provide threads nowadays and would require to have a state of the art
 pgbench nevertheless, and this emulation significantly complexify the
 code by making things uselessly difficult and/or needed to be 
 implemented twice or not be provided in some cases).

I considered this approach, i.e. adding another 'collector' thread
receiving results from all the other threads, but decided not to do
that. That would require a fair amount of inter-process communication,
locking etc. and might affect the measurements (especially for the
workloads with extremely short transactions, like read-only pgbench).

The implemented approach, i.e. merging results collected by each thread
independently, after the benchmarking actually completed, is a better
solution. And the code is actually simpler.


 Another issue raised by your patch is that the log format may be 
 improved, say by providing a one-field timestamp at the beginning of 
 the line.

I don't see how this is relevant to this patch?


 About (2):
 
 In the implementation you reimplement a partial merge sort by
 parsing each line of each file and merging it with the current result
 over and over. ISTM that an implementation should read all files in
 parallel and merge them in one pass. The current IO complexity is in
 p²n where it should be simply pn... do not use it with a significant
 number of threads and many lines... Ok, the generated files are
 likely to be kept in cache, but nevertheless.

Maybe. Implementing a 2-way merge sort was the simplest solution at the
moment, and I don't think this really matters because the I/O generated
by the benchmark itself is usually much higher than this.

The only case when this might make a difference is probably merging
large transaction logs (e.g. 100% sample from read-only test on small
dataset).

 Also there are plenty of copy paste when scanning for two files, and 
 then reprinting in all the different formats. The same logic is 
 implemented twice, once for simple and once for aggregated. This
 means that updating or extending the log format later on would
 require to modify these scans and prints in many places.

That is true, and I'll address that somehow (either moving the code to a
macro or a separate utility function).

 For aggregates, some data in the output may be special values 
 NaN/-/..., I am not sure how the implementation would behave in
 such cases. As lines that do not match are silently ignored, the
 result merge would just be non significant should it rather be an
 error? Try it with a low rate for instance.

You mean the aggregated log? I can't think of a way to get there such
values - can you provide a plausible scenario how that could happen?

 
 It seems that system function calls are not tested for errors.

That's true, but that's how the rest of pgbench code is written.

 
 The other disadvantage of the external scripts is that you have to
 pass all the info about the logs (whether the logs are aggregated,
 whther there's throttling, etc.).
 
  I think that is another argument to make a better format, with a
  timestamp ahead? Also, ISTM that it only needs to know whether it is
  merging aggregate or simple logs, no more; the other information can be
  inferred from the number of fields on the line.

No, it's not. Even if you change the format like this, you still have no
idea whether the log is a per-transaction log (possibly with some
additional options) or an aggregated log.

There might be some auto-detection based on number of fields, for
example, but considering how many options influence that I consider that
really unreliable.

 
 So 

Re: [HACKERS] Cygwin vs win32 in configure

2015-03-15 Thread Andres Freund
Hi,

On 2015-03-15 14:03:08 -0400, Tom Lane wrote:
 Buildfarm member brolga seems unhappy with my commit 91f4a5a that
 restricted port/dirmod.c to being built only on Windows.  Evidently
 we need to build it when $PORTNAME = cygwin as well, which is fine;
 but I'm a bit astonished that none of the other stuff inserted in the
 # Win32 support stanza beginning at configure.in line 1440 is needed
 for cygwin builds.  Is that really correct, or am I misunderstanding
 what's happening here?

Those all sound like things that cygwin is going to emulate for us. I
guess we could use more of those functions to reduce the difference, but
I'm not sure it's worth it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Simon Riggs
On 15 March 2015 at 14:16, Andres Freund and...@2ndquadrant.com wrote:

 Personally I think we just should change the default to 'shutdown' for
 all cases. That makes documentation and behaviour less surprising. And
 makes experimenting less dangerous, since you can just start again.

We need to look at the specific situation, not make a generic decision.

If hot_standby = off, we are unable to unpause, once paused.

Changing the default doesn't alter that problem.

We have two choices: 1) override to a sensible setting, 2) throw an error.

(2) sounds clean at first but we must look deeper. We know that the
*only* possible other setting is 'shutdown', so it seems more user
friendly to do the thing we *know* they want (1), rather than pretend
that we don't.

(1) is completely predictable and not at all surprising. Add a LOG
message if you wish, but don't throw an error.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Andres Freund
On 2015-03-15 20:10:49 +0000, Simon Riggs wrote:
 On 15 March 2015 at 14:16, Andres Freund and...@2ndquadrant.com wrote:
 
  Personally I think we just should change the default to 'shutdown' for
  all cases. That makes documentation and behaviour less surprising. And
  makes experimenting less dangerous, since you can just start again.
 
 We need to look at the specific situation, not make a generic decision.
 
 If hot_standby = off, we are unable to unpause, once paused.
 
 Changing the default doesn't alter that problem.
 
 We have two choices: 1) override to a sensible setting, 2) throw an error.
 
 (2) sounds clean at first but we must look deeper. We know that the
 *only* possible other setting is 'shutdown', so it seems more user
 friendly to do the thing we *know* they want (1), rather than pretend
 that we don't.
 
 (1) is completely predictable and not at all surprising. Add a LOG
 message if you wish, but don't throw an error.

Sorry, I don't buy this. If I have recovery_target_action = 'pause' in
the config file, I want it to pause. Not do something else, just because
postgres doesn't have an interface to unpause without SQL access. That
makes some sense to developers, but is pretty much incomprehensible for
mere mortals.

Even worse, right now (after the bugfix), the behaviour is:

postgresql.conf:
# hot_standby = off
recovery.conf:
#recovery_target_action = 'pause'
=> promote (the downgrading is only active when explicitly configured)

# hot_standby = off
recovery.conf:
recovery_target_action = 'pause'
=> shutdown (despite an explicit setting of pause)

hot_standby = on
recovery.conf:
# recovery_target_action = 'pause'
=> pause

hot_standby = on
recovery.conf:
recovery_target_action = 'pause'
=> pause

To me that's just utterly confusing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-03-15 Thread Etsuro Fujita

On 2015/03/13 11:46, Etsuro Fujita wrote:

BTW, what do you think about opening/locking foreign tables selected for
update at InitPlan, which the original patch does?  As I mentioned in
[1], ISTM that ExecOpenScanRelation called from ExecInitForeignScan is
assuming that.



[1] http://www.postgresql.org/message-id/54bcbbf8.3020...@lab.ntt.co.jp


Let me explain further.  Here is the comment in ExecOpenScanRelation:

 * Determine the lock type we need.  First, scan to see if target relation
 * is a result relation.  If not, check if it's a FOR UPDATE/FOR SHARE
 * relation.  In either of those cases, we got the lock already.

I think this is not true for foreign tables selected FOR UPDATE/SHARE, 
which have markType = ROW_MARK_COPY, because such foreign tables don't 
get opened/locked by InitPlan.  Such foreign tables then get locked by 
neither InitPlan nor ExecOpenScanRelation.  I think this is a bug.  To 
fix it, I think we should open/lock such foreign tables at InitPlan, as 
the original patch does.


Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-15 Thread Tomas Vondra
On 15.3.2015 20:35, Fabien COELHO wrote:
 
 Firstly, the fact that pgbench produces one file per thread is awkward.
 
 I agree, but I think it is due to the multi process thread emulation: if
 you have real threads, you can do a simple fprintf, possibly with some
 mutex, and you're done. There is really nothing to do to implement this
 feature.

My fear was that this either requires explicit locking, or some implicit
locking - for example, while fprintf() in glibc is thread-safe, I really
am not sure about Win32. This might influence the results
for very short transactions - I haven't tried that, though, so this
might be a false assumption.

The way I implemented the merge is immune to this.

 Secondly, a separate tool (even if provided with pgbench) would
 require passing detailed info about the log format - whether it's
 aggregated or not, throttling or not, 
 
 Hmmm.
 
 It could belong to pgbench if pgbench was *generating* the merged
 log file directly. [...]

 I considered this approach, i.e. adding another 'collector' thread 
 receiving results from all the other threads, but decided not to
 do that. That would require a fair amount of inter-process
 communication, locking etc. and might affect the measurements
 
 I agree that inter-process stuff should be avoided. This is not what 
 I had in mind. I was thinking of fprintf on the same file handler 
 by different threads.

That still involves some sort of 'implicit' locking, no? And as I
mentioned, I'm not sure fprintf() is thread-safe on all the platforms we
support.
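
For what it's worth, the explicit-locking variant would look something like
this minimal standalone sketch (pthreads; not pgbench code, and Windows would
need a CRITICAL_SECTION or similar instead of the pthread mutex):

#include <stdio.h>
#include <pthread.h>

static FILE *logfile;
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* one log write = one critical section, so lines never interleave */
static void
log_transaction(int client_id, double latency_us)
{
	pthread_mutex_lock(&log_lock);
	fprintf(logfile, "%d %.0f\n", client_id, latency_us);
	pthread_mutex_unlock(&log_lock);
}

static void *
worker(void *arg)
{
	int			id = *(int *) arg;
	int			i;

	for (i = 0; i < 3; i++)
		log_transaction(id, 1000.0 + i);
	return NULL;
}

int
main(void)
{
	pthread_t	th[2];
	int			ids[2] = {0, 1};
	int			i;

	logfile = stdout;			/* stand-in for the shared pgbench_log */
	for (i = 0; i < 2; i++)
		pthread_create(&th[i], NULL, worker, &ids[i]);
	for (i = 0; i < 2; i++)
		pthread_join(th[i], NULL);
	return 0;
}

Build with cc -pthread; the mutex cost per transaction is exactly the
overhead being worried about above.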


 Another issue raised by your patch is that the log format may be 
 improved, say by providing a one-field timestamp at the beginning
 of the line.

 I don't see how this is relevant to this patch?
 
 For the simple log format, all the parsing needed would be for the 
 timestamp, and the remainder would just be text to pass along, no
 need to %d %f... whatever.

Oh, ok. Well, that's true, but I don't think that significantly changes
the overall complexity.

 The current IO complexity is in p²n where it should be simply pn...

 Maybe. Implementing a 2-way merge sort was the simplest solution at
 the moment, and I don't think this really matters because the I/O
 generated by the benchmark itself is usually much higher than this.
 
 If you do not do an n-way merge, you could do a 2-way merge on a
 binary tree so that the IO complexity would be p.log(p).n (I think),
 and not p²n.

Yes, I could do that.

I still think this is probably overengineering, but not a big deal. I'll
leave this for later, though (this patch is in the 2015-06 CF anyway).
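
For the record, a minimal standalone illustration of that tree-shaped merge
(toy in-memory data, not the patch's code): each round halves the number of
streams, so every line is copied about log2(p) times, which is where the
p.log(p).n bound comes from, versus p²n when streams are folded in one at a
time.

#include <stdio.h>
#include <stdlib.h>

/* merge two sorted arrays of timestamps into out; returns merged length */
static size_t
merge2(const long *a, size_t na, const long *b, size_t nb, long *out)
{
	size_t		i = 0, j = 0, k = 0;

	while (i < na && j < nb)
		out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];
	return k;
}

int
main(void)
{
	/* four sorted "per-thread logs" */
	long		s0[] = {1, 5, 9}, s1[] = {2, 6}, s2[] = {3, 7}, s3[] = {4, 8, 10};
	long	   *streams[4] = {s0, s1, s2, s3};
	size_t		lens[4] = {3, 2, 2, 3};
	size_t		p = 4;
	size_t		i;

	/* each round pairs up streams: ceil(log2(p)) rounds in total */
	while (p > 1)
	{
		size_t		q = 0;

		for (i = 0; i < p; i += 2)
		{
			if (i + 1 == p)		/* odd stream out: carry it over */
			{
				streams[q] = streams[i];
				lens[q] = lens[i];
			}
			else
			{
				long	   *out = malloc((lens[i] + lens[i + 1]) * sizeof(long));

				if (out == NULL)
					return 1;
				lens[q] = merge2(streams[i], lens[i],
								 streams[i + 1], lens[i + 1], out);
				streams[q] = out;	/* intermediate buffers leak; fine for a demo */
			}
			q++;
		}
		p = q;
	}
	for (i = 0; i < lens[0]; i++)
		printf("%ld ", streams[0][i]);
	printf("\n");
	return 0;
}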

 For aggregates, some data in the output may be special values
 NaN/-/..., [...]
 You mean the aggregated log? I can't think of a way to get there such
 values - can you provide a plausible scenario how that could happen?
 
 Possibly I'm wrong. Ok, I tried it:
 
  sh> ./pgbench --rate=0.5 --aggregate-interval=1 --log
  sh> cat pgbench_log.12671
  1426445236 1 5034 25341156 5034 5034 687 471969 687 687
  1426445237 0 0 0 0 0 0 0 0 0
  1426445238 0 0 0 0 0 0 0 0 0
  1426445239 1 8708 75829264 8708 8708 2063 4255969 2063 2063
  ...
 
 Good news, I could not generate strange values. Just that when there
 are 0 transactions, some care should possibly be taken when combining 
 values.

Yeah, I wasn't able to come up with such a scenario, but wasn't sure.

 
 It seems that system function calls are not tested for errors.

 That's true, but that's how the rest of pgbench code is written.
 
 Hmmm... This is not entirely true; there are *some* checks if you look
 carefully :-)
 
    if ((fd = fopen(filename, "r")) == NULL) ...
    if ((fp = popen(command, "r")) == NULL) ...
    nsocks = select(...)
    if (nsocks < 0) ...

OK, there are a few checks ;-) But none of the fprintf calls are checked, IIRC.

Anyway, I plan to refactor this part of the patch to get rid of the
copy'n'paste pieces, so I'll take care of this too.

 (b) if we could get rid of the thread emulation, pgbench could
 generate the merged logs directly and simply, and the option could
 be provided then.

 That however is not the goal of this patch.
 
 Sure. My point is that the amount of code you write to implement this
 merge stuff is due to this feature. Without it, the patch would 
 probably need 10 lines of code. Moreover, the way it is implemented 
 requires scanning and reprinting, which means more work in many 
 places to update the format later.

Possibly. But it's not written like that :-(


 The thread emulation is there for a reason,
 
 My opinion is that it *was* there for a reason. Whether it makes 
 sense today to still have it, maintain it, and have to write such 
 heavy code for a trivial feature just because of it is another 
 matter.

Possibly. I can't really decide that.

 and I certainly am not going to work on eliminating it
 (not sure that's even possible).
 
 I wish it will be:-)
 
 I would suggest this, which I would support: implement this feature the
 simple way (aka fprintf, maybe a 

Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-15 Thread Andreas Karlsson

On 03/15/2015 09:47 PM, Petr Jelinek wrote:

It's almost the same thing as you wrote, but the 128-bit and 64-bit
optimizations are put at the same level of optimized routines.

But this is nitpicking at this point, I am happy with the patch as it
stands right now.


Do you think it is ready for committer?

New version with altered comment is attached.

--
Andreas Karlsson
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 509f961..48fcc68 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -125,6 +125,41 @@ undefine([Ac_cachevar])dnl
 ])# PGAC_TYPE_64BIT_INT
 
 
+# PGAC_HAVE___INT128
+# --
+# Check if __int128 is a working 128 bit integer type and
+# define HAVE___INT128 if so.
+AC_DEFUN([PGAC_HAVE___INT128],
+[AC_CACHE_CHECK(for __int128, pgac_cv_have___int128,
+[AC_TRY_RUN([
+__int128 a = 2001;
+__int128 b = 4005;
+
+int does_int128_work()
+{
+  __int128 c,d;
+
+  c = a * b;
+  d = (c + b) / b;
+  if (d != a+1)
+return 0;
+  return 1;
+}
+main() {
+  exit(! does_int128_work());
+}],
+[pgac_cv_have___int128=yes],
+[pgac_cv_have___int128=no],
+[# If cross-compiling, check the size reported by the compiler and
+# trust that the arithmetic works.
+AC_COMPILE_IFELSE([AC_LANG_BOOL_COMPILE_TRY([], [sizeof(__int128) == 16])],
+  pgac_cv_have___int128=yes,
+  pgac_cv_have___int128=no)])])
+
+if test x$pgac_cv_have___int128 = xyes ; then
+  AC_DEFINE(HAVE___INT128, 1, [Define to 1 if the system has the type `__int128'.])
fi])# PGAC_HAVE___INT128
+
 
 # PGAC_C_FUNCNAME_SUPPORT
 # ---
diff --git a/configure b/configure
index 379dab1..b60f55f 100755
--- a/configure
+++ b/configure
@@ -13789,6 +13789,74 @@ _ACEOF
 fi
 
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __int128" >&5
+$as_echo_n "checking for __int128... " >&6; }
+if ${pgac_cv_have___int128+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  if test "$cross_compiling" = yes; then :
+  # If cross-compiling, check the size reported by the compiler and
+# trust that the arithmetic works.
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+static int test_array [1 - 2 * !(sizeof(__int128) == 16)];
+test_array [0] = 0;
+return test_array [0];
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile $LINENO; then :
+  pgac_cv_have___int128=yes
+else
+  pgac_cv_have___int128=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+__int128 a = 2001;
+__int128 b = 4005;
+
+int does_int128_work()
+{
+  __int128 c,d;
+
+  c = a * b;
+  d = (c + b) / b;
+  if (d != a+1)
+return 0;
+  return 1;
+}
+main() {
+  exit(! does_int128_work());
+}
+_ACEOF
+if ac_fn_c_try_run $LINENO; then :
+  pgac_cv_have___int128=yes
+else
+  pgac_cv_have___int128=no
+fi
+rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_have___int128" >&5
+$as_echo "$pgac_cv_have___int128" >&6; }
+
+if test x$pgac_cv_have___int128 = xyes ; then
+
+$as_echo "#define HAVE___INT128 1" >>confdefs.h
+
+fi
+
 # We also check for sig_atomic_t, which *should* be defined per ANSI
 # C, but is missing on some old platforms.
 ac_fn_c_check_type "$LINENO" "sig_atomic_t" "ac_cv_type_sig_atomic_t" "#include <signal.h>"
diff --git a/configure.in b/configure.in
index ca29e93..823abaa 100644
--- a/configure.in
+++ b/configure.in
@@ -1767,6 +1767,8 @@ AC_DEFINE_UNQUOTED(MAXIMUM_ALIGNOF, $MAX_ALIGNOF, [Define as the maximum alignme
 AC_CHECK_TYPES([int8, uint8, int64, uint64], [], [],
 [#include <stdio.h>])
 
+PGAC_HAVE___INT128
+
 # We also check for sig_atomic_t, which *should* be defined per ANSI
 # C, but is missing on some old platforms.
 AC_CHECK_TYPES(sig_atomic_t, [], [], [#include <signal.h>])
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 715917b..9b70f36 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -402,7 +402,10 @@ static void apply_typmod(NumericVar *var, int32 typmod);
 
 static int32 numericvar_to_int4(NumericVar *var);
 static bool numericvar_to_int8(NumericVar *var, int64 *result);
-static void int8_to_numericvar(int64 val, NumericVar *var);
+static void int64_to_numericvar(int64 val, NumericVar *var);
+#ifdef HAVE_INT128
+static void int128_to_numericvar(int128 val, NumericVar *var);
+#endif
 static double numeric_to_double_no_overflow(Numeric num);
 static double numericvar_to_double_no_overflow(NumericVar *var);
 
@@ -1414,7 +1417,7 @@ width_bucket_numeric(PG_FUNCTION_ARGS)
 	init_var(count_var);
 
 	/* Convert 'count' to a numeric, for ease of use later */
-	int8_to_numericvar((int64) count, count_var);
+	int64_to_numericvar((int64) count, count_var);
 
 	switch (cmp_numerics(bound1, bound2))
 	{
@@ -2083,14 +2086,14 @@ numeric_fac(PG_FUNCTION_ARGS)
 	
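
As an aside, the way such a configure-probed symbol gets consumed can be seen
in a minimal standalone sketch. HAVE___INT128 is hand-defined here only to
keep the sketch self-contained (normally pg_config.h would provide it), the
typedef mirrors the patch's int128 naming, and the code assumes gcc/clang on
a 64-bit target:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* normally AC_DEFINEd by configure; hand-defined for the sketch */
#define HAVE___INT128 1

#ifdef HAVE___INT128
typedef __int128 int128;		/* mirrors the patch's int128 naming */
#endif

int
main(void)
{
#ifdef HAVE___INT128
	/* a sum that overflows int64 but fits easily in 128 bits */
	int128		acc = (int128) INT64_MAX * 4;

	printf("high: %" PRId64 " low: %" PRIu64 "\n",
		   (int64_t) (acc >> 64), (uint64_t) acc);
#else
	puts("no native 128-bit integers; keep accumulating in NumericVar");
#endif
	return 0;
}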

Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Michael Paquier
On Mon, Mar 16, 2015 at 7:38 AM, Andres Freund wrote:
 On 2015-03-15 20:10:49 +0000, Simon Riggs wrote:
 On 15 March 2015 at 14:16, Andres Freund wrote:

  Personally I think we just should change the default to 'shutdown' for
  all cases. That makes documentation and behaviour less surprising. And
  makes experimenting less dangerous, since you can just start again.

 We need to look at the specific situation, not make a generic decision.

 If hot_standby = off, we are unable to unpause, once paused.

 Changing the default doesn't alter that problem.

 We have two choices: 1) override to a sensible setting, 2) throw an error.

 (2) sounds clean at first but we must look deeper. We know that the
 *only* possible other setting is 'shutdown', so it seems more user
 friendly to do the thing we *know* they want (1), rather than pretend
 that we don't.

 (1) is completely predictable and not at all surprising. Add a LOG
 message if you wish, but don't throw an error.

 Sorry, I don't buy this. If I have recovery_target_action = 'pause' in
 the config file, I want it to pause. Not do something else, just because
 postgres doesn't have an interface to unpause without SQL access. That
 makes some sense to developers, but is pretty much incomprehensible for
 mere mortals.

+1 for removing this code block, and +1 for having a LOG message, with
an additional paragraph in the docs mentioning that when using 'pause'
as recovery_target_action with hot_standby = off, there is no direct
interface to unpause the node.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cygwin vs win32 in configure

2015-03-15 Thread Andrew Dunstan


On 03/15/2015 02:12 PM, Andres Freund wrote:

Hi,

On 2015-03-15 14:03:08 -0400, Tom Lane wrote:

Buildfarm member brolga seems unhappy with my commit 91f4a5a that
restricted port/dirmod.c to being built only on Windows.  Evidently
we need to build it when $PORTNAME = cygwin as well, which is fine;
but I'm a bit astonished that none of the other stuff inserted in the
# Win32 support stanza beginning at configure.in line 1440 is needed
for cygwin builds.  Is that really correct, or am I misunderstanding
what's happening here?

Those all sound like things that cygwin is going to emulate for us. I
guess we could use more of those functions to reduce the difference, but
I'm not sure it's worth it.




I think that's right.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster setup_param_list() in plpgsql

2015-03-15 Thread Andrew Dunstan


On 03/14/2015 06:04 PM, Tom Lane wrote:

Given the rather hostile response I got to
http://www.postgresql.org/message-id/4146.1425872...@sss.pgh.pa.us
I was not planning to bring this topic up again until 9.6 development
starts.  However, as I said in that thread, this work is getting done now
because of $dayjob deadlines, and I've realized that it would actually
make a lot of sense to apply it before my expanded-arrays patch that's
pending in the current commitfest.  So I'm going to put on my flameproof
long johns and post it anyway.  I will add it to the 2015-06 commitfest,
but I'd really rather deal with it now ...

What this patch does is to remove setup_param_list() overhead for the
common case of PLPGSQL_DTYPE_VAR variables (ie, any non-composite type).
It does that by the expedient of keeping the ParamExternData image of such
a variable valid at all times.  That adds a few cycles to assignments to
these variables, but removes more cycles from each use of them.  Unless
you believe that common plpgsql functions contain lots of dead stores,
this is a guaranteed win overall.

I'm seeing about 10% overall speedup (vs HEAD, with casserts off) for
realistic simple plpgsql logic, such as this test case:

create or replace function typicalspeed(n int) returns bigint as $$
declare res bigint := 0;
begin
   for i in 1 .. n loop
 res := res + i;
 if i % 10 = 0 then res := res / 10; end if;
   end loop;
   return res;
end
$$ language plpgsql strict stable;

For functions with lots of variables (or even just lots of expressions,
since each one of those is a PLpgSQL_datum too), it's even more helpful.
I have found no cases where it makes things worse, at least to within
measurement error (run-to-run variability is a percent or two for me).

The reason I would like to apply this now rather than wait for 9.6
is that by making parameter management more explicit it removes the
need for the klugy changes in exec_eval_datum() that exist in
http://www.postgresql.org/message-id/22945.1424982...@sss.pgh.pa.us
Instead, we could leave exec_eval_datum() alone and substitute read-only
pointers only when manufacturing the parameter image of an expanded-object
variable.  If we do it in the other order then we'll be making an API
change for exec_eval_datum() in 9.5 (assuming said patch gets in) and then
reverting it come 9.6.

So there you have it.  Now, where'd I put those long johns ...




I'm inclined to say go for it. I can recall cases in the past where we 
have found some significant piece of work to be necessary after feature 
freeze in order to enable a piece of work submitted before feature 
freeze to proceed. This sounds like a similar case.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-03-15 Thread Rui DeSousa
Would a parameter to auto-terminate long-running transactions be a better 
solution? Terminating the long-running transaction would allow the normal 
vacuuming process to clean up the deleted records, thus avoiding database bloat 
without introducing new semantics to Postgres's MVCC. I would also recommend 
that the default be disabled.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-03-15 Thread Gavin Flower

On 16/03/15 13:04, Rui DeSousa wrote:

Would a parameter to auto-terminate long-running transactions be a better 
solution? Terminating the long-running transaction would allow the normal 
vacuuming process to clean up the deleted records, thus avoiding database bloat 
without introducing new semantics to Postgres's MVCC. I would also recommend 
that the default be disabled.



Hmm...

Better per transaction, or even per statement.

If I fire off a query I expect to complete within 30 seconds, 
then I probably don't mind it aborting after 5 minutes.   However, if I 
fire off a query that I expect to take between an hour and 3 hours, then 
I definitely don't want it aborted after 5 minutes!


It could be that 99.99% of transactions will complete within a minute, but 
a small minority might reasonably be expected to take much longer.
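
For statement runtime (though not for snapshot age or bloat), that kind of
per-session, per-query control already exists via the statement_timeout GUC;
a sketch with made-up table and function names:

-- quick OLTP-style query: give up if it runs past 5 minutes
SET statement_timeout = '5min';
SELECT count(*) FROM orders WHERE placed_at > now() - interval '1 day';

-- known-long report in the same session: allow up to 4 hours
SET statement_timeout = '4h';
SELECT * FROM run_monthly_report();  -- hypothetical long-running function

RESET statement_timeout;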



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind in contrib

2015-03-15 Thread Amit Kapila
On Wed, Mar 11, 2015 at 2:23 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 03/11/2015 05:01 AM, Amit Kapila wrote:



 Can't that happen if the source database (new-master) hasn't
 received all of the data from the target database (old-master) at the
 time of promotion?
 If yes, then the source database won't have WAL for the truncation and
 the way the current mechanism works is a must.

 Now I think for such a case doing truncation in the target database
 is the right solution,


 Yeah, that can happen, and truncation is the correct fix for it. The
logic is pretty well explained by this comment in filemap.c:


  *
  * If it's the same size, do nothing here. Any locally
  * modified blocks will be copied based on parsing the local
  * WAL, and any remotely modified blocks will be updated after
  * rewinding, when the remote WAL is replayed.
  */


What about unlogged tables - how will they handle this particular case?

I think that after the old-master and new-master have diverged, any
operations on unlogged tables won't guarantee that we can get those
modified blocks from the new-master during pg_rewind, and I think it can
lead to a case where unlogged tables have some data from the old-master
and some data from the new-master, even assuming the user always performs
a clean shut-down.


Typo in patch:

+ * For our purposes, only files belonging to the main fork are considered
+ * relation files. Other forks are alwayes copied in toto, because we cannot

/alwayes/always


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com