Re: [HACKERS] Reduce pinning in btree indexes

2015-03-15 Thread Kyotaro HORIGUCHI
Thank you for the rewrite.

I am satisfied with the current patch (the head of the btnopin branch on
your GitHub repo). I have no further comments on the functions. Peter
might still have a comment on the README description, but I think I can
push this on to the committers.

I have attached your latest patch to this mail as bt-nopin-v4.patch for
now. Please check that there is no problem with it.


- With this patch, an index scan releases its buffer pin while fetching
  index tuples from a page, so long-running index scans should be less
  likely to block vacuum, because a vacuum can now easily overtake the
  current position of an index scan. I haven't actually measured how
  effective this is, though.

- It appears to work correctly for scans in both directions with
  concurrent page splits and vacuums.

- It causes no performance degradation; on the contrary, it speeds up
  index scans, apparently because of the removal of the lock/unlock pair
  surrounding _bt_steppage in _bt_next.

- It applies cleanly to the current head of the master branch.

- The comments and README descriptions are sufficiently clear.

- The patch also fixes an unrelated typo in buffers/README, but that
  doesn't seem worth splitting out.

regards,



At Fri, 13 Mar 2015 15:08:25 + (UTC), Kevin Grittner  
wrote in <848652045.4044965.1426259305233.javamail.ya...@mail.yahoo.com>
> Kyotaro HORIGUCHI  wrote:
> > At Thu, 12 Mar 2015 15:27:37 -0700, Peter Geoghegan  wrote:
> >> On Sat, Feb 14, 2015 at 4:19 PM, Kevin Grittner  wrote:

> >> I think you should call out those "confounding factors" in the
> >> README. It's not hard to piece them together from
> >> _bt_drop_lock_and_maybe_pin(), but I think you should anyway.
> 
> OK:
> 
> https://github.com/kgrittn/postgres/commit/f5f59ded30b114ac83b90a00ba1fa5ef490b994e

It looks perfect to me. Do you have any further requests on this,
Peter?

> >> Don't use BUFFER_LOCK_SHARE -- use BT_READ, as the existing
> >> nbtree LockBuffer() callers do. You're inconsistent about that
> >> in V3.
> >
> > I agree with you. It looks like the only place where it is used.
> 
> OK:
> 
> https://github.com/kgrittn/postgres/commit/76118b58be819ed5e68569c926d0222bc41640ea
> 
> > In addition to that, the comment just above the point mentioned
> > above puzzles me.
> >
> >>/* XXX: Can walking left be lighter on the locking and pins? */
> >>if (BTScanPosIsPinned(so->currPos))
> >>LockBuffer(so->currPos.buf, BUFFER_LOCK_SHARE);
> >>else
> >>so->currPos.buf = _bt_getbuf(rel, so->currPos.currPage, 
> >> BT_READ);
> >
> > I'd be happier if I could read the meaning of the comment more
> > clearly. I understand it to say that you want to remove the locking
> > (and pinning) but can't for now, because a simultaneous split of the
> > left page would break something. I'd like the comment to make clear
> > whether or not that reading is correct.
> 
> Does this clear it up?:
> 
> https://github.com/kgrittn/postgres/commit/22066fc161a092e800e4c1e853136c4513f8771b

Thank you for the effort. It also looks perfect.

> Since there are no changes that would affect the compiled code
> here, I'm not posting a new patch yet.  I'll do that once things
> seem to have settled down a bit more.


-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 213542c..c904d2f 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -92,17 +92,18 @@ To minimize lock/unlock traffic, an index scan always searches a leaf page
 to identify all the matching items at once, copying their heap tuple IDs
 into backend-local storage.  The heap tuple IDs are then processed while
 not holding any page lock within the index.  We do continue to hold a pin
-on the leaf page, to protect against concurrent deletions (see below).
-In this state the scan is effectively stopped "between" pages, either
-before or after the page it has pinned.  This is safe in the presence of
-concurrent insertions and even page splits, because items are never moved
-across pre-existing page boundaries --- so the scan cannot miss any items
-it should have seen, nor accidentally return the same item twice.  The scan
-must remember the page's right-link at the time it was scanned, since that
-is the page to move right to; if we move right to the current right-link
-then we'd re-scan any items moved by a page split.  We don't similarly
-remember the left-link, since it's best to use the most up-to-date
-left-link when trying to move left (see detailed move-left algorithm below).
+on the leaf page in some circumstances, to protect against concurrent
+deletions (see below).  In this state the scan is effectively stopped
+"between" pages, either before or after the page it has pinned.  This is
+safe in the presence of concurrent insertions and even page splits, because
+items are never moved across pre-existing page boundaries --- so the scan
+

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-03-15 Thread Kouhei Kaigai
The attached patch changes the invocation order of GetForeignJoinPaths and
set_join_pathlist_hook, and adjusts the documentation in custom-scan.sgml.

Other portions are kept as in the previous version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Sunday, March 15, 2015 11:38 AM
> To: Robert Haas; Tom Lane
> Cc: Thom Brown; Shigeru Hanada; pgsql-hackers@postgreSQL.org
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> 
> > On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane  wrote:
> > > Robert Haas  writes:
> > >> Another bit of this that I think we could commit without fretting
> > >> about it too much is the code adding set_join_pathlist_hook.  This is
> > >> - I think - analogous to set_rel_pathlist_hook, and like that hook,
> > >> could be used for other purposes than custom plan generation - e.g. to
> > >> delete paths we do not want to use.  I've extracted this portion of
> > >> the patch and adjusted the comments; if there are no objections, I
> > >> will commit this bit also.
> > >
> > > I don't object to the concept, but I think that is a pretty bad place
> > > to put the hook call: add_paths_to_joinrel is typically called multiple
> > > (perhaps *many*) times per joinrel and thus this placement would force
> > > any user of the hook to do a lot of repetitive work.
> >
> > Interesting point.  I guess the question is whether a some or all
> > callers are going to actually *want* a separate call for each
> > invocation of add_paths_to_joinrel(), or whether they'll be happy to
> > operate on the otherwise-complete path list.  It's true that if your
> > goal is to delete paths, it's probably best to be called just once
> > after the path list is complete, and there might be a use case for
> > that, but I guess it's less useful than for baserels.  For a baserel,
> > as long as you don't nuke the sequential-scan path, there is always
> > going to be a way to complete the plan; so this would be a fine way to
> > implement a disable-an-index extension.  But for joinrels, it's not so
> > easy to rule out, say, a hash-join here.  Neither hook placement is
> > much good for that; the path you want to get rid of may have already
> > dominated paths you want to keep.
> >
> From the standpoint of extension development, I'm not sure we could
> easily reconstruct, in a hook at standard_join_search(), the information
> needed to compute alternative paths that a hook at add_paths_to_joinrel()
> gets for free.
> 
> (Please correct me if I have misunderstood.)
> For example, it is not obvious which paths form the inner/outer sides of
> the joinrel for which the custom-scan provider is trying to add an
> alternative scan path; the extension would probably have to dig the
> source relations' paths out of the join_rel_level[] array.
> Also, how would we pull out the SpecialJoinInfo? It contains information
> needed to identify the required join type (like JOIN_LEFT), but the
> extension would have to search join_info_list by relids again if the
> hook were located at standard_join_search().
> Even though the hook is invoked more often when placed in
> add_paths_to_joinrel(), it allows extensions to be designed more simply,
> I think.
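
For illustration, a hedged sketch of the kind of extension hook at
add_paths_to_joinrel() being argued for here; the signature below is an
assumption made for the example, not necessarily what the patch finally
defines:

/* Hypothetical extension-side hook: the inner/outer rels and join type
 * arrive as arguments, so nothing has to be dug out of join_rel_level[]
 * or join_info_list. */
static set_join_pathlist_hook_type prev_join_pathlist_hook = NULL;

static void
my_join_pathlist_hook(PlannerInfo *root,
                      RelOptInfo *joinrel,
                      RelOptInfo *outerrel,
                      RelOptInfo *innerrel,
                      JoinType jointype,
                      JoinPathExtraData *extra)
{
    if (prev_join_pathlist_hook)
        prev_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                jointype, extra);

    /* the join type is immediately at hand, e.g. to skip join types
     * the device cannot execute */
    if (jointype != JOIN_INNER && jointype != JOIN_LEFT)
        return;

    /* ... estimate a cost and add_path(joinrel, ...) a custom path ... */
}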
> 
> > Suppose you want to add paths - e.g. you have an extension that goes
> > and looks for a materialized view that matches this subtree of the
> > query, and if it finds one, it substitutes a scan of the materialized
> > view for a scan of the baserel.  Or, as in KaiGai's case, you have an
> > extension that can perform the whole join in GPU-land and produce the
> > same results we would have gotten via normal execution.  Either way,
> > you want - and this is the central point of the whole patch here - to
> > inject a scan path into a joinrel.  It is not altogether obvious to me
> > what the best placement for this is.  In the materialized view case,
> > you probably need a perfect match between the baserels in the view and
> > the baserels in the joinrel to do anything.  There's no point in
> > re-checking that for every innerrels/outerrels combination.  I don't
> > know enough about the GPU case to reason about it intelligently; maybe
> > KaiGai can comment.
> >
> In the GPU case, the extension will add alternative paths based on
> hash-join and nested-loop algorithms, each with its own cost estimate,
> as long as the device can execute the join condition. It expects the
> planner (set_cheapest) to choose the best path from among the built-in
> and additional ones.
> So it seems more reasonable to me if the extension can use the same
> common infrastructure that the built-in logic
> (hash-join/merge-join/nested-loop) uses to compute its cost estimates.
> 
> > But there's another possible approach: suppose that
> > join_search_one_level, after considering left-sided and right-sided
> > joins and after considering bushy joins, checks whether every relation
> > it's got is from the same foreign server, and 

Re: [HACKERS] Parallel Seq Scan

2015-03-15 Thread Amit Langote
On 13-03-2015 PM 11:03, Amit Kapila wrote:
> On Fri, Mar 13, 2015 at 7:15 PM, Robert Haas  wrote:
>>
>> I don't think this is the right fix; the point of that code is to
>> remove a tuple queue from the funnel when it gets detached, which is a
>> correct thing to want to do.  funnel->nextqueue should always be less
>> than funnel->nqueues; how is that failing to be the case here?
>>
> 
> I could not reproduce the issue, nor is the exact scenario mentioned
> in the mail.  However, what I think can lead to funnel->nextqueue being
> greater than funnel->nqueues is something like the following:
> 
> Assume 5 queues, so funnel->nqueues is 5, and assume funnel->nextqueue
> is 2.  Now say 4 workers get detached one by one; in that case execution
> always takes the else branch and never changes funnel->nextqueue, while
> funnel->nqueues drops to 1.
> 

Or, if the just-detached queue happens to be the last one, we'll make
shm_mq_receive() read from a potentially already-detached queue in the
immediately following iteration. That seems to be caused by not having
updated funnel->nextqueue. With the return value being SHM_MQ_DETACHED,
we'll try to remove it from the queue again; in that case the third
argument to memcpy ends up negative (wrapping around as a size_t), hence
the segfault.

I can't seem to really figure out the other problem of waiting forever in
WaitLatch() but I had managed to make it go away with:

-if (funnel->nextqueue == waitpos)
+if (result != SHM_MQ_DETACHED && funnel->nextqueue == waitpos)
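
For illustration, a self-contained sketch of the bookkeeping at issue; the
struct and names below are hypothetical stand-ins, not the actual funnel
code. The point is that removing a detached queue's slot must also keep
nextqueue within [0, nqueues):

#include <string.h>

typedef struct FunnelSketch
{
    int   nqueues;      /* number of live queues */
    int   nextqueue;    /* queue to read from next */
    void *queue[8];     /* per-queue handles (placeholder type) */
} FunnelSketch;

static void
funnel_remove_queue(FunnelSketch *funnel, int idx)
{
    /* close the gap left by the removed slot */
    memmove(&funnel->queue[idx], &funnel->queue[idx + 1],
            (funnel->nqueues - idx - 1) * sizeof(funnel->queue[0]));
    funnel->nqueues--;

    /*
     * Keep the cursor valid: slots after idx shifted down by one, and a
     * cursor at or past the new end must wrap to the front -- otherwise
     * nextqueue can end up >= nqueues, which is the failure discussed
     * above.
     */
    if (funnel->nextqueue > idx)
        funnel->nextqueue--;
    if (funnel->nqueues > 0 && funnel->nextqueue >= funnel->nqueues)
        funnel->nextqueue = 0;
}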

By the way, you can try reproducing this with the example I posted on Friday.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind in contrib

2015-03-15 Thread Amit Kapila
On Wed, Mar 11, 2015 at 2:23 PM, Heikki Linnakangas  wrote:
>
> On 03/11/2015 05:01 AM, Amit Kapila wrote:
>>
>>
>>
>> Can't that happen if the source database (new-master) hasn't
>> received all of the data from the target database (old-master) at the
>> time of promotion?
>> If yes, then the source database won't have WAL for the truncation,
>> and the way the current mechanism works is a must.
>>
>> Now I think for such a case doing truncation in the target database
>> is the right solution,
>
>
> Yeah, that can happen, and truncation is the correct fix for it. The
> logic is pretty well explained by this comment in filemap.c:
>
>
>  *
>  * If it's the same size, do nothing here. Any locally
>  * modified blocks will be copied based on parsing the local
>  * WAL, and any remotely modified blocks will be updated after
>  * rewinding, when the remote WAL is replayed.
>  */
>

What about unlogged tables; how do they handle this particular case?

I think that once the old master and new master have diverged, operations
on an unlogged table are not WAL-logged, so there is no guarantee that we
can get the modified blocks from the new master during pg_rewind. I think
this can lead to a state where an unlogged table holds some data from the
old master and some from the new master, even assuming the user always
performs a clean shutdown.


Typo in patch:

+ * For our purposes, only files belonging to the main fork are considered
+ * relation files. Other forks are alwayes copied in toto, because we
cannot

/alwayes/always


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-03-15 Thread Etsuro Fujita

On 2015/03/13 11:46, Etsuro Fujita wrote:

BTW, what do you think about opening/locking foreign tables selected for
update at InitPlan, which the original patch does?  As I mentioned in
[1], ISTM that ExecOpenScanRelation called from ExecInitForeignScan is
assuming that.



[1] http://www.postgresql.org/message-id/54bcbbf8.3020...@lab.ntt.co.jp


Let me explain further.  Here is the comment in ExecOpenScanRelation:

 * Determine the lock type we need.  First, scan to see if target relation
 * is a result relation.  If not, check if it's a FOR UPDATE/FOR SHARE
 * relation.  In either of those cases, we got the lock already.

I think this is not true for foreign tables selected FOR UPDATE/SHARE,
which have markType = ROW_MARK_COPY, because such foreign tables don't
get opened/locked by InitPlan.  So such foreign tables get locked by
neither InitPlan nor ExecOpenScanRelation.  I think this is a bug.  To
fix it, I think we should open/lock such foreign tables in InitPlan, as
the original patch does.


Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat

2015-03-15 Thread Gavin Flower

On 16/03/15 13:04, Rui DeSousa wrote:

Would a parameter to automatically terminate long-running transactions be
a better solution? Terminating the long-running transaction would allow
the normal vacuuming process to clean up the deleted records, avoiding
database bloat without introducing new semantics into Postgres's MVCC. I
would also recommend that it be disabled by default.



Hmm...

Better per transaction, or even per statement.

If I fire off a query that I expect to complete within 30 seconds, then I
probably don't mind it aborting after 5 minutes.  However, if I fire off
a query that I expect to take between one and three hours, then I
definitely don't want it aborted after 5 minutes!

It could be that 99.99% of transactions will complete within a minute,
but a small minority might reasonably be expected to take much longer.



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat

2015-03-15 Thread Rui DeSousa
Would a parameter to automatically terminate long-running transactions be
a better solution? Terminating the long-running transaction would allow
the normal vacuuming process to clean up the deleted records, avoiding
database bloat without introducing new semantics into Postgres's MVCC. I
would also recommend that it be disabled by default.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster setup_param_list() in plpgsql

2015-03-15 Thread Andrew Dunstan


On 03/14/2015 06:04 PM, Tom Lane wrote:

Given the rather hostile response I got to
http://www.postgresql.org/message-id/4146.1425872...@sss.pgh.pa.us
I was not planning to bring this topic up again until 9.6 development
starts.  However, as I said in that thread, this work is getting done now
because of $dayjob deadlines, and I've realized that it would actually
make a lot of sense to apply it before my expanded-arrays patch that's
pending in the current commitfest.  So I'm going to put on my flameproof
long johns and post it anyway.  I will add it to the 2015-06 commitfest,
but I'd really rather deal with it now ...

What this patch does is to remove setup_param_list() overhead for the
common case of PLPGSQL_DTYPE_VAR variables (ie, any non-composite type).
It does that by the expedient of keeping the ParamExternData image of such
a variable valid at all times.  That adds a few cycles to assignments to
these variables, but removes more cycles from each use of them.  Unless
you believe that common plpgsql functions contain lots of dead stores,
this is a guaranteed win overall.

I'm seeing about 10% overall speedup (vs HEAD, with casserts off) for
realistic simple plpgsql logic, such as this test case:

create or replace function typicalspeed(n int) returns bigint as $$
declare res bigint := 0;
begin
   for i in 1 .. n loop
 res := res + i;
 if i % 10 = 0 then res := res / 10; end if;
   end loop;
   return res;
end
$$ language plpgsql strict stable;

For functions with lots of variables (or even just lots of expressions,
since each one of those is a PLpgSQL_datum too), it's even more helpful.
I have found no cases where it makes things worse, at least to within
measurement error (run-to-run variability is a percent or two for me).

The reason I would like to apply this now rather than wait for 9.6
is that by making parameter management more explicit it removes the
need for the klugy changes in exec_eval_datum() that exist in
http://www.postgresql.org/message-id/22945.1424982...@sss.pgh.pa.us
Instead, we could leave exec_eval_datum() alone and substitute read-only
pointers only when manufacturing the parameter image of an expanded-object
variable.  If we do it in the other order then we'll be making an API
change for exec_eval_datum() in 9.5 (assuming said patch gets in) and then
reverting it come 9.6.

So there you have it.  Now, where'd I put those long johns ...




I'm inclined to say go for it. I can recall cases in the past where we 
have found some significant piece of work to be necessary after feature 
freeze in order to enable a piece of work submitted before feature 
freeze to proceed. This sounds like a similar case.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cygwin vs "win32" in configure

2015-03-15 Thread Andrew Dunstan


On 03/15/2015 02:12 PM, Andres Freund wrote:

Hi,

On 2015-03-15 14:03:08 -0400, Tom Lane wrote:

Buildfarm member brolga seems unhappy with my commit 91f4a5a that
restricted port/dirmod.c to being built only on "Windows".  Evidently
we need to build it when $PORTNAME = "cygwin" as well, which is fine;
but I'm a bit astonished that none of the other stuff inserted in the
"# Win32 support" stanza beginning at configure.in line 1440 is needed
for cygwin builds.  Is that really correct, or am I misunderstanding
what's happening here?

Those all sound like things that cygwin is going to emulate for us. I
guess we could use more of those functions to reduce the difference, but
I'm not sure it's worth it.




I think that's right.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Michael Paquier
On Mon, Mar 16, 2015 at 7:38 AM, Andres Freund wrote:
> On 2015-03-15 20:10:49 +, Simon Riggs wrote:
>> On 15 March 2015 at 14:16, Andres Freund wrote:
>>
>> > Personally I think we just should change the default to 'shutdown' for
>> > all cases. That makes documentation and behaviour less surprising. And
>> > makes experimenting less dangerous, since you can just start again.
>>
>> We need to look at the specific situation, not make a generic decision.
>>
>> If hot_standby = off, we are unable to unpause, once paused.
>>
>> Changing the default doesn't alter that problem.
>>
>> We have two choices: 1) override to a sensible setting, 2) throw an error.
>>
>> (2) sounds clean at first but we must look deeper. We know that the
>> *only* possible other setting is 'shutdown', so it seems more user
>> friendly to do the thing we *know* they want (1), rather than pretend
>> that we don't.
>>
>> (1) is completely predictable and not at all surprising. Add a LOG
>> message if you wish, but don't throw an error.
>
> Sorry, I don't buy this. If I have "recovery_target_action = 'pause'" in
> the config file, I want it to pause, not do something else just because
> postgres doesn't have an interface to unpause without SQL access. That
> makes some sense to developers, but is pretty much incomprehensible to
> mere mortals.

+1 for removing this code block, and +1 for having a LOG message, with
an additional paragraph in the docs mentioning that when using pause
as recovery_target_action and hot_standby = off there is no direct
interface to unpause the node.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-15 Thread Andreas Karlsson

On 03/15/2015 09:47 PM, Petr Jelinek wrote:

It's almost the same thing as you wrote but the 128 bit and 64 bit
optimizations are put on the same "level" of optimized routines.

But this is nitpicking at this point, I am happy with the patch as it
stands right now.


Do you think it is ready for committer?

New version with altered comment is attached.

--
Andreas Karlsson
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 509f961..48fcc68 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -125,6 +125,41 @@ undefine([Ac_cachevar])dnl
 ])# PGAC_TYPE_64BIT_INT
 
 
+# PGAC_HAVE___INT128
+# --
+# Check if __int128 is a working 128 bit integer type and
+# define HAVE___INT128 if so.
+AC_DEFUN([PGAC_HAVE___INT128],
+[AC_CACHE_CHECK(for __int128, pgac_cv_have___int128,
+[AC_TRY_RUN([
+__int128 a = 2001;
+__int128 b = 4005;
+
+int does_int128_work()
+{
+  __int128 c,d;
+
+  c = a * b;
+  d = (c + b) / b;
+  if (d != a+1)
+return 0;
+  return 1;
+}
+main() {
+  exit(! does_int128_work());
+}],
+[pgac_cv_have___int128=yes],
+[pgac_cv_have___int128=no],
+[# If cross-compiling, check the size reported by the compiler and
+# trust that the arithmetic works.
+AC_COMPILE_IFELSE([AC_LANG_BOOL_COMPILE_TRY([], [sizeof(__int128) == 16])],
+  pgac_cv_have___int128=yes,
+  pgac_cv_have___int128=no)])])
+
+if test x"$pgac_cv_have___int128" = xyes ; then
+  AC_DEFINE(HAVE___INT128, 1, [Define to 1 if the system has the type `__int128'.])
fi])# PGAC_HAVE___INT128
+
 
 # PGAC_C_FUNCNAME_SUPPORT
 # ---
diff --git a/configure b/configure
index 379dab1..b60f55f 100755
--- a/configure
+++ b/configure
@@ -13789,6 +13789,74 @@ _ACEOF
 fi
 
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __int128" >&5
+$as_echo_n "checking for __int128... " >&6; }
+if ${pgac_cv_have___int128+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  if test "$cross_compiling" = yes; then :
+  # If cross-compiling, check the size reported by the compiler and
+# trust that the arithmetic works.
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+static int test_array [1 - 2 * !(sizeof(__int128) == 16)];
+test_array [0] = 0;
+return test_array [0];
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_have___int128=yes
+else
+  pgac_cv_have___int128=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+__int128 a = 2001;
+__int128 b = 4005;
+
+int does_int128_work()
+{
+  __int128 c,d;
+
+  c = a * b;
+  d = (c + b) / b;
+  if (d != a+1)
+return 0;
+  return 1;
+}
+main() {
+  exit(! does_int128_work());
+}
+_ACEOF
+if ac_fn_c_try_run "$LINENO"; then :
+  pgac_cv_have___int128=yes
+else
+  pgac_cv_have___int128=no
+fi
+rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_have___int128" >&5
+$as_echo "$pgac_cv_have___int128" >&6; }
+
+if test x"$pgac_cv_have___int128" = xyes ; then
+
+$as_echo "#define HAVE___INT128 1" >>confdefs.h
+
+fi
+
 # We also check for sig_atomic_t, which *should* be defined per ANSI
 # C, but is missing on some old platforms.
 ac_fn_c_check_type "$LINENO" "sig_atomic_t" "ac_cv_type_sig_atomic_t" "#include 
diff --git a/configure.in b/configure.in
index ca29e93..823abaa 100644
--- a/configure.in
+++ b/configure.in
@@ -1767,6 +1767,8 @@ AC_DEFINE_UNQUOTED(MAXIMUM_ALIGNOF, $MAX_ALIGNOF, [Define as the maximum alignme
 AC_CHECK_TYPES([int8, uint8, int64, uint64], [], [],
 [#include ])
 
+PGAC_HAVE___INT128
+
 # We also check for sig_atomic_t, which *should* be defined per ANSI
 # C, but is missing on some old platforms.
 AC_CHECK_TYPES(sig_atomic_t, [], [], [#include ])
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 715917b..9b70f36 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -402,7 +402,10 @@ static void apply_typmod(NumericVar *var, int32 typmod);
 
 static int32 numericvar_to_int4(NumericVar *var);
 static bool numericvar_to_int8(NumericVar *var, int64 *result);
-static void int8_to_numericvar(int64 val, NumericVar *var);
+static void int64_to_numericvar(int64 val, NumericVar *var);
+#ifdef HAVE_INT128
+static void int128_to_numericvar(int128 val, NumericVar *var);
+#endif
 static double numeric_to_double_no_overflow(Numeric num);
 static double numericvar_to_double_no_overflow(NumericVar *var);
 
@@ -1414,7 +1417,7 @@ width_bucket_numeric(PG_FUNCTION_ARGS)
 	init_var(&count_var);
 
 	/* Convert 'count' to a numeric, for ease of use later */
-	int8_to_numericvar((int64) count, &count_var);
+	int64_to_numericvar((int64) count, &count_var);
 
 	switch (cmp_numerics(bound1, bound2))
 	{
@@ -2083,14 +2086,14 @@ numeric_fac

Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-15 Thread Tomas Vondra
On 15.3.2015 20:35, Fabien COELHO wrote:
> 
>> Firstly, the fact that pgbench produces one file per thread is awkward.
> 
> I agree, but I think it is due to the multi process thread emulation: if
> you have real threads, you can do a simple fprintf, possibly with some
> mutex, and you're done. There is really nothing to do to implement this
> feature.

My fear was that this requires either explicit locking or some implicit
locking - for example, while fprintf() in glibc is thread-safe, I'm
really not sure about Win32. This might influence the results for very
short transactions - I haven't tried that, though, so this might be a
false assumption.

The way I implemented the merge is immune to this.

>> Secondly, a separate tool (even if provided with pgbench) would
>> require passing detailed info about the log format - whether it's
>> aggregated or not, throttling or not, 
> 
> Hmmm.
> 
>>> It could belong to pgbench if pgbench was *generating* the merged
>>> log file directly. [...]
>>
>> I considered this approach, i.e. adding another 'collector' thread 
>> receiving results from all the other threads, but decided not to
>> do that. That would require a fair amount of inter-process
>> communication, locking etc. and might affect the measurements
> 
> I agree that inter-process stuff should be avoided. This is not what 
> I had in mind. I was thinking of "fprintf" on the same file handler 
> by different threads.

That still involves some sort of 'implicit' locking, no? And as I
mentioned, I'm not sure fprintf() is thread-safe on all the platforms we
support.


>>> Another issue raised by your patch is that the log format may be 
>>> improved, say by providing a one-field timestamp at the beginning
>>> of the line.
>>
>> I don't see how this is relevant to this patch?
> 
> For the simple log format, all the parsing needed would be for the 
> timestamp, and the remainder would just be text to pass along, no
> need to %d %f... whatever.

Oh, ok. Well, that's true, but I don't think that significantly changes
the overall complexity.

>>> The current IO complexity is in p²n where it should be simply pn...
>>
>> Maybe. Implementing a 2-way merge sort was the simplest solution at
>> the moment, and I don't think this really matters because the I/O
>> generated by the benchmark itself is usually much higher than this.
> 
> If you do not do a n-way merge, you could do a 2-way merge on a
> binary tree so that the IO complexity would be p.log(p).n (I think),
> and not p²n.

Yes, I could do that.

I still think this is probably overengineering, but it's not a big deal.
I'll leave this for later, though (this patch is in the 2015-06 CF anyway).

>>> For aggregates, some data in the output may be special values
>>> "NaN/-/...", [...]
>> You mean the aggregated log? I can't think of a way to get such values
>> there - can you provide a plausible scenario for how that could happen?
> 
> Possibly I'm wrong. Ok, I tried it:
> 
>  sh> ./pgbench --rate=0.5 --aggregate-interval=1 --log
>  sh> cat pgbench_log.12671
>  1426445236 1 5034 25341156 5034 5034 687 471969 687 687
>  1426445237 0 0 0 0 0 0 0 0 0
>  1426445238 0 0 0 0 0 0 0 0 0
>  1426445239 1 8708 75829264 8708 8708 2063 4255969 2063 2063
>  ...
> 
> Good news, I could not generate strange values. Just when there are 0
> transactions, some care should possibly be taken when combining values.

Yeah, I wasn't able to come up with such scenario, but wasn't sure.

> 
>>> It seems that system function calls are not tested for errors.
>>
>> That's true, but that's how the rest of pgbench code is written.
> 
> Hmmm... This is not entirely true, there are *some* checks if you look
> carefully :-)
> 
>if ((fd = fopen(filename, "r")) == NULL) ...
>if ((fp = popen(command, "r")) == NULL) ...
>nsocks = select(...)
>if (nsocks < 0) ...

OK, there are a few checks ;-) But none of the fprintf calls IIRC.

Anyway, I plan to refactor this part of the patch to get rid of the
copy'n'paste pieces, so I'll take care of this too.

>>> (b) if we could get rid of the "thread emulation", pgbench could
>>> generate the merged logs directly and simply, and the option could
>>> be provided then.
>>
>> That however is not the goal of this patch.
> 
> Sure. My point is that the amount of code you write to implement this
> merge stuff is due to this feature. Without it, the patch would
> probably need 10 lines of code. Moreover, the way it is implemented
> requires scanning and reprinting, which means more work in many
> places to update the format later.

Possibly. But it's not written like that :-(


>> The thread emulation is there for a reason,
> 
> My opinion is that it *was* there for a reason. Whether it makes 
> sense today to still have it, maintain it, and have to write such 
> heavy code for a trivial feature just because of it is another 
> matter.

Possibly. I can't really decide that.

>> and I certainly am not going to work on eliminating it
>> (not sure that'

Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Andres Freund
On 2015-03-15 20:10:49 +, Simon Riggs wrote:
> On 15 March 2015 at 14:16, Andres Freund  wrote:
> 
> > Personally I think we just should change the default to 'shutdown' for
> > all cases. That makes documentation and behaviour less surprising. And
> > makes experimenting less dangerous, since you can just start again.
> 
> We need to look at the specific situation, not make a generic decision.
> 
> If hot_standby = off, we are unable to unpause, once paused.
> 
> Changing the default doesn't alter that problem.
> 
> We have two choices: 1) override to a sensible setting, 2) throw an error.
> 
> (2) sounds clean at first but we must look deeper. We know that the
> *only* possible other setting is 'shutdown', so it seems more user
> friendly to do the thing we *know* they want (1), rather than pretend
> that we don't.
> 
> (1) is completely predictable and not at all surprising. Add a LOG
> message if you wish, but don't throw an error.

Sorry, I don't buy this. If I have "recovery_target_action = 'pause'" in
the config file, I want it to pause, not do something else just because
postgres doesn't have an interface to unpause without SQL access. That
makes some sense to developers, but is pretty much incomprehensible to
mere mortals.

Even worse, right now (after the bugfix), the behaviour is:

postgresql.conf:
# hot_standby = off
recovery.conf:
#recovery_target_action = 'pause'
=> promote (the downgrading is only active when explicitly configured)

# hot_standby = off
recovery.conf:
recovery_target_action = 'pause'
=> shutdown (despite an explicit setting of pause)

hot_standby = on
recovery.conf:
# recovery_target_action = 'pause'
=> pause

hot_standby = on
recovery.conf:
recovery_target_action = 'pause'
=> pause

To me that's just utterly confusing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-15 Thread Petr Jelinek

On 14/03/15 04:17, Andreas Karlsson wrote:

On 03/13/2015 10:22 PM, Peter Geoghegan wrote:

On Thu, Mar 12, 2015 at 6:23 PM, Andreas Karlsson 
wrote:

/*
 * Integer data types use Numeric accumulators to share code and avoid
 * risk of overflow.  To speed up aggregation 128-bit integer
 * accumulators are used instead where sum(X) or sum(X*X) fit into
 * 128-bits, and there is platform support.
 *
 * For int2 and int4 inputs sum(X) will fit into a 64-bit accumulator,
 * hence we use faster special-purpose accumulator routines for SUM and
 * AVG of these datatypes.
 */

#ifdef HAVE_INT128
typedef struct Int128AggState


Not quite. Refer to the 128-bit integer accumulators as
"special-purpose accumulator routines" instead. Then, in the case of
the extant 64-bit accumulators, refer to them by the shorthand
"integer accumulators". Otherwise it's the wrong way around.


I disagree. The term "integer accumulators" is confusing. Right now I do
not have any better ideas for how to write that comment, so I submit the
next version of the patch with the comment as I wrote it above. Feel
free to come up with a better wording, my English is not always up to
par when writing technical texts.



I don't like the term "integer accumulators" either, as "integer" is
platform specific. I would phrase it like this:


/*
 * Integer data types in general use Numeric accumulators to share code
 * and avoid risk of overflow.
 *
 * However for performance reasons some of the optimized special-purpose
 * accumulator routines are used when possible.
 *
 * On platforms with 128-bit integer support, the 128-bit routines will be
 * used when sum(X) or sum(X*X) fit into 128-bit.
 *
 * For int2 and int4 inputs, the N and sum(X) fit into 64-bit so the 64-bit
 * accumulators will be used for SUM and AVG of these data types.
 */

It's almost the same thing as you wrote but the 128 bit and 64 bit 
optimizations are put on the same "level" of optimized routines.


But this is nitpicking at this point, I am happy with the patch as it 
stands right now.
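
(For context, a minimal sketch of the kind of accumulator these comments
describe, assuming PostgreSQL's int32/int64 typedefs and a HAVE___INT128
style guard; the names are illustrative, not the patch's. For int2/int4
inputs, newval * newval always fits in 64 bits, and the running sums of X
and X*X fit comfortably in 128 bits, so no Numeric arithmetic is needed
on the hot path.)

#ifdef HAVE___INT128
typedef struct Int128AccumSketch
{
    int64       N;          /* number of accumulated values */
    __int128    sumX;       /* running sum(X) */
    __int128    sumX2;      /* running sum(X*X) */
} Int128AccumSketch;

static inline void
int128_accum_sketch(Int128AccumSketch *state, int32 newval)
{
    state->N++;
    state->sumX += newval;
    state->sumX2 += (int64) newval * newval;
}
#endif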


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Simon Riggs
On 15 March 2015 at 14:16, Andres Freund  wrote:

> Personally I think we just should change the default to 'shutdown' for
> all cases. That makes documentation and behaviour less surprising. And
> makes experimenting less dangerous, since you can just start again.

We need to look at the specific situation, not make a generic decision.

If hot_standby = off, we are unable to unpause, once paused.

Changing the default doesn't alter that problem.

We have two choices: 1) override to a sensible setting, 2) throw an error.

(2) sounds clean at first but we must look deeper. We know that the
*only* possible other setting is 'shutdown', so it seems more user
friendly to do the thing we *know* they want (1), rather than pretend
that we don't.

(1) is completely predictable and not at all surprising. Add a LOG
message if you wish, but don't throw an error.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-15 Thread Fabien COELHO



Firstly, the fact that pgbench produces one file per thread is awkward.


I agree, but I think it is due to the multi process thread emulation: if
you have real threads, you can do a simple fprintf, possibly with some
mutex, and you're done. There is really nothing to do to implement this
feature.


Secondly, a separate tool (even if provided with pgbench) would require
passing detailed info about the log format - whether it's aggregated or
not, throttling or not, 


Hmmm.


It could belong to pgbench if pgbench was *generating* the merged
log file directly. [...]


I considered this approach, i.e. adding another 'collector' thread
receiving results from all the other threads, but decided not to do
that. That would require a fair amount of inter-process communication,
locking etc. and might affect the measurements


I agree that inter-process stuff should be avoided. This is not what I had 
in mind. I was thinking of "fprintf" on the same file handler by different 
threads.



Another issue raised by your patch is that the log format may be
improved, say by providing a one-field timestamp at the beginning of
the line.


I don't see how this is relevant to this patch?


For the simple log format, all the parsing needed would be for the 
timestamp, and the remainder would just be text to pass along, no need to 
%d %f... whatever.



The current IO complexity is in p²n where it should be simply pn...


Maybe. Implementing a 2-way merge sort was the simplest solution at the
moment, and I don't think this really matters because the I/O generated
by the benchmark itself is usually much higher than this.


If you do not do an n-way merge, you could do a 2-way merge on a binary
tree so that the IO complexity would be p.log(p).n (I think), and not p²n.
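
For illustration, a sketch of a one-pass merge of p sorted per-thread
logs: each input line is read exactly once, so the I/O is proportional to
the total number of lines. The per-line selection below is a linear scan
over the p sources for brevity; replacing it with a binary heap gives the
p.log(p).n behaviour. The names and the one-field leading timestamp are
assumptions, not pgbench code.

#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    FILE  *fp;          /* NULL once the source is exhausted */
    char   line[1024];
    double ts;          /* leading timestamp field of "line" */
} LogSource;

/* Load the next line of a source; returns 0 (and closes it) on EOF. */
static int
refill(LogSource *s)
{
    if (fgets(s->line, sizeof(s->line), s->fp) == NULL)
    {
        fclose(s->fp);
        s->fp = NULL;
        return 0;
    }
    s->ts = strtod(s->line, NULL);
    return 1;
}

static void
merge_logs(LogSource *src, int p, FILE *out)
{
    int i;

    for (i = 0; i < p; i++)         /* prime every source */
        if (src[i].fp)
            refill(&src[i]);

    for (;;)
    {
        int best = -1;

        for (i = 0; i < p; i++)     /* pick the smallest timestamp */
            if (src[i].fp && (best < 0 || src[i].ts < src[best].ts))
                best = i;
        if (best < 0)
            return;                 /* all inputs exhausted */
        fputs(src[best].line, out);
        refill(&src[best]);
    }
}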



For aggregates, some data in the output may be special values
"NaN/-/...", [...]

You mean the aggregated log? I can't think of a way to get such values
there - can you provide a plausible scenario for how that could happen?


Possibly I'm wrong. Ok, I tried it:

 sh> ./pgbench --rate=0.5 --aggregate-interval=1 --log
 sh> cat pgbench_log.12671
 1426445236 1 5034 25341156 5034 5034 687 471969 687 687
 1426445237 0 0 0 0 0 0 0 0 0
 1426445238 0 0 0 0 0 0 0 0 0
 1426445239 1 8708 75829264 8708 8708 2063 4255969 2063 2063
 ...

Good news, I could not generate strange values. Just when there are 0
transactions, some care should possibly be taken when combining values.



It seems that system function calls are not tested for errors.


That's true, but that's how the rest of pgbench code is written.


Hmmm... This is not entirely true, there are *some* checks if you look
carefully :-)


   if ((fd = fopen(filename, "r")) == NULL) ...
   if ((fp = popen(command, "r")) == NULL) ...
   nsocks = select(...)
   if (nsocks < 0) ...


So integrating this into pgbench directly seems like a better approach,
and the attached patch implements that.


You guessed that I disagree. Note that this is only my own opinion.

In summary, I think that:

(a) providing a clean script would be nice,

(b) if we could get rid of the "thread emulation", pgbench could
generate the merged logs directly and simply, and the option could
be provided then.


That however is not the goal of this patch.


Sure. My point is that the amount of code you write to implement this
merge stuff is due to this feature. Without it, the patch would probably
need 10 lines of code. Moreover, the way it is implemented requires
scanning and reprinting, which means more work in many places to update
the format later.



The thread emulation is there for a reason,


My opinion is that it *was* there for a reason. Whether it makes sense 
today to still have it, maintain it, and have to write such heavy code for 
a trivial feature just because of it is another matter.



and I certainly am not going to work on eliminating it
(not sure that's even possible).


I wish it would be :-)

I would suggest this, which I would support: implement this feature the
simple way (i.e., fprintf, maybe with a mutex) when compiled with threads,
and generate an error ("feature not available with process-based thread
emulation") when compiled with processes. This way we avoid a lot of
heavy code to maintain in the future, and you still get the feature
within pgbench. There are already some things which are not the same with
thread emulation, because it would have been tiring to implement them for
it for very little benefit.
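
A minimal sketch of that simple way, assuming real threads: all clients
append to one shared log file, serialized by a mutex around fprintf. The
function name and log fields are illustrative, not pgbench's actual
format.

#include <pthread.h>
#include <stdio.h>

static FILE *shared_log;    /* opened once at startup by the main thread */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

static void
log_transaction(int client_id, long time_sec, long time_usec, double latency)
{
    pthread_mutex_lock(&log_lock);
    fprintf(shared_log, "%ld.%06ld %d %.3f\n",
            time_sec, time_usec, client_id, latency);
    pthread_mutex_unlock(&log_lock);
}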


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

2015-03-15 Thread Julien Tachoires
On 15/03/2015 04:34, Andreas Karlsson wrote:
> On 03/15/2015 04:25 AM, Andreas Karlsson wrote:
>> Nice. You will also want to apply the attached patch which fixes support
>> for the --no-tablespaces flag.
> 
> Just realized that --no-tablespaces need to be fixed for pg_restore too.

Indeed, after taking a look at the pg_restore case, I would say it won't
be so easy.
I will try to fix it.

--
Julien


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-15 Thread Tomas Vondra
On 15.3.2015 11:22, Fabien COELHO wrote:
>
> I've looked at the patch. Although I think that such a feature is
> somewhat desirable... I have two issues with it: ISTM that
> 
> (1) it does not belong to pgbench as such
> 
> (2) even if it did, the implementation is not right
> 
> About (1):
> 
> I think that this functionality as implemented does not belong to
> pgbench, and really belongs in an external script, which happens
> not to be readily available, which is a shame. PostgreSQL should
> probably provide such a script, or make it easy to find.

I disagree, for two main reasons.

Firstly, the fact that pgbench produces one file per thread is awkward.
It was implemented like this most likely because it was the simplest
solution after adding "-j" in 9.0, but I don't remember if I ever needed
a separate log from a single thread.

Secondly, a separate tool (even if provided with pgbench) would require
passing detailed info about the log format - whether it's aggregated or
not, throttling or not, 

Those are the two main reasons why I think this should be implemented as
another option in pgbench.


> It could belong to pgbench if pgbench was *generating* the merged
> log file directly. However I'm not sure this can be done cleanly for
> the multi-process "thread emulation" (IN PASSING: I think we should
> really get rid of that emulation, because I do not know of any system
> nowadays that does not provide threads yet would still require a
> state-of-the-art pgbench; the emulation significantly complexifies the
> code by making things uselessly difficult and/or requiring them to be
> implemented twice or not provided in some cases).

I considered this approach, i.e. adding another 'collector' thread
receiving results from all the other threads, but decided not to do
that. That would require a fair amount of inter-process communication,
locking etc. and might affect the measurements (especially for the
workloads with extremely short transactions, like read-only pgbench).

The implemented approach, i.e. merging results collected by each thread
independently, after the benchmarking actually completed, is a better
solution. And the code is actually simpler.


> Another issue raised by your patch is that the log format may be 
> improved, say by providing a one-field timestamp at the beginning of 
> the line.

I don't see how this is relevant to this patch?


> About (2):
> 
> In the implementation you reimplement a partial merge sort by
> parsing each line of each file and merging it with the current result
> over and over. ISTM that an implementation should read all files in
> parallel and merge them in one pass. The current IO complexity is in
> p²n where it should be simply pn... do not use it with a significant
> number of threads and many lines... Ok, the generated files are
> likely to be kept in cache, but nevertheless.

Maybe. Implementing a 2-way merge sort was the simplest solution at the
moment, and I don't think this really matters because the I/O generated
by the benchmark itself is usually much higher than this.

The only case when this might make a difference is probably merging
large transaction logs (e.g. 100% sample from read-only test on small
dataset).

> Also, there is plenty of copy-paste when scanning the two files and
> then reprinting in all the different formats. The same logic is
> implemented twice, once for simple and once for aggregated logs. This
> means that updating or extending the log format later would
> require modifying these scans and prints in many places.

That is true, and I'll address that somehow (either moving the code to a
macro or a separate utility function).

> For aggregates, some data in the output may be special values
> "NaN/-/...", and I am not sure how the implementation would behave in
> such cases. As lines that do not match are silently ignored, the
> merged result would just be meaningless; should that rather be an
> error? Try it with a low rate, for instance.

You mean the aggregated log? I can't think of a way to get such values
there - can you provide a plausible scenario for how that could happen?

> 
> It seems that system function calls are not tested for errors.

That's true, but that's how the rest of pgbench code is written.

> 
>> The other disadvantage of the external scripts is that you have to
>> pass all the info about the logs (whether the logs are aggregated,
>> whther there's throttling, etc.).
> 
> I think that is another argument for making a better format, with a
> timestamp up front. Also, ISTM that it only needs to know whether it is
> merging aggregate or simple logs, no more; the other information can be
> inferred from the number of fields on the line.

No, it's not. Even if you change the format like this, you still have no
idea whether the log is a per-transaction log (possibly with some
additional options) or an aggregated log.

There might be some auto-detection based on number of fields, for
example, but considering how many optio

Re: [HACKERS] Cygwin vs "win32" in configure

2015-03-15 Thread Andres Freund
Hi,

On 2015-03-15 14:03:08 -0400, Tom Lane wrote:
> Buildfarm member brolga seems unhappy with my commit 91f4a5a that
> restricted port/dirmod.c to being built only on "Windows".  Evidently
> we need to build it when $PORTNAME = "cygwin" as well, which is fine;
> but I'm a bit astonished that none of the other stuff inserted in the
> "# Win32 support" stanza beginning at configure.in line 1440 is needed
> for cygwin builds.  Is that really correct, or am I misunderstanding
> what's happening here?

Those all sound like things that cygwin is going to emulate for us. I
guess we could use more of those functions to reduce the difference, but
I'm not sure it's worth it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cygwin vs "win32" in configure

2015-03-15 Thread Tom Lane
Buildfarm member brolga seems unhappy with my commit 91f4a5a that
restricted port/dirmod.c to being built only on "Windows".  Evidently
we need to build it when $PORTNAME = "cygwin" as well, which is fine;
but I'm a bit astonished that none of the other stuff inserted in the
"# Win32 support" stanza beginning at configure.in line 1440 is needed
for cygwin builds.  Is that really correct, or am I misunderstanding
what's happening here?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Itai Dor-On
Thanks Tom.

Assuming the SRF had a parameter, would this be a correct approach
(considering the iterative model) to bail out early?

if (SRF_IS_FIRSTCALL())
{
    int i;

    if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) !=
        TYPEFUNC_COMPOSITE)
    {
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("Check if sql function definition returns SETOF record")));
        return;
    }

    if (PG_ARGISNULL(0))
    {
        ereport(ERROR,
                (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
                 errmsg("Null value not allowed for ...")));
        return;
    }

    if ((i = PG_GETARG_INT32(0)) != 'WHATEVER')
    {
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("Invalid value for ...")));
        return;
    }
}

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Sunday, March 15, 2015 5:50 PM
To: Itai
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Crash on SRF execution

Itai  writes:
> I'm attempting to program a simple SRF function but it constantly crashes
(details and code below).
> Any idea why?

Looks like you're pallocing some stuff in the calling context (ie, a
short-lived context) during the first execution and expecting it to still be
there in later executions.  You'd need to allocate those data structures in
the multi_call_memory_ctx instead.

regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized

2015-03-15 Thread Andres Freund
On 2015-03-15 16:30:16 +0100, Andres Freund wrote:
> The attached patch does pretty much what you suggested above and tries
> to address all the point previously made. I plan to push this fairly
> soon; unless somebody has further input.

Pushed, after fixing two typos in decode.c I introduced while "cleaning
up". This might not yet be perfect, but there seems little point in
waiting further.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-15 Thread Andres Freund
Hi,

On 2015-03-03 11:04:30 -0800, Josh Berkus wrote:
> Attached is version D, which incorporates the above two changes, but NOT
> a general unit comment cleanup of postgresql.conf, which needs to be an
> entirely different patch.

Pushed!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2015-03-15 Thread Andres Freund
Hi,

On 2014-12-08 02:18:11 +0100, Petr Jelinek wrote:
> ...except for the removal of pause_at_recovery_target it seems, so I
> attached just that

Pushed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Itai



Fantastic! That solved this problem.


However, I still get a crash if I change is_even to bool:

num->is_even = ((base_num + i) % 2 == 0) ? true : false;
retvals[1] = BoolGetDatum(list->pp_numbers[call_cntr]->is_even);

CREATE OR REPLACE FUNCTION pg_srf(OUT value integer, OUT is_even bit)
RETURNS SETOF record
AS 'pg_srf.so', 'pg_srf'
LANGUAGE C
STRICT
IMMUTABLE;


 
> Date: Sun, 15 Mar 2015 17:03:38 +0100
> From: and...@2ndquadrant.com
> To: it...@outlook.com
> CC: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Crash on SRF execution
> 
> Hi,
> 
> On 2015-03-15 17:59:39 +0200, Itai wrote:
> > Thanks for the quick response!! :)
> > But I don't get it... isn't:
> > 
> > if (SRF_IS_FIRSTCALL())
> > {
> > }
> > 
> > the iterator's one-time "initialization block", where I set up the data
> > to be iterated over?
> > 
> > Can you please elaborate on how I should fix this? I'm probably missing
> > something basic...
> 
> > > >  if (SRF_IS_FIRSTCALL())
> > > >  {
> > > >   length = 4000;
> > > >   base_num = 120300;
> > > >   list = (NumberList *)palloc(sizeof(NumberList));
> > > >   list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
> 
> You allocate memory in the per call context here.
> 
> > > >   list->length = length;
> > > >   i = 0;
> > > >   for (; i < length; i++)
> > > >   {
> > > >num = (Number *)palloc(sizeof(Number));
> > > >num->value = base_num + i;
> > > >num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
> > > >list->pp_numbers[i] = num;
> > > >   }
> > > >   ereport(INFO, (errmsg("--- data source ---")));
> > > >   i = 0;
> > > >   for (; i < length; i++)
> > > >   {
> > > >ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
> > > >ereport(INFO, (errmsg("is_even: %d", 
> > > > list->pp_numbers[i]->is_even)));   
> > > >   }
> > > >   
> > > >   funcctx = SRF_FIRSTCALL_INIT();
> > > >   oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
> 
> Because you only switch the memory context here. Move this up, to the
> beginning of the SRF_IS_FIRSTCALL block. Before the palloc()s.
> 
> Greetings,
> 
> Andres Freund
> 
> -- 
>  Andres Freund   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
  

Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Andres Freund
Hi,

On 2015-03-15 17:59:39 +0200, Itai wrote:
> Thanks for the quick response!! :)
> But I don't get it... isn't:
> 
> if (SRF_IS_FIRSTCALL())
> {
> }
> 
> the iterator's one-time "initialization block", where I set up the data
> to be iterated over?
> 
> Can you please elaborate on how I should fix this? I'm probably missing
> something basic...

> > >  if (SRF_IS_FIRSTCALL())
> > >  {
> > >   length = 4000;
> > >   base_num = 120300;
> > >   list = (NumberList *)palloc(sizeof(NumberList));
> > >   list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);

You allocate memory in the per call context here.

> > >   list->length = length;
> > >   i = 0;
> > >   for (; i < length; i++)
> > >   {
> > >num = (Number *)palloc(sizeof(Number));
> > >num->value = base_num + i;
> > >num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
> > >list->pp_numbers[i] = num;
> > >   }
> > >   ereport(INFO, (errmsg("--- data source ---")));
> > >   i = 0;
> > >   for (; i < length; i++)
> > >   {
> > >ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
> > >ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));  
> > >  
> > >   }
> > >   
> > >   funcctx = SRF_FIRSTCALL_INIT();
> > >   oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

Because you only switch the memory context here. Move this up, to the
beginning of the SRF_IS_FIRSTCALL block. Before the palloc()s.
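
For illustration, a minimal sketch of that reordering, reusing the names
from your code; the elided parts stay exactly as they were:

 if (SRF_IS_FIRSTCALL())
 {
  funcctx = SRF_FIRSTCALL_INIT();
  /* switch first, so the allocations below survive across calls */
  oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

  list = (NumberList *)palloc(sizeof(NumberList));
  list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
  /* ... fill list->pp_numbers as before ... */

  funcctx->user_fctx = list;
  funcctx->max_calls = list->length;
  /* ... get_call_result_type() / BlessTupleDesc() as before ... */
  MemoryContextSwitchTo(oldcontext);
 }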

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Itai
Thanks for the quick response!! :)
But I don't get it... isn't:

if (SRF_IS_FIRSTCALL()) {
}

the iterator's one-time "initialization block" where I set up the data
to be iterated upon?

Can you please elaborate on how I should fix this? I'm probably missing
something basic...

> Date: Sun, 15 Mar 2015 16:50:27 +0100
> From: and...@2ndquadrant.com
> To: it...@outlook.com
> CC: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Crash on SRF execution
> 
> Hi,
> 
> On 2015-03-15 17:40:11 +0200, Itai wrote:
> > I'm attempting to program a simple SRF function but it constantly crashes 
> > (details and code below).
> >  
> > Any idea why?
> 
> >  if (SRF_IS_FIRSTCALL())
> >  {
> >   length = 4000;
> >   base_num = 120300;
> >   list = (NumberList *)palloc(sizeof(NumberList));
> >   list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
> >   list->length = length;
> >   i = 0;
> >   for (; i < length; i++)
> >   {
> >num = (Number *)palloc(sizeof(Number));
> >num->value = base_num + i;
> >num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
> >list->pp_numbers[i] = num;
> >   }
> >   ereport(INFO, (errmsg("--- data source ---")));
> >   i = 0;
> >   for (; i < length; i++)
> >   {
> >ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
> >ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));   
> >   }
> >   
> >   funcctx = SRF_FIRSTCALL_INIT();
> >   oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
> >   funcctx->user_fctx = list;
> >   funcctx->max_calls = list->length;
> >   if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != 
> > TYPEFUNC_COMPOSITE)
> >ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >errmsg("check if sql function definition returns SETOF record")));
> > 
> >   BlessTupleDesc(funcctx->tuple_desc);
> >   MemoryContextSwitchTo(oldcontext);
> >  }
> 
> The palloc() for list above is in the per call memory context, but you
> use it across several calls. You should allocate it in the multi call
> context you use some lines below.
> 
> Greetings,
> 
> Andres Freund
> 
> -- 
>  Andres Freund   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> 
> 

Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Magnus Hagander
On Sun, Mar 15, 2015 at 3:16 PM, Andres Freund 
wrote:

> On 2015-03-15 14:51:46 +0100, Magnus Hagander wrote:
> > On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund 
> > wrote:
> >
> > > On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
> > > >   /*
> > > >    * Override any inconsistent requests. Note that this is a change
> > > >    * of behaviour in 9.5; prior to this we simply ignored a request
> > > >    * to pause if hot_standby = off, which was surprising behaviour.
> > > >    */
> > > >   if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
> > > >       recoveryTargetActionSet &&
> > > >       standbyState == STANDBY_DISABLED)
> > > >       recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
> > >
> > > While it's easy enough to fix I rather dislike the whole intent here
> > > though. *Silently* switching the mode of operation in a rather
> > > significant way seems like a bad idea to me. At the very least we need
> > > to emit a LOG message about this; but I think it'd be much better to
> > > error out instead.
> > >
> > > <9.5's behaviour was already quite surprising. But changing things to a
> > > different surprising behaviour seems like a bad idea.
> > >
> >
> > +1. Especially for "sensitive" operations like this, having
> > predictable-behavior-or-error is usually the best choice.
>
> Yea.
>
> Looking further, it's even worse right now. We'll change the target to
> shutdown when hot_standby = off, but iff it was set in the config
> file. But the default value is (and was, although configured
> differently) documented to be 'pause'; so if it's not configured
> explicitly we still will promote.  At least I can't read that out of the
> docs.
>
> Personally I think we just should change the default to 'shutdown' for
> all cases. That makes documentation and behaviour less surprising. And
> makes experimenting less dangerous, since you can just start again.
>


+1. These things need to be clear. Given the consequences of getting it
wrong, surprising behavior can be quite dangerous.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Andres Freund
Hi,

On 2015-03-15 17:40:11 +0200, Itai wrote:
> I'm attempting to program a simple SRF function but it constantly crashes 
> (details and code below).
>  
> Any idea why?

>  if (SRF_IS_FIRSTCALL())
>  {
>   length = 4000;
>   base_num = 120300;
>   list = (NumberList *)palloc(sizeof(NumberList));
>   list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
>   list->length = length;
>   i = 0;
>   for (; i < length; i++)
>   {
>num = (Number *)palloc(sizeof(Number));
>num->value = base_num + i;
>num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
>list->pp_numbers[i] = num;
>   }
>   ereport(INFO, (errmsg("--- data source ---")));
>   i = 0;
>   for (; i < length; i++)
>   {
>ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
>ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));   
>   }
>   
>   funcctx = SRF_FIRSTCALL_INIT();
>   oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
>   funcctx->user_fctx = list;
>   funcctx->max_calls = list->length;
>   if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != 
> TYPEFUNC_COMPOSITE)
>ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>errmsg("check if sql function definition returns SETOF record")));
> 
>   BlessTupleDesc(funcctx->tuple_desc);
>   MemoryContextSwitchTo(oldcontext);
>  }

The palloc() for list above is in the per call memory context, but you
use it across several calls. You should allocate it in the multi call
context you use some lines below.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Crash on SRF execution

2015-03-15 Thread Tom Lane
Itai  writes:
> I'm attempting to program a simple SRF function but it constantly crashes 
> (details and code below).
> Any idea why?

Looks like you're pallocing some stuff in the calling context (ie, a
short-lived context) during the first execution and expecting it to
still be there in later executions.  You'd need to allocate those
data structures in the multi_call_memory_ctx instead.

regards, tom lane




[HACKERS] Crash on SRF execution

2015-03-15 Thread Itai
Hi
 
I'm attempting to program a simple SRF function but it constantly crashes 
(details and code below).
 
Any idea why?
 
Thanks!
 
-Itai
-
Environment
-
Ubunto: 14.04.2 (server)
PG Ver: PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 
4.8.2-19ubuntu1) 4.8.2, 64-bit
Install: deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main
Dev: postgresql-server-dev-9.4 (9.4.1-1.pgdg14.04+1)
-
Execution
-
select * from pg_srf();
INFO:  --- data source ---
INFO:  value: 120300
INFO:  is_even: 1
INFO:  value: 120301
INFO:  is_even: 0
...
INFO:  value: 1203003998
INFO:  is_even: 1
INFO:  value: 1203003999
INFO:  is_even: 0
INFO:  --- data context ---
INFO:  call_cntr: 0
INFO:  value: 120300
INFO:  is_even: 1
INFO:  call_cntr: 1
INFO:  value: 120301
INFO:  is_even: 0
INFO:  call_cntr: 2
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> 
-
pg_srf.h
-
#ifndef PGSRF_H
#define PGSRF_H
#include "fmgr.h"
// using int as bool due to a different issue (one prob. at a time)
typedef struct Number_tag
{
 int value;
 int is_even;
} Number;
typedef struct NumberList_tag
{
 int length;
 Number ** pp_numbers;
} NumberList;
extern Datum pg_srf(PG_FUNCTION_ARGS);
#endif
-
pg_srf.c
-
#include 
#include 
#include 
#include 
#include "pg_srf.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
PG_FUNCTION_INFO_V1(pg_srf);
Datum
pg_srf(PG_FUNCTION_ARGS)
{
 int call_cntr, i, length, base_num;
 Number * num;
 NumberList * list;
 HeapTuple rettuple;
 FuncCallContext *funcctx;
 MemoryContext oldcontext;
 if (SRF_IS_FIRSTCALL())
 {
  length = 4000;
  base_num = 120300;
  list = (NumberList *)palloc(sizeof(NumberList));
  list->pp_numbers = (Number **)palloc(sizeof(Number*) * length);
  list->length = length;
  i = 0;
  for (; i < length; i++)
  {
   num = (Number *)palloc(sizeof(Number));
   num->value = base_num + i;
   num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
   list->pp_numbers[i] = num;
  }
  ereport(INFO, (errmsg("--- data source ---")));
  i = 0;
  for (; i < length; i++)
  {
   ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
   ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));   
  }
  
  funcctx = SRF_FIRSTCALL_INIT();
  oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
  funcctx->user_fctx = list;
  funcctx->max_calls = list->length;
  if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != 
TYPEFUNC_COMPOSITE)
   ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
   errmsg("check if sql function definition returns SETOF record")));

  BlessTupleDesc(funcctx->tuple_desc);
  MemoryContextSwitchTo(oldcontext);
 }
 funcctx = SRF_PERCALL_SETUP();
 list = funcctx->user_fctx;
 call_cntr = funcctx->call_cntr;
 if (call_cntr < funcctx->max_calls)
 {
  Datum retvals[2];
  bool retnulls[2];
  
  if (call_cntr == 0) 
  {
   ereport(INFO, (errmsg("--- data context ---")));
  }
  ereport(INFO, (errmsg("call_cntr: %d", call_cntr)));
  ereport(INFO, (errmsg("value: %d", list->pp_numbers[call_cntr]->value)));
  retvals[0] = Int32GetDatum(list->pp_numbers[call_cntr]->value);
  ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[call_cntr]->is_even)));
  retvals[1] = Int32GetDatum(list->pp_numbers[call_cntr]->is_even);
  retnulls[0] = false;
  retnulls[1] = false;
  rettuple = heap_form_tuple(funcctx->tuple_desc, retvals, retnulls);
  SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(rettuple));
 }
 else
 {
  SRF_RETURN_DONE(funcctx);
 }
}
-
Makefile
-
MODULES = pg_srf
OBJS = pg_srf.o
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
-
SQL
-
CREATE OR REPLACE FUNCTION
  pg_srf(OUT value integer, OUT is_even integer)
RETURNS 
 SETOF record
AS
  'pg_srf.so', 'pg_srf'
LANGUAGE
  C
STRICT
IMMUTABLE;
  

Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized

2015-03-15 Thread Andres Freund
On 2015-03-02 18:45:18 +0100, Andres Freund wrote:
> On 2015-03-02 19:23:56 +0200, Heikki Linnakangas wrote:
> > Pass the prepared XID as yet another argument to XactEmitCommitRecord, and
> > have XactEmitCommitRecord emit the xl_xact_commit_prepared part of the
> > record too. It might even make sense to handle the prepared XID like all the
> > other optional fields and add an xinfo flag for it.
> 
> That's what I mean with "non simple". Not a fan of teaching xact.c even
> more about twophase's dealings than it already knows.

I made an effort to show how horrible it would look. Turns out it's
not that bad ;). Far from pretty, especially in xlog.c, but
acceptable. I think treating them more similar actually might open the
road for reducing the difference further at some point.

The attached patch does pretty much what you suggested above and tries
to address all the points previously made. I plan to push this fairly
soon, unless somebody has further input.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 7557d0afa860c37a1992240b54e2bf3710296fc7 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 15 Mar 2015 14:42:09 +0100
Subject: [PATCH] Merge the various forms of transaction commit & abort
 records.

Since 465883b0a two versions of commit records have existed. A compact
version that was used when no cache invalidations, smgr unlinks and
similar were needed, and a full version that could deal with all
that. Additionally the full version was embedded into twophase commit
records.

That resulted in a measurable reduction in the size of the logged WAL in
some workloads. But more recently additions like logical decoding, which
e.g. needs information about the database something was executed on,
made it applicable in fewer situations. The static split generally made
it hard to expand the commit record, because concerns over the size made
it hard to add anything to the compact version.

Additionally it's not particularly pretty to have twophase.c insert
RM_XACT records.

Rejigger things so that the commit and abort records only have one form
each, including the twophase equivalents. The presence of the various
optional (in the sense of not being in every record) pieces is indicated
by a bits in the 'xinfo' flag.  That flag previously was not included in
compact commit records. To prevent an increase in size due to its
presence, it's only included if necessary; signalled by a bit in the
xl_info bits available for xact.c, similar to heapam.c's
XLOG_HEAP_OPMASK/XLOG_HEAP_INIT_PAGE.

Twophase commit/aborts are now the same as their normal
counterparts. The original transaction's xid is included in an optional
data field.

This means that commit records generally are smaller, except in the case
of a transaction with subtransactions, but no other special cases; the
increase there is four bytes, which seems acceptable given that the more
common case of not having subtransactions shrank.  The savings are
especially measurable for twophase commits, which previously always used
the full version; but will in practice only infrequently have required
that.

The motivation for this work is not the space savings and
deduplication though; it's that it makes it easier to extend commit
records with additional information. That's just a few lines of code
now; without impacting the common case where that information is not
needed.

Discussion: 20150220152150.gd4...@awork2.anarazel.de,
235610.92468.qm%40web29004.mail.ird.yahoo.com

Reviewed-By: Heikki Linnakangas, Simon Riggs
---
 src/backend/access/rmgrdesc/xactdesc.c   | 245 +++-
 src/backend/access/transam/twophase.c|  59 +---
 src/backend/access/transam/xact.c| 476 +++
 src/backend/access/transam/xlog.c| 126 
 src/backend/replication/logical/decode.c | 133 +++--
 src/include/access/xact.h| 215 ++
 src/include/access/xlog_internal.h   |   2 +-
 7 files changed, 749 insertions(+), 507 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 3e87978..b036b6d 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -14,53 +14,188 @@
  */
 #include "postgres.h"
 
+#include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
 #include "storage/sinval.h"
 #include "utils/timestamp.h"
 
+/*
+ * Parse the WAL format of xact commit and abort records into an easier to
+ * understand format.
+ *
+ * These routines are in xactdesc.c because they're accessed in backend (when
+ * replaying WAL) and frontend (pg_xlogdump) code. This file is the only xact
+ * specific one shared between both. They're complicated enough that
+ * duplication would be bothersome.
+ */
+
+void
+ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_comm

Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-03-15 Thread Tom Lane
Pavel Stehule  writes:
> other variant, I hope better than the previous one. We can introduce a new
> long option "--strict". With this option active, every pattern specified by
> the -t option has to identify exactly one table. It can be used for any
> other "should exist" patterns - schemas. Initial implementation in
> attachment.

I think this design is seriously broken.  If I have '-t foo*' the code
should not prevent that from matching multiple tables.  What would the use
case for such a restriction be?

What would make sense to me is one or both of these ideas:

* require a match for a wildcard-free -t switch

* require at least one (not "exactly one") match for a wildcarded -t
  switch.

Neither of those is what you wrote, though.

If we implemented the second one of these, it would have to be controlled
by a new switch, because there are plausible use cases for wildcards that
sometimes don't match anything (not to mention backwards compatibility).
There might be a reasonable argument for the first one being the
default behavior, though; I'm not sure if we could get away with that
from a compatibility perspective.
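
For concreteness, a hypothetical sketch of those two behaviours; the
--require-match option name below is made up for illustration, since
nothing was settled in this thread:

    pg_dump -t foo mydb                     # no wildcard: error if table foo is missing
    pg_dump --require-match -t 'foo*' mydb  # wildcard: error only if nothing matches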

regards, tom lane




Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Andres Freund
On 2015-03-15 14:51:46 +0100, Magnus Hagander wrote:
> On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund 
> wrote:
> 
> > On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
> > >   /*
> > >    * Override any inconsistent requests. Note that this is a change
> > >    * of behaviour in 9.5; prior to this we simply ignored a request
> > >    * to pause if hot_standby = off, which was surprising behaviour.
> > >    */
> > >   if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
> > >       recoveryTargetActionSet &&
> > >       standbyState == STANDBY_DISABLED)
> > >       recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
> >
> > While it's easy enough to fix I rather dislike the whole intent here
> > though. *Silently* switching the mode of operation in a rather
> > significant way seems like a bad idea to me. At the very least we need
> > to emit a LOG message about this; but I think it'd be much better to
> > error out instead.
> >
> > <9.5's behaviour was already quite surprising. But changing things to a
> > different surprising behaviour seems like a bad idea.
> >
> 
> +1. Especially for "sensitive" operations like this, having
> predictable-behavior-or-error is usually the best choice.

Yea.

Looking further, it's even worse right now. We'll change the target to
shutdown when hot_standby = off, but iff it was set in the config
file. But the default value is (and was, although configured
differently) documented to be 'pause'; so if it's not configured
explicitly we still will promote.  At least I can't read that out of the
docs.

Personally I think we just should change the default to 'shutdown' for
all cases. That makes documentation and behaviour less surprising. And
makes experimenting less dangerous, since you can just start again.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Petr Jelinek

On 15/03/15 14:51, Magnus Hagander wrote:

On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund  wrote:

On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
 >   /*
 >    * Override any inconsistent requests. Note that this is a change
 >    * of behaviour in 9.5; prior to this we simply ignored a request
 >    * to pause if hot_standby = off, which was surprising behaviour.
 >    */
 >   if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
 >       recoveryTargetActionSet &&
 >       standbyState == STANDBY_DISABLED)
 >       recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

While it's easy enough to fix I rather dislike the whole intent here
though. *Silently* switching the mode of operation in a rather
significant way seems like a bad idea to me. At the very least we need
to emit a LOG message about this; but I think it'd be much better to
error out instead.

<9.5's behaviour was already quite surprising. But changing things to a
different surprising behaviour seems like a bad idea.


+1. Especially for "sensitive" operations like this, having
predictable-behavior-or-error is usually the best choice.



Thinking about it again now, it does seem that ignoring a user setting 
because it conflicts with another user setting is a bad idea, and I 
think we generally throw errors in those cases.


So +1 from me also.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2015-03-15 Thread Andres Freund
On 2014-12-08 02:18:11 +0100, Petr Jelinek wrote:
> On 08/12/14 02:06, Petr Jelinek wrote:
> >On 08/12/14 02:03, Michael Paquier wrote:
> >>On Mon, Dec 8, 2014 at 9:59 AM, Petr Jelinek 
> >>wrote:
> >>>Ok this patch does that, along with the rename to
> >>>recovery_target_action and
> >>>addition to the recovery.conf.sample.
> >>This needs a rebase as at least da71632 and b8e33a8 are conflicting.
> >>
> >
> >Simon actually already committed something similar, so no need.
> >
> 
> ...except for the removal of pause_at_recovery_target it seems, so I
> attached just that

I intend to push this one unless somebody protests soon.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Magnus Hagander
On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund 
wrote:

> On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
> >   /*
> >    * Override any inconsistent requests. Note that this is a change
> >    * of behaviour in 9.5; prior to this we simply ignored a request
> >    * to pause if hot_standby = off, which was surprising behaviour.
> >    */
> >   if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
> >       recoveryTargetActionSet &&
> >       standbyState == STANDBY_DISABLED)
> >       recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
>
> While it's easy enough to fix I rather dislike the whole intent here
> though. *Silently* switching the mode of operation in a rather
> significant way seems like a bad idea to me. At the very least we need
> to emit a LOG message about this; but I think it'd be much better to
> error out instead.
>
> <9.5's behaviour was already quite surprising. But changing things to a
> different surprising behaviour seems like a bad idea.
>

+1. Especially for "sensitive" operations like this, having
predictable-behavior-or-error is usually the best choice.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] recovery_target_action = pause & hot_standby = off

2015-03-15 Thread Andres Freund
On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
>   /*
>    * Override any inconsistent requests. Note that this is a change
>    * of behaviour in 9.5; prior to this we simply ignored a request
>    * to pause if hot_standby = off, which was surprising behaviour.
>    */
>   if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
>       recoveryTargetActionSet &&
>       standbyState == STANDBY_DISABLED)
>       recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

While it's easy enough to fix I rather dislike the whole intent here
though. *Silently* switching the mode of operation in a rather
significant way seems like a bad idea to me. At the very least we need
to emit a LOG message about this; but I think it'd be much better to
error out instead.
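
For illustration, erroring out could look roughly like this, reusing the
condition quoted above (the message wording is just a sketch):

    if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
        recoveryTargetActionSet &&
        standbyState == STANDBY_DISABLED)
        ereport(ERROR,
                (errmsg("recovery_target_action = 'pause' cannot be used "
                        "when hot_standby is disabled")));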

<9.5's behaviour was already quite surprising. But changing things to a
different surprising behaviour seems like a bad idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-15 Thread Fabien COELHO

Hello Tomas,


attached is a patch implementing merging of pgbench logs. These logs are
written by each thread, so with N threads you get N files with names

   pgbench_log.PID
   pgbench_log.PID.1
   ...
   pgbench_log.PID.N

Before analyzing these logs, these files need to be combined. I usually
ended up writing ad-hoc scripts doing that, lost them, wrote them
again and so on over and over again.


I've looked at the patch. Although I think that such a feature is somewhat 
desirable... I have two issues with it: ISTM that


(1) it does not belong to pgbench as such

(2) even if it did, the implementation is not right

About (1):

I think that this functionality as implemented does not belong in 
pgbench; it really belongs in an external script, which happens not to 
be readily available, which is a shame. PostgreSQL should probably provide 
such a script, or make it easy to find.


It could belong to pgbench if pgbench was *generating* the merged log file 
directly. However I'm not sure this can be done cleanly for the 
multi-process "thread emulation" (IN PASSING: I think pgbench should 
really get rid of that emulation, because I do not know of a system 
nowadays that lacks threads yet would still require a state-of-the-art 
pgbench; the emulation significantly complexifies the code by making 
things uselessly difficult, needing to be implemented twice, or not 
being provided in some cases).


Another issue raised by your patch is that the log format may be improved, 
say by providing a one-field timestamp at the beginning of the line.



About (2):

In the implementation you reimplement a partial merge sort by parsing each 
line of each file and merging it with the current result over and over. 
ISTM that an implementation should read all files in parallel and merge 
them in one pass. The current I/O complexity is O(p²n) where it should be 
simply O(pn)... do not use it with a significant number of threads and many 
lines... Ok, the generated files are likely to be kept in cache, but 
nevertheless.
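
To be concrete, a rough sketch of such a one-pass merge as a standalone
program; it assumes the improved format suggested above, i.e. a leading
timestamp field on every line, which current pgbench logs do not have:

#include <stdio.h>
#include <stdlib.h>

#define MAXLINE 1024
#define MAXFILES 64

int
main(int argc, char **argv)
{
    int     nfiles = argc - 1;
    FILE   *fps[MAXFILES];
    char    lines[MAXFILES][MAXLINE];
    double  ts[MAXFILES];   /* leading timestamp of each pending line */
    int     i;

    if (nfiles < 1 || nfiles > MAXFILES)
        return 1;

    /* prime one pending line per input file */
    for (i = 0; i < nfiles; i++)
    {
        fps[i] = fopen(argv[i + 1], "r");
        if (fps[i] == NULL || fgets(lines[i], MAXLINE, fps[i]) == NULL)
            ts[i] = -1.0;   /* mark this input as exhausted */
        else
            ts[i] = atof(lines[i]);
    }

    /* repeatedly emit the pending line with the smallest timestamp */
    for (;;)
    {
        int best = -1;

        for (i = 0; i < nfiles; i++)
            if (ts[i] >= 0.0 && (best < 0 || ts[i] < ts[best]))
                best = i;
        if (best < 0)
            break;          /* every input is exhausted */

        fputs(lines[best], stdout);

        /* advance only the file we consumed from */
        if (fgets(lines[best], MAXLINE, fps[best]) == NULL)
            ts[best] = -1.0;
        else
            ts[best] = atof(lines[best]);
    }
    return 0;
}

Each input line is read exactly once, so the I/O is O(pn); the linear
scan over the p pending lines could become a small heap if the thread
count gets large.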


Also there is plenty of copy-paste when scanning the two files and then 
reprinting in all the different formats. The same logic is implemented 
twice, once for simple and once for aggregated logs. This means that 
updating or extending the log format later on would require modifying 
these scans and prints in many places.


For aggregates, some data in the output may be special values ("NaN/-/..."); 
I am not sure how the implementation would behave in such cases. As lines 
that do not match are silently ignored, the merged result would just be 
meaningless; should it rather be an error? Try it with a low rate, for 
instance.


It seems that system function calls are not tested for errors.


The other disadvantage of the external scripts is that you have to pass
all the info about the logs (whether the logs are aggregated, whether
there's throttling, etc.).


I think that is another argument for a better format, with a timestamp 
first. Also, ISTM that it only needs to know whether it is merging 
aggregate or simple logs, no more; the other information can be inferred 
from the number of fields on the line.



So integrating this into pgbench directly seems like a better approach,
and the attached patch implements that.


You guessed that I disagree. Note that this is only my own opinion.

In summary, I think that:

(a) providing a clean script would be nice,

(b) if we could get rid of the "thread emulation", pgbench could generate 
the merged logs directly and simply, and the option could be provided 
then.


--
Fabien.


Re: [HACKERS] ranlib bleating about dirmod.o being empty

2015-03-15 Thread Michael Paquier
On Sun, Mar 15, 2015 at 1:58 AM, Tom Lane  wrote:
> I'm getting rather tired of reading these warning messages in OS X builds:
>
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib:
>  file: libpgport.a(dirmod.o) has no symbols
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib:
>  file: libpgport_srv.a(dirmod_srv.o) has no symbols
>
> The reason for this warning is that dirmod.o contains no functions that
> aren't Windows-specific.  As such, it seems like putting it into the
> list of "always compiled" port modules is really a mistake.  Any
> objections to switching it to be built only on Windows?

+1. Thanks for changing that.
-- 
Michael

