Re: [HACKERS] Reduce pinning in btree indexes
Thank you for rewriting. I was satisfied with the current patch (the head of btnopin on your github repo). I have no further comments on the functions. Peter might have a comment on the README description, but I suppose I can push this to the committers. I attached your latest patch to this mail as bt-nopin-v4.patch for now. Please check that there's no problem in it.

- With this patch, index scans release buffer pins while fetching index tuples from a page, so it should reduce the chance of long-running index scans blocking vacuum, because vacuum can now easily overtake the current position of an index scan. I didn't actually measure how effective it is, though.

- It appears to work correctly for scans in both directions with simultaneous page splitting and vacuum.

- It causes no performance deterioration; on the contrary, it accelerates index scans. That seems to be because of the removal of the lock and unlock surrounding _bt_steppage in _bt_next.

- It applies cleanly on the current head of the master branch.

- It has sufficiently intelligible comments and README descriptions.

- An unrelated typo fix in buffers/README is included in this patch, but it doesn't seem worth separating out.

regards,

At Fri, 13 Mar 2015 15:08:25 + (UTC), Kevin Grittner wrote in <848652045.4044965.1426259305233.javamail.ya...@mail.yahoo.com>
> Kyotaro HORIGUCHI wrote:
> > At Thu, 12 Mar 2015 15:27:37 -0700, Peter Geoghegan wrote:
> >> On Sat, Feb 14, 2015 at 4:19 PM, Kevin Grittner wrote:
> >> I think you should call out those "confounding factors" in the
> >> README.  It's not hard to piece them together from
> >> _bt_drop_lock_and_maybe_pin(), but I think you should anyway.
>
> OK:
>
> https://github.com/kgrittn/postgres/commit/f5f59ded30b114ac83b90a00ba1fa5ef490b994e

It looks perfect to me. Do you have any further requests on this, Peter?

> >> Don't use BUFFER_LOCK_SHARE -- use BT_READ, as the existing
> >> nbtree LockBuffer() callers do.  You're inconsistent about that
> >> in V3.
> > I agree with you. It looks like the only point where it is used.
>
> OK:
>
> https://github.com/kgrittn/postgres/commit/76118b58be819ed5e68569c926d0222bc41640ea
>
> > In addition to that, the comment just above the point mentioned
> > above puzzles me.
> >
> >>  /* XXX: Can walking left be lighter on the locking and pins? */
> >>  if (BTScanPosIsPinned(so->currPos))
> >>      LockBuffer(so->currPos.buf, BUFFER_LOCK_SHARE);
> >>  else
> >>      so->currPos.buf = _bt_getbuf(rel, so->currPos.currPage, BT_READ);
> >
> > I'd be happy if I could read the meaning of the comment more
> > clearly. I understand it to say that you want to remove the
> > locking (and pinning), but can't now because simultaneous
> > splitting of the left page would break something. I'd like to see
> > it made clearer, whether my reading is correct or not.
>
> Does this clear it up?:
>
> https://github.com/kgrittn/postgres/commit/22066fc161a092e800e4c1e853136c4513f8771b

Thank you for taking the trouble. It also looks perfect.

> Since there are no changes that would affect the compiled code
> here, I'm not posting a new patch yet.  I'll do that once things
> seem to have settled down a bit more.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 213542c..c904d2f 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -92,17 +92,18 @@
 To minimize lock/unlock traffic, an index scan always searches a leaf page
 to identify all the matching items at once, copying their heap tuple IDs
 into backend-local storage.  The heap tuple IDs are then processed while
 not holding any page lock within the index.  We do continue to hold a pin
-on the leaf page, to protect against concurrent deletions (see below).
-In this state the scan is effectively stopped "between" pages, either
-before or after the page it has pinned.  This is safe in the presence of
-concurrent insertions and even page splits, because items are never moved
-across pre-existing page boundaries --- so the scan cannot miss any items
-it should have seen, nor accidentally return the same item twice.  The scan
-must remember the page's right-link at the time it was scanned, since that
-is the page to move right to; if we move right to the current right-link
-then we'd re-scan any items moved by a page split.  We don't similarly
-remember the left-link, since it's best to use the most up-to-date
-left-link when trying to move left (see detailed move-left algorithm below).
+on the leaf page in some circumstances, to protect against concurrent
+deletions (see below).  In this state the scan is effectively stopped
+"between" pages, either before or after the page it has pinned.  This is
+safe in the presence of concurrent insertions and even page splits, because
+items are never moved across pre-existing page boundaries --- so the scan
+
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
The attached patch changes the invocation order of GetForeignJoinPaths and set_join_pathlist_hook, and adjusts the documentation in custom-scan.sgml. Other portions are kept as in the previous version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei

> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Sunday, March 15, 2015 11:38 AM
> To: Robert Haas; Tom Lane
> Cc: Thom Brown; Shigeru Hanada; pgsql-hackers@postgreSQL.org
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
>
> > On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane wrote:
> > > Robert Haas writes:
> > >> Another bit of this that I think we could commit without fretting
> > >> about it too much is the code adding set_join_pathlist_hook.  This is
> > >> - I think - analogous to set_rel_pathlist_hook, and like that hook,
> > >> could be used for other purposes than custom plan generation - e.g. to
> > >> delete paths we do not want to use.  I've extracted this portion of
> > >> the patch and adjusted the comments; if there are no objections, I
> > >> will commit this bit also.
> > >
> > > I don't object to the concept, but I think that is a pretty bad place
> > > to put the hook call: add_paths_to_joinrel is typically called multiple
> > > (perhaps *many*) times per joinrel and thus this placement would force
> > > any user of the hook to do a lot of repetitive work.
> >
> > Interesting point.  I guess the question is whether some or all
> > callers are going to actually *want* a separate call for each
> > invocation of add_paths_to_joinrel(), or whether they'll be happy to
> > operate on the otherwise-complete path list.  It's true that if your
> > goal is to delete paths, it's probably best to be called just once
> > after the path list is complete, and there might be a use case for
> > that, but I guess it's less useful than for baserels.
> > For a baserel, as long as you don't nuke the sequential-scan path,
> > there is always going to be a way to complete the plan; so this would
> > be a fine way to implement a disable-an-index extension.  But for
> > joinrels, it's not so easy to rule out, say, a hash-join here.
> > Neither hook placement is much good for that; the path you want to
> > get rid of may have already dominated paths you want to keep.
>
> From the standpoint of extension development, I'm uncertain whether we
> can easily reproduce the information needed to compute alternative paths
> in a hook at standard_join_search(), compared with a hook at
> add_paths_to_joinrel(). (Please correct me if I misunderstood.)
>
> For example, it is not obvious which paths are the inner/outer sides of
> the joinrel for which the custom-scan provider tries to add an
> alternative scan path. Probably the extension would need to find the
> paths of the source relations in the join_rel_level[] array.
> Also, how do we pull out the SpecialJoinInfo? It contains the information
> needed to identify the required join type (like JOIN_LEFT), but the
> extension would need to search join_info_list by relids again if the
> hook is located at standard_join_search().
> Even though the number of hook invocations is larger when it is located
> in add_paths_to_joinrel(), it allows extensions to be designed more
> simply, I think.
>
> > Suppose you want to add paths - e.g. you have an extension that goes
> > and looks for a materialized view that matches this subtree of the
> > query, and if it finds one, it substitutes a scan of the materialized
> > view for a scan of the baserel.  Or, as in KaiGai's case, you have an
> > extension that can perform the whole join in GPU-land and produce the
> > same results we would have gotten via normal execution.  Either way,
> > you want - and this is the central point of the whole patch here - to
> > inject a scan path into a joinrel.  It is not altogether obvious to me
> > what the best placement for this is.  In the materialized view case,
> > you probably need a perfect match between the baserels in the view and
> > the baserels in the joinrel to do anything.  There's no point in
> > re-checking that for every innerrels/outerrels combination.  I don't
> > know enough about the GPU case to reason about it intelligently; maybe
> > KaiGai can comment.
>
> In the case of GPU, the extension will add alternative paths based on
> hash-join and nested-loop algorithms, with individual cost estimates, as
> long as the device can execute the join condition. It expects the
> planner (set_cheapest) to choose the best path among the built-in and
> additional ones.
> So it would be more reasonable to me if the extension could use the same
> infrastructure the built-in logic (hash-join/merge-join/nested-loop)
> uses for its cost estimation.
>
> > But there's another possible approach: suppose that
> > join_search_one_level, after considering left-sided and right-sided
> > joins and after considering bushy joins, checks whether every relation
> > it's got is from the same foreign server, and
Re: [HACKERS] Parallel Seq Scan
On 13-03-2015 PM 11:03, Amit Kapila wrote:
> On Fri, Mar 13, 2015 at 7:15 PM, Robert Haas wrote:
>>
>> I don't think this is the right fix; the point of that code is to
>> remove a tuple queue from the funnel when it gets detached, which is a
>> correct thing to want to do.  funnel->nextqueue should always be less
>> than funnel->nqueues; how is that failing to be the case here?
>
> I could not reproduce the issue, nor is the exact scenario mentioned
> in the mail. However, what I think can lead to funnel->nextqueue being
> greater than funnel->nqueues is something like below:
>
> Assume 5 queues, so the value of funnel->nqueues will be 5, and
> assume the value of funnel->nextqueue is 2. Now let us say 4 workers
> get detached one by one; in such a case execution will always take the
> else branch and will never change funnel->nextqueue, whereas the value
> of funnel->nqueues will become 1.

Or, if the just-detached queue happens to be the last one, we'll make shm_mq_receive() read from a potentially already-detached queue in the immediately next iteration. That seems to be caused by not having updated funnel->nextqueue. With the returned value being SHM_MQ_DETACHED, we'll again try to remove it from the queue array. In this case, that causes the third argument to memcpy to be negative and hence the segfault.

I can't seem to really figure out the other problem of waiting forever in WaitLatch(), but I had managed to make it go away with:

-if (funnel->nextqueue == waitpos)
+if (result != SHM_MQ_DETACHED && funnel->nextqueue == waitpos)

By the way, you can try reproducing this with the example I posted on Friday.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind in contrib
On Wed, Mar 11, 2015 at 2:23 PM, Heikki Linnakangas wrote:
> On 03/11/2015 05:01 AM, Amit Kapila wrote:
>>
>> Can't that happen if the source database (new-master) hasn't
>> received all of the data from the target database (old-master) at the
>> time of promotion?
>> If yes, then the source database won't have WAL for the truncation,
>> and the way the current mechanism works is a must.
>>
>> Now I think for such a case doing the truncation in the target database
>> is the right solution,
>
> Yeah, that can happen, and truncation is the correct fix for it. The
> logic is pretty well explained by this comment in filemap.c:
>
> *
> * If it's the same size, do nothing here. Any locally
> * modified blocks will be copied based on parsing the local
> * WAL, and any remotely modified blocks will be updated after
> * rewinding, when the remote WAL is replayed.
> */

What about unlogged tables? How will they handle this particular case? I think after the old master and new master have diverged, operations on unlogged tables won't guarantee that we can get those modified blocks from the new master during pg_rewind, and I think it can lead to a case where unlogged tables have some data from the old master and some data from the new master, even assuming the user always performs a clean shutdown.

Typo in patch:
+ * For our purposes, only files belonging to the main fork are considered
+ * relation files. Other forks are alwayes copied in toto, because we cannot
/alwayes/always

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns
On 2015/03/13 11:46, Etsuro Fujita wrote:
> BTW, what do you think about opening/locking foreign tables selected
> for update at InitPlan, which the original patch does?  As I mentioned
> in [1], ISTM that ExecOpenScanRelation called from ExecInitForeignScan
> is assuming that.
>
> [1] http://www.postgresql.org/message-id/54bcbbf8.3020...@lab.ntt.co.jp

Let me explain further. Here is the comment in ExecOpenScanRelation:

 * Determine the lock type we need.  First, scan to see if target relation
 * is a result relation.  If not, check if it's a FOR UPDATE/FOR SHARE
 * relation.  In either of those cases, we got the lock already.

I think this is not true for foreign tables selected FOR UPDATE/SHARE, which have markType = ROW_MARK_COPY, because such foreign tables don't get opened/locked by InitPlan. So such foreign tables get locked by neither InitPlan nor ExecOpenScanRelation. I think this is a bug. To fix it, I think we should open/lock such foreign tables at InitPlan, as the original patch does.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
On 16/03/15 13:04, Rui DeSousa wrote:
> Would a parameter to auto terminate long running transactions be a
> better solution? Terminating the long running transaction would allow
> for the normal vacuuming process to clean up the deleted records, thus
> avoiding database bloat without introducing new semantics to Postgres's
> MVCC. I would also recommend that the default be disabled.

Hmm... Better per transaction, or even per statement. If I fire off a query that I expect to complete within 30 seconds, then I probably don't mind it aborting after 5 minutes. However, if I fire off a query that I expect to take between an hour and 3 hours, then I definitely don't want it aborted after 5 minutes! It could be that 99.99% of transactions will complete within a minute, but a small minority might reasonably be expected to take much longer.

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
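For the per-statement variant Gavin describes, something close already exists: statement_timeout can be set per session (or per role/database), so each client can choose its own limit. A minimal sketch, with illustrative values only:

```sql
-- Abort any statement in this session that runs longer than 5 minutes.
SET statement_timeout = '5min';

-- A session running a known-long report raises its own limit first:
SET statement_timeout = '3h';

-- 0 disables the limit again (the server-wide default).
SET statement_timeout = 0;
```

Note that statement_timeout bounds individual statements, not whole transactions, so by itself it does not stop an idle-in-transaction session from holding back vacuum; that gap is part of why the thread is discussing other mechanisms.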
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
Would a parameter to auto terminate long running transactions be a better solution? Terminating the long running transaction would allow for the normal vacuuming process to cleanup the deleted records thus avoiding database bloat without introducing new semantics to Postgres's MVCC. I would also recommend that the default be disabled. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster setup_param_list() in plpgsql
On 03/14/2015 06:04 PM, Tom Lane wrote: Given the rather hostile response I got to http://www.postgresql.org/message-id/4146.1425872...@sss.pgh.pa.us I was not planning to bring this topic up again until 9.6 development starts. However, as I said in that thread, this work is getting done now because of $dayjob deadlines, and I've realized that it would actually make a lot of sense to apply it before my expanded-arrays patch that's pending in the current commitfest. So I'm going to put on my flameproof long johns and post it anyway. I will add it to the 2015-06 commitfest, but I'd really rather deal with it now ... What this patch does is to remove setup_param_list() overhead for the common case of PLPGSQL_DTYPE_VAR variables (ie, any non-composite type). It does that by the expedient of keeping the ParamExternData image of such a variable valid at all times. That adds a few cycles to assignments to these variables, but removes more cycles from each use of them. Unless you believe that common plpgsql functions contain lots of dead stores, this is a guaranteed win overall. I'm seeing about 10% overall speedup (vs HEAD, with casserts off) for realistic simple plpgsql logic, such as this test case: create or replace function typicalspeed(n int) returns bigint as $$ declare res bigint := 0; begin for i in 1 .. n loop res := res + i; if i % 10 = 0 then res := res / 10; end if; end loop; return res; end $$ language plpgsql strict stable; For functions with lots of variables (or even just lots of expressions, since each one of those is a PLpgSQL_datum too), it's even more helpful. I have found no cases where it makes things worse, at least to within measurement error (run-to-run variability is a percent or two for me). 
The reason I would like to apply this now rather than wait for 9.6 is that by making parameter management more explicit it removes the need for the klugy changes in exec_eval_datum() that exist in http://www.postgresql.org/message-id/22945.1424982...@sss.pgh.pa.us Instead, we could leave exec_eval_datum() alone and substitute read-only pointers only when manufacturing the parameter image of an expanded-object variable. If we do it in the other order then we'll be making an API change for exec_eval_datum() in 9.5 (assuming said patch gets in) and then reverting it come 9.6. So there you have it. Now, where'd I put those long johns ... I'm inclined to say go for it. I can recall cases in the past where we have found some significant piece of work to be necessary after feature freeze in order to enable a piece of work submitted before feature freeze to proceed. This sounds like a similar case. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cygwin vs "win32" in configure
On 03/15/2015 02:12 PM, Andres Freund wrote: Hi, On 2015-03-15 14:03:08 -0400, Tom Lane wrote: Buildfarm member brolga seems unhappy with my commit 91f4a5a that restricted port/dirmod.c to being built only on "Windows". Evidently we need to build it when $PORTNAME = "cygwin" as well, which is fine; but I'm a bit astonished that none of the other stuff inserted in the "# Win32 support" stanza beginning at configure.in line 1440 is needed for cygwin builds. Is that really correct, or am I misunderstanding what's happening here? Those all sound like things that cygwin is going to emulate for us. I guess we could use more of those functions to reduce the difference, but I'm not sure it's worth it. I think that's right. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_target_action = pause & hot_standby = off
On Mon, Mar 16, 2015 at 7:38 AM, Andres Freund wrote: > On 2015-03-15 20:10:49 +, Simon Riggs wrote: >> On 15 March 2015 at 14:16, Andres Freund wrote: >> >> > Personally I think we just should change the default to 'shutdown' for >> > all cases. That makes documentation and behaviour less surprising. And >> > makes experimenting less dangerous, since you can just start again. >> >> We need to look at the specific situation, not make a generic decision. >> >> If hot_standby = off, we are unable to unpause, once paused. >> >> Changing the default doesn't alter that problem. >> >> We have two choices: 1) override to a sensible setting, 2) throw an error. >> >> (2) sounds clean at first but we must look deeper. We know that the >> *only* possible other setting is 'shutdown', so it seems more user >> friendly to do the thing we *know* they want (1), rather than pretend >> that we don't. >> >> (1) is completely predictable and not at all surprising. Add a LOG >> message if you wish, but don't throw an error. > > Sorry, I don't buy this. If I have "recovery_target_action = 'pause'" in > the config file, I want it to pause. Not do something else, just because > postgres doesn't have a interface to unpause without SQL access. That > makes some sense to developers, but is pretty much ununderstandable for > mere mortals. +1 for removing this code block, and +1 for having a LOG message, with an additional paragraph in the docs mentioning that when using pause as recovery_target_action and hot_standby = off there is no direct interface to unpause the node. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 03/15/2015 09:47 PM, Petr Jelinek wrote: It's almost the same thing as you wrote but the 128 bit and 64 bit optimizations are put on the same "level" of optimized routines. But this is nitpicking at this point, I am happy with the patch as it stands right now. Do you think it is ready for committer? New version with altered comment is attached. -- Andreas Karlsson diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 509f961..48fcc68 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -125,6 +125,41 @@ undefine([Ac_cachevar])dnl ])# PGAC_TYPE_64BIT_INT +# PGAC_HAVE___INT128 +# -- +# Check if __int128 is a working 128 bit integer type and +# define HAVE___INT128 if so. +AC_DEFUN([PGAC_HAVE___INT128], +[AC_CACHE_CHECK(for __int128, pgac_cv_have___int128, +[AC_TRY_RUN([ +__int128 a = 2001; +__int128 b = 4005; + +int does_int128_work() +{ + __int128 c,d; + + c = a * b; + d = (c + b) / b; + if (d != a+1) +return 0; + return 1; +} +main() { + exit(! does_int128_work()); +}], +[pgac_cv_have___int128=yes], +[pgac_cv_have___int128=no], +[# If cross-compiling, check the size reported by the compiler and +# trust that the arithmetic works. +AC_COMPILE_IFELSE([AC_LANG_BOOL_COMPILE_TRY([], [sizeof(__int128) == 16])], + pgac_cv_have___int128=yes, + pgac_cv_have___int128=no)])]) + +if test x"$pgac_cv_have___int128" = xyes ; then + AC_DEFINE(HAVE___INT128, 1, [Define to 1 if the system has the type `__int128'.]) +fi])# PGAC_TYPE_64BIT_INT + # PGAC_C_FUNCNAME_SUPPORT # --- diff --git a/configure b/configure index 379dab1..b60f55f 100755 --- a/configure +++ b/configure @@ -13789,6 +13789,74 @@ _ACEOF fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __int128" >&5 +$as_echo_n "checking for __int128... " >&6; } +if ${pgac_cv_have___int128+:} false; then : + $as_echo_n "(cached) " >&6 +else + if test "$cross_compiling" = yes; then : + # If cross-compiling, check the size reported by the compiler and +# trust that the arithmetic works. 
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +int +main () +{ +static int test_array [1 - 2 * !(sizeof(__int128) == 16)]; +test_array [0] = 0; +return test_array [0]; + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + pgac_cv_have___int128=yes +else + pgac_cv_have___int128=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +__int128 a = 2001; +__int128 b = 4005; + +int does_int128_work() +{ + __int128 c,d; + + c = a * b; + d = (c + b) / b; + if (d != a+1) +return 0; + return 1; +} +main() { + exit(! does_int128_work()); +} +_ACEOF +if ac_fn_c_try_run "$LINENO"; then : + pgac_cv_have___int128=yes +else + pgac_cv_have___int128=no +fi +rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ + conftest.$ac_objext conftest.beam conftest.$ac_ext +fi + +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_have___int128" >&5 +$as_echo "$pgac_cv_have___int128" >&6; } + +if test x"$pgac_cv_have___int128" = xyes ; then + +$as_echo "#define HAVE___INT128 1" >>confdefs.h + +fi + # We also check for sig_atomic_t, which *should* be defined per ANSI # C, but is missing on some old platforms. ac_fn_c_check_type "$LINENO" "sig_atomic_t" "ac_cv_type_sig_atomic_t" "#include diff --git a/configure.in b/configure.in index ca29e93..823abaa 100644 --- a/configure.in +++ b/configure.in @@ -1767,6 +1767,8 @@ AC_DEFINE_UNQUOTED(MAXIMUM_ALIGNOF, $MAX_ALIGNOF, [Define as the maximum alignme AC_CHECK_TYPES([int8, uint8, int64, uint64], [], [], [#include ]) +PGAC_HAVE___INT128 + # We also check for sig_atomic_t, which *should* be defined per ANSI # C, but is missing on some old platforms. 
AC_CHECK_TYPES(sig_atomic_t, [], [], [#include ]) diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c index 715917b..9b70f36 100644 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -402,7 +402,10 @@ static void apply_typmod(NumericVar *var, int32 typmod); static int32 numericvar_to_int4(NumericVar *var); static bool numericvar_to_int8(NumericVar *var, int64 *result); -static void int8_to_numericvar(int64 val, NumericVar *var); +static void int64_to_numericvar(int64 val, NumericVar *var); +#ifdef HAVE_INT128 +static void int128_to_numericvar(int128 val, NumericVar *var); +#endif static double numeric_to_double_no_overflow(Numeric num); static double numericvar_to_double_no_overflow(NumericVar *var); @@ -1414,7 +1417,7 @@ width_bucket_numeric(PG_FUNCTION_ARGS) init_var(&count_var); /* Convert 'count' to a numeric, for ease of use later */ - int8_to_numericvar((int64) count, &count_var); + int64_to_numericvar((int64) count, &count_var); switch (cmp_numerics(bound1, bound2)) { @@ -2083,14 +2086,14 @@ numeric_fac
Re: [HACKERS] PATCH: pgbench - merging transaction logs
On 15.3.2015 20:35, Fabien COELHO wrote:
>> Firstly, the fact that pgbench produces one file per thread is awkward.
>
> I agree, but I think it is due to the multi process thread emulation: if
> you have real threads, you can do a simple fprintf, possibly with some
> mutex, and you're done. There is really nothing to do to implement this
> feature.

My fear was that this either requires explicit locking, or some implicit locking. For example, while fprintf() in glibc is thread-safe, I really am not sure about Win32. This might influence the results for very short transactions. I haven't tried that, though, so this might be a false assumption. The way I implemented the merge is immune to this.

>> Secondly, a separate tool (even if provided with pgbench) would
>> require passing detailed info about the log format - whether it's
>> aggregated or not, throttling or not,
>
> Hmmm.
>
>>> It could belong to pgbench if pgbench was *generating* the merged
>>> log file directly. [...]
>>
>> I considered this approach, i.e. adding another 'collector' thread
>> receiving results from all the other threads, but decided not to
>> do that. That would require a fair amount of inter-process
>> communication, locking etc. and might affect the measurements
>
> I agree that inter-process stuff should be avoided. This is not what
> I had in mind. I was thinking of "fprintf" on the same file handler
> by different threads.

That still involves some sort of 'implicit' locking, no? And as I mentioned, I'm not sure fprintf() is thread-safe on all the platforms we support.

>>> Another issue raised by your patch is that the log format may be
>>> improved, say by providing a one-field timestamp at the beginning
>>> of the line.
>>
>> I don't see how this is relevant to this patch?
>
> For the simple log format, all the parsing needed would be for the
> timestamp, and the remainder would just be text to pass along, no
> need to %d %f... whatever.

Oh, ok.
Well, that's true, but I don't think it significantly changes the overall complexity.

>>> The current IO complexity is in p²n where it should be simply pn...
>>
>> Maybe. Implementing a 2-way merge sort was the simplest solution at
>> the moment, and I don't think this really matters because the I/O
>> generated by the benchmark itself is usually much higher than this.
>
> If you do not do a n-way merge, you could do a 2-way merge on a
> binary tree so that the IO complexity would be p.log(p).n (I think),
> and not p²n.

Yes, I could do that. I still think this is probably overengineering, but not a big deal. I'll leave it for later, though (this patch is in the 2015-06 CF anyway).

>>> For aggregates, some data in the output may be special values
>>> "NaN/-/...", [...]
>>
>> You mean the aggregated log? I can't think of a way to get there such
>> values - can you provide a plausible scenario how that could happen?
>
> Possibly I'm wrong. Ok, I tried it:
>
> sh> ./pgbench --rate=0.5 --aggregate-interval=1 --log
> sh> cat pgbench_log.12671
> 1426445236 1 5034 25341156 5034 5034 687 471969 687 687
> 1426445237 0 0 0 0 0 0 0 0 0
> 1426445238 0 0 0 0 0 0 0 0 0
> 1426445239 1 8708 75829264 8708 8708 2063 4255969 2063 2063
> ...
>
> Good news, I could not generate strange values. Just when there are
> 0 transactions, possibly some care should be taken when combining
> values.

Yeah, I wasn't able to come up with such a scenario, but wasn't sure.

>>> It seems that system function calls are not tested for errors.
>>
>> That's true, but that's how the rest of pgbench code is written.
>
> Hmmm. This is not entirely true, there are *some* checks if you look
> carefully :-)
>
>    if ((fd = fopen(filename, "r")) == NULL) ...
>    if ((fp = popen(command, "r")) == NULL) ...
>    nsocks = select(...)
>    if (nsocks < 0) ...

OK, there are a few checks ;-) But none of the fprintf calls IIRC.
Anyway, I plan to refactor this part of the patch to get rid of the copy'n'paste pieces, so I'll take care of this too.

>>> (b) if we could get rid of the "thread emulation", pgbench could
>>> generate the merged logs directly and simply, and the option could
>>> be provided then.
>>
>> That however is not the goal of this patch.
>
> Sure. My point is that the amount of code you write to implement this
> merge stuff is due to this feature. Without it, the patch would
> probably need 10 lines of code. Moreover, the way it is implemented
> requires scanning and reprinting, which means more work in many
> places to update the format later.

Possibly. But it's not written like that :-(

>> The thread emulation is there for a reason,
>
> My opinion is that it *was* there for a reason. Whether it makes
> sense today to still have it, maintain it, and have to write such
> heavy code for a trivial feature just because of it is another
> matter.

Possibly. I can't really decide that.

>> and I certainly am not going to work on eliminating it
>> (not sure that'
Re: [HACKERS] recovery_target_action = pause & hot_standby = off
On 2015-03-15 20:10:49 +, Simon Riggs wrote:
> On 15 March 2015 at 14:16, Andres Freund wrote:
>
>> Personally I think we just should change the default to 'shutdown' for
>> all cases. That makes documentation and behaviour less surprising. And
>> makes experimenting less dangerous, since you can just start again.
>
> We need to look at the specific situation, not make a generic decision.
>
> If hot_standby = off, we are unable to unpause, once paused.
>
> Changing the default doesn't alter that problem.
>
> We have two choices: 1) override to a sensible setting, 2) throw an error.
>
> (2) sounds clean at first but we must look deeper. We know that the
> *only* possible other setting is 'shutdown', so it seems more user
> friendly to do the thing we *know* they want (1), rather than pretend
> that we don't.
>
> (1) is completely predictable and not at all surprising. Add a LOG
> message if you wish, but don't throw an error.

Sorry, I don't buy this. If I have "recovery_target_action = 'pause'" in the config file, I want it to pause. Not do something else, just because postgres doesn't have an interface to unpause without SQL access. That makes some sense to developers, but is pretty much incomprehensible to mere mortals.

Even worse, right now (after the bugfix), the behaviour is:

postgresql.conf: # hot_standby = off
recovery.conf:   # recovery_target_action = 'pause'
=> promote (the downgrading is only active when explicitly configured)

postgresql.conf: # hot_standby = off
recovery.conf:   recovery_target_action = 'pause'
=> shutdown (despite an explicit setting of pause)

postgresql.conf: hot_standby = on
recovery.conf:   # recovery_target_action = 'pause'
=> pause

postgresql.conf: hot_standby = on
recovery.conf:   recovery_target_action = 'pause'
=> pause

To me that's just utterly confusing.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 14/03/15 04:17, Andreas Karlsson wrote:
> On 03/13/2015 10:22 PM, Peter Geoghegan wrote:
>> On Thu, Mar 12, 2015 at 6:23 PM, Andreas Karlsson wrote:
>>> /*
>>>  * Integer data types use Numeric accumulators to share code and avoid risk
>>>  * of overflow. To speed up aggregation 128-bit integer accumulators are
>>>  * used instead where sum(X) or sum(X*X) fit into 128 bits, and there is
>>>  * platform support.
>>>  *
>>>  * For int2 and int4 inputs sum(X) will fit into a 64-bit accumulator, hence
>>>  * we use faster special-purpose accumulator routines for SUM and AVG of
>>>  * these datatypes.
>>>  */
>>> #ifdef HAVE_INT128
>>> typedef struct Int128AggState
>>
>> Not quite. Refer to the 128-bit integer accumulators as "special-purpose
>> accumulator routines" instead. Then, in the case of the extant 64-bit
>> accumulators, refer to them by the shorthand "integer accumulators".
>> Otherwise it's the wrong way around.
>
> I disagree. The term "integer accumulators" is confusing. Right now I do
> not have any better ideas for how to write that comment, so I submit the
> next version of the patch with the comment as I wrote it above. Feel free
> to come up with a better wording, my English is not always up to par when
> writing technical texts.

I don't like the term "integer accumulators" either, as "integer" is platform specific. I would phrase it like this:

/*
 * Integer data types in general use Numeric accumulators to share code
 * and avoid risk of overflow.
 *
 * However for performance reasons some of the optimized special-purpose
 * accumulator routines are used when possible.
 *
 * On platforms with 128-bit integer support, the 128-bit routines will be
 * used when sum(X) or sum(X*X) fit into 128 bits.
 *
 * For int2 and int4 inputs, the N and sum(X) fit into 64 bits so the 64-bit
 * accumulators will be used for SUM and AVG of these data types.
 */

It's almost the same thing as you wrote, but the 128-bit and 64-bit optimizations are put on the same "level" of optimized routines.
But this is nitpicking at this point, I am happy with the patch as it stands right now. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] recovery_target_action = pause & hot_standby = off
On 15 March 2015 at 14:16, Andres Freund wrote: > Personally I think we just should change the default to 'shutdown' for > all cases. That makes documentation and behaviour less surprising. And > makes experimenting less dangerous, since you can just start again. We need to look at the specific situation, not make a generic decision. If hot_standby = off, we are unable to unpause, once paused. Changing the default doesn't alter that problem. We have two choices: 1) override to a sensible setting, 2) throw an error. (2) sounds clean at first but we must look deeper. We know that the *only* possible other setting is 'shutdown', so it seems more user friendly to do the thing we *know* they want (1), rather than pretend that we don't. (1) is completely predictable and not at all surprising. Add a LOG message if you wish, but don't throw an error. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
Re: [HACKERS] PATCH: pgbench - merging transaction logs
> Firstly, the fact that pgbench produces one file per thread is awkward.

I agree, but I think it is due to the multi process thread emulation: if you have real threads, you can do a simple fprintf, possibly with some mutex, and you're done. There is really nothing to do to implement this feature.

> Secondly, a separate tool (even if provided with pgbench) would require
> passing detailed info about the log format - whether it's aggregated or
> not, throttling or not,

Hmmm. It could belong to pgbench if pgbench was *generating* the merged log file directly.

[...]

> I considered this approach, i.e. adding another 'collector' thread
> receiving results from all the other threads, but decided not to do
> that. That would require a fair amount of inter-process communication,
> locking etc. and might affect the measurements

I agree that inter-process stuff should be avoided. This is not what I had in mind. I was thinking of "fprintf" on the same file handler by different threads.

>> Another issue raised by your patch is that the log format may be
>> improved, say by providing a one-field timestamp at the beginning of
>> the line.
>
> I don't see how this is relevant to this patch?

For the simple log format, all the parsing needed would be for the timestamp, and the remainder would just be text to pass along, no need for %d %f... whatever.

>> The current IO complexity is in p²n where it should be simply pn...
>
> Maybe. Implementing a 2-way merge sort was the simplest solution at the
> moment, and I don't think this really matters because the I/O generated
> by the benchmark itself is usually much higher than this.

If you do not do an n-way merge, you could do a 2-way merge on a binary tree so that the IO complexity would be p.log(p).n (I think), and not p²n.

>> For aggregates, some data in the output may be special values
>> "NaN/-/...", [...]
>
> You mean the aggregated log? I can't think of a way to get there such
> values - can you provide a plausible scenario how that could happen?

Possibly I'm wrong.
Ok, I tried it:

sh> ./pgbench --rate=0.5 --aggregate-interval=1 --log
sh> cat pgbench_log.12671
1426445236 1 5034 25341156 5034 5034 687 471969 687 687
1426445237 0 0 0 0 0 0 0 0 0
1426445238 0 0 0 0 0 0 0 0 0
1426445239 1 8708 75829264 8708 8708 2063 4255969 2063 2063
...

Good news, I could not generate strange values. Just when there are 0 transactions, some care should possibly be taken when combining values.

>> It seems that system function calls are not tested for errors.
>
> That's true, but that's how the rest of pgbench code is written.

Hmmm. This is not entirely true, there are *some* checks if you look carefully:-)

if ((fd = fopen(filename, "r")) == NULL) ...
if ((fp = popen(command, "r")) == NULL) ...
nsocks = select(...)
if (nsocks < 0) ...

> So integrating this into pgbench directly seems like a better approach,
> and the attached patch implements that.

You guessed that I disagree. Note that this is only my own opinion.

In summary, I think that:

(a) providing a clean script would be nice,

(b) if we could get rid of the "thread emulation", pgbench could generate the merged logs directly and simply, and the option could be provided then.

> That however is not the goal of this patch.

Sure. My point is that the amount of code you write to implement this merge stuff is due to this feature. Without it, the patch would probably need 10 lines of code. Moreover, the way it is implemented requires scanning and reprinting, which means more work in many places to update the format later.

> The thread emulation is there for a reason,

My opinion is that it *was* there for a reason. Whether it makes sense today to still have it, maintain it, and have to write such heavy code for a trivial feature just because of it is another matter.

> and I certainly am not going to work on eliminating it
> (not sure that's even possible).
I wish it will be :-)

I would suggest this, which I would support: implement this feature the simple way (aka fprintf, maybe a mutex) when compiled with threads, and generate an error "feature not available with process-based thread emulation" when compiled with processes. This way we avoid a lot of heavy code to maintain in the future, and you still get the feature within pgbench. There are already some things which are not the same with thread emulation because it would have been tiring to implement for it for very little benefit.

-- Fabien.
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
On 15/03/2015 04:34, Andreas Karlsson wrote: > On 03/15/2015 04:25 AM, Andreas Karlsson wrote: >> Nice. You will also want to apply the attached patch which fixes support >> for the --no-tablespaces flag. > > Just realized that --no-tablespaces need to be fixed for pg_restore too. Indeed, after taking a look at pg_restore case, I would say it won't be so easy. Will try to fix it. -- Julien
Re: [HACKERS] PATCH: pgbench - merging transaction logs
On 15.3.2015 11:22, Fabien COELHO wrote:
>
> I've looked at the patch. Although I think that such a feature is
> somehow desirable... I have two issues with it: ISTM that
>
> (1) it does not belong to pgbench as such
>
> (2) even if, the implementation is not right
>
> About (1):
>
> I think that this functionality as implemented does not belong to
> pgbench, and really belongs to an external script, which happens
> not to be readily available, which is a shame. PostgreSQL should
> probably provide such a script, or make it easy to find.

I disagree, for two main reasons.

Firstly, the fact that pgbench produces one file per thread is awkward. It was implemented like this most likely because it was the simplest solution after adding "-j" in 9.0, but I don't remember if I ever needed a separate log from a single thread.

Secondly, a separate tool (even if provided with pgbench) would require passing detailed info about the log format - whether it's aggregated or not, throttling or not, etc.

Those are the two main reasons why I think this should be implemented as another option in pgbench.

> It could belong to pgbench if pgbench was *generating* the merged
> log file directly. However I'm not sure this can be done cleanly for
> the multi-process "thread emulation" (IN PASSING: which I think should
> really be gotten rid of, because I do not know what system does not
> provide threads nowadays and would require a state of the art pgbench
> nevertheless, and this emulation significantly complicates the code by
> making things uselessly difficult and/or needed to be implemented twice
> or not be provided in some cases).

I considered this approach, i.e. adding another 'collector' thread receiving results from all the other threads, but decided not to do that. That would require a fair amount of inter-process communication, locking etc. and might affect the measurements (especially for the workloads with extremely short transactions, like read-only pgbench).
The implemented approach, i.e. merging results collected by each thread independently, after the benchmarking actually completed, is a better solution. And the code is actually simpler.

> Another issue raised by your patch is that the log format may be
> improved, say by providing a one-field timestamp at the beginning of
> the line.

I don't see how this is relevant to this patch?

> About (2):
>
> In the implementation you reimplement a partial merge sort by
> parsing each line of each file and merging it with the current result
> over and over. ISTM that an implementation should read all files in
> parallel and merge them in one pass. The current IO complexity is in
> p²n where it should be simply pn... do not use it with a significant
> number of threads and many lines... Ok, the generated files are
> likely to be kept in cache, but nevertheless.

Maybe. Implementing a 2-way merge sort was the simplest solution at the moment, and I don't think this really matters because the I/O generated by the benchmark itself is usually much higher than this. The only case when this might make a difference is probably merging large transaction logs (e.g. a 100% sample from a read-only test on a small dataset).

> Also there are plenty of copy-paste when scanning for two files, and
> then reprinting in all the different formats. The same logic is
> implemented twice, once for simple and once for aggregated. This
> means that updating or extending the log format later on would
> require modifying these scans and prints in many places.

That is true, and I'll address that somehow (either moving the code to a macro or a separate utility function).

> For aggregates, some data in the output may be special values
> "NaN/-/...", I am not sure how the implementation would behave in
> such cases. As lines that do not match are silently ignored, the
> resulting merge would just be non-significant - should it rather be an
> error? Try it with a low rate for instance.

You mean the aggregated log?
I can't think of a way to get there such values - can you provide a plausible scenario how that could happen?

>> It seems that system function calls are not tested for errors.
>
> That's true, but that's how the rest of pgbench code is written.

>> The other disadvantage of the external scripts is that you have to
>> pass all the info about the logs (whether the logs are aggregated,
>> whether there's throttling, etc.).
>
> I think that is another argument to make a better format, with a
> timestamp ahead? Also, ISTM that it only needs to know whether it is
> merging aggregate or simple logs, no more; the other information can be
> inferred from the number of fields on the line.

No, it's not. Even if you change the format like this, you still have no idea whether the log is a per-transaction log (possibly with some additional options) or an aggregated log. There might be some auto-detection based on number of fields, for example, but considering how many optio
Re: [HACKERS] Cygwin vs "win32" in configure
Hi, On 2015-03-15 14:03:08 -0400, Tom Lane wrote: > Buildfarm member brolga seems unhappy with my commit 91f4a5a that > restricted port/dirmod.c to being built only on "Windows". Evidently > we need to build it when $PORTNAME = "cygwin" as well, which is fine; > but I'm a bit astonished that none of the other stuff inserted in the > "# Win32 support" stanza beginning at configure.in line 1440 is needed > for cygwin builds. Is that really correct, or am I misunderstanding > what's happening here? Those all sound like things that cygwin is going to emulate for us. I guess we could use more of those functions to reduce the difference, but I'm not sure it's worth it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Cygwin vs "win32" in configure
Buildfarm member brolga seems unhappy with my commit 91f4a5a that restricted port/dirmod.c to being built only on "Windows". Evidently we need to build it when $PORTNAME = "cygwin" as well, which is fine; but I'm a bit astonished that none of the other stuff inserted in the "# Win32 support" stanza beginning at configure.in line 1440 is needed for cygwin builds. Is that really correct, or am I misunderstanding what's happening here? regards, tom lane
Re: [HACKERS] Crash on SRF execution
Thanks Tom.

Assuming the SRF had a parameter, would this be a correct approach (considering the iterative model) to bail out early?

if (SRF_IS_FIRSTCALL())
{
    int i;

    if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != TYPEFUNC_COMPOSITE)
    {
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("Check if sql function definition returns SETOF record")));
        return;
    }

    if (PG_ARGISNULL(0))
    {
        ereport(ERROR,
                (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
                 errmsg("Null value not allowed for ...")));
        return;
    }

    if ((i = PG_GETARG_INT32(0)) != 'WHATEVER')
    {
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("Invalid parameter value for ...")));
        return;
    }

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Sunday, March 15, 2015 5:50 PM
To: Itai
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Crash on SRF execution

Itai writes:
> I'm attempting to program a simple SRF function but it constantly crashes (details and code below).
> Any idea why?

Looks like you're pallocing some stuff in the calling context (ie, a short-lived context) during the first execution and expecting it to still be there in later executions. You'd need to allocate those data structures in the multi_call_memory_ctx instead.

regards, tom lane
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 2015-03-15 16:30:16 +0100, Andres Freund wrote: > The attached patch does pretty much what you suggested above and tries > to address all the points previously made. I plan to push this fairly > soon; unless somebody has further input. Pushed, after fixing two typos in decode.c I introduced while "cleaning up". This might not yet be perfect, but there seems little point in waiting further. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB
Hi, On 2015-03-03 11:04:30 -0800, Josh Berkus wrote: > Attached is version D, which incorporates the above two changes, but NOT > a general unit comment cleanup of postgresql.conf, which needs to be an > entirely different patch. Pushed! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
Hi, On 2014-12-08 02:18:11 +0100, Petr Jelinek wrote: > ...except for the removal of pause_at_recovery_target it seems, so I > attached just that Pushed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Crash on SRF execution
Fantastic! That solved this problem.

However I still get a crash if I change is_even to bool:

num->is_even = ((base_num + i) % 2 == 0) ? true : false;
retvals[1] = BoolGetDatum(list->pp_numbers[call_cntr]->is_even);

CREATE OR REPLACE FUNCTION pg_srf(OUT value integer, OUT is_even bit)
RETURNS SETOF record
AS 'pg_srf.so', 'pg_srf'
LANGUAGE C STRICT IMMUTABLE;

> Date: Sun, 15 Mar 2015 17:03:38 +0100 > From: and...@2ndquadrant.com > To: it...@outlook.com > CC: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Crash on SRF execution > > Hi, > > On 2015-03-15 17:59:39 +0200, Itai wrote: > > Thanks for the quick response!! :) > > But I don't get it... isn't: > > if (SRF_IS_FIRSTCALL()){ > > } > > the iterator one time > > "initialization block" where I setup the data to be iterated > > upon? > > > > Can you please elaborate on how should I fix this? I'm probably missing > > something basic... > > > > > if (SRF_IS_FIRSTCALL()) > > > > { > > > > length = 4000; > > > > base_num = 120300; > > > > list = (NumberList *)palloc(sizeof(NumberList)); > > > > list->pp_numbers = (Number **)palloc(sizeof(Number*) * length); > > You allocate memory in the per call context here. > > > > > list->length = length; > > > > i = 0; > > > > for (; i < length; i++) > > > > { > > > >num = (Number *)palloc(sizeof(Number)); > > > >num->value = base_num + i; > > > >num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0; > > > >list->pp_numbers[i] = num; > > > > } > > > > ereport(INFO, (errmsg("--- data source ---"))); > > > > i = 0; > > > > for (; i < length; i++) > > > > { > > > >ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value))); > > > >ereport(INFO, (errmsg("is_even: %d", > > > > list->pp_numbers[i]->is_even))); > > > > } > > > > > > > > funcctx = SRF_FIRSTCALL_INIT(); > > > > oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); > > Because you only switch the memory context here. Move this up, to the > beginning of the SRF_IS_FIRSTCALL block. Before the palloc()s.
Re: [HACKERS] Crash on SRF execution
Hi, On 2015-03-15 17:59:39 +0200, Itai wrote: > Thanks for the quick response!! :) > But I don't get it... isn't: > if (SRF_IS_FIRSTCALL()){ > } > the iterator one time > "initialization block" where I setup the data to be iterated > upon? > > Can you please elaborate on how should I fix this? I'm probably missing > something basic... > > > if (SRF_IS_FIRSTCALL()) > > > { > > > length = 4000; > > > base_num = 120300; > > > list = (NumberList *)palloc(sizeof(NumberList)); > > > list->pp_numbers = (Number **)palloc(sizeof(Number*) * length); You allocate memory in the per call context here. > > > list->length = length; > > > i = 0; > > > for (; i < length; i++) > > > { > > >num = (Number *)palloc(sizeof(Number)); > > >num->value = base_num + i; > > >num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0; > > >list->pp_numbers[i] = num; > > > } > > > ereport(INFO, (errmsg("--- data source ---"))); > > > i = 0; > > > for (; i < length; i++) > > > { > > >ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value))); > > >ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even))); > > > > > > } > > > > > > funcctx = SRF_FIRSTCALL_INIT(); > > > oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); Because you only switch the memory context here. Move this up, to the beginning of the SRF_IS_FIRSTCALL block. Before the palloc()s. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Crash on SRF execution
Thanks for the quick response!! :) But I don't get it... isn't: if (SRF_IS_FIRSTCALL()){ } the iterator one time "initialization block" where I setup the data to be iterated upon? Can you please elaborate on how should I fix this? I'm probably missing something basic... > Date: Sun, 15 Mar 2015 16:50:27 +0100 > From: and...@2ndquadrant.com > To: it...@outlook.com > CC: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Crash on SRF execution > > Hi, > > On 2015-03-15 17:40:11 +0200, Itai wrote: > > I'm attempting to program a simple SRF function but it constantly crashes > > (details and code below). > > > > Any idea why? > > > if (SRF_IS_FIRSTCALL()) > > { > > length = 4000; > > base_num = 120300; > > list = (NumberList *)palloc(sizeof(NumberList)); > > list->pp_numbers = (Number **)palloc(sizeof(Number*) * length); > > list->length = length; > > i = 0; > > for (; i < length; i++) > > { > >num = (Number *)palloc(sizeof(Number)); > >num->value = base_num + i; > >num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0; > >list->pp_numbers[i] = num; > > } > > ereport(INFO, (errmsg("--- data source ---"))); > > i = 0; > > for (; i < length; i++) > > { > >ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value))); > >ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even))); > > } > > > > funcctx = SRF_FIRSTCALL_INIT(); > > oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); > > funcctx->user_fctx = list; > > funcctx->max_calls = list->length; > > if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != > > TYPEFUNC_COMPOSITE) > >ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > >errmsg("check if sql function definition returns SETOF record"))); > > > > BlessTupleDesc(funcctx->tuple_desc); > > MemoryContextSwitchTo(oldcontext); > > } > > The palloc() for list above is in the per call memory context, but you > use it across several calls. You should allocate it in the multi call > context you use some lines below. 
Re: [HACKERS] recovery_target_action = pause & hot_standby = off
On Sun, Mar 15, 2015 at 3:16 PM, Andres Freund wrote: > On 2015-03-15 14:51:46 +0100, Magnus Hagander wrote: > > On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund > > wrote: > > > > > On 2015-03-12 15:52:02 +0100, Andres Freund wrote: > > > > /* > > > >* Override any inconsistent requests. Not that this is a > change > > > >* of behaviour in 9.5; prior to this we simply ignored a > request > > > >* to pause if hot_standby = off, which was surprising > behaviour. > > > >*/ > > > > if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE && > > > > recoveryTargetActionSet && > > > > standbyState == STANDBY_DISABLED) > > > > recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN; > > > > > > While it's easy enough to fix I rather dislike the whole intent here > > > though. *Silently* switching the mode of operation in a rather > > > significant way seems like a bad idea to me. At the very least we need > > > to emit a LOG message about this; but I think it'd be much better to > > > error out instead. > > > > > > <9.5's behaviour was already quite surprising. But changing things to a > > > different surprising behaviour seems like a bad idea. > > > > > > > +1. Especially for "sensitive" operations like this, having > > predictable-behavior-or-error is usually the best choice. > > Yea. > > Looking further, it's even worse right now. We'll change the target to > shutdown when hot_standby = off, but iff it was set in the config > file. But the default value is (and was, although configured > differently) documented to be 'pause'; so if it's not configured > explicitly we still will promote. At least I can't read that out of the > docs. > > Personally I think we just should change the default to 'shutdown' for > all cases. That makes documentation and behaviour less surprising. And > makes experimenting less dangerous, since you can just start again. > +1. These things need to be clear. Given the consequences of getting it wrong, surprising behavior can be quite dangerous. 
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Crash on SRF execution
Hi, On 2015-03-15 17:40:11 +0200, Itai wrote: > I'm attempting to program a simple SRF function but it constantly crashes > (details and code below). > > Any idea why? > if (SRF_IS_FIRSTCALL()) > { > length = 4000; > base_num = 120300; > list = (NumberList *)palloc(sizeof(NumberList)); > list->pp_numbers = (Number **)palloc(sizeof(Number*) * length); > list->length = length; > i = 0; > for (; i < length; i++) > { >num = (Number *)palloc(sizeof(Number)); >num->value = base_num + i; >num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0; >list->pp_numbers[i] = num; > } > ereport(INFO, (errmsg("--- data source ---"))); > i = 0; > for (; i < length; i++) > { >ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value))); >ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even))); > } > > funcctx = SRF_FIRSTCALL_INIT(); > oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); > funcctx->user_fctx = list; > funcctx->max_calls = list->length; > if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != > TYPEFUNC_COMPOSITE) >ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >errmsg("check if sql function definition returns SETOF record"))); > > BlessTupleDesc(funcctx->tuple_desc); > MemoryContextSwitchTo(oldcontext); > } The palloc() for list above is in the per call memory context, but you use it across several calls. You should allocate it in the multi call context you use some lines below. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Crash on SRF execution
Itai writes: > I'm attempting to program a simple SRF function but it constantly crashes (details and code below). > Any idea why? Looks like you're pallocing some stuff in the calling context (ie, a short-lived context) during the first execution and expecting it to still be there in later executions. You'd need to allocate those data structures in the multi_call_memory_ctx instead. regards, tom lane
[HACKERS] Crash on SRF execution
Hi

I'm attempting to program a simple SRF function but it constantly crashes (details and code below).

Any idea why?

Thanks!
-Itai

- Environment -
Ubuntu: 14.04.2 (server)
PG Ver: PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
Install: deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main
Dev: postgresql-server-dev-9.4 (9.4.1-1.pgdg14.04+1)

- Execution -
select * from pg_srf();
INFO: --- data source ---
INFO: value: 120300
INFO: is_even: 1
INFO: value: 120301
INFO: is_even: 0
...
INFO: value: 1203003998
INFO: is_even: 1
INFO: value: 1203003999
INFO: is_even: 0
INFO: --- data context ---
INFO: call_cntr: 0
INFO: value: 120300
INFO: is_even: 1
INFO: call_cntr: 1
INFO: value: 120301
INFO: is_even: 0
INFO: call_cntr: 2
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

- pg_srf.h -
#ifndef PGSRF_H
#define PGSRF_H

#include "fmgr.h"

/* using int as bool due to a different issue (one prob. at a time) */
typedef struct Number_tag
{
    int value;
    int is_even;
} Number;

typedef struct NumberList_tag
{
    int length;
    Number **pp_numbers;
} NumberList;

extern Datum pg_srf(PG_FUNCTION_ARGS);

#endif

- pg_srf.c -
#include
#include
#include
#include
#include "pg_srf.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(pg_srf);

Datum
pg_srf(PG_FUNCTION_ARGS)
{
    int call_cntr, i, length, base_num;
    Number *num;
    NumberList *list;
    HeapTuple rettuple;
    FuncCallContext *funcctx;
    MemoryContext oldcontext;

    if (SRF_IS_FIRSTCALL())
    {
        length = 4000;
        base_num = 120300;
        list = (NumberList *) palloc(sizeof(NumberList));
        list->pp_numbers = (Number **) palloc(sizeof(Number *) * length);
        list->length = length;
        i = 0;
        for (; i < length; i++)
        {
            num = (Number *) palloc(sizeof(Number));
            num->value = base_num + i;
            num->is_even = ((base_num + i) % 2 == 0) ? 1 : 0;
            list->pp_numbers[i] = num;
        }
        ereport(INFO, (errmsg("--- data source ---")));
        i = 0;
        for (; i < length; i++)
        {
            ereport(INFO, (errmsg("value: %d", list->pp_numbers[i]->value)));
            ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[i]->is_even)));
        }

        funcctx = SRF_FIRSTCALL_INIT();
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        funcctx->user_fctx = list;
        funcctx->max_calls = list->length;
        if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != TYPEFUNC_COMPOSITE)
            ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                            errmsg("check if sql function definition returns SETOF record")));

        BlessTupleDesc(funcctx->tuple_desc);
        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();
    list = funcctx->user_fctx;
    call_cntr = funcctx->call_cntr;

    if (call_cntr < funcctx->max_calls)
    {
        Datum retvals[2];
        bool retnulls[2];

        if (call_cntr == 0)
        {
            ereport(INFO, (errmsg("--- data context ---")));
        }
        ereport(INFO, (errmsg("call_cntr: %d", call_cntr)));
        ereport(INFO, (errmsg("value: %d", list->pp_numbers[call_cntr]->value)));
        retvals[0] = Int32GetDatum(list->pp_numbers[call_cntr]->value);
        ereport(INFO, (errmsg("is_even: %d", list->pp_numbers[call_cntr]->is_even)));
        retvals[1] = Int32GetDatum(list->pp_numbers[call_cntr]->is_even);
        retnulls[0] = false;
        retnulls[1] = false;
        rettuple = heap_form_tuple(funcctx->tuple_desc, retvals, retnulls);
        SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(rettuple));
    }
    else
    {
        SRF_RETURN_DONE(funcctx);
    }
}

- Makefile -
MODULES = pg_srf
OBJS = pg_srf.o
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

- SQL -
CREATE OR REPLACE FUNCTION pg_srf(OUT value integer, OUT is_even integer)
RETURNS SETOF record
AS 'pg_srf.so', 'pg_srf'
LANGUAGE C STRICT IMMUTABLE;
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 2015-03-02 18:45:18 +0100, Andres Freund wrote:
> On 2015-03-02 19:23:56 +0200, Heikki Linnakangas wrote:
> > Pass the prepared XID as yet another argument to XactEmitCommitRecord,
> > and have XactEmitCommitRecord emit the xl_xact_commit_prepared part of
> > the record too. It might even make sense to handle the prepared XID
> > like all the other optional fields and add an xinfo flag for it.
>
> That's what I mean with "non simple". Not a fan of teaching xact.c even
> more about twophase's dealings than it already knows.

I made an effort to show how horrible it would look. Turns out it's not
that bad ;). Far from pretty, especially in xlog.c, but acceptable. I
think treating them more similarly actually might open the road for
reducing the difference further at some point.

The attached patch does pretty much what you suggested above and tries
to address all the points previously made. I plan to push this fairly
soon, unless somebody has further input.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From 7557d0afa860c37a1992240b54e2bf3710296fc7 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Sun, 15 Mar 2015 14:42:09 +0100
Subject: [PATCH] Merge the various forms of transaction commit & abort
 records.

Since 465883b0a two versions of commit records have existed. A compact
version that was used when no cache invalidations, smgr unlinks and
similar were needed, and a full version that could deal with all that.
Additionally the full version was embedded into twophase commit records.

That resulted in a measurable reduction in the size of the logged WAL in
some workloads. But more recently additions like logical decoding, which
e.g. needs information about the database something was executed on,
made it applicable in fewer situations.

The static split generally made it hard to expand the commit record,
because concerns over the size made it hard to add anything to the
compact version.

Additionally it's not particularly pretty to have twophase.c insert
RM_XACT records.

Rejigger things so that the commit and abort records only have one form
each, including the twophase equivalents. The presence of the various
optional (in the sense of not being in every record) pieces is indicated
by bits in the 'xinfo' flag. That flag previously was not included in
compact commit records. To prevent an increase in size due to its
presence, it's only included if necessary; signalled by a bit in the
xl_info bits available for xact.c, similar to heapam.c's
XLOG_HEAP_OPMASK/XLOG_HEAP_INIT_PAGE.

Twophase commit/aborts are now the same as their normal counterparts.
The original transaction's xid is included in an optional data field.

This means that commit records generally are smaller, except in the case
of a transaction with subtransactions, but no other special cases; the
increase there is four bytes, which seems acceptable given that the more
common case of not having subtransactions shrank. The savings are
especially measurable for twophase commits, which previously always used
the full version; but will in practice only infrequently have required
that.

The motivation for this work is not the space savings and deduplication
though; it's that it makes it easier to extend commit records with
additional information. That's just a few lines of code now; without
impacting the common case where that information is not needed.

Discussion: 20150220152150.gd4...@awork2.anarazel.de,
    235610.92468.qm%40web29004.mail.ird.yahoo.com

Reviewed-By: Heikki Linnakangas, Simon Riggs
---
 src/backend/access/rmgrdesc/xactdesc.c   | 245 +++-
 src/backend/access/transam/twophase.c    |  59 +---
 src/backend/access/transam/xact.c        | 476 +++
 src/backend/access/transam/xlog.c        | 126
 src/backend/replication/logical/decode.c | 133 +++--
 src/include/access/xact.h                | 215 ++
 src/include/access/xlog_internal.h       |   2 +-
 7 files changed, 749 insertions(+), 507 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 3e87978..b036b6d 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -14,53 +14,188 @@
  */
 #include "postgres.h"

+#include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
 #include "storage/sinval.h"
 #include "utils/timestamp.h"

+/*
+ * Parse the WAL format of a xact commit and abort records into a easier to
+ * understand format.
+ *
+ * This routines are in xactdesc.c because they're accessed in backend (when
+ * replaying WAL) and frontend (pg_xlogdump) code. This file is the only xact
+ * specific one shared between both. They're complicated enough that
+ * duplication would be bothersome.
+ */
+
+void
+ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_comm
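[Editor's note: the optional-pieces scheme the commit message describes, presence bits in `xinfo` with `xinfo` itself emitted only when some bit is set, can be sketched generically. Everything below is a made-up illustration for the technique only; the names, bit values, and field widths do not match the patch's actual record layout.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical presence bits, in the spirit of the patch's xinfo flags. */
#define HAS_XINFO          0x80      /* stored in the fixed 'info' byte */
#define XINFO_DBID         (1u << 0) /* an optional database id follows */
#define XINFO_TWOPHASE_XID (1u << 1) /* an optional prepared xid follows */

/* Serialize: one fixed byte, then xinfo only if any optional field is
 * present, then each optional field in bit order. Returns bytes written,
 * so a record with no optional data costs a single byte. */
static size_t pack(uint8_t *out, uint32_t dbid, int has_dbid,
                   uint32_t xid, int has_xid)
{
    uint32_t xinfo = (has_dbid ? XINFO_DBID : 0) |
                     (has_xid ? XINFO_TWOPHASE_XID : 0);
    size_t off = 0;

    out[off++] = xinfo ? HAS_XINFO : 0;
    if (xinfo) { memcpy(out + off, &xinfo, 4); off += 4; }
    if (has_dbid) { memcpy(out + off, &dbid, 4); off += 4; }
    if (has_xid)  { memcpy(out + off, &xid, 4);  off += 4; }
    return off;
}

/* Parse it back, mirroring what a ParseCommitRecord-style reader does:
 * consult the bits to learn which fields are present and where. */
static void unpack(const uint8_t *in, uint32_t *dbid, uint32_t *xid)
{
    uint32_t xinfo = 0;
    size_t off = 1;

    if (in[0] & HAS_XINFO) { memcpy(&xinfo, in + off, 4); off += 4; }
    if (xinfo & XINFO_DBID) { memcpy(dbid, in + off, 4); off += 4; }
    if (xinfo & XINFO_TWOPHASE_XID) memcpy(xid, in + off, 4);
}
```

The point of the design, as the commit message argues, is that adding another optional field later only means claiming one more bit and one more conditional copy on each side.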
Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?
Pavel Stehule writes:
> other variant, I hope better than previous. We can introduce new long
> option "--strict". With this active option, every pattern specified by -t
> option have to have identifies exactly only one table. It can be used for
> any other "should to exists" patterns - schemas. Initial implementation in
> attachment.

I think this design is seriously broken. If I have '-t foo*' the code
should not prevent that from matching multiple tables. What would the
use case for such a restriction be?

What would make sense to me is one or both of these ideas:

* require a match for a wildcard-free -t switch

* require at least one (not "exactly one") match for a wildcarded -t
  switch.

Neither of those is what you wrote, though.

If we implemented the second one of these, it would have to be
controlled by a new switch, because there are plausible use cases for
wildcards that sometimes don't match anything (not to mention backwards
compatibility). There might be a reasonable argument for the first one
being the default behavior, though; I'm not sure if we could get away
with that from a compatibility perspective.

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
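[Editor's note: the two rules Tom proposes can be sketched as a single per-pattern check. This is only an illustration; `fnmatch()` stands in for pg_dump's real pattern-to-regex machinery, and `pattern_ok`/`strict_wildcards` are invented names.]

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Check one -t pattern against the existing table names and report
 * whether it should pass under the proposed rules: a wildcard-free
 * pattern must name an existing table (rule 1, always on), and a
 * wildcarded pattern must match at least one table only when the
 * hypothetical opt-in switch is set (rule 2). */
static int pattern_ok(const char *pattern, const char **tables, int ntables,
                      int strict_wildcards)
{
    int has_wildcard = strchr(pattern, '*') != NULL ||
                       strchr(pattern, '?') != NULL;
    int matches = 0;

    for (int i = 0; i < ntables; i++)
        if (fnmatch(pattern, tables[i], 0) == 0)
            matches++;

    if (!has_wildcard)
        return matches >= 1;                    /* rule 1: always required */
    return strict_wildcards ? matches >= 1 : 1; /* rule 2: opt-in only */
}
```

Note how 'baz' (no wildcard, no match) fails even in lax mode, while 'qux*' only fails when strictness is requested, which matches the backwards-compatibility concern about wildcards that legitimately match nothing.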
Re: [HACKERS] recovery_target_action = pause & hot_standby = off
On 2015-03-15 14:51:46 +0100, Magnus Hagander wrote:
> On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund wrote:
> > On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
> > >	/*
> > >	 * Override any inconsistent requests. Not that this is a change
> > >	 * of behaviour in 9.5; prior to this we simply ignored a request
> > >	 * to pause if hot_standby = off, which was surprising behaviour.
> > >	 */
> > >	if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
> > >		recoveryTargetActionSet &&
> > >		standbyState == STANDBY_DISABLED)
> > >		recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
> >
> > While it's easy enough to fix, I rather dislike the whole intent here
> > though. *Silently* switching the mode of operation in a rather
> > significant way seems like a bad idea to me. At the very least we need
> > to emit a LOG message about this; but I think it'd be much better to
> > error out instead.
> >
> > <9.5's behaviour was already quite surprising. But changing things to a
> > different surprising behaviour seems like a bad idea.
>
> +1. Especially for "sensitive" operations like this, having
> predictable-behavior-or-error is usually the best choice.

Yea. Looking further, it's even worse right now. We'll change the target
to shutdown when hot_standby = off, but iff it was set in the config
file. But the default value is (and was, although configured
differently) documented to be 'pause'; so if it's not configured
explicitly we will still promote. At least I can't read that out of the
docs.

Personally I think we should just change the default to 'shutdown' for
all cases. That makes documentation and behaviour less surprising. And
it makes experimenting less dangerous, since you can just start again.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] recovery_target_action = pause & hot_standby = off
On 15/03/15 14:51, Magnus Hagander wrote:
> On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> > On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
> > >	/*
> > >	 * Override any inconsistent requests. Not that this is a change
> > >	 * of behaviour in 9.5; prior to this we simply ignored a request
> > >	 * to pause if hot_standby = off, which was surprising behaviour.
> > >	 */
> > >	if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
> > >		recoveryTargetActionSet &&
> > >		standbyState == STANDBY_DISABLED)
> > >		recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
> >
> > While it's easy enough to fix, I rather dislike the whole intent here
> > though. *Silently* switching the mode of operation in a rather
> > significant way seems like a bad idea to me. At the very least we need
> > to emit a LOG message about this; but I think it'd be much better to
> > error out instead.
> >
> > <9.5's behaviour was already quite surprising. But changing things to a
> > different surprising behaviour seems like a bad idea.
>
> +1. Especially for "sensitive" operations like this, having
> predictable-behavior-or-error is usually the best choice.

Thinking about it again now, it does seem that ignoring a user setting
because it conflicts with another user setting is a bad idea, and I
think we generally throw errors in such cases. So +1 from me also.

--
 Petr Jelinek                      http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
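[Editor's note: the behaviour the thread converges on, rejecting an explicit conflicting request instead of silently rewriting it, might look roughly like the sketch below. The enum values and the helper are invented for illustration and do not match the actual xlog.c code; the fallback for the unset case follows Andres's suggestion of defaulting to shutdown.]

```c
#include <assert.h>

/* Hypothetical stand-ins for the recovery settings involved. */
enum { ACTION_PAUSE, ACTION_PROMOTE, ACTION_SHUTDOWN };
enum { STANDBY_DISABLED, STANDBY_ENABLED };

/* Resolve recovery_target_action against hot_standby. An *explicit*
 * pause request with hot_standby = off is a user-visible conflict and
 * is rejected (-1, i.e. error out at startup) rather than silently
 * rewritten; only the implicit default falls back to shutdown. */
static int resolve_target_action(int action, int action_set, int standby_state)
{
    if (action == ACTION_PAUSE && standby_state == STANDBY_DISABLED)
        return action_set ? -1               /* explicit conflict: error */
                          : ACTION_SHUTDOWN; /* default: safe fallback */
    return action;
}
```

The difference from the quoted snippet is the error path: predictable-behavior-or-error, as Magnus puts it, instead of a silent mode switch.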
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On 2014-12-08 02:18:11 +0100, Petr Jelinek wrote:
> On 08/12/14 02:06, Petr Jelinek wrote:
> > On 08/12/14 02:03, Michael Paquier wrote:
> > > On Mon, Dec 8, 2014 at 9:59 AM, Petr Jelinek wrote:
> > > > Ok this patch does that, along with the rename to
> > > > recovery_target_action and addition to the recovery.conf.sample.
> > > This needs a rebase as at least da71632 and b8e33a8 are conflicting.
> >
> > Simon actually already committed something similar, so no need.
>
> ...except for the removal of pause_at_recovery_target it seems, so I
> attached just that

I intend to push this one unless somebody protests soon.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] recovery_target_action = pause & hot_standby = off
On Sun, Mar 15, 2015 at 2:27 PM, Andres Freund wrote:
> On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
> >	/*
> >	 * Override any inconsistent requests. Not that this is a change
> >	 * of behaviour in 9.5; prior to this we simply ignored a request
> >	 * to pause if hot_standby = off, which was surprising behaviour.
> >	 */
> >	if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
> >		recoveryTargetActionSet &&
> >		standbyState == STANDBY_DISABLED)
> >		recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
>
> While it's easy enough to fix I rather dislike the whole intent here
> though. *Silently* switching the mode of operation in a rather
> significant way seems like a bad idea to me. At the very least we need
> to emit a LOG message about this; but I think it'd be much better to
> error out instead.
>
> <9.5's behaviour was already quite surprising. But changing things to a
> different surprising behaviour seems like a bad idea.

+1. Especially for "sensitive" operations like this, having
predictable-behavior-or-error is usually the best choice.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] recovery_target_action = pause & hot_standby = off
On 2015-03-12 15:52:02 +0100, Andres Freund wrote:
>	/*
>	 * Override any inconsistent requests. Not that this is a change
>	 * of behaviour in 9.5; prior to this we simply ignored a request
>	 * to pause if hot_standby = off, which was surprising behaviour.
>	 */
>	if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
>		recoveryTargetActionSet &&
>		standbyState == STANDBY_DISABLED)
>		recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;

While it's easy enough to fix I rather dislike the whole intent here
though. *Silently* switching the mode of operation in a rather
significant way seems like a bad idea to me. At the very least we need
to emit a LOG message about this; but I think it'd be much better to
error out instead.

<9.5's behaviour was already quite surprising. But changing things to a
different surprising behaviour seems like a bad idea.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: pgbench - merging transaction logs
Sun, 15 Mar 2015 11:22:01 +0100 (CET)

Hello Tomas,

> attached is a patch implementing merging of pgbench logs. These logs
> are written by each thread, so with N threads you get N files with
> names
>
>   pgbench_log.PID
>   pgbench_log.PID.1
>   ...
>   pgbench_log.PID.N
>
> Before analyzing these logs, these files need to be combined. I usually
> ended up writing ad-hoc scripts doing that, lost them, written them
> again and so on over and over again.

I've looked at the patch. Although I think that such a feature is
somehow desirable... I have two issues with it: ISTM that (1) it does
not belong to pgbench as such, and (2) even if it did, the
implementation is not right.

About (1):

I think that this functionality as implemented does not belong to
pgbench, and really belongs to an external script, which happens not to
be readily available, which is a shame. PostgreSQL should probably
provide such a script, or make it easy to find.

It could belong to pgbench if pgbench were *generating* the merged log
file directly. However I'm not sure this can be done cleanly for the
multi-process "thread emulation" (IN PASSING: I think pgbench should
really get rid of this emulation, because I do not know of any system
nowadays that does not provide threads yet would require a
state-of-the-art pgbench, and the emulation significantly complicates
the code by making things uselessly difficult and/or needing to be
implemented twice or not provided in some cases).

Another issue raised by your patch is that the log format may be
improved, say by providing a one-field timestamp at the beginning of the
line.

About (2):

In the implementation you reimplement a partial merge sort by parsing
each line of each file and merging it with the current result over and
over. ISTM that an implementation should read all files in parallel and
merge them in one pass. The current IO complexity is in p²n where it
should be simply pn... do not use it with a significant number of
threads and many lines. Ok, the generated files are likely to be kept in
cache, but nevertheless.

Also there is plenty of copy-paste when scanning two files and then
reprinting in all the different formats. The same logic is implemented
twice, once for simple and once for aggregated logs. This means that
updating or extending the log format later on would require modifying
these scans and prints in many places.

For aggregates, some data in the output may be special values
("NaN/-/..."); I am not sure how the implementation would behave in such
cases. As lines that do not match are silently ignored, the resulting
merge would just not be meaningful; should it rather be an error? Try it
with a low rate, for instance.

It seems that system function calls are not tested for errors.

> The other disadvantage of the external scripts is that you have to pass
> all the info about the logs (whether the logs are aggregated, whether
> there's throttling, etc.).

I think that is another argument for a better format, with a timestamp
ahead? Also, ISTM that it only needs to know whether it is merging
aggregate or simple logs, no more; the other information can be inferred
from the number of fields on the line.

> So integrating this into pgbench directly seems like a better approach,
> and the attached patch implements that.

You guessed that I disagree. Note that this is only my own opinion.

In summary, I think that:

 (a) providing a clean script would be nice,

 (b) if we could get rid of the "thread emulation", pgbench could
     generate the merged logs directly and simply, and the option could
     be provided then.

--
Fabien.
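[Editor's note: the one-pass merge described above, read all per-thread files in parallel and repeatedly emit the smallest head timestamp, can be sketched as below. The types are hypothetical and the streams are in-memory arrays rather than files; a linear scan over the p streams is used for brevity, giving O(p) per emitted line, where a min-heap would give O(log p).]

```c
#include <assert.h>
#include <stddef.h>

/* One already-sorted per-thread log, reduced to its timestamps. */
typedef struct { const int *ts; size_t len, pos; } Stream;

/* Merge n sorted streams into out[] in a single pass: on every step,
 * pick the stream whose current head has the smallest timestamp and
 * advance only that stream. Each input line is examined O(n) times in
 * total, instead of being re-parsed and re-merged per file as in the
 * pairwise approach criticized above. Returns lines written. */
static size_t merge_streams(Stream *s, size_t n, int *out)
{
    size_t written = 0;

    for (;;) {
        size_t best = n;  /* n means "no stream has data left" */

        for (size_t i = 0; i < n; i++)
            if (s[i].pos < s[i].len &&
                (best == n || s[i].ts[s[i].pos] < s[best].ts[s[best].pos]))
                best = i;

        if (best == n)
            return written;
        out[written++] = s[best].ts[s[best].pos++];
    }
}
```

With p files of n lines each, this is the pn-ish behaviour Fabien asks for (p²n only in the constant-factor sense of the linear head scan), versus re-merging the accumulated result once per file.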
Re: [HACKERS] ranlib bleating about dirmod.o being empty
On Sun, Mar 15, 2015 at 1:58 AM, Tom Lane wrote:
> I'm getting rather tired of reading these warning messages in OS X builds:
>
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib:
> file: libpgport.a(dirmod.o) has no symbols
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib:
> file: libpgport_srv.a(dirmod_srv.o) has no symbols
>
> The reason for this warning is that dirmod.o contains no functions that
> aren't Windows-specific. As such, it seems like putting it into the
> list of "always compiled" port modules is really a mistake. Any
> objections to switching it to be built only on Windows?

+1. Thanks for changing that.

--
Michael