Re: [HACKERS] moving from contrib to bin
On Tue, Dec 23, 2014 at 3:24 PM, Michael Paquier wrote:
> On Mon, Dec 22, 2014 at 11:30 PM, Alvaro Herrera wrote:
>> Michael Paquier wrote:
>>
>>> And here is an updated patch following those lines. Similarly to the
>>> things in contrib/, a set of variables is used to define for each
>>> module what the extra libraries, include dirs, etc. are. This refactors
>>> quite a bit of code, even if there are a couple of exceptions like
>>> pg_xlogdump/ or pg_basebackup/.
>>
>> In a broad look, this looks a lot better. I think we should press
>> forward with this whole set of patches and see what the buildfarm
>> thinks.
> Here is a new series of patches for all those things, with the
> following additions:
> - Some cleanup of the MSVC scripts compared to the last patch
> - Moved documentation to ref/
> - Removed mention of the authors of the utilities moved (?)
> This set of patches is basically made of the former set, with 2
> additional patches for MSVC stuff and documentation pages moved to
> ref/.

Where are we on this? This patch has been waiting for input from its author for the last couple of weeks.
--
Michael
Re: [HACKERS] DROP PRIVILEGES OWNED BY
On Thu, Dec 18, 2014 at 1:43 AM, Marko Tiikkaja wrote:
> I don't have a problem with that. It would probably work, too, since FROM
> is already fully reserved.

Marking the patch as returned with feedback, as there has been no input from Marko in the last couple of weeks.
--
Michael
Re: [HACKERS] Support UPDATE table SET(*)=...
On Mon, Dec 15, 2014 at 7:50 PM, Atri Sharma wrote:
> I have moved the patch to the current CF and marked it as Waiting on Author
> since I plan to resubmit after addressing the concerns.

Nothing has happened in the last month, so marking as returned with feedback.
--
Michael
Re: [HACKERS] Async execution of postgres_fdw.
On Tue, Jan 13, 2015 at 8:46 PM, Kyotaro HORIGUCHI wrote:
> I'll look into the case after this, but I'd like to send a
> revised patch at this point.

Hm. Seems like this patch is not completely baked yet. Horiguchi-san, as you are obviously still working on it, would you agree to move it to the next CF?
--
Michael
Re: [HACKERS] logical column ordering
On Sun, Jan 4, 2015 at 10:37 AM, Alvaro Herrera wrote:
> So I'm reworking my patch with that in mind.

Switching to returned with feedback. Alvaro, feel free to add an entry to the next CF if you are planning to work on it again.
--
Michael
Re: [HACKERS] Join push-down support for foreign tables
On Fri, Dec 26, 2014 at 1:48 PM, Shigeru Hanada wrote:
> Hmm, I agree that supporting N-way joins is very useful. Postgres-XC's SQL
> generator seems to give us a hint for such cases; I'll check it out
> again.

Switching to returned with feedback, as this patch has been waiting for feedback for a couple of weeks now.
--
Michael
Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin
On Wed, Dec 24, 2014 at 3:08 PM, Michael Paquier wrote:
> Attached are two patches: one for MinGW/Cygwin, a slightly modified
> version of Peter's, and a second implementing the same thing for
> the MSVC scripts. The method for MSVC is similar to what is done in
> Peter's patch: roughly, it checks if SO_MAJOR_VERSION is present in the
> Makefile of a given library; the path of the Makefile is found by looking
> at the location of the .rc file in the vcproj file (this could be better,
> but I could not come up with a better method). TBH, it would be good to be
> completely consistent in the way we build things on Windows, and we
> may as well consider a backpatch to fix this long-standing bug. The
> MSVC patch of course removes the hack copying libpq.dll from lib/ to
> bin/.
>
> I mentioned the fix for the MSVC scripts as well here:
> http://www.postgresql.org/message-id/cab7npqqiuepzphd3mmk+q7_cqqrkk_v1fvxknymri660z4d...@mail.gmail.com

Peter, this patch has been waiting for input for a couple of weeks. IMO, it would be good to finally get a fix for this bug, and we have patches for both MSVC (the patch I sent) and MinGW (your stuff).
--
Michael
Re: [HACKERS] inherit support for foreign tables
On 2014/12/23 0:36, Tom Lane wrote:
> Yeah, we need to do something about the PlanRowMark data structure.
> Aside from the pre-existing issue in postgres_fdw, we need to fix it
> to support inheritance trees in which more than one rowmark method
> is being used. That rte.hasForeignChildren thing is a crock.
> The idea I'd had about that was to convert the markType field into a
> bitmask, so that a parent node's markType could represent the logical
> OR of the rowmark methods being used by all its children. I've not
> attempted to code this up though, and probably won't get to it until
> after Christmas. One thing that's not clear is what should happen
> with ExecRowMark.

As I said before, that seems to me like a good idea, so I'll update the patch based on it if you're okay with that. Or have you found any problem with the above idea? A minimal sketch of the bitmask encoding is below my signature.

Best regards,
Etsuro Fujita
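To illustrate, a minimal sketch of the bitmask idea (the field name allowedModes and the macro are hypothetical, purely for illustration, not code from any posted patch):

/* one bit per rowmark method */
#define ROWMARK_BIT(marktype)	(1 << (marktype))

/* while planning: fold each child's method into the parent's mask */
parent_rc->allowedModes |= ROWMARK_BIT(child_rc->markType);

/* at execution: test whether a given method is in use anywhere below */
if (parent_rc->allowedModes & ROWMARK_BIT(ROW_MARK_COPY))
{
	/* at least one child relation uses ROW_MARK_COPY */
}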
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
On Tue, Dec 30, 2014 at 11:48 AM, Andreas Karlsson wrote:
> Here is my review of the feature.

Marked as returned with feedback.
--
Michael
Re: [HACKERS] compress method for spgist - 2
Marking this patch as returned with feedback because it has been waiting for input from the author for a couple of weeks now. Heikki, the refactoring patch has some value; are you planning to push it?
--
Michael
Re: [HACKERS] Compression of full-page-writes
Marking this patch as returned with feedback for this CF and moving it to the next one. I doubt that there will be much progress here in the next couple of days, so let's at least try to get something into this release cycle.
--
Michael
Re: [HACKERS] Fillfactor for GIN indexes
On Thu, Jan 8, 2015 at 2:03 PM, Michael Paquier wrote:
> On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov wrote:
>> On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier wrote:
>>> I am attaching an updated patch, with the default fillfactor value at
>>> 75%, and with the page split code using the fillfactor rate.
>>> Thoughts?
>> A rewritten version of the patch is attached. I made the following changes:
>
> Thanks! With this patch (and my previous version as well), GIN indexes
> with the default fillfactor have a size higher than 9.4 indexes, the 9.4
> behavior being consistent only with fillfactor=100 and not the default
> of 90.

Are we fine with that? IMO, this patch has value to control random updates on GIN indexes, but we should have a default fillfactor of 100 to keep index size consistent with 9.4. Thoughts?
--
Michael
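For reference, a minimal sketch of the mechanism being tuned here (illustrative only; the names and the default value are assumptions, not code from the patch):

/* default being debated above: 75 in the patch, 100 to match 9.4 */
#define GIN_DEFAULT_FILLFACTOR	75

static Size
gin_build_free_space(int fillfactor)
{
	/* space deliberately left free on each page for future random
	 * insertions; fillfactor=100 reserves nothing, as in 9.4 */
	return (Size) (BLCKSZ * (100 - fillfactor) / 100);
}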
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
On Mon, Dec 15, 2014 at 11:15 PM, Alex Shulgin wrote:
> Michael Paquier writes:
>>>> Perhaps ssloptions.[ch], unless you plan to add non-option-related code there later?
>>>
>>> I don't think anything else than common options-related code fits in
>>> there, so ssloptions.c makes sense to me.
>>>
>>>> BTW, there is no Regent code in your openssl.c, so the copyright statement is incorrect.
>>>
>>> Good catch, I just blindly copied that from some other file.
>> There have been arguments in favor of and against this patch... But
>> marking it as returned with feedback because of a lack of activity in
>> the last couple of weeks. Feel free to update if you think that's not
>> correct, and please move this patch to commit fest 2014-12.
>
> Attached is a new version that addresses the earlier feedback: renamed
> the added *.[ch] files and removed the incorrect copyright line.
>
> I'm moving this to the current CF.

This patch has been at a status quo since the beginning of this CF, and the arguments remain the same. If nothing is happening, maybe it would be better to mark it as returned with feedback? Thoughts?
--
Michael
[HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns
Here is an example using postgres_fdw.

[Terminal 1]
postgres=# create table t (a int, b int);
CREATE TABLE
postgres=# insert into t values (1, 1);
INSERT 0 1
postgres=# begin;
BEGIN
postgres=# update t set b = b * 2;
UPDATE 1

[Terminal 2]
postgres=# create foreign table ft (a int) server loopback options (table_name 'lbt');
CREATE FOREIGN TABLE
postgres=# insert into ft values (1);
INSERT 0 1
postgres=# select tableoid, ctid, * from ft;
 tableoid | ctid  | a
----------+-------+---
    25092 | (0,1) | 1
(1 row)

postgres=# select ft.tableoid, ft.ctid, ft.* from t, ft where t.a = ft.a for update;

[Terminal 1]
postgres=# commit;
COMMIT

[Terminal 2]
postgres=# select ft.tableoid, ft.ctid, ft.* from t, ft where t.a = ft.a for update;
 tableoid |      ctid      | a
----------+----------------+---
        0 | (4294967295,0) | 1
(1 row)

Note that tableoid and ctid have been changed! I think the reason is that EvalPlanQualFetchRowMarks doesn't properly set tableoid and ctid for foreign tables, IIUC. I think this should be fixed. Please find attached a patch. The patch slightly relates to [1], so if it is reasonable, I'll update [1] on top of this.

[1] https://commitfest.postgresql.org/action/patch_view?id=1386

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 2947,2953 **** make_tuple_from_result_row(PGresult *res,
  	tuple = heap_form_tuple(tupdesc, values, nulls);
  
  	if (ctid)
! 		tuple->t_self = *ctid;
  
  	/* Clean up */
  	MemoryContextReset(temp_context);
--- 2947,2953 ----
  	tuple = heap_form_tuple(tupdesc, values, nulls);
  
  	if (ctid)
! 		tuple->t_self = tuple->t_data->t_ctid = *ctid;
  
  	/* Clean up */
  	MemoryContextReset(temp_context);
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***************
*** 795,800 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 795,801 ----
  	{
  		PlanRowMark *rc = (PlanRowMark *) lfirst(l);
  		Oid			relid;
+ 		char		relkind;
  		Relation	relation;
  		ExecRowMark *erm;
***************
*** 817,826 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 818,833 ----
  				break;
  			case ROW_MARK_COPY:
  				/* there's no real table here ... */
+ 				relkind = rt_fetch(rc->rti, rangeTable)->relkind;
+ 				if (relkind == RELKIND_FOREIGN_TABLE)
+ 					relid = getrelid(rc->rti, rangeTable);
+ 				else
+ 					relid = InvalidOid;
  				relation = NULL;
  				break;
  			default:
  				elog(ERROR, "unrecognized markType: %d", rc->markType);
+ 				relid = InvalidOid;
  				relation = NULL;	/* keep compiler quiet */
  				break;
  		}
***************
*** 831,836 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 838,844 ----
  		erm = (ExecRowMark *) palloc(sizeof(ExecRowMark));
  		erm->relation = relation;
+ 		erm->relid = relid;
  		erm->rti = rc->rti;
  		erm->prti = rc->prti;
  		erm->rowmarkId = rc->rowmarkId;
***************
*** 2318,2325 **** EvalPlanQualFetchRowMarks(EPQState *epqstate)
  			/* build a temporary HeapTuple control structure */
  			tuple.t_len = HeapTupleHeaderGetDatumLength(td);
! 			ItemPointerSetInvalid(&(tuple.t_self));
! 			tuple.t_tableOid = InvalidOid;
  			tuple.t_data = td;
  
  			/* copy and store tuple */
--- 2326,2342 ----
  			/* build a temporary HeapTuple control structure */
  			tuple.t_len = HeapTupleHeaderGetDatumLength(td);
! 			/* if relid is valid, rel is a foreign table; set system columns */
! 			if (OidIsValid(erm->relid))
! 			{
! 				tuple.t_self = td->t_ctid;
! 				tuple.t_tableOid = erm->relid;
! 			}
! 			else
! 			{
! 				ItemPointerSetInvalid(&(tuple.t_self));
! 				tuple.t_tableOid = InvalidOid;
! 			}
  			tuple.t_data = td;
  
  			/* copy and store tuple */
*** a/src/backend/executor/nodeForeignscan.c
--- b/src/backend/executor/nodeForeignscan.c
***************
*** 22,27 ****
--- 22,28 ----
   */
  #include "postgres.h"
  
+ #include "access/htup_details.h"
  #include "executor/executor.h"
  #include "executor/nodeForeignscan.h"
  #include "foreign/fdwapi.h"
***************
*** 53,65 **** ForeignNext(ForeignScanState *node)
  	/*
  	 * If any system columns are requested, we have to force the tuple into
  	 * physical-tuple form to avoid "cannot extract system attribute from
! 	 * virtual tuple" errors later. We also insert a valid value for
! 	 * tableoid, which is the only actually-useful system column.
  	 */
  	if (plan->fsSystemCol && !TupIsNull(slot))
  	{
  		HeapTuple	tup = ExecMaterializeSlot(slot);
  
  		tup->t_tableOid = RelationGetRelid(node->ss.ss_currentRelation);
  	}
--- 54,69 ----
  	/*
  	 * If any system columns are requested, we have to force the tuple into
  	 * physical-tuple form to avoid "cannot extract system attribute from
! 	 * virtual tuple" errors later. We also insert a valid value for TID
! 	 * and tableoid, which are the only actually-useful system columns.
  	 */
  	if (plan->fsSystemCol && !TupIsNull(slot))
  	{
  		HeapTuple
Re: [HACKERS] __attribute__ for non-gcc compilers
On 15.01.2015 at 00:54, Andres Freund wrote:
> I think I'd for now simply not define pg_attribute_aligned() on
> platforms where it's not supported, instead of defining it empty. If we
> need a softer variant we can name it pg_attribute_aligned_if_possible or
> something.

Good point. All attributes that actually change program behavior (aligned and packed, for now) need to error out during compilation if they're used and not supported by the compiler. Attributes which may improve optimization or just provide more information for the developers (noreturn, unused, format) can be defined empty when they're not supported (or are not supported well enough: GCC < 3 doesn't know about %z in printf formats). A minimal sketch of that split is below.

/ Oskari
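The sketch (following the pg_attribute_* naming convention; the exact guards here are an assumption for illustration, not the committed macros):

#if defined(__GNUC__)
/* advisory attributes: losing them only costs warnings or optimization */
#define pg_attribute_noreturn __attribute__((noreturn))
#define pg_attribute_unused __attribute__((unused))
/* behavior-changing attributes: defined only when truly supported */
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#define pg_attribute_packed __attribute__((packed))
#else
/* advisory attributes can safely degrade to nothing */
#define pg_attribute_noreturn
#define pg_attribute_unused
/*
 * pg_attribute_aligned()/pg_attribute_packed are intentionally left
 * undefined here: any use on an unsupported compiler must be a hard
 * compile error, not a silently different struct layout.
 */
#endif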
[HACKERS] Commit fest 2014-12 enters money time
Hi all,

We are soon entering the money time for this CF. The last month has been mainly a vacation period, with progress being minimal on many fronts. Still, there are a couple of patches that are marked as ready for committer:
- Foreign table inheritance, whose first patch has been committed
- speedup tidbitmap patch: cache page
- pg_basebackup vs. Windows and tablespaces (extend base backup to include the symlink file used to restore symlinks)
- If pg_hba.conf has changed since last reload, emit a HINT to the client and server log
- Add restore_command_retry_interval option to control timeout of restore_command nonzero status code
- documentation update for doc/src/sgml/func.sgml
- Reducing lock strength of trigger and foreign key DDL

I am going to make a first pass on all the patches to check their status. Please note that patches waiting for input from their authors will be marked as returned with feedback.

Thanks,
--
Michael
Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL
I wrote:
> I think that we could pass it to a committer.

Marked as such.
--
Michael
Re: [HACKERS] Define and document a guaranteed ordering for RETURNING?
Apparently I'm semi-blind - the docs also note that:

> If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and values defined in the RETURNING list, computed over the row(s) inserted by the command.

... so perhaps it's enough to just explicitly mention that ordering is preserved.
Re: [HACKERS] Overhauling our interrupt handling
Hello,

I've finally caught up with this. I think I should finalize my commitfest item for this issue with... "Rejected"?

> This mail is an answer to
> http://archives.postgresql.org/message-id/20150110022542.GF12509%40alap3.anarazel.de
> but I decided that it's a good idea to move it into a new thread since the
> patch series has outgrown its original target. It's invasive and touches
> many arcane areas of the code, so that I think more eyes than a long,
> deep thread is likely to have are a good thing.
>
> As a short description of the goal of the patch series:
> This started out as steps on the way to being able to safely handle
> terminate_backend() et al when we're reading/writing from the
> client, e.g. because the client is dead and we haven't noticed yet and
> are stuck writing, or because we want to implement an idle_in_transaction
> timeout.
>
> Making these changes allowed me to see that not doing significant work
> in signal handlers can make code much simpler and more robust. The
> number of bugs postgres has had in the past that involved doing too much in
> signal handlers is significant. Thus I've since extended the
> patch series to get rid of nearly all nontrivial work in signal
> handlers.

It sounds pretty nice.

> All the patches in the series up to 0008 have commit messages providing
> more detail. A short description of each patch follows:
>
> 0001: Replace walsender's latch with the general shared latch.
>
> A new patch that removes ImmediateInterruptOK behaviour from walsender. I
> think that's a rather good idea, because walsender currently seems to
> assume WaitLatchOrSocket is reentrant - which I don't think is really
> guaranteed.
> Hasn't been reviewed yet, but I think it's not far from being
> committable.

Doesn't this patchset contain per-socket non-blocking control for win32? It should make the code (above the win32 socket layer itself) simpler.

> 0002: Use a nonblocking socket for FE/BE communication and block using
> latches.
>
> Has previously been reviewed by Heikki. I think Noah also had a
> look, although I'm not sure how close that was.
>
> I think this can be committed soon.
>
> 0003: Introduce and use infrastructure for interrupt processing during client
> reads.
>
> From here on ImmediateInterruptOK isn't set during client
> communication. Normal interrupts and sinval/async interrupts are
> processed outside of signal handlers. Especially the sinval/async
> changes greatly simplify the respective code.
>
> Has been reviewed by Heikki in an earlier version - I think I
> addressed all the review comments.
>
> I'd like somebody else to look at the sinval/async.c changes
> before committing. I really like the simplification this change
> allowed.
>
> I think this patch might not be safe without 0004 because I can't
> convince myself that it's safe to interrupt latch waits with work
> that actually might also use the same latch (sinval
> interrupts). But it's easier to review this way as 0004 is quite
> big.
>
> 0004: Process 'die' interrupts while reading/writing from the client socket.
>
> This is the reason Horiguchi-san started this thread.
>
> I think the important debate here is whether we think it's
> acceptable that there are situations (a full socket buffer, but a
> live connection) where the client previously got an error, but
> not anymore afterwards. I think that's much better than having
> unkillable connections, but I can see others being of a different
> opinion.

This patch yields code that is a bit confusing, like the following:

| secure_raw_write(Port *port, const void *ptr, size_t len)
| {
| ..
| 	w = WaitLatchOrSocket(MyLatch,
| 	if (w & WL_LATCH_SET)
| 	...
| 		errno = EINTR;
| 	else if (w & WL_SOCKET_WRITEABLE)
| 		goto wloop;
|
| 	errno = save_errno;

The errno set on WL_LATCH_SET always vanishes. Specifically, the EINTR set by SIGTERM (WL_LATCH_SET) is overwritten by EAGAIN. As a result, pg_terminate_backend() cannot kill the backend until the write stops blocking. "errno = save_errno" should be the alternative of the line "errno = EINTR", and I confirmed that the change leads to the desirable (in my view) behavior. A sketch of the corrected flow follows at the end of this mail.

> 0005: Move deadlock and other interrupt handling in proc.c out of signal
> handlers.
>
> I'm surprised how comparatively simple this turned out to be. For
> robustness and understanding I think this is a great improvement.
>
> New and not reviewed at all. Needs significant review. But works
> in my simple testcases.
>
> 0006: Don't allow immediate interrupts during authentication anymore.
>
> So far we've set ImmediateInterruptOK to true during large parts
> of the client authentication - that's not all that pretty;
> interrupts might arrive while we're in so
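As mentioned above, a minimal sketch of the corrected flow in secure_raw_write() (this is my reading of the fix, with the surrounding function body elided, not the actual patch):

	w = WaitLatchOrSocket(MyLatch,
						  WL_LATCH_SET | WL_SOCKET_WRITEABLE,
						  port->sock, 0);
	if (w & WL_LATCH_SET)
	{
		ResetLatch(MyLatch);
		/* let the caller see EINTR instead of the stale EAGAIN */
		errno = EINTR;
	}
	else
	{
		if (w & WL_SOCKET_WRITEABLE)
			goto wloop;
		/* restore the original errno only when no latch event occurred */
		errno = save_errno;
	}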
Re: [HACKERS] orangutan seizes up during isolation-check
On Wed, Jan 14, 2015 at 04:48:53PM -0500, Peter Eisentraut wrote:
> What I'm seeing now is that the unaccent regression tests when run under
> make check-world abort with
>
> FATAL: postmaster became multithreaded during startup
> HINT: Set the LC_ALL environment variable to a valid locale.

contrib/unaccent/Makefile sets NO_LOCALE=1, so that makes sense. I expect the patch over here will fix it:
http://www.postgresql.org/message-id/20150109063015.ga2491...@tornado.leadboat.com

> The behavior is also a bit strange. The step
>
> == starting postmaster ==
>
> hangs for one minute before failing. This probably has nothing to do
> with your change, but maybe pg_regress could detect postmaster startup
> failures better.

Yeah, I think any postmaster startup failure has that effect. Not ideal.
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 14, 2015 at 9:30 AM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, Jan 14, 2015 at 9:12 AM, Amit Kapila wrote:
>> Here we have to decide what the strategy should be and how much each
>> worker should scan. As an example, one strategy could be: if the table
>> size is X MB and there are 8 workers, then divide the work as X/8 MB
>> for each worker (which I have currently used in the patch). Another
>> could be that each worker scans 1 block at a time and then checks some
>> global structure to see which block it needs to scan next; according
>> to me this could lead to random scans. I have read that some other
>> databases also divide the work based on partitions or segments (the
>> size of a segment is not very clear).
>
> A block can contain useful tuples, i.e. tuples which are visible and
> fulfil the quals, plus useless tuples, i.e. tuples which are dead,
> invisible, or that do not fulfil the quals. Depending upon the contents
> of these blocks, esp. the ratio of (useful tuples)/(useless tuples),
> even though we divide the relation into equal-sized runs, each worker
> may take a different amount of time. So, instead of dividing the
> relation into number of runs = number of workers, it might be better to
> divide it into fixed-sized runs with size < (total number of blocks /
> number of workers), and let a worker pick up a run after it finishes
> with the previous one. The smaller the size of the runs, the better the
> load balancing, but the higher the cost of starting a run. So, we have
> to strike a balance.

I think your suggestion is good and it somewhat falls in line with what Robert has suggested, but instead of block-by-block, you seem to be suggesting doing it in chunks (where the chunk size is not clear). The only point against this is that such a strategy for a parallel sequential scan could lead to random scans, which can hurt the operation badly. Nonetheless, I will think more along these lines of making work distribution dynamic so that we can ensure that all workers are kept busy. A rough sketch of chunk-based claiming is below.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
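The sketch (purely illustrative; the struct, function names, and chunk policy are assumptions, not code from the patch). Each worker claims a fixed-size run of blocks through a shared atomic counter, so I/O within a chunk stays sequential even though chunks are handed out dynamically:

/* state shared among all workers of one parallel scan */
typedef struct ParallelScanShared
{
	pg_atomic_uint32 next_block;	/* next unclaimed heap block */
	BlockNumber	nblocks;			/* total blocks in the relation */
	uint32		chunk_size;			/* blocks handed out per claim */
} ParallelScanShared;

/*
 * Claim the next run of blocks; returns false once the relation is
 * exhausted.  The caller scans [*start, *end) sequentially.
 */
static bool
claim_next_chunk(ParallelScanShared *shared,
				 BlockNumber *start, BlockNumber *end)
{
	uint32		first = pg_atomic_fetch_add_u32(&shared->next_block,
												shared->chunk_size);

	if (first >= shared->nblocks)
		return false;
	*start = first;
	*end = Min(first + shared->chunk_size, shared->nblocks);
	return true;
}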
[HACKERS] Define and document a guaranteed ordering for RETURNING?
Hi all

It's recently occurred to me that many people rely on the RETURNING clause for multi-valued INSERTs, INSERT INTO ... SELECT, etc., returning rows in the same order they were supplied as input into the INSERT. I often see lock-step iteration over the inputs to an INSERT and the RETURNING result-tuples used to map generated sequence values to the corresponding input.

I can't find anything relevant in the docs about the order of results in RETURNING, and it strikes me that this should probably be codified one way or the other. The docs simply read:

> The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT.

Looking at the code, I don't see any reason why it wouldn't currently be safe to assume that INSERT ... VALUES (...), (...), ... RETURNING ... and INSERT ... SELECT ... ORDER BY ... RETURNING ... always return rows in the exact order of their input. But that may not always be true, and if it's not documented as safe, it's not necessarily wise to rely on it.

If you can't rely on RETURNING order, you have to instead identify a candidate key on the input and use a wider RETURNING result that includes that candidate key as well as whatever else you want returned, then map the results onto the inputs. No fun at all.

So - I propose to document that INSERT ... RETURNING guarantees that the result-tuples will be in the same order as the input to the INSERT ... RETURNING statement. So if that order is well-defined, as in the case of multi-VALUES clauses or SELECT ... ORDER BY ..., the result order is also well defined.

(An alternative would be adding RETURNING ... WITH ORDINALITY, which seems pretty horrible.)

Thoughts/comments? Barring objections I'll write up a docs patch for the docs list.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] initdb -S and tablespaces
At 2015-01-14 11:59:08 +0100, and...@2ndquadrant.com wrote:
>
> > + if (ControlFile->state != DB_SHUTDOWNED &&
> > + ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> > + perform_fsync(data_directory);
> > +
>
> a) Please think of a slightly more descriptive name than perform_fsync

OK. (I just copied the name initdb uses, because at the time I was still thinking in terms of a later patch moving this to src/common.) What do you think of fsync_recursively? fsync_pgdata? I think fsync_recursively(data_directory) reads well. A minimal sketch of that function is at the end of this mail.

> b) I think we should check here whether fsync is enabled.

OK, will do.

> c) I'm wondering if we should add fsync to the control file and also
>    perform an fsync if the last shutdown was clean, but fsync was
>    disabled.

Explain? "Add fsync to the control file" means store the value of the fsync GUC setting in the control file? And would the fsync you mention be dependent on the setting, or unconditional?

> > +static void
> > +pre_sync_fname(char *fname, bool isdir)
> > +{
>
> This essentially already exists in the backend, in parts. C.f.
> pg_flush_data.

OK, I missed that. I'll rework the patch to use it.

> Theoretically you should also invoke fsync on directories.

OK.

> > + * We need to name the parent of PGDATA. get_parent_directory() isn't
> > + * enough here, because it can result in an empty string.
> > + */
> > + snprintf(pdir, MAXPGPATH, "%s/..", pg_data);
> > + canonicalize_path(pdir);
>
> Hm. Why is this needed? Just syncing . should work?

Not sure, will investigate. Thanks.

-- Abhijit
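For concreteness, the sketch of fsync_recursively() (illustrative only: error handling is reduced to warnings, and the pre_sync_fname()/pg_flush_data() pre-pass is omitted):

#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static void
fsync_recursively(const char *path)
{
	DIR		   *dir;
	struct dirent *de;
	struct stat st;
	char		sub[4096];
	int			fd;

	if ((dir = opendir(path)) == NULL)
		return;
	while ((de = readdir(dir)) != NULL)
	{
		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		snprintf(sub, sizeof(sub), "%s/%s", path, de->d_name);
		if (lstat(sub, &st) < 0)
			continue;
		if (S_ISDIR(st.st_mode))
		{
			fsync_recursively(sub);		/* recurses, then syncs that dir */
			continue;
		}
		if (!S_ISREG(st.st_mode))
			continue;					/* skip symlinks, sockets, etc. */
		if ((fd = open(sub, O_RDONLY)) < 0)
			continue;
		if (fsync(fd) < 0)
			fprintf(stderr, "could not fsync \"%s\"\n", sub);
		close(fd);
	}
	closedir(dir);

	/* per the review remark above, fsync the directory itself too */
	if ((fd = open(path, O_RDONLY)) >= 0)
	{
		(void) fsync(fd);
		close(fd);
	}
}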
Re: [HACKERS] pg_rewind in contrib
On Wed, Jan 14, 2015 at 6:43 PM, Heikki Linnakangas wrote:
> Here is a new version. There are now a fair amount of code changes compared
> to the version on github, like the logging and progress information, and a
> lot of comment changes. I also improved the documentation.

Perhaps this could be a direct link to the page of pg_rewind with ?

-possibly new, system. Once complete, the primary and standby can be
+possibly new, system. The pg_rewind utility can be
+used to speed up this process on large clusters.
+Once complete, the primary and standby can be

A dot is missing at the end of the last sentence of this paragraph:

+
+   This option specifies the target data directory that is synchronized
+   with the source. The target server must shut down cleanly before
+   running pg_rewind
+

Compilation of pg_rewind on MSVC is not done; it is still listed as a contrib/ module in Mkvcbuild.pm. The compilation of pg_xlogdump fails because the inclusion of dbcommands_xlog.h is missing in rmgrdesc.c (patch attached). src/bin/Makefile has not been updated to compile pg_rewind.
--
Michael

diff --git a/contrib/pg_xlogdump/rmgrdesc.c b/contrib/pg_xlogdump/rmgrdesc.c
index 180818d..deb1b8f 100644
--- a/contrib/pg_xlogdump/rmgrdesc.c
+++ b/contrib/pg_xlogdump/rmgrdesc.c
@@ -23,6 +23,7 @@
 #include "access/xlog_internal.h"
 #include "catalog/storage_xlog.h"
 #include "commands/dbcommands.h"
+#include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
 #include "rmgrdesc.h"
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Wed, Jan 14, 2015 at 9:01 PM, Sawada Masahiko wrote:
>
> Hi all,
>
> The postgresql.auto.conf is loaded after postgresql.conf
> whenever the configuration file is loaded or reloaded.
> This means that a parameter in postgresql.auto.conf has quite high
> priority, so the parameter in postgresql.conf does not work at all
> even if the user sets it manually.
>
> If the user wants to change a stopped postgresql server, then the user
> needs to merge the two configuration files (postgresql.conf and
> postgresql.auto.conf) while maintaining consistency manually.

I think one way to currently do it is with the help of the ALTER SYSTEM ... RESET command. Basically, the user can check via pg_settings whether the variable belongs to postgresql.auto.conf; if he already knows a duplicate copy of the same setting is available in postgresql.conf and that is the value he wants to use, then the RESET command can be used to remove the setting from postgresql.auto.conf.

> From an operational perspective, having a written config with duplicate
> entries is not a good thing.
> I think we need to merge the two configuration files into one (or more
> than one, if it uses something like the 'include' directive).
>
> One solution is to add a merging tool/command which merges the two
> configuration files while maintaining consistency.

If you want to think of some solution which can make the usage of ALTER SYSTEM more convenient, then we should think only in terms of changing/removing the value in postgresql.auto.conf, as that is the place where we have added values programmatically. One thought I have along this line is that currently there doesn't seem to be a way to know whether a setting has an entry both in postgresql.conf and postgresql.auto.conf. If we had some way of knowing that (pg_settings?), then it could be convenient for the user to decide whether the value in postgresql.auto.conf is useful or not and, if it's not useful, use the ALTER SYSTEM ... RESET command to remove it from postgresql.auto.conf.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan wrote:
> My immediate observation here is that blocks 2 and 9 have identical
> metadata (from their page opaque area), but partially non-matching
> data items (however, the number of items on each block is consistent
> and correct according to that metadata, as far as it goes). I think
> block 9 is supposed to have a right-link to block 5 (block 5 seems to
> think so, at least -- its left link is 9).

I am mistaken on one detail here - blocks 2 and 9 are actually fully identical. I still have no idea why, though.
--
Peter Geoghegan
[HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
Hi all,

pg_regress will fail with test suites using only source files if the destination folders do not exist in the code tree. This is annoying because it forces us to maintain empty folders sql/ and expected/ with a .gitignore ignoring everything. The issue has been discussed here (http://www.postgresql.org/message-id/54935d9d.7010...@dunslane.net) but nobody actually sent a patch, so here is one, and a thread for this discussion.

Regards,
--
Michael

*** a/src/test/regress/pg_regress.c
--- b/src/test/regress/pg_regress.c
***************
*** 496,501 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
--- 496,502 ----
  {
  	char		testtablespace[MAXPGPATH];
  	char		indir[MAXPGPATH];
+ 	char		result_dir[MAXPGPATH];
  	struct stat st;
  	int			ret;
  	char	  **name;
***************
*** 520,525 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
--- 521,534 ----
  		/* Error logged in pgfnames */
  		exit(2);
  
+ 	/*
+ 	 * Enforce creation of destination directory if it does not exist yet.
+ 	 * This is useful for tests using only source files.
+ 	 */
+ 	snprintf(result_dir, MAXPGPATH, "%s/%s", dest_dir, dest_subdir);
+ 	if (!directory_exists(result_dir))
+ 		make_directory(result_dir);
+ 
  	snprintf(testtablespace, MAXPGPATH, "%s/testtablespace", outputdir);
  
  #ifdef WIN32
***************
*** 565,571 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
  		/* build the full actual paths to open */
  		snprintf(prefix, strlen(*name) - 6, "%s", *name);
  		snprintf(srcfile, MAXPGPATH, "%s/%s", indir, *name);
! 		snprintf(destfile, MAXPGPATH, "%s/%s/%s.%s", dest_dir, dest_subdir,
! 				 prefix, suffix);
  
  		infile = fopen(srcfile, "r");
--- 574,580 ----
  		/* build the full actual paths to open */
  		snprintf(prefix, strlen(*name) - 6, "%s", *name);
  		snprintf(srcfile, MAXPGPATH, "%s/%s", indir, *name);
! 		snprintf(destfile, MAXPGPATH, "%s/%s.%s", result_dir, prefix, suffix);
  
  		infile = fopen(srcfile, "r");
Re: [HACKERS] WITH CHECK and Column-Level Privileges
Noah,

* Noah Misch (n...@leadboat.com) wrote:
> On Mon, Jan 12, 2015 at 05:16:40PM -0500, Stephen Frost wrote:
>> Alright, here's an updated patch which doesn't return any detail if no
>> values are visible or if only a partial key is visible.
>
> I browsed this patch. There's been no mention of foreign key constraints, but
> ri_ReportViolation() deserves similar hardening. If a user has only DELETE
> privilege on a PK table, FK violation messages should not leak the PK values.
> Modifications to the foreign side are less concerning, since the user will
> often know the attempted value; still, I would lock down both sides.

Ah, yes, good point.

> Please add a comment explaining the safety of each row-emitting message you
> haven't changed. For example, the one in refresh_by_match_merge() is safe
> because getting there requires MV ownership.

Sure.

> Instead of duplicating an entire ereport() to change whether you include an
> errdetail, use "condition ? errdetail(...) : 0".

Yeah, that's a bit nicer, will do.

Thanks!

Stephen
Re: [HACKERS] Improving RLS qual pushdown
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Robert Haas wrote:
>> On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed wrote:
>>> On 14 January 2015 at 13:29, Robert Haas wrote:
>>>> One thing they could still leak is the number of times they got
>>>> called, and thus possibly the number of unseen rows. Now if the
>>>> expressions get constant-folded away that won't be an issue, but a
>>>> clever user can probably avoid that.
>>>
>>> Right now, EXPLAIN ANALYSE can be used to tell you the number of
>>> unseen rows. Is that something that people are concerned about, and
>>> are there any plans to change it?
>>
>> Interesting question. I don't know.
>
> Wasn't this part of the "covert channel" discussion that took place way
> before RLS was committed? As I recall, it was argued that such covert
> channels are acceptable as long as their bandwidth is low.

Yes, it was part of the discussion, and no, there are no plans to try to hide row counts in explain analyze, nor to deal with things like unique constraint or foreign key reference violations. There are other areas which need improvement that will help address covert channel activity (better auditing, better control over what actions are allowed to different users when it comes to creating objects, modifying permissions and policies, throttling, etc.).

Thanks,

Stephen
Re: [HACKERS] Improving RLS qual pushdown
Robert Haas wrote:
> On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed wrote:
>> On 14 January 2015 at 13:29, Robert Haas wrote:
>>> One thing they could still leak is the number of times they got
>>> called, and thus possibly the number of unseen rows. Now if the
>>> expressions get constant-folded away that won't be an issue, but a
>>> clever user can probably avoid that.
>>
>> Right now, EXPLAIN ANALYSE can be used to tell you the number of
>> unseen rows. Is that something that people are concerned about, and
>> are there any plans to change it?
>
> Interesting question. I don't know.

Wasn't this part of the "covert channel" discussion that took place way before RLS was committed? As I recall, it was argued that such covert channels are acceptable as long as their bandwidth is low.

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan wrote:
> My immediate observation here is that blocks 2 and 9 have identical
> metadata (from their page opaque area), but partially non-matching
> data items (however, the number of items on each block is consistent
> and correct according to that metadata, as far as it goes). I think
> block 9 is supposed to have a right-link to block 5 (block 5 seems to
> think so, at least -- its left link is 9).

For the convenience of others, here is a version with a normalized data column added (interpreting the datums as little-endian Oids).
--
Peter Geoghegan

page,itemoffset,ctid,itemlen,nulls,vars,data,normalized data
1,1,"(0,9)",16,f,f,16 0b 00 00 00 00 00 00,2838
1,2,"(2,50)",16,f,f,70 00 00 00 00 00 00 00,112
1,3,"(2,52)",16,f,f,71 00 00 00 00 00 00 00,113
1,4,"(2,24)",16,f,f,ae 00 00 00 00 00 00 00,174
1,5,"(2,25)",16,f,f,af 00 00 00 00 00 00 00,175
1,6,"(2,51)",16,f,f,24 02 00 00 00 00 00 00,548
1,7,"(2,53)",16,f,f,25 02 00 00 00 00 00 00,549
1,8,"(4,66)",16,f,f,3a 03 00 00 00 00 00 00,826
1,9,"(2,55)",16,f,f,3b 03 00 00 00 00 00 00,827
1,10,"(2,56)",16,f,f,3c 03 00 00 00 00 00 00,828
1,11,"(4,15)",16,f,f,70 04 00 00 00 00 00 00,1136
1,12,"(1,31)",16,f,f,71 04 00 00 00 00 00 00,1137
1,13,"(4,14)",16,f,f,bd 04 00 00 00 00 00 00,1213
1,14,"(4,17)",16,f,f,be 04 00 00 00 00 00 00,1214
1,15,"(1,34)",16,f,f,d0 04 00 00 00 00 00 00,1232
1,16,"(1,35)",16,f,f,d1 04 00 00 00 00 00 00,1233
1,17,"(0,2)",16,f,f,df 04 00 00 00 00 00 00,1247
1,18,"(3,10)",16,f,f,e1 04 00 00 00 00 00 00,1249
1,19,"(3,11)",16,f,f,e7 04 00 00 00 00 00 00,1255
1,20,"(7,6)",16,f,f,eb 04 00 00 00 00 00 00,1259
1,21,"(2,23)",16,f,f,ec 04 00 00 00 00 00 00,1260
1,22,"(4,16)",16,f,f,ed 04 00 00 00 00 00 00,1261
1,23,"(4,12)",16,f,f,ee 04 00 00 00 00 00 00,1262
1,24,"(4,64)",16,f,f,89 05 00 00 00 00 00 00,1417
1,25,"(3,31)",16,f,f,8a 05 00 00 00 00 00 00,1418
1,26,"(1,48)",16,f,f,8b 08 00 00 00 00 00 00,2187
1,27,"(4,63)",16,f,f,18 09 00 00 00 00 00 00,2328
1,28,"(0,64)",16,f,f,20 09 00 00 00 00 00 00,2336
1,29,"(0,65)",16,f,f,21 09 00 00 00 00 00 00,2337
1,30,"(4,18)",16,f,f,5c 09 00 00 00 00 00 00,2396
1,31,"(1,11)",16,f,f,5d 09 00 00 00 00 00 00,2397
1,32,"(4,40)",16,f,f,28 0a 00 00 00 00 00 00,2600
1,33,"(3,47)",16,f,f,29 0a 00 00 00 00 00 00,2601
1,34,"(3,48)",16,f,f,2a 0a 00 00 00 00 00 00,2602
1,35,"(3,49)",16,f,f,2b 0a 00 00 00 00 00 00,2603
1,36,"(3,41)",16,f,f,2c 0a 00 00 00 00 00 00,2604
1,37,"(4,45)",16,f,f,2d 0a 00 00 00 00 00 00,2605
1,38,"(3,42)",16,f,f,2e 0a 00 00 00 00 00 00,2606
1,39,"(4,48)",16,f,f,2f 0a 00 00 00 00 00 00,2607
1,40,"(4,49)",16,f,f,30 0a 00 00 00 00 00 00,2608
1,41,"(4,44)",16,f,f,31 0a 00 00 00 00 00 00,2609
1,42,"(3,43)",16,f,f,32 0a 00 00 00 00 00 00,2610
1,43,"(7,26)",16,f,f,33 0a 00 00 00 00 00 00,2611
1,44,"(3,50)",16,f,f,34 0a 00 00 00 00 00 00,2612
1,45,"(4,71)",16,f,f,35 0a 00 00 00 00 00 00,2613
1,46,"(4,47)",16,f,f,37 0a 00 00 00 00 00 00,2615
1,47,"(3,46)",16,f,f,38 0a 00 00 00 00 00 00,2616
1,48,"(3,44)",16,f,f,39 0a 00 00 00 00 00 00,2617
1,49,"(4,41)",16,f,f,3a 0a 00 00 00 00 00 00,2618
1,50,"(0,1)",16,f,f,3b 0a 00 00 00 00 00 00,2619
1,51,"(4,42)",16,f,f,3c 0a 00 00 00 00 00 00,2620
1,52,"(0,61)",16,f,f,5a 0a 00 00 00 00 00 00,2650
1,53,"(0,54)",16,f,f,5b 0a 00 00 00 00 00 00,2651
1,54,"(0,55)",16,f,f,5c 0a 00 00 00 00 00 00,2652
1,55,"(0,56)",16,f,f,5d 0a 00 00 00 00 00 00,2653
1,56,"(0,57)",16,f,f,5e 0a 00 00 00 00 00 00,2654
1,57,"(0,59)",16,f,f,5f 0a 00 00 00 00 00 00,2655
1,58,"(0,50)",16,f,f,60 0a 00 00 00 00 00 00,2656
1,59,"(7,14)",16,f,f,61 0a 00 00 00 00 00 00,2657
1,60,"(0,42)",16,f,f,62 0a 00 00 00 00 00 00,2658
1,61,"(0,43)",16,f,f,63 0a 00 00 00 00 00 00,2659
1,62,"(0,68)",16,f,f,64 0a 00 00 00 00 00 00,2660
1,63,"(0,69)",16,f,f,65 0a 00 00 00 00 00 00,2661
1,64,"(7,3)",16,f,f,66 0a 00 00 00 00 00 00,2662
1,65,"(7,4)",16,f,f,67 0a 00 00 00 00 00 00,2663
1,66,"(0,53)",16,f,f,68 0a 00 00 00 00 00 00,2664
1,67,"(7,22)",16,f,f,69 0a 00 00 00 00 00 00,2665
1,68,"(7,23)",16,f,f,6a 0a 00 00 00 00 00 00,2666
1,69,"(7,24)",16,f,f,6b 0a 00 00 00 00 00 00,2667
1,70,"(1,72)",16,f,f,6c 0a 00 00 00 00 00 00,2668
1,71,"(1,73)",16,f,f,6d 0a 00 00 00 00 00 00,2669
1,72,"(1,74)",16,f,f,6e 0a 00 00 00 00 00 00,2670
1,73,"(1,6)",16,f,f,6f 0a 00 00 00 00 00 00,2671
1,74,"(1,7)",16,f,f,70 0a 00 00 00 00 00 00,2672
1,75,"(1,75)",16,f,f,71 0a 00 00 00 00 00 00,2673
1,76,"(1,76)",16,f,f,72 0a 00 00 00 00 00 00,2674
1,77,"(1,67)",16,f,f,73 0a 00 00 00 00 00 00,2675
1,78,"(0,40)",16,f,f,74 0a 00 00 00 00 00 00,2676
1,79,"(0,41)",16,f,f,75 0a 00 00 00 00 00 00,2677
1,80,"(1,50)",16,f,f,76 0a 00 00 00 00 00 00,2678
1,81,"(1,51)",16,f,f,77 0a 00 00 00 00 00 00,2679
1,82,"(1,49)",16,f,f,78 0a 00 00 00 00 00 00,2680
1,83,"(1,58)",16,f,f,79 0a 00 00 00 00 00 00,2681
1,84,"(1,59)",16,f,f,7a 0a 00 00 00 00 00 00,2682
1,85,"(1,89)",16,f,f,7b 0a 00 00 00 00 00 00,2683
1,86,"(1,70)",16,f,f,7c 0a 00 00 00 00 00 00,2684
1,87,"(1,71)",16,f,f,7d 0a 00 00 00 00 00 00,2685
1,88
Re: [HACKERS] Safe memory allocation functions
Robert Haas wrote:
> On Tue, Jan 13, 2015 at 10:10 AM, Tom Lane wrote:
>> However, there is a larger practical problem with this whole concept,
>> which is that experience should teach us to be very wary of the assumption
>> that asking for memory the system can't give us will just lead to nice
>> neat malloc-returns-NULL behavior. Any small perusal of the mailing list
>> archives will remind you that very often the end result will be SIGSEGV,
>> OOM kills, unrecoverable trap-on-write when the kernel realizes it can't
>> honor a copy-on-write promise, yadda yadda. Agreed that it's arguable
>> that these only occur in misconfigured systems ... but misconfiguration
>> appears to be the default in a depressingly large fraction of systems.
>> (This is another reason for "_safe" not being the mot juste :-()
>
> I don't really buy this. It's pretty incredible to think that after a
> malloc() failure there is absolutely no hope of carrying on sanely.
> If that were true, we wouldn't be able to ereport() out-of-memory
> errors at any severity less than FATAL, but of course it doesn't work
> that way. Moreover, AllocSetAlloc() contains malloc() and, if that
> fails, calls malloc() again with a smaller value, without even
> throwing an error.

I understood Tom's point differently: instead of malloc() failing, malloc() will return a supposedly usable pointer, but later usage of it will lead to a crash of some sort. We know this does happen in reality, because people do report it; but we also know how to fix it. And for systems that have been correctly set up, the new behavior (using some plan B for when malloc actually fails instead of spuriously succeeding only to cause a later crash) will be much more convenient.

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
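To make the intended calling convention concrete, a minimal sketch of the caller-side "plan B" pattern (the name palloc_noerror is used purely for illustration here; the actual name and signature are exactly what is under debate in this thread):

	Size		request = work_mem * 1024L;
	char	   *buf;

	buf = palloc_noerror(request);
	while (buf == NULL && request > BLCKSZ)
	{
		/* plan B: retry with a smaller request instead of failing */
		request /= 2;
		buf = palloc_noerror(request);
	}
	if (buf == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("out of memory")));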
Re: [HACKERS] Out-of-bounds write and incorrect detection of trigger file in pg_standby
On Thu, Jan 15, 2015 at 7:13 AM, Robert Haas wrote:
> Instead of doing this:
>
> if (len < sizeof(buf))
>     buf[len] = '\0';
>
> ...I would suggest making the size of the buffer one greater than the
> size of the read(), and then always nul-terminating the buffer. It
> seems to me that would make the code easier to reason about.

How about the attached then? This way we still detect any invalid values the same way:

- if ((len = read(fd, buf, sizeof(buf))) < 0)
+ if ((len = read(fd, buf, sizeof(buf) - 1)) < 0)

Regards,
--
Michael

diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
index d6b1692..2f9f2b4 100644
--- a/contrib/pg_standby/pg_standby.c
+++ b/contrib/pg_standby/pg_standby.c
@@ -418,7 +418,7 @@ CheckForExternalTrigger(void)
 		return;
 	}
 
-	if ((len = read(fd, buf, sizeof(buf))) < 0)
+	if ((len = read(fd, buf, sizeof(buf) - 1)) < 0)
 	{
 		fprintf(stderr, "WARNING: could not read \"%s\": %s\n",
 				triggerPath, strerror(errno));
Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On 06-01-2015 PM 03:40, Amit Langote wrote:
>
> I agree that while we are discussing these points, we could also be
> discussing how we solve the problems of the existing partitioning
> implementation using whatever the above things end up being. Proposed
> approaches to solve those problems might be useful to drive the first
> step as well, or perhaps that's how it should be done anyway.

I realize the discussion has not quite brought us to *conclusions* so far, though surely there has been valuable input from people. Anyway, starting a new thread with a summary of what has been discussed (please note that the order of listing the points does not necessarily connote their priority):

* It has been repeatedly pointed out that we may want to decouple partitioning from inheritance, because implementing partitioning as an extension of the inheritance mechanism means that we have to keep all the existing semantics, which might limit what we want to do with the special case of it which is partitioning; in other words, we would find ourselves in the difficult position of having to inject special-case code into a very generalized mechanism, that is, inheritance. Specifically, do we regard partitions as pg_inherits children of their partitioning parent?

* Syntax: do we want to make it similar to one of the many other databases out there? Or we could invent our own. I like the syntax that Robert suggested, which covers the cases of RANGE and LIST partitioning without actually having to use those keywords explicitly; something like the following:

CREATE TABLE parent PARTITION ON (column [ USING opclass ] [, ... ]);

CREATE TABLE child PARTITION OF parent_name FOR VALUES { (value, ...) [ TO (value, ...) ] }

So instead of making a hard distinction between range and list partitioning, you can say:

CREATE TABLE child_name PARTITION OF parent_name FOR VALUES (3, 5, 7);

wherein child is effectively a LIST partition;

CREATE TABLE child PARTITION OF parent_name FOR VALUES (8) TO (12);

wherein child is effectively a RANGE partition on one column;

CREATE TABLE child PARTITION OF parent_name FOR VALUES (20, 120) TO (30, 130);

wherein child is effectively a RANGE partition on two columns.

I wonder if we could add a clause like DISTRIBUTED BY to complement PARTITION ON that represents a hash-distributed/partitioned table (that could be a syntax to support sharded tables, maybe; we would definitely want to move ahead in that direction, I guess).

* Catalog: we would like to have a catalog structure suitable to implement capabilities like multi-column partitioning and sub-partitioning (with an arbitrary number of levels in the hierarchy). I had suggested that we create two new catalogs, viz. pg_partitioned_rel and pg_partition_def, to store metadata about the partition key of a partitioned relation and the partition bound info of a partition, respectively. Also, see the point about the on-disk representation of partition bounds.

* It is desirable to treat partitions as pg_class relations, with perhaps a new relkind (or relkinds). We may want to choose an implementation where only the lowest-level relations in a partitioning hierarchy have storage; those at the upper layers are mere placeholder relations, though of course with associated constraints determined by the partitioning criteria (with appropriate metadata entered into the additional catalogs). I am not quite sure whether the various kinds of relations involved in the partitioning scheme have separate namespaces and, if they do, how we implement that.

* In the initial implementation, we could just live with partitioning on a set of columns (and not arbitrary expressions of them).

* We perhaps do not need multi-column LIST partitions, as they are not very widely used and may complicate the implementation.

* There are a number of suggestions about how we represent partition bounds (on disk) - pg_node_tree, RECORD (a composite type or the rowtype associated with the relation itself), etc. An important point to consider here may be that a partition key may contain more than one column.

* How we represent a partition definition in memory (for a given partitioned relation) - an important point to remember is that such a representation should be efficient to iterate through or binary-searchable. Also see the points about tuple routing and partition pruning.

* Overflow/catchall partition: it seems we do not want/need them. It might seem desirable, for example, in cases where a big transaction enters a large number of tuples all but one of which find a defined partition; we may not want to error out in such a case but instead enter that erring tuple into the overflow partition. If we choose to implement that, we would also like to implement the capability to move the tuples into the appropriate partition once it's defined. Related is the notion of automatically creating partitions if one is not already defined for a just-entered tuple; but there may be locking troubles if many concurrent sessions try to do that.

* Tuple-routing: based on the intern
Re: [HACKERS] Parallel Seq Scan
On 1/13/15 9:42 PM, Amit Kapila wrote:
> As an example, one strategy could be: if the table size is X MB and
> there are 8 workers, then divide the work as X/8 MB for each worker
> (which I have currently used in the patch). Another could be that each
> worker scans 1 block at a time and then checks some global structure to
> see which block it needs to scan next; according to me this could lead
> to random scans. I have read that some other databases also divide the
> work based on partitions or segments (the size of a segment is not very
> clear).

Long-term I think we'll want a mix between the two approaches. Simply doing something like blkno % num_workers is going to cause imbalances, but trying to do this on a per-block basis seems like too much overhead.

Also long-term, I think we need to look at a more specialized version of parallelism at the IO layer. For example, during an index scan you'd really like to get IO requests for heap blocks started in the background while the backend is focused on the mechanics of the index scan itself. While this could be done with the stuff Robert has written, I have to wonder if it'd be a lot more efficient to use fadvise or AIO. Or perhaps it would just be better to deal with an entire index page (remembering TIDs) and then hit the heap.

But I agree with Robert; there's a lot yet to be done just to get *any* kind of parallel execution working before we start thinking about how to optimize it.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Typo fix in alter_table.sgml
Robert Haas wrote:
> But that's not a "typo" as stated in $SUBJECT but rather a "markup fix".

Definitely. Thanks.
--
Michael
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 4:53 PM, Merlin Moncure wrote:
> yeah. via:
> cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index',
> s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv
> header;

My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct according to that metadata, as far as it goes). I think block 9 is supposed to have a right-link to block 5 (block 5 seems to think so, at least -- its left link is 9). So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that the opaque area was changed by something else, some time later. I'll investigate more.
--
Peter Geoghegan
Re: [HACKERS] Turning recovery.conf into GUCs
On 01/15/2015 02:24 AM, Robert Haas wrote:
> I think your idea of adding read-only GUCs with the same names as all
> of the recovery.conf parameters is a clear win. Even if we do nothing
> more than that, it makes the values visible from the SQL level, and
> that's good. But I think we should go further and make them really be
> GUCs. Otherwise, if you want to be able to change one of those
> parameters other than at server startup time, you've got to invent a
> separate system for reloading them on SIGHUP. If you make them part
> of the GUC mechanism, you can use that. I think it's quite safe to
> say that the whole reason we *have* a GUC mechanism is so that all of
> our configuration can be done through one grand, unified mechanism
> rather than being fragmented across many files.

+1

I do find it ironic that the creator of the Grand Unified Configuration settings is arguing against unifying the files.

> Some renaming might be in order. Heikki previously suggested merging
> all of the recovery_target_whatever settings down into a single
> parameter recovery_target='kindofrecoverytarget furtherdetailsgohere',
> like recovery_target='xid 1234' or recovery_target='name bob'. Maybe
> that would be more clear (or not).

Not keen on non-atomic settings, personally.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 6:50 PM, Peter Geoghegan wrote: > This is great, but it's not exactly clear which bt_page_items() page > is which - some are skipped, but I can't be sure which. Would you mind > rewriting that query to indicate which block is under consideration by > bt_page_items()? yeah. via: cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv header; merlin page,itemoffset,ctid,itemlen,nulls,vars,data 1,1,"(0,9)",16,f,f,16 0b 00 00 00 00 00 00 1,2,"(2,50)",16,f,f,70 00 00 00 00 00 00 00 1,3,"(2,52)",16,f,f,71 00 00 00 00 00 00 00 1,4,"(2,24)",16,f,f,ae 00 00 00 00 00 00 00 1,5,"(2,25)",16,f,f,af 00 00 00 00 00 00 00 1,6,"(2,51)",16,f,f,24 02 00 00 00 00 00 00 1,7,"(2,53)",16,f,f,25 02 00 00 00 00 00 00 1,8,"(4,66)",16,f,f,3a 03 00 00 00 00 00 00 1,9,"(2,55)",16,f,f,3b 03 00 00 00 00 00 00 1,10,"(2,56)",16,f,f,3c 03 00 00 00 00 00 00 1,11,"(4,15)",16,f,f,70 04 00 00 00 00 00 00 1,12,"(1,31)",16,f,f,71 04 00 00 00 00 00 00 1,13,"(4,14)",16,f,f,bd 04 00 00 00 00 00 00 1,14,"(4,17)",16,f,f,be 04 00 00 00 00 00 00 1,15,"(1,34)",16,f,f,d0 04 00 00 00 00 00 00 1,16,"(1,35)",16,f,f,d1 04 00 00 00 00 00 00 1,17,"(0,2)",16,f,f,df 04 00 00 00 00 00 00 1,18,"(3,10)",16,f,f,e1 04 00 00 00 00 00 00 1,19,"(3,11)",16,f,f,e7 04 00 00 00 00 00 00 1,20,"(7,6)",16,f,f,eb 04 00 00 00 00 00 00 1,21,"(2,23)",16,f,f,ec 04 00 00 00 00 00 00 1,22,"(4,16)",16,f,f,ed 04 00 00 00 00 00 00 1,23,"(4,12)",16,f,f,ee 04 00 00 00 00 00 00 1,24,"(4,64)",16,f,f,89 05 00 00 00 00 00 00 1,25,"(3,31)",16,f,f,8a 05 00 00 00 00 00 00 1,26,"(1,48)",16,f,f,8b 08 00 00 00 00 00 00 1,27,"(4,63)",16,f,f,18 09 00 00 00 00 00 00 1,28,"(0,64)",16,f,f,20 09 00 00 00 00 00 00 1,29,"(0,65)",16,f,f,21 09 00 00 00 00 00 00 1,30,"(4,18)",16,f,f,5c 09 00 00 00 00 00 00 1,31,"(1,11)",16,f,f,5d 09 00 00 00 00 00 00 1,32,"(4,40)",16,f,f,28 0a 00 00 00 00 00 00 1,33,"(3,47)",16,f,f,29 0a 00 00 00 00 00 00 1,34,"(3,48)",16,f,f,2a 0a 00 00 00 00 00 00 1,35,"(3,49)",16,f,f,2b 0a 00 00 00 00 00 00 1,36,"(3,41)",16,f,f,2c 0a 00 00 00 00 00 00 1,37,"(4,45)",16,f,f,2d 0a 00 00 00 00 00 00 1,38,"(3,42)",16,f,f,2e 0a 00 00 00 00 00 00 1,39,"(4,48)",16,f,f,2f 0a 00 00 00 00 00 00 1,40,"(4,49)",16,f,f,30 0a 00 00 00 00 00 00 1,41,"(4,44)",16,f,f,31 0a 00 00 00 00 00 00 1,42,"(3,43)",16,f,f,32 0a 00 00 00 00 00 00 1,43,"(7,26)",16,f,f,33 0a 00 00 00 00 00 00 1,44,"(3,50)",16,f,f,34 0a 00 00 00 00 00 00 1,45,"(4,71)",16,f,f,35 0a 00 00 00 00 00 00 1,46,"(4,47)",16,f,f,37 0a 00 00 00 00 00 00 1,47,"(3,46)",16,f,f,38 0a 00 00 00 00 00 00 1,48,"(3,44)",16,f,f,39 0a 00 00 00 00 00 00 1,49,"(4,41)",16,f,f,3a 0a 00 00 00 00 00 00 1,50,"(0,1)",16,f,f,3b 0a 00 00 00 00 00 00 1,51,"(4,42)",16,f,f,3c 0a 00 00 00 00 00 00 1,52,"(0,61)",16,f,f,5a 0a 00 00 00 00 00 00 1,53,"(0,54)",16,f,f,5b 0a 00 00 00 00 00 00 1,54,"(0,55)",16,f,f,5c 0a 00 00 00 00 00 00 1,55,"(0,56)",16,f,f,5d 0a 00 00 00 00 00 00 1,56,"(0,57)",16,f,f,5e 0a 00 00 00 00 00 00 1,57,"(0,59)",16,f,f,5f 0a 00 00 00 00 00 00 1,58,"(0,50)",16,f,f,60 0a 00 00 00 00 00 00 1,59,"(7,14)",16,f,f,61 0a 00 00 00 00 00 00 1,60,"(0,42)",16,f,f,62 0a 00 00 00 00 00 00 1,61,"(0,43)",16,f,f,63 0a 00 00 00 00 00 00 1,62,"(0,68)",16,f,f,64 0a 00 00 00 00 00 00 1,63,"(0,69)",16,f,f,65 0a 00 00 00 00 00 00 1,64,"(7,3)",16,f,f,66 0a 00 00 00 00 00 00 1,65,"(7,4)",16,f,f,67 0a 00 00 00 00 00 00 1,66,"(0,53)",16,f,f,68 0a 00 00 00 00 00 00 1,67,"(7,22)",16,f,f,69 0a 00 00 00 00 00 00 1,68,"(7,23)",16,f,f,6a 0a 00 00 00 00 00 00 
1,69,"(7,24)",16,f,f,6b 0a 00 00 00 00 00 00 1,70,"(1,72)",16,f,f,6c 0a 00 00 00 00 00 00 1,71,"(1,73)",16,f,f,6d 0a 00 00 00 00 00 00 1,72,"(1,74)",16,f,f,6e 0a 00 00 00 00 00 00 1,73,"(1,6)",16,f,f,6f 0a 00 00 00 00 00 00 1,74,"(1,7)",16,f,f,70 0a 00 00 00 00 00 00 1,75,"(1,75)",16,f,f,71 0a 00 00 00 00 00 00 1,76,"(1,76)",16,f,f,72 0a 00 00 00 00 00 00 1,77,"(1,67)",16,f,f,73 0a 00 00 00 00 00 00 1,78,"(0,40)",16,f,f,74 0a 00 00 00 00 00 00 1,79,"(0,41)",16,f,f,75 0a 00 00 00 00 00 00 1,80,"(1,50)",16,f,f,76 0a 00 00 00 00 00 00 1,81,"(1,51)",16,f,f,77 0a 00 00 00 00 00 00 1,82,"(1,49)",16,f,f,78 0a 00 00 00 00 00 00 1,83,"(1,58)",16,f,f,79 0a 00 00 00 00 00 00 1,84,"(1,59)",16,f,f,7a 0a 00 00 00 00 00 00 1,85,"(1,89)",16,f,f,7b 0a 00 00 00 00 00 00 1,86,"(1,70)",16,f,f,7c 0a 00 00 00 00 00 00 1,87,"(1,71)",16,f,f,7d 0a 00 00 00 00 00 00 1,88,"(1,56)",16,f,f,7e 0a 00 00 00 00 00 00 1,89,"(1,57)",16,f,f,7f 0a 00 00 00 00 00 00 1,90,"(1,52)",16,f,f,80 0a 00 00 00 00 00 00 1,91,"(1,53)",16,f,f,81 0a 00 00 00 00 00 00 1,92,"(1,43)",16,f,f,82 0a 00 00 00 00 00 00 1,93,"(7,1)",16,f,f,83 0a 00 00 00 00 00 00 1,94,"(1,61)",16,f,f,84 0a 00 00 00 00 00 00 1,95,"(1,62)",16,f,f,85 0a 00 00 00 00 00 00 1,96,"(0,29)",16,f,f,86 0a 00 00 00 00 00 00 1,97,"(0,30)",16,f,f,87 0a 00 00 00 00 00 00 1,98,"(6,35)",16,f,f,88 0a 00 00 00 00 00 00 1,99,"(1,36)",16,f,f,89 0a 00 00 00 00 00 00 1,100,"(1,37)",16,f,f,8a 0a 00 00 00 00 00 00 1,101,"(1,63)",
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by bt_page_items()? Thanks -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
Andrew Dunstan writes: > On 01/14/2015 07:29 PM, Tom Lane wrote: >> If you don't find that problematic, how about this case? >> >> regression=# create table src2 (f1 int, primary key(oid)) with oids; >> CREATE TABLE >> regression=# create table dst2 (like src2 including indexes); >> ERROR: column "oid" named in key does not exist > I agree it's odd, and probably wrong, although it's been like that for a > very long time, hasn't it? Sure, LIKE has always behaved this way. It still seems wrong though. As a reference point, creating a table that inherits from src1 or src2 will result in it having oids (even if you say WITHOUT OIDS). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
On 01/14/2015 07:29 PM, Tom Lane wrote:

> dst1 doesn't get an OID column:
>
> regression=# create table src1 (f1 int) with oids;
> CREATE TABLE
> regression=# create table dst1 (like src1);
> CREATE TABLE
> regression=# \d+ src1
>                           Table "public.src1"
>  Column |  Type   | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
>  f1     | integer |           | plain   |              |
> Has OIDs: yes
>
> regression=# \d+ dst1
>                           Table "public.dst1"
>  Column |  Type   | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
>  f1     | integer |           | plain   |              |
>
> If you don't find that problematic, how about this case?
>
> regression=# create table src2 (f1 int, primary key(oid)) with oids;
> CREATE TABLE
> regression=# create table dst2 (like src2 including indexes);
> ERROR: column "oid" named in key does not exist

I agree it's odd, and probably wrong, although it's been like that for a very long time, hasn't it?

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure wrote:
> On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan wrote:
>> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote:
>>> (gdb) print BufferGetBlockNumber(buf)
>>> $15 = 9
>>>
>>> ..and it stays 9, continuing several times having set breakpoint.
>>
>> And the index involved? I'm pretty sure that this is an internal page, no?
>
> The index is the oid index on pg_class. Some more info:
>
> *) temp table churn is fairly high. Several dozen get spawned and
> destroyed at the start of a replication run, all at once, due to some
> dodgy coding via dblink. During the replication run, the temp table
> churn rate drops.
>
> *) running btreecheck, I see:
> cds2=# select bt_index_verify('pg_class_oid_index');
> NOTICE: page 7 of index "pg_class_oid_index" is deleted
> NOTICE: page 10 of index "pg_class_oid_index" is deleted
> NOTICE: page 12 of index "pg_class_oid_index" is deleted
>  bt_index_verify
> ─────────────────
>
> cds2=# select bt_leftright_verify('pg_class_oid_index');
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 2, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 9, current left: 4
> [repeat infinity until cancel]
>
> which looks like the index is corrupted? ISTM _bt_moveright is
> hanging because it's trying to move from block 9 to block 9 and so
> loops forever.

per Peter the following might be useful:

cds2=# select * from bt_metap('pg_class_oid_index');
 magic  │ version │ root │ level │ fastroot │ fastlevel
────────┼─────────┼──────┼───────┼──────────┼───────────
 340322 │       2 │    3 │     1 │        3 │         1

cds2=# select (bt_page_stats('pg_class_oid_index', s)).* from generate_series(1,12) s;
 blkno │ type │ live_items │ dead_items │ avg_item_size │ page_size │ free_size │ btpo_prev │ btpo_next │ btpo  │ btpo_flags
───────┼──────┼────────────┼────────────┼───────────────┼───────────┼───────────┼───────────┼───────────┼───────┼────────────
     1 │ l    │        119 │          0 │            16 │      8192 │      5768 │         0 │         4 │     0 │          1
     2 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
     3 │ r    │          8 │          0 │            15 │      8192 │      7996 │         0 │         0 │     1 │          2
     4 │ l    │        178 │          0 │            16 │      8192 │      4588 │         1 │         2 │     0 │          1
     5 │ l    │          7 │          0 │            16 │      8192 │      8008 │         9 │        11 │     0 │          1
     6 │ l    │          5 │          0 │            16 │      8192 │      8048 │        11 │         8 │     0 │          1
     7 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
     8 │ l    │        187 │          0 │            16 │      8192 │      4408 │         6 │         0 │     0 │          1
     9 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
    10 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
    11 │ l    │          6 │          0 │            16 │      8192 │      8028 │         5 │         6 │     0 │          1
    12 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 10731 │          0

merlin

cds2=# select (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s;
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
 itemoffset │ ctid   │ itemlen │ nulls │ vars │ data
────────────┼────────┼─────────┼───────┼──────┼──────────────────────────
          1 │ (0,9)  │      16 │ f     │ f    │ 16 0b 00 00 00 00 00 00
          2 │ (2,50) │      16 │ f     │ f    │ 70 00 00 00 00 00 00 00
          3 │ (2,52) │      16 │ f     │ f    │ 71 00 00 00 00 00 00 00
          4 │ (2,24) │      16 │ f     │ f    │ ae 00 00 00 00 00 00 00
          5 │ (2,25) │      16 │ f     │ f    │ af 00 00 00 00 00 00 00
          6 │ (2,51) │      16 │ f     │ f    │ 24
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 4:26 PM, Merlin Moncure wrote: > The index is the oid index on pg_class. Some more info: > > *) temp table churn is fairly high. Several dozen get spawned and > destroyed at the start of a replication run, all at once, due to some > dodgy coding via dblink. During the replication run, the temp table > churn rate drops. > > *) running btreecheck, I see: I knew I wrote that tool for a reason. :-) It would be great if I could get a raw dump of the index using contrib/pageinspect at this juncture too, as already described. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
Andres Freund writes: > Right now I think a #ifdef/undef S_UNLOCK in the relevant gcc section > is sufficient and acceptable. It's after all the HPPA section that doesn't > really play by the rules. Works for me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
dst1 doesn't get an OID column:

regression=# create table src1 (f1 int) with oids;
CREATE TABLE
regression=# create table dst1 (like src1);
CREATE TABLE
regression=# \d+ src1
                          Table "public.src1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 f1     | integer |           | plain   |              |
Has OIDs: yes

regression=# \d+ dst1
                          Table "public.dst1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 f1     | integer |           | plain   |              |

If you don't find that problematic, how about this case?

regression=# create table src2 (f1 int, primary key(oid)) with oids;
CREATE TABLE
regression=# create table dst2 (like src2 including indexes);
ERROR: column "oid" named in key does not exist

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan wrote:
> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote:
>> (gdb) print BufferGetBlockNumber(buf)
>> $15 = 9
>>
>> ..and it stays 9, continuing several times having set breakpoint.
>
> And the index involved? I'm pretty sure that this is an internal page, no?

The index is the oid index on pg_class. Some more info:

*) temp table churn is fairly high. Several dozen get spawned and destroyed at the start of a replication run, all at once, due to some dodgy coding via dblink. During the replication run, the temp table churn rate drops.

*) running btreecheck, I see:

cds2=# select bt_index_verify('pg_class_oid_index');
NOTICE: page 7 of index "pg_class_oid_index" is deleted
NOTICE: page 10 of index "pg_class_oid_index" is deleted
NOTICE: page 12 of index "pg_class_oid_index" is deleted
 bt_index_verify
─────────────────

cds2=# select bt_leftright_verify('pg_class_oid_index');
WARNING: left link/right link pair don't comport at level 0, block 9, last: 2, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
[repeat infinity until cancel]

which looks like the index is corrupted? ISTM _bt_moveright is hanging because it's trying to move from block 9 to block 9 and so loops forever.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote: > (gdb) print BufferGetBlockNumber(buf) > $15 = 9 > > ..and it stays 9, continuing several times having set breakpoint. And the index involved? I'm pretty sure that this is an internal page, no? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 2:32 PM, Peter Geoghegan wrote: > On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan wrote: >> Could you write some code to print out the block number (i.e. >> "BlockNumber blkno") if there are more than, say, 5 retries within >> _bt_moveright()? > > Obviously I mean that the block number should be printed, no matter > whether or not the P_INCOMPLETE_SPLIT() path is taken or not. So you > should just move this to the top of the for(;;) loop, so it's always > available to print: > > BlockNumber blkno = BufferGetBlockNumber(buf); (gdb) print BufferGetBlockNumber(buf) $15 = 9 ..and it stays 9, continuing several times having set breakpoint. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai wrote: > > When custom-scan node replaced a join-plan, it shall have at least two > > child plan-nodes. The callback handler of PlanCustomPath needs to be > > able to call create_plan_recurse() to transform the underlying > > path-nodes to plan-nodes, because this custom-scan node may take other > > built-in scan or sub-join nodes as its inner/outer input. > > In case of FDW, it shall kick any underlying scan relations to remote > > side, thus we may not expect ForeignScan has underlying plans... > > Do you have an example of this? > Yes, even though full code set is too large for patch submission... https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880 This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin. It takes GpuHashJoinPath inherited from CustomPath that has multiple underlying scan/join paths. Once it is called back from the backend, it also calls create_plan_recurse() to make inner/outer plan nodes according to the paths. In the result, we can see the following query execution plan that CustomScan takes underlying scan plans. postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2; QUERY PLAN -- Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922 width=143) Hash clause 1: (aid = aid) Hash clause 2: (bid = bid) Bulkload: On -> Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=409 width=77) -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=4 width=37) hash keys: aid nBatches: 1 Buckets: 46000 Memory Usage: 99.99% -> Seq Scan on t1 (cost=0.00..734.00 rows=4 width=37) -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=4 width=37) hash keys: bid nBatches: 1 Buckets: 46000 Memory Usage: 49.99% -> Seq Scan on t2 (cost=0.00..734.00 rows=4 width=37) (13 rows) Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > -Original Message- > From: Robert Haas [mailto:robertmh...@gmail.com] > Sent: Thursday, January 15, 2015 2:07 AM > To: Kaigai Kouhei(海外 浩平) > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] > Custom Plan API) > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai wrote: > > When custom-scan node replaced a join-plan, it shall have at least two > > child plan-nodes. The callback handler of PlanCustomPath needs to be > > able to call create_plan_recurse() to transform the underlying > > path-nodes to plan-nodes, because this custom-scan node may take other > > built-in scan or sub-join nodes as its inner/outer input. > > In case of FDW, it shall kick any underlying scan relations to remote > > side, thus we may not expect ForeignScan has underlying plans... > > Do you have an example of this? > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
On 2015-01-14 12:27:42 -0500, Robert Haas wrote: > On Mon, Jan 12, 2015 at 12:57 PM, Andres Freund > wrote: > > On 2015-01-10 23:03:36 +0100, Andres Freund wrote: > >> On 2015-01-10 16:09:42 -0500, Tom Lane wrote: > >> > I've not tried to build HEAD on my HPPA dinosaur for awhile, but I did > >> > just now, and I am presented with boatloads of this: > >> > > >> > ../../../src/include/storage/s_lock.h:759: warning: `S_UNLOCK' redefined > >> > ../../../src/include/storage/s_lock.h:679: warning: this is the location > >> > of the previous definition > >> > > >> > which is not too surprising because the "default" definition at line 679 > >> > precedes the HPPA-specific one at line 759. > >> > >> That's 0709b7ee72e4bc71ad07b7120acd117265ab51d0. > >> > >> Not too surprising that it broke and wasn't noticed without access to > >> hppa - the hppa code uses gcc inline assembly outside of the big > >> defined(__GNUC__) and inside the section headed "Platforms that use > >> non-gcc inline assembly". > >> > >> > I'm not particularly interested in untangling the effects of the recent > >> > hackery in s_lock.h enough to figure out how the overall structure got > >> > broken, but I trust one of you will clean up the mess. > >> > >> I think it's easiest solved by moving the gcc inline assembly up to the > >> rest of the gcc inline assembly. That'll require duplicating a couple > >> lines, but seems easier to understand nonetheless. Not pretty. > > > > Robert, do you have a better idea? > > How about hoisting the entire hppa section up to the top of the file, > before the __GNUC__ || __INTEL_COMPILER section? We could do that, but I'd rather not see that ugliness at the top every time I open the file :P Right now I think a #ifdef/undef S_UNLOCK in the relevant gcc section is sufficient and acceptable. It's after all the HPPA section that doesn't really play by the rules. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
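Concretely, the #undef approach would look something like this in s_lock.h's gcc/HPPA section (a sketch, not the committed fix; it assumes the generic default S_UNLOCK has already been defined earlier in the file, and TAS_ACTIVE_WORD is the existing HPPA macro):

    #if defined(__hppa) || defined(__hppa__)

    /* discard the generic default so the redefinition is deliberate */
    #ifdef S_UNLOCK
    #undef S_UNLOCK
    #endif
    #define S_UNLOCK(lock) \
        do { \
            __asm__ __volatile__("" : : : "memory"); \
            *TAS_ACTIVE_WORD(lock) = -1; \
        } while (0)

    #endif   /* __hppa || __hppa__ */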
Re: [HACKERS] __attribute__ for non-gcc compilers
On 2015-01-14 17:46:39 -0500, Robert Haas wrote: > On Wed, Jan 14, 2015 at 5:29 PM, Andres Freund wrote: > > I think it's better than the alternatives: > > > > a) Don't support 64bit atomics on any 32bit platform. I think that'd be > >sad because there's some places that could greatly benefit from being > >able to read/store/manipulate e.g. LSNs atomically. > > b) Double the size of 64bit atomics on 32bit platforms, and add > >TYPEALIGN() to every access inside the atomics implementation. > > c) Require 64 atomics to be aligned appropriately manually in every > >place they're embedded. I think that's completely impractical. > > > > The only viable one imo is a) > > I can't really fault that reasoning, but if __attribute__((align)) > only works on some platforms, then you've got silent, subtle breakage > on the ones where it doesn't. The __attribute__((align()))'s are in compiler specific code sections anyway - and there's asserts ensuring that the alignment is correct in all routines where it matters (IIRC). Those are what caught the problem. C.f. http://archives.postgresql.org/message-id/20150108204635.GK6299%40alap3.anarazel.de I think I'd for now simply not define pg_attribute_aligned() on platforms where it's not supported, instead of defining it empty. If we need a softer variant we can name it pg_attribute_aligned_if_possible or something. Sounds sane? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
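In code form the proposal amounts to something like the following sketch (the compiler tests and the atomics struct are illustrative; the exact spellings would come from Oskari's patch, and uint64 is the PostgreSQL typedef): define pg_attribute_aligned() only where it is known to work, and provide no empty fallback, so a platform that cannot honor the alignment fails at compile time instead of silently producing a misaligned atomic.

    /* sketch: per-attribute macros instead of a blanket __attribute__ define */
    #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
    #define pg_attribute_aligned(a) __attribute__((aligned(a)))
    #endif
    /* deliberately no "#define pg_attribute_aligned(a)" fallback here */

    typedef struct pg_atomic_uint64
    {
        volatile uint64 value pg_attribute_aligned(8);
    } pg_atomic_uint64;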
Re: [HACKERS] __attribute__ for non-gcc compilers
On Wed, Jan 14, 2015 at 5:29 PM, Andres Freund wrote: > I think it's better than the alternatives: > > a) Don't support 64bit atomics on any 32bit platform. I think that'd be >sad because there's some places that could greatly benefit from being >able to read/store/manipulate e.g. LSNs atomically. > b) Double the size of 64bit atomics on 32bit platforms, and add >TYPEALIGN() to every access inside the atomics implementation. > c) Require 64 atomics to be aligned appropriately manually in every >place they're embedded. I think that's completely impractical. > > The only viable one imo is a) I can't really fault that reasoning, but if __attribute__((align)) only works on some platforms, then you've got silent, subtle breakage on the ones where it doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] __attribute__ for non-gcc compilers
On 2015-01-14 15:48:47 -0500, Robert Haas wrote: > On Tue, Jan 13, 2015 at 4:18 PM, Oskari Saarenmaa wrote: > > Commit db4ec2ffce35 added alignment attributes for 64-bit atomic > > variables as required on 32-bit platforms using > > __attribute__((aligned(8)). That works fine with GCC, and would work > > with Solaris Studio Compiler [1] and IBM XL C [2], but src/include/c.h > > defines __attribute__ as an empty macro when not using GCC. > > Unfortunately we can't just disable that #define and enable all > > __attributes__ for Solaris CC and XLC as we use a bunch of attributes > > that are not supported by those compilers and using them unconditionally > > would generate a lot of warnings. > > > > Attached a patch that defines custom macros for each attribute and > > enables them individually for compilers that support them and never > > defines __attribute__. > > > > I have tested this with GCC 4.9.2 on Linux x86-64 and Solaris CC 5.12 on > > Sparc (32-bit build); I don't have access to an IBM box with XLC. > > I guess my first question is whether we want to be relying on > __attribute__((aligned)) in the first place. I think it's better than the alternatives: a) Don't support 64bit atomics on any 32bit platform. I think that'd be sad because there's some places that could greatly benefit from being able to read/store/manipulate e.g. LSNs atomically. b) Double the size of 64bit atomics on 32bit platforms, and add TYPEALIGN() to every access inside the atomics implementation. c) Require 64bit atomics to be aligned appropriately manually in every place they're embedded. I think that's completely impractical. The only viable one imo is a) Since the atomics code is compiler-dependent anyway, I don't have much of a problem with relying on compiler specific alignment instructions. Really, the only problem right now is that __attribute__ is defined to be empty on compilers where it has meaning. > If we do, then this seems like a pretty sensible and necessary change. I think it's also an improvement independent from the alignment code. Having a more widely portable __attribute__((noreturn)), __attribute__((unused)), __attribute__((format(..))) seems like a good general improvement. And all of these are supported in other compilers than gcc. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Out-of-bounds write and incorrect detection of trigger file in pg_standby
On Wed, Jan 14, 2015 at 8:24 AM, Michael Paquier wrote: > In pg_standby.c, we use a 32-byte buffer in CheckForExternalTrigger into > which the content of the trigger file defined by -f is read: > CheckForExternalTrigger(void) > { > char buf[32]; > [...] > if ((len = read(fd, buf, sizeof(buf))) < 0) > { > stuff(); > } > > if (len < sizeof(buf)) > buf[len] = '\0'; > However it happens that if the file read contains 32 bytes or more, we > enforce \0 in a position out of bounds. While looking at that, I also > noticed that pg_standby can trigger a failover as long as the > beginning of buf matches either "smart" or "fast", but I think we > should check for a perfect match instead of a comparison of the first > bytes, no? > > Attached is a patch to fix the out-of-bound issue as well as > improvements for the detection of the trigger file content. I think > that the out-of-bound fix should be backpatched, while the > improvements for the trigger file are master-only. Coverity has > pointed out the out-of-bound issue. I would imagine that the goal might have been to accept not only smart but also smart\r and smart\n and smart\r\n. It's a little weird that we also accept smartypants, though. Instead of doing this: if (len < sizeof(buf)) buf[len] = '\0'; ...I would suggest making the size of the buffer one greater than the size of the read(), and then always nul-terminating the buffer. It seems to me that would make the code easier to reason about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
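One possible shape of that suggestion, as a fragment of CheckForExternalTrigger() (a sketch only, not the committed patch; it also trims trailing newlines so "fast\n" still matches exactly):

    char        buf[33];        /* one spare byte for the terminator */
    ssize_t     len;

    if ((len = read(fd, buf, sizeof(buf) - 1)) < 0)
    {
        /* report the read error and bail out, as the current code does */
    }
    buf[len] = '\0';            /* always safe: len <= sizeof(buf) - 1 */

    /* strip trailing line endings before the exact comparison */
    while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r'))
        buf[--len] = '\0';

    if (strcmp(buf, "fast") == 0)
        ;                       /* trigger immediate failover */
    else if (strcmp(buf, "smart") == 0)
        ;                       /* trigger smart failover */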
Re: [HACKERS] pg_rewind in contrib
On Wed, Jan 7, 2015 at 8:44 AM, Andrew Dunstan wrote: > I understand, but I think "pg_rewind" is likely to be misleading to many > users who will say "but I don't want just to rewind". > > I'm not wedded to the name I suggested, but I think we should look at > possible alternative names. We do have experience of misleading names > causing confusion (e.g. "initdb"). FWIW, pg_rewind sounds pretty good to me. But maybe I'm just not understanding what the use case for this, apart from rewinding? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
On 12.1.2015 01:28, Ali Akbar wrote: > > > Or else we implement what you suggest below (more comments below): > > > > Thinking about the 'release' flag a bit more - maybe we > could do > > this > > instead: > > > > if (release && astate->private_cxt) > > MemoryContextDelete(astate->mcontext); > > else if (release) > > { > > pfree(astate->dvalues); > > pfree(astate->dnulls); > > pfree(astate); > > } > > > > i.e. either destroy the whole context if possible, and > just free the > > memory when using a shared memory context. But I'm afraid > this would > > penalize the shared memory context, because that's > intended for > > cases where all the build states coexist in parallel and > then at some > > point are all converted into a result and thrown away. > Adding pfree() > > calls is no improvement here, and just wastes cycles. > > > > > > As per Tom's comment, i'm using "parent memory context" instead of > > "shared memory context" below. > > > > In the future, if some code writer decided to use > subcontext=false, > > to save memory in cases where there are many array > accumulation, and > > the parent memory context is long-living, current code can cause > > memory leak. So i think we should implement your suggestion > > (pfreeing astate), and warn the implication in the API > comment. The > > API user must choose between release=true, wasting cycles but > > preventing memory leak, or managing memory from the parent memory > > context. > > I'm wondering whether this is necessary after fixing makeArrayResult to > use the privat_cxt flag. It's still possible to call makeMdArrayResult > directly (with the wrong 'release' value). > > Another option might be to get rid of the 'release' flag altogether, and > just use the 'private_cxt' - I'm not aware of a code using release=false > with private_cxt=true (e.g. to build the same array twice from the same > astate). But maybe there's such code, and another downside is that it'd > break the existing API. > > > In one possible use case, for efficiency maybe the caller will > > create a special parent memory context for all array accumulation. > > Then uses makeArrayResult* with release=false, and in the end > > releasing the parent memory context once for all. > > Yeah, although I'd much rather not break the existing code at all. That > is - my goal is not to make it slower unless absolutely necessary (and > in that case the code may be fixed per your suggestion). But I'm not > convinced it's worth it. > > > OK. Do you think we need to note this in the comments? Something like > this: If using subcontext=false, the caller must be careful about memory > usage, because makeArrayResult* will not free the memory used. Yes, I think it's worth mentioning. > But I think it makes sense to move the error handling into > initArrayResultArr(), including the get_element_type() call, and remove > the element_type from the signature. This means initArrayResultAny() > will call the get_element_type() twice, but I guess that's negligible. > And everyone who calls initArrayResultArr() will get the error handling > for free. > > Patch v7 attached, implementing those two changes, i.e. > > * makeMdArrayResult(..., astate->private_cxt) > * move error handling into initArrayResultArr() > * remove element_type from initArrayResultArr() signature > > > Reviewing the v7 patch: > - applies cleanly to current master. 
patch format, whitespace, etc is good > - make check runs without error > - performance & memory usage still consistent > > If you think we don't have to add the comment (see above), i'll mark > this as ready for committer Attached is v8 patch, with a few comments added: 1) before initArrayResult() - explaining when it's better to use a single memory context, and when it's more efficient to use a separate memory context for each array build state 2) before makeArrayResult() - explaining that it won't free memory when allocated in a single memory context (and that a pfree() has to be used if necessary) 3) before makeMdArrayResult() - explaining that it's illegal to use release=true unless using a subcontext Otherwise the v8 patch is exactly the same as v7. Assuming the comments make it sufficiently clear, I agree with marking this as 'ready for committer'. regards -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/executor/node
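For reference, a usage sketch of the API this patch settles on (function names per the patch under review, so subject to change; values and nvalues are hypothetical caller-supplied data): a caller accumulating many arrays can pass subcontext = false and let the parent context own the memory, accepting that makeArrayResult() then frees nothing.

    #include "postgres.h"
    #include "catalog/pg_type.h"
    #include "utils/array.h"

    /* sketch: accumulate int4 values without a private memory context */
    ArrayBuildState *astate;
    Datum            result;
    int              i;

    astate = initArrayResult(INT4OID, CurrentMemoryContext, false);
    for (i = 0; i < nvalues; i++)
        astate = accumArrayResult(astate, Int32GetDatum(values[i]), false,
                                  INT4OID, CurrentMemoryContext);

    /*
     * With subcontext = false this releases nothing; the memory lives
     * until the parent context is reset, per the v8 comments.
     */
    result = makeArrayResult(astate, CurrentMemoryContext);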
Re: [HACKERS] orangutan seizes up during isolation-check
On 1/1/15 11:04 PM, Noah Misch wrote: >> Clusters hosted on OS X fall into these categories: >> >> 1) Unaffected configuration. This includes everyone setting a valid messages >>locale via LANG, LC_ALL or LC_MESSAGES. >> 2) Affected configuration. Through luck and light use, the cluster would not >>experience the crashes/hangs. >> 3) Cluster would experience the crashes/hangs. >> >> DBAs in (3) want the FATAL at startup, but those in (2) want a LOG message >> instead. DBAs in (1) don't care. Since intermittent postmaster hangs are >> far >> worse than startup failure, if (2) and (3) have similar population, FATAL is >> the better bet. If (2) is sufficiently more populous than (3), then the many >> small pricks from startup failure do add up to hurt more than the occasional >> postmaster hang. Who knows how that calculation plays out. > > The first attached patch, for all branches, adds LOG-level messages and an > assertion. So cassert builds will fail hard, while others won't. The second > patch, for master only, changes the startup-time message to FATAL. If we > decide to use FATAL in all branches, I would just squash them into one. What I'm seeing now is that the unaccent regression tests when run under make check-world abort with FATAL: postmaster became multithreaded during startup HINT: Set the LC_ALL environment variable to a valid locale. My locale settings for this purpose are LANG="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_CTYPE="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_ALL= The environment variable LANG is set, all the other ones are not set. I could presumably set LC_ALL, but then I lose the ability to set different locales for different categories. Running LC_ALL=$LANG make -C contrib/unaccent check doesn't fix it; I get the same error. The behavior is also a bit strange. The step == starting postmaster== hangs for one minute before failing. This probably has nothing to do with your change, but maybe pg_regress could detect postmaster startup failures better. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Async execution of postgres_fdw.
On Tue, Jan 13, 2015 at 6:46 AM, Kyotaro HORIGUCHI wrote: >> Is it possible to use the parallel query infrastructure being built by >> Robert or to do something like parallel seq scan? That will work, not just >> for Postgres FDW but all the FDWs. > > But, I think, from the performance view, every scan of multiple > foreign scans doesn't need a corresponding local process. Quite so. I think this is largely a separate project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving RLS qual pushdown
On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed wrote: > On 14 January 2015 at 13:29, Robert Haas wrote: >> One thing they could still leak is the number of times they got >> called, and thus possibly the number of unseen rows. Now if the >> expressions get constant-folded away that won't be an issue, but a >> clever user can probably avoid that. > > Right now, EXPLAIN ANALYSE can be used to tell you the number of > unseen rows. Is that something that people are concerned about, and > are there any plans to change it? Interesting question. I don't know. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typo fix in alter_table.sgml
On Wed, Jan 14, 2015 at 2:55 AM, Michael Paquier wrote: > I noticed that SET STATISTICS was not in a block in > alter_table.sgml: > > - SET STATISTICS acquires a SHARE UPDATE EXCLUSIVE > lock. > + SET STATISTICS acquires a > + SHARE UPDATE EXCLUSIVE lock. > > That's a small detail, still.. Patch is attached. OK, committed. But that's not a "typo" as stated in $SUBJECT but rather a "markup fix". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 13, 2015 at 6:25 AM, John Gorman wrote: > One approach that I has worked well for me is to break big jobs into much > smaller bite size tasks. Each task is small enough to complete quickly. > > We add the tasks to a task queue and spawn a generic worker pool which eats > through the task queue items. > > This solves a lot of problems. > > - Small to medium jobs can be parallelized efficiently. > - No need to split big jobs perfectly. > - We don't get into a situation where we are waiting around for a worker to > finish chugging through a huge task while the other workers sit idle. > - Worker memory footprint is tiny so we can afford many of them. > - Worker pool management is a well known problem. > - Worker spawn time disappears as a cost factor. > - The worker pool becomes a shared resource that can be managed and reported > on and becomes considerably more predictable. I think this is a good idea, but for now I would like to keep our goals somewhat more modest: let's see if we can get parallel sequential scan, and only parallel sequential scan, working and committed. Ultimately, I think we may need something like what you're talking about, because if you have a query with three or six or twelve different parallelizable operations in it, you want the available CPU resources to switch between those as their respective needs may dictate. You certainly don't want to spawn a separate pool of workers for each scan. But I think getting that all working in the first version is probably harder than what we should attempt. We have a bunch of problems to solve here just around parallel sequential scan and the parallel mode infrastructure: heavyweight locking, prefetching, the cost model, and so on. Trying to add to that all of the problems that might attend on a generic task queueing infrastructure fills me with no small amount of fear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] OOM on EXPLAIN with lots of nodes
On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane wrote: > I wrote: >> Heikki Linnakangas writes: >>> But do we really need to backpatch any of this? > >> Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB >> peak in 9.3 and up. That seems like a pretty nasty regression. > > I did a bit more measurement of the time and backend memory consumption > for Alexey's example EXPLAIN: > > 9.2: 0.9 sec, circa 200 MB > HEAD: 56 sec, circa 7300 MB > with patch below: 3.7 sec, circa 300 MB > > So while this doesn't get us all the way back down to where we were before > we started trying to guarantee unique table/column identifiers in EXPLAIN > printouts, it's at least a lot closer. > > Not sure whether to just commit this to HEAD and call it a day, or to > risk back-patching. I think we need to back-patch something; that's a pretty nasty regression, and I have some EDB-internal reports that might be from the same cause. I'm not too concerned about forcibly breaking the API here, but I can understand why somebody might want to do that. If we do, I like the idea of renaming ExplainInitState() or maybe by replacing it by a NewExplainState() function that is used instead. But I'm not sure how necessary it is really. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Safe memory allocation functions
On Tue, Jan 13, 2015 at 10:10 AM, Tom Lane wrote: > However, there is a larger practical problem with this whole concept, > which is that experience should teach us to be very wary of the assumption > that asking for memory the system can't give us will just lead to nice > neat malloc-returns-NULL behavior. Any small perusal of the mailing list > archives will remind you that very often the end result will be SIGSEGV, > OOM kills, unrecoverable trap-on-write when the kernel realizes it can't > honor a copy-on-write promise, yadda yadda. Agreed that it's arguable > that these only occur in misconfigured systems ... but misconfiguration > appears to be the default in a depressingly large fraction of systems. > (This is another reason for "_safe" not being the mot juste :-() I don't really buy this. It's pretty incredible to think that after a malloc() failure there is absolutely no hope of carrying on sanely. If that were true, we wouldn't be able to ereport() out-of-memory errors at any severity less than FATAL, but of course it doesn't work that way. Moreover, AllocSetAlloc() contains malloc() and, if that fails, calls malloc() again with a smaller value, without even throwing an error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
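As a concrete illustration of that last point, the pattern inside AllocSetAlloc() is roughly the following (a paraphrased sketch, not the verbatim aset.c code; preferred_size and required_size are stand-ins for the real block-size computations): a failed malloc() is already treated there as a recoverable event.

    /* try to allocate a generously sized block first */
    size_t  blksize = preferred_size;
    void   *block = malloc(blksize);

    /* on failure, retry with progressively smaller requests */
    while (block == NULL && blksize > required_size)
    {
        blksize >>= 1;
        if (blksize < required_size)
            blksize = required_size;
        block = malloc(blksize);
    }
    if (block == NULL)
        return NULL;            /* only now report out of memory */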
Re: [HACKERS] __attribute__ for non-gcc compilers
On Tue, Jan 13, 2015 at 4:18 PM, Oskari Saarenmaa wrote: > Commit db4ec2ffce35 added alignment attributes for 64-bit atomic > variables as required on 32-bit platforms using > __attribute__((aligned(8)). That works fine with GCC, and would work > with Solaris Studio Compiler [1] and IBM XL C [2], but src/include/c.h > defines __attribute__ as an empty macro when not using GCC. > Unfortunately we can't just disable that #define and enable all > __attributes__ for Solaris CC and XLC as we use a bunch of attributes > that are not supported by those compilers and using them unconditionally > would generate a lot of warnings. > > Attached a patch that defines custom macros for each attribute and > enables them individually for compilers that support them and never > defines __attribute__. > > I have tested this with GCC 4.9.2 on Linux x86-64 and Solaris CC 5.12 on > Sparc (32-bit build); I don't have access to an IBM box with XLC. I guess my first question is whether we want to be relying on __attribute__((aligned)) in the first place. If we do, then this seems like a pretty sensible and necessary change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan wrote: > Could you write some code to print out the block number (i.e. > "BlockNumber blkno") if there are more than, say, 5 retries within > _bt_moveright()? Obviously I mean that the block number should be printed, no matter whether or not the P_INCOMPLETE_SPLIT() path is taken. So you should just move this to the top of the for(;;) loop, so it's always available to print: BlockNumber blkno = BufferGetBlockNumber(buf); Thanks -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
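That is, something along these lines at the top of _bt_moveright()'s loop (a sketch of the suggested instrumentation, not committed code; rel and buf are the function's existing variables):

    BlockNumber blkno;
    int         retries = 0;

    for (;;)
    {
        blkno = BufferGetBlockNumber(buf);

        if (++retries > 5)
            elog(LOG, "excessive moveright retries on block %u of index \"%s\"",
                 blkno, RelationGetRelationName(rel));

        /* ... existing incomplete-split handling and moveright logic ... */
    }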
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 11:49 AM, Merlin Moncure wrote: > so it looks like nobody ever exits from _bt_moveright. any last > requests before I start bisecting down? Could you write some code to print out the block number (i.e. "BlockNumber blkno") if there are more than, say, 5 retries within _bt_moveright()? You'll also want to print out the name of the index (so rel->rd_rel->relname.data). That'll probably only trip when this happens (5 retries are very unusual). Then, use contrib/pageinspect to dump that block's contents (I think that'll work, since no one sits on a buffer lock here). I'd like to know what index we're dealing with here. I wonder if it's "pg_attribute_relid_attnam_index". So: call "bt_page_stats('pg_whatever', n)", n being the relevant block number. Try and do the same with the right link (which bt_page_stats() will show). Then try and call bt_page_items() against one or both of those blocks/pages. I'd like to see what that shows. I'm wondering if some invariant has been violated. Which _bt_moveright() caller are we talking about here, BTW? I guess it's just the _bt_first()/ _bt_search() one. That might vary, of course. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory leak in vacuumlo
On Tue, Jan 13, 2015 at 2:58 AM, Michael Paquier wrote: > Coverity has pointed out that vacuumlo.c is leaking memory when > grabbing the list of candidate table names and schemas. Attached is a > patch to fix the leak. It's only leaking in error cases, the first of which is quite unlikely. But I don't see that as a reason not to fix it, so -- committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
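The leak follows the usual libpq shape: a PGresult obtained before an error exit was never freed. The standard fix looks like this (a sketch only; the committed patch may differ in details such as the message text, and it assumes the query text sits in a PQExpBuffer named buf, as vacuumlo does):

    res = PQexec(conn, buf.data);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "Failed to find candidate tables: %s",
                PQerrorMessage(conn));
        PQclear(res);           /* previously leaked on this early exit */
        PQfinish(conn);
        return -1;
    }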
Re: [HACKERS] PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
On 11.12.2014 23:46, Tomas Vondra wrote: > On 11.12.2014 22:16, Robert Haas wrote: >> On Thu, Dec 11, 2014 at 2:51 PM, Tomas Vondra wrote: >> >>> The idea was that if we could increase the load a bit (e.g. using 2 >>> tuples per bucket instead of 1), we will still use a single batch in >>> some cases (when we miss the work_mem threshold by just a bit). The >>> lookups will be slower, but we'll save the I/O. >> >> Yeah. That seems like a valid theory, but your test results so far >> seem to indicate that it's not working out like that - which I find >> quite surprising, but, I mean, it is what it is, right? > > Not exactly. My tests show that as long as the outer table batches fit > into page cache, increasing the load factor results in worse performance > than batching. > > When the outer table is "sufficiently small", the batching is faster. > > Regarding the "sufficiently small" - considering today's hardware, we're > probably talking about gigabytes. On machines with significant memory > pressure (forcing the temporary files to disk), it might be much lower, > of course. Of course, it also depends on kernel settings (e.g. > dirty_bytes/dirty_background_bytes). > > If we could identify those cases (at least the "temp files > RAM") then > maybe we could do this. Otherwise we're going to penalize all the other > queries ... > > Maybe the best solution for now is "increase the work_mem a bit" > recommendation. I think it's time to mark this patch as rejected (or maybe returned with feedback). The patch was meant as an attempt to implement Robert's idea from the hashjoin patch, but apparently we have no clear idea how to do it without hurting performance for many existing users. Maybe we can try again later, but there's no point in keeping this in the current CF. Any objections? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] explain sortorder
Hi, we will also remove the following LC_COLLATE hint in the next version, showing only mandatory info as suggested. /* for those who use COLLATE although their default is already the wanted one */ if (strcmp(collname, localeptr) == 0) { appendStringInfo(sortorderInformation, " (%s is LC_COLLATE)", collname); } Anybody insisting on that? Arne Note: I see, at the moment we use the wrong default for DESC. We'll fix that. On Wed, 14 Jan 2015, Heikki Linnakangas wrote: On 01/14/2015 05:26 PM, Timmer, Marius wrote: Hello Heikki, abbreviated version: Sorry, the problem is only the unhandy patch text format, not different opinions on how to proceed. Long version: The v7 patch file already addressed your suggestions, but the file contained several (old) local commits, the new ones at the end of the patch text/file. Ah, missed that. I stopped reading when I saw the old stuff there :-). v7.1 is attached and addresses this issue providing a clean patch file. Ok, thanks, will take a look. V8 will - as mentioned - add missing docs and regression tests, Great! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] explain sortorder
Hello Heikki, abbreviated version: Sorry, the problem is only the unhandy patch text format, not different opinions on how to proceed. Long version: The v7 patch file already addressed your suggestions, but the file contained several (old) local commits, the new ones at the end of the patch text/file. v7.1 is attached and addresses this issue providing a clean patch file. V8 will - as mentioned - add missing docs and regression tests, as Mike suggested. VlG-Arne & Marius --- Marius Timmer Zentrum für Informationsverarbeitung Westfälische Wilhelms-Universität Münster Einsteinstraße 60 mtimm...@uni-muenster.de On 13.01.2015 at 18:52, Heikki Linnakangas wrote: > On 01/13/2015 06:04 PM, Timmer, Marius wrote: >> -malloc() (StringInfo is used as suggested now). > > There really shouldn't be any snprintf() calls in the patch, when StringInfo > is used correctly... > >> @@ -1187,6 +1187,7 @@ explain (verbose, costs off) select * from matest0 >> order by 1-id; >> Sort >>Output: matest0.id, matest0.name, ((1 - matest0.id)) >>Sort Key: ((1 - matest0.id)) >> + Sort Order: ASC NULLS LAST >>-> Result >> Output: matest0.id, matest0.name, (1 - matest0.id) >> -> Append > > This patch isn't going to be committed with this output format. Please change > per my suggestion earlier: > >> I don't like this output. If there are a lot of sort keys, it gets >> difficult to match the right ASC/DESC element to the sort key it applies >> to. (Also, there seems to be double-spaces in the list) >> >> I would suggest just adding the information to the Sort Key line. As >> long as you don't print the modifiers when they are defaults (ASC and >> NULLS LAST), we could print the information even in non-VERBOSE mode. So >> it would look something like: >> >> Sort Key: sortordertest.n1 NULLS FIRST, sortordertest.n2 DESC > > Or if you don't agree with that, explain why. > > - Heikki > explain_sortorder-v7_1.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
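A sketch of the per-key annotation rule Heikki describes (variable names are hypothetical, not from the patch): append the modifiers only when they differ from the defaults, remembering that the default null ordering flips for DESC, which is the very point Arne notes needs fixing.

    /* buf: StringInfo holding the Sort Key line; one pass per sort key */
    appendStringInfoString(buf, keyname);
    if (descending)
    {
        appendStringInfoString(buf, " DESC");
        if (!nulls_first)
            appendStringInfoString(buf, " NULLS LAST");  /* non-default for DESC */
    }
    else if (nulls_first)
        appendStringInfoString(buf, " NULLS FIRST");     /* non-default for ASC */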
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund wrote: > On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: >> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund >> wrote: >> > If you gdb in, and type 'fin' a couple times, to wait till the function >> > finishes, is there actually any progress? I'm wondering whether it's >> > just many catalog accesses + contention, or some other >> > problem. Alternatively set a breakpoint on ScanPgRelation() or so and >> > see how often it's hit. >> >> well, i restarted the database, forgetting my looper was running which >> immediately spun up and it got stuck again with a similar profile >> (lots of cpu in spinlock): >> >> Samples: 3K of event 'cycles', Event count (approx.): 2695723228 >> 31.16% postgres[.] s_lock >> 22.32% postgres[.] tas >> 12.13% postgres[.] tas >> 5.93% postgres[.] spin_delay >> 5.69% postgres[.] LWLockRelease >> 3.75% postgres[.] LWLockAcquireCommon >> 3.61% perf[.] 0x000526c4 >> 2.51% postgres[.] FunctionCall2Coll >> 1.48% libc-2.17.so[.] 0x0016a132 >> >> > If you gdb in, and type 'fin' a couple times, >> (gdb) fin >> Run till exit from #0 0x7ff4c63f7a97 in semop () from >> /lib/x86_64-linux-gnu/libc.so.6 >> 0x006de073 in PGSemaphoreLock () >> (gdb) fin >> Run till exit from #0 0x006de073 in PGSemaphoreLock () >> >> It returned once. Second time, it didn't at least so far (minute or so). > > Hm, that's autovac though, not the normal user backends that actually do > stuff, right? If you could additionally check those, it'd be great. got it to reproduce. had to take a break to get some work done. (gdb) fin Run till exit from #0 0x7f7b07099dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6 0x008d3ac4 in pg_usleep () (gdb) fin Run till exit from #0 0x008d3ac4 in pg_usleep () 0x00750b69 in s_lock () (gdb) fin Run till exit from #0 0x00750b69 in s_lock () 0x00750844 in LWLockRelease () (gdb) fin Run till exit from #0 0x00750844 in LWLockRelease () 0x0073 in LockBuffer () (gdb) fin Run till exit from #0 0x0073 in LockBuffer () 0x004b2db4 in _bt_relandgetbuf () (gdb) fin Run till exit from #0 0x004b2db4 in _bt_relandgetbuf () 0x004b7116 in _bt_moveright () (gdb) fin Run till exit from #0 0x004b7116 in _bt_moveright () so it looks like nobody ever exits from _bt_moveright. any last requests before I start bisecting down? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Jan 13, 2015 at 4:41 PM, Peter Eisentraut wrote: > I think the key point I'm approaching is that the information should > only ever be in one place, all the time. This is not dissimilar from > why we took the tablespace location out of the system catalogs. Users > might have all kinds of workflows for how they back up, restore, and > move their tablespaces. This works pretty well right now, because the > authoritative configuration information is always in plain view. The > proposal is essentially that we add another location for this > information, because the existing location is incompatible with some > operating system tools. And, when considered by a user, that second > location might or might not collide with or overwrite the first location > at some mysterious times. > > So I think the preferable fix is not to add a second location, but to > make the first location compatible with said operating system tools, > possibly in the way I propose above. I see. I'm a little concerned that following symlinks may be cheaper than whatever system we would come up with for caching the tablespace-name-to-file-name mappings. But that concern might be unfounded, and apart from it I have no reason to oppose your proposal, if you want to do the work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On 1/13/15 5:02 AM, David Rowley wrote:
> I can't quite get my head around what you mean here, as the idea sounds quite similar to something that's been discussed already and ruled out. If we're joining relation a to relation b, say the plan chosen is a merge join. If we put some special node as the parent of the merge join, then how will we know to skip or not skip any sorts that are there especially for the merge join? Or perhaps the planner chose an index scan as a sorted path, which, now that the join is removed, could become a faster seqscan. The whole plan switching node discussion came from this exact problem. Nobody seemed to like the non-optimal plan that was not properly optimised for having the relation removed.

In my mind this is the same as a root level Alternative Plan, so you'd be free to do whatever you wanted in the alternate:

-> blah blah
   -> Alternate
      -> Merge join
         ...
      -> SeqScan

I'm guessing this would be easier to code, but that's just a guess. The other advantage is if you can't eliminate the join to table A at runtime you could still eliminate table B, whereas a top-level Alternate node doesn't have that flexibility. This does have a disadvantage of creating more plan variations to consider. With a single top-level Alternate node there's only one other option. I believe multiple Alternates would create more options to consider. Ultimately, unless this is easier to code than a top-level alternate, it's probably not worth pursuing.

> It also seems that transitioning through needless nodes comes at a cost. This is why I quite liked the Alternative Plan node idea, as it allowed me to skip over the alternative plan node at plan initialisation.

For init I would expect this to result in a smaller number of nodes than a top-level Alternate, because you wouldn't be duplicating all the stuff above the joins. That said, I rather doubt it's worth worrying about the cost to init; won't it be completely swamped by extra planning cost, no matter how we go about this? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Dean Rasheed (dean.a.rash...@gmail.com) wrote: > Turns out it wasn't as simple as that. prepend_row_security_policies() > really could get called multiple times for the same RTE, because the > call to query_tree_walker() at the end of fireRIRrules() would descend > into the just-added quals again. The simplest fix seems to be to > process RLS in a separate loop at the end, so that it can have its > own infinite recursion detection, which is different from that needed > for pre-existing security quals and with check options from security > barrier views. This approach simplifies things a bit, and ensures that > we only try to expand RLS once for each RTE. Right, I specifically recall prepend_row_security_policies() getting called multiple times for the same RTE. I like this approach of using a separate loop, though, and it strikes me that it lends more weight to the argument that we're better off with these as independent considerations. > > Also, I'm thinking that it would be better to refactor things a bit > > and have prepend_row_security_policies() just return the new > > securityQuals and withCheckOptions to add. Then fireRIRrules() would > > only have to recurse into the new quals being added, not the > > already-processed quals. Hmm, good point. > Turns out that refactoring actually became necessary in order to fix > this bug, but I think it makes things cleaner and more efficient. Sounds good, I'll take a look. > Here's an updated patch with a new test for this bug. I've been > developing the fixes for these RLS issues as one big patch, but I > suppose it would be easy to split up, if that's preferred. I'm alright with it as-is for now. Thanks! Stephen
Re: [HACKERS] ereport bug
On Wed, Jan 14, 2015 at 1:40 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Jan 12, 2015 at 6:27 AM, Dmitry Voronin >> wrote: >>> I am attaching to this letter a test case that shows the behavior >>> errcontext() macro and the way to fix it. > >> So the upshot of this is that given errfinish(A, B, C), where A, B, >> and C are expressions, my gcc is choosing to evaluate C, then B, then >> A, then the errfinish call itself. But whoever wrote the errcontext() >> macro evidently thought, in this kind of situation, the compiler would >> be certain to evaluate A, then B, then C, then errfinish. But it >> doesn't. > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1f9bf05e539646103c518bcbb49c04919b238f7a Oops, sorry. I missed that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 14 January 2015 at 08:43, Dean Rasheed wrote: > On 12 January 2015 at 14:24, Stephen Frost wrote: >> Interesting, thanks for the work! I had been suspicious that there was >> an issue with the recursion handling. >> > So continuing to review the RLS code, I spotted the following in > prepend_row_security_policies():
>
> /*
>  * We may end up getting called multiple times for the same RTE, so check
>  * to make sure we aren't doing double-work.
>  */
> if (rte->securityQuals != NIL)
>     return false;
>
> which looked suspicious. I assume that's just a hang-up from an > earlier attempt to prevent infinite recursion in RLS expansion, but > actually it's wrong because in the case of an UPDATE to a security > barrier view on top of a table with RLS enabled, the view's > securityQuals will get added to the RTE first, and that shouldn't > prevent the underlying table's RLS securityQuals from being added. > > AFAICT, it should be safe to just remove the above code. I can't see > how prepend_row_security_policies() could end up getting called more > than once for the same RTE. > Turns out it wasn't as simple as that. prepend_row_security_policies() really could get called multiple times for the same RTE, because the call to query_tree_walker() at the end of fireRIRrules() would descend into the just-added quals again. The simplest fix seems to be to process RLS in a separate loop at the end, so that it can have its own infinite recursion detection, which is different from that needed for pre-existing security quals and with check options from security barrier views. This approach simplifies things a bit, and ensures that we only try to expand RLS once for each RTE. > Also, I'm thinking that it would be better to refactor things a bit > and have prepend_row_security_policies() just return the new > securityQuals and withCheckOptions to add. Then fireRIRrules() would > only have to recurse into the new quals being added, not the > already-processed quals. > Turns out that refactoring actually became necessary in order to fix this bug, but I think it makes things cleaner and more efficient. Here's an updated patch with a new test for this bug. I've been developing the fixes for these RLS issues as one big patch, but I suppose it would be easy to split up, if that's preferred. Regards, Dean

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
new file mode 100644
index 9cbbcfb..34ddf41
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** inheritance_planner(PlannerInfo *root)
*** 790,795 ****
--- 790,796 ----
  {
  	Query	   *parse = root->parse;
  	int			parentRTindex = parse->resultRelation;
+ 	Bitmapset  *resultRTindexes = NULL;
  	List	   *final_rtable = NIL;
  	int			save_rel_array_size = 0;
  	RelOptInfo **save_rel_array = NULL;
*** inheritance_planner(PlannerInfo *root)
*** 814,820 ****
--- 815,835 ----
  	 * (1) would result in a rangetable of length O(N^2) for N targets, with
  	 * at least O(N^3) work expended here; and (2) would greatly complicate
  	 * management of the rowMarks list.
+ 	 *
+ 	 * Note that any RTEs with security barrier quals will be turned into
+ 	 * subqueries during planning, and so we must create copies of them too,
+ 	 * except where they are target relations, which will each only be used
+ 	 * in a single plan.
  	 */
+ 	resultRTindexes = bms_add_member(resultRTindexes, parentRTindex);
+ 	foreach(lc, root->append_rel_list)
+ 	{
+ 		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+ 		if (appinfo->parent_relid == parentRTindex)
+ 			resultRTindexes = bms_add_member(resultRTindexes,
+ 											 appinfo->child_relid);
+ 	}
+ 
  	foreach(lc, root->append_rel_list)
  	{
  		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
*** inheritance_planner(PlannerInfo *root)
*** 885,905 ****
  		{
  			RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);
  
! 			if (rte->rtekind == RTE_SUBQUERY)
  			{
  				Index		newrti;
  
  				/*
  				 * The RTE can't contain any references to its own RT
! 				 * index, so we can save a few cycles by applying
! 				 * ChangeVarNodes before we append the RTE to the
! 				 * rangetable.
  				 */
  				newrti = list_length(subroot.parse->rtable) + 1;
  				ChangeVarNodes((Node *) subroot.parse, rti, newrti, 0);
  				ChangeVarNodes((Node *) subroot.rowMarks, rti, newrti, 0);
  				ChangeVarNodes((Node *) subroot.append_rel_list, rti, newrti, 0);
  				rte = copyObject(rte);
  				subroot.parse->rtable = lappend(subroot.parse->rtable, rte);
  			}
--- 900,928 ----
  		{
  			RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);
  
! 			/*
! 			 * Copy subquery RTEs and RTEs with security barrier quals
! 			 * that will be turned into subqueries, except those that are
! 			 * target relations.
! 			 */
! 			if (rte->rtekind == RTE_SUBQUERY ||
! 				(rte->securityQuals
Re: [HACKERS] libpq bad async behaviour
On 14 January 2015 at 08:40, Andres Freund wrote: > I think that kind of solution isn't likely to be satisfying. The amount > of porting work is just not going to be worth the cost. And it won't be > easily hideable in the API at all as the callers will expect a normal > fd. > All that consumers of the API need is something they can `select()` (or an equivalent) on. > > Yeah, this is a problem. Fixing it isn't easy, though, I think. > > I think > extern PostgresPollingStatusType PQconnectPoll(PGconn *conn); > has the right interface. It returns what upper layers need to wait > for. I think we should extend pretty much that to more interfaces. This would be a fine solution. That enum indeed has the correct values/semantics. > This > likely means that we'll need extended versions of PQflush() and > PQconsumeInput() - afaics it shouldn't be much more? > PQping? PQconnectPoll already has it. Though, I think we could probably even reduce this down to a single common function for all cases: PQpoll() or similar.
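For reference, here is what that interface already looks like in practice for connection establishment; a minimal sketch of a caller-driven wait loop built on the existing PQconnectStart()/PQconnectPoll() API (error handling abbreviated, and the helper's name is ours, not libpq's):

#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

static PGconn *
connect_nonblocking(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	/* after PQconnectStart, behave as if the last result was WRITING */
	PostgresPollingStatusType status = PGRES_POLLING_WRITING;

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		return NULL;

	while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
	{
		int			sock = PQsocket(conn);
		fd_set		input_mask,
					output_mask;

		FD_ZERO(&input_mask);
		FD_ZERO(&output_mask);
		if (status == PGRES_POLLING_READING)
			FD_SET(sock, &input_mask);
		else
			FD_SET(sock, &output_mask);

		/* the caller owns the event loop; libpq only says what to wait for */
		if (select(sock + 1, &input_mask, &output_mask, NULL, NULL) < 0)
			break;

		status = PQconnectPoll(conn);
	}

	if (PQstatus(conn) != CONNECTION_OK)
	{
		PQfinish(conn);
		return NULL;
	}
	return conn;
}

Extended PQflush()/PQconsumeInput() variants, or a unified PQpoll(), would presumably drop into the same loop and return the same enum.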
Re: [HACKERS] ereport bug
Robert Haas writes: > On Mon, Jan 12, 2015 at 6:27 AM, Dmitry Voronin > wrote: >> I am attaching to this letter a test case that shows the behavior >> errcontext() macro and the way to fix it. > So the upshot of this is that given errfinish(A, B, C), where A, B, > and C are expressions, my gcc is choosing to evaluate C, then B, then > A, then the errfinish call itself. But whoever wrote the errcontext() > macro evidently thought, in this kind of situation, the compiler would > be certain to evaluate A, then B, then C, then errfinish. But it > doesn't. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1f9bf05e539646103c518bcbb49c04919b238f7a regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 7:22 AM, Merlin Moncure wrote: > I'll try to pull commits that Peter suggested and see if that helps > (I'm getting ready to bring the database down). I can send the code > off-list if you guys think it'd help. Thanks for the code! I think it would be interesting to see if the bug reproduces without the page split commit (40dae7ec537c5619fc93ad602c62f37be786d161) first (which was applied afterwards). Then, without the page deletion commit (efada2b8e920adfdf7418862e939925d2acd1b89). But without the page split commit seems most interesting at the moment. It would be easier to diagnose the issue if the bug still reproduces without that commit - that way, we can be reasonably confident that there are no confounding factors from the new page split code. There were some bugfixes to those two during the 9.4 beta cycle, too. It might be a bit tricky to generate a 9.4 that lacks one or both of those commits. Bugfix commits in reverse chronological order:

c73669c0e0168923e3f9e787beec980f55af2bd8 (deletion)
c91a9b5a285e20e54cf90f3660ce51ce3a5c2ef4 (incomplete split)
4fafc4ecd9e4d224d92c4a8549c5646860787a5d (deletion)
4a5d55ec2b711e13438a32d119a809a22ced410b (incomplete split)
77fe2b6d795f3f4ed282c9c980920e128a57624e (deletion)
7d98054f0dd115f57ad0ec1f424a66c13459013b (deletion)
954523cdfe229f1cb99a43a19e291a557ae2822d (incomplete split)

I think that's all of them (apart from the original commits, of course). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
On Mon, Jan 12, 2015 at 12:57 PM, Andres Freund wrote: > On 2015-01-10 23:03:36 +0100, Andres Freund wrote: >> On 2015-01-10 16:09:42 -0500, Tom Lane wrote: >> > I've not tried to build HEAD on my HPPA dinosaur for awhile, but I did >> > just now, and I am presented with boatloads of this:
>> >
>> > ../../../src/include/storage/s_lock.h:759: warning: `S_UNLOCK' redefined
>> > ../../../src/include/storage/s_lock.h:679: warning: this is the location of the previous definition
>> >
>> > which is not too surprising because the "default" definition at line 679 precedes the HPPA-specific one at line 759. >> >> That's 0709b7ee72e4bc71ad07b7120acd117265ab51d0. >> >> Not too surprising that it broke and wasn't noticed without access to >> hppa - the hppa code uses gcc inline assembly outside of the big >> defined(__GNUC__) and inside the section headed "Platforms that use >> non-gcc inline assembly". >> >> > I'm not particularly interested in untangling the effects of the recent >> > hackery in s_lock.h enough to figure out how the overall structure got >> > broken, but I trust one of you will clean up the mess. >> >> I think it's easiest solved by moving the gcc inline assembly up to the >> rest of the gcc inline assembly. That'll require duplicating a couple >> lines, but seems easier to understand nonetheless. Not pretty. > > Robert, do you have a better idea? How about hoisting the entire hppa section up to the top of the file, before the __GNUC__ || __INTEL_COMPILER section? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
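The warning itself is easy to reproduce outside s_lock.h; this toy file (not the real header, just the same ordering mistake) shows why the generic fallback definition must not be reached before the platform-specific one:

#include <stdio.h>

/* generic fallback defined first, as in the broken s_lock.h ordering ... */
#define S_UNLOCK()	puts("generic unlock")

/* ... then the platform-specific section is reached further down */
#define S_UNLOCK()	puts("hppa unlock")	/* gcc: warning: "S_UNLOCK" redefined */

int
main(void)
{
	S_UNLOCK();
	return 0;
}

Both proposals above amount to making sure the hppa definitions are seen first, so the fallback either never fires or stays properly guarded.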
Re: [HACKERS] ereport bug
On Mon, Jan 12, 2015 at 6:27 AM, Dmitry Voronin wrote: > I am attaching to this letter a test case that shows the behavior > errcontext() macro and the way to fix it. So the upshot of this is that given errfinish(A, B, C), where A, B, and C are expressions, my gcc is choosing to evaluate C, then B, then A, then the errfinish call itself. But whoever wrote the errcontext() macro evidently thought, in this kind of situation, the compiler would be certain to evaluate A, then B, then C, then errfinish. But it doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
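The underlying C rule is easy to demonstrate in isolation: in a call f(A, B, C), the order in which A, B and C are evaluated is unspecified, so any macro scheme relying on left-to-right side effects is broken. A standalone demo (nothing PostgreSQL-specific about it):

#include <stdio.h>

static int
logged(const char *name)
{
	printf("evaluating %s\n", name);	/* a side effect, like errstart() etc. */
	return 0;
}

static void
f(int a, int b, int c)
{
	(void) a;
	(void) b;
	(void) c;
}

int
main(void)
{
	/* gcc may well print C, B, A here; another compiler A, B, C */
	f(logged("A"), logged("B"), logged("C"));
	return 0;
}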
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai wrote: > When a custom-scan node replaces a join-plan, it shall have at least two > child plan-nodes. The callback handler of PlanCustomPath needs to be > able to call create_plan_recurse() to transform the underlying path-nodes > to plan-nodes, because this custom-scan node may take other built-in > scan or sub-join nodes as its inner/outer input. > In case of FDW, it shall kick any underlying scan relations to the remote > side, thus we may not expect ForeignScan to have underlying plans... Do you have an example of this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
- Original Message -
> From: Sawada Masahiko
> To: Tom Lane
> Cc: PostgreSQL-development
> Sent: Wednesday, 14 January 2015, 16:09
> Subject: Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
>
> On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane wrote: >> Sawada Masahiko writes: >>> postgresql.auto.conf is loaded after postgresql.conf >>> whenever the configuration files are loaded or reloaded. >>> This means that a parameter in postgresql.auto.conf has quite high >>> priority, so the same parameter in postgresql.conf does not take effect >>> at all even if the user sets it manually. >> >>> If the user wants to change a stopped postgresql server's configuration, >>> then the user needs to merge the two configuration files (postgresql.conf >>> and postgresql.auto.conf) while maintaining their consistency manually. >>> From an operational perspective, having a written config with duplicate >>> entries is not a good thing. >>> I think we need to merge the two configuration files into one (or more >>> than one, if something like 'include' is used). >> >>> One solution is to add a merging tool/command which merges the two >>> configuration files while maintaining consistency. >>> Has this topic already been discussed? >> >> Yes. The entire reason that postgresql.auto.conf is separate is that >> we despaired of reading and rewriting postgresql.conf automatically >> without making a hash of material in the comments. Calling the logic >> a "merge tool" does not make that problem go away. >> > > The merge tool would not only merge all the parameters of the two > configuration files into one file but also remove duplicate parameters. > That is, the configuration files would logically become one file. I think the point is that you can't really interpret the context of comments with an automated tool, so you'd probably construct a misleading conf file. I created something to mash conf files together in perl, but mashing them together is all it does: https://github.com/glynastill/pg_upgrade_conf -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Fri, Jan 2, 2015 at 11:52 AM, Bruce Momjian wrote: > OK, so given your stats, the feature gives a 12.5% reduction in I/O. If > that is significant, shouldn't we see a performance improvement? If we > don't see a performance improvement, is I/O reduction worthwhile? Is it > valuable in that it gives non-database applications more I/O to use? Is > that all? > > I suggest we at least document that this feature is mostly useful for > I/O reduction, and maybe say CPU usage and performance might be > negatively impacted. > > OK, here is the email I remember from Fujii Masao in this same thread that > showed a performance improvement for WAL compression: > > http://www.postgresql.org/message-id/CAHGQGwGqG8e9YN0fNCUZqTTT=hnr7ly516kft5ffqf4pp1q...@mail.gmail.com > > Why are we not seeing the 33% compression and 15% performance > improvement he saw? What am I missing here? Bruce, some database workloads are I/O bound and others are CPU bound. Any patch that reduces I/O by using CPU is going to be a win when the system is I/O bound and a loss when it is CPU bound. I'm not really sure what else to say about that; it seems pretty obvious. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On Mon, Jan 5, 2015 at 12:54 PM, Tom Lane wrote: > TBH, my first reaction to this entire patch is unfavorable: it's a > solution in search of a problem. It adds substantial complication not > only for users but for PG developers in order to solve a rather narrow > performance issue. > > What would make sense to me is to teach the planner about inlining > SQL functions that include ORDER BY clauses, so that the performance > issue of a double sort could be avoided entirely transparently to > the user. That's not a bad idea, but it only helps for SQL functions. Actually, the problem I have run into in the past was not that the planner didn't know the ordering of the SRF's return value, but that it had no statistics for it, and therefore made bad optimization decisions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Thu, Jan 15, 2015 at 1:15 AM, Tom Lane wrote: > Sawada Masahiko writes: >> On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane wrote: >>> Yes. The entire reason that postgresql.auto.conf is separate is that >>> we despaired of reading and rewriting postgresql.conf automatically >>> without making a hash of material in the comments. Calling the logic >>> a "merge tool" does not make that problem go away. > >> The merge tool would not only merge all the parameters of the two >> configuration files into one file but also remove duplicate parameters. >> That is, the configuration files would logically become one file. > > I'll just say one more time that if we thought this could work, we'd not > have arrived at the separate-files design to begin with. You can work > on it if you like, but I will bet a good deal that you will not end up > with something that gets accepted. > Yep, I don't intend to propose that again. Because I thought that maintaining the configuration files would be complicated, I just thought we could add a supporting tool. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC
Hi Marco,

thank you for sending an updated patch. I am writing down a report of this initial (and partial) review.

IMPORTANT: This patch is not complete, as stated by Marco. See the "Conclusions" section for my proposed TODO list.

== Patch application

I have been able to successfully apply your patch and compile it. Regression tests passed.

== Initial run

I have created a fresh new instance of PostgreSQL and activated streaming replication to be used by pg_basebackup. I have done a pgbench run with scale 100. I have taken a full consistent backup with pg_basebackup (in plain format):

pg_basebackup -v -F p -D $BACKUPDIR/backup-$(date '+%s') -x

I have been able to verify that the backup_profile is correctly placed in the destination PGDATA directory. Here is an excerpt:

POSTGRESQL BACKUP PROFILE 1
START WAL LOCATION: 0/358 (file 00010003)
CHECKPOINT LOCATION: 0/38C
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2015-01-14 10:07:07 CET
LABEL: pg_basebackup base backup
FILE LIST
\N \N t 1421226427 206 backup_label
\N \N t 1421225508 88 postgresql.auto.conf
...

As suggested by Marco, I have manually taken the LSN from this file (the next version must do this automatically). I have then executed pg_basebackup and activated the incremental feature by using the LSN from the previous backup, as follows:

LSN=$(awk '/^START WAL/{print $4}' backup_profile)
pg_basebackup -v -F p -D $BACKUPDIR/backup-$(date '+%s') -I $LSN -x

The time taken by this operation has been much lower than the previous one and the size is much lower (I have not done any operation in the meantime):

du -hs backup-1421226*
1,5G backup-1421226427
17M backup-1421226427

I have done some checks on the file system and then used the prototype of the recovery script in Python written by Marco.

./recover.py backup-1421226427 backup-1421226427 new-data

The cluster started successfully. I have then run a pg_dump of the pgbench database and was able to reload it on the initial cluster.

== Conclusions

The first run of this patch seems promising. While the discussion on the LSN map continues (which is mainly an optimisation of this patch), I would really like to see this patch progress, as it would be a killer feature in several contexts (not in every context). Just in this period we are releasing file-based incremental backup for Barman, and customers using the alpha version are experiencing on average a deduplication ratio between 50% and 70%. This is for example an excerpt of "barman show-backup" from one of our customers (a daily saving of 550GB is not bad):

Base backup information:
  Disk usage       : 1.1 TiB (1.1 TiB with WALs)
  Incremental size : 564.6 GiB (-50.60%)
  ...

My opinion, Marco, is that for version 5 of this patch, you:

1) update the information on the wiki (it is outdated - I know you have been busy with the LSN map optimisation)
2) modify pg_basebackup in order to accept a directory (or tar file) and automatically detect the LSN from the backup profile
3) add the documentation regarding the backup profile and pg_basebackup

Once we have all of this, we can continue trying the patch. Some unexplored paths are:

* tablespace usage
* tar format
* performance impact (in both "read-only" and heavily updated contexts)
* consistency checks

I would then leave for version 6 the pg_restorebackup utility (unless you want to do everything at once). One limitation of the current recovery script is that it cannot accept multiple incremental backups (it just accepts three parameters: base backup, incremental backup and merge destination).
Maybe you can change the syntax as follows:

./recover.py DESTINATION BACKUP_1 BACKUP_2 [BACKUP_3, ...]

Thanks a lot for working on this. I am looking forward to continuing the review. Ciao, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia - Managing Director PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it 2015-01-13 17:21 GMT+01:00 Marco Nenciarini : > On 13/01/15 12:53, Gabriele Bartolini wrote: > > Hi Marco, > > could you please send an updated version of the patch against the current > > HEAD in order to facilitate reviewers? > > Here is the updated patch for incremental file-based backup. > It is based on the current HEAD. > I'm now working on the client tool to rebuild a full backup starting > from a file-based incremental backup. > Regards, > Marco > -- > Marco Nenciarini - 2ndQuadrant Italy > PostgreSQL Training, Services and Support > marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it >
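As an aside, the LSN extraction that the review above does with awk is trivial for a client tool to fold in; a hedged C sketch of the same parse, assuming the backup_profile format shown earlier stays stable:

#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* equivalent of: awk '/^START WAL/{print $4}' backup_profile */
	FILE	   *fp = fopen("backup_profile", "r");
	char		line[1024];

	if (fp == NULL)
	{
		perror("backup_profile");
		return 1;
	}
	while (fgets(line, sizeof(line), fp) != NULL)
	{
		char		lsn[64];

		if (sscanf(line, "START WAL LOCATION: %63s", lsn) == 1)
		{
			printf("%s\n", lsn);	/* e.g. "0/358" from the excerpt above */
			break;
		}
	}
	fclose(fp);
	return 0;
}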
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
Sawada Masahiko writes: > On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane wrote: >> Yes. The entire reason that postgresql.auto.conf is separate is that >> we despaired of reading and rewriting postgresql.conf automatically >> without making a hash of material in the comments. Calling the logic >> a "merge tool" does not make that problem go away. > The merge tool would not only merge all the parameters of the two > configuration files into one file but also remove duplicate parameters. > That is, the configuration files would logically become one file. I'll just say one more time that if we thought this could work, we'd not have arrived at the separate-files design to begin with. You can work on it if you like, but I will bet a good deal that you will not end up with something that gets accepted. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
On 2015-01-14 11:09:10 -0500, Tom Lane wrote: > Andres Freund writes: > > I've already given up... Given how infrequent it is, suppressing it for > > gull seems sufficient. > > I'm confused --- I see no format warnings in gull's current reports. Sorry, it was me being confused. I somehow switched gaur and gull because I had tabs open for both. Turns out that gaur doesn't do %z at all... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
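For anyone not following the buildfarm details: %z is the C99 length modifier for printing size_t values, and it is exactly the kind of thing a vintage toolchain like gaur's lacks. A two-line illustration:

#include <stdio.h>

int
main(void)
{
	size_t		n = sizeof(long);

	printf("sizeof(long) = %zu\n", n);	/* C99 %z; pre-C99 compilers warn or misprint */
	return 0;
}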
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
Andres Freund writes: > I've already given up... Given how infrequent it is, suppressing it for > gull seems sufficient. I'm confused --- I see no format warnings in gull's current reports. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane wrote: > Sawada Masahiko writes: >> postgresql.auto.conf is loaded after postgresql.conf >> whenever the configuration files are loaded or reloaded. >> This means that a parameter in postgresql.auto.conf has quite high >> priority, so the same parameter in postgresql.conf does not take effect >> at all even if the user sets it manually. > >> If the user wants to change a stopped postgresql server's configuration, >> then the user needs to merge the two configuration files (postgresql.conf >> and postgresql.auto.conf) while maintaining their consistency manually. >> From an operational perspective, having a written config with duplicate >> entries is not a good thing. >> I think we need to merge the two configuration files into one (or more >> than one, if something like 'include' is used). > >> One solution is to add a merging tool/command which merges the two >> configuration files while maintaining consistency. >> Has this topic already been discussed? > > Yes. The entire reason that postgresql.auto.conf is separate is that > we despaired of reading and rewriting postgresql.conf automatically > without making a hash of material in the comments. Calling the logic > a "merge tool" does not make that problem go away. > The merge tool would not only merge all the parameters of the two configuration files into one file but also remove duplicate parameters. That is, the configuration files would logically become one file. It will clearly be complicated for the user to rewrite postgresql.conf manually while maintaining its consistency. (On top of that, executing the ALTER SYSTEM command is not allowed for any user except a superuser.) Would it be bad for ALTER SYSTEM to parse postgresql.conf again in AlterSystemSetConfigFile() to get the line number and file name of that parameter? (Or postgresql could continue to keep the line number and file name of each parameter.) Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund wrote: > > If you gdb in, and type 'fin' a couple times, to wait till the function > > finishes, is there actually any progress? I'm wondering whether it's > > just many catalog accesses + contention, or some other > > problem. Alternatively set a breakpoint on ScanPgRelation() or so and > > see how often it's hit.
>
> well, i restarted the database, forgetting my looper was running which > immediately spun up and it got stuck again with a similar profile > (lots of cpu in spinlock):
>
> Samples: 3K of event 'cycles', Event count (approx.): 2695723228
>  31.16%  postgres      [.] s_lock
>  22.32%  postgres      [.] tas
>  12.13%  postgres      [.] tas
>   5.93%  postgres      [.] spin_delay
>   5.69%  postgres      [.] LWLockRelease
>   3.75%  postgres      [.] LWLockAcquireCommon
>   3.61%  perf          [.] 0x000526c4
>   2.51%  postgres      [.] FunctionCall2Coll
>   1.48%  libc-2.17.so  [.] 0x0016a132
>
> > If you gdb in, and type 'fin' a couple times,
> (gdb) fin
> Run till exit from #0 0x7ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
> 0x006de073 in PGSemaphoreLock ()
> (gdb) fin
> Run till exit from #0 0x006de073 in PGSemaphoreLock ()
>
> It returned once. Second time, it didn't at least so far (minute or so).

Hm, that's autovac though, not the normal user backends that actually do stuff, right? If you could additionally check those, it'd be great. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund wrote: > If you gdb in, and type 'fin' a couple times, to wait till the function > finishes, is there actually any progress? I'm wondering whether it's > just many catalog accesses + contention, or some other > problem. Alternatively set a breakpoint on ScanPgRelation() or so and > see how often it's hit. well, i restarted the database, forgetting my looper was running which immediately spun up and it got stuck again with a similar profile (lots of cpu in spinlock):

Samples: 3K of event 'cycles', Event count (approx.): 2695723228
 31.16%  postgres      [.] s_lock
 22.32%  postgres      [.] tas
 12.13%  postgres      [.] tas
  5.93%  postgres      [.] spin_delay
  5.69%  postgres      [.] LWLockRelease
  3.75%  postgres      [.] LWLockAcquireCommon
  3.61%  perf          [.] 0x000526c4
  2.51%  postgres      [.] FunctionCall2Coll
  1.48%  libc-2.17.so  [.] 0x0016a132

> If you gdb in, and type 'fin' a couple times,

(gdb) fin
Run till exit from #0 0x7ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
0x006de073 in PGSemaphoreLock ()
(gdb) fin
Run till exit from #0 0x006de073 in PGSemaphoreLock ()

It returned once. Second time, it didn't at least so far (minute or so). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
Sawada Masahiko writes: > postgresql.auto.conf is loaded after postgresql.conf > whenever the configuration files are loaded or reloaded. > This means that a parameter in postgresql.auto.conf has quite high > priority, so the same parameter in postgresql.conf does not take effect > at all even if the user sets it manually. > If the user wants to change a stopped postgresql server's configuration, > then the user needs to merge the two configuration files (postgresql.conf > and postgresql.auto.conf) while maintaining their consistency manually. > From an operational perspective, having a written config with duplicate > entries is not a good thing. > I think we need to merge the two configuration files into one (or more > than one, if something like 'include' is used). > One solution is to add a merging tool/command which merges the two > configuration files while maintaining consistency. > Has this topic already been discussed? Yes. The entire reason that postgresql.auto.conf is separate is that we despaired of reading and rewriting postgresql.conf automatically without making a hash of material in the comments. Calling the logic a "merge tool" does not make that problem go away. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Merging postgresql.conf and postgresql.auto.conf
Hi all, postgresql.auto.conf is loaded after postgresql.conf whenever the configuration files are loaded or reloaded. This means that a parameter in postgresql.auto.conf has quite high priority, so the same parameter in postgresql.conf does not take effect at all even if the user sets it manually. If the user wants to change a stopped postgresql server's configuration, then the user needs to merge the two configuration files (postgresql.conf and postgresql.auto.conf) while maintaining their consistency manually. From an operational perspective, having a written config with duplicate entries is not a good thing. I think we need to merge the two configuration files into one (or more than one, if something like 'include' is used). One solution is to add a merging tool/command which merges the two configuration files while maintaining consistency. Has this topic already been discussed? Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] explain sortorder
On 01/14/2015 05:26 PM, Timmer, Marius wrote:
> Hello Heikki,
>
> abbreviated version: Sorry, the problem is only the unhandy patch text format, not different opinions on how to proceed.
>
> Long version: The v7 patch file already addressed your suggestions, but the file contained several (old) local commits, the new ones at the end of the patch text/file.

Ah, missed that. I stopped reading when I saw the old stuff there :-).

> v7.1 is attached and addresses this issue, providing a clean patch file.

Ok, thanks, will take a look.

> V8 will - as mentioned - add missing docs and regression tests,

Great! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund wrote: > > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > >> Merlin Moncure writes: > >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: > >> >> What are the autovac processes doing (according to pg_stat_activity)? > >> > >> > pid,running,waiting,query > >> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class > > > > It'd be interesting to know whether that vacuum gets very frequent > > semaphore wakeups. Could you strace it for a second or three?
>
> for 30 seconds+ it just looks like this:
> mmoncure@mernix2 ~ $ sudo strace -p 7105
> Process 7105 attached
> semop(5701638, {{4, -1, 0}}, 1

Ok. So that explains why it's not interruptible.

> all of the other processes are yielding out of the spinlock, for example:
> select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout)

Note the above isn't the spinlock, it's the process's semaphore. It'll only get set if the refcount ever indicates that nobody but autovac is holding the lock.

> > How did this perform < 9.4? > this is a new project. However, I can run it vs an earlier version. Hm. Odd. The first -g profile seemed to indicate a hell of a lot of time was spent in LWLockRelease() - indicating that there's actually progress. Later profiles/backtraces were less clear. If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog accesses + contention, or some other problem. Alternatively set a breakpoint on ScanPgRelation() or so and see how often it's hit. > > Can you guess how many times these dynamic > > statements are planned? How many different relations are accessed in the > > dynamically planned queries? > > only once or twice, and only a couple of tables. > I can send the code off-list if you guys think it'd help. Might be interesting. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund wrote: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Merlin Moncure writes: >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: >> >> What are the autovac processes doing (according to pg_stat_activity)?
>>
>> > pid,running,waiting,query
>> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class
>
> It'd be interesting to know whether that vacuum gets very frequent > semaphore wakeups. Could you strace it for a second or three?

for 30 seconds+ it just looks like this:

mmoncure@mernix2 ~ $ sudo strace -p 7105
Process 7105 attached
semop(5701638, {{4, -1, 0}}, 1

all of the other processes are yielding out of the spinlock, for example:

select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout)

> How did this perform < 9.4?

this is a new project. However, I can run it vs an earlier version.

> Can you guess how many times these dynamic > statements are planned? How many different relations are accessed in the > dynamically planned queries?

only once or twice, and only a couple of tables. This is an operation that should only take a few seconds (inserting a few 10s of thousands of rows), but it has been blocked for many hours now. Usually it runs through in a few seconds. This is either a deadlock or a sequence of operations that emulates one. I'll try to pull the commits that Peter suggested and see if that helps (I'm getting ready to bring the database down). I can send the code off-list if you guys think it'd help. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 10:13:32 -0500, Tom Lane wrote: > Merlin Moncure writes: > > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > > can see above, it has a shorter runtime. So it was killed off once > > about a half hour ago which did not free up the logjam. However, AV > > spawned it again and now it does not respond to cancel. > > Interesting. That seems like there might be two separate issues at > play. It's plausible that LockBufferForCleanup might be interfering > with other attempts to scan the index, but then why wouldn't killing > the AV have unstuck things? LockBufferForCleanup() unfortunately isn't interruptible. I've every now and then seen vacuums being stuck for a long while, trying to acquire cleanup locks - IIRC I complained about that on list even. So autovac will only be cancelled when going to the next page. And even if the page ever gets a zero (well, one) refcount, by the time autovac is woken up via the semaphore, it'll often end up being used again. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
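The semop() wait seen in the strace upthread is worth spelling out, because it explains the "stuck" appearance: a robust semaphore wait retries on EINTR, so a signal alone never gets the waiter out; only a matching increment does. A toy, self-contained rendition of that wait shape (not PostgreSQL's PGSemaphoreLock(), and capped at three retries so it terminates):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

static void
on_alarm(int sig)
{
	(void) sig;					/* exists only to interrupt semop() */
}

int
main(void)
{
	/* on Linux a fresh SysV semaphore starts at 0, so this wait blocks */
	int			semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	struct sembuf wait_op = {0, -1, 0};	/* same shape as the semop in the strace */
	int			interrupts = 0;

	signal(SIGALRM, on_alarm);
	alarm(1);

	while (semop(semid, &wait_op, 1) < 0 && errno == EINTR)
	{
		printf("signal #%d interrupted semop(); retrying the wait\n",
			   ++interrupts);
		if (interrupts >= 3)
			break;				/* a real waiter would keep looping here */
		alarm(1);
	}

	semctl(semid, 0, IPC_RMID);
	return 0;
}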