Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > On Wed, Jan 18, 2017 at 7:31 AM, Stephen Frostwrote: > > Perhaps we need a way for pg_ctl to realize a cold-standby case and > > throw an error or warning if --wait is specified then, but that hardly > > seems like the common use-case. It also wouldn't make any sense to have > > anything in the init system which depended on PG being up in such a case > > because, well, PG isn't ever going to be 'up'. > > Yeah, it seems to me that we are likely looking for a wait mode saying > to exit pg_ctl once Postgres is happily rejecting connections, because > that means that it is up and that it is sorting out something first > before accepting them. This would basically filter the states in the > control file that we find as acceptable if the connection test > continues complaining about PQPING_REJECT. If you're suggesting this *only* in the case where PG is starting up as a cold standby, then, ok, maybe. I don't think '-w' should mean anything less than "up and accepting connections" for regular or hot standby systems. I'm not really convinced that the code is worth the trouble to handle this case, but I'm not going to argue if someone wants to write it. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Parallel Index Scans
On Tue, Jan 17, 2017 at 11:27 PM, Robert Haaswrote: > On Mon, Jan 16, 2017 at 7:11 AM, Amit Kapila wrote: > > > WAIT_EVENT_PARALLEL_INDEX_SCAN is in fact btree-specific. There's no > guarantee that any other AMs the implement parallel index scans will > use that wait event, and they might have others instead. I would make > it a lot more specific, like WAIT_EVENT_BTREE_PAGENUMBER. (Waiting > for the page number needed to continue a parallel btree scan to become > available.) > WAIT_EVENT_BTREE_PAGENUMBER - NUMBER sounds slightly inconvenient. How about just WAIT_EVENT_BTREE_PAGE? We can keep the description as suggested by you? > Why do we need separate AM support for index_parallelrescan()? Why > can't index_rescan() cover that case? The reason is that sometime index_rescan is called when we have to just update runtime scankeys in index and we don't want to reset parallel scan for that. Refer ExecReScanIndexScan() changes in patch parallel_index_opt_exec_support_v4. Rescan is called from below place during index scan. ExecIndexScan(IndexScanState *node) { /* * If we have runtime keys and they've not already been set up, do it now. */ if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady) ExecReScan((PlanState *) node); > If the AM-specific method is > getting the IndexScanDesc, it can see for itself whether it is a > parallel scan. > I think if we want to do that way then we need to pass some additional information related to runtime scan keys in index_rescan method and then probably to amspecific rescan method. That sounds scary. > > I think it's a bad idea to add a ParallelIndexScanDesc argument to > index_beginscan(). That function is used in lots of places, and > somebody might think that they are allowed to actually pass a non-NULL > value there, which they aren't: they must go through > index_beginscan_parallel. I think that the additional argument should > only be added to index_beginscan_internal, and > index_beginscan_parallel should remain unchanged. > If we go that way then we need to set few parameters like heapRelation and xs_snapshot in index_beginscan_parallel as we are doing in index_beginscan. Does going that way sound better to you? > Either that, or get > rid of index_beginscan_parallel altogether and have everyone use > index_beginscan directly, and put the snapshot-restore logic in that > function. > I think there is value in retaining index_beginscan_parallel as that is parallel to heap_beginscan_parallel. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Generate fmgr prototypes automatically
Peter Eisentrautwrites: > Generate fmgr prototypes automatically BTW, I notice some suspicious-looking behavior with -j: $ make -j8 -s Writing fmgroids.h Writing fmgroids.h Writing postgres.bki Writing fmgrprotos.h Writing fmgrtab.c Writing schemapg.h Writing postgres.description Writing postgres.shdescription Writing fmgrprotos.h Writing fmgrtab.c ... Why do fmgroids.h, fmgrprotos.h, and fmgrtab.c now get mentioned twice? I suspect there is something broken about the parallelization. If indeed multiple instances of gmake are writing these files concurrently, that's likely to result in irreproducible build failures. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Wed, Jan 18, 2017 at 8:15 PM, Vladimir Rusinovwrote: > On the topic of binaries, there's going to be another patch renaming them. > Those will have no aliases as it's trivial to work-around (symlinks, shell > scripts, etc) and not so trivial to implement in a portable way. On Windows that would be a pain... So let's not use symlinks for binaries. > I'm used to workflows with lots of small commits, so I was not able to force > myself to include all of the renames in one diff. That's a good habit IMO. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] kNN for btree
Sorry for the broken formatting in my previous message. Below is a corrected version of this message. I'd like to present a series of patches that implements k-Nearest Neighbors (kNN) search for btree, which can be used to speed up ORDER BY distance queries like this: SELECT * FROM events ORDER BY date <-> '2000-01-01'::date ASC LIMIT 100; Now only GiST supports kNN, but kNN on btree can be emulated using contrib/btree_gist. Scanning algorithm == Algorithm is very simple: we use bidirectional B-tree index scan starting at the point from which we measure the distance (target point). At each step, we advance this scan in the direction that has the nearest point. But when the target point does not fall into the scanned range, we don't even need to use a bidirectional scan here --- we can use ordinary unidirectional scan in the right direction. Performance results === Test database is taken from original kNN-GiST presentation (PGCon 2010). Test query SELECT * FROM events ORDER BY date <-> '1957-10-04'::date ASC LIMIT k; can be optimized to the next rather complicated UNION form, which no longer requires kNN: WITH t1 AS (SELECT * FROM events WHERE date >= '1957-10-04'::date ORDER BY date ASC LIMIT k), t2 AS (SELECT * FROM events WHERE date < '1957-10-04'::date ORDER BY date DESC LIMIT k), t AS (SELECT * FROM t1 UNION SELECT * FROM t2) SELECT * FROM t ORDER BY date <-> '1957-10-04'::date ASC LIMIT k; In each cell of this table shown query execution time in milliseconds and the number of accessed blocks: k | kNN-btree | kNN-GiST| Opt. query | Seq. scan | | (btree_gist) | with UNION | with sort |--|--|---| 1 | 0.041 4 | 0.079 4 | 0.060 8 | 41.1 1824 10 | 0.048 7 | 0.091 9 | 0.09717 | 41.8 1824 100 | 0.10747 | 0.19252 | 0.342 104 | 42.3 1824 1000 | 0.735 573 | 0.913 650 | 2.970 1160 | 43.5 1824 1 | 5.070 5622 | 6.240 6760 | 36.300 11031 | 54.1 1824 10 | 49.600 51608 | 61.900 64194 | 295.100 94980 | 115.0 1824 As you can see, kNN-btree can be two times faster than kNN-GiST (btree_gist) when k < 1000, but the number of blocks read is roughly the same. Implementation details == A brief description is given below for each of the patches: 1. Introduce amcanorderbyop() function This patch transforms existing boolean AM property amcanorderbyop into a method (function pointer). This is necessary because, unlike GiST, kNN for btree supports only a one ordering operator on the first index column and we need a different pathkey matching logic for btree (there was a corresponding comment in match_pathkeys_to_index()). GiST-specific logic has been moved from match_pathkeys_to_index() to gistcanorderbyop(). 2. Extract substructure BTScanState from BTScanOpaque This refactoring is necessary for bidirectional kNN-scan implementation. Now, BTScanOpaque's substructure BTScanState containing only the fields related to scan position is passed to some functions where the whole BTScanOpaque was passed previously. 3. Extract get_index_column_opclass(), get_opclass_opfamily_and_input_type(). Extracted two simple common functions used in gistproperty() and btproperty() (see the next patch). 4. Add kNN support to btree * Added additional optional BTScanState to BTScanOpaque for bidirectional kNN scan. * Implemented bidirectional kNN scan. * Implemented logic for selecting kNN strategy * Implemented btcanorderbyop(), updated btproperty() and btvalidate() B-tree user interface functions have not been altered because ordering operators are used directly. 5. Add distance operators for some types These operators for integer, float, date, time, timestamp, interval, cash and oid types have been copied from contrib/btree_gist and added to the existing btree opclasses as ordering operators. Their btree_gist duplicates are removed in the next patch. 6. Remove duplicate distance operators from contrib/btree_gist. References to their own distance operators in btree_gist opclasses are replaced with references to the built-in operators and than duplicate operators are dropped. But if the user is using somewhere these operators, upgrade of btree_gist from 1.3 to 1.4 would fail. 7. Add regression tests for btree kNN. Tests were added only after the built-in distance operators were added. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw bug in 9.6
On Wed, Jan 18, 2017 at 8:18 AM, Etsuro Fujitawrote: > On 2017/01/16 11:38, Etsuro Fujita wrote: >> >> On 2017/01/14 6:39, Jeff Janes wrote: >>> >>> I do get a compiler warning: >>> >>> foreign.c: In function 'CreateLocalJoinPath': >>> foreign.c:832: warning: implicit declaration of function >>> 'pathkeys_contained_in' > > >> Will fix. > > > Done. Attached is the new version. I also adjusted the code a bit further. With this patch there are multiple cases, where CreateLocalJoinPath() would return NULL and hence postgres_fdw would not push a join down for example /* * (1) if either cheapest-total path is parameterized by the * other rel, we can't generate a hashjoin/mergejoin path, and * (2) proposed hashjoin/mergejoin path is still parameterized * (ie, the required_outer set calculated by * calc_non_nestloop_required_outer isn't NULL), it's not what * we want; which means that both the cheapest-total paths * should be unparameterized. */ if (outer_path->param_info || inner_path->param_info) return NULL; or /* * If the cheapest-total outer path is parameterized by the * inner rel, we can't generate a nestloop path. (There's no * use looking for alternative outer paths, since it should * already be the least-parameterized available path.) */ if (PATH_PARAM_BY_REL(outer_path, innerrel)) return NULL; /* If proposed path is still parameterized, don't return it. */ required_outer = calc_nestloop_required_outer(outer_path, inner_path); if (required_outer) { bms_free(required_outer); return NULL; } Am I right? The earlier version of this function GetExistingLocalJoinPath() used to return a local path in those cases and hence postgres_fdw was able to push down corresponding queries. So, we are introducing a performance regression. We need to plug those holes. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Tue, Jan 17, 2017 at 10:03 PM, Robert Haaswrote: > Q: OK, where is my WAL stored? > A: pg_wal > Q: How do I reset it? > A: pg_resetxlog > On the topic of binaries, there's going to be another patch renaming them. Those will have no aliases as it's trivial to work-around (symlinks, shell scripts, etc) and not so trivial to implement in a portable way. I'm used to workflows with lots of small commits, so I was not able to force myself to include all of the renames in one diff. -- Vladimir Rusinov Storage SRE, Google Ireland Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland Registered in Dublin, Ireland Registration Number: 368047 smime.p7s Description: S/MIME Cryptographic Signature
Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.
On Fri, Jan 6, 2017 at 6:58 PM, Ashutosh Sharmawrote: I've successfully applied the patch on the latest head and ran a regression tests without any failure. There is no major changes. However, I've some minor comments on the patch: +/* + * HASH_ALLOCATABLE_PAGE_SZ represents allocatable + * space (pd_upper - pd_lower) on a hash page. + */ +#define HASH_ALLOCATABLE_PAGE_SZ \ + BLCKSZ - \ + (SizeOfPageHeaderData + sizeof(HashPageOpaqueData)) My suggestion will be not to write "(pd_upper - pd_lower)" in the comment. You may write allocatable space on a empty hash page. + buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, NULL); Use BAS_BULKREAD strategy to read the buffer. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Jan 4, 2017 at 5:36 PM, Merlin Moncurewrote: > Still getting checksum failures. Over the last 30 days, I see the > following. Since enabling checksums FWICT none of the damage is > permanent and rolls back with the transaction. So creepy! The checksums still only differ in least significant digits which pretty much means that there is a block number mismatch. So if you rule out filesystem not doing its job correctly and transposing blocks, it could be something else that is resulting in blocks getting read from a location that happens to differ by a small multiple of page size. Maybe somebody is racily mucking with table fd's between seeking and reading. That would explain the issue disappearing after a retry. Maybe you can arrange for the RelFileNode and block number to be logged for the checksum failures and check what the actual checksums are in data files surrounding the failed page. If the requested block number contains something completely else, but the page that follows contains the expected checksum value, then it would support this theory. Regards, Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi, > 1. > +static Page > +verify_hash_page(bytea *raw_page, int flags) > > Few checks for meta page are missing, refer _hash_checkpage. okay, I have added the checks for meta page as well. Please refer to attached patch. > > 2. > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + (errmsg("must be superuser to use pageinspect functions"; > > Isn't it better to use "raw page" instead of "pageinspect" in the > above error message? If you agree, then fix other similar occurrences > in the patch. Yes, knowing that we deal with raw pages as in brin index it would be good to use raw page instead of pageinspect to maintain the consistency in the error message. > > 3. > + values[j++] = CStringGetTextDatum(psprintf("(%u,%u)", > + BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)), > + itup->t_tid.ip_posid)); > > Fix indentation in the third line. > Corrected. Please check the attached patch. > 4. > +Datum > +hash_page_items(PG_FUNCTION_ARGS) > +{ > + bytea *raw_page = PG_GETARG_BYTEA_P(0); > > > +Datum > +hash_bitmap_info(PG_FUNCTION_ARGS) > +{ > + Oid indexRelid = PG_GETARG_OID(0); > + uint32 ovflblkno = PG_GETARG_UINT32(1); > > Is there a reason for keeping the input arguments for > hash_bitmap_info() different from hash_page_items()? > Yes, there are two reasons behind it. Firstly, we need metapage to identify the bitmap page that holds the information about the overflow page passed as an input to this function. Secondly, we will have to input overflow block number as an input to this function so as to determine the overflow bit number which can be used further to identify the bitmap page. + /* Read the metapage so we can determine which bitmap page to use */ + metabuf = _hash_getbuf(indexRel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); + metap = HashPageGetMeta(BufferGetPage(metabuf)); + + /* Identify overflow bit number */ + ovflbitno = _hash_ovflblkno_to_bitno(metap, ovflblkno); + + bitmappage = ovflbitno >> BMPG_SHIFT(metap); + bitmapbit = ovflbitno & BMPG_MASK(metap); + + if (bitmappage >= metap->hashm_nmaps) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("invalid overflow bit number %u", ovflbitno))); + + bitmapblkno = metap->hashm_mapp[bitmappage]; > 5. > +hash_metapage_info(PG_FUNCTION_ARGS) > { > .. > + spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1); > .. > + mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1); > .. > } > > Don't you think we should free the allocated memory in this function? > Also, why are you 5 as a multiplier in both the above pallocs, > shouldn't it be 4? Yes, we should free it. We have used 5 as a multiplier instead of 4 because of ' ' character. Apart from above comments, the attached patch also handles all the comments from Mithun. With Regards, Ashutosh Sharma. EnterpriseDB: http://www.enterprisedb.com 0001-Add-support-for-hash-index-in-pageinspect-contrib-mo.patch Description: invalid/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello. (I apologize in advance for possible inaccurate wording on maths, which might cause confusion..) At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeliwrote in > > - Floating point comparisons for gemetric types > > > > Comparison related to geometric types is performed by FPeq > > macro and similars defined in geo_decls.h. This intends to give > > tolerance to the comparisons. > > > > A > > FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance) > > FPne: ->| e | e |<- (<- means exclusive) > > FPlt: | e |<- > > FPle: |<=e | > > FPgt: ->| e | > > FPge: | e=>| > > > > These seems reasonable ignoring the tolerance amount issue. > > I tried to explain below why I don't find this method reasonable. Thank you very much for the explanation. > > At least for the point type, (bitmap) index scan is consistent > > with sequential scan. I remember that the issue was > > "inconsistency between indexed and non-indexed scans over > > geometric types" but they seem consistent with each other. > > You can see on the Git history that it was a lot of trouble to make > the indexes consistent with the operators, and this is limiting > improving indexing of the geometric types. Yeah. geo_ops.c has the following comment from its first version in the current repository. | * Relational operators for Points. | * Since we do have a sense of coordinates being | * "equal" to a given accuracy (point_vert, point_horiz), So, we take it as a requirement for geometric types. All the difficulties come from the "nature". > > You mentioned b-tree, which don't have predefined opclass for > > geometric types. So the "inconsistency" may be mentioning the > > difference between geometric values and combinations of plain > > (first-class) values. But the two are different things and > > apparently using different operators (~= and = for equality) so > > the issue is not fit for this. More specifically, "p ~= p0" and > > "x = x0 and y = y0" are completely different. > > Yes, the default btree operator class doesn't have to implement > standard equality and basic comparison operator symbols. We can > implement it with different symbols. Perhaps I don't understand it. Many opclass are defined for btree. But since (ordinary) btree handles only one-dimentional, totally-orderable sets, geometric types even point don't fit it. Even if we relaxed it by removing EPSILON, multidimentional data still doesn't fit. Still we can implement equality mechanism *over* multiple btree indexes, but it cannot be a opclass for btree. > > Could you let me (or any other guys on this ml) have more precise > > description on the problem and/or what you want to do with this > > patch? > > I will try to itemize the current problems with the epsilon method: > > = Operator Inconsistencies > > My current patches address all of them. No. It just removes tolerans that is a "requirement" for coordinates of geometric types. I think we don't have enough reason to remove it. We could newly define geometric types with no-tolerance as 'exact_point" or something but it still doesn't fit btree. > - Only some operators are using the epsilon. > > I included the list of the ones which doesn't use the epsilon on > initial post of this thread. This makes the operators not compatible > with each other. For example, you cannot say "if box_a @> box_b and > box_b @> point then box_a @> box_b". (It must be "then box_a @> point".) Both of the operators don't seem to use EPSILON so the transitivity holds, but perhaps we should change them to more appropriate shape, that is, to use EPSILON. Even having the tolerance, we can define the operators to keep the transitivity. > - The application of epsilon is implementation dependent. > > Epsilon works between two numbers. There is not always a single way > of implementing geometric operators. This is surprising to the users. > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= > epsilon". Maybe it should be like "if box_a ~= box_b && box_b @< box_a then ..". I'm not sure off-hand. But I don't see the relationship is significant. > This is also creating a high barrier on developing those operators. > Fixing bugs and performance regressions are very likely to change the > results. I agree to you. The effect of the EPSILON is quite arbitrary and difficult to see their chemistry. > - Some operators are just wrong: > > Line operators are not well maintained. The following are giving > wrong results when neither A or B of lines are not exactly 1: > > * line # line > * line <-> line > * point ## line > * point ## lseg These are not comparison operators so EPSILON is unrelated. And I agree that they needs fix. > They are broken independently from the epsilon, though it is not easy > to fix them without