Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
On Thu, Aug 4, 2016 at 7:46 PM, Pavel Stehule wrote:
> 2016-08-04 15:37 GMT+02:00 Amit Kapila :
>>
>> > I dislike automatic commit or rollback here.
>> >
>>
>> What problem do you see with it, if we do so and maybe mention the same
>> in docs as well? Anyway, I think we should make the behaviour of both
>> ecpg and psql the same.
>
> Implicit COMMIT can be dangerous
>

Not when the user has specifically requested autocommit mode as 'on'. I
think what would be more meaningful here is that after "\set AUTOCOMMIT ON",
when the first command is committed, it should commit the previously
pending (not-yet-committed) commands as well.

>>
>> Not sure what benefit we will get by raising a warning. I think it is
>> better to choose one behaviour (automatic commit or leave the
>> transaction open, as psql currently does) and make it
>> consistent across all clients.
>
> I am not sure about the value of ecpg for this case. It is used by 0.0001%
> of users. Probably nobody in the Czech Republic knows this client.
>

Sure, but that doesn't give us license to be inconsistent in behaviour
across different clients.

> Warnings force the user to make a decision
>

They could be annoying as well, especially if that happens in scripts.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Sat, Aug 6, 2016 at 8:34 AM, Bruce Momjian wrote:
> On Fri, Aug 5, 2016 at 09:40:35PM -0400, Bruce Momjian wrote:
> > So to summarize again:
> >
> > o  chains start out as HOT
> > o  they become WARM when some indexes change and others don't
> > o  for multiple index changes, we have to check all indexes
> >    for key/ctid matches
> > o  for single index changes, we can fail HOT and create a new
> >    non-HOT/WARM tuple if there are too many index matches
> > o  99% of index checks will not find a key/ctid match
>
> I think a WARM chain where the head ctid isn't LP_REDIRECT wasn't
> pruned, so here is an updated list:
>
> o  chains start out as HOT
> o  they become WARM when some indexes change and others don't
> o  if the WARM chain head is not LP_REDIRECT, check the existing chain
>    for key matches
> o  if the WARM chain head is LP_REDIRECT:
>     o  for single index changes, we can fail HOT and create a new
>        non-HOT/WARM tuple if there are too many index matches
>     o  for multiple index changes, we have to check all indexes
>        for key/ctid matches
> o  99% of index checks will not find a key/ctid match
>
> So, we are only checking the index if the WARM chain was pruned, and we
> can bail out if there is only one index changed. This is looking more
> doable.

The duplicate-tuples problem that we are focusing on happens when an index
already has two or more index tuples pointing to the same root tuple/lp.
When it's time to insert the third index tuple, we must not insert a
duplicate (key, CTID) tuple.

I have a design where we can track which columns (we are interested only in
the columns that indexes use) were ever changed in the WARM chain. We allow
one change for every index column, but a second change will require a
duplicate lookup. This is still quite an improvement; cold updates may
already drop by more than 50%, but someone can argue that this does not
handle the case where the same index column is repeatedly updated.
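The column-change tracking rule described above can be sketched roughly as follows. This is a toy model with invented names (the real patch would track this state inside the heap page, not a Python object); it only illustrates the "first change per indexed column is free, second change forces a duplicate lookup" idea:

```python
# Sketch of the proposed rule: remember which indexed columns have ever
# changed within a WARM chain. The first change to a column can proceed
# without checking the index; a later change to an already-changed column
# forces a duplicate (key, CTID) lookup before inserting an index tuple.
class WarmChain:
    def __init__(self, indexed_columns):
        # number of times each indexed column has changed in this chain
        self.changed = {col: 0 for col in indexed_columns}

    def on_update(self, cols_modified):
        """Return 'warm' if the update can skip the duplicate check,
        'check-duplicates' if an index lookup is required first."""
        needs_check = any(self.changed[c] >= 1 for c in cols_modified)
        for c in cols_modified:
            self.changed[c] += 1
        return "check-duplicates" if needs_check else "warm"

chain = WarmChain(["col1", "col2"])
print(chain.on_update(["col2"]))  # first change to col2 -> warm
print(chain.on_update(["col2"]))  # second change -> check-duplicates
print(chain.on_update(["col1"]))  # first change to col1 -> warm
```

This also shows the limitation noted above: a workload that repeatedly updates the same indexed column pays the duplicate-lookup cost on every update after the first.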
If we can find an efficient way to convert WARM chains back to HOT, which
will happen as soon as the old index tuple retires, the system can attain a
stable state, not for all but for many use cases.

Thanks,
Pavan

--
Pavan Deolasee
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "Some tests to cover hash_index"
On Thu, Aug 4, 2016 at 7:24 PM, Mithun Cy wrote:
> I am attaching the patch to improve some coverage of hash index code [1].
> I have added some basic tests, which mainly cover overflow pages. They
> took 2 sec extra time on my machine in the parallel schedule.
>
>                                                Hit    Total   Coverage
>  old tests                Line Coverage        780    1478    52.7 %
>                           Function Coverage     63      85    74.1 %
>  improvement after tests  Line Coverage       1181    1478    79.9 %
>                           Function Coverage     78      85    91.8 %

I think the code-coverage improvement for the hash index with these tests
seems to be quite good; however, the time for the tests seems to be
slightly on the higher side. Does anybody have a better suggestion for
these tests?

diff --git a/src/test/regress/sql/concurrent_hash_index.sql b/src/test/regress/sql/concurrent_hash_index.sql

I wonder why you have included a new file for these tests; why can't these
be added to the existing hash_index.sql?

+--
+-- Cause some overflow inserts and splits.
+--
+CREATE TABLE con_hash_index_table (keycol INT);
+CREATE INDEX con_hash_index on con_hash_index_table USING HASH (keycol);

The relation names con_hash_index* chosen in the above tests don't seem to
be appropriate; how about hash_split_heap* or something like that?

Register your patch in the latest CF (https://commitfest.postgresql.org/10/).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 09:40:35PM -0400, Bruce Momjian wrote:
> So to summarize again:
>
> o  chains start out as HOT
> o  they become WARM when some indexes change and others don't
> o  for multiple index changes, we have to check all indexes
>    for key/ctid matches
> o  for single index changes, we can fail HOT and create a new
>    non-HOT/WARM tuple if there are too many index matches
> o  99% of index checks will not find a key/ctid match

I think a WARM chain where the head ctid isn't LP_REDIRECT wasn't pruned,
so here is an updated list:

o  chains start out as HOT
o  they become WARM when some indexes change and others don't
o  if the WARM chain head is not LP_REDIRECT, check the existing chain
   for key matches
o  if the WARM chain head is LP_REDIRECT:
    o  for single index changes, we can fail HOT and create a new
       non-HOT/WARM tuple if there are too many index matches
    o  for multiple index changes, we have to check all indexes
       for key/ctid matches
o  99% of index checks will not find a key/ctid match

So, we are only checking the index if the WARM chain was pruned, and we can
bail out if there is only one index changed. This is looking more doable.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
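The updated bullet list above amounts to a small decision procedure. A hypothetical sketch (the function and its arguments are invented for illustration; this is not PostgreSQL code):

```python
# Decision procedure mirroring the bullet list: given which indexed
# columns changed, whether the WARM chain head was pruned to an
# LP_REDIRECT, and whether the index already has too many matching
# entries, choose an update strategy.
def update_strategy(changed_indexes, head_is_redirect, too_many_index_matches):
    if not changed_indexes:
        return "HOT"                    # no indexed column changed
    if not head_is_redirect:
        return "check-chain-for-key"    # unpruned: chain still holds the keys
    if len(changed_indexes) == 1 and too_many_index_matches:
        return "non-HOT"                # bail out: plain cold update
    return "check-all-indexes"          # pruned chain: key/ctid lookups

print(update_strategy([], False, False))            # HOT
print(update_strategy(["i2"], False, False))        # check-chain-for-key
print(update_strategy(["i2"], True, True))          # non-HOT
print(update_strategy(["i1", "i2"], True, False))   # check-all-indexes
```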
Re: [HACKERS] pg_ctl promote wait
On 8/5/16 12:14 AM, Michael Paquier wrote:
> In do_stop, this patch makes the wait happen for a maximum of
> wait_seconds * 2, once when getting the control file information, and
> once when waiting for the server to shut down.

That's not how I read it. get_controlfile() will decrease the wait_seconds
argument by how much wait time it has used. The wait for shutdown will then
only use as many seconds as are left.

> This is not a good
> idea, and the idea of putting a wait argument in get_controlfile does
> not seem a good interface to me. I'd rather see get_controlfile be
> extended with a flag saying no_error_on_failure and keep the wait
> logic within pg_ctl.

I guess we could write a wrapper function in pg_ctl that encapsulates the
wait logic.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
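The shared-time-budget behaviour described here (the control file wait consuming part of wait_seconds and leaving only the remainder for the shutdown wait) is a general pattern; a minimal sketch, with stand-in step functions rather than the real pg_ctl/get_controlfile code:

```python
import time

def run_with_budget(steps, wait_seconds):
    """Run each step with whatever portion of the total budget remains,
    so the steps together never wait more than wait_seconds total."""
    remaining = wait_seconds
    for step in steps:
        start = time.monotonic()
        step(remaining)                       # a step may wait up to 'remaining'
        remaining -= time.monotonic() - start
        remaining = max(remaining, 0.0)
    return remaining

# Toy steps standing in for the control-file wait and the shutdown wait:
def fake_controlfile_wait(budget):
    time.sleep(min(0.05, budget))

def fake_shutdown_wait(budget):
    time.sleep(min(0.05, budget))

left = run_with_budget([fake_controlfile_wait, fake_shutdown_wait], 1.0)
# Both waits drew from the same 1.0 s budget, so 'left' is well under 1.0
# but the total wait can never exceed wait_seconds.
print(left < 1.0)
```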
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 09:02:39PM -0400, Bruce Momjian wrote:
> Yes, it seems we either find the entry fast and avoid the index
> addition, or don't find it quickly and use a non-HOT, non-WARM update.
> The problem is that you have to do this for multiple indexes, and if you
> quickly find you need to add an entry to the first index, when you get
> to the second one you can't easily bail out and go with a non-HOT,
> non-WARM update. I suppose we could bail out of a long index search if
> there is only one index with a changed key.
>
> Here's how I understand it --- if you are looking for a key that has
> only a few index entries, it will be fast to find if the key/ctid is
> listed. If the index has many index entries for the key, it will be
> expensive to find out if there is a matching key/ctid, but a
> read-only-query index lookup for that key will be expensive too, whether
> you use the bitmap scan or not. And, effectively, if we bail out and
> decide to go with a non-HOT, non-WARM update, we are making the index
> even bigger.

So to summarize again:

o  chains start out as HOT
o  they become WARM when some indexes change and others don't
o  for multiple index changes, we have to check all indexes
   for key/ctid matches
o  for single index changes, we can fail HOT and create a new
   non-HOT/WARM tuple if there are too many index matches
o  99% of index checks will not find a key/ctid match

I am not sure how to optimize an index non-match.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 09:21:49PM -0300, Claudio Freire wrote:
> On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian wrote:
> > On Fri, Aug 5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
> >> > This does create more HOT chains where the root ctid cannot be removed,
> >> > but it does avoid the index key/ctid check because any entry in the HOT
> >> > chain has identical keys. What this would not handle is when an entire
> >> > HOT chain is pruned, as the keys would be gone.
> >>
> >> I believe the only solution is to use bitmap index scans with WARM indexes.
> >
> > Let me back up and explain the benefits we get from allowing HOT chains
> > to be WARM:
> >
> > *  no index entries for WARM indexes whose values don't change
> > *  improved pruning because the HOT/WARM chains can be longer because we
> >    don't have to break chains for index changes
> >
> > While I realize bitmap indexes would allow us to remove duplicate index
> > entries, it removes one of the two benefits of WARM, and it causes every
> > index read to be expensive --- I can't see how this would be a win over
> > doing the index check on writes.
>
> But the index check could be prohibitively expensive.

True, but I am afraid the bitmap handling on reads will be worse.

> So we're back to bailing out?
>
> When an update comes and we're considering WARM, we need to query the
> indexes, each one, and if one index cannot quickly verify the
> existence of an entry for the root tid for the key, bail out from
> WARM.

Yes, it seems we either find the entry fast and avoid the index addition,
or don't find it quickly and use a non-HOT, non-WARM update. The problem is
that you have to do this for multiple indexes, and if you quickly find you
need to add an entry to the first index, when you get to the second one you
can't easily bail out and go with a non-HOT, non-WARM update. I suppose we
could bail out of a long index search if there is only one index with a
changed key.

Here's how I understand it --- if you are looking for a key that has only a
few index entries, it will be fast to find if the key/ctid is listed. If
the index has many index entries for the key, it will be expensive to find
out if there is a matching key/ctid, but a read-only-query index lookup for
that key will be expensive too, whether you use the bitmap scan or not.
And, effectively, if we bail out and decide to go with a non-HOT, non-WARM
update, we are making the index even bigger.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 9:21 PM, Claudio Freire wrote:
> On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian wrote:
>> On Fri, Aug 5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
>>> > This does create more HOT chains where the root ctid cannot be removed,
>>> > but it does avoid the index key/ctid check because any entry in the HOT
>>> > chain has identical keys. What this would not handle is when an entire
>>> > HOT chain is pruned, as the keys would be gone.
>>>
>>> I believe the only solution is to use bitmap index scans with WARM indexes.
>>
>> Let me back up and explain the benefits we get from allowing HOT chains
>> to be WARM:
>>
>> *  no index entries for WARM indexes whose values don't change
>> *  improved pruning because the HOT/WARM chains can be longer because we
>>    don't have to break chains for index changes
>>
>> While I realize bitmap indexes would allow us to remove duplicate index
>> entries, it removes one of the two benefits of WARM, and it causes every
>> index read to be expensive --- I can't see how this would be a win over
>> doing the index check on writes.
>
> But the index check could be prohibitively expensive.

Well... it could be made efficient for the case of nbtree with a format
change. If nbtree sorted by tid as well, for instance. But an upgrade there
would involve a reindex before WARM can be applied.
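The "sorted by tid as well" idea can be illustrated: if entries with equal keys were kept in (key, tid) order, verifying whether a (key, root-tid) entry already exists becomes a single binary search instead of a scan over every duplicate for that key. A minimal sketch using Python's bisect as a stand-in for a btree descent (the data layout is invented for illustration):

```python
import bisect

# Index leaf entries as (key, tid) pairs, kept sorted on BOTH fields.
# tids here are toy (block, offset) pairs.
index = [("a", (1, 1)), ("a", (1, 4)), ("a", (2, 7)), ("b", (1, 1))]

def entry_exists(index, key, tid):
    """O(log n) membership test, possible only because ties on the key
    are broken by tid; with unordered duplicates, every equal-key entry
    would have to be visited to answer the same question."""
    i = bisect.bisect_left(index, (key, tid))
    return i < len(index) and index[i] == (key, tid)

print(entry_exists(index, "a", (1, 4)))   # True
print(entry_exists(index, "a", (9, 9)))   # False
```

This also hints at why a format change (and therefore a reindex) would be required: existing btrees make no ordering promise among duplicates.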
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian wrote:
> On Fri, Aug 5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
>> > This does create more HOT chains where the root ctid cannot be removed,
>> > but it does avoid the index key/ctid check because any entry in the HOT
>> > chain has identical keys. What this would not handle is when an entire
>> > HOT chain is pruned, as the keys would be gone.
>>
>> I believe the only solution is to use bitmap index scans with WARM indexes.
>
> Let me back up and explain the benefits we get from allowing HOT chains
> to be WARM:
>
> *  no index entries for WARM indexes whose values don't change
> *  improved pruning because the HOT/WARM chains can be longer because we
>    don't have to break chains for index changes
>
> While I realize bitmap indexes would allow us to remove duplicate index
> entries, it removes one of the two benefits of WARM, and it causes every
> index read to be expensive --- I can't see how this would be a win over
> doing the index check on writes.

But the index check could be prohibitively expensive.

So we're back to bailing out?

When an update comes and we're considering WARM, we need to query the
indexes, each one, and if one index cannot quickly verify the existence of
an entry for the root tid for the key, bail out from WARM.

That may prevent the benefits of WARM in a lot of real-world cases. You
just need one less-than-unique index to disable WARM for all updates, and
IIRC those are common. You'd pay the CPU price of WARM without any of the
benefits. I don't think I have a single table in my databases that doesn't
exhibit this behavior.
Re: [HACKERS] Slowness of extended protocol
> > I really don't get what's problematic with posting a message on a mailing
> > list about a potential performance issue, to try to get people's reactions,
> > without diving into profiling right away. I'm not a PostgreSQL developer,
> > have other urgent things to do and don't even spend most of my programming
> > time in C.
>
> There's absolutely nothing wrong with that. I find your questions
> helpful and interesting and I hope you will keep asking them. I think
> that they are a valuable contribution to the discussion on this list.

Thanks for the positive feedback Robert. I'm glad reducing the overhead of
out-of-line parameters seems like an important goal.

FWIW, if, as Vladimir seems to suggest, it's possible to bring down the
overhead of the v3 extended protocol to somewhere near the simple protocol,
that would obviously be a better solution - it would mean things work
faster here and now, rather than waiting for the v4 protocol. I have no
idea if that's possible though; I'll see if I can spend some time on
understanding where the slowdown comes from.
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
> > This does create more HOT chains where the root ctid cannot be removed,
> > but it does avoid the index key/ctid check because any entry in the HOT
> > chain has identical keys. What this would not handle is when an entire
> > HOT chain is pruned, as the keys would be gone.
>
> I believe the only solution is to use bitmap index scans with WARM indexes.

Let me back up and explain the benefits we get from allowing HOT chains to
be WARM:

*  no index entries for WARM indexes whose values don't change
*  improved pruning because the HOT/WARM chains can be longer because we
   don't have to break chains for index changes

While I realize bitmap indexes would allow us to remove duplicate index
entries, it removes one of the two benefits of WARM, and it causes every
index read to be expensive --- I can't see how this would be a win over
doing the index check on writes.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] New version numbering practices
Peter Eisentraut writes:
> One hiccup I found is that server_version_num is not sent to clients.
> Instead, libpq assembles the numeric version number itself from the
> string version, and it will fail if it sees only one number (e.g.,
> 10devel). It will then set the version number to 0 for "unknown".

Per discussion, I applied and back-patched the libpq/fe-exec.c part of
this, so that back-branch clients will have a decent chance of
understanding the new server_version format by the time it hits the field.

In a quick look around, it seemed like we might also want to fix and
back-patch the server version printing logic in psql's
connection_warnings() function. However, that would involve touching a
translatable string, so I thought it best not to do it just before
back-branch releases.

			regards, tom lane
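The parsing problem described above can be sketched as follows. This mimics, in Python with a made-up function name, the behaviour being discussed for libpq: old two-or-three-part versions like "9.6.3" map to a number, the new one-part "10devel" form would previously fall back to 0 for "unknown", and the fix is to accept a single leading number. The exact numeric scheme here follows PostgreSQL's documented convention (e.g. 9.6.3 -> 90603, 10 -> 100000), but the code itself is only an illustration, not libpq's implementation:

```python
import re

def parse_server_version(version_str):
    """Turn a server_version string into numeric form; return 0 if the
    string cannot be understood (libpq's 'unknown' convention)."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version_str)
    if not m:
        return 0
    major = int(m.group(1))
    if major >= 10:
        # New scheme: one number plus optional minor, e.g. "10devel", "10.1".
        minor = int(m.group(2) or 0)
        return major * 10000 + minor
    if m.group(2) is None:
        return 0  # pre-10 versions need at least two parts
    return major * 10000 + int(m.group(2)) * 100 + int(m.group(3) or 0)

print(parse_server_version("9.6.3"))    # 90603
print(parse_server_version("10devel"))  # 100000 instead of the old 0
print(parse_server_version("garbage"))  # 0
```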
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 4:26 PM, Bruce Momjian wrote:
> On Fri, Aug 5, 2016 at 11:30:26PM +0530, Pavan Deolasee wrote:
>> On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian wrote:
>>
>>     On Fri, Aug 5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
>>     > But doing the WARM chain backtracking and checking for previous
>>     > versions with appropriate keys should be enough to support this use
>>     > case too, it just needs to be well optimized to avoid seriously
>>     > impacting performance on the average case.
>>
>>     Yes, that was an idea I had too --- if the in-page HOT chain already has
>>     the key, we know it is already in the index and we can skip the index
>>     addition.
>>
>> I just don't know how you would do that without delaying/not doing HOT
>> chain pruning. As soon as root and intermediate HOT tuples are gone, all
>> information is lost. You may delay that, but that will defeat the whole
>> purpose. If chains are not pruned in time, you may not find any free space
>> in the page and end up doing a cold update anyway. But maybe I am missing
>> something and Claudio has a different idea.
>
> Yes, pruning would be a problem. :-(
>
> A check only needs to happen when the indexed key changes, right? So we
> are comparing the cost of adding an index key vs. the cost of trying to
> find a matching key/ctid in the index. Seems the latter is cheaper, but
> it is not obvious.
>
> I think I see Claudio's idea now --- from his diagram, you can have WARM
> chains span multiple HOT chains. What he is doing is creating a new HOT
> chain every time _any_ key changes, and he knows the entire HOT chain has
> identical values for all indexes. He moves from one HOT chain to another
> during an index scan by checking if the index value is the same in the
> old and new HOT chain (that is the same WARM chain).
>
> This does create more HOT chains where the root ctid cannot be removed,
> but it does avoid the index key/ctid check because any entry in the HOT
> chain has identical keys.
> What this would not handle is when an entire
> HOT chain is pruned, as the keys would be gone.

I believe the only solution is to use bitmap index scans with WARM indexes.

Doing so simplifies things considerably. For one, it isolates the new
functionality and makes it opt-in. Then, WARM indexes can always point to
the root of the HOT chain, and be ignored for satisfies-HOT. The planner
will only consider bitmap index scans with WARM indexes, and always issue a
recheck. The bitmap is needed to remove duplicate item pointers, and the
recheck, as usual, to filter out unwanted versions.

On update, one only needs a pointer to the HOT head, which would arguably
be at hand, or at worst reachable by backtracking t_ctid pointers. No key
checks of any kind, and no weird flags required.

I don't see another solution to the information loss when pruning whole HOT
chains. Suppose we start with this situation:

    lp:  1  2h 3h 4w 5h 6h 7h
    k1:  a  a  a  b  b  a  a
    k2:  c  c  c  c  c  c  c

So we have 2 HOT chains, 1-3 and 4-7, and two indexes: one never updated,
the other with an update and then returning to the same key. The first
index will have two index entries, one for key a, another for b, both
pointing at 1:

    a -> (p,1)
    b -> (p,1)

Now versions 1-3 are dead, so we want to prune them. We end up with a
redirect on the LP for 1 -> 4, and LPs 2 and 3 unused because they were
HOT:

         r4 u  u
    lp:  1  2  3  4w 5h 6  7h
    k1:  _  _  _  b  b  a  a
    k2:  _  _  _  c  c  c  c

Now suppose versions 4 and 5 die. So we end up with:

         r6 u  u  u  u
    lp:  1  2  3  4  5  6  7h
    k1:  _  _  _  _  _  a  a
    k2:  _  _  _  _  _  c  c

We just forgot that "b" was in k1, yet we still have an index entry
pointing to the chain. Should an update come:

         r6 u  u  u  u
    lp:  1  2  3  4  5  6  7h 8?
    k1:  _  _  _  _  _  a  a  b
    k2:  _  _  _  _  _  c  c  c

Is an index entry b -> (p,1) needed for 8? According to the logic it is:
k1 would need a new entry b -> (p,1), but the entry is already there. It's
just unverifiable in reasonable time.
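The information loss in the walkthrough above can be condensed into a toy simulation (hypothetical structures, not PostgreSQL's actual page format):

```python
# A toy model of a heap page: line pointer -> (k1, k2) values for each
# live version. Two HOT chains as in the example: 1-3 and 4-7.
page = {
    1: ("a", "c"), 2: ("a", "c"), 3: ("a", "c"),
    4: ("b", "c"), 5: ("b", "c"), 6: ("a", "c"), 7: ("a", "c"),
}

def prune(page, dead):
    """Remove dead versions; their key values are gone for good."""
    for lp in dead:
        del page[lp]

def keys_present(page, col):
    """The only keys the page itself can still vouch for."""
    return {vals[col] for vals in page.values()}

prune(page, [1, 2, 3])   # first chain dies
prune(page, [4, 5])      # the versions holding k1 = 'b' die too
# The page no longer remembers that 'b' ever pointed here, yet the index
# entry b -> (p,1) still exists; a new update setting k1 = 'b' cannot
# cheaply tell whether it needs a new index entry.
print(keys_present(page, 0))   # {'a'}
```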
So a simpler solution is to always add such entries, and let the bitmap index scan filter duplicates.
Re: [HACKERS] System load consideration before spawning parallel workers
On 8/1/16 1:08 AM, Haribabu Kommi wrote:
> There are some utilities and functions that are available to calculate
> the current system load; based on the available resources and system
> load, the module can limit the number of parallel workers that can start.
>
> In my observation, adding this calculation will add some overhead for
> simple queries. Because of this reason, I feel this can be a hook
> function, loaded only by the users who want it.

I think we need to provide more tools to allow users to control system
behavior on a more dynamic basis. How many workers to launch is a good
example. There are more reasons than just CPU that parallel workers can
help (IO being an obvious one, but possibly other things like GPU).
Another example is allowing users to alter the selection process used by
autovac workers.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
Re: [HACKERS] multivariate statistics (v19)
On Sat, Aug 6, 2016 at 2:38 AM, Tomas Vondra wrote:
> On 08/05/2016 06:24 AM, Michael Paquier wrote:
>> On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote:
>>> Attached is v19 of the "multivariate stats" patch series - essentially
>>> v18 rebased on top of current master. Aside from a few bug fixes, the
>>> main improvement is the addition of SGML docs demonstrating the
>>> statistics in a way similar to the current "Row Estimation Examples"
>>> (and the docs are actually in the same section). I've tried to keep the
>>> right amount of technical detail (and to point to the right README for
>>> additional details), but this may need improvements. I have not written
>>> docs explaining how statistics may be combined yet (more about this
>>> later).
>>
>> What we have here is quite something:
>> $ git diff master --stat | tail -n1
>>  77 files changed, 12809 insertions(+), 65 deletions(-)
>> I will try to get familiar with the topic and have added myself as a
>> reviewer of this patch. Hopefully I'll get feedback soon.
>
> Yes, it's a large patch. Although 25% of the insertions are SGML docs,
> regression tests and READMEs, and a large part of the remaining ~9k
> insertions are comments. But it may still be overwhelming, no doubt about
> that.
>
> FWIW, if someone is interested in the patch but is unsure where to start,
> I'm ready to help with that as much as possible. For example, if you
> happen to go to PostgresOpen, feel free to drag me to a corner and ask me
> as many questions as you want ...

Sure. Only PGConf SV is on my track this year.

--
Michael
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 03:26:15PM -0400, Bruce Momjian wrote:
> > I just don't know how you would do that without delaying/not doing HOT
> > chain pruning. As soon as root and intermediate HOT tuples are gone, all
> > information is lost. You may delay that, but that will defeat the whole
> > purpose. If chains are not pruned in time, you may not find any free
> > space in the page and end up doing a cold update anyway. But maybe I am
> > missing something and Claudio has a different idea.
>
> Yes, pruning would be a problem. :-(
>
> A check only needs to happen when the indexed key changes, right? So we
> are comparing the cost of adding an index key vs. the cost of trying to
> find a matching key/ctid in the index. Seems the latter is cheaper, but
> it is not obvious.

Here is an illustration of the issue:

	CREATE TABLE test (col1 INTEGER, col2 INTEGER, col3 INTEGER);

	-- index the first two columns
	CREATE INDEX i_test1 ON test (col1);
	CREATE INDEX i_test2 ON test (col2);

	INSERT INTO test VALUES (1, 7, 20);

	-- create a HOT chain
	UPDATE test SET col3 = 30;

	-- change the HOT chain to a WARM chain; changes i_test2 but not i_test1
	UPDATE test SET col2 = 8;

	-- we should avoid adding a col2=7 i_test2 index tuple
	-- because we already have one; how do we know that?
	UPDATE test SET col2 = 7;

	-- we should see only one col2=7 i_test2 index tuple
	SELECT * FROM test WHERE col2 = 7;
	 col1 | col2 | col3
	------+------+------
	    1 |    7 |   30

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls
Andrew Gierth writes:
> Objection withdrawn.

OK, thanks. What shall we do about Andreas' request to back-patch this? I'm
personally willing to do it, but there is the old bugaboo of "maybe it will
destabilize a plan that someone is happy with".

			regards, tom lane
Re: [HACKERS] truncate trigger for foreign data wrappers
On 2016-08-05 16:44:20 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2016-08-05 16:35:02 -0400, Tom Lane wrote:
> >> In particular, it seems to me that rather than implement just this,
> >> we really ought to provide an API that lets FDWs actually implement
> >> TRUNCATE if they feel like it. Having the trigger and not the TRUNCATE
> >> capability itself just screams "half baked", wouldn't you say?
>
> > Both are fine with me. I do object to the position that we need an answer
> > for all utility commands - that seems like too high a hurdle to pass -
> > but implementing truncate for FDWs directly sounds good.
>
> To clarify: I was certainly not suggesting that we need to implement
> more than that in the first go. I was just saying that some sort of
> long-term roadmap about utility commands for FDWs would be a good idea.

Well, my problem with that is that I don't see TRUNCATE as being in the
same camp as most other utility commands; thus I'm not sure there's really
a coherent view covering it and the rest. In the end it's just an optimized
DELETE, and we didn't say that other DML needs to provide a view for
utility commands either.

- Andres
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Fri, Aug 5, 2016 at 9:06 AM, Robert Haas wrote:
> To some extent, sure, absolutely. But it's our job as developers to
> try to foresee and minimize those cases. When Noah was at
> EnterpriseDB a few years ago and we were talking about parallel
> internal sort, Noah started by doing a survey of the literature and
> identified parallel quicksort as the algorithm that seemed best for
> our use case. Of course, every time quicksort partitions the input,
> you get two smaller sorting problems, so it's easy to see how to use 2
> CPUs after the initial partitioning step has been completed and 4 CPUs
> after each of those partitions has been partitioned again, and so on.
> However, that turns out not to be good enough because the first
> partitioning step can consume a significant percentage of the total
> runtime - so if you only start parallelizing after that, you're
> leaving too much on the table. To avoid that, the algorithm he was
> looking at had a (complicated) way of parallelizing the first
> partitioning step; then you can, it seems, do the full sort in
> parallel.
>
> There are some somewhat outdated and perhaps naive ideas about this
> that we wrote up here:
>
> https://wiki.postgresql.org/wiki/Parallel_Sort

I'm familiar with that effort. I think that when researching topics like
sorting, it can sometimes be a mistake to look only at approaches
specifically recommended by the database research community. A lot of the
techniques we've benefited from within tuplesort.c have been a matter of
addressing memory latency as a bottleneck; techniques that are fairly
simple and not worth writing a general-interest paper on. Also, things
like abbreviated keys are beneficial in large part because people tend to
follow the first normal form, and therefore an abbreviated key can contain
a fair amount of entropy most of the time. Similarly, radix sort seems
really cool, but our requirements around generality seem to make it
impractical.
> Anyway, you're proposing an algorithm that can't be fully > parallelized. Maybe that's OK. But I'm a little worried about it. > I'd feel more confident if we knew that the merge could be done in > parallel and were just leaving that to a later development stage; or > if we picked an algorithm like the one above that doesn't leave a > major chunk of the work unparallelizable. I might be able to resurrect the parallel merge stuff, just to guide reviewer intuition on how much that can help or hurt. I can probably repurpose it to show you the mixed picture on how effective it is. I think it might help more with collatable text that doesn't have abbreviated keys, for example, because you can use more of the machine's memory bandwidth for longer. But for integers, it can hurt. (That's my recollection; I prototyped parallel merge a couple of months ago now.) >> Isn't that what you >> generally see with queries that show off the parallel join capability? > > For nested loop joins, no. The whole join operation can be done in > parallel. Sure, I know, but I'm suggesting that laws-of-physics problems may still be more significant than implementation deficiencies, even though those deficiencies still need to be stamped out. Linear scalability is really quite rare for most database workloads. > Obviously, parallel query is subject to a long list of annoying > restrictions at this point. On queries that don't hit any of those > restrictions we can get 4-5x speedup with a leader and 4 workers. As > we expand the range of plan types that we can construct, I think we'll > see those kinds of speedups for a broader range of queries. (The > question of exactly why we top out with as few workers as currently > seems to be the case needs more investigation, too; maybe contention > effects?) You're probably bottlenecked on memory bandwidth. Note that I showed improvements with 8 workers, not 4. 4 workers are slower than 8, but not by that much. 
>> https://www.mssqltips.com/sqlservertip/3100/reduce-time-for-sql-server-index-rebuilds-and-update-statistics/ > > I do agree that it is important not to have unrealistic expectations. Great. My ambition for this patch is that it put parallel CREATE INDEX on a competitive footing against the implementations featured in other major systems. I don't think we need to do everything at once, but I have no intention of pushing forward with something that doesn't do respectably there. I also want to avoid partitioning in the first version of this, and probably in any version that backs CREATE INDEX. I've only made minimal changes to the tuplesort.h interface here to support parallelism. That flexibility counts for a lot, IMV. >> As I've said, there is probably a good argument to be made for >> partitioning to increase parallelism. But, that involves risks around >> the partitioning being driven by statistics or a cost model > Yes. Rushabh is working on that, and Finalize GroupAggregate -> > Gather Merge -> Partial
Re: [HACKERS] truncate trigger for foreign data wrappers
Andres Freund writes: > On 2016-08-05 16:35:02 -0400, Tom Lane wrote: >> In particular, it seems to me that rather than implement just this, >> we really ought to provide an API that lets FDWs actually implement >> TRUNCATE if they feel like it. Having the trigger and not TRUNCATE >> capability itself just screams "half baked", wouldn't you say? > Both is fine with me. I do object to the position that we need an answer > for all utility commands - that seems like a too high hurdle to pass - > but implementing truncate for FDWs directly sounds good. To clarify: I was certainly not suggesting that we need to implement more than that in the first go. I was just saying that some sort of long-term roadmap about utility commands for FDWs would be a good idea. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncate trigger for foreign data wrappers
On 2016-08-05 16:35:02 -0400, Tom Lane wrote: > In particular, it seems to me that rather than implement just this, > we really ought to provide an API that lets FDWs actually implement > TRUNCATE if they feel like it. Having the trigger and not TRUNCATE > capability itself just screams "half baked", wouldn't you say? Both is fine with me. I do object to the position that we need an answer for all utility commands - that seems like a too high hurdle to pass - but implementing truncate for FDWs directly sounds good.
Re: [HACKERS] truncate trigger for foreign data wrappers
Andres Freund writes: > On 2016-08-05 14:05:02 -0400, Robert Haas wrote: >> I agree, but I still think it's weird if foreign tables support >> TRUNCATE itself not but triggers on TRUNCATE. > You mean the other way round? To me this seems very comparable to > INSTEAD triggers, but ... While I have no particular objection to allowing TRUNCATE triggers on FDWs, I concur with Robert's feeling that we ought to sketch out a plan for what utility commands on foreign tables should do in general, before we go introducing individual new features. It would be annoying if we stuck this in with little forethought and then found out it was inconsistent with other stuff people want to add later. In particular, it seems to me that rather than implement just this, we really ought to provide an API that lets FDWs actually implement TRUNCATE if they feel like it. Having the trigger and not TRUNCATE capability itself just screams "half baked", wouldn't you say? regards, tom lane
[HACKERS] Add hint for people who place EXECUTE USING arguments in parentheses (in plpgsql)
Basically, diff --git a/src/backend/parser/parse_param.c b/src/backend/parser/parse_param.c index b402843..97064fc 100644 --- a/src/backend/parser/parse_param.c +++ b/src/backend/parser/parse_param.c @@ -108,6 +108,9 @@ fixed_paramref_hook(ParseState *pstate, ParamRef *pref) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_PARAMETER), errmsg("there is no parameter $%d", paramno), + pstate->numParams == 1 && < It is of pseudo-type record > + ? errhint("%s", _("did you incorrectly enclose USING arguments in parentheses?")) + : 0, parser_errposition(pstate, pref->location))); param = makeNode(Param); I didn't spend too much time trying to figure out how to test that a parameter is composite/record typed... I think a false positive on SQL EXECUTE is going to be very small...but I chose here mostly out of ease - a fix in pl/pgsql would be more correct. David J.
Re: [HACKERS] Notice lock waits
On Fri, Aug 5, 2016 at 12:17 PM, Tom Lane wrote: > Jeff Janes writes: >> I have it PGC_SUSET because it does send some tiny amount of >> information about the blocking process (the PID) to the blocked >> process. That is probably too paranoid, because the PID can be seen >> by anyone in the pg_locks table anyway. > > Why not just leave out the PID? I think it's often far too simplistic > to blame a lock wait on a single other process, anyway. It actually wasn't including the PID anyway, as the errdetail_log_plural was not getting passed to the client. So I changed it to PGC_USERSET, didn't attempt to include details that won't be sent anyway (although it would be nice for a superuser to be able to see the statement text of the blocker, but that is a bigger issue than I am willing to deal with here) and have removed a memory leak/bug I introduced by foolishly trying to use 'continue' to avoid introducing yet another layer of nesting. Cheers, Jeff notice_lock_waits-V02.patch Description: Binary data
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 11:30:26PM +0530, Pavan Deolasee wrote: > On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian wrote: > > On Fri, Aug 5, 2016 at 02:43:37PM -0300, Claudio Freire wrote: > > But doing the WARM chain backtracking and checking for previous > > versions with appropriate keys should be enough to support this use > > case too, it just needs to be well optimized to avoid seriously > > impacting performance on the average case. > > Yes, that was an idea I had too --- if the in-page HOT chain already has > the key, we know it is already in the index and we can skip the index > addition. > > I just don't know how would you do that without delaying/not-doing HOT chain > prune. As soon as root and intermediate HOT tuples are gone, all information > is > lost. You may delay that, but that will defeat the whole purpose. If chains > are > not pruned in-time, you may not find any free space in the page and end up > doing a cold update anyways. But may be I am missing something and Claudio has > a different idea. Yes, pruning would be a problem. :-( A check only needs to happen when the indexed key changes, right? So we are comparing the cost of adding an index key vs. the cost of trying to find a matching key/ctid in the index. Seems the latter is cheaper, but it is not obvious. I think I see Claudio's idea now --- from his diagram, you can have WARM chains span multiple HOT chains. What he is doing is creating a new HOT chain every time _any_ key changes, and he knows the entire HOT chain has identical values for all indexes. He moves from one HOT chain to another during an index scan by checking if the index value is the same in the old and new HOT chain (that is the same WARM chain). This does create more HOT chains where the root ctid cannot be removed, but it does avoid the index key/ctid check because any entry in the HOT chain has identical keys. What this would not handle is when an entire HOT chain is pruned, as the keys would be gone. 
-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
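The duplicate (key, CTID) lookup being weighed in this thread can be sketched as follows. The structures here (`IndexEntry`, `ItemPtr`) are invented stand-ins for the real PostgreSQL types, purely to illustrate the check: before inserting a new index tuple for a WARM update, scan the candidate entries for one that already points at the chain's root line pointer, and skip the insertion if found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins, not the actual PostgreSQL structures. */
typedef struct { uint32_t blkno; uint16_t offno; } ItemPtr;
typedef struct { int key; ItemPtr tid; } IndexEntry;

/*
 * Before inserting (key, root) for a WARM update, check whether an entry
 * with the same key already points at the same root line pointer; if so,
 * the index addition can be skipped.  In the expected 99% case this scan
 * finds no match and we pay only the comparison cost.
 */
static bool
warm_duplicate_exists(const IndexEntry *entries, int nentries,
                      int key, ItemPtr root)
{
    for (int i = 0; i < nentries; i++)
    {
        if (entries[i].key != key)
            continue;
        if (entries[i].tid.blkno == root.blkno &&
            entries[i].tid.offno == root.offno)
            return true;        /* (key, ctid) already present */
    }
    return false;
}
```

The cost question Bruce raises is exactly how many entries this scan must visit: for a nearly-unique index it is a handful, but for a low-cardinality index the equal-key run can be huge, which is Claudio's 1TB-boolean-index objection upthread.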
Re: [HACKERS] Notice lock waits
Jeff Janes writes: > I have it PGC_SUSET because it does send some tiny amount of > information about the blocking process (the PID) to the blocked > process. That is probably too paranoid, because the PID can be seen > by anyone in the pg_locks table anyway. Why not just leave out the PID? I think it's often far too simplistic to blame a lock wait on a single other process, anyway. regards, tom lane
Re: [HACKERS] Notice lock waits
On 05/08/2016 19:00, Jeff Janes wrote: > One time too many, I ran some minor change using psql on a production > server and was wondering why it was taking so much longer than it did > on the test server. Only to discover, after messing around with > opening new windows and running queries against pg_stat_activity and > pg_locks and so on, that it was waiting for a lock. > > So I created a new guc, notice_lock_waits, which acts like > log_lock_waits but sends the message as NOTICE so it will show up on > interactive connections like psql. > > I turn it on in my .psqlrc, as it doesn't make much sense for me to > turn it on in non-interactive sessions. > > A general facility for promoting selected LOG messages to NOTICE would > be nice, but I don't know how to design or implement that. This is > much easier, and I find it quite useful. > > I have it PGC_SUSET because it does send some tiny amount of > information about the blocking process (the PID) to the blocked > process. That is probably too paranoid, because the PID can be seen > by anyone in the pg_locks table anyway. > > Do you think this is useful and generally implemented in the correct > way? If so, I'll try to write some sgml documentation for it. > I really like the idea. I'm not really sure about the current implementation. Unless I'm wrong, disabling log_lock_waits would also disable notice_lock_waits, even if it's on. Maybe a new value for log_lock_waits, like "interactive". If switching this GUC from bool to enum is not acceptable, or letting anyone see the blocking PID is an issue, then maybe add a new GUC that says to also send a NOTICE? -- Julien Rouhaud http://dalibo.com - http://dalibo.org
Re: [HACKERS] pg_size_pretty, SHOW, and spaces
On Fri, Aug 5, 2016 at 02:07:18PM -0400, Robert Haas wrote: > On Fri, Aug 5, 2016 at 11:06 AM, Bruce Momjian wrote: > > On Fri, Aug 5, 2016 at 10:57:35AM -0400, Peter Eisentraut wrote: > >> On 8/2/16 12:51 PM, Bruce Momjian wrote: > >> > Yes, that's a strong argument for using a space. I have adjusted the > >> > patch to use spaces in all reasonable places. Patch attached, which I > >> > have gzipped because it was 133 KB. (Ah, see what I did there?) :-) > >> > > >> > I am thinking of leaving the 9.6 docs alone as I have already made them > >> > consistent (no space) with minimal changes. We can make it consistent > >> > the other way in PG 10. > >> > >> I don't think anyone wanted to *remove* the spaces in the documentation. > >> I think this change makes the documentation harder to read. > > > > Well, we had spaces in only a few places in the docs, and as I said, it > > is not consistent. Do you want those few put back for 9.6? > > +1 for that. I can't see how it's good for 10 to be one way, 9.6 to > be the opposite way, and 9.5 and prior to be someplace in the middle. > That seems like a back-patching mess. OK, done. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Thomas Munro writes: > The assertion in tsvector_delete_by_indices fails because its counting > algorithm doesn't expect indices_to_delete to contain multiple > references to the same index. Maybe that could be fixed by > uniquifying in tsvector_delete_arr before calling it, but since > tsvector_delete_by_indices already qsorts its input, it should be able > to handle duplicates cheaply. I poked at this and realized that that's not sufficient. If there are duplicates in indices_to_delete, then the initial estimate tsout->size = tsv->size - indices_count; is wrong because indices_count is an overestimate of how many lexemes will be removed. And because the calculation "dataout = STRPTR(tsout)" depends on tsout->size, we can't just wait till later to get it right. We could possibly initialize tsout->size = tsv->size (the maximum possible value), thereby ensuring that the WordEntry array doesn't overlap the dataout area; compute the correct tsout->size in the loop; and then memmove the data area into place to collapse out wasted space. But I think it might be simpler and better-performant just to de-dup the indices_to_delete array after qsort'ing it; that would certainly win for the case of indices_count == 1. The other problems I noted with failure to delete items seem to stem from the fact that tsvector_delete_arr relies on tsvector_bsearch to find items, but the input tsvector is not sorted (never mind de'duped) by array_to_tsvector. This seems like simple brain fade in array_to_tsvector, as AFAICS that's a required property of tsvectors. regards, tom lane
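The de-dup-after-qsort approach Tom favors might look roughly like this. A standalone sketch, not the actual tsvector_op.c code; the function name is invented:

```c
#include <assert.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Sort indices_to_delete and squeeze out duplicates in place, returning
 * the corrected count.  The caller can then size the output tsvector as
 * tsv->size minus the deduplicated count, so the initial estimate is no
 * longer an overestimate when the array contains repeats.
 */
static int
sort_and_dedup(int *indices, int n)
{
    int out = 1;

    if (n <= 1)
        return n;               /* the indices_count == 1 fast path */
    qsort(indices, n, sizeof(int), cmp_int);
    for (int i = 1; i < n; i++)
        if (indices[i] != indices[out - 1])
            indices[out++] = indices[i];
    return out;
}
```

Since the array is already being qsort'ed, the dedup pass adds only a single linear scan, and the n <= 1 early exit keeps the common single-deletion case free.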
Re: [HACKERS] pg_replication_origin_xact_reset() and its argument variables
On Tue, Aug 2, 2016 at 2:59 AM, Fujii Masao wrote: > On Tue, Aug 2, 2016 at 2:48 AM, Andres Freund wrote: >> Hi Fujii, >> >> On 2016-07-28 16:44:37 +0900, Fujii Masao wrote: >>> On Sat, Jul 2, 2016 at 7:01 AM, Tom Lane wrote: >>> > Andres Freund writes: >>> >> On 2016-06-30 10:14:04 -0400, Tom Lane wrote: >>> >>> Fujii Masao writes: >>> As far as I read the code of the function, those arguments don't seem >>> to >>> be necessary. So I'm afraid that the pg_proc entry for the function >>> might >>> be incorrect and those two arguments should be removed from the >>> definition. >>> > >>> >>> Sure looks that way from here. Copy-and-paste from the previous >>> >>> line in pg_proc.h, perhaps? >>> > >>> >> Yes, that's clearly wrong. >>> >>> Attached patch (pg_replication_origin_xact_reset_9.6.patch) fixes this. >>> We need to apply this at least before RC1 of PostgreSQL9.6 will be released >>> because the patch needs the change of catalog version. >>> >>> >> Damn. Can't fix that for 9.5 anymore. The >>> >> function isn't all that important (especially not from SQL), but still, >>> >> that's annoying. I'm inclined to just remove the args in 9.6. We could >>> >> also add a note to the 9.5 docs, adding that the arguments are there by >>> >> error? >>> >>> What about the attached patch (pg_replication_origin_xact_reset_9.5.patch)? >> >> except for the strictness remark in the other email, > > Yes, you're right. My careless mistake... :( > >> these look sane to >> me. Do you want to push them? I'll do so by Wednesday otherwise, to >> leave some room before the next RC. > > Could you do that if possible? Pushed since right now I have time to do that. Anyway, thanks! Regards, -- Fujii Masao
Re: [HACKERS] truncate trigger for foreign data wrappers
On 2016-08-05 14:05:02 -0400, Robert Haas wrote: > On Fri, Aug 5, 2016 at 2:04 PM, Andres Freund wrote: > > On 2016-08-05 13:32:18 -0400, Robert Haas wrote: > >> I think if we're going to add support for utility commands on foreign > >> tables, we ought to think about all of the different utility commands > >> that someone might want and what exactly we want the behavior to be. > > > >> For example, consider CLUSTER or CREATE INDEX or VACUUM or ANALYZE. > >> We might interpret TRUNCATE or CLUSTER as a request to dispatch the > >> same request for the remote side, but ANALYZE can't mean that: it has > >> to mean gather local statistics. And what if the other side is not PG > >> and supports other operations that we don't have, like OPTIMIZE TABLE > >> or DISENGAGE FTL? > > > > That's not really comparable imo - we don't have triggers for those > > locally either. For better or worse we've decided that TRUNCATE is more > > like DML than DDL. > > I agree, but I still think it's weird if foreign tables support > TRUNCATE itself not but triggers on TRUNCATE. You mean the other way round? To me this seems very comparable to INSTEAD triggers, but ...
Re: [HACKERS] pg_size_pretty, SHOW, and spaces
On Fri, Aug 5, 2016 at 11:06 AM, Bruce Momjian wrote: > On Fri, Aug 5, 2016 at 10:57:35AM -0400, Peter Eisentraut wrote: >> On 8/2/16 12:51 PM, Bruce Momjian wrote: >> > Yes, that's a strong argument for using a space. I have adjusted the >> > patch to use spaces in all reasonable places. Patch attached, which I >> > have gzipped because it was 133 KB. (Ah, see what I did there?) :-) >> > >> > I am thinking of leaving the 9.6 docs alone as I have already made them >> > consistent (no space) with minimal changes. We can make it consistent >> > the other way in PG 10. >> >> I don't think anyone wanted to *remove* the spaces in the documentation. >> I think this change makes the documentation harder to read. > > Well, we had spaces in only a few places in the docs, and as I said, it > is not consistent. Do you want those few put back for 9.6? +1 for that. I can't see how it's good for 10 to be one way, 9.6 to be the opposite way, and 9.5 and prior to be someplace in the middle. That seems like a back-patching mess. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] truncate trigger for foreign data wrappers
On Fri, Aug 5, 2016 at 2:04 PM, Andres Freund wrote: > On 2016-08-05 13:32:18 -0400, Robert Haas wrote: >> I think if we're going to add support for utility commands on foreign >> tables, we ought to think about all of the different utility commands >> that someone might want and what exactly we want the behavior to be. > >> For example, consider CLUSTER or CREATE INDEX or VACUUM or ANALYZE. >> We might interpret TRUNCATE or CLUSTER as a request to dispatch the >> same request for the remote side, but ANALYZE can't mean that: it has >> to mean gather local statistics. And what if the other side is not PG >> and supports other operations that we don't have, like OPTIMIZE TABLE >> or DISENGAGE FTL? > > That's not really comparable imo - we don't have triggers for those > locally either. For better or worse we've decided that TRUNCATE is more > like DML than DDL. I agree, but I still think it's weird if foreign tables support TRUNCATE itself not but triggers on TRUNCATE. And I don't want to start supporting TRUNCATE itself without a bit more thought about where we go from there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] truncate trigger for foreign data wrappers
On 2016-08-05 13:32:18 -0400, Robert Haas wrote: > I think if we're going to add support for utility commands on foreign > tables, we ought to think about all of the different utility commands > that someone might want and what exactly we want the behavior to be. > For example, consider CLUSTER or CREATE INDEX or VACUUM or ANALYZE. > We might interpret TRUNCATE or CLUSTER as a request to dispatch the > same request for the remote side, but ANALYZE can't mean that: it has > to mean gather local statistics. And what if the other side is not PG > and supports other operations that we don't have, like OPTIMIZE TABLE > or DISENGAGE FTL? That's not really comparable imo - we don't have triggers for those locally either. For better or worse we've decided that TRUNCATE is more like DML than DDL. Greetings, Andres Freund
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian wrote: > On Fri, Aug 5, 2016 at 02:43:37PM -0300, Claudio Freire wrote: > > But doing the WARM chain backtracking and checking for previous > > versions with appropriate keys should be enough to support this use > > case too, it just needs to be well optimized to avoid seriously > > impacting performance on the average case. > > Yes, that was an idea I had too --- if the in-page HOT chain already has > the key, we know it is already in the index and we can skip the index > addition. > > I just don't know how would you do that without delaying/not-doing HOT chain prune. As soon as root and intermediate HOT tuples are gone, all information is lost. You may delay that, but that will defeat the whole purpose. If chains are not pruned in-time, you may not find any free space in the page and end up doing a cold update anyways. But may be I am missing something and Claudio has a different idea. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 02:43:37PM -0300, Claudio Freire wrote: > But doing the WARM chain backtracking and checking for previous > versions with appropriate keys should be enough to support this use > case too, it just needs to be well optimized to avoid seriously > impacting performance on the average case. Yes, that was an idea I had too --- if the in-page HOT chain already has the key, we know it is already in the index and we can skip the index addition. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] Refactoring of heapam code.
Anastasia Lubennikova wrote: > Working on page compression and some other issues related to > access methods, I found out that the code related to heap > looks too complicated. Much more complicated than it should be. > Since I anyway got into this area, I want to suggest a set of improvements. Hm. I'm hacking on heapam.c pretty heavily ATM. Your first patch causes no problem I think, or if it does it's probably pretty minor, so that's okay. I'm unsure about the second one -- I think it's okay too, because it mostly seems to be about moving stuff from heapam.c to hio.c and shuffling some code around that I don't think I'm modifying. > Also I think that it's quite strange to have a function heap_create(), that > is called > from index_create(). It has absolutely nothing to do with heap access > method. Agreed. But changing its name while keeping it in heapam.c does not really improve things enough. I'd rather have it moved elsewhere that's not specific to "heaps" (somewhere in access/common perhaps). However, renaming the functions would break third-party code for no good reason. I propose that we only rename things if we also break it for other reasons (say because the API changes in a fundamental way). > One more thing that requires refactoring is "RELKIND_RELATION" name. > We have a type of relation called "relation"... I don't see us fixing this bit, really. Historical baggage and all that. For example, a lot of SQL queries use "WHERE relkind = 'r'". If we change the name, we should probably also change the relkind constant, and that would break the queries. If we change the name and do not change the constant, then we have to have a comment "we call them RELKIND_TABLE but the char is r because it was called RELATION previously", which isn't any better. > So there are a couple of patches. They do not cover all mentioned problems, > but I'd like to get feedback before continuing. 
I agree that we could improve things in this general area, but I do not endorse any particular changes you propose in these patches; I haven't reviewed your patches. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 2:26 PM, Pavan Deolasee wrote: > On Fri, Aug 5, 2016 at 8:55 PM, Claudio Freire > wrote: >> >> On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee >> wrote: >> > >> > >> > I don't see why it is hard. Trying to find the index entry for the old >> > update row seems odd, and updating an index row seems odd, but skipping >> > an index addition for the new row seems simple. Is the problem >> > concurrent activity? I assumed already had that ability to add to HOT >> > chains because we have to lock the update chain. >> >> >> Think of an index over a 1TB table whose keys have only 2 values: true >> and false. >> >> Sure, it's a horrible index. But I've seen things like that, and I've >> seen cases when they're useful too. >> >> So, conceptually, for each key you have circa N/2 tids on the index. >> nbtree finds the leftmost valid insert point comparing keys, it >> doesn't care about tids, so to find the index entries that point to >> the page where the new tuple is, you'd have to scan the N/2 tids in >> the index, an extremely expensive operation. >> > > Well, it's always going to be extremely hard to solve for all use cases. > This is one such extreme case and we should just give up and do cold update. > > I think we can look at the index type (unique vs non-unique) along with > table statistics to find what fraction of column values are distinct and > then estimate whether it's worthwhile to look for duplicate (key, CTID) or > just do a cold update. In addition put some cap on how hard we try once we > decide to check for duplicates and give up after we cross that threshold. I don't think bailing out in this case is necessary, and it could be more common than you'd think. It doesn't need to be this extreme to cause the issue, it only needs equal-key runs that span more than an index page, and that could be far more common than the extreme example I gave. 
But doing the WARM chain backtracking and checking for previous versions with appropriate keys should be enough to support this use case too, it just needs to be well optimized to avoid seriously impacting performance on the average case.
Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls
> "Tom" == Tom Lanewrites: Tom> Also, the way that the value is calculated in the Tom> samples-not-all-distinct case corresponds to the way I have it in Tom> the patch. Ahh, gotcha. You're referring to this: /* * If we estimated the number of distinct values at more than 10% of * the total row count (a very arbitrary limit), then assume that * stadistinct should scale with the row count rather than be a fixed * value. */ if (stats->stadistinct > 0.1 * totalrows) stats->stadistinct = -(stats->stadistinct / totalrows); where "totalrows" includes nulls obviously. So this expects negative stadistinct to be scaled by the total table size, and the all-distinct case should do the same. Objection withdrawn. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Robert Haas writes: > Action within 72 hours now seems inadequate; we are scheduled to wrap > rc1 on Monday. We need someone to either fix these bugs very very > soon, or decide to ship beta4 instead of rc1 (uggh), or decide it's OK > to ship rc1 with these known defects, or postpone the planned release. Given the time of year, I'd not be surprised if Oleg and Teodor are on vacation. In view of the time pressure, I'll take a whack at fixing this. I think that Thomas Munro's suggestion is good as far as fixing the Assert failure is concerned. I do not know where the other problems are, but maybe I can find them ... regards, tom lane
Re: [HACKERS] truncate trigger for foreign data wrappers
On Fri, Aug 5, 2016 at 10:39 AM, Andres Freund wrote: > On 2016-08-05 10:33:49 -0400, Tom Lane wrote: >> Murat Tuncer writes: >> > I recently hit a road blocker when I tried to create a truncate trigger on >> > a foreign table. trigger.c::CreateTrigger() function has explicit check to >> > block truncate trigger on foreign tables. >> >> That's good, because we don't implement TRUNCATE on foreign tables: there >> is nothing in the FDW API that would support it. Not much point in >> declaring a trigger for an event that can never happen. > > Well, allowing BEFORE triggers to return NULL or something, preventing > the execution of the rest, would be such an implementation, and also > independently useful. I guess. I think if we're going to add support for utility commands on foreign tables, we ought to think about all of the different utility commands that someone might want and what exactly we want the behavior to be. For example, consider CLUSTER or CREATE INDEX or VACUUM or ANALYZE. We might interpret TRUNCATE or CLUSTER as a request to dispatch the same request for the remote side, but ANALYZE can't mean that: it has to mean gather local statistics. And what if the other side is not PG and supports other operations that we don't have, like OPTIMIZE TABLE or DISENGAGE FTL? That isn't, strictly speaking, a reason to reject a patch that just allows TRUNCATE triggers on FDWs to "return NULL or something", but I can't get very excited about such a patch because I think the utility is fairly marginal. I'd rather have a little more of a plan than that before we go start tinkering. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] multivariate statistics (v19)
On 08/05/2016 06:24 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: Attached is v19 of the "multivariate stats" patch series - essentially v18 rebased on top of current master. Aside from a few bug fixes, the main improvement is the addition of SGML docs demonstrating the statistics in a way similar to the current "Row Estimation Examples" (and the docs are actually in the same section). I've tried to keep the right amount of technical detail (and pointing to the right README for additional details), but this may need improvements. I have not written docs explaining how statistics may be combined yet (more about this later). What we have here is quite something: $ git diff master --stat | tail -n1 77 files changed, 12809 insertions(+), 65 deletions(-) I will try to get familiar with the topic and have added myself as a reviewer of this patch. Hopefully I'll get feedback soon. Yes, it's a large patch. Although 25% of the insertions are SGML docs, regression tests and READMEs, and a large part of the remaining ~9k insertions are comments. But it may still be overwhelming, no doubt about that. FWIW, if someone is interested in the patch but is unsure where to start, I'm ready to help with that as much as possible. For example if you happen to go to PostgresOpen, feel free to drag me to a corner and ask me as many questions as you want ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash index with larger hashes
On Fri, Aug 5, 2016 at 10:39 AM, Kenneth Marshall wrote: > I have been following the recent discussions on increasing the > size of the hash function used in Postgres and the work to > provide WAL and performance improvements for hash indexes. > I know it was mentioned when we moved to the new hashing > functions, but the existing functions do provide an additional > 32 bits of hash. We currently do not use them, but they are > already calculated. > > It had occurred to me that one way to decrease the space used > to store the hash value would be to include information about > the page number to determine the actual value. For example, > a hash index of 65k pages (540 MB) would get two additional > bytes of hash with no associated storage cost. Also, if you > subdivided the hash page into say 128 sub-pages you would > get the extra 2 bytes of hash in a 4 MB index. In this way, > the bigger the hash index is, the more bits you get for free. > > Just wanted to throw it out there. I'm not sure I understand exactly what you are proposing here. Suppose we have 64 bits of hashcode and (1 << N) buckets. Currently, we store hashcode bits 0..31 on each item. Maybe what you're saying is that we could instead store bits B..(31+B) on each item - that is, cram in as many extra bits on each individual item as log2(nbuckets). The problem with that is that it would make it very hard to split buckets. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
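Robert's reading of the proposal can be sketched in a few lines. With (1 << nbits) buckets, the low nbits of the 64-bit hashcode are implied by which bucket an item lives in, so the 32 stored bits could start at bit nbits instead of bit 0. The helper names below are invented for illustration and are not the actual hash AM code:

```c
#include <stdint.h>

/* Illustrative sketch only (names invented, not PostgreSQL code):
 * with (1 << nbits) buckets, the low nbits of the 64-bit hashcode are
 * implied by the bucket an item lives in, so the 32 bits stored on the
 * item could instead be bits nbits..(nbits+31), gaining nbits of
 * effective hash "for free".  Splitting a bucket increments nbits,
 * which is exactly why this makes splits hard: items stored under the
 * old nbits no longer carry their bits at the expected offset. */
static inline uint32_t
bucket_for(uint64_t hashcode, int nbits)
{
    return (uint32_t) (hashcode & ((UINT64_C(1) << nbits) - 1));
}

static inline uint32_t
item_stored_bits(uint64_t hashcode, int nbits)
{
    return (uint32_t) (hashcode >> nbits);  /* bits nbits..nbits+31 */
}
```

With nbits = 0 this degenerates to today's behavior of storing bits 0..31 on every item.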
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 8:55 PM, Claudio Freire wrote: > On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee > wrote: > > > > > > I don't see why it is hard. Trying to find the index entry for the old > > update row seems odd, and updating an index row seems odd, but skipping > > an index addition for the new row seems simple. Is the problem > > concurrent activity? I assumed we already had that ability to add to HOT > > chains because we have to lock the update chain. > > > Think of an index over a 1TB table whose keys have only 2 values: true > and false. > > Sure, it's a horrible index. But I've seen things like that, and I've > seen cases when they're useful too. > > So, conceptually, for each key you have circa N/2 tids on the index. > nbtree finds the leftmost valid insert point comparing keys, it > doesn't care about tids, so to find the index entries that point to > the page where the new tuple is, you'd have to scan the N/2 tids in > the index, an extremely expensive operation. > > Well, it's always going to be extremely hard to solve for all use cases. This is one such extreme case and we should just give up and do a cold update. I think we can look at the index type (unique vs non-unique) along with table statistics to find what fraction of column values are distinct and then estimate whether it's worthwhile to look for a duplicate (key, CTID) or just do a cold update. In addition, put some cap on how hard we try once we decide to check for duplicates and give up after we cross that threshold. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
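The statistics-driven decision Pavan describes could look something like the sketch below. All names and the threshold are invented here for illustration; this is not part of any posted patch:

```c
#include <stdbool.h>

/* Invented sketch of the heuristic described above: use the index's
 * uniqueness and the table's distinct-value statistics to estimate how
 * many index entries share one key, and only search for an existing
 * (key, CTID) duplicate when that estimate is below a hard cap.
 * Otherwise give up immediately and do a cold update. */
#define WARM_DUP_CHECK_CAP 16.0   /* invented threshold */

static bool
worth_checking_for_duplicates(bool index_is_unique,
                              double reltuples, double ndistinct)
{
    double entries_per_key;

    if (index_is_unique)
        return true;            /* at most one entry per key */
    if (reltuples <= 0 || ndistinct <= 0)
        return false;           /* no usable stats: play it safe, go cold */
    entries_per_key = reltuples / ndistinct;
    return entries_per_key <= WARM_DUP_CHECK_CAP;
}
```

For Claudio's pathological boolean index (1TB table, 2 distinct values) this immediately falls back to a cold update, while a mostly-distinct key still gets the cheap duplicate lookup.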
Re: [HACKERS] money type overflow checks
Peter Eisentraut writes: > The input function of the money type has no overflow checks: Ugh. > (Is checking for < 0 a valid overflow check? No, I don't think it's sufficient after a multiplication by 10. That would be enough to shift some bits clear out of the word, but there's no certainty that the new sign bit would be 1. The scheme used in scanint8 is safe. But I think it was written that way mainly to avoid hard-wired assumptions about how wide int64 is, a consideration that's a mite obsolete now. You could possibly avoid the cost of a division by relying on comparisons to PG_INT64_MAX/10. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
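Tom's point can be made concrete in a few lines. Checking the sign after the multiply is unreliable, but testing before the multiply against INT64_MAX / 10 and INT64_MAX % 10 is safe, and both are compile-time constants, so no runtime division is paid. This is a hedged sketch of the idea, not the cash.c patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only (not the posted patch): checking value < 0 after the
 * multiply can miss overflows, because bits may be shifted out of the
 * word without the new sign bit ending up 1.  Instead, test *before*
 * accumulating, against INT64_MAX / 10 and INT64_MAX % 10 -- the
 * compiler folds both into constants, so no division happens at
 * runtime. */
static bool
accumulate_digit(int64_t *value, int digit)
{
    if (*value > INT64_MAX / 10 ||
        (*value == INT64_MAX / 10 && digit > (int) (INT64_MAX % 10)))
        return false;           /* would overflow int64 */
    *value = *value * 10 + digit;
    return true;
}
```

cash_in would need the same style of guard on its trailing "value *= 10" adjustments for missing decimal places, not just on the digit loop.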
Re: [HACKERS] Logical Replication WIP
On 5 August 2016 at 16:22, Andres Freund wrote: > On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote: >> as promised here is WIP version of logical replication patch. > > Yay! Yay2 > I'm about to head out for a week of, desperately needed, holidays, but > after that I plan to spend a fair amount of time helping to review > etc. this. Have a good one. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Notice lock waits
One time too many, I ran some minor change using psql on a production server and was wondering why it was taking so much longer than it did on the test server. Only to discover, after messing around with opening new windows and running queries against pg_stat_activity and pg_locks and so on, that it was waiting for a lock. So I created a new guc, notice_lock_waits, which acts like log_lock_waits but sends the message as NOTICE so it will show up on interactive connections like psql. I turn it on in my .psqlrc, as it doesn't make much sense for me to turn it on in non-interactive sessions. A general facility for promoting selected LOG messages to NOTICE would be nice, but I don't know how to design or implement that. This is much easier, and I find it quite useful. I have it PGC_SUSET because it does send some tiny amount of information about the blocking process (the PID) to the blocked process. That is probably too paranoid, because the PID can be seen by anyone in the pg_locks table anyway. Do you think this is useful and generally implemented in the correct way? If so, I'll try to write some sgml documentation for it. Cheers, Jeff notice_lock_waits-V01.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On Thu, Aug 4, 2016 at 8:14 AM, Aleksander Alekseevwrote: > Thanks everyone for your remarks and comments! > > Here is an improved version of a patch. > > Main changes: > * Patch passes `make installcheck` > * Code is fully commented, also no more TODO's > > I wish I sent this version of a patch last time. Now I realize it was > really hard to read and understand. Hope I managed to correct this > flaw. If you believe that some parts of the code are still poorly > commented or could be improved in any other way please let me know. > > And as usual, any other comments, remarks or questions are highly > appreciated! Three general comments: 1. It's always seemed to me that a huge problem with anything of this sort is dependencies. For example, suppose I create a fast temporary table and then I create a functional index on the fast temporary table that uses some SQL function defined in pg_proc. Then, another user drops the function. Then, I try to use the index. With regular temporary tables, dependencies protect us here, but if there are no catalog entries, I wonder how this can ever be made safe. Similar problems exist for triggers, constraints, RLS policies, and attribute defaults. 2. This inserts additional code in a bunch of really low-level places like heap_hot_search_buffer, heap_update, heap_delete, etc. I think what you've basically done here is create a new, in-memory heap AM and then, because we don't have an API for adding new storage managers, you've bolted it onto the existing heapam layer. That's certainly a reasonable approach for creating a PoC, but I think we actually need a real API here. Otherwise, when the next person comes along and wants to add a third heap implementation, they've got to modify all of these same places again. I don't think this code is reasonably maintainable in this form. 3. 
Currently, temporary tables are parallel-restricted: a query that references temporary tables can use parallelism, but access to the temporary tables is only possible from within the leader. I suspect, although I'm not entirely sure, that lifting this restriction would be easier with our current temporary table implementation than with this one, because the current temporary table implementation mostly relies on a set of buffers that could be moved from backend-private memory to DSM. On a quick look, this implementation uses a bunch of new data structures that are heavy on pointers, so that gets quite a bit more complicated to make parallel-safe (unless we adopt threads instead of processes!). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Change the default of update_process_title to off
On Fri, Aug 5, 2016 at 3:25 AM, Tsunakawa, Takayuki wrote: >> From: Tom Lane [mailto:t...@sss.pgh.pa.us] >> Yeah, I think I agree. It would be bad to disable it by default on Unix, >> because ps(1) is a very standard tool there, but the same argument doesn't >> hold for Windows. > > It seems that we could reach a consensus. The patch is attached. I'll add > this to the next CommitFest. > >> Another route to a solution would be to find a cheaper way to update the >> process title on Windows ... has anyone looked for alternatives? > > I couldn't find an alternative solution after asking some Windows support > staff. > > Regards > Takayuki Tsunakawa Shouldn't we change the code which initdb uses to create the example postgresql.conf, so that it shows the commented-out value as 'off' when initdb is run on Windows? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
30.07.2016 14:00, Andrew Borodin: Here is a BRIN-compatible version of the patch. Now the function PageIndexTupleOverwrite consumes the tuple size as a parameter; this behavior is coherent with PageAddItem. Also, I had to remove the assertion that the item has storage in the loop that adjusts line pointers. It would be great if someone could check that place (the commented-out Assert(ItemIdHasStorage(ii)); in PageIndexTupleOverwrite). I suspect that there might be a case when linp's should not be adjusted. Hi, I reviewed the code and I have a couple of questions about the following code:

/* may have negative size here if new tuple is larger */
size_diff = oldsize - alignednewsize;
offset = ItemIdGetOffset(tupid);
if (offset < phdr->pd_upper ||
    (offset + size_diff) > phdr->pd_special ||
    offset != MAXALIGN(offset) ||
    size_diff != MAXALIGN(size_diff))
    ereport(ERROR,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("corrupted item offset: offset = %u, size = %u",
                    offset, (unsigned int) size_diff)));

First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize? Although, I'm quite sure that it was already aligned somewhere before. I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary. I'd rather add the check (offset + size_diff < pd_lower). Apart from that, the patch seems pretty good. BTW, I'm very surprised that it improves performance so much. And also the size is reduced significantly: 89MB against 289MB without the patch. Could you explain in detail why that happens? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
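For readers unfamiliar with the alignment macro the review keeps referring to: MAXALIGN rounds a length up to the platform's maximum alignment. The definitions below mirror the TYPEALIGN/MAXALIGN macros in PostgreSQL's src/include/c.h, with MAXIMUM_ALIGNOF hard-wired to 8 for the sketch. The difference of two maxaligned lengths is itself a multiple of the alignment, which is why the review doubts the size_diff != MAXALIGN(size_diff) check can ever fire when both inputs are properly aligned:

```c
#include <stdint.h>

/* Mirrors PostgreSQL's TYPEALIGN/MAXALIGN (c.h), with MAXIMUM_ALIGNOF
 * fixed at 8 for this sketch.  Rounds LEN up to the next multiple of
 * the alignment; note that the difference of two maxaligned values is
 * itself maxaligned. */
#define MAXIMUM_ALIGNOF 8
#define TYPEALIGN(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
#define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
```

So if oldsize were ever passed in unaligned, using MAXALIGN(oldsize), as the review suggests, would restore the invariant that size_diff is maxaligned.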
Re: [HACKERS] Column COMMENTs in CREATE TABLE?
On 8/5/16 11:58 AM, David Fetter wrote: > For what it's worth, I tend to put the function body last. That's > just my taste, though. Would it be hard to keep the ability to > permute the stuff after > > CREATE FUNCTION (args) > RETURNS [SETOF] type > > as we have it now? I don't think anybody is suggesting to change that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
On Thu, Aug 4, 2016 at 12:14 AM, Noah Misch wrote: > On Wed, Aug 03, 2016 at 05:52:44PM -0400, Tom Lane wrote: >> I wrote: >> > I'm thinking there are two distinct bugs here. >> >> Actually, make that three bugs. I was so focused on the crashing >> that I failed to notice that ts_delete wasn't producing sane answers >> even when it didn't crash: > > [Action required within 72 hours. This is a generic notification.] > > The above-described topic is currently a PostgreSQL 9.6 open item. Teodor, > since you committed the patch believed to have created it, you own this open > item. If some other commit is more relevant or if this does not belong as a > 9.6 open item, please let us know. Otherwise, please observe the policy on > open item ownership[1] and send a status update within 72 hours of this > message. Include a date for your subsequent status update. Testers may > discover new open items at any time, and I want to plan to get them all fixed > in advance of shipping 9.6rc1 next week. Consequently, I will appreciate your > efforts toward speedy resolution. Thanks. Action within 72 hours now seems inadequate; we are scheduled to wrap rc1 on Monday. We need someone to either fix these bugs very very soon, or decide to ship beta4 instead of rc1 (uggh), or decide it's OK to ship rc1 with these known defects, or postpone the planned release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Aug 3, 2016 at 5:13 PM, Peter Geogheganwrote: > On Wed, Aug 3, 2016 at 11:42 AM, Robert Haas wrote: >> I'm not going to say it's bad to be able to do things 2-2.5x faster, >> but linear scalability this ain't - particularly because your 2.58x >> faster case is using up to 7 or 8 times as much memory. The >> single-process case would be faster in that case, too: you could >> quicksort. > > [ lengthy counter-argument ] None of this convinces me that testing this in a way that is not "apples to apples" is a good idea, nor will any other argument. >> I also think that Amdahl's law is going to pinch pretty severely here. > > Doesn't that almost always happen, though? To some extent, sure, absolutely. But it's our job as developers to try to foresee and minimize those cases. When Noah was at EnterpriseDB a few years ago and we were talking about parallel internal sort, Noah started by doing a survey of the literature and identified parallel quicksort as the algorithm that seemed best for our use case. Of course, every time quicksort partitions the input, you get two smaller sorting problems, so it's easy to see how to use 2 CPUs after the initial partitioning step has been completed and 4 CPUs after each of those partitions has been partitioned again, and so on. However, that turns out not to be good enough because the first partitioning step can consume a significant percentage of the total runtime - so if you only start parallelizing after that, you're leaving too much on the table. To avoid that, the algorithm he was looking at had a (complicated) way of parallelizing the first partitioning step; then you can, it seems, do the full sort in parallel. There are some somewhat outdated and perhaps naive ideas about this that we wrote up here: https://wiki.postgresql.org/wiki/Parallel_Sort Anyway, you're proposing an algorithm that can't be fully parallelized. Maybe that's OK. But I'm a little worried about it. 
I'd feel more confident if we knew that the merge could be done in parallel and were just leaving that to a later development stage; or if we picked an algorithm like the one above that doesn't leave a major chunk of the work unparallelizable. > Isn't that what you > generally see with queries that show off the parallel join capability? For nested loop joins, no. The whole join operation can be done in parallel. For hash joins, yes: building the hash table once per worker can run afoul of Amdahl's law in a big way. That's why Thomas Munro is working on fixing it: https://wiki.postgresql.org/wiki/EnterpriseDB_database_server_roadmap Obviously, parallel query is subject to a long list of annoying restrictions at this point. On queries that don't hit any of those restrictions we can get 4-5x speedup with a leader and 4 workers. As we expand the range of plan types that we can construct, I think we'll see those kinds of speedups for a broader range of queries. (The question of exactly why we top out with as few workers as currently seems to be the case needs more investigation, too; maybe contention effects?) >> If the final merge phase is a significant percentage of the total >> runtime, picking an algorithm that can't parallelize the final merge >> is going to limit the speedups to small multiples. That's an OK place >> to be as a result of not having done all the work yet, but you don't >> want to get locked into it. If we're going to have a substantial >> portion of the work that can never be parallelized, maybe we've picked >> the wrong algorithm. > > I suggest that this work be compared to something with similar > constraints. I used Google to try to get some indication of how much > of a difference parallel CREATE INDEX makes in other major database > systems. 
This is all I could find: > > https://www.mssqltips.com/sqlservertip/3100/reduce-time-for-sql-server-index-rebuilds-and-update-statistics/ I do agree that it is important not to have unrealistic expectations. > As I've said, there is probably a good argument to be made for > partitioning to increase parallelism. But, that involves risks around > the partitioning being driven by statistics or a cost model, and I > don't think you'd be too on board with the idea of every CREATE INDEX > after bulk loading needing an ANALYZE first. I tend to think of that > as more of a parallel query thing, because you can often push down a > lot more there, dynamic sampling might be possible, and there isn't a > need to push all the tuples through one point in the end. Nothing I've > done here precludes your idea of a sort-order-preserving gather node. > I think that we may well need both. Yes. Rushabh is working on that, and Finalize GroupAggregate -> Gather Merge -> Partial GroupAggregate -> Sort -> whatever is looking pretty sweet. >> The work on making the logtape infrastructure parallel-aware seems >> very interesting and potentially useful for other things. Sadly, I >> don't
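Robert's Amdahl's-law worry about an unparallelizable final merge is easy to quantify. If only a fraction p of the runtime parallelizes, n workers are bounded by 1 / ((1 - p) + p/n). A quick sketch:

```c
/* Amdahl's law: with parallel fraction p and n workers, speedup is
 * 1 / ((1 - p) + p / n).  If a serial final merge is 25% of the
 * runtime (p = 0.75), four workers give about 2.29x, and even
 * arbitrarily many workers can never exceed 4x -- which is the sense
 * in which a serial merge "limits the speedups to small multiples". */
static double
amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double) n);
}
```

This is why a merge phase that is a significant fraction of total runtime matters so much more than it first appears.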
[HACKERS] money type overflow checks
The input function of the money type has no overflow checks: => select '12345678901234567890'::money; money - -$13,639,628,150,831,692.72 (1 row) The tests in the regression test file money.sql are bogus because they only test the overflow checks of the bigint type before the cast. Here is a patch that adds appropriate checks and tests. We could probably remove the bogus tests. (Is checking for < 0 a valid overflow check? We save the sign until the very end, so it ought to work. The code in int8.c works differently there.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 6fcce1d6c3685cfb5bacdb89981d4f6a3911dded Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Fri, 5 Aug 2016 11:50:53 -0400 Subject: [PATCH] Add overflow checks to money type input function --- src/backend/utils/adt/cash.c| 18 ++ src/test/regress/expected/money.out | 47 + src/test/regress/sql/money.sql | 11 + 3 files changed, 76 insertions(+) diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c index b336185..0c06f71 100644 --- a/src/backend/utils/adt/cash.c +++ b/src/backend/utils/adt/cash.c @@ -197,6 +197,12 @@ cash_in(PG_FUNCTION_ARGS) { value = (value * 10) + (*s - '0'); + if (value < 0) +ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type money", +str))); + if (seen_dot) dec++; } @@ -216,10 +222,22 @@ cash_in(PG_FUNCTION_ARGS) if (isdigit((unsigned char) *s) && *s >= '5') value++; + if (value < 0) + ereport(ERROR, +(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type money", + str))); + /* adjust for less than required decimal places */ for (; dec < fpoint; dec++) value *= 10; + if (value < 0) + ereport(ERROR, +(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type money", + str))); + /* * should only be trailing digits followed by whitespace, right paren, * 
trailing sign, and/or trailing currency symbol diff --git a/src/test/regress/expected/money.out b/src/test/regress/expected/money.out index 538235c..206f5c4 100644 --- a/src/test/regress/expected/money.out +++ b/src/test/regress/expected/money.out @@ -185,6 +185,53 @@ SELECT * FROM money_data; $123.46 (1 row) +-- input checks +SELECT '1234567890'::money; + money +--- + $1,234,567,890.00 +(1 row) + +SELECT '12345678901234567'::money; + money + + $12,345,678,901,234,567.00 +(1 row) + +SELECT '123456789012345678'::money; +ERROR: value "123456789012345678" is out of range for type money +LINE 1: SELECT '123456789012345678'::money; + ^ +SELECT '9223372036854775807'::money; +ERROR: value "9223372036854775807" is out of range for type money +LINE 1: SELECT '9223372036854775807'::money; + ^ +SELECT '-12345'::money; +money +- + -$12,345.00 +(1 row) + +SELECT '-1234567890'::money; + money + + -$1,234,567,890.00 +(1 row) + +SELECT '-12345678901234567'::money; +money +- + -$12,345,678,901,234,567.00 +(1 row) + +SELECT '-123456789012345678'::money; +ERROR: value "-123456789012345678" is out of range for type money +LINE 1: SELECT '-123456789012345678'::money; + ^ +SELECT '-9223372036854775808'::money; +ERROR: value "-9223372036854775808" is out of range for type money +LINE 1: SELECT '-9223372036854775808'::money; + ^ -- Cast int4/int8 to money SELECT 1234567890::money; money diff --git a/src/test/regress/sql/money.sql b/src/test/regress/sql/money.sql index 09b9476..d07a616 100644 --- a/src/test/regress/sql/money.sql +++ b/src/test/regress/sql/money.sql @@ -57,6 +57,17 @@ CREATE TABLE money_data (m money); INSERT INTO money_data VALUES ('$123.459'); SELECT * FROM money_data; +-- input checks +SELECT '1234567890'::money; +SELECT '12345678901234567'::money; +SELECT '123456789012345678'::money; +SELECT '9223372036854775807'::money; +SELECT '-12345'::money; +SELECT '-1234567890'::money; +SELECT '-12345678901234567'::money; +SELECT '-123456789012345678'::money; +SELECT 
'-9223372036854775808'::money; + -- Cast int4/int8 to money SELECT 1234567890::money; SELECT 12345678901234567::money; -- 2.9.2 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Prevent "snapshot too old" from trying to return pruned TOAST tu
On Fri, Aug 5, 2016 at 11:05 AM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Aug 4, 2016 at 10:58 PM, Robert Haas wrote: >>> What is the reason? We refuse to separate frontend and backend >>> headers in any sort of principled way? > >> That was poorly phrased. I'll try again: I can't see any reason for >> using a macro here except that it allows frontend code to compile this >> without breaking. > > Well, the alternative would be to put "#ifndef FRONTEND" around the > static-inline function. That's not very pretty either, and it's > inconsistent with the existing precedent (ie, InitDirtySnapshot). > Also, it presumes that a non-backend includer actually has defined > FRONTEND; that seems to be the case for pg_xlogdump but I do not > think we do that everywhere. That may not be pretty, but it'd be a lot more transparent. If I see #ifndef FRONTEND, I think, oh, that's protecting some stuff that shouldn't be included in front-end compiles. If I see a macro, I don't necessarily think of the fact that this may be a way of preventing that code from being compiled in front-end compiles. >> Here's a patch. Is this what you want? > > OK by me. OK, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
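The pattern Robert prefers is easy to illustrate with an invented helper (the real functions under discussion, such as InitDirtySnapshot, of course differ):

```c
/* Sketch of the #ifndef FRONTEND style: the guard tells a reader
 * directly that the static inline is excluded from frontend compiles,
 * whereas wrapping the definition in a macro hides that fact.  The
 * helper below is invented purely for illustration; it also shows why
 * the guard only works if frontend includers actually define
 * FRONTEND. */
#ifndef FRONTEND
static inline int
backend_only_helper(int x)
{
    return x * 2;               /* stand-in for backend-only logic */
}
#endif
```

A backend compile (no FRONTEND defined) sees the function; a frontend compile that defines FRONTEND gets nothing, exactly as the #ifndef advertises.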
Re: [HACKERS] Column COMMENTs in CREATE TABLE?
On Fri, Aug 05, 2016 at 10:14:21AM -0400, Peter Eisentraut wrote: > On 7/3/16 11:41 AM, Tom Lane wrote: > > I can see the reasoning for > > allowing COMMENT in a table column definition, but the argument for > > allowing it in simpler CREATEs seems tissue-thin: > > > > CREATE FUNCTION foo(int) RETURNS ... ; > > COMMENT ON FUNCTION foo(int) IS 'blah'; > > > > vs > > > > CREATE FUNCTION foo(int) RETURNS ... > > WITH (COMMENT 'blah'); > > > > Not much of a keystroke savings, nor is the comment noticeably > > "closer" to its object than before. > > I had actually been thinking about a similar proposal, but specifically > for CREATE FUNCTION. But the syntax would have to put it above the > function body, not below it. I think the CREATE FUNCTION syntax could > actually handle that. For what it's worth, I tend to put the function body last. That's just my taste, though. Would it be hard to keep the ability to permute the stuff after CREATE FUNCTION (args) RETURNS [SETOF] type as we have it now? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regression test for extended query protocol
On Fri, Aug 5, 2016 at 10:11 AM, Tom Lane wrote: > Robert Haas writes: >> I think it would be an interesting project for someone to try to >> figure out how to make 'make check-extended-query-protocol' or similar >> work. > > Seems like it would not be that hard to put some kind of option in psql > to issue queries with PQexecParams not plain PQexec. Yes, we did that. > However, since > that wouldn't exercise sending out-of-line parameters, it's not clear > to me that it'd be a very interesting test. Well, you exercise copyfuncs.c quite a bit more, if nothing else. > I still suspect that doing this in core is mostly going to be duplicative > effort, and we'd be better off making use of existing JDBC tests. That's possible; I don't know much about the JDBC tests, so it's hard for me to say how that would compare in terms of coverage to my proposal. Perhaps both things are worth doing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 12:25:39PM -0300, Claudio Freire wrote: > > 2. The existence of index pointers to intermediate tuples will lead to index > > bloat. This is the same problem that we found in Simon's original proposal > > (discussed internally). None of the intermediate line pointers can be > > vacuumed until the entire chain becomes DEAD. Even if a duplicate index > > key is inserted for every index, knowing that and reclaiming the index > > pointers to the original root line pointer will be difficult. > > I don't see the difference it makes for bloat between storing the root > tid and the intermediate tid, but I haven't yet figured out how > vacuuming would work so maybe I have to think about that to realize > the difference. Think of page pruning --- we can't remove a ctid that an index points to. The more ctids you point to in a HOT chain, the fewer ctids you can remove --- that's why we want to point only to the head of the HOT/WARM chain. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolaseewrote: > > > On Fri, Aug 5, 2016 at 8:23 AM, Claudio Freire > wrote: >> >> On Thu, Aug 4, 2016 at 11:15 PM, Bruce Momjian wrote: >> >> > >> > OK, that's a lot of text, and I am confused. Please tell me the >> > advantage of having an index point to a string of HOT chains, rather >> > than a single one? Is the advantage that the index points into the >> > middle of the HOT chain rather than at the beginning? I can see some >> > value to that, but having more ctid HOT headers that can't be removed >> > except by VACUUM seems bad, plus the need for index rechecks as you >> > cross to the next HOT chain seems bad. >> > >> > The value of WARM is to avoid index bloat --- right now we traverse the >> > HOT chain on a single page just fine with no complaints about speed so I >> > don't see a value to optimizing that traversal, and I think the key >> > rechecks and ctid bloat will make it all much slower. >> > >> > It also seems much more complicated. >> >> The point is avoiding duplicate rows in the output of index scans. >> >> I don't think you can avoid it simply by applying index condition >> rechecks as the original proposal implies, in this case: >> >> CREATE TABLE t (id integer not null primary key, someid integer, dat >> integer); >> CREATE INDEX i1 ON t (someid); >> >> INSERT INTO t (id, someid, dat) VALUES (1, 2, 100); >> UPDATE t SET dat = dat + 1 where id = 1; >> UPDATE t SET dat = dat + 1, id = 2 where id = 1; >> UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2; >> UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3; >> SELECT * FROM t WHERE someid = 2; >> >> That, I believe, will cause the problematic chains where the condition >> recheck passes both times the last visible tuple is visited during the >> scan. It will return more than one tuple when in reality there is only >> one. > > > Hmm. That seems like a real problem. 
The only way to address that is to > ensure that for a given WARM chain and given index key, there must exist > only a single index entry. There can and will be multiple entries if the > index key changes, but if the index key changes to the original value (or > any other value already in the WARM chain) again, we must not add another > index entry. Now that does not seem like a very common scenario and skipping > a duplicate index entry does have its own benefit, so maybe it's not a > terrible idea to try that. You're right, it may be expensive to search for > an existing matching index key for the same root line pointer. But if we > could somehow short-circuit the more common case, it might still be worth > trying. I think it can be done. I could try to write a POC and see how it works. > The other idea you mentioned is to index intermediate tuples but somehow > restrict WARM chain following knowing that the subsequent tuples are > reachable via different index tuple. I see two major problems with that > approach: > > 1. When HOT or WARM chains are pruned and dead tuples removed, we may lose > the knowledge about key changes between intermediate updates. Without that > it seems difficult to know when to stop while following a chain starting from > the old index tuple. > > 2. The existence of index pointers to intermediate tuples will lead to index > bloat. This is the same problem that we found in Simon's original proposal > (discussed internally). None of the intermediate line pointers can be > vacuumed until the entire chain becomes DEAD. Even if a duplicate index > key is inserted for every index, knowing that and reclaiming the index > pointers to the original root line pointer will be difficult. I don't see the difference it makes for bloat between storing the root tid and the intermediate tid, but I haven't yet figured out how vacuuming would work so maybe I have to think about that to realize the difference. 
>> Not to mention the cost of scanning the chain of N tuples N times, >> making it quadratic - bound by the number of tuples that can fit a >> page, but still high. >> >> Solving that, I believe, will remove all the simplicity of pointing to >> root tuples. >> > > You're right. But having all indexes point to the root line pointer makes > things simpler to manage and less buggy. So if we can somehow solve the > duplicate tuples problem, even at additional cost at update time, it might > still be worth it. I would suggest that we should think more and we can find > some solution to the problem. Will try to go down that road and see what can be optimized. >> I don't really see how you'd do that on yours. You seem to assume >> finding a particular key-item pointer pair in an index would be >> efficient, but IMHO that's not the case. > > > That's true. But we could somehow short-circuit the more common pattern, > that might still be worth it. For corner cases, we can fall back to non-HOT > update and keep things simple. It will still be a huge improvement over what we have currently.
Re: [HACKERS] improved DefElem list processing
On 8/4/16 2:21 PM, Tom Lane wrote: > Forgot to mention: seems like you should have added a location > argument to makeDefElem. I was hesitating to do that lest it break extensions or something, but I guess we break bigger things than that all the time. I'll change it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] improved DefElem list processing
On 8/4/16 2:08 PM, Tom Lane wrote: > +1 on the general idea, but I wonder if it's a problem that you replaced > an O(N) check with an O(N^2) check. I don't think there are any commands > that would be likely to have so many options that this would become a > serious issue, but ... I'll run some tests. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] handling unconvertible error messages
On 8/4/16 9:42 AM, Tom Lane wrote: > I'm inclined to think that we should reset the message locale > to C as soon as we've forked away from the postmaster, and leave it > that way until we've absorbed settings from the startup packet. > Sending messages of this sort in English isn't great, but it's better > than sending completely-unreadable ones. Or is that just my > English-centricity showing? Well, most of the time this all works; you might only have problems when the client and server settings differ. We wouldn't want to partially disable the NLS feature for the normal case. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] handling unconvertible error messages
On 8/4/16 2:45 AM, Victor Wagner wrote: > 4. At the session startup try to reinitialize the LC_MESSAGES locale > category with the combination > of the server (or better client-sent) language and region and > client-supplied encoding, and if this failed, use untranslated error > messages. Obviously, an attempt to set the locale to ru_RU.ISO8859-1 would fail, > so, if a client with a ru_RU.UTF-8 default locale asked the server to use > LATIN1 encoding, the server would fall back to untranslated messages. I think this is basically my solution (1), with the same problems. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Logical Replication WIP
On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote: > as promised here is WIP version of logical replication patch. Yay! I'm about to head out for a week of, desperately needed, holidays, but after that I plan to spend a fair amount of time helping to review etc. this.
Re: [HACKERS] pg_size_pretty, SHOW, and spaces
On Fri, Aug 5, 2016 at 10:57:35AM -0400, Peter Eisentraut wrote: > On 8/2/16 12:51 PM, Bruce Momjian wrote: > > Yes, that's a strong argument for using a space. I have adjusted the > > patch to use spaces in all reasonable places. Patch attached, which I > > have gzipped because it was 133 KB. (Ah, see what I did there?) :-) > > > > I am thinking of leaving the 9.6 docs alone as I have already made them > > consistent (no space) with minimal changes. We can make it consistent > > the other way in PG 10. > I don't think anyone wanted to *remove* the spaces in the documentation. > I think this change makes the documentation harder to read. Well, we had spaces in only a few places in the docs, and as I said, it is not consistent. Do you want those few put back for 9.6? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] Re: [COMMITTERS] pgsql: Prevent "snapshot too old" from trying to return pruned TOAST tu
Robert Haas writes: > On Thu, Aug 4, 2016 at 10:58 PM, Robert Haas wrote: >> What is the reason? We refuse to separate frontend and backend >> headers in any sort of principled way? > That was poorly phrased. I'll try again: I can't see any reason for > using a macro here except that it allows frontend code to compile this > without breaking. Well, the alternative would be to put "#ifndef FRONTEND" around the static-inline function. That's not very pretty either, and it's inconsistent with the existing precedent (ie, InitDirtySnapshot). Also, it presumes that a non-backend includer actually has defined FRONTEND; that seems to be the case for pg_xlogdump but I do not think we do that everywhere. > Here's a patch. Is this what you want? OK by me. regards, tom lane
Re: [HACKERS] pg_size_pretty, SHOW, and spaces
On 8/2/16 12:51 PM, Bruce Momjian wrote: > Yes, that's a strong argument for using a space. I have adjusted the > patch to use spaces in all reasonable places. Patch attached, which I > have gzipped because it was 133 KB. (Ah, see what I did there?) :-) > > I am thinking of leaving the 9.6 docs alone as I have already made them > consistent (no space) with minimal changes. We can make it consistent > the other way in PG 10. I don't think anyone wanted to *remove* the spaces in the documentation. I think this change makes the documentation harder to read. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Hash index with larger hashes
Hello Developers, I have been following the recent discussions on increasing the size of the hash function used in Postgres and the work to provide WAL and performance improvements for hash indexes. I know it was mentioned when we moved to the new hashing functions, but the existing functions do provide an additional 32 bits of hash. We currently do not use them, but they are already calculated. It had occurred to me that one way to decrease the space used to store the hash value would be to include information about the page number to determine the actual value. For example, a hash index of 65k pages (540MB) would get two additional bytes of hash with no associated storage cost. Also, if you subdivided the hash page into say 128 sub-pages you would get the extra 2 bytes of hash in a 4MB index. In this way, the bigger the hash index is, the more bits you get for free. Just wanted to throw it out there. Regards, Ken
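Ken's scheme can be sketched as follows. This is purely illustrative Python, not PostgreSQL code, and it assumes (which the mail does not spell out) a power-of-two page count and that pages are selected by the high bits of a 64-bit hash — under those assumptions the page number implies the high bits, so only the remainder needs storing:

```python
# Illustrative sketch: if tuples are routed to pages by the high bits of
# a 64-bit hash, those bits need not be stored -- the page number where
# the entry lives implies them.  The bigger the index, the more bits
# come "for free".

def split_hash(hash64, n_pages):
    """Split a 64-bit hash into (page_number, stored_part).

    The top bits select the page; only the remainder needs storing.
    n_pages must be a power of two in this simple sketch.
    """
    page_bits = n_pages.bit_length() - 1           # log2(n_pages)
    stored_bits = 64 - page_bits
    page = hash64 >> stored_bits                   # implied by location
    stored = hash64 & ((1 << stored_bits) - 1)     # what we'd keep
    return page, stored

def recombine(page, stored, n_pages):
    """Recover the full hash from the page number plus the stored part."""
    stored_bits = 64 - (n_pages.bit_length() - 1)
    return (page << stored_bits) | stored
```

With 65536 pages, 16 bits of the hash are implied by the page number, matching the "two additional bytes" in the mail.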
Re: [HACKERS] truncate trigger for foreign data wrappers
On 2016-08-05 10:33:49 -0400, Tom Lane wrote: > Murat Tuncer writes: > > I recently hit a road blocker when I tried to create a truncate trigger on > > a foreign table. trigger.c::CreateTrigger() function has an explicit check to > > block truncate triggers on foreign tables. > That's good, because we don't implement TRUNCATE on foreign tables: there > is nothing in the FDW API that would support it. Not much point in > declaring a trigger for an event that can never happen. Well, allowing BEFORE triggers to return NULL or something, preventing the execution of the rest, would be such an implementation, and also independently useful.
Re: [HACKERS] truncate trigger for foreign data wrappers
Murat Tuncer writes: > I recently hit a road blocker when I tried to create a truncate trigger on > a foreign table. trigger.c::CreateTrigger() function has an explicit check to > block truncate triggers on foreign tables. That's good, because we don't implement TRUNCATE on foreign tables: there is nothing in the FDW API that would support it. Not much point in declaring a trigger for an event that can never happen. regards, tom lane
Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls
Andrew Gierth writes: > Hm. I am wrong about this, since it's the fact that consumers are taking > stanullfrac into account that makes the value wrong in the first place. Also, the way that the value is calculated in the samples-not-all-distinct case corresponds to the way I have it in the patch. What you want to do would correspond to leaving these edge cases alone and changing all the other ANALYZE cases instead (*plus* changing the consumers). I find that a tad scary. > But I think the fix is still wrong, because it changes the meaning of > ALTER TABLE ... ALTER col SET (n_distinct=...) in a non-useful way; it > is no longer possible to nail down a useful negative n_distinct value if > the null fraction of the column is variable. I think that argument is bogus. If we change the way that get_variable_numdistinct (and other consumers) use the value, that will break all existing custom settings of n_distinct, because they will no longer mean what they did before. There have been exactly zero field complaints that people could not get the results they wanted, so I do not think that's justified. In short, what you want to do constitutes a redefinition of stadistinct, while my patch doesn't. That is far more invasive and I fear it will break things that are not broken today. regards, tom lane
Re: [HACKERS] Column COMMENTs in CREATE TABLE?
On 7/3/16 11:41 AM, Tom Lane wrote: > I can see the reasoning for > allowing COMMENT in a table column definition, but the argument for > allowing it in simpler CREATEs seems tissue-thin: > > CREATE FUNCTION foo(int) RETURNS ... ; > COMMENT ON FUNCTION foo(int) IS 'blah'; > > vs > > CREATE FUNCTION foo(int) RETURNS ... > WITH (COMMENT 'blah'); > > Not much of a keystroke savings, nor is the comment noticeably > "closer" to its object than before. I had actually been thinking about a similar proposal, but specifically for CREATE FUNCTION. But the syntax would have to put it above the function body, not below it. I think the CREATE FUNCTION syntax could actually handle that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] regression test for extended query protocol
Robert Haas writes: > I think it would be an interesting project for someone to try to > figure out how to make 'make check-extended-query-protocol' or similar > work. Seems like it would not be that hard to put some kind of option in psql to issue queries with PQexecParams not plain PQexec. However, since that wouldn't exercise sending out-of-line parameters, it's not clear to me that it'd be a very interesting test. I still suspect that doing this in core is mostly going to be duplicative effort, and we'd be better off making use of existing JDBC tests. regards, tom lane
Re: [HACKERS] Reviewing freeze map code
On Thu, Aug 4, 2016 at 3:24 AM, Andres Freund wrote: > Hi, > > On 2016-08-02 10:55:18 -0400, Noah Misch wrote: >> [Action required within 72 hours. This is a generic notification.] >> >> The above-described topic is currently a PostgreSQL 9.6 open item. Andres, >> since you committed the patch believed to have created it, you own this open >> item. > > Well kinda (it was a partial fix for something not originally by me), > but I'll deal with it. Reading now, will commit tomorrow. Thanks. I kept meaning to get to this one, and failing to do so. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Slowness of extended protocol
On Tue, Aug 2, 2016 at 2:00 PM, Shay Rojansky wrote: >> Shay, why don't you use a profiler? Seriously. >> I'm afraid "iterate the per-message loop in PostgresMain five times not >> once" / "just discussing what may or may not be a problem..." is just >> hand-waving. >> >> Come on, it is not that hard. > > I really don't get what's problematic with posting a message on a mailing > list about a potential performance issue, to try to get people's reactions, > without diving into profiling right away. I'm not a PostgreSQL developer, > have other urgent things to do and don't even spend most of my programming > time in C. There's absolutely nothing wrong with that. I find your questions helpful and interesting and I hope you will keep asking them. I think that they are a valuable contribution to the discussion on this list. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] Change the default of update_process_title to off
On Fri, Aug 5, 2016 at 01:12:39PM +0200, David Rowley wrote: > On 5 August 2016 at 12:25, Tsunakawa, Takayuki wrote: > > It seems that we could reach a consensus. The patch is attached. I'll add > > this to the next CommitFest. > > + The default is off on Windows > + because the overhead is significant, and on on other platforms. I think "on on" is odd. I think you want "and on for other platforms." -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] Slowness of extended protocol
On Wed, Aug 3, 2016 at 7:35 PM, Bruce Momjian wrote: > On Wed, Aug 3, 2016 at 10:02:39AM -0400, Tom Lane wrote: >> Bruce Momjian writes: >> > On Sun, Jul 31, 2016 at 05:57:12PM -0400, Tom Lane wrote: >> >> In hindsight it seems clear that what a lot of apps want out of extended >> >> protocol is only the ability to send parameter values out-of-line instead >> >> of having to quote/escape them into SQL literals. Maybe an idea for the >> >> fabled V4 protocol update is some compromise query type that corresponds >> >> precisely to PQexecParams's feature set: you can send parameter values >> >> out-of-line, and you can specify text or binary results, but there's no >> >> notion of any persistent state being created and no feedback about >> >> parameter data types. >> >> > Do you want this on the TODO list? >> >> I didn't hear anyone say it was a silly idea, so sure. > > Done: > > Create a more efficient way to handle out-of-line parameters FWIW, I agree with this idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Fri, Aug 5, 2016 at 09:57:19AM +0530, Pavan Deolasee wrote: > Hmm. That seems like a real problem. The only way to address that is to ensure > that for a given WARM chain and given index key, there must exist only a > single index entry. There can and will be multiple entries if the index key > changes, but if the index key changes to the original value (or any other > value > already in the WARM chain) again, we must not add another index entry. Now > that > does not seem like a very common scenario and skipping a duplicate index entry > does have its own benefit, so maybe it's not a terrible idea to try that. > You're right, it may be expensive to search for an existing matching index key > for the same root line pointer. But if we could somehow short-circuit the more > common case, it might still be worth trying. I assumed that behavior was part of the original design; I stated earlier: Also, what happens when a tuple chain goes off-page, then returns to the original page? That is a new HOT chain given our current code, but would the new tuple addition realize it needs to create a new index tuple? The new tuple code needs to check the index's root HOT tid for a non-match. We have to store the _head_ of the HOT/WARM chain in the index, not only for pruning, but so we can know if this HOT/WARM chain is already in the index and skip the index addition. I think modifying an index tuple is expensive, but checking an index and avoiding an addition should be quick. > The other idea you mentioned is to index intermediate tuples but somehow > restrict WARM chain following knowing that the subsequent tuples are reachable > via different index tuple. I see two major problems with that approach: I don't see much value in doing that, and a lot of complexity. > > 1. When HOT or WARM chains are pruned and dead tuples removed, we may lose the > knowledge about key changes between intermediate updates. 
Without that it > seems difficult to know when to stop while following a chain starting from the > old index tuple. > > 2. The existence of index pointers to intermediate tuples will lead to index > bloat. This is the same problem that we found in Simon's original proposal > (discussed internally). None of the intermediate line pointers can be vacuumed > until the entire chain becomes DEAD. Even if a duplicate index key is > inserted for every index, knowing that and reclaiming the index pointers to > the original root line pointer will be difficult. > > > > Not to mention the cost of scanning the chain of N tuples N times, > making it quadratic - bound by the number of tuples that can fit a > page, but still high. > > Solving that, I believe, will remove all the simplicity of pointing to > root tuples. > > > > You're right. But having all indexes point to the root line pointer makes > things simpler to manage and less buggy. So if we can somehow solve the > duplicate tuples problem, even at additional cost at update time, it might > still be worth it. I would suggest that we should think more and we can find some > solution to the problem. Why is that hard? It seems simple to me as we just try to do the index insert, and skip it if an entry for our key and ctid already exists. > I don't really see how you'd do that on yours. You seem to assume > finding a particular key-item pointer pair in an index would be > efficient, but IMHO that's not the case. > > > That's true. But we could somehow short-circuit the more common pattern, that > might still be worth it. For corner cases, we can fall back to non-HOT update and > keep things simple. It will still be a huge improvement over what we have > currently. I don't see why it is hard. Trying to find the index entry for the old update row seems odd, and updating an index row seems odd, but skipping an index addition for the new row seems simple. Is the problem concurrent activity? 
I assumed we already had that ability to add to HOT chains because we have to lock the update chain. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
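The "try the insert and skip duplicates" rule Bruce describes can be modeled with a toy in-memory index. Everything below is illustrative Python, not PostgreSQL code; the class and method names are invented for the sketch:

```python
# Toy model of the skip-duplicate rule: before adding an index entry for
# a WARM update, check whether an entry with the same (key, root_ctid)
# pair already exists, and skip the insert if so.  The "index" here is
# just a Python set of pairs.

class ToyIndex:
    def __init__(self):
        self.entries = set()          # set of (key, root_ctid) pairs

    def insert(self, key, root_ctid):
        """Insert unless a duplicate (key, ctid) entry exists.

        Returns True if a new entry was added, False if it was skipped.
        """
        if (key, root_ctid) in self.entries:
            return False              # chain already reachable via this key
        self.entries.add((key, root_ctid))
        return True
```

If the index key on a chain changes 2 -> 3 -> 2, the second insert of key 2 for the same root line pointer is skipped, so an index scan reaches the chain only once for that key — which is exactly the duplicate-rows problem the thread is worried about.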
Re: [HACKERS] Refactoring of heapam code.
05.08.2016 16:30, Kevin Grittner: On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova wrote: They can be logically separated into three categories: "primary storage" - r, S, t, v. They store data and visibility information. The only implementation is heapam.c "secondary index" - i. They store some data and pointers to primary storage. Various existing AMs and managed by AM API. "virtual relations" - c, f, m. They have no physical storage, only entries in caches and catalogs. Materialized views (relkind == "m") have physical storage, and may have indexes. Good point. I confused the letters for views (virtual relations) and materialized views (primary storage). It should be: "primary storage" - r, S, t, m. They store data and visibility information. The only implementation is heapam.c. "secondary index" - i. They store some data and pointers to primary storage. Various existing AMs, managed by the AM API. "virtual relations" - c, f, v. They have no physical storage, only entries in caches and catalogs. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
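Anastasia's corrected classification can be written down as a small lookup table (a sketch; the letters are `pg_class.relkind` values as used in the mail):

```python
# Corrected relkind classification from the mail: which pg_class.relkind
# letters have physical storage managed by heapam, which are secondary
# indexes, and which are purely virtual (catalog/cache entries only).
RELKIND_CATEGORY = {
    "r": "primary storage",    # ordinary table
    "S": "primary storage",    # sequence
    "t": "primary storage",    # TOAST table
    "m": "primary storage",    # materialized view
    "i": "secondary index",
    "c": "virtual relation",   # composite type
    "f": "virtual relation",   # foreign table
    "v": "virtual relation",   # view
}
```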
Re: [HACKERS] Cache Hash Index meta page.
On Thu, Aug 4, 2016 at 3:36 AM, Tom Lane wrote: > Jeff Janes writes: >> On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy >> wrote: >>> I have created a patch to cache the meta page of Hash index in >>> backend-private memory. This is to save reading the meta page buffer every >>> time when we want to find the bucket page. In “_hash_first” call, we try to >>> read meta page buffer twice just to make sure bucket is not split after we >>> found bucket page. With this patch meta page buffer read is not done, if the >>> bucket is not split after caching the meta page. > > Is this really safe? The metapage caching in btree is all right because > the algorithm is guaranteed to work even if it starts with a stale idea of > where the root page is. I do not think the hash code is equally robust > about stale data in its metapage. > I think stale data in the metapage could only cause a problem if it leads to a wrong calculation of the bucket based on the hashkey. I think that shouldn't happen. It seems to me that the safety comes from the fact that the required fields (lowmask/highmask) to calculate the bucket won't be changed more than once without splitting the current bucket (which we are going to scan). Do you see a problem in the hashkey-to-bucket mapping (_hash_hashkey2bucket) if the lowmask/highmask are changed by one additional table half, or do you have something else in mind? > >> What happens on a system which has gone through pg_upgrade? > > That being one reason why. It might be okay if we add another hasho_flag > bit saying that hasho_prevblkno really contains a maxbucket number, and > then add tests for that bit everyplace that hasho_prevblkno is referenced. > Good idea. - if (retry) + if (opaque->hasho_prevblkno <= metap->hashm_maxbucket) This code seems to be problematic with respect to upgrades, because hasho_prevblkno will be initialized to 0x without the patch. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
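For readers unfamiliar with the mapping Amit refers to: `_hash_hashkey2bucket()` (in `hashutil.c`) masks the hash key with `highmask` and falls back to `lowmask` when the result names a bucket that does not exist yet. A simplified Python rendering of that logic:

```python
def hashkey_to_bucket(hashkey, maxbucket, highmask, lowmask):
    # Mirrors the logic of _hash_hashkey2bucket(): mask the key with
    # highmask; if that selects a bucket beyond maxbucket (one not yet
    # created by splitting), mask again with lowmask instead.
    bucket = hashkey & highmask
    if bucket > maxbucket:
        bucket &= lowmask
    return bucket
```

The safety argument in the mail is about exactly this function: as long as the cached `maxbucket`/`highmask`/`lowmask` triple is consistent with some valid earlier state of the index, the computed bucket was correct at that time, and the split can be detected afterwards.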
Re: [HACKERS] Refactoring of heapam code.
On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova wrote: > They can be logically separated into three categories: > "primary storage" - r, S, t, v. They store data and visibility information. > The only implementation is heapam.c > "secondary index" - i. They store some data and pointers to primary storage. > Various existing AMs and managed by AM API. > "virtual relations" - c, f, m. They have no physical storage, only entries > in caches and catalogs. Materialized views (relkind == "m") have physical storage, and may have indexes. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] regression test for extended query protocol
On Tue, Aug 2, 2016 at 10:33 PM, Tatsuo Ishii wrote: > In my understanding, we don't have any regression test for protocol > level prepared query (we do have SQL level prepared query tests, > though). Shouldn't we add those tests to the regression test suites? A few years ago, EDB had a bug that only manifested itself when using the extended query protocol. We hacked together something that could run our entire regression test suite (roughly equivalent to 'make check', but an order of magnitude larger) under the extended query protocol and found a few more bugs of the same general flavor. There were a few things that fell over that, I think, were not bugs, but it mostly worked. I think it would be an interesting project for someone to try to figure out how to make 'make check-extended-query-protocol' or similar work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans
On Tue, Jul 26, 2016 at 11:20 PM, Etsuro Fujita wrote: > I noticed that currently the core doesn't show any information on the target > relations involved in a foreign/custom join in EXPLAIN, by itself. I think that's a feature, not a bug. > postgres_fdw shows the target relations in the Relations line, as shown > above, but I think that the core should show such information independently > of FDWs; in the above example replace "Foreign Scan" with "Foreign Join on > public.ft1 t1, public.ft2 t2". I disagree with that. Currently, when we say that something is a join (Merge Join, Hash Join, Nested Loop) we mean that the executor is performing a join, but that's not the case here. The executor is performing a scan. The remote side, we suppose, is performing a join for us, but we are not performing a join: we are performing a scan. So I think the fact that it shows up in the plan as "Foreign Scan" is exactly right. We are scanning some foreign thing, and that thing may internally be doing other stuff, like joins and aggregates, but all we're doing is scanning it. Also, I don't really see the point of moving this from postgres_fdw to core. If, at some point in time, there are many FDWs that implement sophisticated pushdown operations and we figure out that they are all duplicating the code to do the EXPLAIN printout, and they're all printing basically the same thing, perhaps not in an entirely consistent way, then we could try to unify all of them into one implementation in core. But that's certainly not where we are right now. I don't see any harm at all in leaving this under the control of the FDW, and in fact, I think it's better. Neither the postgres_fdw format nor what you want to replace it with are so unambiguously awesome that some other FDW author might not come up with something better. I think we should leave this the way it is. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning
> > > > > Consider lists ('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b', 'a') for a > > list partitioned table. I am suggesting that we arrange them as > > ('a','b','l'), ('d', 'h', 'm') and ('e', 'f', 'i'). If the given row (either > > for comparison or for inserting) has value 'c', we will search for it in > > ('a','b','l') but will eliminate the other two partitions since the second's > > partition's lowest value is higher than 'c' and the lowest values of the rest of the > > partitions are higher than that of the second partition. Without this order > > among the partitions, we will have to compare the lowest values of all the > > partitions. > > I would think that for that case what we'd want to do is create two > lists. One looks like this: > > [ 'a', 'b', 'd', 'e', f', 'h', 'i', 'l', 'm' ] > > The other looks like this: > > [3, 3, 2, 1, 1, 2, 1, 1, 3, 2] > small correction; there's an extra 1 here. Every partition in the example has only three values. > > Given a particular value, bsearch the first list. If the value is not > found, it's not in any partition. If it is found, then go look up the > same array index in the second list; that's the containing partition. > +1, if we could do it. It will need a change in the way Amit's patch stores the partitioning scheme in PartitionDesc. This way specifications {('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b', 'a')} and {('h', 'd','m'), ('e', 'i', 'f'), and ('l', 'b', 'a')}, which are essentially the same, are represented in different ways. It makes matching partitions for partition-wise join a bit tedious. We have to make sure that the first array matches for both the joining relations and then make sure that all the values belonging to the same partition for one table also belong to the same partition in the other table. That requires some more complex logic for matching subsets of lists for partition-wise join. 
At least for straightforward partitioned table matching it helps to have both these arrays look the same, independent of the user specification. From that point of view, the partitions can be ordered by their lowest or highest list values, with the second array holding the index into that ordered set. For both the specifications above, the lists will look like [ 'a', 'b', 'd', 'e', 'f', 'h', 'i', 'l', 'm' ] [1, 1, 2, 3, 3, 2, 3, 1, 2] -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
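The canonical ordering described above can be sketched like this (illustrative Python, not PostgreSQL code; `canonical_arrays` is a made-up name, and real code would compare values with the partitioning opclass rather than plain `<`):

```python
def canonical_arrays(partitions):
    """Build the canonical (values, indexes) pair for a list-partitioned
    table: partitions are numbered 1..n by ascending lowest list value, so
    two specifications differing only in partition order produce identical
    arrays."""
    # Rank partitions by their smallest list value.
    order = sorted(range(len(partitions)), key=lambda i: min(partitions[i]))
    part_no = {orig: rank + 1 for rank, orig in enumerate(order)}
    # Flatten to sorted (value, partition-number) pairs, then split.
    pairs = sorted((v, part_no[i])
                   for i, vals in enumerate(partitions) for v in vals)
    values = [v for v, _ in pairs]
    indexes = [n for _, n in pairs]
    return values, indexes

spec_a = [('e', 'i', 'f'), ('h', 'd', 'm'), ('l', 'b', 'a')]
spec_b = [('h', 'd', 'm'), ('e', 'i', 'f'), ('l', 'b', 'a')]

print(canonical_arrays(spec_a))   # matches the two arrays shown above
print(canonical_arrays(spec_a) == canonical_arrays(spec_b))
```

Both specifications collapse to the same pair of arrays, which is what makes partition-matching for partition-wise join order-independent.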
Re: [HACKERS] handling unconvertible error messages
On Thu, 04 Aug 2016 14:25:52 -0400 Tom Lane wrote: > Victor Wagner writes: > > Really, if this response is sent after backend has been forked, > > problem probably can be easily fixed better way - StartupMessage > > contain information about desired client encoding, so this > > information just should be processed earlier than any other > > information from this message, which can cause errors (such as > > database name). > > I think that's wishful thinking. There will *always* be errors that > come out before we can examine the contents of the startup message. > Moreover, until we've done authentication, we should be very wary of > applying client-specified settings at all: they might be malicious. I think that this case can be an exception to the rule "don't apply settings from an untrusted source". Let's consider a possible threat model: 1. We parse the StartupMessage before authentication anyway. There is nothing we can do about that, so the parser should be robust enough to handle untrusted input. As far as I can see from a quick glance, it is. 2. When the encoding name is parsed, it is used to search the array of supported encodings. No possible attack here - either it is valid or it is not. 3. As far as I know, we don't allow the client to change the language, only the encoding, so an attacker could not even make messages in the log unreadable for the system administrator. So, if we fixed the problem reported by Peter Eisentraut at the beginning of this thread, and fell back to untranslated messages whenever the client-requested encoding is unable to represent messages in the server's default language, this solution would be no worse than yours. There would be a fallback to the C locale in any case of doubt, but in the cases where NLS messages can be made readable, they would be readable. Really, there is at least one case where the fallback to the C locale should be done unconditionally - a CancelRequest. In that case the client cannot send an encoding, so the C locale should be used.
As far as I understand it is not the case with SSLRequest. Although it doesn't contain encoding information as well as CancelRequest, errors in subsequent SSL negotiations would be reported by client-side SSL libraries, not by server. -- > > regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
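Victor's fallback rule — use the NLS translation only when the client-requested encoding can represent it, otherwise emit the untranslated message — amounts to something like this (an illustrative Python sketch, not the server's C code; the function name is made up):

```python
def message_for_client(translated, untranslated, client_encoding):
    """Prefer the translated (NLS) message; fall back to the untranslated
    message if the client's encoding cannot represent the translation, or
    the requested encoding is unknown."""
    try:
        translated.encode(client_encoding)
        return translated
    except (UnicodeEncodeError, LookupError):
        return untranslated

# A Russian translation is not representable in latin1 -> fall back.
print(message_for_client("база данных не существует",
                         "database does not exist", "latin1"))
# A French translation is representable in latin1 -> keep it.
print(message_for_client("base de données inexistante",
                         "database does not exist", "latin1"))
```

The same test doubles as the "any case of doubt" rule: an unparseable or unsupported encoding name simply lands on the untranslated branch.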
Re: [HACKERS] Declarative partitioning
On Fri, Aug 5, 2016 at 8:10 AM, Ashutosh Bapatwrote: > On Fri, Aug 5, 2016 at 5:21 PM, Robert Haas wrote: >> On Fri, Aug 5, 2016 at 6:53 AM, Ashutosh Bapat >> wrote: >> > The lists for list partitioned tables are stored as they are specified >> > by >> > the user. While searching for a partition to route tuple to, we compare >> > it >> > with every list value of every partition. We might do something better >> > similar to what's been done to range partitions. The list of values for >> > a >> > given partition can be stored in ascending/descending sorted order. Thus >> > a >> > binary search can be used to check whether given row's partition key >> > column >> > has same value as one in the list. The partitions can then be stored in >> > the >> > ascending/descending order of the least/greatest values of corresponding >> > partitions. >> >> +1. Here as with range partitions, we must be sure to know which >> opclass should be used for the comparisons. >> >> > We might be able to eliminate search in a given partition if its >> > lowest value is higher than the given value or its higher value is lower >> > than the given value. >> >> I don't think I understand this part. > > Consider lists ('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b', 'a') for a > list partitioned tables. I am suggesting that we arrange them as > ('a','b','l'), ('d', 'h', 'm') and ('e', 'f', 'i'). If the given row (either > for comparison or for inserting) has value 'c', we will search for it in > ('a','b','l') but will be eliminate other two partitions since the second's > partition's lowest value is higher than 'c' and lowest values of rest of the > partitions are higher than that of the second partition. Without this order > among the partitions, we will have to compare lowest values of all the > partitions. I would think that for that case what we'd want to do is create two lists. 
One looks like this: [ 'a', 'b', 'd', 'e', 'f', 'h', 'i', 'l', 'm' ] The other looks like this: [3, 3, 2, 1, 1, 2, 1, 1, 3, 2] Given a particular value, bsearch the first list. If the value is not found, it's not in any partition. If it is found, then go look up the same array index in the second list; that's the containing partition. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
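In code, the lookup Robert describes might look like this (illustrative Python using the example arrays, with the stray extra 1 dropped per the correction upthread; the real implementation would be C and use the opclass comparison function rather than plain string ordering):

```python
from bisect import bisect_left

# Sorted list of all accepted values, and at the same index the number of
# the partition containing that value (1 = ('e','i','f'), 2 = ('h','d','m'),
# 3 = ('l','b','a')).
values    = ['a', 'b', 'd', 'e', 'f', 'h', 'i', 'l', 'm']
partition = [ 3,   3,   2,   1,   1,   2,   1,   3,   2 ]

def find_partition(key):
    """Binary-search the values list; return the containing partition
    number, or None if the key belongs to no partition."""
    i = bisect_left(values, key)
    if i < len(values) and values[i] == key:
        return partition[i]
    return None

print(find_partition('h'))  # 'h' lives in partition 2
print(find_partition('c'))  # 'c' is in no partition
```

One binary search answers both questions at once: membership and routing.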
Re: [HACKERS] Declarative partitioning
On Fri, Aug 5, 2016 at 5:21 PM, Robert Haas wrote: > On Fri, Aug 5, 2016 at 6:53 AM, Ashutosh Bapat > wrote: > > The lists for list partitioned tables are stored as they are specified by > > the user. While searching for a partition to route tuple to, we compare > it > > with every list value of every partition. We might do something better > > similar to what's been done to range partitions. The list of values for a > > given partition can be stored in ascending/descending sorted order. Thus > a > > binary search can be used to check whether given row's partition key > column > > has same value as one in the list. The partitions can then be stored in > the > > ascending/descending order of the least/greatest values of corresponding > > partitions. > > +1. Here as with range partitions, we must be sure to know which > opclass should be used for the comparisons. > > > We might be able to eliminate search in a given partition if its > > lowest value is higher than the given value or its higher value is lower > > than the given value. > > I don't think I understand this part. > Consider lists ('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b', 'a') for list partitioned tables. I am suggesting that we arrange them as ('a','b','l'), ('d', 'h', 'm') and ('e', 'f', 'i'). If the given row (either for comparison or for inserting) has value 'c', we will search for it in ('a','b','l') but will eliminate the other two partitions since the second partition's lowest value is higher than 'c' and the lowest values of the rest of the partitions are higher than that of the second partition. Without this order among the partitions, we would have to compare the lowest values of all the partitions. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
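The elimination Ashutosh describes can be sketched as follows (illustrative Python, 0-based partition numbering for simplicity; real code would use the partitioning opclass comparator and a binary search within each list):

```python
def find_partition(partitions, key):
    """partitions: sorted value-lists, ordered by their lowest value.
    Scan partitions in that order and stop as soon as a partition's lowest
    value exceeds the key, since every later partition starts even higher."""
    for i, vals in enumerate(partitions):
        if vals[0] > key:      # this and all later partitions start above key
            break
        if key in vals:        # a binary search in the real implementation
            return i
    return None

# The three example lists, each sorted, with the partitions themselves
# ordered by their lowest value.
parts = [('a', 'b', 'l'), ('d', 'h', 'm'), ('e', 'f', 'i')]
print(find_partition(parts, 'c'))  # only ('a','b','l') is searched -> None
print(find_partition(parts, 'f'))  # found in partition 2
```

For 'c' only the first list is inspected; the second partition's minimum 'd' already exceeds 'c', so the rest are pruned without any comparisons.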
Re: [HACKERS] Declarative partitioning
On Fri, Aug 5, 2016 at 6:53 AM, Ashutosh Bapatwrote: > The lists for list partitioned tables are stored as they are specified by > the user. While searching for a partition to route tuple to, we compare it > with every list value of every partition. We might do something better > similar to what's been done to range partitions. The list of values for a > given partition can be stored in ascending/descending sorted order. Thus a > binary search can be used to check whether given row's partition key column > has same value as one in the list. The partitions can then be stored in the > ascending/descending order of the least/greatest values of corresponding > partitions. +1. Here as with range partitions, we must be sure to know which opclass should be used for the comparisons. > We might be able to eliminate search in a given partition if its > lowest value is higher than the given value or its higher value is lower > than the given value. I don't think I understand this part. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On Thu, Aug 4, 2016 at 8:02 PM, Mithun Cy wrote: > I did some basic testing of same. In that I found one issue with cursor. > Thanks for the testing. The reason for the failure was that the patch didn't take into account the fact that for scrolling cursors, a scan can reacquire the lock and pin on the bucket buffer multiple times. I have fixed it such that we release the pin on bucket buffers after we scan the last overflow page in the bucket. The attached patch fixes the issue for me; let me know if you still see the issue. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com concurrent_hash_index_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Change the default of update_process_title to off
On 5 August 2016 at 12:25, Tsunakawa, Takayuki wrote: > It seems that we could reach a consensus. The patch is attached. I'll add > this to the next CommitFest. + The default is off on Windows + because the overhead is significant, and on on other platforms. "than on other platforms" But perhaps it's better written like: + This value defaults to "off" on Windows platforms due to the platform's significant overhead for updating the process title. +1 for this patch from me. I'd hate to think someone would quickly dismiss Postgres on Windows due to some low TPS caused by updating the process title. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning
The lists for list partitioned tables are stored as they are specified by the user. While searching for a partition to route a tuple to, we compare it with every list value of every partition. We might do something better, similar to what's been done for range partitions. The list of values for a given partition can be stored in ascending/descending sorted order. Thus a binary search can be used to check whether a given row's partition key column has the same value as one in the list. The partitions can then be stored in ascending/descending order of the least/greatest values of the corresponding partitions. We might be able to eliminate the search in a given partition if its lowest value is higher than the given value or its highest value is lower than the given value. On Thu, Jul 21, 2016 at 10:10 AM, Amit Langote < langote_amit...@lab.ntt.co.jp> wrote: > On 2016/07/19 22:53, Ashutosh Bapat wrote: > > I am seeing following warning with this set of patches. > > gram.y:4734:24: warning: assignment from incompatible pointer type > [enabled > > by default] > > > Thanks, will fix. Was a copy-paste error. > > Thanks, > Amit > > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls
> "Andrew" == Andrew Gierth writes: > "Tom" == Tom Lane writes: Tom> What I did in the patch is to scale the formerly fixed "-1.0" Tom> stadistinct estimate to discount the fraction of nulls we found. Andrew> This seems quite dubious to me. stadistinct representing only Andrew> the non-null values seems to me to be substantially more useful Andrew> and less confusing; it should be up to consumers to take Andrew> stanullfrac into account (in general they already do) since in Andrew> many cases we explicitly do _not_ want to count nulls. Hm. I am wrong about this, since it's the fact that consumers are taking stanullfrac into account that makes the value wrong in the first place. For example, if a million-row table has stanullfrac=0.9 and stadistinct=-1, then get_variable_numdistinct is returning 1 million, and (for example) var_eq_non_const divides 0.1 by that to give a selectivity of 1 in 10 million, which is obviously wrong. But I think the fix is still wrong, because it changes the meaning of ALTER TABLE ... ALTER col SET (n_distinct=...) in a non-useful way; it is no longer possible to nail down a useful negative n_distinct value if the null fraction of the column is variable. Would it not make more sense to do the adjustment in get_variable_numdistinct, instead? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
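The numbers in Andrew's example can be checked with a quick back-of-the-envelope calculation (a hedged Python sketch of the arithmetic, not the planner's actual C code in get_variable_numdistinct/var_eq_non_const):

```python
ntuples     = 1_000_000
stanullfrac = 0.9
stadistinct = -1.0   # "all values distinct" -- mistakenly counting nulls too

# A negative stadistinct scales with the row count:
ndistinct = -stadistinct * ntuples             # 1,000,000 "distinct" values

# eqsel-style estimate: spread the non-null fraction over the distinct values.
selectivity = (1.0 - stanullfrac) / ndistinct  # ~1e-7: 1 in 10 million, too low

# With the proposed scaling, stadistinct = -(1 - stanullfrac) = -0.1, so only
# the 100,000 non-null rows count as distinct:
ndistinct_scaled = 0.1 * ntuples
selectivity_scaled = (1.0 - stanullfrac) / ndistinct_scaled  # ~1e-6

print(selectivity, selectivity_scaled)
```

The unscaled path underestimates the equality selectivity by a factor of ten, which is exactly the 1-in-10-million figure quoted above.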
Re: [HACKERS] PostgreSQL 10 Roadmaps
On 5 August 2016 at 04:40, Etsuro Fujita wrote: > On 2016/08/02 23:54, Simon Riggs wrote: >> >> https://wiki.postgresql.org/wiki/PostgreSQL10_Roadmap > > > Thank you for creating the wiki page! > > I added a link to the NTT roadmap page on the links page. We hope that we > can collaborate on the projects with people working at other companies or > with individual contributors in the same development areas. Very good, thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Change the default of update_process_title to off
> From: Tom Lane [mailto:t...@sss.pgh.pa.us] > Yeah, I think I agree. It would be bad to disable it by default on Unix, > because ps(1) is a very standard tool there, but the same argument doesn't > hold for Windows. It seems that we could reach a consensus. The patch is attached. I'll add this to the next CommitFest. > Another route to a solution would be to find a cheaper way to update the > process title on Windows ... has anyone looked for alternatives? I couldn't find an alternative solution after asking some Windows support staff. Regards Takayuki Tsunakawa update_process_title_off_on_win.patch Description: update_process_title_off_on_win.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls
On Friday, 5 August 2016 at 01:01:06, Tom Lane wrote:
> I wrote:
> > Looking around, there are a couple of places outside commands/analyze.c
> > that are making the same mistake, so this patch isn't complete, but it
> > illustrates what needs to be done.
>
> Here's a more complete patch.
>
> regards, tom lane

(Sorry for HTML email, but we live in 2016, and I think it improves readability.)

Great! I have tested it (with PG 9.6 from HEAD, e7caacf733f3ee77a555aa715ab6fbf4434e6b52) and it sure looks like it fixes the problem for my query. The query:

explain analyze
SELECT log.relation_id as company_id
     , sum(log.duration) AS duration
FROM onp_crm_activity_log log
JOIN onp_crm_person logfor
    ON logfor.onp_user_id = log.logged_for
    AND logfor.is_resource = FALSE
WHERE 1 = 1
    -- Filter out already invoiced
    AND NOT EXISTS(
        SELECT * FROM onp_crm_calendarentry_invoice_membership cem
        JOIN onp_crm_invoice_line il ON cem.invoice_line_id = il.id
        JOIN onp_crm_invoice inv ON il.invoice_id = inv.entity_id
        WHERE cem.calendar_entry_id = log.id
        AND NOT EXISTS(
            SELECT * FROM onp_crm_invoice creditnote
            WHERE il.invoice_id = creditnote.credit_against
        )
    )
GROUP BY log.relation_id;

gave the following explain plan (before the patch), with enable_nestloop=off (needed to prevent a nested-loop anti-join which caused this query to execute in ~26 min.):

QUERY PLAN
---
HashAggregate (cost=25201.62..25220.06 rows=1475 width=36) (actual time=381.191..381.281 rows=404 loops=1)
  Group Key: log.relation_id
  ->  Hash Anti Join (cost=4782.34..25148.28 rows=10667 width=12) (actual time=103.683..361.409 rows=148330 loops=1)
        Hash Cond: (log.id = cem.calendar_entry_id)
        ->  Hash Join (cost=81.46..20312.73 rows=10667 width=20) (actual time=0.100..156.432 rows=380617 loops=1)
              Hash Cond: (log.logged_for = logfor.onp_user_id)
              ->  Seq Scan on onp_crm_activity_log log (cost=0.00..18696.71 rows=380771 width=24) (actual time=0.005..49.195 rows=380774 loops=1)
              ->  Hash (cost=25.02..25.02 rows=4515 width=8) (actual time=0.078..0.078 rows=119 loops=1)
                    Buckets: 8192 Batches: 1 Memory Usage: 69kB
                    ->  Index Scan using onp_crm_person_onp_id_idx on onp_crm_person logfor (cost=0.14..25.02 rows=4515 width=8) (actual time=0.005..0.063 rows=119 loops=1)
                          Filter: (NOT is_resource)
                          Rows Removed by Filter: 8
        ->  Hash (cost=4700.87..4700.87 rows=1 width=4) (actual time=103.575..103.575 rows=232412 loops=1)
              Buckets: 262144 (originally 1024) Batches: 1 (originally 1) Memory Usage: 10219kB
              ->  Hash Join (cost=474.41..4700.87 rows=1 width=4) (actual time=9.724..76.015 rows=232412 loops=1)
                    Hash Cond: (cem.invoice_line_id = il.id)
                    ->  Seq Scan on onp_crm_calendarentry_invoice_membership cem (cost=0.00..3354.28 rows=232528 width=8) (actual time=0.004..17.626 rows=232528 loops=1)
                    ->  Hash (cost=474.40..474.40 rows=1 width=4) (actual time=9.710..9.710 rows=11535 loops=1)
                          Buckets: 16384 (originally 1024) Batches: 1 (originally 1) Memory Usage: 534kB
                          ->  Merge Join (cost=415.33..474.40 rows=1 width=4) (actual time=4.149..8.467 rows=11535 loops=1)
                                Merge Cond: (il.invoice_id = inv.entity_id)
                                ->  Sort (cost=415.05..415.06 rows=1 width=8) (actual time=4.138..4.701 rows=11535 loops=1)
                                      Sort Key: il.invoice_id
                                      Sort Method: quicksort Memory: 925kB
                                      ->  Hash Anti Join (cost=77.13..415.04 rows=1 width=8) (actual time=0.257..2.716 rows=11535 loops=1)
                                            Hash Cond: (il.invoice_id = creditnote.credit_against)
                                            ->  Seq Scan on onp_crm_invoice_line il (cost=0.00..294.30 rows=11630 width=8) (actual time=0.003..0.995 rows=11630 loops=1)
                                            ->  Hash (cost=50.38..50.38 rows=2140 width=4) (actual time=0.247..0.247 rows=37 loops=1)
                                                  Buckets: 4096 Batches: 1 Memory Usage: 34kB
                                                  ->  Index Only Scan using origo_invoice_credit_against_idx on onp_crm_invoice creditnote (cost=0.28..50.38 rows=2140 width=4) (actual time=0.013..0.182 rows=2140 loops=1)
                                                        Heap Fetches: 0
                                ->  Index Only Scan using origo_invoice_id_status_sent_idx on onp_crm_invoice inv (cost=0.28..53.98 rows=2140 width=8) (actual time=0.008..1.274 rows=11576 loops=1)
                                      Heap Fetches: 0
Planning time: 0.955 ms
Execution time: 381.884 ms
(35 rows)

AFAIU, this is the problematic estimate:

->  Hash Anti Join (cost=77.13..415.04 rows=1 width=8) (actual time=0.257..2.716 rows=11535 loops=1)

Right?

Now (after the patch, and without needing to disable nested loops) it gives the following explain plan:

QUERY PLAN
HashAggregate (cost=44409.89..44428.47 rows=1486 width=36) (actual time=366.502..366.594 rows=404 loops=1)
  Group Key: log.relation_id
  ->  Hash Anti Join (cost=10157.30..43650.11
[HACKERS] truncate trigger for foreign data wrappers
Hello I recently hit a roadblock when I tried to create a truncate trigger on a foreign table. The trigger.c::CreateTrigger() function has an explicit check that blocks truncate triggers on foreign tables. However, trigger.c::ExecuteTruncate() does not seem to have any problem issuing before/after triggers to FDWs. Just tapping the utility hook to catch the truncate statement did not work when the FDW table is inside an inheritance hierarchy. Is there a way to enable truncate triggers for foreign tables in the long run? Or to somehow detect that my FDW table is getting a truncate request? thanks -- *Murat Tuncer* Software Engineer | Citus Data mtun...@citusdata.com
Re: [HACKERS] Refactoring of heapam code.
On 5 August 2016 at 08:54, Anastasia Lubennikovawrote: > Working on page compression and some other issues related to > access methods, I found out that the code related to heap > looks too complicated. Much more complicated, than it should be. > Since I anyway got into this area, I want to suggest a set of improvements. > > There is a number of problems I see in the existing code: > Problem 1. Heap != Relation. > > File heapam.c has a note: > * This file contains the heap_ routines which implement > * the POSTGRES heap access method used for all POSTGRES > * relations. > This statement is wrong, since we also have index relations, > that are implemented using other access methods. > > Also I think that it's quite strange to have a function heap_create(), that > is called > from index_create(). It has absolutely nothing to do with heap access > method. > > And so on, and so on. > > One more thing that requires refactoring is "RELKIND_RELATION" name. > We have a type of relation called "relation"... > > Problem 2. > Some functions are shared between heap and indexes access methods. > Maybe someday it used to be handy, but now, I think, it's time to separate > them, because IndexTuples and HeapTuples have really little in common. > > Problem 3. The word "heap" is used in many different contexts. > Heap - a storage where we have tuples and visibility information. > Generally speaking, it can be implemented over any data structure, > not just a plain table as it is done now. > > Heap - an access method, that provides a set of routines for plain table > storage, such as insert, delete, update, gettuple, vacuum and so on. > We use this access method not only for user's tables. 
> > There are many types of relations (pg_class.h): > #define RELKIND_RELATION 'r'/* ordinary table */ > #define RELKIND_INDEX 'i'/* secondary index */ > #define RELKIND_SEQUENCE 'S'/* sequence object */ > #define RELKIND_TOASTVALUE 't'/* for out-of-line > values */ > #define RELKIND_VIEW 'v'/* view */ > #define RELKIND_COMPOSITE_TYPE 'c'/* composite type */ > #define RELKIND_FOREIGN_TABLE 'f'/* foreign table */ > #define RELKIND_MATVIEW 'm'/* materialized view */ > > They can be logically separated into three categories: > "primary storage" - r, S, t, v. They store data and visibility information. > The only implementation is heapam.c > "secondary index" - i. They store some data and pointers to primary storage. > Various existing AMs and managed by AM API. > "virtual relations" - c, f, m. They have no physical storage, only entries > in caches and catalogs. > > Now, for some reasons, we sometimes use name "heap" for both > "primary storage" and "virtual relations". See src/backend/catalog/heap.c. > But some functions work only with "primary storage". See pgstat_relation(). > And we constantly have to check relkind everywhere. > I'd like it to be clear from the code interface and naming. > > So here are a couple of patches. They do not cover all mentioned problems, > but I'd like to get a feedback before continuing. > > Patch 1: > There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item > from the given page. It is used for both heap and index tuples. > It doesn't seem a problem, until you are going to find anything in this > code. > > The first patch "PageGetItem_refactoring.patch" is intended to fix it. > It is just renaming: > (IndexTuple) PageGetItem ---> PageGetItemIndex > (HeapTupleHeader) PageGetItem ---> PageGetItemHeap > Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple, > SpGistDeadTuple) > still use PageGetItem. > > I also changed functions that do not access tuple's data, but only need > HeapTupleHeader fields.
They use a macro PageGetItemHeapHeaderOnly. > > I do not insist on these particular names, by the way. > > Patch 2. > heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names > and outdated comments. The patch is intended to improve it. > Fun fact: I found several "check it later" comments unchanged since > "Postgres95 1.01 Distribution - Virgin Sources" commit. > > I wonder if we can find a better name for relation_drop_with_catalog() since, > again, it's > not about all kinds of relations. We use another function to drop index > relations. > > I hope these changes will be useful for the future development. > Suggested patches are mostly about renaming and rearrangement of functions > between files, so, if nobody has conceptual objections, all I need from > reviewers > is an attentive look to find typos, grammar mistakes and overlooked areas. Could you add this to the commitfest? Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls
> "Tom" == Tom Lane writes: Tom> What I did in the patch is to scale the formerly fixed "-1.0" Tom> stadistinct estimate to discount the fraction of nulls we found. This seems quite dubious to me. stadistinct representing only the non-null values seems to me to be substantially more useful and less confusing; it should be up to consumers to take stanullfrac into account (in general they already do) since in many cases we explicitly do _not_ want to count nulls. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans
On 2016/08/02 21:35, Etsuro Fujita wrote: I removed the Relations line. Here is an updated version of the patch. I revised code and comments a bit. Attached is an updated version of the patch. Best regards, Etsuro Fujita explain-for-foreign-join-pushdown-v2.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Hello, While the exact cause of the situation is not known yet, we have apparently forgotten that pg_stop_backup can be called simultaneously with pg_start_backup. It seems to me anyway that, from their definitions and what they do, pg_stop_backup should not be callable before the end of pg_start_backup. And enforcing that should fix the assertion failure. One solution is exclusiveBackupStarted (I'd like to rename the variable) in shared memory, as the patch does. Another place where we can get the same effect is the file system. We could create 'backup_label.tmp' very early in pg_start_backup and rename it to backup_label at the end of the function. Exclusive pg_stop_backup needs that anyway. A defect of that approach would be a leftover backup_label.tmp file after a crash during pg_start_backup. Renaming tmp to (none) is atomic enough for this usage, but I'm not sure whether it is on a network file system. exclusiveBackup is also used for this kind of exclusion; it too would be replaceable with the .tmp file. But after some thought, I found that it would need a bunch of error-handling code for the file operations and would become too complex, so I abandoned that idea. At Fri, 5 Aug 2016 12:16:13 +0900, Michael Paquier wrote in > On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier > wrote: > > Andreas, with the patch attached is the assertion still triggered? > > > > Thoughts from others are welcome as well. > > Self review: > * of streaming base backups currently in progress. forcePageWrites is > set > * to true when either of these is non-zero. lastBackupStart is the > latest > * checkpoint redo location used as a starting point for an online > backup. > + * exclusiveBackupStarted is true while pg_start_backup() is being called > + * during an exclusive backup.
> */ > bool exclusiveBackup; > int nonExclusiveBackups; > XLogRecPtr lastBackupStart; > + bool exclusiveBackupStarted; > It would be better to just use one variable that uses an enum to track > the following states of an exclusive backup: none, starting and > in-progress. That is what I thought when I looked at the first patch. In addition, I don't see that the name exclusiveBackupStarted is appropriate. One more argument is the necessity of WALInsertLock at the end of pg_start_backup. No other backend can reach there concurrently, and possible latency from cache coherency won't matter for this purpose, I suppose. What do you think it is needed for? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
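A minimal sketch of Michael's suggestion — one three-state tracker (none/starting/in-progress) instead of a pair of booleans (illustrative Python; the real thing would be a C enum in XLogCtl's shared memory under the appropriate lock, and all names here are made up):

```python
from enum import Enum

class ExclusiveBackupState(Enum):
    NONE = 0         # no exclusive backup in progress
    STARTING = 1     # pg_start_backup() is still running
    IN_PROGRESS = 2  # pg_start_backup() finished; backup underway

class BackupManager:
    def __init__(self):
        self.state = ExclusiveBackupState.NONE

    def start_backup(self):
        if self.state is not ExclusiveBackupState.NONE:
            raise RuntimeError("a backup is already in progress")
        self.state = ExclusiveBackupState.STARTING
        # ... write backup_label, force full-page writes, etc. ...
        self.state = ExclusiveBackupState.IN_PROGRESS

    def stop_backup(self):
        # With three states, a pg_stop_backup() arriving while the start
        # is still in the STARTING phase is rejected with a clean error
        # instead of tripping an assertion.
        if self.state is not ExclusiveBackupState.IN_PROGRESS:
            raise RuntimeError("exclusive backup not in progress")
        self.state = ExclusiveBackupState.NONE
```

The point of the enum is that "start is underway" becomes a first-class state that stop_backup can test for, which two independent booleans only express implicitly.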