[HACKERS] patch proposal
Hi,

During recovery, it would be nice if PostgreSQL raised an error and aborted the recovery process (instead of starting up the cluster) when the intended recovery target point has not been reached, and gave the DBA an option to resume recovery from exactly where it stopped. The issue is: if, for any reason, the required WALs are unavailable at the restore_command location, or any WAL there is missing or corrupted, PostgreSQL recovers to the end of the last available WAL and starts up the cluster. If the missing WAL is found later, it is not possible to resume the recovery process from where it previously stopped. The whole backup restoration + recovery process must be initiated from the beginning, which can be time consuming, especially when recovering huge clusters sized in the gigabytes or terabytes that require thousands of WALs to be copied over for recovery. Any thoughts? Comments? I can work on this patch based on the comments/feedback.

Regards,
Venkata B N
Fujitsu Australia
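The decision being proposed could be sketched as a simple policy check at the point where recovery runs out of WAL. This is a minimal, self-contained illustration only; none of these names exist in PostgreSQL, they are assumptions made up for the sketch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy function: when WAL runs out, should startup
 * abort (so recovery can later be resumed) instead of promoting the
 * cluster?  Abort only when the DBA configured a recovery target and
 * replay fell short of it.
 */
bool
should_abort_instead_of_promote(bool target_was_set, bool target_reached)
{
    return target_was_set && !target_reached;
}
```

With no target configured, or with the target reached, startup would proceed exactly as today; only the fell-short case would change behaviour.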
Re: [HACKERS] condition variables
On Mon, Aug 15, 2016 at 5:58 PM, Thomas Munro wrote: > Also, I have attached a v2->v3 diff ... Ugh. I meant a v1->v2 diff. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] condition variables
On Sun, Aug 14, 2016 at 9:04 AM, Thomas Munro wrote:
> On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas wrote:
>> [condition-variable-v1.patch]
>
> Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

I poked at this a bit... OK, a lot... and have some feedback:

1. As above, we need to clear cvSleeping before setting the latch.

2. The following schedule corrupts the waitlist by trying to delete something from it that isn't in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P2: ConditionVariableSignal: pop P1 from waitlist
P1:
P3: ConditionVariablePrepareToSleep: push self onto waitlist
P1: ConditionVariableCancelSleep: delete self from waitlist (!)

Without P3 coming along you probably wouldn't notice because the waitlist will be happily cleared and look valid, but P3's entry gets lost and then it hangs forever. One solution is to teach ConditionVariableCancelSleep to check if we're still actually there first. That can be done in O(1) time by clearing nodes' next and prev pointers when deleting, so that we can rely on that in a new proclist_contains() function. See attached.

3. The following schedule corrupts the waitlist by trying to insert something into it that is already in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P1:
P1: ConditionVariableSleep
P1: ConditionVariablePrepareToSleep: push self onto waitlist (!)

Nodes before and after self's pre-existing position can be forgotten when self's node is pushed to the front of the list. That can be fixed by making ConditionVariablePrepareToSleep also check if we're already in the list.

4. The waitlist is handled LIFO-ly. Would it be better for the first guy to start waiting to be woken up first, like we do for locks? The Pthreads equivalent says that it depends on "scheduling policy". I don't know if it matters much, just an observation.

5. The new proclist function you added is the first to work in terms of PGPROC* rather than procno. 
Maybe the whole interface should work with either PGPROC pointers or procnos? No strong view. Please find attached a -v2 of your patch which includes suggestions 1-3 above. Like the -v1, it applies on top of lwlocks-in-dsm-v3.patch. Also, I have attached a v2->v3 diff to show just my proposed changes.

-- 
Thomas Munro
http://www.enterprisedb.com

diff.patch (attachment, binary data)
condition-variable-v2.patch (attachment, binary data)
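The O(1) membership test from point 2 can be sketched outside PostgreSQL like this. The real proclist works over procno offsets in shared memory, so this pointer-based version (with invented wait_list/wait_node names) only illustrates the invariant: deletion clears a node's links, so a contains-style check needs no list traversal:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative doubly-linked waitlist; not the actual proclist code. */
typedef struct wait_node {
    struct wait_node *prev;
    struct wait_node *next;
} wait_node;

typedef struct wait_list {
    wait_node *head;
    wait_node *tail;
} wait_list;

void wait_list_init(wait_list *l)
{
    l->head = l->tail = NULL;
}

void wait_list_push_head(wait_list *l, wait_node *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head)
        l->head->prev = n;
    else
        l->tail = n;
    l->head = n;
}

void wait_list_delete(wait_list *l, wait_node *n)
{
    if (n->prev)
        n->prev->next = n->next;
    else
        l->head = n->next;
    if (n->next)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;
    /* Clear links so membership can later be tested in O(1). */
    n->prev = n->next = NULL;
}

/* O(1): a detached node has NULL links and is not the sole head entry. */
int wait_list_contains(wait_list *l, wait_node *n)
{
    return n->prev != NULL || n->next != NULL || l->head == n;
}
```

A CancelSleep modelled on this would call the contains check before deleting, so the schedule "P2 signals (pops P1), P3 pushes itself, P1 cancels" becomes a safe no-op for P1 instead of corrupting P3's entry.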
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On 2016-08-14 21:04:57 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
> >> Robert Haas writes:
> >>> I think the whole idea of a fast temporary table is that there are no
> >>> catalog entries. If there are no catalog entries, then dependencies
> >>> are not visible. If there ARE catalog entries, to what do they refer?
> >>> Without a pg_class entry for the table, there's no table OID upon
> >>> which to depend.
>
> >> TBH, I think that the chances of such a design getting committed are
> >> not distinguishable from zero. Tables have to have OIDs; there is just
> >> too much code that assumes that. And I seriously doubt that it will
> >> work (for any large value of "work") without catalog entries.
>
> > That seems a bit too defeatist.
>
> Huh? I didn't say we shouldn't work on the problem --- I just think that
> this particular approach isn't good. Which you seemed to agree with.

I took your statement to mean that they need a pg_class entry - even if there were a partial solution to the pg_depend problem allowing us to avoid pg_attribute entries, that'd still not really be a solution. If that's not what you meant, sorry - and nice that we agree ;)
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
Andres Freund writes:
> On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
>> Robert Haas writes:
>>> I think the whole idea of a fast temporary table is that there are no
>>> catalog entries. If there are no catalog entries, then dependencies
>>> are not visible. If there ARE catalog entries, to what do they refer?
>>> Without a pg_class entry for the table, there's no table OID upon
>>> which to depend.

>> TBH, I think that the chances of such a design getting committed are
>> not distinguishable from zero. Tables have to have OIDs; there is just
>> too much code that assumes that. And I seriously doubt that it will
>> work (for any large value of "work") without catalog entries.

> That seems a bit too defeatist.

Huh? I didn't say we shouldn't work on the problem --- I just think that this particular approach isn't good. Which you seemed to agree with.

regards, tom lane
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
> Robert Haas writes:
> > I think the whole idea of a fast temporary table is that there are no
> > catalog entries. If there are no catalog entries, then dependencies
> > are not visible. If there ARE catalog entries, to what do they refer?
> > Without a pg_class entry for the table, there's no table OID upon
> > which to depend.
>
> TBH, I think that the chances of such a design getting committed are
> not distinguishable from zero. Tables have to have OIDs; there is just
> too much code that assumes that. And I seriously doubt that it will
> work (for any large value of "work") without catalog entries.

That seems a bit too defeatist. It's obviously not a small change to get there - and I don't think the patch upthread is really attacking the relevant problems yet - but saying that we'll never have temp tables without pg_class/pg_depend bloat seems to be pretty close to just giving up. Having 8-byte OIDs (as explicit columns instead of magic? Or just oid64?) and then reserving ranges for temp objects stored in local memory seems to be feasible. The pinning problem could potentially be solved by "session lifetime" pins in pg_depend, which would prevent dependent objects from being dropped. Obviously that's just spitballing; but I think the problem is too big to just give up.

Andres
[HACKERS] Improve handling of ^C in psql / top-level transaction abort
I recently had a client contact me thinking that a CREATE MATERIALIZED VIEW had somehow managed to keep running in the background after being ^C'd in psql. Turns out this confusion arose because their monitoring eventually complained about the still-open (but now aborted) transaction. The user didn't realize that ^C of a command in an open transaction would still leave the transaction open. I'm wondering if there's some way to improve this. One idea is if TBLOCK_INPROGRESS could also do some of what CleanupTransaction() does but still leave the status in TBLOCK_ABORT. In particular, release snapshots. Since there's no subtransaction, there's no possibility of running further commands. Is there risk to that with portals still open?

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls
On Sunday, 7 August 2016 at 18:35:56, Tom Lane <t...@sss.pgh.pa.us> wrote:

Dean Rasheed writes:
> On 5 August 2016 at 21:48, Tom Lane wrote:
>> OK, thanks. What shall we do about Andreas' request to back-patch this?
>> I'm personally willing to do it, but there is the old bugaboo of "maybe
>> it will destabilize a plan that someone is happy with".

> My inclination would be to back-patch it because arguably it's a
> bug-fix -- at the very least the old behaviour didn't match the docs
> for stadistinct:

Yeah. I suspect that situations like this are pretty rare, or we'd have recognized the problem sooner. And at least for Andreas, it'd be fixing a real problem. I'll apply the back-patch unless I hear objections in the next couple of hours.

> Additionally, I think that example is misleading because it's only
> really true if there are no null values in the column. Perhaps it
> would help to have a more explicit example to illustrate how nulls
> affect stadistinct, for example:

Good idea, will do.

regards, tom lane

(Sorry for the HTML formatting, I'm using a web-based client which isn't very good at converting HTML to plain text.) Just wanted to thank you for fixing this. 
Take this query (pg-9.5):

explain analyze
SELECT com.entity_id AS company_id
     , sum((coalesce(log.adjusted_time, log.duration) / (3600 * 1000)::numeric) * coalesce(log.adjusted_hourly_rate, log.hourly_rate)) AS not_invoiced_hours_amount_adjusted
     , sum((log.duration / (3600 * 1000)::numeric) * log.hourly_rate) AS not_invoiced_hours_amount_original
     , sum(coalesce(log.adjusted_time, log.duration)) AS not_invoiced_hours_duration_adjusted
     , sum(log.duration) AS not_invoiced_hours_duration_original
FROM onp_crm_relation com
JOIN onp_crm_activity_log log ON com.entity_id = log.relation_id
JOIN onp_crm_person logfor ON logfor.onp_user_id = log.logged_for AND logfor.is_resource = FALSE
LEFT OUTER JOIN (
    onp_crm_activity act
    INNER JOIN onp_crm_project proj ON act.project_id = proj.id
) ON log.activity_id = act.id
WHERE 1 = 1
  AND log.entry_type = 'TIMESHEET_ENTRY'
  AND log.start_date IS NOT NULL
  AND log.is_chargable = TRUE
  AND log.status IN (1, 2, 3, 5)
  AND log.start_date <= '2016-06-27'
  -- AND com.entity_id = 2246068
  AND NOT EXISTS(
      SELECT *
      FROM onp_crm_calendarentry_invoice_membership cem
      JOIN onp_crm_invoice_line il ON cem.invoice_line_id = il.id
      JOIN onp_crm_invoice inv ON il.invoice_id = inv.entity_id
      WHERE cem.calendar_entry_id = log.id
        AND inv.status_key = 'INVOICE_STATUS_INVOICED'
        AND inv.sent_date <= '2016-06-27'
        AND NOT EXISTS(
            SELECT *
            FROM onp_crm_invoice creditnote
            WHERE il.invoice_id = creditnote.credit_against
              AND creditnote.status_key = 'INVOICE_STATUS_INVOICED'
              AND creditnote.sent_date <= '2016-06-27'
        )
  )
GROUP BY com.entity_id;

Before the fix, the plan was like this:

QUERY PLAN
---
HashAggregate (cost=13874.94..13894.38 rows=972 width=32) (actual time=420188.465..420189.232 rows=462 loops=1)
  Group Key: com.entity_id
  -> Nested Loop Left Join (cost=1710.21..13848.21 rows=972 width=32) (actual time=1004.858..420166.736 rows=3882 loops=1)
     -> Nested Loop Anti Join (cost=1709.51..12990.02 rows=972 width=36) (actual time=1003.001..419356.749 rows=3882 loops=1)
        Join Filter: (cem.calendar_entry_id = log.id)
        Rows Removed by Join Filter: 3996874113
        -> Hash Join (cost=1172.69..10612.53 rows=972 width=44) (actual time=29.085..333.323 rows=89431 loops=1)
           Hash Cond: (log.relation_id = com.entity_id)
           -> Hash Join (cost=1063.39..10489.85 rows=976 width=40) (actual time=27.598..261.822 rows=89431 loops=1)
              Hash Cond: (log.logged_for = logfor.onp_user_id)
              -> Bitmap Heap Scan on onp_crm_activity_log log (cost=976.83..10055.99 rows=90010 width=44) (actual time=27.402..223.407 rows=89486 loops=1)
                 Recheck Cond: ((status = ANY ('{1,2,3,5}'::integer[])) AND is_chargable AND (start_date IS NOT NULL) AND ((entry_type)::text = 'TIMESHEET_ENTRY'::text))
                 Filter: (start_date <= '2016-06-27'::date)
                 Rows Removed by Filter: 991
                 Heap Blocks: exact=6095
                 -> Bitmap Index Scan on origo_calendar_entry_status_1_idx (cost=0.00..954.33 rows=93008 width=0) (actual time=26.339..26.339 rows=101274 loops=1)
                    Index Cond: (status = ANY ('{1,2,3,5}'::integer[]))
              -> Hash (cost=39.46..39.46 rows=3768 width=8) (actual time=0.148..0.148 rows=36 loops=1)
                 Buckets: 4096 Batches: 1 Memory Usage: 34kB
                 -> Bitmap Heap Scan on onp_crm_person logfor (cost=3.69..39.46 rows=3768 width=8) (actual time=0.049..0.137 rows=36 loops=1)
                    Recheck Cond: (onp_user_id IS NOT NULL)
                    Filter: (NOT is_resource)
                    Rows Removed by Filter: 5
                    Heap Blocks: exact=11
                    -> Bitmap Index Scan on onp_crm_person_onp_id_idx (cost=0.00..2.75 rows=41 width=0) (actual time=0.021..0.021 rows=41 loops=1)
           -> Hash (cost=80.66..80.66 r
[HACKERS] Why --backup-and-modify-in-place in perltidy config?
I did a trial run following the current pgindent README procedure, and noticed that the perltidy step left me with a pile of '.bak' files littering the entire tree. This seems like a pretty bad idea because a naive "git add ." would have committed them. It's evidently because src/tools/pgindent/perltidyrc includes --backup-and-modify-in-place. Is there a good reason for that, and if so what is it? Also, is there a reason why the perltidy invocation command hasn't been packaged into a shell script, rather than expecting the committer to copy-and-paste a rather large string?

regards, tom lane
[HACKERS] Small patch: initdb: "'" for QUOTE_PATH (non-windows)
Hello Postgres Hackers! I sent this already a few hours ago but it got blocked because I had not yet joined the mailing list. Trying again, sorry for any redundancy or confusion! This is to fix an issue that came up for me when running initdb. At the end of a successful initdb it says:

Success. You can now start the database server using:

    pg_ctl -D /some/path/to/data -l logfile start

but this command did not work for me because my data directory contained a space. The src/bin/initdb/initdb.c source code did already have a QUOTE_PATH constant, but it was the empty string for non-Windows cases. Therefore, added quotes via the existing QUOTE_PATH constant:

    pg_ctl -D '/some/path/to/data' -l logfile start

Best,
Ryan

0001-initdb-for-QUOTE_PATH-non-windows.patch (attachment, binary data)
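The shape of the fix can be sketched like this. The helper function and its name are invented for illustration; only the QUOTE_PATH constant is from initdb.c, where the real hint is printed inline:

```c
#include <stdio.h>
#include <string.h>

/* On non-Windows, the patch changes this from "" to "'". */
#define QUOTE_PATH "'"

/* Hypothetical helper: build the post-initdb pg_ctl hint into buf. */
void
format_start_hint(char *buf, size_t len, const char *pgdata)
{
    snprintf(buf, len, "pg_ctl -D %s%s%s -l logfile start",
             QUOTE_PATH, pgdata, QUOTE_PATH);
}
```

With QUOTE_PATH set to a single quote, a data directory containing spaces yields a command that the shell parses as one argument.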
Re: [HACKERS] COPY FREEZE and PD_ALL_VISIBLE
On Tue, Nov 3, 2015 at 6:37 AM, Simon Riggs wrote:
> On 3 November 2015 at 15:23, Amit Kapila wrote:
>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs wrote:
>>> On 21 October 2015 at 13:31, Jeff Janes wrote:
>>>> Index-only scans will visit the heap for each tuple until the first
>>>> VACUUM is done. The first vacuum will read the entire table, but not
>>>> need to write it anymore. And will create the _vm file. I think we
>>>> really want to create _vm file as well as set PD_ALL_VISIBLE, but I
>>>> don't know the best way to do that. Set a flag somewhere and then
>>>> create it in bulk at the end of the transaction? Set it bit by bit as
>>>> the pages are extended and initialized?
>>>
>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>
>> Here, we might want to consider that setting bit in visibility map
>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>> when wal_level is less than archive. This can lead to extra disk
>> writes which can slow down Copy Freeze, but OTOH that might
>> be acceptable.
>
> I'm building the map as I go, in the latest version of this patch I'm
> working on.

Hi Simon,

Is this still on your radar? If you would like someone else to pick it up, can you post the WIP patch you have?

Thanks, Jeff
Re: [HACKERS] [NOVICE] numeric data type upper limit.
Aravind Kumar writes:
> when I do
> select 1.0e+1001::numeric;
> I get
> invalid input syntax for type numeric: "1.0e+1001"
> Postgresql documentation tells me that numeric data type has an upper
> limit of 131072 digits before decimal point.

You can successfully do

    select pow(10::numeric, 131071);

or

    select ('1' || repeat('0', 131071))::numeric;

so it seems pretty arbitrary that there's such a small limit on the allowed exponent value. This is coming from this bit in numeric.c's set_var_from_str():

    if (exponent > NUMERIC_MAX_PRECISION ||
        exponent < -NUMERIC_MAX_PRECISION)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("invalid input syntax for type numeric: \"%s\"",
                        str)));

which I think is just brain fade. We should probably change that so that it only restricts the exponent enough to prevent integer overflow in the weight calculations, and leave it to make_result() to decide whether the value is out of storage range. (I notice it's not bothering to check for an overflow signal from strtol(), either :-(. Also it seems like this should be an "overflow" errcode not an "invalid syntax" one.) Poking around for any other possible misuses of NUMERIC_MAX_PRECISION, I came across this bit in numeric_recv():

    len = (uint16) pq_getmsgint(buf, sizeof(uint16));
    if (len < 0 || len > NUMERIC_MAX_PRECISION + NUMERIC_MAX_RESULT_SCALE)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                 errmsg("invalid length in external \"numeric\" value")));

which again represents a totally misplaced assumption that NUMERIC_MAX_PRECISION has something to do with the max number of digits to the left of the decimal point. 
This is actually pretty bad because it creates a dump/reload failure hazard when using binary COPY:

regression=# create table bign (f1 numeric);
CREATE TABLE
regression=# insert into bign select pow(9.9::numeric, 10);
INSERT 0 1
regression=# \copy bign to 'bign.data' (format binary);
COPY 1
regression=# \copy bign from 'bign.data' (format binary);
ERROR: invalid length in external "numeric" value
CONTEXT: COPY bign, line 1, column f1

I think we can probably just remove the length test there altogether: anything that fits in uint16 is probably OK. Because of the dump/reload hazard, I think the numeric_recv() issue is definitely a back-patchable bug fix. You could argue that relaxing the E-notation limit in numeric_in() is more of a feature addition, but considering that the documentation has stated very specifically since 9.1 that NUMERIC accepts large values (cf commit cabf5d84b), I'm inclined to think that's a bug fix too.

regards, tom lane
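The suggested direction for set_var_from_str() - restrict the exponent only enough to prevent integer overflow in the weight arithmetic, and let make_result() enforce the storage range - could be sketched like this. The function name and the hard-limit constant are invented for illustration; this is not the actual numeric.c code:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical bound: far larger than any storable numeric exponent,
 * but small enough that subsequent int weight arithmetic cannot
 * overflow.  Storage-range enforcement stays in make_result().
 */
#define EXPONENT_HARD_LIMIT (INT_MAX / 2)

/* Parse the exponent digits at cp; also catch strtol's own overflow. */
bool
exponent_in_parse_range(const char *cp)
{
    char *end;
    long exponent;

    errno = 0;
    exponent = strtol(cp, &end, 10);
    if (errno == ERANGE)
        return false;   /* overflow signalled by strtol itself */
    return exponent >= -EXPONENT_HARD_LIMIT &&
           exponent <= EXPONENT_HARD_LIMIT;
}
```

Under this scheme "1.0e+1001" parses fine and is accepted or rejected later based on the actual storage limits, while absurd exponents (including ones that overflow strtol) are still caught early with an overflow-class error.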
Re: [HACKERS] WIP: Covering + unique indexes.
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, failed
Spec compliant: tested, passed
Documentation: tested, passed

Hi hackers! I've read the patch and here is my code review.

== PURPOSE ==
I've used this feature from time to time with MS SQL. From my experience, INCLUDE is a 'sugar on top' feature. Some MS SQL classes do not even mention INCLUDE despite it being there since 2005 (though classes omit lots of important things, so that's not a very valuable indicator). But those who use it, use it whenever possible. For example, the system view with recommended indices rarely lists one without INCLUDE columns. So, this feature is very important from the perspective of converting MS SQL DBAs to PostgreSQL. This is how I see it.

== SUGGESTIONS ==
0. Index build is broken. This script https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sql SEGFAULTs and may cause a situation where you cannot insert anything into the table (I think dropping the index would help, but I didn't test this).
1. I think the MS SQL syntax INCLUDE instead of INCLUDING would be better (for the purpose listed above).
2. An empty line was added in ruleutils.c. Is it there for a reason?
3. Now we have indnatts and indnkeyatts instead of indnatts. I think it is worth considering renaming indnatts to something different from the old name. Someone somewhere could still suppose it's the number of keys.

== PERFORMANCE ==
Due to suggestion number 0 I could not measure performance of index build. The index crashes when there are more than 1.1 million rows in a table. The performance test script is here: https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql

Test scenario is the following:
1. Create table, then create index, then add data.
2. Make a query touching data in INCLUDING columns.

This scenario is tested against a table with:
A. An index that does not contain the touched columns, just the PK.
B. An index with all columns in the index.
C. An index with the PK in keys and INCLUDING all other columns.

Tests were executed 5 times on an Ubuntu VM under Hyper-V; i5 2500 CPU, 16 GB of RAM, SSD disk.

Time to insert 10M rows:
A. AVG 110 seconds, STD 4.8
B. AVG 121 seconds, STD 2.0
C. AVG 111 seconds, STD 5.7
Inserts to an INCLUDING index are almost as fast as inserts to an index without extra columns.

Time to run the SELECT query:
A. AVG 2864 ms, STD 794
B. AVG 2329 ms, STD 84
C. AVG 2293 ms, STD 58
Selects with INCLUDING columns are almost as fast as with the full index.

Index size (deterministic measure, STD = 0):
A. 317 MB
B. 509 MB
C. 399 MB
The INCLUDING index size is in the middle between the full index and the minimal index.

I think these numbers agree with what one would expect from the feature.

== CONCLUSION ==
This patch brings a useful and important feature. The build shall be repaired; my other suggestions are only suggestions.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

The new status of this patch is: Waiting on Author