[GENERAL] Optimizing CartoDB's JenksBins function
hi,

Our friends from CartoDB [1] provide a beautiful Jenks Natural Breaks function for Postgres [2]. It is quite computationally intensive. Even if you don't know what Jenks is, do you see any optimizations?

Best, thanks,
Seamus

PS. I was hoping for something magical like Tom Lane's VALUES() fix for DataDog [3] or Heap Analytics' array fix [4], although I realize both are fundamentally different :) This is probably a more subtle loop optimization problem, if anything.

[1] http://cartodb.com
[2] https://github.com/CartoDB/cartodb/blob/master/lib/sql/scripts-available/CDB_JenksBins.sql
[3] https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
[4] http://blog.heapanalytics.com/dont-iterate-over-a-postgres-array-with-a-loop/

--
Seamus Abshere, SCEA
https://github.com/seamusabshere

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
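In the spirit of [4], the usual first thing to try in a hot plpgsql loop is replacing element-by-element array iteration with set-based unnest(). A generic sketch (hypothetical, not CDB_JenksBins itself):

```sql
-- slow in plpgsql: subscripting re-walks the (varlena) array on each access
-- FOR i IN 1 .. array_length(arr, 1) LOOP
--   total := total + arr[i];
-- END LOOP;

-- set-based equivalent, one pass over the elements:
SELECT sum(x) FROM unnest(ARRAY[1.0, 2.5, 3.5]) AS t(x);
-- 7.0
```

Whether this applies depends on which of CDB_JenksBins' inner loops dominate; it's just the pattern the linked post is about.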
[GENERAL] Why is pg_restore trying to create tables in pg_catalog?
hi,

Why is pg_restore trying to put stuff into the pg_catalog schema of all places? It's ignoring the schema specified in the pg_dump itself (`myschema`) and even my search_path (`public`).

    $ psql stuff_development --command 'show search_path'
     search_path
    -------------
     public
    (1 row)

    $ pg_restore --list stuff.pg_dump
    ;
    ; Archive created at Fri Apr 4 00:55:50 2014
    ;     dbname: stuff_development
    ;     TOC Entries: 14
    ;     Compression: -1
    ;     Dump Version: 1.12-0
    ;     Format: CUSTOM
    ;     Integer: 4 bytes
    ;     Offset: 8 bytes
    ;     Dumped from database version: 9.3.4
    ;     Dumped by pg_dump version: 9.3.4
    ;
    ; Selected TOC Entries:
    ;
    205; 1259 95675 TABLE myschema stuff_one myuser
    [...]
    3312; 0 95675 TABLE DATA myschema stuff_one myuser
    [...]

    $ pg_restore --verbose --no-owner --no-privileges --dbname stuff_development stuff.pg_dump
    pg_restore: connecting to database for restore
    pg_restore: creating TABLE stuff_one
    pg_restore: [archiver (db)] Error while PROCESSING TOC:
    pg_restore: [archiver (db)] Error from TOC entry 205; 1259 95675 TABLE stuff_one myuser
    pg_restore: [archiver (db)] could not execute query: ERROR:  permission denied to create "pg_catalog.stuff_one"
    DETAIL:  System catalog modifications are currently disallowed.
        Command was: CREATE TABLE stuff_one (
            the_geom public.geometry
        );

Thank you!
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
Re: [GENERAL] Why is pg_restore trying to create tables in pg_catalog?
On 4/4/14, 12:58 PM, Tom Lane wrote:
> Seamus Abshere <sea...@abshere.net> writes:
>> Why is pg_restore trying to put stuff into the pg_catalog schema of all places?
>
> Hm ... does myschema actually exist in the target database? [...] if myschema doesn't exist, the creation target devolves to pg_catalog. This is not terribly elegant but I'm not sure if there's consensus to change it.

Tom,

You're right, myschema didn't exist (I thought I had created it separately, etc.)

Perhaps it would be good to warn the user (at least in --verbose) if it's auto-devolving to pg_catalog?

Thanks again,
Seamus

PS. Otherwise, if you google the error message, you get a whole bunch of stackoverflow posts recommending you make your user a superuser so you can write to pg_catalog, which probably isn't what the person wanted in the first place.

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
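For anyone who lands here from a search: given the names in this thread, the fix is to create the missing schema before restoring, not to grant superuser:

```sql
-- run against the target database before pg_restore:
CREATE SCHEMA myschema;
```

After that, the TOC entries that name `myschema` restore into it as expected.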
Re: [GENERAL] Force specific index disuse
On 5/20/14, 1:38 PM, Steve Crawford wrote:
> Is there a way to force a specific index to be removed from consideration in planning a single query?
>
> Specifically, on a 60-million-row table I have an index that is a candidate for removal. I have identified the sets of nightly queries that use the index but before dropping it I would like to run EXPLAIN and do timing tests on the queries to see the impact of not having that index available and rewrite the query to efficiently use other indexes if necessary.

hi Steve,

What is the query? Or at least a sanitized but complete version?

Thanks,
Seamus

PS. I've had luck hinting with OFFSET 0 but it might not help in your use case. http://seamusabshere.github.io/2013/03/29/hinting-postgres-and-mysql-with-offset-and-limit/

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
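For the archives, the OFFSET 0 trick mentioned above, sketched with hypothetical names: a subquery with OFFSET 0 acts as an optimization fence, so a predicate kept outside it cannot be pushed down to drive an index scan inside it:

```sql
-- instead of letting the planner pick the index on indexed_col:
--   SELECT * FROM big_table WHERE indexed_col = 42 AND other_col = 'x';

-- fence the scan; indexed_col is now filtered after the subquery runs:
SELECT *
FROM (
  SELECT * FROM big_table WHERE other_col = 'x' OFFSET 0
) sub
WHERE indexed_col = 42;
```

It's a hack, not a hint system, and plan shapes can still vary by version.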
[GENERAL] Why is unique constraint needed for upsert?
hi all,

Upsert is usually defined [1] in reference to violating a unique key: insert, if unique constraint violation then update; or update, if not found then insert.

Is this theoretically preferable to just looking for a row that matches certain criteria, updating it if found or inserting otherwise? For an example of the latter approach, see MongoDB's flavor of upsert [2]. You just give it a query and an update. It seems to me this is better because it functions correctly whether or not an index is in place.

Best, thanks,
Seamus

[1] http://postgresql.uservoice.com/forums/21853-general/suggestions/245202-merge-upsert-replace
[2] http://docs.mongodb.org/manual/reference/method/db.collection.update/

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)
On 7/23/14 3:40 PM, Tom Lane wrote:
> John R Pierce <pie...@hogranch.com> writes:
>> On 7/23/2014 10:21 AM, Seamus Abshere wrote:
>>> Upsert is usually defined [1] in reference to violating a unique key:
>>> Is this theoretically preferable to just looking for a row that matches certain criteria, updating it if found or inserting otherwise?
>> what happens when two connections do this more or less concurrently, in transactions?
>
> For the OP's benefit --- the subtext John left unstated is that the unique-key mechanism has already solved the problem of preventing concurrent updates from creating duplicate keys. If we build a version of UPSERT that doesn't rely on a unique index then it'll need some entirely new mechanism to prevent concurrent key insertion. (And if you don't care about concurrent cases, you don't really need UPSERT ...)

hi all,

What if we treat atomicity as optional? You could have extremely readable syntax like:

    -- no guarantees, no index required
    UPSERT age = 5 INTO dogs WHERE name = 'Jerry';

    -- optionally tell us how you want to deal with collision
    UPSERT age = 3 INTO dogs WHERE name = 'Jerry' ON DUPLICATE KEY KEEP NEWEST;
    UPSERT age = 5 INTO dogs WHERE name = 'Jerry' ON DUPLICATE KEY KEEP NEWEST;

    -- only **require** (by throwing an error) a unique index or a locked table for queries like
    UPSERT age = age+1 INTO dogs WHERE name = 'Jerry';

Obviously this flies in the face of what most people say the fundamental Upsert property is [1]:

> At READ COMMITTED isolation level, you should always get an atomic insert or update

I just think there are a lot of non-concurrent bulk loading and processing workflows that could benefit from the performance advantages of upsert (one trip to the database).

Best, thanks,
Seamus

[1] http://www.pgcon.org/2014/schedule/events/661.en.html

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)
On 7/23/14 6:03 PM, John R Pierce wrote:
> On 7/23/2014 1:45 PM, Seamus Abshere wrote:
>> What if we treat atomicity as optional?
> atomicity is not and never will be optional in PostgreSQL.

I'm wondering what a minimal definition of upsert could be - possibly separating concurrency handling out as a (rigorously defined) option for those who need it.

>>     -- no guarantees, no index required
>>     UPSERT age = 5 INTO dogs WHERE name = 'Jerry';
> and if there's several rows with name='Jerry', you'd want to update them ALL? if name isn't indexed, this will, as Tom suggests, require a FULL table scan, and it still will have issues with concurrency

Ah, I was just saying, in terms of correctness, it seems to me that upsert shouldn't NEED an index to work, just like you don't need an index on name when you say WHERE name = 'Jerry' in SELECTs or INSERTs or UPDATEs.

Appreciate the defense of data integrity in any case!!

Best,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
Re: [GENERAL] Re: Why is unique constraint needed for upsert? (treat atomicity as optional)
On 7/23/14 6:50 PM, David G Johnston wrote:
> seamusabshere wrote
>> On 7/23/14 6:03 PM, John R Pierce wrote:
>>> On 7/23/2014 1:45 PM, Seamus Abshere wrote:
>>>> What if we treat atomicity as optional?
>>> atomicity is not and never will be optional in PostgreSQL.
>> I'm wondering what a minimal definition of upsert could be - possibly separating concurrency handling out as a (rigorously defined) option for those who need it.
>
> Given we do not have native UPSERT I'm not sure where your question is coming from anyway. I'm not sure what the plans are for UPSERT at the moment but nothing prevents us from performing the UPSERT comparison on a non-unique set of columns.

hi David,

My argument lives and dies on the assumption that UPSERT would be useful even if it was (when given with no options) just a macro for:

    UPDATE db SET b = data WHERE a = key;
    IF NOT found THEN
        INSERT INTO db(a,b) VALUES (key, data);
    END IF;

Adding things like unique indexes would work like you would expect with individual INSERTs or UPDATEs - your statement might raise an exception.

Then, going beyond, UPSERT would optionally support atomic a = a+1 stuff, special actions to take on duplicate keys, all the concurrency stuff that people have been talking about.

IMO having such a complicated definition of what an upsert must be makes it a unicorn when it could just be a sibling to INSERT and UPDATE.

Best,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
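For context, that macro is essentially the body of the classic retry-loop upsert from the PostgreSQL documentation's plpgsql examples, which handles the concurrent case precisely by relying on a unique constraint on `a` to raise `unique_violation`:

```sql
CREATE OR REPLACE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key;
        -- if someone else inserts the same key concurrently,
        -- we get a unique-key failure and loop to retry the UPDATE
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, loop again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;
```

Without the unique constraint, the loop degrades to exactly the non-atomic macro above: correct in single-writer workflows, racy otherwise.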
Re: [GENERAL] Re: Why is unique constraint needed for upsert? (treat atomicity as optional)
On 7/23/14 7:45 PM, John R Pierce wrote:
> On 7/23/2014 3:29 PM, Seamus Abshere wrote:
>> My argument lives and dies on the assumption that UPSERT would be useful even if it was (when given with no options) just a macro for
>>     UPDATE db SET b = data WHERE a = key;
>>     IF NOT found THEN
>>         INSERT INTO db(a,b) VALUES (key, data);
>>     END IF;
> but that won't work if two connections execute similar 'upserts' concurrently. Both updates will see the record isn't there, then one or the other insert will fail, depending on which transaction commits first.

John,

Right - if you had a situation where that might happen, you would use a slightly more advanced version of the UPSERT command (and/or add a unique index).

UPSERT, in this conception and in its most basic form, would be subject to many of the same (and more) concurrency concerns as basic INSERTs and UPDATEs. Providing options may be preferable to magically handling everything.

Best,
Seamus

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
[GENERAL] Unexpected syntax error when using JSON - in 9.3.5
hi all,

This part looks correct and expected:

    $ psql foobar
    psql (9.3.5)
    Type "help" for help.

    foobar=# select coalesce('{}'::json->'a', 1);
    ERROR:  COALESCE types json and integer cannot be matched
    LINE 1: select coalesce('{}'::json->'a', 1);
                                             ^

but check it out when I use a string instead of an integer:

    foobar=# select coalesce('{}'::json->'a', 'b');
    ERROR:  invalid input syntax for type json
    LINE 1: select coalesce('{}'::json->'a', 'b');
                                             ^
    DETAIL:  Token "b" is invalid.
    CONTEXT:  JSON data, line 1: b

That seems like the wrong error - shouldn't it be the equivalent of "[...] json and string cannot be matched"?

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
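The difference comes down to type resolution: `'b'` is an untyped literal, so COALESCE resolves it to the other branch's type (json) and then tries to parse `b` as a JSON document, which fails. A couple of workarounds, as a sketch:

```sql
-- quote the fallback as a JSON string so it parses:
select coalesce('{}'::json->'a', '"b"'::json);

-- or pull text out of the json side instead (->> returns text):
select coalesce('{}'::json->>'a', 'b');
-- b
```

With an explicitly typed literal like `'b'::text`, you would get the "cannot be matched" error as expected.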
[GENERAL] Will there be a JSON operator like - but returning numeric?
hi,

I've got use cases like:

    array_remove(array_agg((a->>'b')::float), NULL)

It would be nice to replace (a->>'b')::float with something like a->^'b' that directly returned a numeric... is that in the cards?

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
[GENERAL] Comparing results of regexp_matches
hi,

I want to check if two similar-looking addresses have the same numbered street, like 20th versus 21st.

    2033 21st Ave S
    2033 20th Ave S

(they're different)

I get an error:

    # select regexp_matches('2033 21st Ave S', '\m(\d+(?:st|th))\M')
           = regexp_matches('2033 20th Ave S', '\m(\d+(?:st|th))\M');
    ERROR:  functions and operators can take at most one set argument

I've tried `()[1] == ()[1]`, etc. but the only thing that works is making it into 2 subqueries:

    # select (select * from regexp_matches('2033 21st Ave S', '\m(\d+(?:st|th))\M'))
           = (select * from regexp_matches('2033 20th Ave S', '\m(\d+(?:st|th))\M'));
     ?column?
    ----------
     f
    (1 row)

Is there a more elegant way to compare the results of `regexp_matches()`?

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
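One set-free alternative, as a sketch: `substring(text from pattern)` returns the first parenthesized capture group as plain text (not a set), so the results compare directly:

```sql
select substring('2033 21st Ave S' from '\m(\d+(?:st|th))\M')
     = substring('2033 20th Ave S' from '\m(\d+(?:st|th))\M');
-- f
```

The error in the original query arises because `regexp_matches()` is a set-returning function, and `=` can take at most one set-valued argument.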
Re: [GENERAL] Definitive answer: can functions use indexes?
On Wed, Jan 6, 2016, at 08:41 PM, Tom Lane wrote:
> Seamus Abshere <sea...@abshere.net> writes:
>> -> Can a function like `LEFT()` use an index?
> Since the question makes little sense as stated, I'm going to assume
> you mean "can a query like SELECT ... WHERE left(foo, 3) = 'bar'
> use an index on column foo?"
>
> The answer to that is no, there is no such optimization built into
> Postgres. (In principle there could be, but I've not heard enough
> requests to make me think we'd ever pursue it.)
>
> The equivalent optimization that *is* built in, and has been for
> a long time, is for LIKE: "SELECT ... WHERE foo LIKE 'bar%'" can
> use an index on foo, at least if it's an index sorted according to
> C collation.

hi Tom,

I should have been more general. In layman's/narrative terms, what's the deal with functions vs. operators for postgres indexes? For example, `exist(hstore,text)` vs. `hstore ? text`?

Thank you!
Seamus

PS. If I have understood correctly over the years, in order for the query planner to use indexes, it needs to see operators - functions are opaque to it. I'm looking for a bit more narrative on this to round out my understanding.

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
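One piece worth adding for completeness (standard Postgres behavior, sketched with hypothetical names): a function call can drive an index if you create an expression index on that exact expression, because the planner matches the indexed expression rather than reasoning about the function itself:

```sql
-- left() is immutable, so it can be used in an expression index:
create index idx_mytable_foo_left3 on mytable (left(foo, 3));

-- this predicate can now use idx_mytable_foo_left3:
select * from mytable where left(foo, 3) = 'bar';
```

That's different from the operator question above: operator-class membership (e.g. `hstore ? text` belonging to a GiST/GIN opclass) is what lets the planner use an index for an operator, while a bare function call only matches via an expression index.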
[GENERAL] Definitive answer: can functions use indexes?
hi,

I've been using Postgres for years ( :heart: ) and I'm still in doubt about this. Would somebody provide an authoritative, definitive, narrative answer?

-> Can a function like `LEFT()` use an index? (Or do I have to find an "equivalent" operator in order to leverage indexes?)

Thanks!
Seamus

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
[GENERAL] Read-only tables to avoid row visibility check
hi,

https://wiki.postgresql.org/wiki/ReadOnlyTables mentions the possibility of `ALTER TABLE table SET READ ONLY`.

Would this mean that row visibility checks could be skipped and thus index-only scans much more common?

Thanks,
Seamus

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
[GENERAL] uuid gin operator class - why not include for everybody?
hi,

We want to use gin indexes on arrays of UUIDs. It is as simple as:

> CREATE OPERATOR CLASS _uuid_ops
> DEFAULT FOR TYPE _uuid
> USING gin AS
>   OPERATOR 1 &&(anyarray, anyarray),
>   OPERATOR 2 @>(anyarray, anyarray),
>   OPERATOR 3 <@(anyarray, anyarray),
>   OPERATOR 4 =(anyarray, anyarray),
>   FUNCTION 1 uuid_cmp(uuid, uuid),
>   FUNCTION 2 ginarrayextract(anyarray, internal, internal),
>   FUNCTION 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal),
>   FUNCTION 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal),
>   STORAGE uuid;

Is there a reason not to put this into postgres itself? This already exists for text[].

Thanks,
Seamus

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
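With that opclass installed as DEFAULT for `_uuid`, usage looks like this (table and column names are hypothetical):

```sql
-- tag_uuids is a uuid[] column
CREATE INDEX ON things USING gin (tag_uuids);

-- containment queries can now use the gin index:
SELECT * FROM things
WHERE tag_uuids @> ARRAY['00000000-0000-0000-0000-000000000000'::uuid];
```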
Re: [GENERAL] Why does query planner choose slower BitmapAnd ?
On Mon, Feb 22, 2016, at 01:48 PM, Tom Lane wrote:
> Given how remarkably quick the single-index scan is, I also wonder if that
> index is fully cached while we had to read some of the other index from
> kernel or SSD.

This makes sense, except that the speed of the query is the same if I run it many times in a row. Shouldn't the partially-cached index get loaded fully by the second query?

On Mon, Feb 22, 2016, at 01:20 PM, Stephen Frost wrote:
> The first question is probably- are we properly accounting for the cost of
> scanning the index vs the cost of scanning one index and then applying the
> filter?

I can affect the query planner's cost estimates with random_page_cost (only), but I still can't get it to avoid the BitmapAnd - probably because I am affecting other cost estimates in the same proportion.

No change with original settings OR cpu_tuple_cost=10 OR seq_page_cost=10 OR (cpu_tuple_cost=0.05, seq_page_cost=1, random_page_cost=1):

> -> BitmapAnd  (cost=105894.80..105894.80 rows=21002 width=0) (actual time=4859.397..4859.397 rows=0 loops=1)
>    -> Bitmap Index Scan on idx_houses_city  (cost=0.00..1666.90 rows=164044 width=0) (actual time=16.098..16.098 rows=155690 loops=1)
>       Index Cond: (city = 'New York'::text)
>    -> Bitmap Index Scan on idx_houses_phoneable  (cost=0.00..104224.60 rows=10271471 width=0) (actual time=4771.520..4771.520 rows=10647041 loops=1)
>       Index Cond: (phoneable = true)

However with random_page_cost=10 (hint: cost estimates go up by 4x or so):

> -> BitmapAnd  (cost=354510.80..354510.80 rows=21002 width=0) (actual time=4603.575..4603.575 rows=0 loops=1)
>    -> Bitmap Index Scan on idx_houses_city  (cost=0.00..5590.90 rows=164044 width=0) (actual time=16.529..16.529 rows=155690 loops=1)
>       Index Cond: (city = 'New York'::text)
>    -> Bitmap Index Scan on idx_houses_phoneable  (cost=0.00..348916.60 rows=10271471 width=0) (actual time=4530.424..4530.424 rows=10647041 loops=1)
>       Index Cond: (phoneable = true)

I think this is why we originally set random_page_cost so "low"... it was our way of "forcing" more index usage (we have a big, wide table).

Is there any other way to differentiate the 2 index scans? FWIW, 10% of houses are phoneable, 0.2% are in the city. (Maybe I'm just supposed to drop the index like Tom said.)

Best, thanks,
Seamus

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
Re: [GENERAL] Read-only tables to avoid row visibility check
On Mon, Feb 22, 2016, at 06:48 PM, David G. Johnston wrote:
> it would probably be more constructive to actually communicate the thoughts
> that provoked the question.

My company has a largish table - 250+ columns, 1 row for every household in the US. It's read-only. We've gotten advice to convert to a column store (cstore_fdw, etc.) but we would love to just stay with tried-and-true postgres tables. Plus, many of our queries are against dozens of columns at once.

Being able to tell postgres that our table is "Read Only" has imaginary mystical properties for me, first and foremost being able to count against indexes without ever hitting the disk.

> ​If the system is working properly then a READ ONLY table in fact should be
> able to use Index Only Scans without the hack of a DBA telling it that said
> table is READ ONLY.​

So this should happen already?
Re: [GENERAL] Read-only tables to avoid row visibility check
On Mon, Feb 22, 2016, at 06:30 PM, Melvin Davidson wrote:
> However, at this time, there is no such option as SET READ ONLY in any
> version of PostgreSQL.

I know. I am wondering if hypothetical read-only tables would make index-only scans more possible by avoiding the need for row visibility checks.

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
Re: [GENERAL] Why does query planner choose slower BitmapAnd ?
On Mon, Feb 22, 2016, at 02:14 PM, Tom Lane wrote:
> IOW, almost certainly we *don't* realize that the query will involve scanning
> through gigabytes of index pages. But btree indexes are much simpler and
> easier to make that estimate for...

Isn't this the crux of my issue, at least?

--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere
Re: [GENERAL] Why does query planner choose slower BitmapAnd ?
On Mon, Feb 22, 2016, at 02:30 PM, Jeff Janes wrote:
> It charges 0.1 CPU_operator_cost, while reality seemed to be more like 6
> CPU_operator_cost.

    fdy=> select name, setting, boot_val from pg_settings where name ~ 'cpu';
             name         | setting | boot_val
    ----------------------+---------+----------
     cpu_index_tuple_cost | 0.005   | 0.005
     cpu_operator_cost    | 0.0025  | 0.0025
     cpu_tuple_cost       | 0.01    | 0.01
    (3 rows)

Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It "fixed" my problem by preventing the BitmapAnd.

Is this dangerous?
Re: [GENERAL] Why does query planner choose slower BitmapAnd ?
On Mon, Feb 22, 2016, at 02:49 PM, Tom Lane wrote:
> Seamus Abshere <sea...@abshere.net> writes:
> > Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It
> > "fixed" my problem by preventing the BitmapAnd.
> > Is this dangerous?
>
> Use a gentle tap, man, don't swing the hammer with quite so much abandon.
> I'd have tried doubling the setting to start with. Raising it 20X might
> cause other queries to change behavior undesirably.

Doubling it was enough :)
Re: [GENERAL] Why does query planner choose slower BitmapAnd ?
On Mon, Feb 22, 2016, at 02:53 PM, Seamus Abshere wrote:
> On Mon, Feb 22, 2016, at 02:49 PM, Tom Lane wrote:
> > Use a gentle tap, man, don't swing the hammer with quite so much abandon.
> > I'd have tried doubling the setting to start with. Raising it 20X might
> > cause other queries to change behavior undesirably.
>
> Doubling it was enough :)

In the end I'm back to the big hammer. I found that larger cities (e.g., more results from the city index) required a larger cpu_index_tuple_cost to prevent the BitmapAnd. Now cpu_index_tuple_cost is set to 0.09, which is 18x its boot_val and 9x cpu_tuple_cost... which seems strange:

             name         | setting | boot_val
    ----------------------+---------+----------
     cpu_index_tuple_cost | 0.09    | 0.005     <- 18x boot val, 9x cpu_tuple_cost
     cpu_operator_cost    | 0.0025  | 0.0025
     cpu_tuple_cost       | 0.01    | 0.01

Logically, should I be changing cpu_operator_cost instead?
[GENERAL] Why does query planner choose slower BitmapAnd ?
hi,

I don't understand why the query planner is choosing a BitmapAnd when an Index Scan followed by a filter is obviously better. (Note that "new_york_houses" is a view of table "houses" with one condition on city - and there is an index idx_houses_city. That is the Index Scan that I think it should use exclusively.)

Here's a fast query that uses the Index Scan followed by a filter:

> => explain analyze SELECT COUNT(DISTINCT id) FROM new_york_houses WHERE roof_area >= 0 AND roof_area < 278.7091;
>
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
> Aggregate  (cost=167298.10..167298.11 rows=1 width=16) (actual time=141.137..141.137 rows=1 loops=1)
>   -> Index Scan using idx_houses_city on households  (cost=0.57..167178.87 rows=47694 width=16) (actual time=0.045..105.953 rows=53971 loops=1)
>      Index Cond: (city = 'New York'::text)
>      Filter: ((roof_area >= 0) AND ((roof_area)::numeric < 278.7091))
>      Rows Removed by Filter: 101719
> Planning time: 0.688 ms
> Execution time: 141.250 ms
> (7 rows)

When I add another condition, "phoneable", however, it chooses an obviously wrong plan:

> => explain analyze SELECT COUNT(DISTINCT id) FROM new_york_houses WHERE roof_area >= 0 AND roof_area < 278.7091 AND phoneable = true;
>
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
> Aggregate  (cost=128163.05..128163.06 rows=1 width=16) (actual time=4564.677..4564.677 rows=1 loops=1)
>   -> Bitmap Heap Scan on households  (cost=105894.80..128147.78 rows=6106 width=16) (actual time=4456.690..4561.416 rows=5183 loops=1)
>      Recheck Cond: (city = 'New York'::text)
>      Filter: (phoneable AND (roof_area >= 0) AND ((roof_area)::numeric < 278.7091))
>      Rows Removed by Filter: 40103
>      Heap Blocks: exact=14563
>      -> BitmapAnd  (cost=105894.80..105894.80 rows=21002 width=0) (actual time=4453.510..4453.510 rows=0 loops=1)
>         -> Bitmap Index Scan on idx_houses_city  (cost=0.00..1666.90 rows=164044 width=0) (actual time=16.505..16.505 rows=155690 loops=1)
>            Index Cond: (city = 'New York'::text)
>         -> Bitmap Index Scan on idx_houses_phoneable  (cost=0.00..104224.60 rows=10271471 width=0) (actual time=4384.461..4384.461 rows=10647041 loops=1)
>            Index Cond: (phoneable = true)
> Planning time: 0.709 ms
> Execution time: 4565.067 ms
> (13 rows)

On Postgres 9.4.4 with 244gb memory and SSDs:

    maintenance_work_mem 100
    work_mem 50
    random_page_cost 1
    seq_page_cost 2

The "houses" table has been analyzed recently and has statistics set to the max.

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://www.linkedin.com/in/seamusabshere
[GENERAL] plpgsql update row from record variable
hi,

I want to write a function that updates arbitrary columns and here's my pseudocode:

    CREATE OR REPLACE FUNCTION update_pet(id int, raw_data jsonb) RETURNS VOID AS $$
    DECLARE
      data record;
    BEGIN
      SELECT jsonb_populate_record(null::pets, raw_data) INTO data;
      UPDATE pets [... from data ...] WHERE id = id; -- probably impossible
    END;
    $$ LANGUAGE plpgsql;

e.g.

    SELECT update_pet(1, '{ "name" : "Jerry", "age": 9 }'::jsonb);

Back in 2004, Tom showed how to insert from a plpgsql record: http://www.postgresql.org/message-id/17840.1087670...@sss.pgh.pa.us

Is there any way to "update *" from a record?

Thanks!
Seamus

PS. Whether I **should** do this is another matter, I just want to know if it's possible.

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
http://linkedin.com/in/seamusabshere
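Not "update *", but a sketch that gets close on 9.5+, using the multi-column UPDATE-from-subquery form (the `pets` table and its columns are assumptions from the example; the column list still has to be spelled out):

```sql
UPDATE pets
SET (name, age) = (
  SELECT name, age
  FROM jsonb_populate_record(null::pets, '{"name":"Jerry","age":9}'::jsonb)
)
WHERE pets.id = 1;
```

The remaining catch is exactly the one the pseudocode flags: there is no syntax that expands "all columns of the record" without naming them, so a truly generic version has to build the SET list dynamically (e.g. with EXECUTE and format()).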
[GENERAL] Does a row lock taken out in a CTE stay in place?
Given an update that uses CTEs like this:

    WITH lock_rows AS (
      SELECT 1 FROM tbl WHERE [...] FOR UPDATE
    )
    UPDATE [...]

Will the rows in `tbl` remain locked until the UPDATE is finished?

Also, does it matter if `lock_rows` is referenced? (IIUC the query wouldn't be run if the CTE isn't referenced if it was for a SELECT, but since it's an UPDATE, it will be run anyway)

Thanks!
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
https://www.faraday.io
[GENERAL] Concurrency and UPDATE [...] FROM
The purpose is to concat new data onto existing values of c:

    UPDATE tbl
    SET c = c || new_data.c
    FROM ( [...] ) AS new_data
    WHERE tbl.id = new_data.id

It appears to have a race condition:

    t0:   Query A starts subquery
    t1:   Query A starts self-join
    t2:   Query A starts UPDATE with data from self-join and subquery
    t3:   Query B starts subquery
    t4:   Query B starts self-join (note: data from t1!)
    [...]
    tN:   Query A finishes UPDATE
    tN+1: Query B finishes UPDATE, missing any new_data from Query A

My assumption is that t1 and t4 (the self-joins) use SELECT but not SELECT FOR UPDATE. If they did, I think the race condition would go away.

Did I analyze that right?

Thanks!

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
https://www.faraday.io
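For concreteness, the locking variant being described would look something like this sketch (same shape as above, placeholders kept): the FOR UPDATE inside the subquery takes row locks, so a concurrent statement touching the same rows blocks until the first transaction commits and then re-reads the updated values.

```sql
UPDATE tbl
SET c = c || new_data.c
FROM (
  SELECT id, c FROM tbl WHERE [...] FOR UPDATE
) AS new_data
WHERE tbl.id = new_data.id;
```

At READ COMMITTED this closes the lost-update window described in the timeline; at REPEATABLE READ the second transaction would instead fail with a serialization error and need a retry.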
Re: [GENERAL] CREATE AGGREGATE on jsonb concat
Seamus Abshere <sea...@abshere.net> writes:
> That aggregates into an array. Our `jsonb_collect` aggregates into an object.

Postgres 9.6 has (per https://www.postgresql.org/docs/9.6/static/functions-aggregate.html):

* jsonb_agg(expression)
* jsonb_object_agg(name, value)

In retrospect, I think what I am proposing is:

* jsonb_object_agg(expression)

Is that sane?

Best,
Seamus
[GENERAL] CREATE AGGREGATE on jsonb concat
hi,

We do this in our database:

    CREATE AGGREGATE jsonb_collect(jsonb) (
      SFUNC = 'jsonb_concat',
      STYPE = jsonb,
      INITCOND = '{}'
    );

Is there some other built-in aggregate I'm missing that would do the same thing? It just feels like such an obvious feature.

Thanks for your advice,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
https://www.faraday.io
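For reference, usage of the aggregate defined above looks like this (later keys win on collision, per jsonb_concat semantics):

```sql
SELECT jsonb_collect(j)
FROM (VALUES ('{"a":1}'::jsonb), ('{"b":2}'::jsonb)) v(j);
-- {"a": 1, "b": 2}
```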
[GENERAL] Desired behavior for || (jsonb_concat)
hi,

# select '{"a":1}'::jsonb || null;
 ?column?
----------
 null
(1 row)

Is there a theoretical reason that this has to return null as opposed to just {"a":1}?

Thanks,
Seamus
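For what it's worth, the usual workaround is to coalesce the nullable side to an empty object before concatenating:

```sql
-- NULL-safe concatenation: treat a missing object as '{}'
SELECT '{"a":1}'::jsonb || coalesce(NULL::jsonb, '{}'::jsonb);
-- → {"a": 1}
```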
Re: [GENERAL] CREATE AGGREGATE on jsonb concat
> Seamus Abshere <sea...@abshere.net> writes:
> > We do this in our database:
> >
> > CREATE AGGREGATE jsonb_collect(jsonb) (
> >   SFUNC = 'jsonb_concat',
> >   STYPE = jsonb,
> >   INITCOND = '{}'
> > );
> >
> > Is there some other built-in aggregate I'm missing that would do the
> > same thing? It just feels like such an obvious feature.

On Thu, Jul 6, 2017, at 04:53 PM, Tom Lane wrote:
> Doesn't jsonb_agg() do exactly that?

hi Tom,

That aggregates into an array. Our `jsonb_collect` aggregates into an object.

Best,
Seamus
[GENERAL] Index-only scan on GIN index for COUNT() queries
hi,

We have a GIN index on jsonb_col. We always get Bitmap Index Scan + Bitmap Heap Scan when we do things like:

SELECT COUNT(*) FROM mytable WHERE jsonb_col ? 'key1'

Theoretically, could support be added for index-only scans on GIN indexes when only a COUNT() is requested?

Thanks,
Seamus

PS. Here is the real EXPLAIN ANALYZE: https://gist.github.com/seamusabshere/b9d72132361fa598f7a431fa1bcb120f
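In the meantime, one workaround worth considering (a sketch on my part, not something I've benchmarked here; index name hypothetical) is a partial btree index whose predicate matches the WHERE clause, so the count can be answered from a much smaller index:

```sql
-- A tiny partial index containing only the matching rows; a COUNT(*) over
-- the same predicate can then be satisfied with an index-only scan of it
-- (given a reasonably up-to-date visibility map).
CREATE INDEX mytable_has_key1_idx ON mytable ((1)) WHERE jsonb_col ? 'key1';

SELECT count(*) FROM mytable WHERE jsonb_col ? 'key1';
```

The tradeoff is one extra index per key you want to count this way, which only pays off for a known, fixed set of keys.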
Re: [GENERAL] Non-overlapping updates blocking each other
> On Sat, Oct 14, 2017 at 10:30 AM, Seamus Abshere <sea...@abshere.net>
> > UPDATE [...] WHERE id BETWEEN 'ff00----' AND
> > 'ff0f----'
> > and
> > UPDATE [...] WHERE id BETWEEN 'f8c0----' AND
> > 'f8ff----'
> > Yet one blocks the other one. How is this possible?

On Sat, Oct 14, 2017, at 12:32 PM, Melvin Davidson wrote:
> More than likely, the optimizer has determined that a table scan is best,
> in which case it will use a table lock.
> You can also execute the following query and check the wait_event_type to
> verify.

hi Melvin,

Very interesting! The result:

wait_event      | page
wait_event_type | Lock

So I guess this means that the ids don't overlap, but the rows are sometimes found in the same page, and the whole page gets locked? Any narrative (pretending I don't know anything) would be very helpful.

Thanks!
Seamus

PS. I do a SELECT 1 FROM x WHERE [ID_RANGE] FOR UPDATE right before the update, but that's to prevent a race condition. The id ranges still don't overlap.
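For reference, the diagnostic amounts to checking pg_stat_activity (these columns exist as of 9.6; this is roughly what I ran, not necessarily Melvin's exact query):

```sql
-- Show what each waiting backend is blocked on
SELECT pid, wait_event_type, wait_event, state, left(query, 60) AS query
FROM   pg_stat_activity
WHERE  wait_event_type IS NOT NULL;
```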
Re: [GENERAL] Non-overlapping updates blocking each other
On 2017-10-14 16:32:33 Tom Lane wrote:
> More likely explanations for the OP's problem involve foreign key
> constraints that cause two different row updates to need to lock
> the same referenced row, or maybe he's using some index type that
> has greater locking demands than a btree, or he's using serializable

hi Tom,

I hesitate to share my query and indexes because it makes this question seem more esoteric than I think it really is... but here we go.

* Version 9.6.3.
* I don't have any foreign key constraints.
* I don't use serializable.
* My update query is very careful to stay in an id range. [1]
* I do have some exotic indexes [2]: gist, gin, postgis, fillfactor...

My current theory is that, since the table is not clustered by id, rows with very distant ids get stored in the same page, and the whole page is locked during an update. Or something.

[1] Update SQL: https://gist.github.com/seamusabshere/d04dad259e383c13f5559241d2fcad70
[2] Indexes: https://gist.github.com/seamusabshere/acba364b97e1dd221a589b1aaf22bddb

Thanks,
Seamus
Re: [GENERAL] Non-overlapping updates blocking each other
> Seamus Abshere <sea...@abshere.net> writes:
> > * I do have some exotic indexes [2]: gist, gin, postgis, fillfactor...

On Mon, Oct 16, 2017, at 10:55 AM, Tom Lane wrote:
> I'd bet on the last one, especially since you found that the problem
> was a page-level lock. Did you look to see which relation the page
> lock was in?

The specific relation varies, but it appears to always be a compound GIN index on (jsonb, text).

Can I decrease the fillfactor or something to make these "collisions" less frequent?
Re: [GENERAL] Non-overlapping updates blocking each other
> On Mon, Oct 16, 2017, at 10:55 AM, Tom Lane wrote:
> > I'd bet on the last one, especially since you found that the problem
> > was a page-level lock. Did you look to see which relation the page
> > lock was in?

On Mon, Oct 16, 2017, at 12:34 PM, Seamus Abshere wrote:
> The specific relation varies, but it appears to always be a compound GIN
> index on (jsonb, text)

This is definitely GIN fastupdate. I turned off fastupdate and the blocks go away. I have a feeling, however, that my UPDATEs will actually get slower (testing now).

I'm most interested in the fastest UPDATEs possible, even if reads suffer or similar UPDATEs take wildly different amounts of time. Should I crank maintenance_work_mem and gin_pending_list_limit way up, and autovacuum thresholds way down?
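Concretely, the knobs in question look like this (index and table names hypothetical; the tuning value is illustrative, not a recommendation):

```sql
-- Disable the pending-list mechanism, trading raw insert speed for
-- predictable latency (no more contention on the pending list's pages):
ALTER INDEX my_gin_idx SET (fastupdate = off);

-- Or keep fastupdate but give the pending list more room (in kB) before
-- it is flushed into the main index structure:
ALTER INDEX my_gin_idx SET (gin_pending_list_limit = 16384);

-- Flush whatever is already pending:
VACUUM mytable;
```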
Re: [GENERAL] "Shared strings"-style table
> On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote:
> > Theoretically / blue sky, could there be a table or column type that
> > transparently handles "shared strings" like this, reducing size on disk
> > at the cost of lookup overhead for all queries?
> > (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> > only for large objects?)

On Fri, Oct 13, 2017, at 01:29 PM, Melvin Davidson wrote:
> What was described is exactly what relations and Foreign Keys are for.

hi Melvin,

I appreciate the reminder. Our issue is that we have 300+ columns and frequently include them in SELECT or WHERE clauses... so traditional normalization would involve hundreds of joins. That's why I ask about a new table or column type that handles basic translation and de-duping transparently, keeping the coded values in-table.
[GENERAL] "Shared strings"-style table
hey,

In the spreadsheet world, there is this concept of "shared strings," a simple way of compressing spreadsheets when the data is duplicated in many cells.

In my database, I have a table with >200 million rows and >300 columns (all the households in the United States). For clarity of development and debugging, I have not made any effort to normalize its contents, so millions of rows have, for example, "SINGLE FAMILY RESIDENCE / TOWNHOUSE" (yes, that whole string!) instead of some code representing it.

Theoretically / blue sky, could there be a table or column type that transparently handles "shared strings" like this, reducing size on disk at the cost of lookup overhead for all queries?

(I guess maybe it's like TOAST, but content-hashed and de-duped and not only for large objects?)

Thanks,
Seamus
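The closest existing approximation I know of is an enum, which stores a fixed-width 4-byte value per row while the strings live once in pg_enum. A sketch for one low-cardinality column (type, table, and column names all hypothetical):

```sql
-- Each row then stores a 4-byte enum value instead of the full string.
CREATE TYPE dwelling_type AS ENUM (
  'SINGLE FAMILY RESIDENCE / TOWNHOUSE',
  'CONDOMINIUM',
  'MOBILE HOME'
);

ALTER TABLE households
  ALTER COLUMN dwelling TYPE dwelling_type
  USING dwelling::dwelling_type;
```

The catch, of course, is doing this by hand for 300+ columns, and enum values can be added but not easily removed, which is why something transparent would be nice.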
Re: [GENERAL] "Shared strings"-style table
> On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote
> > Theoretically / blue sky, could there be a table or column type that
> > transparently handles "shared strings" like this, reducing size on disk
> > at the cost of lookup overhead for all queries?
> > (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> > only for large objects?)

On Fri, Oct 13, 2017, at 01:12 PM, David G. Johnston wrote:
> Row-independence is baked into PostgreSQL pretty deeply...

Could you say more about that? What about the comparison to TOAST, which stores values off-table?

Thanks!
Re: [GENERAL] "Shared strings"-style table
On Fri, Oct 13, 2017, at 03:16 PM, David G. Johnston wrote:
> implement a "system-managed-enum" type with many of the same properties
[...]
> TOAST does involve compression but the input to
> the compression algorithm is a single cell (row and column) in a table.
> As noted above I consider the TOAST table and main table to be a single
> logical table.

This sounds like _exactly_ what I want, except with a content hash instead of an id. It seems to me that all of the machinery that lets you look things up by TOASTed columns and then return TOASTed values as if they resided in the same physical table is what is needed.
[GENERAL] Non-overlapping updates blocking each other
hi,

I've got two UPDATEs on non-overlapping uuid (primary key) ranges. For example:

UPDATE [...] WHERE id BETWEEN 'ff00----' AND 'ff0f----'

and

UPDATE [...] WHERE id BETWEEN 'f8c0----' AND 'f8ff----'

Yet one blocks the other. How is this possible?

Thanks,
Seamus
Re: [GENERAL] Function to return per-column counts?
> > > Does anybody have a function lying around (preferably pl/pgsql) that
> > > takes a table name and returns coverage counts?
> >
> > What is "coverage count"?

Ah, I should have explained better. I meant how much of a column is null. Basically you have to:

0. count how many total records are in the table
1. discover the column names in the table
2. for each column name, count how many nulls and subtract from the total count

If nobody has one written, I'll write one and blog it.

Thanks!
Seamus

PS. In a similar vein, we published http://blog.faraday.io/how-to-do-histograms-in-postgresql/ which gives pl/pgsql so you can do:

SELECT * FROM histogram($table_name_or_subquery, $column_name)
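The steps above can be sketched as a pl/pgsql function (untested; output columns follow the example in my original mail):

```sql
-- For each column, run one count query. count(col) counts non-null values,
-- so null_count = count(*) - count(col) and coverage = count(col) / count(*).
CREATE OR REPLACE FUNCTION column_counts(tbl regclass)
RETURNS TABLE (column_name text, all_count bigint, present_count bigint,
               null_count bigint, coverage numeric)
LANGUAGE plpgsql AS $$
DECLARE
  col record;
BEGIN
  FOR col IN
    SELECT attname
    FROM   pg_attribute
    WHERE  attrelid = tbl AND attnum > 0 AND NOT attisdropped
  LOOP
    RETURN QUERY EXECUTE format(
      'SELECT %L::text, count(*), count(%I), count(*) - count(%I),
              round(count(%I)::numeric / greatest(count(*), 1), 2)
       FROM %s',
      col.attname, col.attname, col.attname, col.attname, tbl);
  END LOOP;
END;
$$;

-- usage: SELECT * FROM column_counts('cats'::regclass);
```

Note it runs one full scan per column, so on a wide table it will be slow; a single SELECT listing all count(col) expressions at once would scan only once, at the cost of generating the column list dynamically.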
[GENERAL] Function to return per-column counts?
hey,

Does anybody have a function lying around (preferably pl/pgsql) that takes a table name and returns coverage counts? e.g.

#> select * from column_counts('cats'::regclass);
 column_name | all_count | present_count | null_count | coverage
-------------+-----------+---------------+------------+----------
 name        |       300 |           200 |        100 |     0.66

Thanks!
Seamus
[GENERAL] Surprising locking behavior with CTE, FOR SHARE, and UPDATE
I have a query that splits up work (and manually does locking) according to an id range:

WITH new_data AS (
  SELECT [...] FROM data
  WHERE id BETWEEN 1 AND 2 -- here's my "id range"
), old_data AS (
  SELECT [...] FROM data
  WHERE id IN (SELECT id FROM new_data)
  FOR UPDATE -- a manual lock to prevent race conditions
)
UPDATE data SET [...] FROM new_data, old_data WHERE [...]

But I see that queries from non-overlapping id ranges are blocking each other. For example, "BETWEEN 1 AND 2" is blocking "BETWEEN 5 AND 6". This is Postgres 9.6.3.

Would it help to "redundantly" add the id ranges everywhere (e.g., in the WHERE clauses of old_data AND the final UPDATE)?

Thanks!
[GENERAL] Determine size of table before it's committed?
hi,

I've had an `INSERT INTO x SELECT FROM [...]` query running for more than 2 days. Is there a way to see how big x has gotten? Even a very rough estimate (off by a gigabyte) would be fine.

Best,
Seamus
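One approach that should work (my understanding, worth verifying: the size functions report physical file size, and the heap file is extended as pages are written, before the transaction commits):

```sql
-- From a second session: the on-disk size grows as the insert writes pages,
-- even though the new rows are not yet visible to other transactions.
SELECT pg_size_pretty(pg_total_relation_size('x'));
```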
[GENERAL] How do you decide what aggregates to add?
hi,

Who decides if a seemingly-useful aggregate gets added to Postgres? I would like to advocate for a couple, but I worry that I'm misunderstanding some community process that has decided _not_ to add aggregates or something.

1. I just discovered first()/last() as defined in the wiki [1], where it's noted that conversion from Access or Oracle is much easier with them.

2. We use our "homemade" jsonb_object_agg(jsonb) constantly, which is modeled on the built-in json_object_agg(name, value) and jsonb_agg(expression). [2]

Since building these into Postgres (though not fast C versions) is a matter of a dozen lines of SQL, why haven't they been added already? Seems like a great thing to brag about in release notes, etc.

Thanks for your thoughts,
Seamus

[1] https://wiki.postgresql.org/wiki/First/last_(aggregate)
[2] http://blog.faraday.io/how-to-aggregate-jsonb-in-postgres/
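For reference, the wiki's first() is roughly the following (a sketch from memory of [1], not a verbatim copy):

```sql
-- Transition function: always keep the existing state, i.e. the first value.
-- STRICT means NULL inputs are skipped until a non-null state is established,
-- so the aggregate returns the first non-null value it sees.
CREATE OR REPLACE FUNCTION first_agg(anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT AS
$$ SELECT $1 $$;

CREATE AGGREGATE first (anyelement) (
  SFUNC = first_agg,
  STYPE = anyelement
);
```

last() is the mirror image: a transition function returning $2 instead of $1.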