[GENERAL] Optimizing CartoDB's JenksBins function

2014-03-18 Thread Seamus Abshere

hi,

Our friends from CartoDB [1] provide a beautiful Jenks Natural Breaks 
function for Postgres [2]. It is quite computationally intensive.


Even if you don't know what Jenks is, do you see any optimizations?

Best, thanks,
Seamus

PS. I was hoping for something magical like Tom Lane's VALUES() fix for 
DataDog [3] or Heap Analytics' array fix [4], although I realize both are 
fundamentally different :) This is probably a more subtle loop 
optimization problem, if anything.
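(To make [4]'s point concrete: the antipattern it warns about is per-element array access inside a plpgsql loop, which can go quadratic because each arr[i] subscript may rescan the array. A hedged sketch of the loop-vs-set-based contrast, with invented names -- not the CartoDB function itself:)

```sql
-- Antipattern: plpgsql loop indexing into the array element by element
CREATE OR REPLACE FUNCTION sum_slow(arr numeric[]) RETURNS numeric AS $$
DECLARE
  s numeric := 0;
  i int;
BEGIN
  FOR i IN 1 .. coalesce(array_upper(arr, 1), 0) LOOP
    s := s + arr[i];  -- each subscript access can rescan the array
  END LOOP;
  RETURN s;
END;
$$ LANGUAGE plpgsql;

-- Set-based equivalent: unnest() walks the array once
SELECT sum(x) FROM unnest(ARRAY[1, 2, 3]::numeric[]) AS t(x);
```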



[1] http://cartodb.com
[2] 
https://github.com/CartoDB/cartodb/blob/master/lib/sql/scripts-available/CDB_JenksBins.sql
[3] 
https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
[4] 
http://blog.heapanalytics.com/dont-iterate-over-a-postgres-array-with-a-loop/



--
Seamus Abshere, SCEA
https://github.com/seamusabshere


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Why is pg_restore trying to create tables in pg_catalog?

2014-04-04 Thread Seamus Abshere

hi,

Why is pg_restore trying to put stuff into the pg_catalog schema of all 
places?


It's ignoring the schema specified in the pg_dump itself (`myschema`) 
and even my search_path (`public`).


$ psql stuff_development --command "show search_path"
 search_path 
-------------
 public
(1 row)
$ pg_restore --list stuff.pg_dump
;
; Archive created at Fri Apr  4 00:55:50 2014
; dbname: stuff_development
; TOC Entries: 14
; Compression: -1
; Dump Version: 1.12-0
; Format: CUSTOM
; Integer: 4 bytes
; Offset: 8 bytes
; Dumped from database version: 9.3.4
; Dumped by pg_dump version: 9.3.4
;
;
; Selected TOC Entries:
;
205; 1259 95675 TABLE myschema stuff_one myuser
[...]
3312; 0 95675 TABLE DATA myschema stuff_one myuser
[...]
$ pg_restore --verbose --no-owner --no-privileges --dbname stuff_development stuff.pg_dump

pg_restore: connecting to database for restore
pg_restore: creating TABLE stuff_one
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 205; 1259 95675 TABLE 
stuff_one myuser
pg_restore: [archiver (db)] could not execute query: ERROR:  permission 
denied to create pg_catalog.stuff_one

DETAIL:  System catalog modifications are currently disallowed.
Command was: CREATE TABLE stuff_one (
the_geom public.geometry
);

Thank you!
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere




Re: [GENERAL] Why is pg_restore trying to create tables in pg_catalog?

2014-04-04 Thread Seamus Abshere

On 4/4/14, 12:58 PM, Tom Lane wrote:

Seamus Abshere sea...@abshere.net writes:

Why is pg_restore trying to put stuff into the pg_catalog schema of all places?


Hm ... does myschema actually exist in the target database? [...] if myschema 
doesn't exist, the creation target devolves to pg_catalog.  This is not 
terribly elegant but I'm not sure if there's consensus to change it.


Tom,

You're right, myschema didn't exist (I thought I had created it 
separately, etc.)


Perhaps it would be good to warn the user (at least in --verbose) if 
it's auto-devolving to pg_catalog?
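(For the archives, the fix on my end was simply to create the missing schema before restoring; a sketch using the names from my example:)

```sql
-- pg_restore won't create myschema unless the dump contains a
-- CREATE SCHEMA entry; without it the restore target devolves
-- to pg_catalog, as Tom explains above:
CREATE SCHEMA myschema;
-- then re-run:
--   pg_restore --no-owner --no-privileges --dbname stuff_development stuff.pg_dump
```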


Thanks again,
Seamus

PS. Otherwise, if you google the error message, you get a whole bunch of 
stackoverflow posts recommending you make your user a superuser so you 
can write to pg_catalog, which probably isn't what the person wanted in 
the first place.


--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




Re: [GENERAL] Force specific index disuse

2014-05-20 Thread Seamus Abshere

On 5/20/14, 1:38 PM, Steve Crawford wrote:

Is there a way to force a specific index to be removed from
consideration in planning a single query?


hi Steve,

What is the query? Or at least a sanitized but complete version?

Thanks,
Seamus

PS. I've had luck hinting with OFFSET 0 but it might not help in your 
use case. 
http://seamusabshere.github.io/2013/03/29/hinting-postgres-and-mysql-with-offset-and-limit/
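(The gist of the trick, sketched with invented names: OFFSET 0 keeps the planner from flattening the subquery into the outer query, so the inner part is planned on its own:)

```sql
-- Without OFFSET 0 the planner may flatten the subquery and pick the
-- index you are trying to avoid; OFFSET 0 acts as an optimization fence.
SELECT *
FROM (
  SELECT * FROM orders WHERE created_at > now() - interval '1 day'
  OFFSET 0
) recent
WHERE recent.status = 'shipped';
```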



Specifically, on a 60-million-row table I have an index that is a
candidate for removal. I have identified the sets of nightly queries
that use the index but before dropping it I would like to run EXPLAIN
and do timing tests on the queries to see the impact of not having that
index available and rewrite the query to efficiently use other indexes
if necessary.



--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




[GENERAL] Why is unique constraint needed for upsert?

2014-07-23 Thread Seamus Abshere

hi all,

Upsert is usually defined [1] in reference to violating a unique key:


Insert, if unique constraint violation then update; or update, if not found 
then insert.


Is this theoretically preferable to just looking for a row that matches 
certain criteria, updating it if found or inserting otherwise?


For an example of the latter approach, see MongoDB's flavor of upsert 
[2]. You just give it a query and an update. It seems to me this is 
better because it functions correctly whether or not an index is in place.


Best, thanks,
Seamus


[1] 
http://postgresql.uservoice.com/forums/21853-general/suggestions/245202-merge-upsert-replace

[2] http://docs.mongodb.org/manual/reference/method/db.collection.update/

--
Seamus Abshere, SCEA
https://github.com/seamusabshere




Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)

2014-07-23 Thread Seamus Abshere

On 7/23/14 3:40 PM, Tom Lane wrote:

John R Pierce pie...@hogranch.com writes:

On 7/23/2014 10:21 AM, Seamus Abshere wrote:

Upsert is usually defined [1] in reference to violating a unique key:
Is this theoretically preferable to just looking for a row that
matches certain criteria, updating it if found or inserting otherwise?



what happens when two connections do this more or less concurrently, in
transactions?


For the OP's benefit --- the subtext John left unstated is that the
unique-key mechanism has already solved the problem of preventing
concurrent updates from creating duplicate keys.  If we build a version of
UPSERT that doesn't rely on a unique index then it'll need some entirely
new mechanism to prevent concurrent key insertion.  (And if you don't care
about concurrent cases, you don't really need UPSERT ...)


hi all,

What if we treat atomicity as optional? You could have extremely 
readable syntax like:



-- no guarantees, no index required
UPSERT age = 5 INTO dogs WHERE name = 'Jerry';



-- optionally tell us how you want to deal with collision
UPSERT age = 3 INTO dogs WHERE name = 'Jerry' ON DUPLICATE KEY KEEP NEWEST;
UPSERT age = 5 INTO dogs WHERE name = 'Jerry' ON DUPLICATE KEY KEEP NEWEST;



-- only **require** (by throwing an error) a unique index or a locked table for 
queries like
UPSERT age = age+1 INTO dogs WHERE name = 'Jerry';


Obviously this flies in the face of what most people say the 
fundamental Upsert property is [1]



At READ COMMITTED isolation level, you should always get an atomic insert or 
update [1]


I just think there are a lot of non-concurrent bulk loading and 
processing workflows that could benefit from the performance advantages 
of upsert (one trip to database).


Best, thanks,
Seamus

[1] http://www.pgcon.org/2014/schedule/events/661.en.html

--
Seamus Abshere, SCEA
https://github.com/seamusabshere




Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)

2014-07-23 Thread Seamus Abshere

On 7/23/14 6:03 PM, John R Pierce wrote:

On 7/23/2014 1:45 PM, Seamus Abshere wrote:

What if we treat atomicity as optional?


atomicity is not and never will be optional in PostgreSQL.


I'm wondering what a minimal definition of upsert could be - possibly 
separating concurrency handling out as a (rigorously defined) option for 
those who need it.



-- no guarantees, no index required
UPSERT age = 5 INTO dogs WHERE name = 'Jerry';


and if there's several rows with name='Jerry', you'd want to update them
ALL ?  if name isn't indexed, this will, as Tom suggests, require a FULL
table scan, and it still will have issues with concurrency


Ah, I was just saying, in terms of correctness, it seems to me that 
upsert shouldn't NEED an index to work, just like you don't need an index 
on name when you say WHERE name = 'Jerry' in SELECTs or INSERTs or 
UPDATEs.


Appreciate the defense of data integrity in any case!!

Best,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere




Re: [GENERAL] Re: Why is unique constraint needed for upsert? (treat atomicity as optional)

2014-07-23 Thread Seamus Abshere

On 7/23/14 6:50 PM, David G Johnston wrote:

seamusabshere wrote

On 7/23/14 6:03 PM, John R Pierce wrote:

On 7/23/2014 1:45 PM, Seamus Abshere wrote:

What if we treat atomicity as optional?

atomicity is not and never will be optional in PostgreSQL.

I'm wondering what a minimal definition of upsert could be - possibly
separating concurrency handling out as a (rigorously defined) option for
those who need it.

Given we do not have native UPSERT I'm not sure where your question is
coming from anyway.  I'm not sure what the plans are for UPSERT at the
moment but nothing prevents us from performing the UPSERT comparison on a
non-unique set of columns.


hi David,

My argument lives and dies on the assumption that UPSERT would be useful 
even if it was (when given with no options) just a macro for



  UPDATE db SET b = data WHERE a = key;
  IF NOT found THEN
INSERT INTO db(a,b) VALUES (key, data);
  END IF;


Adding things like unique indexes would work like you would expect with 
individual INSERTs or UPDATEs - your statement might raise an exception. 
Then, going beyond, UPSERT would optionally support atomic a = a+1 
stuff, special actions to take on duplicate keys, all the concurrency 
stuff that people have been talking about.


IMO having such a complicated definition of what an upsert must be 
makes it a unicorn when it could just be a sibling to INSERT and UPDATE.


Best,
Seamus


--
Seamus Abshere, SCEA
https://github.com/seamusabshere




Re: [GENERAL] Re: Why is unique constraint needed for upsert? (treat atomicity as optional)

2014-07-23 Thread Seamus Abshere

On 7/23/14 7:45 PM, John R Pierce wrote:

On 7/23/2014 3:29 PM, Seamus Abshere wrote:

My argument lives and dies on the assumption that UPSERT would be
useful even if it was (when given with no options) just a macro for


  UPDATE db SET b = data WHERE a = key;
  IF NOT found THEN
INSERT INTO db(a,b) VALUES (key, data);
  END IF;


but that won't work if two connections execute similar 'upserts'
concurrently. Both updates will see the record isn't there, then one
or the other insert will fail, depending on which transaction commits
first.


John,

Right - if you had a situation where that might happen, you would use a 
slightly more advanced version of the UPSERT command (and/or add a 
unique index).


UPSERT, in this conception and in its most basic form, would be subject 
to many of the same (and more) concurrency concerns as basic INSERTs and 
UPDATEs.


Providing options may be preferable to magically handling everything.

Best,
Seamus


--
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




[GENERAL] Unexpected syntax error when using JSON - in 9.3.5

2014-09-23 Thread Seamus Abshere

hi all,

This part looks correct and expected:


$ psql foobar
psql (9.3.5)
Type "help" for help.

foobar=# select coalesce('{}'::json->'a', 1);
ERROR:  COALESCE types json and integer cannot be matched
LINE 1: select coalesce('{}'::json->'a', 1);
               ^


but check it out when I use a string instead of an integer:


foobar=# select coalesce('{}'::json->'a', 'b');
ERROR:  invalid input syntax for type json
LINE 1: select coalesce('{}'::json->'a', 'b');
               ^
DETAIL:  Token "b" is invalid.
CONTEXT:  JSON data, line 1: b


That seems like the wrong error - shouldn't it be the equiv of [...] 
json and string cannot be matched?
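(Workarounds, for the record: make both COALESCE branches resolve to the same type, either by extracting text with ->> or by making the fallback explicitly json:)

```sql
-- ->> returns text, so a text fallback type-checks:
select coalesce('{}'::json->>'a', 'b');
-- or stay in json on both sides:
select coalesce('{}'::json->'a', '"b"'::json);
```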


Thanks,
Seamus


--
Seamus Abshere, SCEA
https://github.com/seamusabshere




[GENERAL] Will there be a JSON operator like ->> but returning numeric?

2014-09-23 Thread Seamus Abshere

hi,

I've got use cases like


array_remove(array_agg((a->>'b')::float), NULL)


It would be nice to replace (a->>'b')::float with something like


a->>^'b'


that directly returned a numeric... is that in the cards?

Thanks,
Seamus


--
Seamus Abshere, SCEA
https://github.com/seamusabshere




[GENERAL] Comparing results of regexp_matches

2014-11-16 Thread Seamus Abshere
hi,

I want to check if two similar-looking addresses have the same numbered
street, like 20th versus 21st.

2033 21st Ave S
2033 20th Ave S (they're different)

I get an error:

# select regexp_matches('2033 21st Ave S', '\m(\d+(?:st|th))\M') =
regexp_matches('2033 20th Ave S', '\m(\d+(?:st|th))\M');
ERROR:  functions and operators can take at most one set argument

I've tried `()[1] == ()[1]`, etc. but the only thing that works is
making it into 2 subqueries:

# select (select * from regexp_matches('2033 21st Ave S',
'\m(\d+(?:st|th))\M')) = (select * from regexp_matches('2033 20th
Ave S', '\m(\d+(?:st|th))\M'));
 ?column?
----------
 f
(1 row)

Is there a more elegant way to compare the results of
`regexp_matches()`?
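(Follow-up for the archives: PostgreSQL 10 later added regexp_match() -- singular -- which returns a plain text[] instead of a set, so the comparison works without subqueries:)

```sql
-- regexp_match (PostgreSQL 10+) is the non-set variant of regexp_matches:
select regexp_match('2033 21st Ave S', '\m(\d+(?:st|th))\M')
     = regexp_match('2033 20th Ave S', '\m(\d+(?:st|th))\M');
-- returns f
```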

Thanks,
Seamus

-- 
Seamus Abshere, SCEA
https://github.com/seamusabshere




Re: [GENERAL] Definitive answer: can functions use indexes?

2016-01-06 Thread Seamus Abshere
On Wed, Jan 6, 2016, at 08:41 PM, Tom Lane wrote:
> Seamus Abshere <sea...@abshere.net> writes:
> > -> Can a function like `LEFT()` use an index?
> Since the question makes little sense as stated, I'm going to assume
> you mean "can a query like SELECT ... WHERE left(foo, 3) = 'bar'
> use an index on column foo?"
> 
> The answer to that is no, there is no such optimization built into
> Postgres.  (In principle there could be, but I've not heard enough
> requests to make me think we'd ever pursue it.)
> 
> The equivalent optimization that *is* built in, and has been for
> a long time, is for LIKE: "SELECT ... WHERE foo LIKE 'bar%'" can
> use an index on foo, at least if it's an index sorted according to
> C collation.

hi Tom,

I should have been more general. In layman's/narrative terms, what's the
deal with functions vs. operators for postgres indexes?

For example, `exist(hstore,text)` vs. `hstore ? text` ?

Thank you!
Seamus

PS. If I have understood correctly over the years, in order for the
query planner to use indexes, it needs to see operators - functions are
opaque to it. I'm looking for a bit more narrative on this to round out
my understanding.
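(One related case I do understand: a function call can become indexable via an expression index, because the planner matches the indexed expression textually. Sketch with an invented table:)

```sql
-- With an expression index, WHERE left(foo, 3) = 'bar' is indexable:
CREATE INDEX idx_t_foo_left3 ON t (left(foo, 3));
SELECT * FROM t WHERE left(foo, 3) = 'bar';  -- can use idx_t_foo_left3
```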

-- 
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




[GENERAL] Definitive answer: can functions use indexes?

2016-01-06 Thread Seamus Abshere
hi,

I've been using Postgres for years ( :heart: ) and I'm still in doubt
about this. Would somebody provide an authoritative, definitive,
narrative answer?

-> Can a function like `LEFT()` use an index?

(Or do I have to find an "equivalent" operator in order to leverage
indexes?)

Thanks!
Seamus

-- 
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




[GENERAL] Read-only tables to avoid row visibility check

2016-02-22 Thread Seamus Abshere
hi,

https://wiki.postgresql.org/wiki/ReadOnlyTables mentions the possibility
of `ALTER TABLE table SET READ ONLY`.

Would this mean that row visibility checks could be skipped and thus
index-only scans much more common?

Thanks,
Seamus

-- 
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




[GENERAL] uuid gin operator class - why not include for everybody?

2016-02-24 Thread Seamus Abshere
hi,

We want to use gin indexes on arrays of UUIDs. It is as simple as:

> CREATE OPERATOR CLASS _uuid_ops
> DEFAULT FOR TYPE _uuid
> USING gin AS
> OPERATOR 1 &&(anyarray, anyarray),
> OPERATOR 2 @>(anyarray, anyarray),
> OPERATOR 3 <@(anyarray, anyarray),
> OPERATOR 4 =(anyarray, anyarray),
> FUNCTION 1 uuid_cmp(uuid, uuid),
> FUNCTION 2 ginarrayextract(anyarray, internal, internal),
> FUNCTION 3 ginqueryarrayextract(anyarray, internal, smallint, internal, 
> internal, internal, internal),
> FUNCTION 4 ginarrayconsistent(internal, smallint, anyarray, integer, 
> internal, internal, internal, internal),
> STORAGE uuid;

Is there a reason not to put this into postgres itself? This already
exists for text[].

Thanks,
Seamus


-- 
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




Re: [GENERAL] Why does query planner choose slower BitmapAnd ?

2016-02-22 Thread Seamus Abshere
On Mon, Feb 22, 2016, at 01:48 PM, Tom Lane wrote:
> Given how remarkably quick the single-index scan is, I also wonder if that 
> index is fully cached while we had to read some of the other index from 
> kernel or SSD.

This makes sense, except that the speed of the query is the same if I
run it many times in a row. Shouldn't the partially-cached index get
loaded fully by the second query?

On Mon, Feb 22, 2016, at 01:20 PM, Stephen Frost wrote:
> The first question is probably- are we properly accounting for the cost of 
> scanning the index vs the cost of scanning one index and then applying the 
> filter?

I can affect the query planner's cost estimates with random_page_cost
(only), but I still can't get it to avoid the BitmapAnd - probably
because I am affecting other cost estimates in the same proportion.

No change with original settings OR cpu_tuple_cost=10 OR
seq_page_cost=10 OR (cpu_tuple_cost=0.05, seq_page_cost=1,
random_page_cost=1)

> ->  BitmapAnd  (cost=105894.80..105894.80 rows=21002 width=0) (actual 
> time=4859.397..4859.397 rows=0 loops=1)
>   ->  Bitmap Index Scan on idx_houses_city  (cost=0.00..1666.90 rows=164044 
> width=0) (actual time=16.098..16.098 rows=155690 loops=1)
> Index Cond: (city = 'New York'::text)
>   ->  Bitmap Index Scan on idx_houses_phoneable  (cost=0.00..104224.60 
> rows=10271471 width=0) (actual time=4771.520..4771.520 rows=10647041 loops=1)
> Index Cond: (phoneable = true)

However with random_page_cost=10 (hint: cost estimates go up by 4x or
so)

> ->  BitmapAnd  (cost=354510.80..354510.80 rows=21002 width=0) (actual 
> time=4603.575..4603.575 rows=0 loops=1)
>   ->  Bitmap Index Scan on idx_houses_city  (cost=0.00..5590.90 rows=164044 
> width=0) (actual time=16.529..16.529 rows=155690 loops=1)
> Index Cond: (city = 'New York'::text)
>   ->  Bitmap Index Scan on idx_houses_phoneable  (cost=0.00..348916.60 
> rows=10271471 width=0) (actual time=4530.424..4530.424 rows=10647041 loops=1)
> Index Cond: (phoneable = true)

I think this is why we originally set random_page_cost so "low"... it
was our way of "forcing" more index usage (we have a big, wide table).

Is there any other way to differentiate the 2 index scans? FWIW, 10% of
houses are phoneable, 0.2% are in the city. (Maybe I'm just supposed to
drop the index like Tom said.)

Best, thanks,
Seamus

-- 
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




Re: [GENERAL] Read-only tables to avoid row visibility check

2016-02-22 Thread Seamus Abshere
On Mon, Feb 22, 2016, at 06:48 PM, David G. Johnston wrote:
> it would probably be more constructive to actually communicate the thoughts 
> that provoked the question.

My company has a largish table - 250+ columns, 1 row for every household
in the US. It's read-only. We've gotten advice to convert to a column
store (cstore_fdw, etc.) but we would love to just stay with
tried-and-true postgres tables. Plus, many of our queries are against
dozens of columns at once.

Being able to tell postgres that our table is "Read Only" has imaginary
mystical properties for me, first and foremost being able to count
against indexes without ever hitting the disk.

> ​If the system is working properly then a READ ONLY table in fact should be 
> able to use Index Only Scans without the hack of a DBA telling it that said 
> table is READ ONLY.​

So this should happen already?
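(The practical approximation for a load-once table, as I understand it: VACUUM after loading so the visibility map marks pages all-visible, which is exactly what lets index-only scans skip the heap. Table name invented:)

```sql
-- After a bulk load into a table that will not change:
VACUUM (ANALYZE) my_readonly_table;

-- Check how much of the table is all-visible to index-only scans:
SELECT relallvisible::float / greatest(relpages, 1) AS frac_all_visible
FROM pg_class
WHERE relname = 'my_readonly_table';
```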




Re: [GENERAL] Read-only tables to avoid row visibility check

2016-02-22 Thread Seamus Abshere
On Mon, Feb 22, 2016, at 06:30 PM, Melvin Davidson wrote:
> However, at this time, there is no such option as SET READ ONLY in any 
> version of PostgreSQL.

I know.

I am wondering if hypothetical read-only tables would make index-only
scans more possible by avoiding the need for row visibility checks.


-- 
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




Re: [GENERAL] Why does query planner choose slower BitmapAnd ?

2016-02-22 Thread Seamus Abshere
On Mon, Feb 22, 2016, at 02:14 PM, Tom Lane wrote:
> IOW, almost certainly we *don't* realize that the query will involve scanning 
> through gigabytes of index pages.  But btree indexes are much simpler and 
> easier to make that estimate for...

Isn't this the crux of my issue, at least?


-- 
Seamus Abshere, SCEA
+598 99 54 99 54
https://github.com/seamusabshere




Re: [GENERAL] Why does query planner choose slower BitmapAnd ?

2016-02-22 Thread Seamus Abshere
On Mon, Feb 22, 2016, at 02:30 PM, Jeff Janes wrote:
> It charges 0.1 CPU_operator_cost, while reality seemed to be more like 6 
> CPU_operator_cost.

fdy=> select name, setting, boot_val from pg_settings where name ~
'cpu';
         name         | setting | boot_val 
----------------------+---------+----------
 cpu_index_tuple_cost | 0.005   | 0.005
 cpu_operator_cost    | 0.0025  | 0.0025
 cpu_tuple_cost       | 0.01    | 0.01
(3 rows)

Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It
"fixed" my problem by preventing the BitmapAnd.

Is this dangerous?




Re: [GENERAL] Why does query planner choose slower BitmapAnd ?

2016-02-22 Thread Seamus Abshere
On Mon, Feb 22, 2016, at 02:49 PM, Tom Lane wrote:
> Seamus Abshere <sea...@abshere.net> writes:
> > Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It 
> > "fixed" my problem by preventing the BitmapAnd.
> > Is this dangerous?
> 
> Use a gentle tap, man, don't swing the hammer with quite so much abandon.
> I'd have tried doubling the setting to start with.  Raising it 20X might
> cause other queries to change behavior undesirably.

Doubling it was enough :)




Re: [GENERAL] Why does query planner choose slower BitmapAnd ?

2016-02-22 Thread Seamus Abshere
On Mon, Feb 22, 2016, at 02:53 PM, Seamus Abshere wrote:
> On Mon, Feb 22, 2016, at 02:49 PM, Tom Lane wrote:
> > Seamus Abshere <sea...@abshere.net> writes:
> > > Inspired, I changed cpu_index_tuple_cost to 0.1 (default: 0.005). It 
> > > "fixed" my problem by preventing the BitmapAnd.
> > > Is this dangerous?
> > 
> > Use a gentle tap, man, don't swing the hammer with quite so much abandon.
> > I'd have tried doubling the setting to start with.  Raising it 20X might
> > cause other queries to change behavior undesirably.
> 
> Doubling it was enough :)

         name         | setting | boot_val
----------------------+---------+----------
 cpu_index_tuple_cost | 0.09    | 0.005    <- 18x boot val, 9x cpu_tuple_cost
 cpu_operator_cost    | 0.0025  | 0.0025
 cpu_tuple_cost       | 0.01    | 0.01
 
In the end I'm back to the big hammer.

I found that larger cities (e.g., more results from the city index)
required a larger cpu_index_tuple_cost to prevent the BitmapAnd.

Now cpu_index_tuple_cost is set to 0.09, which is 18x its boot_val and
9x cpu_tuple_cost... which seems strange.

Logically, should I be changing cpu_operator_cost instead?
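(Aside: while experimenting, the damage can at least be scoped to one transaction with SET LOCAL rather than changing the setting cluster-wide:)

```sql
BEGIN;
SET LOCAL cpu_index_tuple_cost = 0.01;  -- visible only in this transaction
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT id) FROM new_york_houses
WHERE roof_area >= 0 AND roof_area < 278.7091 AND phoneable = true;
COMMIT;  -- the setting reverts here
```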




[GENERAL] Why does query planner choose slower BitmapAnd ?

2016-02-22 Thread Seamus Abshere
hi,

I don't understand why the query planner is choosing a BitmapAnd when an
Index Scan followed by a filter is obviously better.

(Note that "new_york_houses" is a view of table "houses" with one
condition on city - and there is an index idx_houses_city. That is the
Index Scan that I think it should use exclusively.)

Here's a fast query that uses the Index Scan followed by a filter:

> => explain analyze SELECT COUNT(DISTINCT id) FROM new_york_houses WHERE 
> roof_area >= 0 AND roof_area < 278.7091;
>   
>  QUERY PLAN
> ---
>  Aggregate  (cost=167298.10..167298.11 rows=1 width=16) (actual 
> time=141.137..141.137 rows=1 loops=1)
>->  Index Scan using idx_houses_city on households  (cost=0.57..167178.87 
> rows=47694 width=16) (actual time=0.045..105.953 rows=53971 loops=1)
>  Index Cond: (city = 'New York'::text)
>  Filter: ((roof_area >= 0) AND ((roof_area)::numeric < 278.7091))
>  Rows Removed by Filter: 101719
>  Planning time: 0.688 ms
>  Execution time: 141.250 ms
> (7 rows)

When I add another condition, "phoneable", however, it chooses an
obviously wrong plan:

> => explain analyze SELECT COUNT(DISTINCT id) FROM new_york_houses WHERE 
> roof_area >= 0 AND roof_area < 278.7091 AND phoneable = true;
>   
>QUERY PLAN
> ---
>  Aggregate  (cost=128163.05..128163.06 rows=1 width=16) (actual 
> time=4564.677..4564.677 rows=1 loops=1)
>->  Bitmap Heap Scan on households  (cost=105894.80..128147.78 rows=6106 
> width=16) (actual time=4456.690..4561.416 rows=5183 loops=1)
>  Recheck Cond: (city = 'New York'::text)
>  Filter: (phoneable AND (roof_area >= 0) AND ((roof_area)::numeric < 
> 278.7091))
>  Rows Removed by Filter: 40103
>  Heap Blocks: exact=14563
>  ->  BitmapAnd  (cost=105894.80..105894.80 rows=21002 width=0) 
> (actual time=4453.510..4453.510 rows=0 loops=1)
>->  Bitmap Index Scan on idx_houses_city  (cost=0.00..1666.90 
> rows=164044 width=0) (actual time=16.505..16.505 rows=155690 loops=1)
>  Index Cond: (city = 'New York'::text)
>->  Bitmap Index Scan on idx_houses_phoneable  
> (cost=0.00..104224.60 rows=10271471 width=0) (actual time=4384.461..4384.461 
> rows=10647041 loops=1)
>  Index Cond: (phoneable = true)
>  Planning time: 0.709 ms
>  Execution time: 4565.067 ms
> (13 rows)

On Postgres 9.4.4 with 244gb memory and SSDs

maintenance_work_mem 100
work_mem 50
random_page_cost 1
seq_page_cost 2

The "houses" table has been analyzed recently and has statistics set to
the max.

Thanks,
Seamus


-- 
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://www.linkedin.com/in/seamusabshere




[GENERAL] plpgsql update row from record variable

2016-04-02 Thread Seamus Abshere
hi,

I want to write a function that updates arbitrary columns and here's my
pseudocode:

CREATE OR REPLACE FUNCTION update_pet(pet_id int, raw_data jsonb) RETURNS
VOID AS $$
DECLARE
  data record;
BEGIN
  SELECT jsonb_populate_record(null::pets, raw_data) INTO data;
  UPDATE pets [... from data ...] WHERE pets.id = pet_id; -- probably impossible
END;
$$ LANGUAGE plpgsql;

e.g.

SELECT update_pet(1, '{ "name" : "Jerry", "age": 9 }'::jsonb);

Back in 2004, Tom showed how to insert from a plpgsql record:

http://www.postgresql.org/message-id/17840.1087670...@sss.pgh.pa.us

Is there any way to "update *" from a record?

Thanks!
Seamus

PS. Whether I **should** do this is another matter, I just want to know
if it's possible.
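(For the record, one approach that does work is dynamic SQL built from the jsonb itself -- a sketch only, with no care taken over type coercion:)

```sql
CREATE OR REPLACE FUNCTION update_pet(pet_id int, raw_data jsonb)
RETURNS void AS $$
DECLARE
  pairs text;
BEGIN
  -- Build "name = 'Jerry', age = '9'" from the jsonb keys and values;
  -- %I quotes identifiers, %L quotes literals.
  SELECT string_agg(format('%I = %L', key, value), ', ')
    INTO pairs
    FROM jsonb_each_text(raw_data);
  IF pairs IS NOT NULL THEN
    EXECUTE format('UPDATE pets SET %s WHERE id = %L', pairs, pet_id);
  END IF;
END;
$$ LANGUAGE plpgsql;

-- SELECT update_pet(1, '{"name": "Jerry", "age": 9}'::jsonb);
```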

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
http://linkedin.com/in/seamusabshere




[GENERAL] Does a row lock taken out in a CTE stay in place?

2017-07-11 Thread Seamus Abshere
Given an update that uses CTEs like this:

WITH
lock_rows AS (
  SELECT 1 FROM tbl WHERE [...] FOR UPDATE
)
UPDATE [...]

Will the rows in `tbl` remain locked until the UPDATE is finished?

Also, does it matter whether `lock_rows` is referenced? (IIUC an
unreferenced CTE wouldn't be executed if this were a SELECT, but since
it's an UPDATE, it will be run anyway.)
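(My current understanding, sketched with invented names: FOR UPDATE row locks are held until the transaction commits or rolls back, not just until the statement ends, and referencing the CTE guarantees it runs:)

```sql
BEGIN;
WITH lock_rows AS (
  SELECT id FROM tbl WHERE flagged FOR UPDATE
)
UPDATE tbl SET val = val + 1
FROM lock_rows
WHERE tbl.id = lock_rows.id;
-- the FOR UPDATE locks are still held here ...
COMMIT;  -- ... and are released only now
```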

Thanks!
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
https://www.faraday.io




[GENERAL] Concurrency and UPDATE [...] FROM

2017-07-10 Thread Seamus Abshere
The purpose is to concat new data onto existing values of c:

UPDATE tbl
SET c = c || new_data.c
FROM ( [...] ) AS new_data
WHERE
  tbl.id = new_data.id

It appears to have a race condition:

t0: Query A starts subquery
t1: Query A starts self-join
t2. Query A starts UPDATE with data from self-join and subquery
t3. Query B starts subquery
t4. Query B starts self-join (note: data from t1!)
[...]
tN. Query A finishes UPDATE
tN+1. Query B finishes UPDATE, missing any new_data from Query A

My assumption is that t1 and t4 (the self-joins) use SELECT but not
SELECT FOR UPDATE. If they did, I think the race condition would go
away.
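
i.e., the variant I'd expect to be safe (sketch, placeholders as above):

```sql
UPDATE tbl
SET c = tbl.c || new_data.c
FROM (
  SELECT id, [...] FROM tbl WHERE [...]
  FOR UPDATE  -- lock the source rows so a concurrent run waits, then re-reads
) AS new_data
WHERE tbl.id = new_data.id;
```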

Did I analyze that right?

Thanks!

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
https://www.faraday.io




Re: [GENERAL] CREATE AGGREGATE on jsonb concat

2017-07-10 Thread Seamus Abshere
Seamus Abshere <sea...@abshere.net> writes:

> That aggregates into an array. Our `jsonb_collect` aggregates into an object.

Postgres 9.6 has (per
https://www.postgresql.org/docs/9.6/static/functions-aggregate.html):

* jsonb_agg(expression)
* jsonb_object_agg(name, value)

In retrospect, I think what I am proposing is:

* jsonb_object_agg(expression)

Is that sane?
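
To illustrate the difference on toy data:

```sql
-- jsonb_agg collects rows into an array:
SELECT jsonb_agg(x)
FROM (VALUES ('{"a":1}'::jsonb), ('{"b":2}'::jsonb)) t(x);
-- [{"a": 1}, {"b": 2}]

-- the proposed one-argument jsonb_object_agg would instead merge
-- the inputs into a single object:
-- {"a": 1, "b": 2}
```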

Best,
Seamus




[GENERAL] CREATE AGGREGATE on jsonb concat

2017-07-06 Thread Seamus Abshere
hi,

We do this in our database:

CREATE AGGREGATE jsonb_collect(jsonb) (
SFUNC = 'jsonb_concat',
STYPE = jsonb,
INITCOND = '{}'
);

Is there some other built-in aggregate I'm missing that would do the
same thing? It just feels like such an obvious feature.
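
For context, typical usage on toy data:

```sql
SELECT jsonb_collect(x)
FROM (VALUES ('{"a":1}'::jsonb), ('{"b":2}'::jsonb)) t(x);
-- {"a": 1, "b": 2}
```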

Thanks for your advice,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
https://www.faraday.io




[GENERAL] Desired behavior for || (jsonb_concat)

2017-07-06 Thread Seamus Abshere
hi,

# select '{"a":1}'::jsonb || null;
 ?column?
--
 null
(1 row)

Is there a theoretical reason that this has to return null as opposed to
just {"a":1}?
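
(The workaround we use, for reference:)

```sql
-- coalesce the possibly-NULL side to the empty object:
SELECT '{"a":1}'::jsonb || coalesce(null::jsonb, '{}'::jsonb);
-- {"a": 1}
```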

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere
https://www.faraday.io




Re: [GENERAL] CREATE AGGREGATE on jsonb concat

2017-07-06 Thread Seamus Abshere
> Seamus Abshere <sea...@abshere.net> writes:
> > We do this in our database:
> 
> > CREATE AGGREGATE jsonb_collect(jsonb) (
> > SFUNC = 'jsonb_concat',
> > STYPE = jsonb,
> > INITCOND = '{}'
> > );
> 
> > Is there some other built-in aggregate I'm missing that would do the
> > same thing? It just feels like such an obvious feature.

> On Thu, Jul 6, 2017, at 04:53 PM, Tom Lane wrote:
> Doesn't jsonb_agg() do exactly that?

hi Tom,

That aggregates into an array. Our `jsonb_collect` aggregates into an
object.

Best,
Seamus




[GENERAL] Index-only scan on GIN index for COUNT() queries

2017-06-15 Thread Seamus Abshere
hi,

We have a GIN index on jsonb_col. We always get Bitmap Index Scan +
Bitmap Heap Scan when we do things like

SELECT COUNT(*) FROM mytable WHERE jsonb_col ? 'key1'

Theoretically, could support be added for Index-only scans on GIN
indexes when only a COUNT() is requested?

Thanks,
Seamus

PS. Here is the real query analyze:
https://gist.github.com/seamusabshere/b9d72132361fa598f7a431fa1bcb120f

--
Seamus Abshere, SCEA
http://faraday.io
https://github.com/seamusabshere
http://linkedin.com/in/seamusabshere




Re: [GENERAL] Non-overlapping updates blocking each other

2017-10-15 Thread Seamus Abshere
> On Sat, Oct 14, 2017 at 10:30 AM, Seamus Abshere <sea...@abshere.net>
> > UPDATE [...] WHERE id BETWEEN 'ff00----' AND
> > 'ff0f----'
> > and
> > UPDATE [...] WHERE id BETWEEN 'f8c0----' AND
> > 'f8ff----'
> > Yet one blocks the other one. How is this possible?

On Sat, Oct 14, 2017, at 12:32 PM, Melvin Davidson wrote:
> More than likely, the optimizer has determined that a table scan is best,
> in which case it will use a table lock.
> You can also execute the following query and check the wait_event_type to
> verify.

hi Melvin,

Very interesting! The result:

wait_event  | page
wait_event_type | Lock

So I guess this means that the ids don't overlap, but they are sometimes
found in the same page, and the whole page gets locked?
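
For anyone searching the archives, the check was along these lines (sketch; the wait_event columns exist in pg_stat_activity as of 9.6):

```sql
-- show backends currently waiting, and what they're waiting on
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type IS NOT NULL;
```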

Any narrative (pretending I don't know anything) would be very helpful.

Thanks!
Seamus

PS. I do a SELECT 1 FROM x WHERE [ID_RANGE] FOR UPDATE right before the
update, but that's to prevent a race condition. The id ranges still
don't overlap.




Re: [GENERAL] Non-overlapping updates blocking each other

2017-10-16 Thread Seamus Abshere
On 2017-10-14 16:32:33 Tom Lane wrote:
> More likely explanations for the OP's problem involve foreign key
> constraints that cause two different row updates to need to lock
> the same referenced row, or maybe he's using some index type that
> has greater locking demands than a btree, or he's using serializable

hi Tom,

I hesitate to share my query and indexes because it makes this question
seem more esoteric than I think it really is... but here we go.

* Version 9.6.3.
* I don't have any foreign key constraints.
* I don't use serializable.
* My update query is very careful to stay in an id range. [1]
* I do have some exotic indexes [2]. gist, gin, postgis, fillfactor...

My current theory is that, since the table is not clustered by id, rows
with very distant ids get stored in the same page, and the whole page is
locked during an update. Or something.

[1] Update SQL:
https://gist.github.com/seamusabshere/d04dad259e383c13f5559241d2fcad70

[2] Indexes:
https://gist.github.com/seamusabshere/acba364b97e1dd221a589b1aaf22bddb

Thanks,
Seamus




Re: [GENERAL] Non-overlapping updates blocking each other

2017-10-16 Thread Seamus Abshere
> Seamus Abshere <sea...@abshere.net> writes:
> > * I do have some exotic indexes [2]. gist, gin, postgis, fillfactor...

> On Mon, Oct 16, 2017, at 10:55 AM, Tom Lane wrote:
> I'd bet on the last one, especially since you found that the problem
> was a page-level lock.  Did you look to see which relation the page
> lock was in?

The specific relation varies, but it appears to always be a compound GIN
index on (jsonb, text).

Can I, say, decrease the fillfactor or something to make these
"collisions" less frequent?




Re: [GENERAL] Non-overlapping updates blocking each other

2017-10-16 Thread Seamus Abshere
> > On Mon, Oct 16, 2017, at 10:55 AM, Tom Lane wrote:
> > I'd bet on the last one, especially since you found that the problem
> > was a page-level lock.  Did you look to see which relation the page
> > lock was in?

On Mon, Oct 16, 2017, at 12:34 PM, Seamus Abshere wrote:
> The specific relation varies, but it appears to always be compound GIN
> index on (jsonb, text)

This is definitely GIN fastupdate. I turned off fastupdate and the
blocks go away. I have a feeling, however, that my UPDATEs will actually
get slower (testing now).
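
For the record, turning it off per-index looked like this (index and table names hypothetical):

```sql
-- disable the GIN pending-list ("fastupdate") optimization for one index
ALTER INDEX my_gin_idx SET (fastupdate = off);
-- flush whatever is already queued in the pending list
VACUUM mytable;
```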

I'm most interested in the fastest UPDATEs possible, even if reads
suffer or similar UPDATEs take wildly different amounts of time.

Should I crank maintenance_work_mem and gin_pending_list_limit way up,
and autovacuum thresholds way down?




Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Seamus Abshere
> > On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote:
> >> Theoretically / blue sky, could there be a table or column type that
> >> transparently handles "shared strings" like this, reducing size on disk
> >> at the cost of lookup overhead for all queries?
> >> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> >> only for large objects?)

On Fri, Oct 13, 2017, at 01:29 PM, Melvin Davidson wrote:
> What was described is exactly what relations and Foreign Keys are for.

hi Melvin, appreciate the reminder. Our issue is that we have 300+
columns and frequently include them in the SELECT or WHERE clauses... so
traditional normalization would involve hundreds of joins.

That's why I ask about a new table or column type that handles basic
translation and de-duping transparently, keeping the coded values
in-table.




[GENERAL] "Shared strings"-style table

2017-10-13 Thread Seamus Abshere
hey,

In the spreadsheet world, there is this concept of "shared strings," a
simple way of compressing spreadsheets when the data is duplicated in
many cells.

In my database, I have a table with >200 million rows and >300 columns
(all the households in the United States). For clarity of development
and debugging, I have not made any effort to normalize its contents, so
millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
TOWNHOUSE" (yes, that whole string!) instead of some code representing
it.

Theoretically / blue sky, could there be a table or column type that
transparently handles "shared strings" like this, reducing size on disk
at the cost of lookup overhead for all queries?

(I guess maybe it's like TOAST, but content-hashed and de-duped and not
only for large objects?)
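
The closest manual approximation today is probably an enum, which stores a small fixed-width value per row instead of the repeated string (names hypothetical; the USING clause rewrites the table):

```sql
CREATE TYPE dwelling_type AS ENUM (
  'SINGLE FAMILY RESIDENCE / TOWNHOUSE',
  'CONDOMINIUM'
);

ALTER TABLE households
  ALTER COLUMN dwelling TYPE dwelling_type
  USING dwelling::dwelling_type;
```

But that requires declaring every distinct value up front, per column, which is what I'd want the hypothetical table/column type to manage transparently.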

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere




Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Seamus Abshere
> On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote
> > Theoretically / blue sky, could there be a table or column type that
> > transparently handles "shared strings" like this, reducing size on disk
> > at the cost of lookup overhead for all queries?
> > (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> > only for large objects?)

On Fri, Oct 13, 2017, at 01:12 PM, David G. Johnston wrote:
> Row-independence is baked into PostgreSQL pretty deeply...

Could you say more about that?

What about the comparison to TOAST, which stores values off-table?

Thanks!




Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Seamus Abshere
On Fri, Oct 13, 2017, at 03:16 PM, David G. Johnston wrote:
> implement a "system-managed-enum" type with many of the same properties
[...]
> TOAST does involved compression but the input to
> the compression algorithm is a single cell (row and column) in a table.​
> As noted above I consider the TOAST table and main table to be a single
> logical table.

See, this sounds like _exactly_ what I want. Except with a content hash
instead of an id.

Seems to me like all of the machinery that allows you to look things up
by TOASTed columns and subsequently return TOASTed values as if they
resided in the same physical table is what is needed.




[GENERAL] Non-overlapping updates blocking each other

2017-10-14 Thread Seamus Abshere
hi,

I've got 2 updates on non-overlapping uuid (primary key) ranges. For
example:

UPDATE [...] WHERE id BETWEEN 'ff00----' AND
'ff0f----'
and
UPDATE [...] WHERE id BETWEEN 'f8c0----' AND
'f8ff----'

Yet one blocks the other one. How is this possible?

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere




Re: [GENERAL] Function to return per-column counts?

2017-09-28 Thread Seamus Abshere
> > > Does anybody have a function lying around (preferably pl/pgsql) that
> > > takes a table name and returns coverage counts?
> >
> > What is "coverage count"?

Ah, I should have explained better. I meant how much of a column is
null.

Basically you have to

0. count how many total records in a table
1. discover the column names in a table
2. for each column name, count how many nulls and subtract from total
count
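
A minimal sketch of those steps (untested; the catalog query skips system and dropped columns):

```sql
CREATE OR REPLACE FUNCTION column_counts(tbl regclass)
RETURNS TABLE (column_name text, all_count bigint, null_count bigint)
LANGUAGE plpgsql AS $$
DECLARE
  col text;
BEGIN
  -- step 1: discover the column names
  FOR col IN
    SELECT attname FROM pg_attribute
    WHERE attrelid = tbl AND attnum > 0 AND NOT attisdropped
  LOOP
    -- steps 0 and 2: total count, and nulls = total minus non-nulls
    RETURN QUERY EXECUTE format(
      'SELECT %L::text, count(*), count(*) - count(%I) FROM %s',
      col, col, tbl);
  END LOOP;
END;
$$;
```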

If nobody has one written, I'll write one and blog it.

Thanks!
Seamus

PS. In a similar vein, we published
http://blog.faraday.io/how-to-do-histograms-in-postgresql/ which gives
plpgsql so you can do:

SELECT * FROM histogram($table_name_or_subquery, $column_name)

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere




[GENERAL] Function to return per-column counts?

2017-09-28 Thread Seamus Abshere
hey,

Does anybody have a function lying around (preferably pl/pgsql) that
takes a table name and returns coverage counts?

e.g.

#> select * from column_counts('cats'::regclass);
 column_name | all_count | present_count | null_count | coverage
-------------+-----------+---------------+------------+----------
 name        |       300 |           200 |        100 |     0.66

Thanks!
Seamus

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere




[GENERAL] Surprising locking behavior with CTE, FOR SHARE, and UPDATE

2017-09-04 Thread Seamus Abshere
I have a query that splits up work (and manually does locking) according
to an id range:

WITH
new_data AS (
  SELECT [...] FROM data
  WHERE id BETWEEN 1 AND 2 -- here's my "id range"
),
old_data AS (
  SELECT [...] FROM data
  WHERE id IN (SELECT id FROM new_data)
  FOR UPDATE -- a manual lock to prevent race conditions
)
UPDATE data
SET [...]
FROM
new_data,
old_data
WHERE [...]

But I see that queries are blocking each other from non-overlapping id
ranges. For example, "BETWEEN 1 AND 2" is blocking "BETWEEN 5 AND 6".
This is Postgres 9.6.3.

Would it help to "redundantly" add the id ranges everywhere? (e.g, in
the where clauses of old_data AND the final update)?

Thanks!

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere




[GENERAL] Determine size of table before it's committed?

2017-10-11 Thread Seamus Abshere
hi,

I've had an `INSERT INTO x SELECT FROM [...]` query running for more
than 2 days.

Is there a way to see how big x has gotten? Even a very rough estimate
(off by a gigabyte) would be fine.
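
For what it's worth, rows written by an in-flight INSERT do occupy pages on disk before commit, so I assume the physical relation size gives the rough figure:

```sql
-- physical size (heap + indexes + TOAST), including uncommitted tuples
SELECT pg_size_pretty(pg_total_relation_size('x'));
```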

Best,
Seamus


--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere




[GENERAL] How do you decide what aggregates to add?

2017-10-20 Thread Seamus Abshere
hi,

Who decides if a seemingly-useful aggregate is added to Postgres? I
would like to advocate for a couple, but I worry that I'm
misunderstanding some community process that has decided _not_ to add
aggregates or something.

1. I just discovered first()/last() as defined in the wiki [1], where
it's noted that conversion from Access or Oracle is much easier with
them.
2. We use our "homemade" jsonb_object_agg(jsonb) constantly, which is
modeled off of (built-in) json_object_agg(name, value) and (built-in)
jsonb_agg(expression). [2]

Since building these into Postgres (albeit not as fast C versions) is a
matter of a dozen lines of SQL, why haven't they been added already?
Seems like a great thing to brag about in release notes, etc.
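
For reference, the wiki's first() [1] is roughly:

```sql
-- first(): returns the first value seen, in whatever order rows arrive
CREATE OR REPLACE FUNCTION first_agg(anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS
  'SELECT $1';

CREATE AGGREGATE first (anyelement) (
  SFUNC = first_agg,
  STYPE = anyelement
);
```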

Thanks for your thoughts,
Seamus

[1] https://wiki.postgresql.org/wiki/First/last_(aggregate)
[2] http://blog.faraday.io/how-to-aggregate-jsonb-in-postgres/

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere

