Re: [PERFORM] Indexes for hashes

2016-06-16 Thread Claudio Freire
On Fri, Jun 17, 2016 at 1:09 AM, julyanto SUTANDANG wrote:
> This approach is faster using crc32(data) than hashtext, since crc32 is
> hardware-accelerated on Intel (and perhaps other) CPUs.
> crc32 is not the same as a hash, but it is much faster than the
> alternatives...
>
> Regards,

Sure, but I've had uniformity issues with crc32.
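A quick way to check a hash function's distribution uniformity is to bucket
its output and look for outliers. A sketch, assuming a sample table t with a
text column fieldname:

SELECT hashtext(fieldname) & 1023 AS bucket, count(*) AS n
FROM t
GROUP BY 1
ORDER BY n DESC
LIMIT 10;
-- With a uniform function the largest buckets sit near count(*)/1024;
-- a skewed function shows clear outliers.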




Re: [PERFORM] Indexes for hashes

2016-06-16 Thread julyanto SUTANDANG
This approach is faster using crc32(data) than hashtext, since crc32 is
hardware-accelerated on Intel (and perhaps other) CPUs.
crc32 is not the same as a hash, but it is much faster than the
alternatives...
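For illustration, the crc32 variant of that index would look like the
following. This is only a sketch: core PostgreSQL has no built-in crc32()
SQL function at this point, so it assumes an immutable crc32(text) function
supplied by an extension or custom C code:

CREATE INDEX ON t USING btree (crc32(fieldname));

SELECT * FROM t
WHERE crc32(fieldname) = crc32('blabla')  -- indexed comparison
  AND fieldname = 'blabla';               -- recheck guards against collisions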

Regards,


On Fri, Jun 17, 2016 at 10:51 AM, Claudio Freire wrote:

> On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras  wrote:
> >
> > I have an application which stores a large number of hex-encoded hash
> > strings (nearly 100 GB of them), which means:
> >
> > The number of distinct characters (alphabet) is limited to 16
> > Each string is of the same length, 64 characters
> > The strings are essentially random
> >
> > Creating a B-Tree index on this results in the index size being larger
> > than the table itself, and there are disk space constraints.
> >
> > I've found the SP-GiST radix tree index, and thought it could be a good
> > match for the data because of the above constraints. An attempt to
> > create it (as in CREATE INDEX ON t USING spgist(field_name)) apparently
> > takes more than 12 hours (while a similar B-tree index takes a few hours
> > at most), so I've interrupted it because "it probably is not going to
> > finish in a reasonable time". Some slides I found on the spgist index
> > suggest that both build time and size are not really suitable for this
> > purpose.
>
>
> I've found that hash btree indexes tend to perform well in these
> situations:
>
> CREATE INDEX ON t USING btree (hashtext(fieldname));
>
> However, you'll have to modify your queries to query for both the
> hashtext and the text itself:
>
> SELECT * FROM t WHERE hashtext(fieldname) = hashtext('blabla') AND
> fieldname = 'blabla';
>
>



-- 


Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source and Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta Pusat
T: +6221 2282 F: +62216315281 M: +628164858028




Re: [PERFORM] 9.6 query slower than 9.5.3

2016-06-16 Thread Adam Brusselback
It'd be really hard to get a test dataset together, I think, so I suppose
I'll learn how to compile Postgres. Will let you know how that goes.


Re: [PERFORM] 9.6 query slower than 9.5.3

2016-06-16 Thread Tom Lane
Adam Brusselback  writes:
> Alright with that off I get:
> ...
> Way better.

OK, that confirms the suspicion that beta1's FK-join-estimation logic
is the culprit here.  We had already decided that that logic is broken,
and there's a rewrite in progress:
https://www.postgresql.org/message-id/15245.1466031608%40sss.pgh.pa.us

I wonder though whether the rewrite will fix your example.  Could you
either make some test data available, or try HEAD + aforesaid patch 
to see if it behaves sanely on your data?

regards, tom lane




Re: [PERFORM] Indexes for hashes

2016-06-16 Thread Claudio Freire
On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras  wrote:
>
> I have an application which stores a large number of hex-encoded hash
> strings (nearly 100 GB of them), which means:
>
> The number of distinct characters (alphabet) is limited to 16
> Each string is of the same length, 64 characters
> The strings are essentially random
>
> Creating a B-Tree index on this results in the index size being larger than
> the table itself, and there are disk space constraints.
>
> I've found the SP-GiST radix tree index, and thought it could be a good
> match for the data because of the above constraints. An attempt to create it
> (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
> than 12 hours (while a similar B-tree index takes a few hours at most), so
> I've interrupted it because "it probably is not going to finish in a
> reasonable time". Some slides I found on the spgist index suggest that both
> build time and size are not really suitable for this purpose.


I've found that hash btree indexes tend to perform well in these situations:

CREATE INDEX ON t USING btree (hashtext(fieldname));

However, you'll have to modify your queries to query for both the
hashtext and the text itself:

SELECT * FROM t WHERE hashtext(fieldname) = hashtext('blabla') AND
fieldname = 'blabla';
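A self-contained sketch of the pattern (hypothetical table and index names).
The equality recheck on the raw value is what makes hashtext collisions
harmless; note also that hashtext()'s output is not guaranteed stable across
major versions, which is fine for an index but not for persisted values:

CREATE TABLE t (fieldname text NOT NULL);  -- 64-char hex hash strings
CREATE INDEX t_fieldname_hash_idx ON t USING btree (hashtext(fieldname));

-- The planner uses the index only when the query repeats the expression:
SELECT *
FROM t
WHERE hashtext(fieldname) = hashtext('blabla')  -- 4-byte keys keep the index small
  AND fieldname = 'blabla';                     -- recheck eliminates collisions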




Re: [PERFORM] 9.6 query slower than 9.5.3

2016-06-16 Thread Adam Brusselback
Alright with that off I get:

'Nested Loop Anti Join  (cost=25.76..21210.81 rows=16684 width=106) (actual
time=0.688..249.585 rows=26994 loops=1)'
'  ->  Hash Join  (cost=25.34..7716.95 rows=21906 width=106) (actual
time=0.671..124.663 rows=28467 loops=1)'
'Hash Cond: (cp.claim_id = x.claim_id)'
'->  Seq Scan on claim_product cp  (cost=0.00..6714.76 rows=202076
width=106) (actual time=0.016..55.230 rows=202076 loops=1)'
'->  Hash  (cost=14.04..14.04 rows=904 width=16) (actual
time=0.484..0.484 rows=904 loops=1)'
'  Buckets: 1024  Batches: 1  Memory Usage: 51kB'
'  ->  Seq Scan on _claims_to_process x  (cost=0.00..14.04
rows=904 width=16) (actual time=0.013..0.235 rows=904 loops=1)'
'  ->  Index Only Scan using claim_product_reason_code_active_range_idx on
claim_product_reason_code r  (cost=0.42..0.61 rows=1 width=16) (actual
time=0.004..0.004 rows=0 loops=28467)'
'Index Cond: (claim_product_id = cp.claim_product_id)'
'Filter: (claim_reason_type = ANY
('{REJECT,OVERRIDE,RECALC}'::enum.claim_reason_type[]))'
'Rows Removed by Filter: 1'
'Heap Fetches: 27031'
'Planning time: 0.984 ms'
'Execution time: 253.976 ms'

Way better.


Re: [PERFORM] 9.6 query slower than 9.5.3

2016-06-16 Thread Tom Lane
Adam Brusselback  writes:
> Gah, hit send too soon...

Hm, definitely a lot of foreign keys in there.  Do the estimates get
better (or at least closer to 9.5) if you do
"set enable_fkey_estimates = off"?

regards, tom lane




Re: [PERFORM] 9.6 query slower than 9.5.3

2016-06-16 Thread Adam Brusselback
Gah, hit send too soon...

CREATE TEMPORARY TABLE _claims_to_process ( claim_id uuid, starting_state
enum.claim_state );

CREATE TABLE claim_product
(
  claim_product_id uuid NOT NULL DEFAULT gen_random_uuid(),
  claim_id uuid NOT NULL,
  product_id uuid NOT NULL,
  uom_type_id uuid NOT NULL,
  rebate_requested_quantity numeric NOT NULL,
  rebate_requested_rate numeric NOT NULL,
  rebate_allowed_quantity numeric NOT NULL,
  rebate_allowed_rate numeric NOT NULL,
  distributor_company_id uuid,
  location_company_id uuid,
  contract_item_id uuid,
  claimant_contract_name character varying, -- NOT SOURCE OF TRUTH; Client defined. - Yesod
  resolve_date date NOT NULL, -- FIXME: TENTATIVE NAME; Does not mean contract_item_id resolve date. - Yesod
  rebate_calculated_rate numeric NOT NULL,
  CONSTRAINT claim_product_pkey PRIMARY KEY (claim_product_id),
  CONSTRAINT claim_product_claim_id_fkey FOREIGN KEY (claim_id)
  REFERENCES claim (claim_id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT claim_product_contract_item_id_fkey FOREIGN KEY
(contract_item_id)
  REFERENCES contract_item (contract_item_id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT claim_product_distributor_company_id_fkey FOREIGN KEY
(distributor_company_id)
  REFERENCES company (company_id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT claim_product_location_company_id_fkey FOREIGN KEY
(location_company_id)
  REFERENCES company (company_id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT claim_product_product_id_fkey FOREIGN KEY (product_id)
  REFERENCES product (product_id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT claim_product_uom_type_id_fkey FOREIGN KEY (uom_type_id)
  REFERENCES uom_type (uom_type_id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);
ALTER TABLE claim_product
  OWNER TO root;
GRANT ALL ON TABLE claim_product TO root;
COMMENT ON COLUMN claim_product.claimant_contract_name IS 'NOT SOURCE OF
TRUTH; Client defined. - Yesod';
COMMENT ON COLUMN claim_product.resolve_date IS 'FIXME: TENTATIVE NAME;
Does not mean contract_item_id resolve date. - Yesod';


-- Index: idx_claim_product_claim_id

-- DROP INDEX idx_claim_product_claim_id;

CREATE INDEX idx_claim_product_claim_id
  ON claim_product
  USING btree
  (claim_id);

-- Index: idx_claim_product_contract_item_id

-- DROP INDEX idx_claim_product_contract_item_id;

CREATE INDEX idx_claim_product_contract_item_id
  ON claim_product
  USING btree
  (contract_item_id);


-- Trigger: claim_product_iud_trigger on claim_product

-- DROP TRIGGER claim_product_iud_trigger ON claim_product;

CREATE TRIGGER claim_product_iud_trigger
  AFTER INSERT OR UPDATE OR DELETE
  ON claim_product
  FOR EACH ROW
  EXECUTE PROCEDURE gosimple.claim_product_on_iud();

-- Trigger: claim_product_statement_trigger on claim_product

-- DROP TRIGGER claim_product_statement_trigger ON claim_product;

CREATE TRIGGER claim_product_statement_trigger
  AFTER INSERT OR UPDATE OR DELETE
  ON claim_product
  FOR EACH STATEMENT
  EXECUTE PROCEDURE gosimple.claim_product_statement_refresh_trigger();

CREATE TABLE claim_product_reason_code
(
  claim_product_reason_code_id uuid NOT NULL DEFAULT gen_random_uuid(),
  claim_product_id uuid NOT NULL,
  claim_reason_type enum.claim_reason_type NOT NULL,
  claim_reason_code enum.claim_reason_code NOT NULL,
  claim_reason_note character varying,
  active_range tstzrange NOT NULL DEFAULT tstzrange(now(), NULL::timestamp
with time zone),
  CONSTRAINT claim_product_reason_code_pkey PRIMARY KEY
(claim_product_reason_code_id),
  CONSTRAINT claim_product_reason_code_claim_product_id_fkey FOREIGN KEY
(claim_product_id)
  REFERENCES claim_product (claim_product_id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT claim_product_reason_code_active_range_excl EXCLUDE
  USING gist (gosimple.uuid_to_bytea(claim_product_id) WITH =,
gosimple.enum_to_oid('enum'::text, 'claim_reason_type'::text,
claim_reason_type) WITH =, gosimple.enum_to_oid('enum'::text,
'claim_reason_code'::text, claim_reason_code) WITH =, active_range WITH &&),
  CONSTRAINT claim_product_reason_code_excl EXCLUDE
  USING gist (gosimple.uuid_to_bytea(claim_product_id) WITH =, (
CASE
WHEN upper(active_range) IS NULL THEN 'infinity'::text
ELSE NULL::text
END) WITH =, gosimple.enum_to_oid('enum'::text, 'claim_reason_type'::text,
claim_reason_type) WITH <>),
  CONSTRAINT claim_product_reason_code_unique UNIQUE (claim_product_id,
claim_reason_type, claim_reason_code, active_range)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE claim_product_reason_code
  OWNER TO root;
GRANT ALL ON TABLE claim_product_reason_code TO root;

-- Index: claim_product_reason_code_active_range_idx

-- DROP INDEX claim_product_reason_code_active_range_idx;

CREATE INDEX claim_product_reason_code_active_range_idx
  ON 

Re: [PERFORM] 9.6 query slower than 9.5.3

2016-06-16 Thread Adam Brusselback
I analyzed all tables involved after loading, and also while trying to
diagnose this issue.

I have the same statistics target settings on both servers.

Here are the schemas for the tables:

On Thu, Jun 16, 2016 at 10:04 PM, Tom Lane  wrote:

> Adam Brusselback  writes:
> > Hey all, testing out 9.6 beta 1 right now on Debian 8.5.
> > I have a query that is much slower on 9.6 than 9.5.3.
>
> The rowcount estimates in 9.6 seem way off.  Did you ANALYZE the tables
> after loading them into 9.6?  Maybe you forgot some statistics target
> settings?
>
> If it's not that, I wonder whether the misestimates are connected to the
> foreign-key-based estimation feature.  Are there any FKs on the tables
> involved?  May we see the table schemas?
>
> regards, tom lane
>


Re: [PERFORM] 9.6 query slower than 9.5.3

2016-06-16 Thread Tom Lane
Adam Brusselback  writes:
> Hey all, testing out 9.6 beta 1 right now on Debian 8.5.
> I have a query that is much slower on 9.6 than 9.5.3.

The rowcount estimates in 9.6 seem way off.  Did you ANALYZE the tables
after loading them into 9.6?  Maybe you forgot some statistics target
settings?

If it's not that, I wonder whether the misestimates are connected to the
foreign-key-based estimation feature.  Are there any FKs on the tables
involved?  May we see the table schemas?

regards, tom lane




[PERFORM] 9.6 query slower than 9.5.3

2016-06-16 Thread Adam Brusselback
Hey all, testing out 9.6 beta 1 right now on Debian 8.5.

I have a query that is much slower on 9.6 than 9.5.3.

As a side note, when I EXPLAIN ANALYZE the query instead of just executing
it, it takes more than 2x as long to run. I have tried looking for info on
that online but have not found any. Anyone know the reason for that?

The data is very close between the two servers; one is my production system,
so the only difference is the small amount added today since I set up the
9.6 server last night.

The query in question is here:
SELECT cp.claim_id
, cp.claim_product_id
, cp.product_id
, cp.uom_type_id
, cp.rebate_requested_quantity
, cp.rebate_requested_rate
, cp.rebate_allowed_quantity
, cp.rebate_allowed_rate
, cp.distributor_company_id
, cp.resolve_date
FROM claim_product cp
INNER JOIN _claims_to_process x
ON cp.claim_id = x.claim_id
WHERE NOT EXISTS (
SELECT 1
FROM claim_product_reason_code r
WHERE r.claim_product_id = cp.claim_product_id
AND r.claim_reason_type = ANY (ARRAY['REJECT'::enum.claim_reason_type,
'OVERRIDE'::enum.claim_reason_type, 'RECALC'::enum.claim_reason_type])
AND upper_inf(r.active_range)
);

The query plan on 9.6 is here (with parallelism disabled):
'Nested Loop  (cost=17574.63..30834.02 rows=1 width=106) (actual
time=241.934..40332.190 rows=26994 loops=1)'
'  Join Filter: (cp.claim_id = x.claim_id)'
'  Rows Removed by Join Filter: 92335590'
'  ->  Hash Anti Join  (cost=17574.63..30808.68 rows=1 width=106) (actual
time=173.742..586.805 rows=102171 loops=1)'
'Hash Cond: (cp.claim_product_id = r.claim_product_id)'
'->  Seq Scan on claim_product cp  (cost=0.00..6714.76 rows=202076
width=106) (actual time=0.028..183.376 rows=202076 loops=1)'
'->  Hash  (cost=16972.49..16972.49 rows=48171 width=16) (actual
time=173.436..173.436 rows=99905 loops=1)'
'  Buckets: 131072 (originally 65536)  Batches: 1 (originally
1)  Memory Usage: 5708kB'
'  ->  Bitmap Heap Scan on claim_product_reason_code r
 (cost=4398.71..16972.49 rows=48171 width=16) (actual time=25.278..127.540
rows=99905 loops=1)'
'Recheck Cond: ((claim_reason_type = ANY
('{REJECT,OVERRIDE,RECALC}'::enum.claim_reason_type[])) AND
upper_inf(active_range))'
'Heap Blocks: exact=10067'
'->  Bitmap Index Scan on
claim_product_reason_code_active_range_idx  (cost=0.00..4386.67 rows=48171
width=0) (actual time=23.174..23.174 rows=99905 loops=1)'
'  Index Cond: (claim_reason_type = ANY
('{REJECT,OVERRIDE,RECALC}'::enum.claim_reason_type[]))'
'  ->  Seq Scan on _claims_to_process x  (cost=0.00..14.04 rows=904
width=16) (actual time=0.005..0.182 rows=904 loops=102171)'
'Planning time: 1.934 ms'
'Execution time: 40337.858 ms'

The 9.5.3 plan is here:
'Hash Anti Join  (cost=19884.53..39281.57 rows=30681 width=106) (actual
time=848.791..978.036 rows=27354 loops=1)'
'  Hash Cond: (cp.claim_product_id = r.claim_product_id)'
'  ->  Nested Loop  (cost=0.42..17990.36 rows=41140 width=106) (actual
time=0.132..106.333 rows=28775 loops=1)'
'->  Seq Scan on _claims_to_process x  (cost=0.00..27.00 rows=1700
width=16) (actual time=0.037..0.465 rows=923 loops=1)'
'->  Index Scan using idx_claim_product_claim_id on claim_product
cp  (cost=0.42..10.33 rows=24 width=106) (actual time=0.015..0.093 rows=31
loops=923)'
'  Index Cond: (claim_id = x.claim_id)'
'  ->  Hash  (cost=19239.13..19239.13 rows=51599 width=16) (actual
time=848.263..848.263 rows=100024 loops=1)'
'Buckets: 131072 (originally 65536)  Batches: 1 (originally 1)
 Memory Usage: 5713kB'
'->  Bitmap Heap Scan on claim_product_reason_code r
 (cost=6240.64..19239.13 rows=51599 width=16) (actual time=31.505..782.799
rows=100024 loops=1)'
'  Recheck Cond: ((claim_reason_type = ANY
('{REJECT,OVERRIDE,RECALC}'::enum.claim_reason_type[])) AND
upper_inf(active_range))'
'  Heap Blocks: exact=6261'
'  ->  Bitmap Index Scan on
claim_product_reason_code_active_range_idx  (cost=0.00..6227.74 rows=51599
width=0) (actual time=30.231..30.231 rows=100051 loops=1)'
'Index Cond: (claim_reason_type = ANY
('{REJECT,OVERRIDE,RECALC}'::enum.claim_reason_type[]))'
'Planning time: 1.691 ms'
'Execution time: 982.667 ms'


Just for fun I set enable_nestloop=false on 9.6 and this is the plan I get:
'Hash Join  (cost=17599.97..30834.04 rows=1 width=106) (actual
time=108.892..349.885 rows=26994 loops=1)'
'  Hash Cond: (cp.claim_id = x.claim_id)'
'  ->  Hash Anti Join  (cost=17574.63..30808.68 rows=1 width=106) (actual
time=107.464..316.527 rows=102171 loops=1)'
'Hash Cond: (cp.claim_product_id = r.claim_product_id)'
'->  Seq Scan on claim_product cp  (cost=0.00..6714.76 rows=202076
width=106) (actual time=0.011..61.230 rows=202076 loops=1)'
'->  Hash  (cost=16972.49..16972.49 rows=48171 width=16) (actual
time=107.315..107.315 rows=99905 loops=1)'
'  Buckets: 131072 (originally 

Re: [PERFORM] Index not used

2016-06-16 Thread David G. Johnston
On Thu, Jun 16, 2016 at 11:05 AM, Tom Lane  wrote:

> meike.talb...@women-at-work.org writes:
> > When I query this through pgsql, the queries are fast as expected.
> > select * from push_topic where guid =
> > 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'
> > Index Scan using push_topic_idx_topicguid on push_topic
> > (cost=0.42..8.44 rows=1 width=103) (actual time=0.117..0.121 rows=1 loops=1)
> >   Index Cond: ((guid)::bpchar =
> > 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::bpchar)
> >   Buffers: shared hit=3 read=1
> > Total runtime: 0.191 ms
>
> > However when I run the exact query through a different application
> > (CodeSynthesis ORM), the query is very slow (~115 ms logged).
> > I noted this is due to a sequential scan happening on the table instead
> > of an index scan.
>
> It looks like what that app is actually issuing is something different
> from what you tested by hand, to wit
>
> select * from push_topic where guid =
> 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::text
>
> which causes the comparison to be resolved as texteq not bpchareq, ie you
> effectively have
>
> select * from push_topic where guid::text =
> 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::text
>
> and that doesn't match a bpchar index.  If you can't persuade the app to
> label the comparison value as bpchar not text, the easiest fix would be
> to create an additional index on "guid::text".
>

Or, better, persuade the app to label the value "public.push_guid" since
that is the column's type...a type you haven't defined for us. If you get
to add explicit casts this should be easy...but I'm not familiar with the
framework you are using.
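In other words, the app would need to issue something like this (a sketch
reusing the literal from this thread; whether the ORM can emit such a cast
depends on the framework):

select * from push_topic
where guid = 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::public.push_guid;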

David J.


Re: [PERFORM] Index not used

2016-06-16 Thread Tom Lane
meike.talb...@women-at-work.org writes:
> When I query this through pgsql, the queries are fast as expected.
> select * from push_topic where guid = 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'
> Index Scan using push_topic_idx_topicguid on push_topic  (cost=0.42..8.44 
> rows=1 width=103) (actual time=0.117..0.121 rows=1 loops=1)
>   Index Cond: ((guid)::bpchar = 
> 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::bpchar)
>   Buffers: shared hit=3 read=1
> Total runtime: 0.191 ms

> However when I run the exact query through a different application
> (CodeSynthesis ORM), the query is very slow (~115 ms logged).
> I noted this is due to a sequential scan happening on the table instead of
> an index scan.

It looks like what that app is actually issuing is something different
from what you tested by hand, to wit

select * from push_topic where guid = 
'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::text

which causes the comparison to be resolved as texteq not bpchareq, ie you
effectively have

select * from push_topic where guid::text = 
'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::text

and that doesn't match a bpchar index.  If you can't persuade the app to
label the comparison value as bpchar not text, the easiest fix would be
to create an additional index on "guid::text".
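A sketch of that workaround (note the doubled parentheses an expression
index requires; the index name is hypothetical):

CREATE INDEX push_topic_idx_topicguid_text ON push_topic ((guid::text));

-- The ORM's text-typed comparison can then use an index scan:
select * from push_topic
where guid::text = 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5';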

regards, tom lane




Re: [PERFORM] Index not used

2016-06-16 Thread John Gorman
When you run psql, are you running that on the application server or the 
database server? Does the application run on the same server as the database 
and how is the application connecting to the database (JDBC, ODBC, etc)?

In other words, is there a difference in network time between the two?

Also, the queries are not exactly the same. With psql you use "select *"
while the application specifies which columns it wants returned and in what
order. Try running the exact same query on both.
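One way to capture exactly what the application sends is to log all
statements temporarily and replay the logged SQL in psql. A sketch, assuming
superuser access and PostgreSQL 9.4+ for ALTER SYSTEM:

ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement and its duration
SELECT pg_reload_conf();
-- run the application, copy the logged SQL into psql, then revert:
ALTER SYSTEM RESET log_min_duration_statement;
SELECT pg_reload_conf();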

Regards
John
 
-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of 
meike.talb...@women-at-work.org
Sent: Thursday, June 16, 2016 12:59 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Index not used

Hello,
 
I've a basic table with about 100K rows:
 


CREATE TABLE "public"."push_topic" (
 "id" Serial PRIMARY KEY,
 "guid" public.push_guid NOT NULL,
 "authenticatorsending" Varchar(32) NOT NULL,
 "authenticatorsubscription" Varchar(32) NOT NULL,
 "countpushed" Integer NOT NULL,
 "datecreated" timestamp NOT NULL,
 "datelastpush" timestamp
);
CREATE UNIQUE INDEX push_topic_idx_topicguid ON push_topic
  USING btree (guid);


 
When I query this through pgsql, the queries are fast as expected.

This is the query:

select * from push_topic where guid = 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'

And the plan:



Index Scan using push_topic_idx_topicguid on push_topic  (cost=0.42..8.44 
rows=1 width=103) (actual time=0.117..0.121 rows=1 loops=1)
  Index Cond: ((guid)::bpchar = 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::bpchar)
  Buffers: shared hit=3 read=1
Total runtime: 0.191 ms



However when I run the exact query through a different application
(CodeSynthesis ORM), the query is very slow (~115 ms logged).
I noted this is due to a sequential scan happening on the table instead of
an index scan.

This is query plan in the log file:



LOG:  plan:
DETAIL: {PLANNEDSTMT 
   :commandType 1 
   :queryId 0 
   :hasReturning false 
   :hasModifyingCTE false 
   :canSetTag true 
   :transientPlan false 
   :planTree 
  {SEQSCAN 
  :startup_cost 0.00 
  :total_cost 2877.58 
  :plan_rows 429 
  :plan_width 103 
  :targetlist (
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 1 
:vartype 23 
:vartypmod -1 
:varcollid 0 
:varlevelsup 0 
:varnoold 1 
:varoattno 1 
:location 7
}
 :resno 1 
 :resname id 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 1 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 2 
:vartype 16385 
:vartypmod -1 
:varcollid 100 
:varlevelsup 0 
:varnoold 1 
:varoattno 2 
:location 26
}
 :resno 2 
 :resname guid 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 2 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 3 
:vartype 1043 
:vartypmod 36 
:varcollid 100 
:varlevelsup 0 
:varnoold 1 
:varoattno 3 
:location 47
}
 :resno 3 
 :resname authenticatorsending 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 3 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 4 
:vartype 1043 
:vartypmod 36 
:varcollid 100 
:varlevelsup 0 
:varnoold 1 
:varoattno 4 
:location 84
}
 :resno 4 
 :resname authenticatorsubscription 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 4 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
  

Re: [PERFORM] Many-to-many performance problem

2016-06-16 Thread Rowan Seymour
When you create a Postgres RDS instance, it comes with a
"default.postgres9.3" parameter group which contains substitutions based on
the server size. The defaults for the memory-related settings are:

effective_cache_size = {DBInstanceClassMemory/16384}
maintenance_work_mem = GREATEST({DBInstanceClassMemory/63963136*1024},65536)
shared_buffers = {DBInstanceClassMemory/32768}
temp_buffers = 
work_mem = 

According to
http://www.davidmkerr.com/2013/11/tune-your-postgres-rds-instance-via.html,
the units for effective_cache_size on AWS RDS are 8 kB blocks (I'm not sure
why), so DBInstanceClassMemory/16384 blocks at 8 kB each works out to
DBInstanceClassMemory/2, i.e. 50% of system memory.
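For example, on a hypothetical instance with 8 GiB of memory:
DBInstanceClassMemory = 8589934592, so effective_cache_size =
8589934592/16384 = 524288 blocks, and 524288 blocks x 8 kB = 4 GiB, which
is half the memory. By the same logic, shared_buffers =
DBInstanceClassMemory/32768 works out to 2 GiB, i.e. 25%.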

We upgraded the server over the weekend, which doubled the system memory and
increased the available IOPS, and that appears to have greatly improved the
situation, but there have still been a few timeouts. I'm wondering now
whether activity on the other database in this instance occasionally pushes
our indexes out of memory.

Thanks, Rowan

On 10 June 2016 at 18:11, Alex Ignatov  wrote:

>
> On 10.06.2016 16:04, Rowan Seymour wrote:
>
> In our Django app we have messages (currently about 7 million in table
> msgs_message) and labels (about 300), and a join table to associate
> messages with labels (about 500,000 in msgs_message_labels). Not sure
> you'll need them, but here are the relevant table schemas:
>
> CREATE TABLE msgs_message
> (
> id INTEGER PRIMARY KEY NOT NULL,
> type VARCHAR NOT NULL,
> text TEXT NOT NULL,
> is_archived BOOLEAN NOT NULL,
> created_on TIMESTAMP WITH TIME ZONE NOT NULL,
> contact_id INTEGER NOT NULL,
> org_id INTEGER NOT NULL,
> case_id INTEGER,
> backend_id INTEGER NOT NULL,
> is_handled BOOLEAN NOT NULL,
> is_flagged BOOLEAN NOT NULL,
> is_active BOOLEAN NOT NULL,
> has_labels BOOLEAN NOT NULL,
> CONSTRAINT
> msgs_message_contact_id_5c8e3f216c115643_fk_contacts_contact_id FOREIGN KEY
> (contact_id) REFERENCES contacts_contact (id),
> CONSTRAINT msgs_message_org_id_81a0adfcc99151d_fk_orgs_org_id FOREIGN
> KEY (org_id) REFERENCES orgs_org (id),
> CONSTRAINT msgs_message_case_id_51998150f9629c_fk_cases_case_id
> FOREIGN KEY (case_id) REFERENCES cases_case (id)
> );
> CREATE UNIQUE INDEX msgs_message_backend_id_key ON msgs_message
> (backend_id);
> CREATE INDEX msgs_message_6d82f13d ON msgs_message (contact_id);
> CREATE INDEX msgs_message_9cf869aa ON msgs_message (org_id);
> CREATE INDEX msgs_message_7f12ca67 ON msgs_message (case_id);
>
> CREATE TABLE msgs_message_labels
> (
> id INTEGER PRIMARY KEY NOT NULL,
> message_id INTEGER NOT NULL,
> label_id INTEGER NOT NULL,
> CONSTRAINT
> msgs_message_lab_message_id_1dfa44628fe448dd_fk_msgs_message_id FOREIGN KEY
> (message_id) REFERENCES msgs_message (id),
> CONSTRAINT
> msgs_message_labels_label_id_77cbdebd8d255b7a_fk_msgs_label_id FOREIGN KEY
> (label_id) REFERENCES msgs_label (id)
> );
> CREATE UNIQUE INDEX msgs_message_labels_message_id_label_id_key ON
> msgs_message_labels (message_id, label_id);
> CREATE INDEX msgs_message_labels_4ccaa172 ON msgs_message_labels
> (message_id);
> CREATE INDEX msgs_message_labels_abec2aca ON msgs_message_labels
> (label_id);
>
> Users can search for messages, and they are returned page by page in
> reverse chronological order. There are several partial multi-column indexes
> on the message table, but the one used for the example queries below is
>
> CREATE INDEX msgs_inbox ON msgs_message(org_id, created_on DESC)
> WHERE is_active = TRUE AND is_handled = TRUE AND is_archived = FALSE AND
> has_labels = TRUE;
>
> So a typical query for the latest page of messages looks like (
> https://explain.depesz.com/s/G9ew):
>
> SELECT "msgs_message".*
> FROM "msgs_message"
> WHERE ("msgs_message"."org_id" = 7
> AND "msgs_message"."is_active" = true
> AND "msgs_message"."is_handled" = true
> AND "msgs_message"."has_labels" = true
> AND "msgs_message"."is_archived" = false
> AND "msgs_message"."created_on" < '2016-06-10T07:11:06.381000
> +00:00'::timestamptz
> ) ORDER BY "msgs_message"."created_on" DESC LIMIT 50
>
> But users can also search for messages that have one or more labels,
> leading to queries that look like:
>
> SELECT DISTINCT "msgs_message".*
> FROM "msgs_message"
> INNER JOIN "msgs_message_labels" ON ( "msgs_message"."id" =
> "msgs_message_labels"."message_id" )
> WHERE ("msgs_message"."org_id" = 7
> AND "msgs_message"."is_active" = true
> AND "msgs_message"."is_handled" = true
> AND "msgs_message_labels"."label_id" IN (127, 128, 135, 136, 137, 138,
> 140, 141, 143, 144)
> AND "msgs_message"."has_labels" = true
> AND "msgs_message"."is_archived" = false
> AND "msgs_message"."created_on" < '2016-06-10T07:11:06.381000
> +00:00'::timestamptz
> ) ORDER BY "msgs_message"."created_on" DESC LIMIT 50
>
> Most of the time, this query performs like 
> 

[PERFORM] Index not used

2016-06-16 Thread meike . talbach
Hello,
 
I've a basic table with about 100K rows:
 


CREATE TABLE "public"."push_topic" (
 "id" Serial PRIMARY KEY,
 "guid" public.push_guid NOT NULL,
 "authenticatorsending" Varchar(32) NOT NULL,
 "authenticatorsubscription" Varchar(32) NOT NULL,
 "countpushed" Integer NOT NULL,
 "datecreated" timestamp NOT NULL,
 "datelastpush" timestamp
);
CREATE UNIQUE INDEX push_topic_idx_topicguid ON push_topic
  USING btree (guid);


 
When I query this through pgsql, the queries are fast as expected.

This is the query:

select * from push_topic where guid = 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'

And the plan:



Index Scan using push_topic_idx_topicguid on push_topic  (cost=0.42..8.44 
rows=1 width=103) (actual time=0.117..0.121 rows=1 loops=1)
  Index Cond: ((guid)::bpchar = 'DD748CCD-B8A4-3B9F-8F60-67F1F673CFE5'::bpchar)
  Buffers: shared hit=3 read=1
Total runtime: 0.191 ms



However when I run the exact query through a different application
(CodeSynthesis ORM), the query is very slow (~115 ms logged).
I noted this is due to a sequential scan happening on the table instead of
an index scan.

This is query plan in the log file:



LOG:  plan:
DETAIL: {PLANNEDSTMT 
   :commandType 1 
   :queryId 0 
   :hasReturning false 
   :hasModifyingCTE false 
   :canSetTag true 
   :transientPlan false 
   :planTree 
  {SEQSCAN 
  :startup_cost 0.00 
  :total_cost 2877.58 
  :plan_rows 429 
  :plan_width 103 
  :targetlist (
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 1 
:vartype 23 
:vartypmod -1 
:varcollid 0 
:varlevelsup 0 
:varnoold 1 
:varoattno 1 
:location 7
}
 :resno 1 
 :resname id 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 1 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 2 
:vartype 16385 
:vartypmod -1 
:varcollid 100 
:varlevelsup 0 
:varnoold 1 
:varoattno 2 
:location 26
}
 :resno 2 
 :resname guid 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 2 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 3 
:vartype 1043 
:vartypmod 36 
:varcollid 100 
:varlevelsup 0 
:varnoold 1 
:varoattno 3 
:location 47
}
 :resno 3 
 :resname authenticatorsending 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 3 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 4 
:vartype 1043 
:vartypmod 36 
:varcollid 100 
:varlevelsup 0 
:varnoold 1 
:varoattno 4 
:location 84
}
 :resno 4 
 :resname authenticatorsubscription 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 4 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 5 
:vartype 23 
:vartypmod -1 
:varcollid 0 
:varlevelsup 0 
:varnoold 1 
:varoattno 5 
:location 126
}
 :resno 5 
 :resname countpushed 
 :ressortgroupref 0 
 :resorigtbl 16393 
 :resorigcol 5 
 :resjunk false
 }
 {TARGETENTRY 
 :expr 
{VAR 
:varno 1 
:varattno 6 
:vartype 1114 
:vartypmod -1 
:varcollid 0 
:varlevelsup 0