Re: Query slow for new participants

2019-02-26 Thread Laurenz Albe
supp...@mekong.be wrote:
> EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN
> "Index Scan using ix_companyarticledb_company on companyarticledb  
> (cost=0.57..2.80 rows=1 width=193) (actual time=1011.335..1011.454 rows=1 
> loops=1)"
> "  Index Cond: (companyid = 77)"
> "  Filter: (articleid = 7869071)"
> "  Rows Removed by Filter: 2674361"
> "  Buffers: shared hit=30287"
> "Planning time: 0.220 ms"
> "Execution time: 1011.502 ms"

Your problem is the "Rows Removed by Filter: 2674361".

The first thing I would try is:

  ALTER TABLE public.companyarticledb
 ALTER companyid SET STATISTICS 1000;

  ALTER TABLE public.companyarticledb
 ALTER articleid SET STATISTICS 1000;

  ANALYZE public.companyarticledb;

Then PostgreSQL has a better idea which condition is selective.

You can set STATISTICS up to 10000, but don't forget that high values
make ANALYZE and planning slower.
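
To see what ANALYZE gathered afterwards, you can query the standard
pg_stats view (a minimal sketch):

  SELECT attname, n_distinct, most_common_vals
    FROM pg_stats
   WHERE schemaname = 'public'
     AND tablename = 'companyarticledb'
     AND attname IN ('companyid', 'articleid');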

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Query slow for new participants

2019-02-26 Thread Michael Lewis
> Indexes:
> "pk_pricedb" PRIMARY KEY, btree (companyid, articleid)
> "EnabledIndex" btree (enabled)
> "ix_companyarticledb_article" btree (articleid)
> "ix_companyarticledb_company" btree (companyid)
>

I'd say drop ix_companyarticledb_company, since pk_pricedb can be used
instead, even for queries that filter only on the companyid field. For
this case it will certainly be faster, since it reaches the row you want
directly from the index, without the "Rows Removed by Filter: 2674361".
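
If you go that route, something like this should work (a sketch; I would
first check pg_stat_user_indexes to confirm nothing else relies on the
index, and CONCURRENTLY avoids blocking writes while it is dropped):

  -- confirm the index is not used by other queries before dropping it
  SELECT idx_scan
    FROM pg_stat_user_indexes
   WHERE indexrelname = 'ix_companyarticledb_company';

  DROP INDEX CONCURRENTLY ix_companyarticledb_company;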

I doubt that default_statistics_target = 100 is doing you any favors.
You may want to try increasing it to 500 or 1000, if you can afford a
small increase in planning cost and more storage for the larger stats
sample.
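
If you want to experiment with that, it can be changed instance-wide
without a restart (a sketch; 500 is only a starting point, not a
recommendation):

  ALTER SYSTEM SET default_statistics_target = 500;
  SELECT pg_reload_conf();
  ANALYZE public.companyarticledb;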


Re: Query slow for new participants

2019-02-26 Thread supp...@mekong.be
Hello,

Things to Try Before You Post
-> I went through these steps and they did not make any difference.


Information You Need To Include
Postgres version
"PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian
6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit"

Full Table and Index Schema
Performance is very bad for the new company, even on the simplest query:

   SELECT * FROM CompanyArticleDB
    WHERE CompanyId = '77'
      AND ArticleId = '7869071'

  Table "public.companyarticledb"
   Column   |Type | Collation |
Nullable | Default
+-+---+--+-
 companyid  | integer |   | not
null |
 articleid  | integer |   | not
null |
 price  | numeric(19,4)   |   |
  |
 contractstartdate  | timestamp without time zone |   |
  |
 contractenddate| timestamp without time zone |   |
  |
 enabled| boolean |   |
  |
 visible| boolean |   |
  |
 sheid  | integer |   |
  |
 inmassbalance  | boolean |   |
  |
 internalwastetype  | character varying(50)   |   |
  |
 buom   | character varying(50)   |   |
  |
 stockunit  | numeric(18,2)   |   |
  |
 priceperbuom   | numeric(19,4)   |   |
  |
 purchaseunit   | numeric(18,2)   |   |
  |
 preventioncounselorid  | integer |   |
  |
 licenseprovided| boolean |   |
  |
 licensevaliduntil  | timestamp without time zone |   |
  |
 authorisationlocationid| integer |   |
  |
 priceagreementreference| character varying(50)   |   |
  |
 interfaceaccountid | integer |   |
  |
 createdon  | timestamp without time zone |   |
  |
 modifiedby | integer |   |
  |
 createdby  | integer |   |
  |
 modifiedon | timestamp without time zone |   |
  |
 createdonsupplier  | timestamp without time zone |   |
  |
 modifiedbysupplier | integer |   |
  |
 createdbysupplier  | integer |   |
  |
 modifiedonsupplier | timestamp without time zone |   |
  |
 newprice   | numeric(19,4)   |   |
  |
 newcontractstartdate   | timestamp without time zone |   |
  |
 newcontractenddate | timestamp without time zone |   |
  |
 newpriceagreementreference | character varying(50)   |   |
  |
 licensereference   | character varying(50)   |   |
  |
 purchasercomment   | character varying(500)  |   |
  |
 reportingunit  | character varying(5)|   |
  |
 articlecode| character varying(50)   |   |
  |
 participantdescription | character varying(500)  |   |
  |
 motivationneeded   | boolean |   |
  |
 photourl   | character varying(500)  |   |
  |
 reviewedshe| boolean |   |
  |
noinspectionuntil  | timestamp without time zone |   |
|
 priority   | boolean |   |
  |
 needschecking  | boolean |   |
  |
 role   | character varying(20)   |   |
  |
Indexes:
    "pk_pricedb" PRIMARY KEY, btree (companyid, articleid)
    "EnabledIndex" btree (enabled)
    "ix_companyarticledb_article" btree (articleid)
    "ix_companyarticledb_company" btree (companyid)
    "participantarticlecodeindex" btree (articlecode)
    "participantdescriptionindex" gin (participantdescription gin_trgm_ops)
Foreign-key constraints:
    "fk_companyarticledb_accountsdb" FOREIGN KEY (modifiedby) REFERENCES accountsdb(id)
    "fk_companyarticledb_accountsdb1" FOREIGN KEY (createdby) REFERENCES accountsdb(id)
    "fk_companyarticledb_accountsdb2" FOREIGN KEY (preventioncounselorid) REFERENCES accountsdb(id)
    "fk_companyarticledb_articledb" FOREIGN KEY (articleid) REFERENCES articledb(id)
    "fk_companyarticledb_companydb" FOREIGN 

Re: Aggregate and many LEFT JOIN

2019-02-26 Thread Jeff Janes
On Mon, Feb 25, 2019 at 3:54 AM kimaidou  wrote:


> What strikes me is that if I simplify it a lot, removing everything but
> the main table's (occtax.observation) primary key cd_nom and the
> aggregate, the query plan should be able to use the cd_nom index for
> sorting and produce a better plan (hash aggregate), but it does not seem
> to:
>

HashAggregate doesn't support aggregates with DISTINCT.  I don't think
there is any reason it can't; it is just that no one has gotten around to
implementing it.

Aggregates with DISTINCT also kill your ability to get parallel queries.
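
A common workaround, sketched here with made-up table and column names,
is to de-duplicate in a subquery so the outer aggregate no longer needs
DISTINCT and can be hash-aggregated (and run in parallel):

  -- instead of: SELECT grp, count(DISTINCT val) FROM t GROUP BY grp
  SELECT grp, count(*) AS distinct_vals
    FROM (SELECT DISTINCT grp, val FROM t) AS dedup
   GROUP BY grp;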

Cheers,

Jeff


Re: Aggregate and many LEFT JOIN

2019-02-26 Thread kimaidou
I managed to avoid the disk sort after performing a VACUUM ANALYZE,
and with a session-level work_mem = '250MB'.
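
For reference, what I ran amounts to this (a sketch; the VACUUM was
database-wide):

  SET work_mem = '250MB';  -- applies only to the current session
  VACUUM ANALYZE;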

* SQL http://paste.debian.net/1070207/
* EXPLAIN https://explain.depesz.com/s/nJ2y

It still takes 16 s.
It seems this kind of query will need better hardware to scale...

Thanks for your help


On Mon, Feb 25, 2019 at 7:30 PM, Michael Lewis  wrote:

>
>
> On Mon, Feb 25, 2019 at 2:44 AM kimaidou  wrote:
>
>> I have better results with this version. Basically, I run a first query
>> only made for aggregation, and then do a JOIN to get other needed data.
>>
>> * SQL : http://paste.debian.net/1070007/
>> * EXPLAIN: https://explain.depesz.com/s/D0l
>>
>> Not really "fast", but I gained 30%
>>
>
>
> It still seems that the disk sort and everything after it is where the
> query plan dies. It seems odd that it went to disk if work_mem was
> already 250MB. Can you allocate more as a test? As an alternative, if
> this is frequently needed data, can you aggregate it and keep a
> summarized copy updated periodically?
>
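
One way to implement the summarized copy suggested above is a
materialized view refreshed on a schedule (a sketch; the view name and
columns are assumptions, and REFRESH ... CONCURRENTLY needs a unique
index on the view):

  CREATE MATERIALIZED VIEW occtax.observation_summary AS
  SELECT cd_nom, count(*) AS nb_obs
    FROM occtax.observation
   GROUP BY cd_nom;

  CREATE UNIQUE INDEX ON occtax.observation_summary (cd_nom);

  -- refresh periodically, e.g. from cron, without blocking readers
  REFRESH MATERIALIZED VIEW CONCURRENTLY occtax.observation_summary;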