Hello everyone, and a very happy new year to everybody!
 
We are now running some queries and statistics on a very big table (about 180 
million records).
The table is partitioned by day (about 3 GB of data per partition/day).
We use PostgreSQL 9.6.1.
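For context, here is a minimal sketch of the partitioning setup (table and column names are illustrative; since 9.6 has no declarative partitioning, this uses inheritance with CHECK constraints):

```sql
-- Parent table (kept empty); the daily partitions inherit from it.
CREATE TABLE webtraffic_archive (
    "timestamp" timestamptz NOT NULL,
    domain      text
    -- plus the other traffic columns
);

-- One child table per day, with a CHECK constraint so the planner
-- can exclude it when a query filters on "timestamp".
CREATE TABLE webtraffic_archive_day_2016_12_16 (
    CHECK ("timestamp" >= '2016-12-16' AND "timestamp" < '2016-12-17')
) INHERITS (webtraffic_archive);
```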
 
I am experiencing a significant slowdown on queries.
I manually ran a VACUUM FULL and a REINDEX on every partition in order to 
reclaim free space and rebuild the indexes.
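The maintenance pass on each partition was along these lines (partition name illustrative):

```sql
-- Rewrites the whole table, reclaiming all free space
-- (note: takes an ACCESS EXCLUSIVE lock while it runs).
VACUUM FULL webtraffic_archive_day_2016_12_16;

-- Rebuilds every index on the partition from scratch.
REINDEX TABLE webtraffic_archive_day_2016_12_16;
```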

I have a BRIN index on the timestamp column and a btree index on another field.
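The index definitions look roughly like this (column names are my assumptions):

```sql
-- BRIN index on the timestamp column: very small, and effective for
-- range scans when the rows are physically ordered by timestamp.
CREATE INDEX ON webtraffic_archive_day_2016_12_16
    USING brin ("timestamp");

-- Ordinary btree index on the other field, e.g. domain.
CREATE INDEX ON webtraffic_archive_day_2016_12_16 (domain);
```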
 
Starting with a simple query: explain analyze select count(domain) from 
webtraffic_archive;

                                                                                
   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3220451.94..3220451.95 rows=1 width=8) (actual 
time=36912.624..36912.624 rows=1 loops=1)
   ->  Gather  (cost=3220451.52..3220451.93 rows=4 width=8) (actual 
time=36911.600..36912.614 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=3219451.52..3219451.53 rows=1 width=8) 
(actual time=36906.804..36906.804 rows=1 loops=5)
               ->  Append  (cost=0.00..3094635.41 rows=49926443 width=0) 
(actual time=4.716..31331.229 rows=39853988 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive  
(cost=0.00..0.00 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_01_01 
 (cost=0.00..10.47 rows=47 width=0) (actual time=0.000..0.000 rows=0 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_16 
 (cost=0.00..213728.26 rows=3498026 width=0) (actual time=4.713..2703.458 
rows=2798421 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_17 
 (cost=0.00..201379.39 rows=3247739 width=0) (actual time=6.334..2364.726 
rows=2598191 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_18 
 (cost=0.00..176248.86 rows=2824986 width=0) (actual time=7.437..2014.812 
rows=2259989 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_19 
 (cost=0.00..177493.33 rows=2866433 width=0) (actual time=9.951..2145.958 
rows=2293146 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_20 
 (cost=0.00..120271.83 rows=1960883 width=0) (actual time=0.011..372.092 
rows=1568706 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_21 
 (cost=0.00..276391.94 rows=4485294 width=0) (actual time=5.386..3111.589 
rows=3588235 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_22 
 (cost=0.00..287611.68 rows=4630668 width=0) (actual time=6.598..3335.834 
rows=3704535 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_23 
 (cost=0.00..249047.61 rows=4014361 width=0) (actual time=7.206..2628.884 
rows=3211489 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_24 
 (cost=0.00..192008.70 rows=3097370 width=0) (actual time=9.870..1882.826 
rows=2477896 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_25 
 (cost=0.00..87385.16 rows=1405616 width=0) (actual time=0.018..427.248 
rows=1124493 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_26 
 (cost=0.00..88262.80 rows=1436080 width=0) (actual time=0.014..277.327 
rows=1148864 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_27 
 (cost=0.00..222607.43 rows=3557243 width=0) (actual time=8.497..1232.210 
rows=2845795 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_28 
 (cost=0.00..210414.76 rows=3365676 width=0) (actual time=0.033..548.878 
rows=2692541 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_29 
 (cost=0.00..185065.72 rows=2955872 width=0) (actual time=0.031..498.079 
rows=2364697 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_30 
 (cost=0.00..149139.55 rows=2382656 width=0) (actual time=0.011..501.351 
rows=1906124 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_31 
 (cost=0.00..166991.89 rows=2664288 width=0) (actual time=0.041..437.631 
rows=2131431 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2017_01_01 
 (cost=0.00..79197.29 rows=1260930 width=0) (actual time=0.018..254.124 
rows=1008744 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2017_01_02 
 (cost=0.00..11378.74 rows=272274 width=0) (actual time=0.017..34.352 
rows=130691 loops=5)
 Planning time: 313.907 ms
 Execution time: 36941.700 ms

Other, more complex queries are even slower.

How can I improve this?
The number of records may grow to as many as 1,000 million (one billion).
Do I need a third-party tool for big data?

THANK YOU!
/F

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
