Hello guys and very good new year to everybody!
We are now approaching some queries and statistics on very big table (about 180
millions of record).
The table is partitioned by day (about ~3 Gb of data for every partition/day).
We use Postgresql 9.6.1
I am experiencing quite important slowdown on queries.
I manually made a "vacuum full" and a "reindex" on every partition in order to
clean free space and reorder records.
I have a BRIN index on timestamp and index on other field (btree)
Starting by a simple query: explain analyze select count(domain) from
webtraffic_archive:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=3220451.94..3220451.95 rows=1 width=8) (actual
time=36912.624..36912.624 rows=1 loops=1)
-> Gather (cost=3220451.52..3220451.93 rows=4 width=8) (actual
time=36911.600..36912.614 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=3219451.52..3219451.53 rows=1 width=8)
(actual time=36906.804..36906.804 rows=1 loops=5)
-> Append (cost=0.00..3094635.41 rows=49926443 width=0)
(actual time=4.716..31331.229 rows=39853988 loops=5)
-> Parallel Seq Scan on webtraffic_archive
(cost=0.00..0.00 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_01_01
(cost=0.00..10.47 rows=47 width=0) (actual time=0.000..0.000 rows=0 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_16
(cost=0.00..213728.26 rows=3498026 width=0) (actual time=4.713..2703.458
rows=2798421 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_17
(cost=0.00..201379.39 rows=3247739 width=0) (actual time=6.334..2364.726
rows=2598191 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_18
(cost=0.00..176248.86 rows=2824986 width=0) (actual time=7.437..2014.812
rows=2259989 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_19
(cost=0.00..177493.33 rows=2866433 width=0) (actual time=9.951..2145.958
rows=2293146 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_20
(cost=0.00..120271.83 rows=1960883 width=0) (actual time=0.011..372.092
rows=1568706 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_21
(cost=0.00..276391.94 rows=4485294 width=0) (actual time=5.386..3111.589
rows=3588235 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_22
(cost=0.00..287611.68 rows=4630668 width=0) (actual time=6.598..3335.834
rows=3704535 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_23
(cost=0.00..249047.61 rows=4014361 width=0) (actual time=7.206..2628.884
rows=3211489 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_24
(cost=0.00..192008.70 rows=3097370 width=0) (actual time=9.870..1882.826
rows=2477896 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_25
(cost=0.00..87385.16 rows=1405616 width=0) (actual time=0.018..427.248
rows=1124493 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_26
(cost=0.00..88262.80 rows=1436080 width=0) (actual time=0.014..277.327
rows=1148864 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_27
(cost=0.00..222607.43 rows=3557243 width=0) (actual time=8.497..1232.210
rows=2845795 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_28
(cost=0.00..210414.76 rows=3365676 width=0) (actual time=0.033..548.878
rows=2692541 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_29
(cost=0.00..185065.72 rows=2955872 width=0) (actual time=0.031..498.079
rows=2364697 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_30
(cost=0.00..149139.55 rows=2382656 width=0) (actual time=0.011..501.351
rows=1906124 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2016_12_31
(cost=0.00..166991.89 rows=2664288 width=0) (actual time=0.041..437.631
rows=2131431 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2017_01_01
(cost=0.00..79197.29 rows=1260930 width=0) (actual time=0.018..254.124
rows=1008744 loops=5)
-> Parallel Seq Scan on webtraffic_archive_day_2017_01_02
(cost=0.00..11378.74 rows=272274 width=0) (actual time=0.017..34.352
rows=130691 loops=5)
Planning time: 313.907 ms
Execution time: 36941.700 ms
Other more complex queries are slower.
How can i improve it?
Records number can raise up until 1.000 millions.
Do i need a third-part tool for big data?
THANK YOU!
/F
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general