Hi,

It is better to use a data warehouse with columnar storage (ClickHouse, 
Greenplum, etc.) for BI queries on large datasets, but offloading the data to 
it can be a complicated task. Alternatively, you can try to tune Postgres and 
the environment it runs in so that queries such as yours are processed faster. 
On Linux, you could increase the OS page cache by adding RAM and tune the 
readahead buffer size of your block devices.
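
On the Postgres side, a minimal sketch of settings that are often worth 
reviewing for this kind of parallel aggregate workload (the values are 
illustrative assumptions for a dedicated server with a lot of RAM, not 
recommendations tuned to your hardware):

    -- Tell the planner how much data the OS page cache can realistically hold
    -- (illustrative value, assuming ~64 GB of RAM on a dedicated box).
    ALTER SYSTEM SET effective_cache_size = '48GB';

    -- Allow more parallel workers per Gather node, so the partition scans can
    -- use more cores (new in 9.6; also bounded by max_worker_processes).
    ALTER SYSTEM SET max_parallel_workers_per_gather = 8;

    -- Helps prefetching on SSD/RAID storage.
    ALTER SYSTEM SET effective_io_concurrency = 200;

    -- Apply without a restart (shared_buffers changes would still need one).
    SELECT pg_reload_conf();

These particular settings can be reloaded on the fly; readahead itself is an 
OS-level knob, not a Postgres one.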

02.01.2017, 14:29, "Job" <j...@colliniconsulting.it>:
> Hello guys and very good new year to everybody!
>
> We are now starting to run some queries and statistics on a very big table 
> (about 180 million records).
> The table is partitioned by day (~3 GB of data for every partition/day).
> We use PostgreSQL 9.6.1.
>
> I am experiencing quite significant slowdowns on queries.
> I manually ran a "vacuum full" and a "reindex" on every partition in order 
> to reclaim free space and reorder records.
>
> I have a BRIN index on the timestamp and a btree index on another field.
>
> Starting with a simple query: explain analyze select count(domain) from webtraffic_archive:
>
>                                                                            QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate (cost=3220451.94..3220451.95 rows=1 width=8) (actual time=36912.624..36912.624 rows=1 loops=1)
>    -> Gather (cost=3220451.52..3220451.93 rows=4 width=8) (actual time=36911.600..36912.614 rows=5 loops=1)
>          Workers Planned: 4
>          Workers Launched: 4
>          -> Partial Aggregate (cost=3219451.52..3219451.53 rows=1 width=8) (actual time=36906.804..36906.804 rows=1 loops=5)
>                -> Append (cost=0.00..3094635.41 rows=49926443 width=0) (actual time=4.716..31331.229 rows=39853988 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive (cost=0.00..0.00 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_01_01 (cost=0.00..10.47 rows=47 width=0) (actual time=0.000..0.000 rows=0 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_16 (cost=0.00..213728.26 rows=3498026 width=0) (actual time=4.713..2703.458 rows=2798421 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_17 (cost=0.00..201379.39 rows=3247739 width=0) (actual time=6.334..2364.726 rows=2598191 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_18 (cost=0.00..176248.86 rows=2824986 width=0) (actual time=7.437..2014.812 rows=2259989 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_19 (cost=0.00..177493.33 rows=2866433 width=0) (actual time=9.951..2145.958 rows=2293146 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_20 (cost=0.00..120271.83 rows=1960883 width=0) (actual time=0.011..372.092 rows=1568706 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_21 (cost=0.00..276391.94 rows=4485294 width=0) (actual time=5.386..3111.589 rows=3588235 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_22 (cost=0.00..287611.68 rows=4630668 width=0) (actual time=6.598..3335.834 rows=3704535 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_23 (cost=0.00..249047.61 rows=4014361 width=0) (actual time=7.206..2628.884 rows=3211489 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_24 (cost=0.00..192008.70 rows=3097370 width=0) (actual time=9.870..1882.826 rows=2477896 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_25 (cost=0.00..87385.16 rows=1405616 width=0) (actual time=0.018..427.248 rows=1124493 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_26 (cost=0.00..88262.80 rows=1436080 width=0) (actual time=0.014..277.327 rows=1148864 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_27 (cost=0.00..222607.43 rows=3557243 width=0) (actual time=8.497..1232.210 rows=2845795 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_28 (cost=0.00..210414.76 rows=3365676 width=0) (actual time=0.033..548.878 rows=2692541 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_29 (cost=0.00..185065.72 rows=2955872 width=0) (actual time=0.031..498.079 rows=2364697 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_30 (cost=0.00..149139.55 rows=2382656 width=0) (actual time=0.011..501.351 rows=1906124 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_31 (cost=0.00..166991.89 rows=2664288 width=0) (actual time=0.041..437.631 rows=2131431 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2017_01_01 (cost=0.00..79197.29 rows=1260930 width=0) (actual time=0.018..254.124 rows=1008744 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2017_01_02 (cost=0.00..11378.74 rows=272274 width=0) (actual time=0.017..34.352 rows=130691 loops=5)
>  Planning time: 313.907 ms
>  Execution time: 36941.700 ms
>
> Other more complex queries are slower.
>
> How can I improve it?
> The number of records can grow up to 1,000 million (1 billion).
> Do I need a third-party tool for big data?
>
> THANK YOU!
> /F
>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
