SeqScan vs. IndexScan

2018-04-18 Thread Vitaliy Garnashevich

Hi,

I'm running the same query with "set enable_seqscan = on;" and "set 
enable_seqscan = off;":


->  Nested Loop Left Join  (cost=0.00..89642.86 rows=1 width=30) (actual 
time=1.612..6924.232 rows=3289 loops=1)

  Join Filter: (sys_user.user_id = j_6634.id)
  Rows Removed by Join Filter: 14330174
  ->  Seq Scan on sys_user  (cost=0.00..89449.85 rows=1 width=16) 
(actual time=0.117..39.802 rows=3289 loops=1)

    Filter: ...
  ->  Seq Scan on cmn_user j_6634  (cost=0.00..138.56 rows=4356 
width=22) (actual time=0.001..0.973 rows=4358 loops=3289)


(Full plan: https://explain.depesz.com/s/plAO)

->  Nested Loop Left Join  (cost=0.56..89643.52 rows=1 width=30) (actual 
time=0.589..39.674 rows=3288 loops=1)
  ->  Index Scan using sys_user_pkey on sys_user 
(cost=0.28..89635.21 rows=1 width=16) (actual time=0.542..29.435 
rows=3288 loops=1)

    Filter: ...
  ->  Index Scan using cmn_user_pkey on cmn_user j_6634 
(cost=0.28..8.30 rows=1 width=22) (actual time=0.002..0.002 rows=1 
loops=3288)

    Index Cond: (sys_user.user_id = id)

(Full plan: https://explain.depesz.com/s/4QXy)

Why is the optimizer choosing a SeqScan (on cmn_user) in the first query 
instead of an IndexScan, despite the SeqScan being more costly?
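
For reference, the per-path costs the planner is weighing for the inner side 
can also be looked at in isolation (the id value below is made up, purely for 
illustration):

explain select * from cmn_user;                   -- the full seq scan costed in the first plan
explain select * from cmn_user where id = 12345;  -- the single-key index lookup costed in the second plan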


Regards,
Vitaliy



Re: effective_io_concurrency on EBS/gp2

2018-02-23 Thread Vitaliy Garnashevich
I noticed that the recent round of tests being discussed never 
mentioned the file system used.  Was it XFS?  Does changing the 
agcount change the behaviour?


It was ext4.

Regards,
Vitaliy




Re: effective_io_concurrency on EBS/gp2

2018-02-05 Thread Vitaliy Garnashevich

I mean, that the issue is indeed affected by the order of rows in the
table. Random heap access patterns result in sparse bitmap heap scans,
whereas less random heap access patterns result in denser bitmap heap
scans. Dense scans have large portions of contiguous fetches, a
pattern that is quite adversely affected by the current prefetch
mechanism in linux.



Thanks for your input.

How can I test a sparse bitmap scan? Can you think of any SQL commands 
which would generate data and run such scans?


Would a bitmap scan over an expression index on ((aid%1000)=0) do a sparse 
bitmap scan?
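
For example, something along these lines (a rough sketch; the index name is 
arbitrary, and pgbench_accounts is the table from the earlier tests):

create index i_sparse on pgbench_accounts (((aid % 1000) = 0));
set enable_seqscan = off; set enable_indexscan = off;
explain (analyze, buffers)
  select * from pgbench_accounts where (aid % 1000) = 0 and abalance <> 0;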


Regards,
Vitaliy



Re: effective_io_concurrency on EBS/gp2

2018-02-02 Thread Vitaliy Garnashevich
I did some more tests. I made an SQL dump of the table, used the head/tail 
commands to cut out the data part, shuffled the rows with the shuf command, 
then joined the pieces back together and restored the table into the DB.
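
A roughly equivalent shuffle can also be done purely in SQL, without editing 
a dump (a sketch, not what was actually done here):

create table pgbench_accounts_shuffled as
  select * from pgbench_accounts order by random();
alter table pgbench_accounts rename to pgbench_accounts_old;
alter table pgbench_accounts_shuffled rename to pgbench_accounts;
alter table pgbench_accounts add primary key (aid);  -- pgbench normally creates this PK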


Before:
select array_agg(aid) from (select aid from pgbench_accounts order by 
ctid limit 20)_;

{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}

effective_io_concurrency=0 Execution time: 1455.336 ms
effective_io_concurrency=1 Execution time: 8365.070 ms
effective_io_concurrency=2 Execution time: 4791.961 ms
effective_io_concurrency=4 Execution time: 4113.713 ms
effective_io_concurrency=8 Execution time: 1584.862 ms
effective_io_concurrency=16 Execution time: 1533.096 ms
effective_io_concurrency=8 Execution time: 1494.494 ms
effective_io_concurrency=4 Execution time: 3235.892 ms
effective_io_concurrency=2 Execution time: 4624.334 ms
effective_io_concurrency=1 Execution time: 7831.310 ms
effective_io_concurrency=0 Execution time: 1422.203 ms

After:
select array_agg(aid) from (select aid from pgbench_accounts order by 
ctid limit 20)_;

{6861090,18316007,2361004,11880097,5079470,9859942,13776329,12687163,3793362,18312052,15912971,9928864,10179242,9307499,2737986,13911147,5337329,12582498,3019085,4631617}

effective_io_concurrency=0 Execution time: 71321.723 ms
effective_io_concurrency=1 Execution time: 180230.742 ms
effective_io_concurrency=2 Execution time: 98635.566 ms
effective_io_concurrency=4 Execution time: 91464.375 ms
effective_io_concurrency=8 Execution time: 91048.939 ms
effective_io_concurrency=16 Execution time: 97682.475 ms
effective_io_concurrency=8 Execution time: 91262.404 ms
effective_io_concurrency=4 Execution time: 90945.560 ms
effective_io_concurrency=2 Execution time: 97019.504 ms
effective_io_concurrency=1 Execution time: 180331.474 ms
effective_io_concurrency=0 Execution time: 71469.484 ms

The numbers are not directly comparable with the previous tests, because 
this time I used scale factor 200.


Regards,
Vitaliy

On 2018-02-01 20:39, Claudio Freire wrote:

On Wed, Jan 31, 2018 at 11:21 PM, hzzhangjiazhi
 wrote:

HI

  I think this parameter will be useful when the storage uses RAID
striping; otherwise turning this parameter up is meaningless when there is
only one device.

Not at all. Especially on EBS, where keeping a relatively full queue
is necessary to get max throughput out of the drive.

Problem is, if you're scanning a highly correlated index, the
mechanism is counterproductive. I had worked on some POC patches for
correcting that, I guess I could work something out, but it's
low-priority for me. Especially since it's actually a kernel "bug" (or
shortcoming), that could be fixed in the kernel rather than worked
around by postgres.




effective_io_concurrency=0
   QUERY PLAN   


 Bitmap Heap Scan on pgbench_accounts  (cost=12838.24..357966.26 rows=1 
width=97) (actual time=1454.570..1454.570 rows=0 loops=1)
   Recheck Cond: ((aid >= 1000) AND (aid <= 1000000))
   Filter: (abalance <> 0)
   Rows Removed by Filter: 999001
   Heap Blocks: exact=16378
   Buffers: shared hit=2 read=19109
   ->  Bitmap Index Scan on pgbench_accounts_pkey  (cost=0.00..12838.24 
rows=986230 width=0) (actual time=447.877..447.877 rows=999001 loops=1)
 Index Cond: ((aid >= 1000) AND (aid <= 1000000))
 Buffers: shared hit=2 read=2731
 Planning time: 15.782 ms
 Execution time: 1455.336 ms
(11 rows)

effective_io_concurrency=1
   QUERY PLAN   


 Bitmap Heap Scan on pgbench_accounts  (cost=12838.24..357966.26 rows=1 
width=97) (actual time=8364.272..8364.272 rows=0 loops=1)
   Recheck Cond: ((aid >= 1000) AND (aid <= 1000000))
   Filter: (abalance <> 0)
   Rows Removed by Filter: 999001
   Heap Blocks: exact=16378
   Buffers: shared hit=2 read=19109
   ->  Bitmap Index Scan on pgbench_accounts_pkey  (cost=0.00..12838.24 
rows=986230 width=0) (actual time=448.043..448.043 rows=999001 loops=1)
 Index Cond: ((aid >= 1000) AND (aid <= 1000000))
 Buffers: shared hit=2 read=2731
 Planning time: 15.036 ms
 Execution time: 8365.070 ms
(11 rows)

effective_io_concurrency=2
   QUERY PLAN   


 Bitmap Heap Scan on 

Re: effective_io_concurrency on EBS/gp2

2018-01-31 Thread Vitaliy Garnashevich

I've done some more tests. Here they are all:

io1, 100 GB SSD, 1000 IOPS
effective_io_concurrency=0 Execution time: 40333.626 ms
effective_io_concurrency=1 Execution time: 163840.500 ms
effective_io_concurrency=2 Execution time: 162606.330 ms
effective_io_concurrency=4 Execution time: 163670.405 ms
effective_io_concurrency=8 Execution time: 161800.478 ms
effective_io_concurrency=16 Execution time: 161962.319 ms
effective_io_concurrency=32 Execution time: 160451.435 ms
effective_io_concurrency=64 Execution time: 161763.632 ms
effective_io_concurrency=128 Execution time: 161687.398 ms
effective_io_concurrency=256 Execution time: 160945.066 ms
effective_io_concurrency=256 Execution time: 161226.440 ms
effective_io_concurrency=128 Execution time: 161977.954 ms
effective_io_concurrency=64 Execution time: 159122.006 ms
effective_io_concurrency=32 Execution time: 154923.569 ms
effective_io_concurrency=16 Execution time: 160922.819 ms
effective_io_concurrency=8 Execution time: 160577.122 ms
effective_io_concurrency=4 Execution time: 157509.481 ms
effective_io_concurrency=2 Execution time: 161806.713 ms
effective_io_concurrency=1 Execution time: 164026.708 ms
effective_io_concurrency=0 Execution time: 40196.182 ms

gp2, 100 GB SSD
effective_io_concurrency=0 Execution time: 40262.781 ms
effective_io_concurrency=1 Execution time: 98125.987 ms
effective_io_concurrency=2 Execution time: 55343.776 ms
effective_io_concurrency=4 Execution time: 52505.638 ms
effective_io_concurrency=8 Execution time: 54954.024 ms
effective_io_concurrency=16 Execution time: 54346.455 ms
effective_io_concurrency=32 Execution time: 55196.626 ms
effective_io_concurrency=64 Execution time: 55057.956 ms
effective_io_concurrency=128 Execution time: 54963.510 ms
effective_io_concurrency=256 Execution time: 54339.258 ms

io1, 1 TB SSD, 3000 IOPS
effective_io_concurrency=0 Execution time: 40691.396 ms
effective_io_concurrency=1 Execution time: 87524.939 ms
effective_io_concurrency=2 Execution time: 54197.982 ms
effective_io_concurrency=4 Execution time: 55082.740 ms
effective_io_concurrency=8 Execution time: 54838.161 ms
effective_io_concurrency=16 Execution time: 52561.553 ms
effective_io_concurrency=32 Execution time: 54266.847 ms
effective_io_concurrency=64 Execution time: 54683.102 ms
effective_io_concurrency=128 Execution time: 54643.874 ms
effective_io_concurrency=256 Execution time: 42944.938 ms

gp2, 1 TB SSD
effective_io_concurrency=0 Execution time: 40072.880 ms
effective_io_concurrency=1 Execution time: 83528.679 ms
effective_io_concurrency=2 Execution time: 55706.941 ms
effective_io_concurrency=4 Execution time: 55664.646 ms
effective_io_concurrency=8 Execution time: 54699.658 ms
effective_io_concurrency=16 Execution time: 54632.291 ms
effective_io_concurrency=32 Execution time: 54793.305 ms
effective_io_concurrency=64 Execution time: 55227.875 ms
effective_io_concurrency=128 Execution time: 54638.744 ms
effective_io_concurrency=256 Execution time: 54869.761 ms

st1, 500 GB HDD
effective_io_concurrency=0 Execution time: 40542.583 ms
effective_io_concurrency=1 Execution time: 119996.892 ms
effective_io_concurrency=2 Execution time: 51137.998 ms
effective_io_concurrency=4 Execution time: 42301.922 ms
effective_io_concurrency=8 Execution time: 42081.877 ms
effective_io_concurrency=16 Execution time: 42253.782 ms
effective_io_concurrency=32 Execution time: 42087.216 ms
effective_io_concurrency=64 Execution time: 42112.105 ms
effective_io_concurrency=128 Execution time: 42271.850 ms
effective_io_concurrency=256 Execution time: 42213.074 ms

Regards,
Vitaliy




Re: effective_io_concurrency on EBS/gp2

2018-01-31 Thread Vitaliy Garnashevich

More tests:

io1, 100 GB:

effective_io_concurrency=0
 Execution time: 40333.626 ms
effective_io_concurrency=1
 Execution time: 163840.500 ms
effective_io_concurrency=2
 Execution time: 162606.330 ms
effective_io_concurrency=4
 Execution time: 163670.405 ms
effective_io_concurrency=8
 Execution time: 161800.478 ms
effective_io_concurrency=16
 Execution time: 161962.319 ms
effective_io_concurrency=32
 Execution time: 160451.435 ms
effective_io_concurrency=64
 Execution time: 161763.632 ms
effective_io_concurrency=128
 Execution time: 161687.398 ms
effective_io_concurrency=256
 Execution time: 160945.066 ms

effective_io_concurrency=256
 Execution time: 161226.440 ms
effective_io_concurrency=128
 Execution time: 161977.954 ms
effective_io_concurrency=64
 Execution time: 159122.006 ms
effective_io_concurrency=32
 Execution time: 154923.569 ms
effective_io_concurrency=16
 Execution time: 160922.819 ms
effective_io_concurrency=8
 Execution time: 160577.122 ms
effective_io_concurrency=4
 Execution time: 157509.481 ms
effective_io_concurrency=2
 Execution time: 161806.713 ms
effective_io_concurrency=1
 Execution time: 164026.708 ms
effective_io_concurrency=0
 Execution time: 40196.182 ms


st1, 500 GB:

effective_io_concurrency=0
 Execution time: 40542.583 ms
effective_io_concurrency=1
 Execution time: 119996.892 ms
effective_io_concurrency=2
 Execution time: 51137.998 ms
effective_io_concurrency=4
 Execution time: 42301.922 ms
effective_io_concurrency=8
 Execution time: 42081.877 ms
effective_io_concurrency=16
 Execution time: 42253.782 ms
effective_io_concurrency=32
 Execution time: 42087.216 ms
effective_io_concurrency=64
 Execution time: 42112.105 ms
effective_io_concurrency=128
 Execution time: 42271.850 ms
effective_io_concurrency=256
 Execution time: 42213.074 ms

effective_io_concurrency=256
 Execution time: 42255.568 ms
effective_io_concurrency=128
 Execution time: 42030.515 ms
effective_io_concurrency=64
 Execution time: 41713.753 ms
effective_io_concurrency=32
 Execution time: 42035.436 ms
effective_io_concurrency=16
 Execution time: 42221.581 ms
effective_io_concurrency=8
 Execution time: 42203.730 ms
effective_io_concurrency=4
 Execution time: 42236.082 ms
effective_io_concurrency=2
 Execution time: 49531.558 ms
effective_io_concurrency=1
 Execution time: 117160.222 ms
effective_io_concurrency=0
 Execution time: 40059.259 ms

Regards,
Vitaliy

On 31/01/2018 15:46, Gary Doades wrote:


> I've tried to re-run the test for some specific values of 
effective_io_concurrency. The results were the same.


 > That's why I don't think the order of tests or variability in 
"hardware" performance affected the results.


We run many MS SQL server VMs in AWS with more than adequate performance.

AWS EBS performance is variable and depends on various factors, mainly 
the size of the volume and the size of the VM it is attached to. The 
bigger the VM, the more EBS “bandwidth” is available, especially if 
the VM is EBS Optimised.


The size of the disk determines the IOPS available, with smaller disks 
naturally getting less. However, even a small disk with (say) 300 IOPS 
is allowed to burst up to 3000 IOPS for a while and then gets 
clobbered. If you want predictable performance then get a bigger disk! 
If you really want maximum, predictable performance get an EBS 
Optimised VM and use Provisioned IOPS EBS volumes…. At a price!


Cheers,

Gary.

On 31/01/2018 15:01, Rick Otten wrote:

We moved our stuff out of AWS a little over a year ago because the
performance was crazy inconsistent and unpredictable.  I think
they do a lot of oversubscribing so you get strange sawtooth
performance patterns depending on who else is sharing your
infrastructure and what they are doing at the time.

The same unit of work would take 20 minutes each for several
hours, and then take 2 1/2 hours each for a day, and then back to
20 minutes, and sometimes anywhere in between for hours or days at
a stretch.  I could never tell the business when the processing
would be done, which made it hard for them to set expectations
with customers, promise deliverables, or manage the business. 
Smaller nodes seemed to be worse than larger nodes, I only have
theories as to why.  I never got good support from AWS to help me
figure out what was happening.

My first thought is to run the same test on different days of the
week and different times of day to see if the numbers change
radically.  Maybe spin up a node in another data center and
availability zone and try the test there too.

My real suggestion is to move to Google Cloud or Rackspace or
Digital Ocean or somewhere other than AWS.   (We moved to Google
Cloud and have been very happy there.  The performance is much
more consistent, the management UI is more intuitive, AND the cost
for equivalent infrastructure is lower too.)

On Wed, Jan 31, 2018 at 7:03 AM, Vitaliy Garnashevich

Re: effective_io_concurrency on EBS/gp2

2018-01-31 Thread Vitaliy Garnashevich
I've tried to re-run the test for some specific values of 
effective_io_concurrency. The results were the same.


That's why I don't think the order of tests or variability in "hardware" 
performance affected the results.


Regards,
Vitaliy

On 31/01/2018 15:01, Rick Otten wrote:
We moved our stuff out of AWS a little over a year ago because the 
performance was crazy inconsistent and unpredictable.  I think they do 
a lot of oversubscribing so you get strange sawtooth performance 
patterns depending on who else is sharing your infrastructure and what 
they are doing at the time.


The same unit of work would take 20 minutes each for several hours, 
and then take 2 1/2 hours each for a day, and then back to 20 minutes, 
and sometimes anywhere in between for hours or days at a stretch.  I 
could never tell the business when the processing would be done, which 
made it hard for them to set expectations with customers, promise 
deliverables, or manage the business.  Smaller nodes seemed to be 
worse than larger nodes, I only have theories as to why.  I never got 
good support from AWS to help me figure out what was happening.


My first thought is to run the same test on different days of the week 
and different times of day to see if the numbers change radically.  
Maybe spin up a node in another data center and availability zone and 
try the test there too.


My real suggestion is to move to Google Cloud or Rackspace or Digital 
Ocean or somewhere other than AWS.   (We moved to Google Cloud and 
have been very happy there.  The performance is much more consistent, 
the management UI is more intuitive, AND the cost for equivalent 
infrastructure is lower too.)



On Wed, Jan 31, 2018 at 7:03 AM, Vitaliy Garnashevich 
<vgarnashev...@gmail.com> wrote:


Hi,

I've tried to run a benchmark, similar to this one:


https://www.postgresql.org/message-id/flat/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com#CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com


CREATE TABLESPACE test OWNER postgres LOCATION '/path/to/ebs';

pgbench -i -s 1000 --tablespace=test pgbench

echo "" >test.txt
for i in 0 1 2 4 8 16 32 64 128 256 ; do
  sync; echo 3 > /proc/sys/vm/drop_caches; service postgresql restart
  echo "effective_io_concurrency=$i" >>test.txt
  psql pgbench -c "set effective_io_concurrency=$i; set
enable_indexscan=off; explain (analyze, buffers)  select * from
pgbench_accounts where aid between 1000 and 10000000 and abalance
!= 0;" >>test.txt
done

I get the following results:

effective_io_concurrency=0
 Execution time: 40262.781 ms
effective_io_concurrency=1
 Execution time: 98125.987 ms
effective_io_concurrency=2
 Execution time: 55343.776 ms
effective_io_concurrency=4
 Execution time: 52505.638 ms
effective_io_concurrency=8
 Execution time: 54954.024 ms
effective_io_concurrency=16
 Execution time: 54346.455 ms
effective_io_concurrency=32
 Execution time: 55196.626 ms
effective_io_concurrency=64
 Execution time: 55057.956 ms
effective_io_concurrency=128
 Execution time: 54963.510 ms
effective_io_concurrency=256
 Execution time: 54339.258 ms

The test was using 100 GB gp2 SSD EBS. More detailed query plans
are attached.

PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit

The results look really confusing to me in two ways. The first one
is that I've seen recommendations to set
effective_io_concurrency=256 (or more) on EBS. The other one is
that effective_io_concurrency=1 (the worst case) is actually the
default for PostgreSQL on Linux.

Thoughts?

Regards,
Vitaliy






effective_io_concurrency on EBS/gp2

2018-01-31 Thread Vitaliy Garnashevich

Hi,

I've tried to run a benchmark, similar to this one:

https://www.postgresql.org/message-id/flat/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com#CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com

CREATE TABLESPACE test OWNER postgres LOCATION '/path/to/ebs';

pgbench -i -s 1000 --tablespace=test pgbench

echo "" >test.txt
for i in 0 1 2 4 8 16 32 64 128 256 ; do
  sync; echo 3 > /proc/sys/vm/drop_caches; service postgresql restart
  echo "effective_io_concurrency=$i" >>test.txt
  psql pgbench -c "set effective_io_concurrency=$i; set 
enable_indexscan=off; explain (analyze, buffers)  select * from 
pgbench_accounts where aid between 1000 and 10000000 and abalance != 0;" 
>>test.txt

done

I get the following results:

effective_io_concurrency=0
 Execution time: 40262.781 ms
effective_io_concurrency=1
 Execution time: 98125.987 ms
effective_io_concurrency=2
 Execution time: 55343.776 ms
effective_io_concurrency=4
 Execution time: 52505.638 ms
effective_io_concurrency=8
 Execution time: 54954.024 ms
effective_io_concurrency=16
 Execution time: 54346.455 ms
effective_io_concurrency=32
 Execution time: 55196.626 ms
effective_io_concurrency=64
 Execution time: 55057.956 ms
effective_io_concurrency=128
 Execution time: 54963.510 ms
effective_io_concurrency=256
 Execution time: 54339.258 ms

The test was using 100 GB gp2 SSD EBS. More detailed query plans are 
attached.


PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 
5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit


The results look really confusing to me in two ways. The first one is 
that I've seen recommendations to set effective_io_concurrency=256 (or 
more) on EBS. The other one is that effective_io_concurrency=1 (the 
worst case) is actually the default for PostgreSQL on Linux.
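
For completeness, the setting can also be changed more persistently than with 
a per-session SET, e.g. per tablespace or cluster-wide (a sketch; "test" is 
the tablespace created above):

alter tablespace test set (effective_io_concurrency = 0);
-- or cluster-wide, followed by a config reload:
alter system set effective_io_concurrency = 0;
select pg_reload_conf();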


Thoughts?

Regards,
Vitaliy


effective_io_concurrency=0
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on pgbench_accounts  (cost=137192.84..1960989.89 rows=1 
width=97) (actual time=40261.322..40261.322 rows=0 loops=1)
   Recheck Cond: ((aid >= 1000) AND (aid <= 10000000))
   Rows Removed by Index Recheck: 23
   Filter: (abalance <> 0)
   Rows Removed by Filter: 9999001
   Heap Blocks: exact=97869 lossy=66050
   Buffers: shared hit=3 read=191240
   ->  Bitmap Index Scan on pgbench_accounts_pkey  (cost=0.00..137192.84 
rows=10540117 width=0) (actual time=3366.623..3366.623 rows=9999001 loops=1)
 Index Cond: ((aid >= 1000) AND (aid <= 10000000))
 Buffers: shared hit=3 read=27321
 Planning time: 17.285 ms
 Execution time: 40262.781 ms
(12 rows)

effective_io_concurrency=1
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on pgbench_accounts  (cost=137192.84..1960989.89 rows=1 
width=97) (actual time=98124.607..98124.607 rows=0 loops=1)
   Recheck Cond: ((aid >= 1000) AND (aid <= 10000000))
   Rows Removed by Index Recheck: 23
   Filter: (abalance <> 0)
   Rows Removed by Filter: 9999001
   Heap Blocks: exact=97869 lossy=66050
   Buffers: shared hit=3 read=191240
   ->  Bitmap Index Scan on pgbench_accounts_pkey  (cost=0.00..137192.84 
rows=10540117 width=0) (actual time=3373.380..3373.380 rows=9999001 loops=1)
 Index Cond: ((aid >= 1000) AND (aid <= 10000000))
 Buffers: shared hit=3 read=27321
 Planning time: 18.110 ms
 Execution time: 98125.987 ms
(12 rows)

effective_io_concurrency=2
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on pgbench_accounts  (cost=137192.84..1960989.89 rows=1 
width=97) (actual time=55340.663..55340.663 rows=0 loops=1)
   Recheck Cond: ((aid >= 1000) AND (aid <= 10000000))
   Rows Removed by Index Recheck: 23
   Filter: (abalance <> 0)
   Rows Removed by Filter: 9999001
   Heap Blocks: exact=97869 lossy=66050
   Buffers: shared hit=3 read=191240
   ->  Bitmap Index Scan on pgbench_accounts_pkey  (cost=0.00..137192.84 
rows=10540117 width=0) (actual time=3306.896..3306.896 rows=9999001 loops=1)
 Index Cond: ((aid >= 1000) AND (aid <= 10000000))
 Buffers: shared hit=3 read=27321
 Planning time: 30.986 ms
 Execution time: 55343.776 ms
(12 rows)

effective_io_concurrency=4
  

Re: Bitmap scan is undercosted?

2017-12-03 Thread Vitaliy Garnashevich

On 03/12/2017 03:27, Jeff Janes wrote:
Due to that, when I disable bitmapscans and seqscans, I start getting 
slow index scans on the wrong index, i2 rather than i1.  I don't know 
why he doesn't see that in his example.
When I increase effective_cache_size to 1024MB, I start getting the plan 
with the slower index i2, too.


*Bitmap Heap Scan* on public.aaa  (cost=12600.90..*23688**.70* rows=9488 
width=5) (actual time=107.529..*115.902* rows=9976 loops=1)
  ->  BitmapAnd  (cost=12600.90..12600.90 rows=9488 width=0) (actual 
time=105.133..105.133 rows=0 loops=1)
    ->  Bitmap Index Scan on i1  (cost=0.00..1116.43 rows=96000 
width=0) (actual time=16.313..16.313 rows=100508 loops=1)
    ->  Bitmap Index Scan on i2  (cost=0.00..11479.47 rows=988338 
width=0) (actual time=77.950..77.950 rows=1000200 loops=1)


*Index Scan* using i2 on public.aaa  (cost=0.44..*48227.31* rows=9488 
width=5) (actual time=0.020..*285.695* rows=9976 loops=1)


*Seq Scan* on public.aaa  (cost=0.00..*169248.54* rows=9488 width=5) 
(actual time=0.024..*966.469* rows=9976 loops=1)


This way the estimates and the actual times make more sense. But then 
there's the question - maybe it's that i1 runs too fast and is estimated 
incorrectly? Why does that happen?


Here are the complete plans with the two different kinds of index scans 
once again:


Index Scan using i1 on public.aaa  (cost=0.44..66621.56 rows=10340 
width=5) (actual time=0.027..47.075 rows=9944 loops=1)

  Output: num, flag
  Index Cond: (aaa.num = 1)
  Filter: aaa.flag
  Rows Removed by Filter: 89687
  Buffers: shared hit=39949
Planning time: 0.104 ms
Execution time: 47.351 ms

Index Scan using i2 on public.aaa  (cost=0.44..48227.31 rows=9488 
width=5) (actual time=0.020..285.695 rows=9976 loops=1)

  Output: num, flag
  Index Cond: (aaa.flag = true)
  Filter: (aaa.flag AND (aaa.num = 1))
  Rows Removed by Filter: 990224
  Buffers: shared hit=46984
Planning time: 0.098 ms
Execution time: 286.081 ms


// The test DB was populated with: create table aaa as select 
floor(random()*100)::int num, (random()*10 < 1)::bool flag from 
generate_series(1, 10000000) id;


Regards,
Vitaliy



Re: Bitmap scan is undercosted?

2017-12-03 Thread Vitaliy Garnashevich

On 02/12/2017 23:17, Jeff Janes wrote:
Right, so there is a cpu costing problem (which could only be fixed by 
hacking postgresql and recompiling it), but it is much smaller of a 
problem than the IO cost not being accurate due to the high hit rate. 
Fixing the CPU costing problem is unlikely to make a difference to 
your real query.  If you set the page costs to zero, what happens to 
your real query?
I can't reproduce the exact issue on the real database any more. The 
query started to use the slow bitmap scan recently, and had been doing 
so for some time lately, but now it's switched back to use the index 
scan. The table involved in the query gets modified a lot. It has 
hundreds of millions of rows. Lots of new rows are appended to it every 
day, the oldest rows are sometimes removed. The table is analyzed at 
least daily. It's possible that the statistics were updated and that caused 
the query to run differently. But I still would like to understand why 
that issue happened, and how to properly fix it, in case the issue returns.


But I doubt that the settings seq_page_cost = random_page_cost =
0.0 should actually be used.


Why not?  If your production server really has everything in memory 
during normal operation, that is the correct course of action.  If you 
ever restart the server, then you could have some unpleasant time 
getting it back up to speed again, but pg_prewarm could help with that.
In the real database, not everything is in memory. There are 200GB+ of 
RAM, but DB is 500GB+. The table involved in the query itself is 60GB+ 
of data and 100GB+ of indexes. I'm running the test case in a way where 
all reads are done from RAM, only to make it easier to reproduce and to 
avoid unrelated effects.
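
For reference, Jeff's pg_prewarm suggestion would look roughly like this for 
the test table and its indexes (a sketch):

create extension if not exists pg_prewarm;
select pg_prewarm('aaa'), pg_prewarm('i1'), pg_prewarm('i2');  -- returns the number of blocks loaded into shared buffers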


As far as I know, costs in Postgres were designed to be relative to 
seq_page_cost, which for that reason is usually defined as 1.0. Even if 
everything would be in RAM, accesses to the pages would still not have 
zero cost. Setting 0.0 just seems too extreme, as all other non-zero 
costs would become infinitely bigger.
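
For reference, the built-in defaults that these costs are relative to can be 
checked with:

show seq_page_cost;         -- 1.0 by default
show random_page_cost;      -- 4.0 by default
show cpu_tuple_cost;        -- 0.01 by default
show cpu_index_tuple_cost;  -- 0.005 by default
show cpu_operator_cost;     -- 0.0025 by default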
If you really want to target the plan with the BitmapAnd, you should 
increase cpu_index_tuple_cost and/or cpu_operator_cost but not 
increase cpu_tuple_cost.  That is because the  unselective bitmap 
index scan does not incur any cpu_tuple_cost, but does incur 
index_tuple and operator costs.  Unfortunately all other index scans 
in the system will also be skewed by such a change if you make the 
change system-wide.
Exactly. I'd like to understand why the worse plan is being chosen, and 
1) if it's fixable by tuning costs, to figure out the right settings 
that could be used in production, 2) if there is a bug in the Postgres 
optimizer, to bring some attention to it so that it's eventually 
fixed in a future release, 3) if Postgres is supposed to work this 
way, then at least I (and anyone who reads this thread) would 
understand it better.


Regards,
Vitaliy




Re: Bitmap scan is undercosted?

2017-12-01 Thread Vitaliy Garnashevich

On 02/12/2017 07:51, Jeff Janes wrote:
On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich 
<vgarnashev...@gmail.com> wrote:


On 02/12/2017 01:11, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy
Garnashevich wrote:

We recently had an issue in production, where a bitmap
scan was chosen
instead of an index scan. Despite being 30x slower,
the bitmap scan had
about the same cost as the index scan.
drop table if exists aaa;
create table aaa as select (id%100)::int num,
(id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?


For the aaa.num = 39 case, the faster index scan actually does hit 15 
times more buffers than the bitmap scan does.  While 1.5 is lot lower 
than 4.0, it is still much higher than the true cost of reading a page 
from the buffer cache.  This is why the index scan is getting punished. 
You could lower random_page_cost and seq_page_cost to 0, to remove 
those considerations.  (I'm not saying you should do this on your 
production system, but rather you should do it as a way to investigate 
the issue.  But it might make sense on production as well)

seq_page_cost = 1.0
random_page_cost = 1.0*
*explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=11536.74..20856.96 rows=10257 width=5) 
(actual time=108.338..108.338 rows=0 loops=1)
  ->  BitmapAnd  (cost=11536.74..11536.74 rows=10257 width=0) (actual 
time=108.226..108.226 rows=0 loops=1)
    ->  Bitmap Index Scan on i1  (cost=0.00..1025.43 rows=10 
width=0) (actual time=18.563..18.563 rows=10 loops=1)
    ->  Bitmap Index Scan on i2  (cost=0.00..10505.93 rows=1025666 
width=0) (actual time=78.493..78.493 rows=100 loops=1)


Index Scan using i1 on aaa  (cost=0.44..44663.58 rows=10257 width=5) 
(actual time=51.264..51.264 rows=0 loops=1)


Here I've used the filter num = 2, which produces rows=0 at the BitmapAnd 
node, and thus avoids a lot of work at the "Bitmap Heap Scan" node, while 
still leaving about the same proportion between bitmap and index - the 
bitmap scan is twice as slow but half as costly. It does not matter much 
which value is used for the filter, as long as it's other than num = 1.



seq_page_cost = 0.0
random_page_cost = 0.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=753.00..2003.00 rows=10257 width=5) 
(actual time=82.212..82.212 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..750.43 rows=10 width=0) 
(actual time=17.401..17.401 rows=10 loops=1)


Index Scan using i1 on aaa  (cost=0.44..1750.43 rows=10257 width=5) 
(actual time=49.766..49.766 rows=0 loops=1)


The bitmap plan was reduced to use only one bitmap scan, and finally it 
costs more than the index plan. But I doubt that the settings 
seq_page_cost = random_page_cost = 0.0 should actually be used. It should 
probably instead be something like 1.0/1.0 or 1.0/1.1, with the other 
costs increased to carry more weight.



# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

Bitmap Heap Scan on aaa  (cost=36882.97..46587.82 rows=10257 width=5) 
(actual time=106.045..106.045 rows=0 loops=1)
  ->  BitmapAnd  (cost=36882.97..36882.97 rows=10257 width=0) (actual 
time=105.966..105.966 rows=0 loops=1)
    ->  Bitmap Index Scan on i1  (cost=0.00..3276.74 rows=10 
width=0) (actual time=15.977..15.977 rows=10 loops=1)
    ->  Bitmap Index Scan on i2  (cost=0.00..33584.72 rows=1025666 
width=0) (actual time=79.208..79.208 rows=100 loops=1)


Index Scan using i1 on aaa  (cost=1.74..49914.89 rows=10257 width=5) 
(actual time=50.144..50.144 rows=0 loops=1)



# x5 tuple/operator costs - switched to single bitmap index scan, but 
now it costs more than the index scan

set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.05;
set cpu_index_tuple_cost = 0.025;
set cpu_operator_cost = 0.0125;

Bitmap Heap Scan on aaa  (cost=4040.00..54538.00 rows=10257 width=5) 
(actual time=82.338..82.338 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..4027.18 rows=10 width=0) 
(actual time=19.541..19.541 rows=1

Re: Bitmap scan is undercosted?

2017-12-01 Thread Vitaliy Garnashevich

On 02/12/2017 01:11, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For this test I'm using SSD and Windows (if that matters). On production 
we also use SSD, hence lower random_page_cost. But with the default 
random_page_cost=4.0, the difference in cost between the index scan plan 
and the bitmap scan plan is even bigger.


Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND 
attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while
testing.


Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x.  It makes sense that
this causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.

I changed the query from COUNT(*) TO * for easier to read explain:

CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM 
generate_series(1, 10000000) id;
CREATE INDEX i1 ON aaa(num);
CREATE INDEX i2 ON aaa (flag);
ANALYZE VERBOSE aaa;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND 
flag=true;
  Bitmap Heap Scan on public.aaa  (cost=20652.98..45751.75 rows=10754 width=5) 
(actual time=85.314..185.107 rows=10 loops=1)
->  BitmapAnd  (cost=20652.98..20652.98 rows=10754 width=0) (actual 
time=163.220..163.220 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..1965.93 rows=106333 width=0) 
(actual time=26.943..26.943 rows=10 loops=1)
  ->  Bitmap Index Scan on i2  (cost=0.00..18681.42 rows=1011332 
width=0) (actual time=133.804..133.804 rows=100 loops=1)

..which is what's wanted with no planner hints (PG10.1 here).
Yes, that's what you get without planner hints, but it's strange to get 
this plan when there is another one which runs 2-3 times faster, yet 
happens to be estimated at twice the cost of the one with bitmap 
scans:


# set enable_bitmapscan = off; set enable_indexscan = on;  set 
enable_seqscan = off;

# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) 
(actual time=0.020..57.765 rows=10 loops=1)


vs.

# set enable_bitmapscan = on;  set enable_indexscan = off; set 
enable_seqscan = off;

# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) 
(actual time=122.137..182.811 rows=10 loops=1)
  ->  BitmapAnd  (cost=13099.33..13099.33 rows=10428 width=0) (actual 
time=110.168..110.168 rows=0 loops=1)
    ->  Bitmap Index Scan on i1  (cost=0.00..1181.44 rows=101667 
width=0) (actual time=20.845..20.845 rows=10 loops=1)
    ->  Bitmap Index Scan on i2  (cost=0.00..11912.43 rows=1025666 
width=0) (actual time=80.323..80.323 rows=100 loops=1)




Same on PG95:
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE 
num=1 AND flag=true;
  Bitmap Heap Scan on public.aaa  (cost=19755.64..43640.32 rows=9979 width=5) 
(actual time=230.017..336.583 rows=10 loops=1)
->  BitmapAnd  (cost=19755.64..19755.64 rows=9979 width=0) (actual 
time=205.242..205.242 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..1911.44 rows=103334 width=0) 
(actual time=24.911..24.911 rows=10 loops=1)
  ->  Bitmap Index Scan on i2  (cost=0.00..17838.96 rows=965670 
width=0) (actual time=154.237..154.237 rows=100 loops=1)

The rowcount is off, but not a critical issue without a join.

Justin






Re: Bitmap scan is undercosted?

2017-12-01 Thread Vitaliy Garnashevich

On 01/12/2017 20:34, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

Me too, see also:
https://www.postgresql.org/message-id/flat/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA%40mail.gmail.com#cah2-wzkrtggiy_lkquu-oviyp6y_hhz-a1ywacpy4tcywv1...@mail.gmail.com


drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";985;44248;226
"i1";985;27422;365
"i2";985;27422;365

The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x.  It makes sense that
this causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.
I don't think the planner is smart enough to account for correlation 
between values in different columns. When different values are used in 
the filter (num=2, num=39, num=74), the query actually runs faster, while 
still being about twice as slow as using an index scan. But the cost 
does not change much. It jumps up and down for different values, but 
it's still close to the initial value.


Aggregate  (cost=24239.02..24239.03 rows=1 width=8) (actual 
time=105.239..105.239 rows=1 loops=1)

  Output: count(*)
  Buffers: shared hit=3011
  ->  Bitmap Heap Scan on public.aaa  (cost=12812.05..24214.48 
rows=9816 width=0) (actual time=105.236..105.236 rows=0 loops=1)

    Output: num, flag
    Recheck Cond: (aaa.num = 39)
    Filter: aaa.flag
    Buffers: shared hit=3011
    ->  BitmapAnd  (cost=12812.05..12812.05 rows=9816 width=0) 
(actual time=105.157..105.157 rows=0 loops=1)

  Buffers: shared hit=3011
  ->  Bitmap Index Scan on i1  (cost=0.00..1134.94 
rows=97667 width=0) (actual time=15.725..15.725 rows=10 loops=1)

    Index Cond: (aaa.num = 39)
    Buffers: shared hit=276
  ->  Bitmap Index Scan on i2  (cost=0.00..11671.96 
rows=1005003 width=0) (actual time=77.920..77.920 rows=100 loops=1)

    Index Cond: (aaa.flag = true)
    Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 105.553 ms

Aggregate  (cost=65785.99..65786.00 rows=1 width=8) (actual 
time=48.587..48.587 rows=1 loops=1)

  Output: count(*)
  Buffers: shared hit=44524
  ->  Index Scan using i1 on public.aaa  (cost=0.44..65761.45 rows=9816 
width=0) (actual time=48.583..48.583 rows=0 loops=1)

    Output: num, flag
    Index Cond: (aaa.num = 39)
    Filter: aaa.flag
    Rows Removed by Filter: 10
    Buffers: shared hit=44524
Planning time: 0.097 ms
Execution time: 48.620 ms



The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read.  num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table.  But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page).  Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too.  You could confirm that by connecting strace to the
backend before starting the query.

Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case).  That's an elegant
test case, so thanks.

I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows.  I see the table+index is 559MB.
 table | ~count | size   | toast   | idx    | size + toast + idx
-------+--------+--------+---------+--------+--------------------
 aaa   | 994    | 346 MB | 0 bytes | 428 MB | 774 MB


But the plan says all buffers are "shared hit", and none "read", so 
that's probably not an issue.




I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan.  Note that
some things (like max()) that can use an btree index cannot use brin.  PG10.1
has WITH autosummarize, which was import

Bitmap scan is undercosted?

2017-12-01 Thread Vitaliy Garnashevich

Hi,

We recently had an issue in production, where a bitmap scan was chosen 
instead of an index scan. Despite being 30x slower, the bitmap scan had 
about the same cost as the index scan.


I've found some cases where similar issues with bitmap scans were 
reported before:


https://www.postgresql.org/message-id/flat/1456154321.976561.528310154.6A623C0E%40webmail.messagingengine.com

https://www.postgresql.org/message-id/flat/CA%2BwPC0MRMhF_8fD9dc8%2BQWZQzUvHahPRSv%3DxMtCmsVLSsy-p0w%40mail.gmail.com

I've made a synthetic test, which kind of reproduces the issue:

shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB

set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from 
generate_series(1, 10000000) id;

create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint, 
(reltuples/relpages)::bigint tpp from pg_class where relname 
in('aaa','i1','i2') order by relname;

"aaa";985;44248;226
"i1";985;27422;365
"i2";985;27422;365

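For reference, the per-predicate selectivity estimates behind the plans below 
can also be checked in isolation (just for illustration):

explain select * from aaa where num = 1;        -- planner expects roughly 1% of the rows
explain select * from aaa where flag = true;    -- planner expects roughly 10% of the rows
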
I've been running the same query while enabling and disabling different 
kinds of scans:


1) set enable_bitmapscan = on;  set enable_indexscan = off; set 
enable_seqscan = off;
2) set enable_bitmapscan = off; set enable_indexscan = on;  set 
enable_seqscan = off;
3) set enable_bitmapscan = off; set enable_indexscan = off; set 
enable_seqscan = on;


The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Here are the results for PostgreSQL 9.6 (for 9.3 and 10.1 the results 
are very similar):


1) Aggregate  (cost=24821.70..24821.71 rows=1 width=8) (actual 
time=184.591..184.591 rows=1 loops=1)

  Output: count(*)
  Buffers: shared hit=47259
  ->  Bitmap Heap Scan on public.aaa  (cost=13038.21..24796.22 
rows=10189 width=0) (actual time=122.492..178.006 rows=100000 loops=1)

    Output: num, flag
    Recheck Cond: (aaa.num = 1)
    Filter: aaa.flag
    Heap Blocks: exact=44248
    Buffers: shared hit=47259
    ->  BitmapAnd  (cost=13038.21..13038.21 rows=10189 width=0) 
(actual time=110.699..110.699 rows=0 loops=1)

  Buffers: shared hit=3011
  ->  Bitmap Index Scan on i1  (cost=0.00..1158.94 
rows=99667 width=0) (actual time=19.600..19.600 rows=100000 loops=1)

    Index Cond: (aaa.num = 1)
    Buffers: shared hit=276
  ->  Bitmap Index Scan on i2  (cost=0.00..11873.92 
rows=1022332 width=0) (actual time=81.676..81.676 rows=1000000 loops=1)

    Index Cond: (aaa.flag = true)
    Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 184.988 ms

2) Aggregate  (cost=67939.09..67939.10 rows=1 width=8) (actual 
time=67.510..67.510 rows=1 loops=1)

  Output: count(*)
  Buffers: shared hit=44524
  ->  Index Scan using i1 on public.aaa  (cost=0.44..67910.95 
rows=11256 width=0) (actual time=0.020..61.180 rows=100000 loops=1)

    Output: num, flag
    Index Cond: (aaa.num = 1)
    Filter: aaa.flag
    Buffers: shared hit=44524
Planning time: 0.096 ms
Execution time: 67.543 ms

3) Aggregate  (cost=169276.49..169276.50 rows=1 width=8) (actual 
time=977.063..977.063 rows=1 loops=1)

  Output: count(*)
  Buffers: shared hit=44248
  ->  Seq Scan on public.aaa  (cost=0.00..169248.35 rows=11256 width=0) 
(actual time=0.018..969.294 rows=100000 loops=1)

    Output: num, flag
    Filter: (aaa.flag AND (aaa.num = 1))
    Rows Removed by Filter: 9900000
    Buffers: shared hit=44248
Planning time: 0.099 ms
Execution time: 977.094 ms


The bitmap scan version runs more than twice as slow as the one with 
the index scan, while being costed at less than half as much.


I've tried to increase cpu_tuple_cost and cpu_index_tuple_cost, and this 
behavior remains even after a 6x increase in those values, although the 
difference in costs becomes much smaller. After increasing the settings by 
more than 6x, PostgreSQL decides to use a different plan for bitmap scans, 
so it's hard to draw conclusions at that point.
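
For reference, a 6x increase relative to the settings used above would mean 
roughly the following (a sketch of what is meant, not necessarily the exact 
values tried):

set cpu_tuple_cost = 0.06;        -- 6x the 0.01 used above
set cpu_index_tuple_cost = 0.03;  -- 6x the 0.005 used above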


Could such cases be fixed by tuning cost settings, or is that just 
how PostgreSQL estimates bitmap scans, so it can't be fixed without 
modifying the optimizer? Or am I missing something, and this is the 
expected behavior? Thoughts?


Regards,
Vitaliy