Hi,

I have to provide a summary of how much spaces is used in the large objects 
table based on a group by condition.
I would expect an index only scan on the large object table, but a full seq 
scan that last for hours is performed.

BigSql distribution
PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by 
MSYS2 project) 4.9.2, 64-bit
Win Server 2012 R2, 8GB RAM
pg server mem settings:
effective_cache_size | 6GB
maintenance_work_mem | 819MB
random_page_cost     | 2
shared_buffers       | 2GB
work_mem             | 32MB

Testcase 1: Here is a simplified query, timing and the explain plan:
SELECT ima.sit_cod, COUNT(*)*2048*4/3
  FROM images ima JOIN pg_largeobject ON (loid=ima.val)
GROUP BY ima.sit_cod;
Time: 343997.661 ms (about 6 min) ran on a small DB, took 4hrs on a ~1TB table

HashAggregate  (cost=2452378.86..2452379.01 rows=15 width=14)
  Group Key: ima.sit_cod
  ->  Hash Join  (cost=1460.40..2418245.74 rows=6826625 width=6)
        Hash Cond: (pg_largeobject.loid = ima.val)
--------->  Seq Scan on pg_largeobject  (cost=0.00..2322919.25 rows=6826625 
width=4)
        ->  Hash  (cost=1114.62..1114.62 rows=27662 width=10)
              ->  Seq Scan on images ima  (cost=0.00..1114.62 rows=27662 
width=10)


Testcase 2: A simple count(*) for a specific group (small group) perform an 
Index Only Scan and last few secs.
SELECT COUNT(*)
  FROM images ima JOIN pg_largeobject ON (loid=ima.val)
WHERE sit_cod='W8213';
count
-------
  8599
Time: 12.090 ms

Aggregate  (cost=11930.30..11930.31 rows=1 width=8)
  ->  Nested Loop  (cost=2.87..11918.58 rows=4689 width=0)
        ->  Bitmap Heap Scan on images ima  (cost=2.43..37.81 rows=19 width=4)
              Recheck Cond: ((sit_cod)::text = 'W8213'::text)
              ->  Bitmap Index Scan on ima_pk  (cost=0.00..2.43 rows=19 width=0)
                    Index Cond: ((sit_cod)::text = 'W8213'::text)
--------->  Index Only Scan using pg_largeobject_loid_pn_index on 
pg_largeobject  (cost=0.43..621.22 rows=408 width=4)
              Index Cond: (loid = ima.val)


Testcase 3: However, larger group still perform full seq scan
SELECT COUNT(*)
  FROM images ima JOIN pg_largeobject ON (loid=ima.val)
WHERE sit_cod='W8317';
  count
---------
2209704
Time: 345638.118 ms (about 6 min)

Aggregate  (cost=2369363.01..2369363.02 rows=1 width=8)
  ->  Hash Join  (cost=1125.63..2365419.35 rows=1577463 width=0)
        Hash Cond: (pg_largeobject.loid = ima.val)
--------->  Seq Scan on pg_largeobject  (cost=0.00..2322919.25 rows=6826625 
width=4)
        ->  Hash  (cost=1045.73..1045.73 rows=6392 width=4)
              ->  Bitmap Heap Scan on images ima  (cost=127.83..1045.73 
rows=6392 width=4)
                    Recheck Cond: ((sit_cod)::text = 'W8317'::text)
                    ->  Bitmap Index Scan on ima_pk  (cost=0.00..126.23 
rows=6392 width=0)
                          Index Cond: ((sit_cod)::text = 'W8317'::text)

Pretty sure that using the index would lead to much better perf.
Any idea of what can be done?

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com<http://www.ultra-forensictechnology.com>

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com<http://www.ultra-forensictechnology.com>

Reply via email to