> Please provide some specifics.  It's been a very long time since the
> planner was completely unaware of the size of such a table.  Lack of
> stats is certainly a handicap, but I'm not convinced it should result
> in horrible plans.  Maybe a more appropriate answer to this type of
> issue is to tweak some of the default selectivity numbers.

Sure.  See attached output.  This is from 8.2.9, but the behavior on
HEAD is similar.  The first query runs fine both before and after the
ANALYZE, but the second, which adds a filter condition on b, is about
6X slower prior to the ANALYZE.

I don't see how you're going to fix this problem by tweaking the
selectivity estimates.  If it were possible to generate good query
plans without selectivity estimates derived from the actual table
contents, we wouldn't need ANALYZE in the first place.
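
To put numbers on that, here is a small Python sketch using figures
copied from the EXPLAIN ANALYZE output in this message.  The helper name
is made up for illustration.  The 0.005 figure is an assumption that the
planner falls back to a fixed default selectivity for the (b % 1000) = 0
clause; it is consistent with the row estimates shown, but treat it as a
guess rather than a statement about the planner internals.

```python
# Figures copied from the EXPLAIN ANALYZE output in this message.
EST_A_BEFORE = 9510        # estimated rows for a = 1, before ANALYZE
EST_A_AFTER = 995341       # estimated rows for a = 1, after ANALYZE
ACTUAL_A = 1000000         # rows that really match a = 1

# Assumption: fixed default selectivity applied to (b % 1000) = 0.
ASSUMED_DEFAULT_SEL = 0.005


def misestimate_factor(estimated, actual):
    """How many times larger the real row count is than the estimate."""
    return actual / estimated


factor_before = misestimate_factor(EST_A_BEFORE, ACTUAL_A)
factor_after = misestimate_factor(EST_A_AFTER, ACTUAL_A)

# Combined estimates for "a = 1 and (b % 1000) = 0" under the assumption.
combined_before = round(EST_A_BEFORE * ASSUMED_DEFAULT_SEL)
combined_after = round(EST_A_AFTER * ASSUMED_DEFAULT_SEL)

# Runtimes of the second query, in milliseconds, from the same output.
slowdown = 495.186 / 77.084

print(f"a = 1 underestimated {factor_before:.0f}x before ANALYZE")
print(f"a = 1 underestimated {factor_after:.3f}x after ANALYZE")
print(f"combined estimate: {combined_before} before, {combined_after} after")
print(f"second query slowdown without stats: {slowdown:.1f}x")
```

Without stats the estimate for a = 1 is off by about two orders of
magnitude, because half the table actually matches.  No single default
would be right both for this column and for a highly selective one.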

>> And maybe also do the same thing if the table has grown significantly
>> (not sure what the threshold should be) since the last ANALYZE.
> Autovacuum already does this type of thing.

It's asynchronous, though.  Frequently, you want to load a bunch of
data into a table and then immediately execute a query against it, or
possibly several queries.  It's pretty annoying to have to write logic
that says - ok, if the number of rows that we just inserted was really
big relative to what was already in the table, then do an ANALYZE on
the table before issuing the SELECT, otherwise skip it.
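
A rough sketch of that client-side logic, in Python.  The function name
and the 10% threshold are hypothetical, picked only for illustration;
nothing here is provided by PostgreSQL, and the caller is assumed to
know the row counts before and after the load.

```python
def needs_analyze(rows_before, rows_inserted, growth_threshold=0.1):
    """Decide whether to run ANALYZE before querying a freshly loaded table.

    rows_before      -- rows in the table before the load (0 if just created)
    rows_inserted    -- rows added by the load
    growth_threshold -- hypothetical cutoff: fraction of the old size that
                        counts as "really big"
    """
    if rows_inserted <= 0:
        # Nothing was loaded, so the existing stats are as good as before.
        return False
    if rows_before <= 0:
        # Newly created or empty table: there are no useful stats at all.
        return True
    return rows_inserted / rows_before >= growth_threshold


# The bulk_data case from this message: new table, two million rows loaded.
if needs_analyze(rows_before=0, rows_inserted=2000000):
    statement = "ANALYZE bulk_data"
else:
    statement = None
print(statement)
```

Every application that bulk-loads and then queries ends up carrying some
version of this, which is the annoyance.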

I would be happy enough if we could recognize CREATE TABLE ... insert
a bunch of data ... SELECT as a case where we need to force a
synchronous ANALYZE - because in my experience you almost always do.
Recognizing the case where the table has grown a lot since the last
ANALYZE is probably harder, and a bit less important, but it would
surely be nice if it could be done.

portal=# create table bulk_data (a integer, b integer, primary key (a, b));
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "bulk_data_pkey" for table "bulk_data"
portal=# insert into bulk_data select 1,generate_series(1,1000000);
INSERT 0 1000000
portal=# insert into bulk_data select 2,generate_series(1,1000000);
INSERT 0 1000000
portal=# explain analyze select * from bulk_data where a = 1 limit 100;
                                                               QUERY PLAN
 Limit  (cost=0.00..272.33 rows=100 width=8) (actual time=0.461..0.875 rows=100
   ->  Index Scan using bulk_data_pkey on bulk_data  (cost=0.00..25898.89 rows=9510 width=8) (actual time=0.454..0.630 rows=100 loops=1)
         Index Cond: (a = 1)
 Total runtime: 1.094 ms
(4 rows)

portal=# explain analyze select * from bulk_data where a = 1 and (b % 1000) = 0 limit 100;
                                                                 QUERY PLAN
 Limit  (cost=183.81..10439.27 rows=48 width=8) (actual time=409.110..494.701 rows=100 loops=1)
   ->  Bitmap Heap Scan on bulk_data  (cost=183.81..10439.27 rows=48 width=8) (actual time=409.103..494.228 rows=100 loops=1)
         Recheck Cond: (a = 1)
         Filter: ((b % 1000) = 0)
         ->  Bitmap Index Scan on bulk_data_pkey  (cost=0.00..183.79 rows=9510 width=0) (actual time=404.616..404.616 rows=1000000 loops=1)
               Index Cond: (a = 1)
 Total runtime: 495.186 ms
(7 rows)

portal=# analyze bulk_data;
portal=# explain analyze select * from bulk_data where a = 1 limit 100;
                                                     QUERY PLAN
 Limit  (cost=0.00..3.50 rows=100 width=8) (actual time=0.037..0.435 rows=100
   ->  Seq Scan on bulk_data  (cost=0.00..34804.20 rows=995341 width=8) (actual time=0.031..0.184 rows=100 loops=1)
         Filter: (a = 1)
 Total runtime: 0.676 ms
(4 rows)

portal=# explain analyze select * from bulk_data where a = 1 and (b % 1000) = 0 limit 100;
                                                    QUERY PLAN
 Limit  (cost=0.00..900.23 rows=100 width=8) (actual time=1.261..76.754 rows=100 loops=1)
   ->  Seq Scan on bulk_data  (cost=0.00..44804.28 rows=4977 width=8) (actual time=1.255..76.381 rows=100 loops=1)
         Filter: ((a = 1) AND ((b % 1000) = 0))
 Total runtime: 77.084 ms
(4 rows)

