Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Andres Freund
On 2013-12-10 19:55:12 -0500, Tom Lane wrote:
 I was surprised to see that my back-patches of the recent SubLink
 unpleasantness were failing on many of the buildfarm members, but
 only in the 9.1 and 9.0 branches.  The difficulty appears to be
 that the EXPLAIN output for the new test query changes depending on
 whether or not tenk1 has been analyzed yet.  In 9.2 and up,
 it reliably has been, because create_index runs first and that script
 does this:

 create_index.sql:901:vacuum analyze tenk1; -- ensure we get consistent plans here

 so depending on timing, one of those might have gotten the job done,
 or maybe autovacuum would show up in time to save the day.

 We need a more consistent strategy for this :-(

Agreed, although I have no clue what it should look like. As a further
datapoint I'll add that installcheck already regularly fails in HEAD if
you have a HS standby connected via SR and hot_standby_feedback=on on
the standby. Some plans just change from index(-only) scans to sequential
scans, presumably because the lower xmin horizon changed the
stats. Since there's nothing running on the standby in those cases,
there has to be a pretty damn tiny window here somewhere.
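For concreteness, the setup being described is a primary with a hot-standby
replica sending feedback. A minimal sketch of the standby-side configuration
(9.3-era file layout; the connection string is a placeholder, not from this
thread):

```
# postgresql.conf on the standby
hot_standby = on
hot_standby_feedback = on    # report the standby's xmin back to the primary

# recovery.conf on the standby (pre-9.6 style)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432'   # placeholder
```

With feedback on, the primary's xmin horizon is held back by the standby,
which is what can keep vacuum from marking pages all-visible.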

I've wondered whether we could fix that by a) more explicit
vacuum/analyzes b) a function waiting for quiescent state.

Arguably we could just define that as unsupported, but given there's no
testing of recovery at all apart from that, it doesn't seem like a good
idea.
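Option (b) above could look something like the following. This is purely a
hypothetical sketch, not an existing function; the name, the polling
approach, and the use of pg_stat_activity are all illustrative:

```sql
-- Hypothetical "wait for quiescent state" helper, as in option (b).
-- Polls until no other backend has an open transaction that could hold
-- back the xmin horizon.  Names and approach are illustrative only.
CREATE FUNCTION wait_for_quiescence() RETURNS void AS $$
BEGIN
  LOOP
    PERFORM 1 FROM pg_stat_activity
      WHERE pid <> pg_backend_pid()
        AND xact_start IS NOT NULL;
    EXIT WHEN NOT FOUND;          -- no other open transactions: quiescent
    PERFORM pg_sleep(0.1);        -- otherwise wait and re-check
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```

A regression script could call this before a vacuum whose all-visible
marking the following tests depend on.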

There's also fun in running with vacuum_defer_cleanup_age > 0, but I
don't think there's much chance of supporting that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 I haven't touched matview.sql here; that seems like a distinct issue.


I'll fix that.

--

Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:

 I haven't touched matview.sql here; that seems like a distinct
 issue.

 I'll fix that.

Done.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-12-10 19:55:12 -0500, Tom Lane wrote:
 We need a more consistent strategy for this :-(

 Agreed, although I have no clue what it should look like. As a further
 datapoint I'll add that installcheck already regularly fails in HEAD if
 you have a HS standby connected via SR and hot_standby_feedback=on on
 the standby. Some plans just change from index(-only) scans to sequential
 scans, presumably because the lower xmin horizon changed the
 stats. Since there's nothing running on the standby in those cases,
 there has to be a pretty damn tiny window here somewhere.

The case in create_index does a vacuum analyze tenk1 and expects
to get an index-only scan in the very next SQL command.  So any delay
in considering the table all-visible could break that test.  I'm not
sure if that's what you're talking about though.  We could easily
create some more delay for that case, for instance by moving the
vacuum step to copy.sql as I was idly speculating about upthread.
Do you remember offhand where the failures are?
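The sequence in question is roughly the following (the query is the one
from the regression diffs in this thread; this is a paraphrase of the
create_index.sql test, not a verbatim quote):

```sql
-- create_index.sql pattern: vacuum, then immediately expect an
-- index-only scan.  If the vacuum could not mark the heap all-visible
-- (e.g. because of an old xmin horizon), the plan degrades.
vacuum analyze tenk1;
explain (costs off)
  select unique1 from tenk1 where unique1 in (1,42,7) order by unique1;
-- hoped-for plan: Index Only Scan using tenk1_unique1 on tenk1
```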

regards, tom lane




Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Andres Freund
On 2013-12-11 10:07:19 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-12-10 19:55:12 -0500, Tom Lane wrote:
  We need a more consistent strategy for this :-(

  Agreed, although I have no clue what it should look like. As a further
  datapoint I'll add that installcheck already regularly fails in HEAD if
  you have a HS standby connected via SR and hot_standby_feedback=on on
  the standby. Some plans just change from index(-only) scans to sequential
  scans, presumably because the lower xmin horizon changed the
  stats. Since there's nothing running on the standby in those cases,
  there has to be a pretty damn tiny window here somewhere.
 
 The case in create_index does a vacuum analyze tenk1 and expects
 to get an index-only scan in the very next SQL command.  So any delay
 in considering the table all-visible could break that test.  I'm not
 sure if that's what you're talking about though.  We could easily
 create some more delay for that case, for instance by moving the
 vacuum step to copy.sql as I was idly speculating about upthread.
 Do you remember offhand where the failures are?

No, but they are easy enough to reproduce. Out of 10 runs, I've attached
the one with the most failures and checked that it seems to contain all
the failures from other runs. All of them probably could be fixed by
moving things around, but I am not sure how maintainable that approach
is :/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
--- /home/andres/src/postgresql/src/test/regress/expected/create_index.out  
2013-12-08 19:57:01.646559353 +0100
+++ 
/home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/create_index.out
  2013-12-11 16:54:00.043641015 +0100
@@ -2720,11 +2720,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-                  QUERY PLAN                   
------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                    QUERY PLAN                     
+---------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)
 
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)

==

--- /home/andres/src/postgresql/src/test/regress/expected/subselect.out 
2013-12-11 16:42:18.791039738 +0100
+++ 
/home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/subselect.out
 2013-12-11 16:54:06.823376218 +0100
@@ -727,7 +727,7 @@
    ->  Seq Scan on public.tenk1 b
          Output: b.unique1, b.unique2, b.two, b.four, b.ten, b.twenty, b.hundred, b.thousand, b.twothousand, b.fivethous, b.tenthous, b.odd, b.even, b.stringu1, b.stringu2, b.string4
    SubPlan 1
-     ->  Index Only Scan using tenk1_unique1 on public.tenk1 a
+     ->  Seq Scan on public.tenk1 a
            Output: a.unique1
 (10 rows)
 

==

--- /home/andres/src/postgresql/src/test/regress/expected/join.out  
2013-12-08 19:57:01.649892559 +0100
+++ 
/home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/join.out  
2013-12-11 16:54:08.439979745 +0100
@@ -3354,16 +3354,16 @@
 explain (costs off)
   select count(*) from tenk1 a,
 tenk1 b join lateral (values(a.unique1)) ss(x) on b.unique2 = ss.x;
-                          QUERY PLAN                          
---------------------------------------------------------------
+              QUERY PLAN               
+---------------------------------------
  Aggregate
    ->  Hash Join
          Hash Cond: ("*VALUES*".column1 = b.unique2)
          ->  Nested Loop
-               ->  Index Only Scan using tenk1_unique1 on tenk1 a
+               ->  Seq Scan on tenk1 a
                ->  Values Scan on "*VALUES*"
          ->  Hash
-               ->  Index Only Scan using tenk1_unique2 on tenk1 b
+               ->  Seq Scan on tenk1 b
 (8 rows)
 
 select count(*) from tenk1 a,

==

--- /home/andres/src/postgresql/src/test/regress/expected/rowtypes.out  
2013-12-08 19:14:10.614689754 +0100
+++ 
/home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/rowtypes.out
  2013-12-11 16:54:31.072429133 +0100
@@ -240,11 +240,15 @@
 select thousand, tenthous from tenk1
 where (thousand, tenthous) = (997, 5000)
 order by thousand, tenthous;
-QUERY PLAN 

- 

Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Kevin Grittner kgri...@ymail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 I haven't touched matview.sql here; that seems like a distinct issue.

 I'll fix that.

 Done.

Thanks.

regards, tom lane




Re: [HACKERS] Why the buildfarm is all pink

2013-12-11 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-12-11 10:07:19 -0500, Tom Lane wrote:
 Do you remember offhand where the failures are?

 No, but they are easy enough to reproduce. Out of 10 runs, I've attached
 the one with the most failures and checked that it seems to contain all
 the failures from other runs. All of them probably could be fixed by
 moving things around, but I am not sure how maintainable that approach
 is :/

Thanks for doing the legwork.  These all seem to be cases where the
planner decided against doing an index-only scan on tenk1, which is
presumably because its relallvisible fraction is too low.  But these are
later in the test series than the vacuum analyze tenk1 that's currently
present in create_index, and most of them are even later than the
database-wide VACUUM in sanity_check.  So those vacuums are failing to
mark the table as all-visible, even though it's not changed since the COPY
test.  This seems odd.  Do you know why your slave server is holding back
the xmin horizon so much?
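For what it's worth, both quantities in question can be inspected directly.
The pg_class columns are standard; note that backend_xmin in
pg_stat_replication is a 9.4-era addition, so on older branches the
walsender's horizon has to be found some other way:

```sql
-- How much of tenk1 the last vacuum could mark all-visible:
SELECT relpages, relallvisible FROM pg_class WHERE relname = 'tenk1';

-- On the primary: the xmin each feedback-sending standby holds back
-- (backend_xmin appears in pg_stat_replication as of 9.4):
SELECT pid, application_name, backend_xmin FROM pg_stat_replication;
```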

After looking at this, I conclude that moving the vacuums earlier would
probably make things worse not better, because the critical interval seems
to be from the COPY TO tenk1 command to the vacuum command.  So the idea
of putting vacuums into the COPY test is a bad one, and I'll proceed with
the patch I posted yesterday for moving the ANALYZE steps around.  I think
fixing what you're seeing is going to be a different issue.

regards, tom lane




Re: [HACKERS] Why the buildfarm is all pink

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 7:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 This doesn't make me happy.  Aside from the sheer waste of cycles
 involved in re-analyzing the entire regression database, this
 test runs in parallel with half a dozen others, and it could cause
 plan instability in those.  Of course, if it does, then most likely
 those tests have a hazard from autovacuum anyway.  But this still
 looks to me like a poor bit of test design.

Agreed.

 Anyway, bottom line is that I think we need to institute, and
 back-patch, some consistent scheme for when to analyze the standard
 tables during the regression tests, so that we don't have hazards
 like this for tests that want to check what plan gets selected.

 Comments?

Everything you're saying sounds reasonable from here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why the buildfarm is all pink

2013-12-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Dec 10, 2013 at 7:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Anyway, bottom line is that I think we need to institute, and
 back-patch, some consistent scheme for when to analyze the standard
 tables during the regression tests, so that we don't have hazards
 like this for tests that want to check what plan gets selected.
 
 Comments?

 Everything you're saying sounds reasonable from here.

I experimented with this and found out that an across-the-board ANALYZE
in the copy test causes a number of existing plan results to change,
because actually the very small tables like int4_tbl have never been
analyzed at all in the past.  Now, that might be just fine, or it might
be that those tests would no longer test the code path they were meant
to test.  I lack the energy right now to examine each one and decide
whether the change is OK.  What I propose instead is that the copy test
only analyze the tables that it itself loaded.  This lets us get rid
of most of the ad-hoc ANALYZEs of pre-existing tables, as in the attached
proposed patch against HEAD.  I ended up leaving in the vacuum analyze
tenk1 in create_index.sql, because it turns out what that is actually
accomplishing is to fill the visibility map so that an index-only scan
will be chosen.  Maybe we should change all the added ANALYZEs in the
copy test to VACUUM ANALYZEs?  That would be redundant though with the
database-wide VACUUM in sanity_check.sql, which unfortunately runs after
create_index.sql.
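The VACUUM ANALYZE alternative floated above would amount to something like
this in the copy test (a sketch only; whether the extra vacuums buy anything
given the later database-wide VACUUM is exactly the open question):

```sql
-- Instead of plain ANALYZE, vacuum too, so the visibility map is
-- populated early enough for index-only-scan plans in later tests.
VACUUM ANALYZE tenk1;
VACUUM ANALYZE onek;
-- ... and likewise for each table the copy test loads
```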

I haven't touched matview.sql here; that seems like a distinct issue.

regards, tom lane

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index e17881c..c05b39c 100644
*** a/src/test/regress/expected/aggregates.out
--- b/src/test/regress/expected/aggregates.out
*** FROM bool_test;
*** 506,512 
  -- Test cases that should be optimized into indexscans instead of
  -- the generic aggregate implementation.
  --
- analyze tenk1;		-- ensure we get consistent plans here
  -- Basic cases
  explain (costs off)
select min(unique1) from tenk1;
--- 506,511 
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 232a233..0f0c638 100644
*** a/src/test/regress/expected/alter_table.out
--- b/src/test/regress/expected/alter_table.out
*** ALTER INDEX tmp_onek_unique1 RENAME TO o
*** 140,146 
  CREATE VIEW tmp_view (unique1) AS SELECT unique1 FROM tenk1;
  ALTER TABLE tmp_view RENAME TO tmp_view_new;
  -- hack to ensure we get an indexscan here
- ANALYZE tenk1;
  set enable_seqscan to off;
  set enable_bitmapscan to off;
  -- 5 values, sorted
--- 140,145 
diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index d47861e..23b3902 100644
*** a/src/test/regress/expected/arrays.out
--- b/src/test/regress/expected/arrays.out
*** SELECT 0 || ARRAY[1,2] || 3 AS {0,1,2,3
*** 421,427 
   {0,1,2,3}
  (1 row)
  
- ANALYZE array_op_test;
  SELECT * FROM array_op_test WHERE i @> '{32}' ORDER BY seqno;
   seqno |i| t  
  ---+-+
--- 421,426 
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index c3598e2..85113a9 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*** on (x1 = xx1) where (xx2 is not null);
*** 2129,2135 
  -- regression test: check for bug with propagation of implied equality
  -- to outside an IN
  --
- analyze tenk1;		-- ensure we get consistent plans here
  select count(*) from tenk1 a where unique1 in
(select unique1 from tenk1 b join tenk1 c using (unique1)
 where b.unique2 = 42);
--- 2129,2134 
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index ab3f508..80f911f 100644
*** a/src/test/regress/input/copy.source
--- b/src/test/regress/input/copy.source
*** COPY array_op_test FROM '@abs_srcdir@/da
*** 60,65 
--- 60,90 
  
  COPY array_index_op_test FROM '@abs_srcdir@/data/array.data';
  
+ -- analyze all the data we just loaded, to ensure plan consistency
+ -- in later tests
+ 
+ ANALYZE aggtest;
+ ANALYZE onek;
+ ANALYZE tenk1;
+ ANALYZE slow_emp4000;
+ ANALYZE person;
+ ANALYZE emp;
+ ANALYZE student;
+ ANALYZE stud_emp;
+ ANALYZE road;
+ ANALYZE real_city;
+ ANALYZE hash_i4_heap;
+ ANALYZE hash_name_heap;
+ ANALYZE hash_txt_heap;
+ ANALYZE hash_f8_heap;
+ ANALYZE test_tsvector;
+ ANALYZE bt_i4_heap;
+ ANALYZE bt_name_heap;
+ ANALYZE bt_txt_heap;
+ ANALYZE bt_f8_heap;
+ ANALYZE array_op_test;
+ ANALYZE array_index_op_test;
+