[isabelle-dev] Safe approach to hypothesis substitution

2010-08-23 Thread Thomas Sewell
As previously discussed in the thread Safe for boolean equality, the 
current strategy for using equality hypotheses 'x = expr', in which 
substitution for x is done and then the hypothesis is discarded, is 
unsafe. The conclusion of that discussion was that the problem was 
annoying but fairly rare, that there is some concern it may become more 
common as local are used more extensively, and that fixing it would 
probably require a painful change to the behaviour of auto.


The problem is that by forgetting what x equates to in our goal, we lose 
the connection from the goal to the context outside our goal. There may 
be other facts available that name x. There may also be schematic 
variables which could be instantiated to hd x but not the bound 
variable replacing it.


The simple solution in my mind is to keep the equality in the goal, and 
since noone else seemed interested I thought I'd attempt to do this 
myself and see how difficult it was. I attach the resulting patch. After 
several rounds of bugfixes and compatibility improvements, it requires 
23 lines of changes to theories to rebuild HOL, HOL-ex and HOL-Word.


The code change in Provers/hypsubst.ML adds code for counting the free 
and bound variables in a goal, and checking whether either side of an 
equality hypothesis is a unique variable, occuring nowhere else. The 
main substitution algorithms avoid using such equalities and also 
preserve rather than delete the hypothesis they select. There is a new 
tactic for discarding these equalities, which is added as an unsafe 
wrapper in Context_Rules, allowing all unsafe tactics to behave roughly 
as before. The version of hypsubst called by blast is left unchanged, as 
blast proofs seem to be highly sensitive to changes. There is plenty of 
room for optimisation, I tried to keep the diff readable.


Three kinds of proofs seem to need fixing:
  1. proofs where subgoal_tac or similar now needs to refer to xa 
rather than x.
  2. proofs where erule ssubst, drule sym/arg_cong/fun_cong etc need to 
be further specialised.
  3. proofs where variables are eliminated and then induction is 
performed, i.e. the word library. Explicitly adding apply 
hypsubst_thin seems the best solution.


At this stage I'm not sure how to proceed. I would be very happy to see 
safe get safer, but I'm not sure how acceptable this level of 
incompatibility is in an Isabelle change. There's an alternative 
approach I thought of recently but haven't tried yet - replacing used 
equalities with meta-equalities - which might reduce the 
incompatibilities of type 2.


I haven't checked whether there are any performance implications.

I'd be keen to hear what other Isabelle developers think.

Yours,
Thomas.


diff -r 2e1b6a8a0fdd src/HOL/Divides.thy
--- a/src/HOL/Divides.thy   Wed Jul 28 06:38:17 2010 +1000
+++ b/src/HOL/Divides.thy   Mon Aug 23 17:33:59 2010 +1000
@@ -429,7 +429,7 @@
 apply (case_tac y = 0) apply simp
 apply (auto simp add: dvd_def)
 apply (subgoal_tac -(y * k) = y * - k)
- apply (erule ssubst)
+ apply (simp only:)
  apply (erule div_mult_self1_is_id)
 apply simp
 done
@@ -438,8 +438,7 @@
 apply (case_tac y = 0) apply simp
 apply (auto simp add: dvd_def)
 apply (subgoal_tac y * k = -y * -k)
- apply (erule ssubst)
- apply (rule div_mult_self1_is_id)
+ apply (erule ssubst, rule div_mult_self1_is_id)
  apply simp
 apply simp
 done
diff -r 2e1b6a8a0fdd src/HOL/HOL.thy
--- a/src/HOL/HOL.thy   Wed Jul 28 06:38:17 2010 +1000
+++ b/src/HOL/HOL.thy   Mon Aug 23 17:33:59 2010 +1000
@@ -871,6 +871,7 @@
   Hypsubst.hypsubst_setup
   (*prevent substitution on bool*)
   # Context_Rules.addSWrapper (fn tac = hyp_subst_tac' ORELSE' tac)
+  # Context_Rules.addWrapper (fn tac = Hypsubst.thin_triv_tac false ORELSE' 
tac)
 end
 *}
 
diff -r 2e1b6a8a0fdd src/HOL/Int.thy
--- a/src/HOL/Int.thy   Wed Jul 28 06:38:17 2010 +1000
+++ b/src/HOL/Int.thy   Mon Aug 23 17:33:59 2010 +1000
@@ -542,7 +542,7 @@
 lemma negD: (x \Colon int)  0 \Longrightarrow \existsn. x = - (of_nat 
(Suc n))
 apply (cases x)
 apply (auto simp add: le minus Zero_int_def int_def order_less_le)
-apply (rule_tac x=y - Suc x in exI, arith)
+apply (rule_tac x=y - Suc xa in exI, arith)
 done
 
 
diff -r 2e1b6a8a0fdd src/HOL/Library/Multiset.thy
--- a/src/HOL/Library/Multiset.thy  Wed Jul 28 06:38:17 2010 +1000
+++ b/src/HOL/Library/Multiset.thy  Mon Aug 23 17:33:59 2010 +1000
@@ -1116,7 +1116,7 @@
  apply (simp (no_asm))
  apply (rule_tac x = (K - {#a#}) + Ka in exI)
  apply (simp (no_asm_simp) add: add_assoc [symmetric])
- apply (drule_tac f = \lambdaM. M - {#a#} in arg_cong)
+ apply (drule_tac f = \lambdaM. M - {#a#} and x=?S + ?T in arg_cong)
  apply (simp add: diff_union_single_conv)
  apply (simp (no_asm_use) add: trans_def)
  apply blast
@@ -1127,7 +1127,7 @@
  apply (rule conjI)
   apply (simp add: multiset_ext_iff split: nat_diff_split)
  apply (rule conjI)
-  apply (drule_tac f = \lambdaM. M - {#a#} in arg_cong, simp)
+  apply (drule_tac f 

Re: [isabelle-dev] Safe approach to hypothesis substitution

2010-08-23 Thread Alexander Krauss

Hi Thomas,

Thanks for digging into this. The concrete patch is exactly what was 
needed to advance this discussion. Here are a few comments. I am sure 
others will have more...


I attach the resulting patch. After 
several rounds of bugfixes and compatibility improvements, it requires 
23 lines of changes to theories to rebuild HOL, HOL-ex and HOL-Word.


So it is painful (as there are a much more theories out there), but it 
may work, and it may be worth the trouble in the long run.


The code change in Provers/hypsubst.ML adds code for counting the free 
and bound variables in a goal, and checking whether either side of an 
equality hypothesis is a unique variable, occuring nowhere else. The 
main substitution algorithms avoid using such equalities and also 
preserve rather than delete the hypothesis they select.


Do I understand correctly that substituting the hypothesis is safe, but 
discarding them afterwards is unsafe? Then, why should they not be used?


More importantly, why is it safe to use and discard an equality when the 
variable occurs elsewhere in the goal? The equality itself could 
interact with a fact from the context? Consider the contrived example


locale l =
  fixes x :: nat
  assumes neq: x ~= 0
begin

lemma x = 0 == x = x + 1 == False
apply safe

After safe, the lemma becomes unprovable, since the hypothesis that 
contradicts the context is removed. It seems that your patch does not 
fix this... (I haven't tried... just guessing from reading the code)



[...]

A few concrete remarks concerning the patch (which seems to be relative 
to Isabelle2009-2):




diff -r 2e1b6a8a0fdd src/HOL/Parity.thy
--- a/src/HOL/Parity.thyWed Jul 28 06:38:17 2010 +1000
+++ b/src/HOL/Parity.thyMon Aug 23 17:33:59 2010 +1000
@@ -66,9 +66,10 @@
   by presburger
 
 lemma even_times_anything: even (x::int) == even (x * y)

-  by algebra
+  by (simp add: int_even_iff_2_dvd)
 
-lemma anything_times_even: even (y::int) == even (x * y) by algebra

+lemma anything_times_even: even (y::int) == even (x * y)
+  by (simp add: int_even_iff_2_dvd)
 
 lemma odd_times_odd: odd (x::int) == odd y == odd (x * y) 
   by (simp add: even_def zmod_zmult1_eq)


?!? Why is the behaviour of the algebra method changed?


diff -r 2e1b6a8a0fdd src/Provers/hypsubst.ML
--- a/src/Provers/hypsubst.ML   Wed Jul 28 06:38:17 2010 +1000
+++ b/src/Provers/hypsubst.ML   Mon Aug 23 17:33:59 2010 +1000



+(* Counts the occurences of Free and Bound variables in a term. *)
+fun count_vars' _ (Free f) = Termtab.map_default (Free f, 0) (fn x = x + 1)
+  | count_vars' _ (Var _) = I
+  | count_vars' _ (Const _) = I
+  | count_vars' n (Bound m) = Termtab.map_default (Bound (m - n), 0)
+  (fn x = x + 1)
+  | count_vars' n (Abs (_, _, t)) = count_vars' (n + 1) t
+  | count_vars' n (f $ x) = count_vars' n f o count_vars' n x
+
+fun count_vars ts = fold (count_vars' 0) ts Termtab.empty


This is very strange, since it counts the bound variables in
(%f. f) (%x. x) as the same variable. It doesn't seem to produce a 
problem now, since the variables in a hypothesis are goal parameters, 
which are nested linearly, but it is confusing nevertheless.



+
+(* An equality hypothesis 'x = y' is trivial if substituting with it would have
+   no useful effect. The principal test for this is if x or y is a unique
+   variable, occuring nowhere else in the goal.  The exception to the rule is
+   if x is bound and y is unique but free. *)


What is a useful effect? This is not a rhetorical question -- I really 
don't understand. Is the second sentence the actual definition?


A more general remark:
Larry's comments in hypsubst.ML:

(*If novars then we forbid Vars in the equality.
  If bnd then we only look for Bound variables to eliminate.
  When can we safely delete the equality?
Not if it equates two constants; consider 0=1.
Not if it resembles x=t[x], since substitution does not eliminate x.
Not if it resembles ?x=0; consider ?x=0 == ?x=1 or even ?x=0 == P
Not if it involves a variable free in the premises,
but we can't check for this -- hence bnd and bound_hyp_subst_tac


The last point really indicates the main problem: the premises are 
actually facts in the surrounding context. In other words, it is almost 
never safe to eliminate Frees, unless we know for some reason that all 
relevant facts have already been inserted in the goal...


So it seems the only really conservative step is to eliminate bound 
variables only??? How many proofs would have to be fixed if safe 
methods were restricted to this? I gues it's a lot more, but it may 
still be better than trading one heuristic for a more complex one, which 
is still unsafe...


Alex

___
Isabelle-dev mailing list
Isabelle-dev@mailbroy.informatik.tu-muenchen.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Safe approach to hypothesis substitution

2010-08-23 Thread Thomas Sewell
Hello again isabelle-dev.

I should clear up some confusion that may have been caused by my mail. I was 
trying to avoid repeating all that had been said on this issue, and it seems I 
left out some that hadn't been said as well.

This approach avoids ever deleting an equality on a Free variable during a 
'safe' reasoning step. It will substitute with that equality but keep it in 
place. This is a significant change to the intermediate state of some tactic 
scripts, thus the incompatibilities, especially on induction. It is somewhat 
surprising there aren't more such incompatibilities. There's no truly safe way 
to delete an equality on a Free variable without inspecting the whole context.

Since equalities are not deleted, care must be taken to avoid an infinite loop 
in which a substitution is repeatedly attempted. This is the reason for the 
uniqueness check - if x appears only in 'x = expr' then there is no more 
substitution to be done. Some subtleties were discovered experimentally:
  - 'x = y' should not be substituted if either x or y is unique, or else we 
enter a loop substituting x - y - x
  - 'expr = x' needs to be reoriented when it is substituted, or else hypsubst 
and simp will loop doing opposite substitutions.
  - 'x = y' is OK to substitute if y is a unique free and x is a non-unique 
Bound.

This is what is meant by a 'useful' substitution in the comment. Other 
equalities, which would cause a no-effect or cyclic substitution, are referred 
to as trivial (abbreviated triv) in this code. The language should probably be 
cleanup up to use a consistent set of names.

Let me address Alex's detailed comments:

1. In the HOL sources the algebra method is used to solve two problems about 
parity essentially by accident. No Groebner bases were involved, but the 
surrouding logic, which appeals to clarify_tac and a custom simpset, happened 
to solve the goal. With this change the simplified goal had an extra quantifier 
in it. Note that with this change terminal tactics operating in 'unsafe' mode - 
fast, best, etc - will call thin_triv_tac via Context_Rules and behave 
similarly to before. It is terminal tactics which take safe steps - like 
algebra - that are most likely to break.

2. Yes, count vars is a bit strange, as it will consider (%f. f) (%x. x) to 
contain two copies of Bound -1. The point is to count loose Bounds in the 
hypotheses as returned by Logic.strip_assums_hyp, which it gets right.

Yours,
Thomas.


From: Alexander Krauss [kra...@in.tum.de]
Sent: Monday, August 23, 2010 7:35 PM
To: Thomas Sewell
Cc: Isabelle Developers Mailing List
Subject: Re: [isabelle-dev] Safe approach to hypothesis substitution

Hi Thomas,

Thanks for digging into this. The concrete patch is exactly what was
needed to advance this discussion. Here are a few comments. I am sure
others will have more...

 I attach the resulting patch. After
 several rounds of bugfixes and compatibility improvements, it requires
 23 lines of changes to theories to rebuild HOL, HOL-ex and HOL-Word.

So it is painful (as there are a much more theories out there), but it
may work, and it may be worth the trouble in the long run.

 The code change in Provers/hypsubst.ML adds code for counting the free
 and bound variables in a goal, and checking whether either side of an
 equality hypothesis is a unique variable, occuring nowhere else. The
 main substitution algorithms avoid using such equalities and also
 preserve rather than delete the hypothesis they select.

Do I understand correctly that substituting the hypothesis is safe, but
discarding them afterwards is unsafe? Then, why should they not be used?

More importantly, why is it safe to use and discard an equality when the
variable occurs elsewhere in the goal? The equality itself could
interact with a fact from the context? Consider the contrived example

locale l =
   fixes x :: nat
   assumes neq: x ~= 0
begin

lemma x = 0 == x = x + 1 == False
apply safe

After safe, the lemma becomes unprovable, since the hypothesis that
contradicts the context is removed. It seems that your patch does not
fix this... (I haven't tried... just guessing from reading the code)


[...]

A few concrete remarks concerning the patch (which seems to be relative
to Isabelle2009-2):


 diff -r 2e1b6a8a0fdd src/HOL/Parity.thy
 --- a/src/HOL/Parity.thy  Wed Jul 28 06:38:17 2010 +1000
 +++ b/src/HOL/Parity.thy  Mon Aug 23 17:33:59 2010 +1000
 @@ -66,9 +66,10 @@
by presburger

  lemma even_times_anything: even (x::int) == even (x * y)
 -  by algebra
 +  by (simp add: int_even_iff_2_dvd)

 -lemma anything_times_even: even (y::int) == even (x * y) by algebra
 +lemma anything_times_even: even (y::int) == even (x * y)
 +  by (simp add: int_even_iff_2_dvd)

  lemma odd_times_odd: odd (x::int) == odd y == odd (x * y)
by (simp add: even_def zmod_zmult1_eq)

?!? Why is the behaviour of the algebra method changed?

 diff -r 2e1b6a8a0fdd