Kevin,
What you describe is essentially what I was getting at.
With one proviso. In my "chaotic universe," the .5 I put on x_t for
tomorrow carries a lot of higher-order uncertainty. Right now my best
guess is .5, but when I get to tomorrow, I might believe the correct
probability to have assigned today was ANYTHING.
Kathy
At 7:59 AM -0600 7/1/99, Kevin S. Van Horn wrote:
>Ronald E. Parr wrote:
>>
>> However you interpret Bayes rule, you make the assumption that the
>> evidence upon which you are conditioning is germane to the proposition
>> in question.
>
>Not at all, as I demonstrated in another posting. Bayes' Rule is perfectly
>capable of handling situations where the evidence is irrelevant to the
>proposition of interest, in which case the posterior is identical to the
>prior.
>
>I should be clear, by the way, that I was merely addressing a technical point
>about Bayes' Rule; I wasn't really addressing the question you and Kathy were
>debating: whether you can use Bayes' Rule to determine whether induction
>works. The discussion struck me as too vague -- too dependent on slippery
>English -- for any useful conclusion to be reached.
>
>However... now my interest is piqued. Can we turn this into a precise
>mathematical question? I'm going to take a shot at it.
>
>First of all, our universe is far too messy, so I'm going to look at a much
>simpler universe, following a long tradition in mathematics and physics of
>tackling simple problems before you even consider more complicated ones. In
>this universe, time is discrete, and there is only one observable: a Boolean
>value. So at time t, the past is described by
>
>   x_1, x_2, ..., x_t
>
>and we want to know if we can make any inferences from this as to whether
>"induction works."
>
>I don't know exactly what it means to say that "induction works" or "induction
>doesn't work," so I'm going to just propose a definition to get the discussion
>off the ground. It's perfectly reasonable for you to object to my definition,
>but then you'll have to help me find a different -- and equally precise --
>definition that is acceptable to you.
>
>My interpretation of "induction works" is that there is some regularity to the
>universe. Let L be some algorithmic language and p be some program in that
>language that takes as input some *state* s and produces as output a new state
>s' and a Boolean value y. We define
>
> out(q,s_0,t) is the sequence y_1,...,y_t produced by setting
> (s_(i+1),y_(i+1)) = q(s_i) for all 0 <= i < t.
>
>Then I take "induction works" to mean that there is some program p in
>language L and initial state s_0 such that
>
>- x_1,...,x_t = out(p,s_0,t) for all t.
>- out(p,s_0,t) is defined for all t (p never gets hung up in an infinite loop
>when computing the next state and value).
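>
>To make this concrete, here is a small Python sketch of out() (the names
>are mine, and it simply trusts q to terminate on every call, per the
>second condition above):
>
>   def out(q, s0, t):
>       # Run universe-program q for t steps from initial state s0.
>       # q maps a state s to a pair (next_state, boolean_output);
>       # return the output sequence [y_1, ..., y_t].
>       s, ys = s0, []
>       for _ in range(t):
>           s, y = q(s)
>           ys.append(y)
>       return ys
>
>   def alternating(s):
>       # Toy universe-program: the state is the last bit emitted; flip it.
>       return (not s, not s)
>
>   out(alternating, False, 4)   # -> [True, False, True, False]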
>
>That is, our universe has a hidden state at each time step, and both the next
>state and the observable x_t at the current time step can be computed from the
>current state using p. To complete the hypothesis of "induction works", we'll
>also have to specify a prior f over the programs p and initial states s_0.
>For example, we could use a prior that assigns higher probabilities to
>(program, initial state) pairs representable in fewer bits, in keeping with
>our assumption that the universe is regular.
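>
>Just to make f concrete -- the 2^-(bits) weighting here is my own
>illustration, not anything canonical -- over a finite candidate set one
>could take:
>
>   def simplicity_prior(encoding_bits):
>       # encoding_bits: {name: bits needed to encode that
>       # (program, initial state) pair}.  Weight each pair by 2^-(bits)
>       # and normalize, so shorter encodings get higher prior mass.
>       weights = {k: 2.0 ** -b for k, b in encoding_bits.items()}
>       total = sum(weights.values())
>       return {k: w / total for k, w in weights.items()}
>
>   simplicity_prior({"alternating": 3, "always_true": 2, "lookup_table": 12})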
>
>My interpretation of "induction doesn't work" is that the x_i are completely
>unrelated, that is, they are independent random variables, so knowing any
>finite subset of the x_i tells you absolutely nothing about any of the
>remaining x_i. As to what probability we should assign to x_i, we choose the
>least informative distribution: x_i and ~x_i equally probable.
>
>Let I be the hypothesis that induction works, and R be the hypothesis that it
>does not. Let X be the information about I and R described above. We can
>formalize all of this as follows:
>
> 1. P(I or R | X) = 1
>
>   2. P(x_i | x_(k_1), ..., x_(k_n), R, X) = P(x_i | R, X), for i not among
>      k_1, ..., k_n
>      (that is, the x_i are independent of each other, given R)
>
> 3. P(x_i | R, X) = 1/2.
>
> 4. P(x_1, ..., x_t | p, s_0, I, X) = 1 if x_1,...,x_t = out(p, s_0, t);
> otherwise it is 0.
>
> 5. P(p, s_0 | I, X) = f(p, s_0).
>
>   6. P(I | X) = a
>      (we have to assign some prior probability to the hypothesis that
>      induction works).
>
>Then
>
>   P(x_1, ..., x_t | R, X) = 2^(-t)
>   P(x_1, ..., x_t | I, X) = (SUM p, s_0 : out(p,s_0,t) = x_1,...,x_t : f(p,s_0))
>                           = S(x_1,...,x_t)    [definition]
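>
>S is uncomputable in general (it sums over all programs in L), but for a
>toy finite hypothesis class -- my own stand-in, just for illustration --
>it is easy to evaluate with the out() sketch above:
>
>   # Tiny stand-in for "all programs in L": two candidate universes,
>   # each given as a (program, initial state) pair.
>   hypotheses = {
>       "alternating": (lambda s: (not s, not s), False),
>       "always_true": (lambda s: (s, True), True),
>   }
>   prior_f = {"alternating": 0.5, "always_true": 0.5}
>
>   def S(xs):
>       # Total prior mass of the pairs that reproduce xs exactly.
>       return sum(prior_f[name]
>                  for name, (q, s0) in hypotheses.items()
>                  if out(q, s0, len(xs)) == xs)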
>
>So now we can compute the probability that "induction works," i.e., that there
>is some regularity to our universe:
>
>                                     a * S(x_1,...,x_t)
>   P(I | x_1,...,x_t, X) = -----------------------------------
>                           (1-a) * 2^(-t) + a * S(x_1,...,x_t)
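>
>As a numeric check, the formula is one line of Python (using the toy S
>above; a is the prior P(I | X)):
>
>   def p_induction_works(a, xs):
>       # P(I | x_1,...,x_t, X) = a*S / ((1-a)*2^(-t) + a*S)
>       t, s = len(xs), S(xs)
>       return a * s / ((1 - a) * 2.0 ** -t + a * s)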
>
>For example, if x_1,...,x_t = false,true,false,true,..., then for large enough
>t, S(x_1,...,x_t) will converge to a constant > 0, whereas 2^(-t) will become
>arbitrarily small, so we become more and more sure that I is true as t
>increases.
>
>On the other hand, if x_1,...,x_t is a random sequence for all t (using the
>technical definition of randomness from Kolmogorov complexity theory, i.e.,
>there is no program representable in fewer than t bits that produces
>x_1,...,x_t), then we will find S(x_1,...,x_t) going to zero faster than
>2^(-t), so we become more and more sure that R is true as t increases.
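>
>Running the toy model bears both cases out (the exact numbers depend on my
>made-up hypothesis class and priors; the qualitative behavior is the point):
>
>   import random
>
>   alternating_data = out(*hypotheses["alternating"], 50)
>   random_data = [random.random() < 0.5 for _ in range(50)]
>
>   print(p_induction_works(0.5, alternating_data))  # -> ~1.0
>   print(p_induction_works(0.5, random_data))       # -> almost surely 0.0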
>
>We have not made any circular assumptions here -- we have explicitly allowed
>for the possibility that the past tells you nothing about the future, and we
>are capable of reaching either conclusion (I true, or R true) depending on
>the data. The point is that Bayes' Rule allows us to *compare* different
>assumptions. We have used it here to compare the assumption of no order
>whatsoever (the past tells you nothing about the future) with an assumption
>of a regular universe.