Jonathan Weiss wrote:
> >The problem as stated is ill-posed until we know what set of alternatives we
> >are considering.
> 
> If that is the case, then every real-world problem is ill-posed.

I've never run into a real-world problem where I couldn't enumerate a sensible
set of alternatives; there only seems to be difficulty when dealing with
hypothetical problems in debates :-).  In fact, in the case we are talking about
(guessing the color of a card), once we take into account everything we really
know and stop pretending to be totally ignorant, we find that we can in fact
enumerate the set of alternatives -- it is the three-dimensional space of
perceptually distinct colors.

> >Then, by the permutation invariance argument given below, this state of ignorance
> >must be represented by P(red | X0) = 1/2.
> 
> Only if you first accept the possibility of being completely ignorant.  My
> point throughout is that complete ignorance is itself ill-defined.

There are sensible ways of translating a state of complete ignorance (beyond
knowledge of the set of alternatives, i.e., knowing the sample space) into a
corresponding probability distribution.  The three papers I referenced in
response to Haenni discuss the subject in some detail.  This is still an area
(expressing the concept of total ignorance, and, in general, methods for
converting various kinds of information into probability distributions) that
needs more research.  But there are specific circumstances in which the notion
of complete ignorance is well-defined.

Whether complete ignorance is a useful notion (and whether one is ever
completely ignorant) is a different matter altogether.  I agree with you that
complete ignorance is probably pretty rare, if it ever exists at all.  But the
notion can be useful in establishing reference priors for inference in
contentious circumstances where different people assign vastly differing priors
based on greatly divergent beliefs.  It is also useful as a simplification when
one's actual prior is close enough to a non-informative prior, or there is
enough data, that the posterior probabilities are changed little by using the
non-informative prior.  Finally, when using maximum entropy techniques to
construct a prior encoding certain information about a continuous sample space,
it turns out that the necessary first step is to identify the appropriate
non-informative prior for your sample space.
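To sketch why that is: for a continuous sample space, the quantity being
maximized has to be written relative to a base measure m(x), as in
  H[p] = - Integral p(x) log[ p(x) / m(x) ] dx
because the naive form without m(x) is not invariant under a change of
parameterization.  The measure m(x) that makes this expression meaningful is
exactly the non-informative prior for the sample space.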

> Besides, the only ignorance we should be addressing is ignorance about the colors of
> the particular cards in this example, not ignorance about what a color is or what
> the names of colors are.

I agree with you here.  I examined the other possibility simply to highlight the
different kinds of ignorance one can have, and what kinds of priors are
appropriate in these different cases.

> >  2) Assuming you assigned some finite probability P(red), now for the same
> >     card that you still haven't seen, what is the probability that it is
> >     blue?
> >
> >[...]  We are talking about two qualitatively different conditional probabilities,
> >one conditioned on X0, the other conditioned on X1.  It should surprise nobody that
> >my assignment of probabilities changes when I have access to more information.
> 
> What is different about your state of information?

Given the context of the discussion that preceded your questions, I assumed that
we were trying to get a grip on the notion of complete ignorance.  So I
proceeded on the admittedly unrealistic assumption of ignorance even of what the
labels meant.  For each question I assumed that we had no information available
other than what could be found in the question itself.  Since (2) mentions an
alternative that is not present in (1), for a completely ignorant person (who
only knows what is contained in the statement of the question) it represents a
different state of information.

Once again, I will readily admit that this is all somewhat artificial.  The
right answer is really to use all the information you have available.  But I
think it is somewhat instructive to consider what the answer would be for some
of these highly-ignorant states of information.

My preferred solution is to choose some parameterization x,y,z over the
three-dimensional color space and define a non-informative prior probability
density p(x,y,z) over it.  Then, once you define exactly what volume of this
space you mean by "red", the probability of a red card -- assuming complete
ignorance as to choice of color -- is simply the integral of p(x,y,z) over this
volume.  You get the same answer no matter how many additional colors you bring
into the problem.
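For what it's worth, here is a toy numerical sketch of that last step.
Everything in it is a stand-in (the unit cube as the parameterization, the
uniform density as the non-informative prior, and the particular box labeled
"red"); it is only meant to show that P(red) is an integral over a fixed
volume, and doesn't change when more color names are added to the problem:

  import numpy as np

  rng = np.random.default_rng(1)
  pts = rng.uniform(size=(100000, 3))     # Monte Carlo samples of (x, y, z)

  def in_red(x, y, z):
      # Hypothetical definition of the volume of color space called "red".
      return (x > 0.8) & (y < 0.3) & (z < 0.3)

  # With a uniform p(x,y,z) on the cube, the integral of p over the "red"
  # volume is estimated by the fraction of samples that fall inside it.
  p_red = in_red(pts[:, 0], pts[:, 1], pts[:, 2]).mean()
  print(p_red)   # the same number no matter how many other colors we name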

> There are lots of ways to organize color space, and Munsell is only one.  How
> do you pick what space to assign a uniform density over?

Non-informative priors don't necessarily correspond to uniform densities.  If I
were to look into this problem in more detail, I would try to identify various
changes of parameter that leave the problem invariant, and derive a density from
these considerations.
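As a quick illustration of what I mean by such an argument, using a scale
parameter on the positive reals rather than color space as the example: if the
prior is to keep the same form no matter what units we measure in, we need
  p(x) dx = p(y) dy  with  y = a*x,  i.e.  p(x) = a * p(a*x)  for all a > 0,
and the only densities satisfying this are those proportional to 1/x.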

> >[Discussion of using an improper uniform prior to express ignorance about a
> >location parameter, and an improper prior proportional to 1/x to express
> >ignorance about a scale parameter.]
> 
> Now who's assuming a lot of information? This subject doesn't know his colors
> yet, but he can tell a location parameter from a scale parameter and apply
> invariance arguments!?! :-)

You asked about an ignorance prior over the real line, which implied that we
weren't talking about colors (colors occupy a bounded, three-dimensional
space).  

> Of what value is an improper uniform prior when you have to make a bet?  The
> probability of any bounded interval would be zero.

Of course, improper priors are useless *by themselves* when it comes to making
decisions.  But as the limit of a sequence of progressively more spread-out and
less informative priors, they can be quite useful in simplifying problems of
parameter estimation where you have very little prior knowledge, or a lot of
data.  You use Bayes' Rule to get a posterior distribution from your improper
prior and the likelihood function for your data.  Usually this will be a proper
distribution.  If it isn't, that tells you that you can't get away with
pretending to be ignorant a priori: the data alone provide too little
information for any useful inference, and you will have to think carefully
about what information you do have so that you can produce a proper prior.
(I speak from experience -- I once made the mistake of using an
improper prior without checking to make sure the posterior was proper, and was
very puzzled at the results my program produced...)
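For concreteness, the mechanics are just Bayes' Rule with the normalizing
constant dropped: with a flat improper prior p(u) proportional to a constant
and likelihood L(u) for the data,
  p(u | data)  is proportional to  p(u) * L(u),  i.e. to  L(u),
so the posterior is the normalized likelihood, and it is proper exactly when
the integral of L(u) over the whole parameter space is finite.  The propriety
check amounts to verifying that this integral converges.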

> I would very much like to see a concrete example of a situation where an
> "uninformed" prior is superior in any decision-relevant way to a possibly
> arbitrary or erroneous, but at least consistent, "real" prior.

OK, here's one.  I am measuring voltages in some experiment, and I know from
past experience and other information to expect non-systematic errors with a
variance of sigma**2.  I would like to calibrate my measurements to remove any
systematic error.  So my model is
  v - v0 = u + sigma * e
where v is the measured voltage, v0 is the actual voltage, e is a random
variable with a standard normal distribution, u is the unknown bias, and
sigma is a known
quantity.

Now suppose I can produce a set of reference voltages, known to a much greater
accuracy than sigma.  Perhaps I have some high-accuracy method of determining
voltages that is too expensive and time-consuming to use in my experiment.  This
gives me a set of data
  (v[1], v0[1]), (v[2], v0[2]), (v[3], v0[3]), ...

Suppose that experience suggests using a Gaussian with mean 0 and standard
deviation of 0.1 volt as a prior over the systematic error u.  If I use an
improper prior over u, my posterior over u will be a Gaussian centered at the
mean of the differences v[i] - v0[i], with a variance of sigma**2 / n (where n
is the number of data points).  If n is large enough that sigma/sqrt(n) << 0.1,
the posterior obtained using an improper prior will be close to the posterior
obtained using the "right" prior.  However, suppose I use a grossly
inappropriate prior over u, for example, a Gaussian centered at 10 million
volts, with a standard deviation of 1 volt.  In this case the prior will
dominate the data until n becomes very large, causing the posterior to be
concentrated almost entirely at very large values of u.  Not until I have an
enormous amount of data will the damage caused by using this grossly
inappropriate prior become inconsequential.
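
Just to make the arithmetic concrete, here is a small numerical sketch of the
comparison.  The particular numbers (sigma, n, and the true bias) are made up
for illustration; the update itself is the standard normal-normal conjugate
calculation:

  import numpy as np

  rng = np.random.default_rng(0)
  sigma = 0.05    # known non-systematic error (volts); value is hypothetical
  true_u = 0.03   # "true" systematic bias, unknown to the analyst
  n = 25          # number of calibration pairs
  d = true_u + sigma * rng.standard_normal(n)   # d[i] = v[i] - v0[i]
  dbar = d.mean()

  def posterior(mu0, tau0):
      # Posterior mean and standard deviation of u for the prior N(mu0, tau0**2).
      prec = 1.0 / tau0**2 + n / sigma**2
      mean = (mu0 / tau0**2 + n * dbar / sigma**2) / prec
      return mean, prec ** -0.5

  # Improper flat prior: posterior is N(dbar, sigma**2 / n).
  print("flat prior:       ", dbar, sigma / np.sqrt(n))
  # "Right" prior N(0, 0.1**2): nearly identical, since sigma/sqrt(n) << 0.1.
  print("reasonable prior: ", posterior(0.0, 0.1))
  # Grossly wrong prior N(1e7, 1**2): swamps the data at this sample size.
  print("wrong prior:      ", posterior(1.0e7, 1.0))

With these made-up numbers the flat prior and the N(0, 0.1**2) prior give
essentially the same posterior, centered near the simulated bias, while the
prior centered at 10 million volts drags the posterior mean out to around a
thousand volts.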
