Just a minor clarification in response to Jason Palmer's latest post:

I see merit in some of Prof. Zadeh's claims. I am insufficiently qualified
even to say whether or not I agree with all of his claims. But I am honored
to be mentioned in such eminent company.

For what it is worth: I tend to think that a variety of formal approaches
would be necessary to describe how the law does handle or should handle
(various kinds of) uncertainty.

Very sincerely,

  Peter T


-----Original Message-----
From: Jason Palmer [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 08, 2004 5:08 PM
To: Lotfi A. Zadeh; UAI
Cc: Peter Tillers; David Larkin
Subject: RE: [UAI] causal_vs._functional models

Prof. Zadeh,

The following claim seems to be central to your (and Prof. Tillers')
position: Real concepts do not have hard boundaries. For example, there is no
hard cutoff for a thing being large or not, or a person being tall or not,
or even a law being broken or not. Thus it makes no sense to discuss things
like the probability that a certain thing is large. Rather we should assign
a set-membership function that generalizes the indicator function, and gives
the "degree" to which the category applies, or the "degree" of smallness or
tallness or illegality.

I agree that there are no hard boundaries, but I would argue that this does
not invalidate the consideration of bivalent logic. A given object will be
classified by a given person as either small or not. There are certainly
"borderline" cases where one is hesitant to say whether something is small
or not. But if we disallow the possibility of refusing to answer, then when
asked "Is this small?", we must answer "Yes, it is small", or "No, it is not
small".

The necessity to answer may seem ridiculous in the case of determining
whether something is small, or someone is tall, but it seems to be very
appropriate in the more practically important legal examples that have been
mentioned by Prof. Tillers. A Judge cannot refuse to decide whether or not a
law applies to a specific instance. The concepts
involved may be fuzzy, and there may be no way of consistently deciding all
instances, but this doesn't mean that a decision is not made, be it a good
one or a bad one.

To me, this situation suggests that we consider not the probability that
something is small or that a law was broken, nor the degree to which
something is small or a law was broken, but rather the conditional
distribution of the classification of a thing as small given that its size
is x, etc., and the conditional distribution of deciding that the law was
broken given that such and such was done. There is no way, and no need, to
decide whether the thing _actually is_ small, or whether the law _actually
was_ broken---we are assuming that there is no single consistent criterion
that we can apply to decide all cases. But in particular instances a person
will say the thing is small or not, and a Judge will decide the law was
broken or not.

Agents may employ considerations of degree in making decisions, but what is
"ontologically" important is the decision that is ultimately made. The
future is determined by the classification of a thing in particular
instances, e.g. the future of the roller skater is determined by the Judge's
decision. Once the decision is made, it doesn't matter to what degree we say
she broke the law. The future depends only on the decision, which will
always be a definite non-fuzzy outcome. In order to predict the future, we
attempt to determine the probability of certain classifications being made.
There is no reason to determine an abstract "degree" that a category holds,
apart from any decisions made based on it. The only thing we are interested
in is which category will be decided, and for this we use the conditional
probability of the decision given the data. Such conditional probabilities
may look suspiciously like set-membership functions, but they come with the
added benefit of the probabilistic calculus.

----

The following answers to the 5 questions you proposed are admittedly
simplistic, but are intended to suggest that probabilistic answers are not
impossible. I rely on the Maximum Entropy principle (Jaynes; Shore and
Johnson), which may have difficulties of its own that can certainly be
discussed.

Question 1. The balls-in-box problem. A box contains black and white balls.
My perceptions are: (a) there are about twenty balls; (b) most are black;
and (c) there are several times as many black balls as white balls. What is
the probability that a ball drawn at random is white?

Bayesian answer: Let N be a random variable representing the number of balls
in the box, let T be a random variable representing the ratio of black balls
to white balls. Let W be a random variable
representing how many white balls there are. We are interested in the random
variable P = W / N = W / (W + T*W) = 1 / (1 + T). So P depends only on T.
Assign a distribution to T consistent with your perception, e.g. Gamma with
mean 3 and some variance. Then the posterior distribution of P is easily
calculated. Choose a risk function and determine the value of P^ that
minimizes the risk.
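
For concreteness, a minimal Monte Carlo sketch of this calculation in Python;
the Gamma shape and scale, and the squared-error loss, are illustrative
choices, not part of the problem statement:

import numpy as np

rng = np.random.default_rng(0)

# T = ratio of black balls to white balls.  A Gamma(shape=9, scale=1/3)
# prior has mean 3 and modest variance (an illustrative choice).
T = rng.gamma(shape=9.0, scale=1.0 / 3.0, size=100_000)

# P = probability that a ball drawn at random is white.
P = 1.0 / (1.0 + T)

# Under squared-error loss, the risk-minimizing estimate is E[P].
print("posterior mean of P:", P.mean())
print("posterior std of P: ", P.std())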

Question 2. The Robert example. Usually Robert leaves his office at about
5:30pm. Usually it takes him about thirty minutes to get home. What is the
probability that Robert is home at about 6:15pm?

Bayesian answer: Let T be the time that Robert leaves his office, and let D
be the time it takes him to drive home, both random variables. We are
interested in the random variable H = T + D. Assign priors with means and
variances consistent with your perception of the mean of T = 17.5 (hours, on
a 24-hour clock) and the mean of D = 0.5, and consistent with your level of
certainty in these estimates.
Calculate the distribution of H (the convolution of the distributions of T
and D). Then the desired probability can be determined, for example, by
integrating the density of H from 18.25 - delta to 18.25 + delta, where
2*delta is the size of a symmetric window around 18.25 (i.e., 6:15pm).
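
A minimal Monte Carlo sketch, assuming, purely for illustration, normal
priors for T and D with the stated means:

import numpy as np

rng = np.random.default_rng(0)

# Departure time T (hours on a 24-hour clock): mean 17.5 = 5:30pm.
T = rng.normal(loc=17.5, scale=0.25, size=100_000)
# Drive time D (hours): mean 0.5 = about thirty minutes.
D = rng.normal(loc=0.5, scale=0.1, size=100_000)

H = T + D      # arrival time; its density is the convolution of T and D
delta = 0.25   # a 15-minute half-window around 18.25 (6:15pm)

print("P(home at about 6:15pm):", np.mean(np.abs(H - 18.25) <= delta))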

Question 3. The tall Swedes problem. My perception is that most Swedes are
tall. What is the average height of Swedes?

Bayesian answer: Assign a prior classification distribution P(T=1|H=h) over
the range of possible heights h (for a real Bayesian learner, this
distribution will be "built in" through the learning process). Now assign a
probability for P(T=1|Sw=1), or make this probability a random variable and
assign a distribution to it centered around a value consistent with what you
consider "most". We are interested in E(H|S=1), for which we need the
distribution p(H=h|Sw=1). There is not enough information to determine the
desired distribution. Following the Maximum Entropy principle, we choose the
distribution with maximum entropy subject to the constraints imposed by our
determinations of P(T=1|H=h) and P(T=1|Sw=1).
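
A sketch of the maximum entropy step on a discretized height grid; the
logistic form of P(T=1|H=h) and the reading of "most" as 0.8 are my
illustrative assumptions:

import numpy as np
from scipy.optimize import brentq

h = np.linspace(1.40, 2.10, 701)                  # heights in metres
tall = 1.0 / (1.0 + np.exp(-(h - 1.80) / 0.05))   # P(T=1 | H=h), illustrative

def expected_tall(lam):
    # With one expectation constraint, the maxent distribution is
    # exponential-family: p(h) proportional to exp(lam * tall(h)).
    w = np.exp(lam * tall)
    return (w / w.sum()) @ tall

# Choose lam so that E[tall(H)] matches P(T=1|Sw=1) = 0.8 ("most").
lam = brentq(lambda l: expected_tall(l) - 0.8, -50.0, 50.0)
w = np.exp(lam * tall)
p = w / w.sum()
print("E(H | Sw=1):", p @ h)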

Question 4. X is a real-valued random variable. Usually X is not very large.
Usually X is not very small. What is the probability that X is neither small
nor large? What is the expected value of X?

Bayesian answer: Assign the prior probabilities P(L=1) and P(S=1) and the
distributions P(L=1|X=x) and P(S=1|X=x). Then we are interested in P(N=1) =
P(L=0 and S=0) = 1 - P(L=1) - P(S=1) (assuming nothing is classified as both
large and small), and E(X) = P(L=1)*E(X|L=1) + P(S=1)*E(X|S=1) +
P(N=1)*E(X|N=1). For these, determine the maximum entropy distribution
for X consistent with the given P(L=1|X=x) = p(X=x|L=1)*P(L=1)/p(X=x) and
P(S=1|X=x).
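
A sketch on a grid for X, solving the convex dual of the maximum entropy
problem; the logistic memberships and the reading of "usually not" as 0.1
are illustrative assumptions:

import numpy as np
from scipy.optimize import minimize

x = np.linspace(-10.0, 10.0, 2001)
large = 1.0 / (1.0 + np.exp(-(x - 5.0)))   # P(L=1 | X=x), illustrative
small = 1.0 / (1.0 + np.exp( (x + 5.0)))   # P(S=1 | X=x), illustrative

F = np.vstack([large, small])              # constraint features
c = np.array([0.1, 0.1])                   # "usually not" read as 0.1

def dual(lams):
    # Convex dual of maxent: the optimal p(x) is proportional to
    # exp(lams @ F); minimizing the dual enforces E_p[F] = c.
    return np.log(np.exp(lams @ F).sum()) - lams @ c

lams = minimize(dual, np.zeros(2)).x
p = np.exp(lams @ F)
p /= p.sum()

print("P(N=1) =", 1.0 - p @ large - p @ small)   # neither small nor large
print("E(X)   =", p @ x)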

Question 5. X is a real-valued random variable. My perception of the
probability distribution of X may be described as: Prob(X is small) is low;
Prob(X is medium) is high; Prob(X is large) is low. What is the expected
value of X?

Bayesian answer: Assign prior classification distributions P(S=1|X=x),
P(M=1|X=x), and P(L=1|X=x), and determine the maximum entropy distribution
for X, then calculate E(X).
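
The same dual construction handles Question 5 with a third membership curve;
the particular curves and the targets 0.1/0.7/0.1 for low/high/low are again
illustrative:

import numpy as np
from scipy.optimize import minimize

x = np.linspace(-10.0, 10.0, 2001)
small  = 1.0 / (1.0 + np.exp( (x + 5.0)))   # P(S=1 | X=x)
medium = np.exp(-0.5 * (x / 2.0) ** 2)      # P(M=1 | X=x)
large  = 1.0 / (1.0 + np.exp(-(x - 5.0)))   # P(L=1 | X=x)

F = np.vstack([small, medium, large])
c = np.array([0.1, 0.7, 0.1])               # low, high, low (illustrative)

def dual(lams):
    # Same convex maxent dual as in the Question 4 sketch.
    return np.log(np.exp(lams @ F).sum()) - lams @ c

lams = minimize(dual, np.zeros(3)).x
p = np.exp(lams @ F)
p /= p.sum()
print("E(X) =", p @ x)    # near 0, by symmetry of the chosen curves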

----

My ideas are certainly not as well thought out as yours, and have likely
been voiced many times before, but will perhaps serve as a stimulus for
further constructive debate.

Regards,
Jason


-----Original Message-----
From: Lotfi A. Zadeh [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 5:25 PM
To: UAI
Cc: Peter Tillers; Peter Tillers; Jason Palmer; David Larkin; Lotfi A. Zadeh
Subject: Re: [UAI] causal_vs._functional models


What are the tools that are needed to solve such problems? What is needed is
a generalization of PT, a generalization which adds to PT the capability to
operate on perception-based information expressed in a natural language. Such
a generalization, call it PTp, was described in my paper, "Toward a
Perception-Based Theory of Probabilistic Reasoning with Imprecise
Probabilities," Journal of Statistical Planning and Inference, Vol. 105,
233-264, 2002. (Downloadable from
http://www-bisc.cs.berkeley.edu/BISCProgram/Projects.htm).
            For illustration, a very brief sketch of PTp-based solutions of
Problems 1 and 3 is presented in the following.
Problem 1. Let X, Y and P denote, respectively, the number of black balls,
the number of white balls and the probability that a ball drawn at random is
white. Let a* denote "approximately a," with "approximately a" defined as a
fuzzy set centering on a. Translating the perception-based information into
the Generalized Constraint Language (GCL), we arrive at the following
equations:
                        (X+Y) is 20*
                        X is most × 20*
                        X is several × Y
                        P is Y/20*.
In these equations, most and several are fuzzy numbers which are
subjectively defined through their membership functions. X, Y and P are
fuzzy numbers which are solutions of the equations, and can readily be
computed through the use of fuzzy integer programming.
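
For illustration, the sup-min computation can be carried out by direct
enumeration over integer (X, Y) pairs; the triangular membership functions
chosen below for 20*, most and several are subjective illustrative choices,
and 20* is taken crisp inside the product most × 20* to keep the sketch
short:

import numpy as np

def tri(u, a, b, c):
    # Triangular membership with support [a, c] and peak at b.
    return float(np.clip(min((u - a) / (b - a), (c - u) / (c - b)), 0.0, 1.0))

def approx_20(n): return tri(n, 15, 20, 25)      # (X+Y) is 20*
def most(r):      return tri(r, 0.5, 0.75, 1.0)  # proportion read as "most"
def several(t):   return tri(t, 2, 4, 7)         # ratio read as "several"

best = {}
for X in range(1, 40):
    for Y in range(1, 40):
        deg = min(approx_20(X + Y), most(X / 20.0), several(X / Y))
        if deg > 0.0:
            P = round(Y / (X + Y), 2)            # candidate value of P
            best[P] = max(best.get(P, 0.0), deg)

for P in sorted(best):
    print("P = %4.2f  membership %.2f" % (P, best[P]))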
            Problem 3. Assume that the height of Swedes ranges from hmin to
hmax. Let g(u) denote the count density function, meaning that g(u)du is the
proportion of Swedes whose height lies in the interval [u, u+du]. The
proposition "Most Swedes are tall" translates into "The integral over the
interval [hmin, hmax] of g(u) times the membership function of tall, t(u),
is most," where most is a fuzzy number which is subjectively defined through
its membership function.
            The average height, have, is the integral over the interval
[hmin, hmax] of g(u) times u. If g were known, this integral would give the
average height of Swedes directly. In our problem, g is not known; what we
know is that it is constrained by the translation of the proposition "Most
Swedes are tall." Through constraint propagation, the constraint on g induces
a constraint on the average height. The rule governing constraint propagation
is the extension principle of fuzzy logic. Applying this principle to the
problem in question leads to the membership function of the fuzzy set which
describes the average height of Swedes. Details relating to the use of the
extension principle may be found in my JSPI paper.
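
As a rough sketch of this constraint propagation, with a subjectively chosen
logistic membership for tall and a triangular most: for each candidate
average height, the achievable values of the integral of g(u)t(u) form an
interval, computable by two linear programs, and the extension principle
takes the supremum of the membership of most over that interval:

import numpy as np
from scipy.optimize import linprog

u = np.linspace(1.40, 2.10, 141)                  # height grid (metres)
tall = 1.0 / (1.0 + np.exp(-(u - 1.80) / 0.05))   # t(u), illustrative
def most(r):  # triangular membership on [0.5, 1.0], peak at 0.75
    return max(0.0, min((r - 0.5) / 0.25, (1.0 - r) / 0.25, 1.0))

def mu_average(a):
    # Feasible g (discretized): g >= 0, sum(g) = 1, sum(g*u) = a.  The
    # achievable values of r = sum(g*tall) form an interval [r_lo, r_hi],
    # found by two linear programs; the extension principle takes the
    # supremum of most(r) over that interval.
    A_eq = np.vstack([np.ones_like(u), u])
    b_eq = [1.0, a]
    lo = linprog(tall,  A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    hi = linprog(-tall, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    if not (lo.success and hi.success):
        return 0.0   # no distribution g has average height a
    r_lo, r_hi = lo.fun, -hi.fun
    cands = [r_lo, r_hi] + ([0.75] if r_lo <= 0.75 <= r_hi else [])
    return max(most(r) for r in cands)

for a in np.arange(1.60, 2.01, 0.05):
    print("h_ave = %4.2f  membership %.2f" % (a, mu_average(a)))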
            To understand why reasoning with perception-based information
described in a natural language is beyond the reach of PT and BNT, it is
helpful to introduce the concept of dimensionality of natural languages.
Basically, a natural language is a system for describing perceptions. Among
the many perceptions which underlie human cognition, there are three that
stand out in importance: perception of truth (verity); perception of
certainty (probability); and perception of possibility. These perceptions
are distinct and each is associated with a degree which may be interpreted
as a coordinate along a dimension. Thus, we can speak of the dimension of
truth (verity), dimension of certainty (probability) and dimension of
possibility.
            Natural languages are three-dimensional in the sense that, in
general, a proposition in a natural language is partially true, or
partially certain, or partially possible, or some combination of the three.
For example, "It is very likely that Robert is tall" is associated with
partial certainty and partial possibility, while "It is quite true that Mary
is rich" is associated with partial truth and partial possibility. Standard
probability theory, PT, is one-dimensional in that it deals only with
partiality of certainty and not with partiality of truth nor with partiality
of possibility. The mismatch in dimensionalities is the reason why PT and
BNT are ill-equipped for dealing with perception-based information expressed
in a natural language. Note that, unlike PT, PTp is three-dimensional.
In retrospect, historians of science may find it difficult to understand why
what is so obvious, that partiality of certainty and partiality of truth are
distinct concepts and require different modes of treatment, encountered so
much denial and resistance.
            Partiality of truth and partiality of certainty play pivotal
roles in human cognition. But, in the realm of law, partiality of truth, and
partiality of class membership, are much more pervasive than partiality of
certainty. In many instances, they occur in combination.

                            Warm regards to all,
                                  Lotfi
Lotfi A. Zadeh
Computer Science Division
University of California
Berkeley, CA 94720-1776
Tel(office): (510) 642-4959 Fax(office): (510) 642-1712

