Prof. Zadeh,

The following claim seems to be central to your (and Prof. Tillers') position: real concepts do not have hard boundaries. For example, there is no hard cutoff for a thing being large or not, for a person being tall or not, or even for a law being broken or not. Thus it makes no sense to discuss things like the probability that a certain thing is large. Rather, we should assign a set-membership function that generalizes the indicator function and gives the "degree" to which the category applies, i.e. the "degree" of smallness or tallness or illegality.
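To make the contrast concrete, here is a small sketch of a crisp indicator for "tall" versus a graded membership function; the 180 cm threshold and the logistic shape are purely illustrative assumptions, not anything taken from your papers.

    # Illustrative only: a bivalent indicator for "tall" versus a graded
    # membership function. The 180 cm threshold and the logistic shape are
    # assumptions chosen for the example.
    import numpy as np

    def tall_indicator(height_cm):
        """Bivalent classification: 1 if tall, 0 otherwise."""
        return 1.0 if height_cm > 180.0 else 0.0

    def tall_membership(height_cm, center=180.0, width=5.0):
        """Graded degree of tallness in [0, 1] (a logistic curve)."""
        return 1.0 / (1.0 + np.exp(-(height_cm - center) / width))

    for h in (170, 178, 182, 195):
        print(h, tall_indicator(h), round(tall_membership(h), 3))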
I agree that there are no hard boundaries, but I would argue that this does not invalidate the consideration of bivalent logic. A given object will be classified by a given person as either small or not. There are certainly "borderline" cases where one is hesitant to say whether something is small or not. But if we disallow the possibility of refusing to answer, then when asked "Is this small?", we must answer "Yes, it is small" or "No, it is not small". The necessity to answer may seem ridiculous in the case of determining whether something is small, or someone is tall, but it seems very appropriate in the more practically important legal examples that have been mentioned by Prof. Tillers.

A Judge cannot refuse to make a decision as to whether or not a law applies to a specific instance. The concepts involved may be fuzzy, and there may be no way of consistently deciding all instances, but this doesn't mean that a decision is not made, be it a good one or a bad one.

To me, this situation suggests that we consider not the probability that something is small or that a law was broken, nor the degree to which something is small or a law was broken, but rather the conditional distribution of the classification of a thing as small given that its size is x, etc., and the conditional distribution of deciding that the law was broken given that such and such was done. There is no way, and no need, to decide whether the thing _actually is_ small, or whether the law _actually was_ broken; we are assuming that there is no single consistent criterion that we can apply to decide all cases. But in particular instances a person will say the thing is small or not, and a Judge will decide the law was broken or not. Agents may employ considerations of degree in making decisions, but what is "ontologically" important is the decision that is ultimately made. The future is determined by the classification of a thing in particular instances, e.g. the future of the roller skater is determined by the Judge's decision. Once the decision is made, it doesn't matter to what degree we say she broke the law. The future depends only on the decision, which will always be a definite non-fuzzy outcome.

In order to predict the future, we attempt to determine the probability of certain classifications being made. There is no reason to determine an abstract "degree" to which a category holds, apart from any decisions made based on it. The only thing we are interested in is which category will be decided, and for this we use the conditional probability of the decision given the data. Such conditional probabilities may look suspiciously like set-membership functions, but they come with the added benefit of the probabilistic calculus.

----

The following answers to the 5 questions you proposed are admittedly simplistic, but are intended to suggest that probabilistic answers are not impossible. I rely on the Maximum Entropy principle (Jaynes; Shore and Johnson), which may have difficulties of its own that can certainly be discussed.

Question 1. The balls-in-box problem. A box contains black and white balls. My perceptions are: (a) there are about twenty balls; (b) most are black; and (c) there are several times as many black balls as white balls. What is the probability that a ball drawn at random is white?

Bayesian answer: Let N be a random variable representing the number of balls in the box, let T be a random variable representing the factor by which the black balls outnumber the white balls, and let W be a random variable representing the number of white balls, so that the number of black balls is T*W. We are interested in the random variable P = W / N = W / (W + T*W) = 1 / (1 + T), so P depends only on T. Assign a distribution to T consistent with your perception, e.g. Gamma with mean 3 and some variance. Then the posterior distribution of P is easily calculated. Choose a risk function and determine the point estimate P^ that minimizes the risk.
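For concreteness, a minimal Monte Carlo sketch of this calculation, assuming (purely for illustration) a Gamma(shape=9, scale=1/3) prior for T, which has mean 3 and standard deviation 1, and squared-error risk; any other prior consistent with the perception could be substituted.

    # A minimal Monte Carlo sketch of the answer to Question 1. The Gamma
    # prior on T ("several times as many black as white") with mean 3 is an
    # illustrative assumption.
    import numpy as np

    rng = np.random.default_rng(0)

    shape, scale = 9.0, 1.0 / 3.0        # Gamma(9, 1/3): mean 3, std 1
    T = rng.gamma(shape, scale, size=100_000)
    P = 1.0 / (1.0 + T)                  # P depends only on T

    # Under squared-error risk the optimal point estimate is the posterior mean.
    print("posterior mean of P:", P.mean())
    print("central 90% interval:", np.quantile(P, [0.05, 0.95]))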
Question 2. The Robert example. Usually Robert leaves his office at about 5:30pm. Usually it takes him about thirty minutes to get home. What is the probability that Robert is home at about 6:15pm?

Bayesian answer: Let T be the time that Robert leaves his office, and let D be the time it takes him to drive home, both random variables. We are interested in the random variable H = T + D. Assign priors with means and variances consistent with your perception of the mean of T (17.5, i.e. 5:30pm expressed in hours) and the mean of D (0.5 hours), and consistent with your level of certainty in these estimates. Calculate the distribution of H (the convolution of the distributions of T and D). Then the desired probability can be determined, for example, by integrating the density of H from 18.25 - delta to 18.25 + delta, where 2*delta is the size of a symmetric window around 18.25 (6:15pm). (A numeric sketch of this calculation appears after Question 5 below.)

Question 3. The tall Swedes problem. My perception is that most Swedes are tall. What is the average height of Swedes?

Bayesian answer: Assign a prior classification distribution P(T=1|H=h) over the range of possible heights h (for a real Bayesian learner, this distribution will be "built in" through the learning process). Now assign a probability for P(T=1|Sw=1), or make this probability a random variable and assign a distribution to it centered around a value consistent with what you consider "most". We are interested in E(H|Sw=1), for which we need the distribution p(H=h|Sw=1). There is not enough information to determine this distribution uniquely. Following the Maximum Entropy principle, we choose the distribution with maximum entropy subject to the constraints imposed by our determinations of P(T=1|H=h) and P(T=1|Sw=1), i.e. subject to the requirement that the integral over h of P(T=1|H=h)*p(H=h|Sw=1) equal P(T=1|Sw=1). (This step is illustrated in the sketch after Question 5 below.)

Question 4. X is a real-valued random variable. Usually X is not very large. Usually X is not very small. What is the probability that X is neither small nor large? What is the expected value of X?

Bayesian answer: Assign the prior probabilities P(L=1) and P(S=1) and the classification distributions P(L=1|X=x) and P(S=1|X=x). Then we are interested in P(N=1) = P(L=0 and S=0) = 1 - P(L=1) - P(S=1) (treating "small" and "large" as mutually exclusive), and E(X) = P(L=1)*E(X|L=1) + P(S=1)*E(X|S=1) + P(N=1)*E(X|N=1). For the latter, determine the maximum entropy distribution for X consistent with the given P(L=1|X=x) = p(X=x|L=1)*P(L=1)/p(X=x) and P(S=1|X=x).

Question 5. X is a real-valued random variable. My perception of the probability distribution of X may be described as: Prob(X is small) is low; Prob(X is medium) is high; Prob(X is large) is low. What is the expected value of X?

Bayesian answer: Assign prior classification distributions P(S=1|X=x), P(M=1|X=x), and P(L=1|X=x), determine the maximum entropy distribution for X subject to the corresponding constraints, and then calculate E(X).
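As mentioned above, a minimal Monte Carlo sketch of the Question 2 calculation; the normal priors and the window half-width delta are illustrative assumptions.

    # A minimal sketch of the Question 2 calculation, assuming (illustrative)
    # normal priors: T ~ N(17.5, 0.25^2) hours and D ~ N(0.5, 0.15^2) hours.
    # H = T + D, and we estimate the probability that H falls within a small
    # window around 18.25 (6:15pm); delta is an assumed tolerance for "about".
    import numpy as np

    rng = np.random.default_rng(0)
    T = rng.normal(17.5, 0.25, size=200_000)   # departure time (hours)
    D = rng.normal(0.5, 0.15, size=200_000)    # driving time (hours)
    H = T + D                                  # arrival time

    delta = 0.25                               # +/- 15 minutes
    prob = np.mean(np.abs(H - 18.25) < delta)
    print("P(Robert arrives home within", delta, "h of 6:15pm) ~", prob)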
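And a minimal sketch of the maximum entropy step used in the answers to Questions 3-5, worked for Question 3. The height grid, the logistic classification curve P(T=1|H=h), and the reading of "most" as P(T=1|Sw=1) = 0.8 are illustrative assumptions. With a single expectation constraint, the maximum entropy distribution has the exponential-family form p(h) proportional to exp(lambda*f(h)), where f(h) = P(T=1|H=h).

    # A minimal maximum-entropy sketch for Question 3 (tall Swedes).
    # All numbers are illustrative assumptions: a 150-210 cm height grid,
    # a logistic P(T=1|H=h), and "most" read as P(T=1|Sw=1) = 0.8.
    import numpy as np
    from scipy.optimize import brentq

    h = np.linspace(150.0, 210.0, 601)            # height grid (cm)
    f = 1.0 / (1.0 + np.exp(-(h - 180.0) / 5.0))  # assumed P(T=1 | H=h)
    target = 0.8                                  # assumed P(T=1 | Sw=1), "most"

    def constraint_gap(lam):
        w = np.exp(lam * f)
        p = w / w.sum()
        return p @ f - target

    lam = brentq(constraint_gap, -50.0, 50.0)     # solve for the multiplier
    w = np.exp(lam * f)
    p = w / w.sum()                               # max-entropy p(H=h | Sw=1)

    print("E[height | Swede] ~", round(p @ h, 1), "cm")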
----

My ideas are certainly not as well thought out as yours, and have likely been voiced many times before, but will perhaps serve as a stimulus for further constructive debate.

Regards,
Jason

-----Original Message-----
From: Lotfi A. Zadeh [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 07, 2004 5:25 PM
To: UAI
Cc: Peter Tillers; Jason Palmer; David Larkin; Lotfi A. Zadeh
Subject: Re: [UAI] causal_vs._functional models

What are the tools that are needed to solve such problems? What is needed is a generalization of PT: a generalization which adds to PT the capability to operate on perception-based information expressed in a natural language. Such a generalization, call it PTp, was described in my paper, "Toward a Perception-Based Theory of Probabilistic Reasoning with Imprecise Probabilities," Journal of Statistical Planning and Inference, Vol. 105, 233-264, 2002 (downloadable from http://www-bisc.cs.berkeley.edu/BISCProgram/Projects.htm). For illustration, a very brief sketch of PTp-based solutions of Problems 1 and 3 is presented in the following.

Problem 1. Let X, Y and P denote, respectively, the number of black balls, the number of white balls, and the probability that a ball drawn at random is white. Let a* denote "approximately a," with "approximately a" defined as a fuzzy set centering on a. Translating the perception-based information into the Generalized Constraint Language (GCL), we arrive at the following equations:

    (X + Y) is 20*
    X is most × 20*
    X is several × Y
    P is Y/20*

In these equations, most and several are fuzzy numbers which are subjectively defined through their membership functions. X, Y and P are fuzzy numbers which are solutions of the equations, and can readily be computed through the use of fuzzy integer programming.

Problem 3. Assume that the height of Swedes ranges from hmin to hmax. Let g(u) denote the count density function, meaning that g(u)du is the proportion of Swedes whose height lies in the interval [u, u+du]. The proposition "Most Swedes are tall" translates into "the integral over [hmin, hmax] of g(u) times the membership function of tall, t(u), is most," where most is a fuzzy number which is subjectively defined through its membership function. The average height, h_ave, is the integral over [hmin, hmax] of g(u) times u. If g were known, this integral would give the average height of Swedes. In our problem, g is not known; what we know is that it is constrained by the translation of the proposition "Most Swedes are tall." Through constraint propagation, the constraint on g induces a constraint on the average height. The rule governing constraint propagation is the extension principle of fuzzy logic. Applying this principle to the problem in question leads to the membership function of the fuzzy set which describes the average height of Swedes. Details relating to the use of the extension principle may be found in my JSPI paper.
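A minimal numeric sketch of the Problem 1 computation, using brute-force enumeration over integer ball counts and the extension principle rather than the fuzzy integer programming mentioned above; the triangular membership functions chosen for "approximately 20", "most", and "several" are illustrative assumptions.

    # Brute-force sketch of Problem 1: conjoin the fuzzy constraints over
    # integer (black, white) counts, then apply the extension principle to
    # obtain a membership degree for each candidate value of P. The specific
    # triangular membership functions are illustrative assumptions.
    import numpy as np

    def tri(u, a, b, c):
        """Triangular membership function with support [a, c] and peak at b."""
        return np.maximum(np.minimum((u - a) / (b - a), (c - u) / (c - b)), 0.0)

    mu_about20 = lambda n: tri(n, 15, 20, 25)        # (X + Y) is 20*
    mu_most    = lambda r: tri(r, 0.5, 0.75, 1.0)    # proportion black is "most"
    mu_several = lambda t: tri(t, 2.0, 4.0, 8.0)     # X is several * Y

    best = {}                                        # membership of each value of P
    for x in range(1, 31):                           # black balls
        for y in range(1, 31):                       # white balls
            deg = min(mu_about20(x + y),
                      mu_most(x / (x + y)),
                      mu_several(x / y))             # conjunction of constraints
            if deg > 0:
                p = round(y / (x + y), 2)
                best[p] = max(best.get(p, 0.0), deg) # extension principle: sup over preimage
    for p in sorted(best):
        print(p, round(best[p], 2))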
To understand why reasoning with perception-based information described in a natural language is beyond the reach of PT and BNT, it is helpful to introduce the concept of the dimensionality of natural languages. Basically, a natural language is a system for describing perceptions. Among the many perceptions which underlie human cognition, there are three that stand out in importance: perception of truth (verity); perception of certainty (probability); and perception of possibility. These perceptions are distinct, and each is associated with a degree which may be interpreted as a coordinate along a dimension. Thus, we can speak of the dimension of truth (verity), the dimension of certainty (probability), and the dimension of possibility. Natural languages are three-dimensional in the sense that, in general, a proposition in a natural language is partially true, or partially certain, or partially possible, or some combination of the three. For example, "It is very likely that Robert is tall" is associated with partial certainty and partial possibility, while "It is quite true that Mary is rich" is associated with partial truth and partial possibility. Standard probability theory, PT, is one-dimensional in that it deals only with partiality of certainty and not with partiality of truth nor with partiality of possibility. This mismatch in dimensionality is the reason why PT and BNT are ill-equipped for dealing with perception-based information expressed in a natural language. Note that, unlike PT, PTp is three-dimensional.

In retrospect, historians of science may find it difficult to understand why what is so obvious (that partiality of certainty and partiality of truth are distinct concepts and require different modes of treatment) encountered so much denial and resistance. Partiality of truth and partiality of certainty play pivotal roles in human cognition. But in the realm of law, partiality of truth and partiality of class membership are much more pervasive than partiality of certainty. In many instances, they occur in combination.

Warm regards to all,

Lotfi

Lotfi A. Zadeh
Computer Science Division
University of California
Berkeley, CA 94720-1776
Tel (office): (510) 642-4959
Fax (office): (510) 642-1712
