Dear Prof. Zadeh,

Thanks for your response. [I have added your attachment (Fuzzy solution)
after the quote of your message below.]

It seems to me that you are not explaining sufficiently clearly why Bayesian
methods fail on this problem. You cite as a deficiency in the Maximum
Entropy principle that one cannot be sure about the answer one obtains by
it, but I don't see where you have made the point that Fuzzy methods allow
one to be sure. You say that the Fuzzy solution gives you a fuzzy answer
while the Bayesian/MaxEnt solution gives you a crisp answer. But in fact a
Bayesian would be only too happy to give you a posterior distribution of
height as the answer. The question was "What is the height ... ?", so I gave
a height as the answer. If you asked me to give you all the information I
have about the height of Swedes that might assist you in making some
decision, then I would give you a posterior distribution. I don't see how
assumptions made on Fuzzy variables and a Fuzzy answer are superior to the
Bayesian prior assumptions and posterior distribution answer.

Actually part of the point of my post was that distributions and fuzzy
answers are never the _real_ answer. They are something that you, or nature,
might use in determining a real answer. But ultimately, something
non-probabilistic and non-fuzzy must happen, even in quantum mechanics. So
it's perfectly natural that one would be required to give a specific answer
to the question "What is the average height ... ?" since ultimately one must
make a commitment. It's certainly possible for the question to be "What is
the distribution of the height ... ?", for example if one is manufacturing
clothing in Sweden, one would want to know the distribution so that one
could make appropriate numbers of the different sizes. But here too one
still has ultimately to make a choice of exactly how many of each size to
make.

Regarding the problem versions you give, I think the Bayes/MaxEnt method can
readily deal with them and provide a posterior distribution of the height.
You give various rough constraints which can be modeled as prior
probabilities, and standard probability and Maximum Entropy can be used to
determine a posterior distribution of the height consistent with your
constraints and otherwise remaining maximally non-committal. It does indeed
seem odd to assume a uniform distribution for height, but it's equally odd to
assume a hard upper bound and a lower bound greater than zero on the
possible heights. If nature did in fact restrict heights to a certain
interval, then she might very well distribute them uniformly in that
interval. As nature does not impose this restriction, the MaxEnt posterior
that we get by imposing it is naturally incongruous.
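
As a rough illustration of the kind of calculation meant here, suppose the
Version 2 datum is read crisply as "exactly 70 percent are taller than 170cm"
(an assumption made only for this sketch; the original datum is approximate).
With a bounded range and a single constraint on the probability mass above a
cut point, the maximum entropy density is uniform within each of the two
pieces. The function name and the crisp reading are hypothetical choices, not
anything specified in the thread.

```python
# Sketch: maximum entropy density on a bounded range, given one crisp
# constraint P(height > cut) = p_above. The crisp reading of "over 70*
# percent" as exactly 0.70 is an assumption for illustration.

def maxent_piecewise_uniform(lo, hi, cut, p_above):
    """Return (density_below, density_above, mean) of the MaxEnt density.

    With only a mass constraint on (cut, hi], entropy is maximized by
    spreading each piece's mass uniformly over that piece.
    """
    p_below = 1.0 - p_above
    density_below = p_below / (cut - lo)
    density_above = p_above / (hi - cut)
    # The mean of a uniform piece is its midpoint; weight by the piece's mass.
    mean = p_below * (lo + cut) / 2.0 + p_above * (cut + hi) / 2.0
    return density_below, density_above, mean


# Version 1: plain uniform on [140, 220]. Expressed here as the special case
# in which the mass above 170 is just the length fraction 50/80.
_, _, mean_v1 = maxent_piecewise_uniform(140.0, 220.0, 170.0, 50.0 / 80.0)

# Version 2 under the crisp reading: 70 percent of the mass above 170 cm.
d_lo, d_hi, mean_v2 = maxent_piecewise_uniform(140.0, 220.0, 170.0, 0.70)

print(mean_v1)  # the midpoint of the range, 180 cm
print(mean_v2)  # about 183 cm
```

The shift from 180cm to roughly 183cm is all that this one constraint buys,
and the piecewise-flat shape is as artificial as the hard bounds that
produce it.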

If you ask me the average height of Swedes, and tell me that it is extremely
important that the answer be correct, then I would likely discretize the
possible heights (to use 0-1 loss), determine some posterior distribution of
the heights (given what I know, or what I believe weighted by the belief),
calculate the risk associated with each possible estimate by integrating the
loss times the posterior over all possible true values, and then choose a height
from the set of heights with the least risk. Or if I were feeling
particularly Bayesian, I might try a whole set of priors weighted by my
belief that they could be the true prior, and determine the risk of each
estimate averaged over the possible priors. This is "model comparison",
which relies for its consistency on the universal norm of probability. Of
course I would never be "sure" of my answer, but I have to give one, just as
someone using Fuzzy Logic would have to give an answer and be unsure of it.
If the Fuzzy Logicist is allowed to give fuzzy answers, then the Bayesian
must be allowed to give posterior distributions.
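
The procedure just described might be sketched as follows. Everything
numerical here is hypothetical: the 1cm grid, the three candidate posteriors,
and the belief weights are invented for illustration, and the weights are
treated as if they already reflected whatever data is available.

```python
import math

# Sketch: pick a point estimate by minimizing expected 0-1 loss over a
# discretized height grid, averaging over several candidate posteriors.
# All numbers below are hypothetical.

def normal_pmf(grid, mean, sd):
    """Discretized, renormalized bell curve over the grid."""
    weights = [math.exp(-0.5 * ((h - mean) / sd) ** 2) for h in grid]
    total = sum(weights)
    return [w / total for w in weights]


def zero_one(estimate, true_value):
    """0-1 loss: no penalty for the exact bin, unit penalty otherwise."""
    return 0.0 if estimate == true_value else 1.0


def expected_loss(grid, posterior, loss):
    """Risk of each candidate estimate: sum of loss * posterior over true values."""
    return [
        sum(loss(est, true) * p for true, p in zip(grid, posterior))
        for est in grid
    ]


grid = list(range(140, 221))  # heights in 1 cm bins (assumed resolution)

# (belief weight, posterior) pairs; weights sum to 1.
candidates = [
    (0.5, normal_pmf(grid, 178, 3)),
    (0.3, normal_pmf(grid, 180, 5)),
    (0.2, [1.0 / len(grid)] * len(grid)),  # the uniform MaxEnt answer
]

# Belief-weighted mixture of the candidate posteriors.
mixture = [
    sum(w * pmf[i] for w, pmf in candidates) for i in range(len(grid))
]

risk = expected_loss(grid, mixture, zero_one)
best_index = min(range(len(grid)), key=risk.__getitem__)
best = grid[best_index]
print(best, risk[best_index])
```

Under 0-1 loss the minimum-risk estimate is simply the mode of the mixture,
and the printed risk is far from zero, which is the sense in which I would
have to answer without being "sure".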

Kind regards,
Jason


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of
Lotfi A. Zadeh
Sent: Friday, January 23, 2004 10:27 AM
To: [EMAIL PROTECTED]
Subject: Re: [UAI] functional_vs_causal models


Dear Jason:

    Thank you for solutions to the test problems. Your solutions show
that you have a high level of expertise in standard probability theory
(PT) and the maximum entropy principle. However, in my view your
solutions support, indirectly, my contention that PT and the maximum
entropy principle do not have a capability to deal with perception-based
information. To make my point, I will focus on the tall Swedes problem.
For convenience, I will formulate a progression of versions of this
problem. In these versions, what varies is the initial dataset. The
question is the same: What is the average height of Swedes? In the
following, a* denotes "approximately a."
    Version 1 (crisp). Swedes over 20 range in height from 140cm to
220cm. Let h be the height of a Swede picked at random. I am told that
the distribution of h is uniform, and am asked, "What is the average
height of Swedes?" My answer is: 180cm. If I am asked, "Are you sure?"
my answer would be "Yes."
    Now, I ask you the same question but without telling you what is the
probability distribution of h. You invoke the maximum entropy principle
and in response to my question tell me that the average height is 180cm.
But then I ask you, "Jason, are you sure that the average height is
180cm? If not, I may be in serious trouble." Your answer would have to
be: "No, I am not sure." This is a fundamental flaw of the maximum
entropy principle. Furthermore, as I have pointed out in earlier
messages, the principle is not applicable when information is
perception-based.
    Version 2 (fuzzy) Swedes over 20 range in height from 140cm to
220cm. Over 70* percent are taller than 170* cm. What is the average
height of Swedes over 20? A fuzzy logic solution is described in the
attachment.
    Version 3 (fuzzy) Swedes over 20 range in height from 140cm to
220cm. Over 70* percent are taller than 170*cm. Less than 10* percent
are shorter than 150*cm. Less than 15* percent are taller than 200*cm.
What is the average height of Swedes over 20?
    I would be very interested in your solutions to these versions, and
your answer to my question: Are you sure that the answer is correct?
What if an incorrect answer may lead to a serious loss?

                    With my warm regards,

                            Lotfi

Attachment:

Version 2. Swedes over 20 range in height from 140cm to 220cm. Over 70*
percent are taller than 170*cm. What is the average height of Swedes over
20?
        Fuzzy logic solution. Consider a population of Swedes over 20, S={Swede1,
Swede2, ..., SwedeN}, with hi, i=1, ..., N, being the height of Si.
        The datum "Over 70* percent of S are taller than 170*cm," constrains the hi
in h=(h1, ..., hN). The constraint is precisiated through translation into the
Generalized Constraint Language, GCL. More specifically, let X denote a
variable taking values in S, and let X|(h(X) is 170) denote a fuzzy subset
of S induced by the constraint h(X) is ≥ 170*. Then
Over 70* percent of S are taller than 170* →
(GCL): 1/N Count(X|h(X) is >170*) is ≥ 0.7*
where Count is the fuzzy count of X's which satisfy the fuzzy constraint
h(X) is ≥ 170*.
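
        For illustration only, the relative fuzzy count in the GCL expression can
be sketched as below. The linear-ramp membership functions, their breakpoints
(165cm to 175cm for "taller than 170*cm", 0.6 to 0.8 for "over 70* percent"),
and the sample of ten heights are all assumptions introduced for this sketch;
the attachment leaves the calibration open. The fuzzy count is taken to be the
sum of membership degrees.

```python
# Sketch: relative fuzzy count of "taller than 170*cm" in a sample, and the
# degree to which that count satisfies "over 70* percent". Membership
# shapes, breakpoints, and the sample are hypothetical.

def ramp_up(x, zero_at, one_at):
    """Membership rising linearly from 0 at zero_at to 1 at one_at."""
    if x <= zero_at:
        return 0.0
    if x >= one_at:
        return 1.0
    return (x - zero_at) / (one_at - zero_at)


def mu_taller_than_170(h):
    """Assumed calibration of 'h is at least approximately 170 cm'."""
    return ramp_up(h, 165.0, 175.0)


def mu_over_70_percent(p):
    """Assumed calibration of 'p is at least approximately 0.7'."""
    return ramp_up(p, 0.6, 0.8)


def relative_fuzzy_count(heights):
    """(1/N) * sum of membership degrees, i.e. 1/N Count(...) in the text."""
    return sum(mu_taller_than_170(h) for h in heights) / len(heights)


# Hypothetical sample of heights in cm.
heights = [150, 160, 168, 170, 172, 176, 180, 185, 190, 200]

proportion = relative_fuzzy_count(heights)   # about 0.65 for this sample
degree = mu_over_70_percent(proportion)      # about 0.25 for this sample
print(proportion, degree)
```

A different calibration of either ramp changes both numbers, which is why the
membership functions have to be fixed before the constraint says anything
definite about a particular h.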
        A general deduction rule in fuzzy logic is the following. In this rule, X
is a variable which takes values in a finite set U={u1, u2, ..., uN}, and a(X)
is a real-valued attribute of X, with ai=a(ui) and a=(a1, ..., aN)

1/N Count(X|a(X) is C) is B
        Av(X) is ?D

where Av(X) is the fuzzy average value of X over U. Thus, computation of the
average value, D, reduces to the solution of the fuzzy nonlinear programming
problem

        μ_D(v)= max_a(sum_i μ_i(a_i))
subject to
        v= sum_i a_i            (average height)

where μ_D and μ_C are the membership functions of D and C, respectively.
This is the fuzzy logic solution to Version 2. Note that computation of D
requires calibration of the membership functions of ≥ 170* and ≥ 0.7*. Note
also that the fuzzy logic solution is a solution in the sense that it
reduces the original problem to a well-defined mathematical problem.
        Jason, your Bayesian solution of Version 2 would yield a crisp value of D.
The fuzzy logic solution leads to a fuzzy value of D, in consequence of the
fuzziness of the initial dataset. This is an instance of the principle:
fuzzy in, fuzzy out.






--
Lotfi A. Zadeh
Professor in the Graduate School, Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California
Berkeley, CA 94720-1776
Director, Berkeley Initiative in Soft Computing (BISC)

Address:
Computer Science Division
University of California
Berkeley, CA 94720-1776
[EMAIL PROTECTED]
Tel.(office): (510) 642-4959
Fax (office): (510) 642-1712
Tel.(home): (510) 526-2569
Fax (home): (510) 526-2433
Fax (home): (510) 526-5181
http://www.cs.berkeley.edu/People/Faculty/Homepages/zadeh.html

BISC Homepage URLs:
URL: http://www-bisc.cs.berkeley/
URL: http://zadeh.cs.berkeley.edu/
