Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003

2003-02-14 Thread Eliezer S. Yudkowsky
C. David Noziglia wrote:


The problem with the issue we are discussing here is that the worst-case
scenario for handing power to unrestricted, super-capable AI entities is
very bad, indeed.  So what we are looking for is not really building an
ethical structure or moral sense at all.  Failure is not an option.  The
only way to prevent the worst-case scenarios that have been mentioned by
discussants is not to design moral values and hope, but to build in
hard-wired, Three Laws-type rules that cannot be overridden.  And then, on
top of that, build in social, competitive systems that use the presence of
multiple AIs, dependence on humans as suppliers or intermediaries, ethical,
legal, and even game-theory (remember the movie /War Games/?) strictures,
and even punishment systems up to and including shut-down capabilities.


That *still* doesn't work.

1)  Hard-wired rules are a pipe dream.  The idea mixes mechanomorphism 
("machines only do what they're told to do") with anthropomorphism ("I 
wish those slaves down on the plantation would stop rebelling").  The 
only hard-wired level of organization is code, or, in a seed AI, physics.  
Once cognition exists it can no longer be usefully described using the 
adjective "hard-wired".  This is like saying you can make Windows XP 
stable by hard-wiring it not to crash, presumably by including the 
precompilation statement #define BUGS 0.

2)  Any individual ethic that cannot be overridden - if we are speaking 
about a successfully implemented design property of the system, and not a 
mythical hardwiring - will never be any stronger, smarter, or more 
reliable than the frozen goal system of its creator as it existed at the 
time of producing that ethic.  In particular, odd things start happening 
when you take an intelligence of order X and try to control it using goal 
patterns that were produced by an intelligence of order < X.  You say 
"cannot be overridden"; I hear "cannot be renormalized".

3)  A society of selfish AIs may develop certain (not really primatelike) 
rules for enforcing cooperative interactions among themselves; but you 
cannot prove for any entropic specification, and I will undertake to 
*disprove* for any clear specification, that this creates any rational 
reason to assign a greater probability to the proposition that the AI 
society will protect human beings.

4)  As for dependence on human suppliers: if you're talking about 
transhumans of any kind - AIs, uploads, what-have-you - the notion of 
transhumans dependent on a human economy is a pipe dream.  (Order custom 
proteins from 
an online DNA synthesis and peptide sequencer; build nanotech; total time 
of dependence on human economy, 48 hours.)

--
Eliezer S. Yudkowsky  http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence



Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003

2003-02-14 Thread Bill Hibbard
Hi David,

 The problem here, I guess, is the conflict between Platonic expectations of
 perfection and the messiness of the real world.

I never said perfection, and in my book make it clear that
the task of a super-intelligent machine learning behaviors
to promote human happiness will be very messy. That's why
it needs to be super-intelligent.

 The only systems we know of that generate the most happiness, freedom, and
 prosperity are democracy and free enterprise.  Both systems are messy and
 far from perfect.  They both generate a lot of unhappiness and poverty in
 their operation.  Both need regulation and control mechanisms (rule of law)
 to inhibit their unrestricted action.  The goal is to find a balance between
 the social justice goal of wealth redistribution and the social welfare goal
 of wealth generation through unrestricted innovation.  They in a sense need
 the messiness in order to generate the benefits; designing systems to
 generate happiness has always been a recipe for totalitarianism.  When the
 system does not allow balance, or failure - when no company, say, can go
 bankrupt or fail - no company can succeed, change, or take risks.  That's
 socialism, and that's what's wrong with it.

 The problem with the issue we are discussing here is that the worst-case
 scenario for handing power to unrestricted, super-capable AI entities is
 very bad, indeed.  So what we are looking for is not really building an
 ethical structure or moral sense at all.  Failure is not an option.  The
 only way to prevent the worst-case scenarios that have been mentioned by
 discussants is not to design moral values and hope, but to build in
 hard-wired, Three Laws-type rules that cannot be overridden.  And then, on
 top of that, build in social, competitive systems that use the presence of
 multiple AIs, dependence on humans as suppliers or intermediaries, ethical,
 legal, and even game-theory (remember the movie /War Games/?) strictures,
 and even punishment systems up to and including shut-down capabilities.

The problem with laws is that they are inevitably ambiguous.
They are analogous to the expert system approach to AI, which
cannot cope with the messiness of the real world. Human laws
require intelligent judges to resolve their ambiguities. Who
will supply the intelligent judgement for applying laws to
super-intelligent machines?

I agree wholeheartedly that the stakes are high, but think
the safer approach is to build ethics into the fundamental
driver of super-intelligent machines, which will be their
reinforcement values.
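
(A toy sketch of what that could look like in code - every name here is 
hypothetical, and this is only an illustration of the idea, not any 
particular design: the point is that the ethical content enters through 
the reinforcement value itself, not through rules layered on top of it.)

    import random

    class HappinessModel:
        """Stand-in for a learned recognizer of human well-being."""
        def score(self, observation):
            # A real system would use a trained model; this is a stub.
            return observation.get("human_wellbeing", 0.0)

    class ValueDrivenAgent:
        """Simple bandit-style learner whose only reward is the
        happiness estimate."""
        def __init__(self, actions, happiness_model, lr=0.1, eps=0.1):
            self.q = {a: 0.0 for a in actions}   # action-value estimates
            self.happiness = happiness_model
            self.lr, self.eps = lr, eps

        def act(self):
            if random.random() < self.eps:        # occasional exploration
                return random.choice(list(self.q))
            return max(self.q, key=self.q.get)    # otherwise act greedily

        def learn(self, action, observation):
            # The "ethics" enter here: the reinforcement value IS the
            # learned human-happiness signal, not a hard-coded rule.
            reward = self.happiness.score(observation)
            self.q[action] += self.lr * (reward - self.q[action])

Whether such a learned signal keeps tracking human happiness once the 
machine is super-intelligent is, of course, exactly what this thread is 
arguing about.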

Cheers,
Bill
--
Bill Hibbard, SSEC, 1225 W. Dayton St., Madison, WI  53706
[EMAIL PROTECTED]  608-263-4427  fax: 608-263-6738
http://www.ssec.wisc.edu/~billh/vis.html




Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003

2003-02-14 Thread Eliezer S. Yudkowsky
Brad Wyble wrote:

 3)  A society of selfish AIs may develop certain (not really
 primatelike) rules for enforcing cooperative interactions among
 themselves; but you cannot prove for any entropic specification, and
 I will undertake to *disprove* for any clear specification, that this
 creates any rational reason to assign a greater probability to the
 proposition that the AI society will protect human beings.

 No, you can't guarantee anything.

Read the above statement carefully.  I am saying that selfish AIs as a 
group are no more likely to protect humans than an individual selfish AI. 
 Because of the apparent popularity of the argument "oh, well, the 
Singularity means I can say anything I like about what AIs do, and you 
can't disprove it", I am specifying that this means we have no rational 
(Bayesian) reason to expect AI groups to be more helpful toward humans 
than AI individuals, without specification of AI initial conditions of a 
degree adequate to create an individual unselfish AI.  I.e., we have no 
rational reason to expect groups to help.

In plain language:  I am not merely saying that a group of selfish AIs is 
not guaranteed to refrain from killing you.  I am saying that if you give 
me any clear and complete specification (no room for handwaving) of a 
group of AIs that are not individually helpful to humans, I will show 
that the direct 
extrapolation of that specification shows group behaviors that are also 
not helpful to humans.  Of course no one has ever given a clear 
specification except Marcus Hutter - but given Hutter's specification it 
is very straightforward to show that grouping makes no difference.
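
(For readers who haven't seen it: Hutter's specification is, roughly, the 
AIXI agent, which at each cycle k picks the action maximizing expected 
total reward out to horizon m under a universal mixture over all 
computable environments.  This is my own transcription and may differ in 
detail from Hutter's papers:

    a_k := \arg\max_{a_k} \sum_{o_k r_k} \max_{a_{k+1}} \sum_{o_{k+1} r_{k+1}}
           \cdots \max_{a_m} \sum_{o_m r_m} (r_k + \cdots + r_m)
           \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

where U is a universal Turing machine and \ell(q) is the length of 
program q.  Nothing in that expression treats other agents as anything 
but part of the environment, which is the sense in which grouping such 
agents creates no extra pressure to protect humans.)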

In even plainer language:  If you rely on groups of AIs to police 
themselves you *will* get killed unless a miracle happens.

A miracle m may be defined as a complex event which we have no Bayesian 
reason to expect - ergo, an event having probability on the order of 
2^-K(m).
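
(Unpacking that estimate - my gloss, not part of the original argument: 
under the universal prior, a specific outcome's probability falls off 
exponentially with the length of its shortest description,

    M(m) = \sum_{p \,:\, U(p) = m} 2^{-\ell(p)}  \approx  2^{-K(m)},
    \qquad K(m) := \min\{\ell(p) : U(p) = m\},

with U a universal prefix machine, and the approximation holding up to a 
constant factor by the coding theorem.  Every bit of unexplained specific 
complexity in the hoped-for rescue halves its prior probability.)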

The Singularity means that there are unknown unknowns that could 
potentially supervene on any point in your model - but to expect an 
unknown unknown to intervene in the form of *specific complexity* that 
pulls your butt out of the fire is praying for a miracle.  In terms of 
formal reasoning, the Singularity unknown unknown effect can only add to 
the formal entropy of a model.  This doesn't make the Singularity a net 
bad thing because there's more to the Singularity than just the unknown 
unknown effect; there are positive known unknowns like "How much moral 
value can a superintelligence create?", Singularity scenarios that are 
comparatively more and less tolerant of unknown unknowns, and 
straightforward extrapolations to more moral value than the human goal 
system can represent.  But it means you've got to be careful.

 But if AI's are in any way grounded
 in our mentality, protective (as opposed to merely not anti-humanitarian)
 tendencies have a good chance of evolving.  This is a good argument for
 modeling AGI's on our brains.  While humans are not angels by any
 means, I do believe that we would, *as a community*, look after lesser
 beings once having achieved a comfortable level of need-fulfillment.  I
 say this by analogy to our current situation in which developed
 countries are starting to show an interest in the preservation of the
 environment and endangered species, even at great expense and
 inconvenience.

You have to ground the AIs in our mentality in a very specific way, which 
(as it happens) directly transfers protective tendencies *as well as* 
transferring the initial conditions from which protective tendencies 
develop.  Your intuition that you can create a simple AI design that 
naturally develops protective tendencies is wrong.  Protective 
tendencies turn out to be a hell of a lot more complex than they look to 
humans, who expect other minds to behave like humans.  This is a verrry 
complex thing that looks to humans like a simple thing.  Humans already 
have this complexity built into them, so we exhibit protective tendencies 
given a wide variety of simple external conditions, but *that's not the 
whole dependency* despite our intuition that it's the external condition 
that causes the protectiveness.  The reason the external condition is seen 
as causing the protectiveness is that the innate complexity is 
species-universal and is hence an invisible enabling condition.  It'd be 
like dropping your glass, watching it shatter, and then saying "Damn, too 
bad I wasn't on the Moon where the gravity is lower" instead of "I wish my 
hands hadn't been so sweaty."

There are simple external conditions that provoke protective tendencies in 
humans following chains of logic that seem entirely natural to us.  Our 
intuition that reproducing these simple external conditions serves to 
provoke protective tendencies in AIs is knowably wrong, failing an 
unsupported specific complex miracle.

 Eliezer, I think your quest to provide a surefire guarantee against a
 singularity that eliminates mankind is unfulfillable.  We will be
 rolling the 

Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003

2003-02-14 Thread Brad Wyble
 
 There are simple external conditions that provoke protective tendencies in 
 humans following chains of logic that seem entirely natural to us.  Our 
 intuition that reproducing these simple external conditions serves to 
 provoke protective tendencies in AIs is knowably wrong, failing an 
 unsupported specific complex miracle.

Well said.
 
 Or to put it another way, you see Friendliness in AIs as pretty likely 
 regardless, and you think I'm going to all these lengths to provide a 
 guarantee.  I'm not.  I'm going to all these lengths to create a 
 *significant probability* of Friendliness.
 

You're mischaracterizing my position.  I'm certainly not saying we'll get friendliness 
for free, but I was trying to reason by analogy (perhaps in a flawed way) that our best 
chance of success may be to model AGI's based on our innate tendencies wherever 
possible.  Human behavior is a knowable quality.

I perceived, based on the character of your discussion, that you would be unsatisfied 
with anything short of a formal, mathematical proof that any given AGI would not 
destroy us before giving assent to turning it on.  If that characterization was 
incorrect, the fault is mine.


-Brad




Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003

2003-02-14 Thread Eliezer S. Yudkowsky
Brad Wyble wrote:
 There are simple external conditions that provoke protective
 tendencies in humans following chains of logic that seem entirely
 natural to us.  Our intuition that reproducing these simple external
 conditions serves to provoke protective tendencies in AIs is knowably
 wrong, failing an unsupported specific complex miracle.

 Well said.

 Or to put it another way, you see Friendliness in AIs as pretty
 likely regardless, and you think I'm going to all these lengths to
 provide a guarantee.  I'm not.  I'm going to all these lengths to
 create a *significant probability* of Friendliness.

 You're mischaracterizing my position.  I'm certainly not saying we'll
 get friendliness for free, but I was trying to reason by analogy (perhaps
 in a flawed way) that our best chance of success may be to model AGI's
 based on our innate tendencies wherever possible.  Human behavior is a
 knowable quality.

Okay... what I'm saying, basically, is that to connect AI morality to 
human morality turns out to be a very complex problem that is not solved 
by saying "let's copy human nature".  You need a very specific description 
of what you have to copy, how you do the copying, and so on, and this 
involves all sorts of complex nonobvious concepts within a complex 
nonobvious theory that completely changes the way you see morality.  It 
would even be fair to say, dismayingly, that in saying "let's build AGI's 
which reproduce certain human behaviors", you have not even succeeded in 
stating the problem, let alone the solution.

This isn't intended in any personal way, btw.  It's just that, like, the 
fate of the world *does* actually depend on it and all, so I have to be 
very precise about how much progress has occurred at a given point of 
theoretical development, rather than offering encouragement.

 I perceived, based on the character of your discussion, that you would
 be unsatisfied with anything short of a formal, mathematical proof
 that any given AGI would not destroy us before giving assent to
 turning it on.  If that characterization was incorrect, the fault is
 mine.

No!  It's *my* fault!  You can't have any!  Anyhow, I don't think such a 
formal proof is possible.  The problem with the proposals I see is not 
that they are not *provably* Friendly but that a rational extrapolation of 
them shows that they are *unFriendly* barring a miracle.  I'll take a 
proposal whose rational extrapolation is to Friendliness and which seems 
to lie at a local optimum relative to the improvements I can imagine; 
proof is impossible.

--
Eliezer S. Yudkowsky  http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
