Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003
C. David Noziglia wrote:
> The problem with the issue we are discussing here is that the worst-case
> scenario for handing power to unrestricted, super-capable AI entities is very
> bad, indeed. So what we are looking for is not really building an ethical
> structure or moral sense at all. Failure is not an option. The only way to
> prevent the worst-case scenarios that have been mentioned by discussants is
> not to design moral values and hope, but to build in hard-wired, Three
> Laws-type rules that cannot be overridden. And then, on top of that, build in
> social, competitive systems that use the presence of multiple AIs, dependence
> on humans as suppliers or intermediaries, ethical, legal, and even
> game-theory (remember the movie /War Games/?) strictures, and even punishment
> systems up to and including shut-down capabilities.

That *still* doesn't work.

1) Hard-wired rules are a pipe dream. The idea mixes mechanomorphism ("machines only do what they're told to do") with anthropomorphism ("I wish those slaves down on the plantation would stop rebelling"). The only hard-wired level of organization is code, or, in a seed AI, physics. Once cognition exists, it can no longer be usefully described using the adjective "hard-wired". This is like saying you can make Windows XP stable by hard-wiring it not to crash, presumably by including the precompilation statement #define BUGS 0.

2) Any individual ethic that cannot be overridden - if we are speaking about a successfully implemented design property of the system, and not a mythical hardwiring - will never be any stronger, smarter, or more reliable than the frozen goal system of its creator as it existed at the time of producing that ethic. In particular, odd things start happening when you take an intelligence of order X and try to control it using goal patterns that were produced by an intelligence of order X. You say "cannot be overridden"; I hear "cannot be renormalized".

3) A society of selfish AIs may develop certain (not really primatelike) rules for enforcing cooperative interactions among themselves; but you cannot prove for any entropic specification, and I will undertake to *disprove* for any clear specification, that this creates any rational reason to assign a greater probability to the proposition that the AI society will protect human beings.

4) As for dependence on human suppliers: if you're talking about transhumans of any kind - AIs, uploads, what-have-you - transhumans dependent on a human economy is a pipe dream. (Order custom proteins from an online DNA synthesis and peptide sequencer; build nanotech; total time of dependence on the human economy: 48 hours.)

--
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
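A toy illustration of point (3), not taken from the thread: in a repeated game, purely selfish agents can sustain cooperation with peers who are able to retaliate, while the same incentive logic gives them no reason to cooperate with a party that cannot retaliate. The tit-for-tat strategy and all names in the Python sketch below are illustrative assumptions.

def choose(my_hist, their_hist, they_can_retaliate):
    """Tit-for-tat toward peers; unconditional defection toward a non-retaliator."""
    if not they_can_retaliate:
        return "D"                 # defecting against the powerless carries no cost
    if not their_hist:
        return "C"                 # open cooperatively with a peer
    return their_hist[-1]          # then mirror the peer's last move

def run(rounds=20):
    a_hist, b_hist, vs_human = [], [], []                     # move histories
    for _ in range(rounds):
        a = choose(a_hist, b_hist, they_can_retaliate=True)   # AI A facing AI B
        b = choose(b_hist, a_hist, they_can_retaliate=True)   # AI B facing AI A
        h = choose(vs_human, [], they_can_retaliate=False)    # AI A facing the human
        a_hist.append(a); b_hist.append(b); vs_human.append(h)
    print("AI vs AI:    " + "".join(a_hist))       # CCCC... stable mutual cooperation
    print("AI vs human: " + "".join(vs_human))     # DDDD... no incentive to protect

run()

Grouping changes the incentives the AIs present to each other; by itself it adds nothing to the incentives they have to protect a third party who cannot retaliate.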
Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003
Hi David,

> The problem here, I guess, is the conflict between Platonic expectations of
> perfection and the messiness of the real world.

I never said perfection, and in my book make it clear that the task of a super-intelligent machine learning behaviors to promote human happiness will be very messy. That's why it needs to be super-intelligent.

> The only systems we know of that generate the most happiness, freedom, and
> prosperity are democracy and free enterprise. Both systems are messy and far
> from perfect. They both generate a lot of unhappiness and poverty in their
> operation. Both need regulation and control mechanisms (rule of law) to
> inhibit their unrestricted action. The goal is to find a balance between the
> social justice goal of wealth redistribution and the social welfare goal of
> wealth generation through unrestricted innovation. They in a sense need the
> messiness in order to generate the benefits; designing systems to generate
> happiness has always been a recipe for totalitarianism. When the system does
> not allow balance or failure, when no company, say, can go bankrupt or fail,
> no company can succeed, change, or take risks. That's socialism, and that's
> what's wrong with it.
>
> The problem with the issue we are discussing here is that the worst-case
> scenario for handing power to unrestricted, super-capable AI entities is very
> bad, indeed. So what we are looking for is not really building an ethical
> structure or moral sense at all. Failure is not an option. The only way to
> prevent the worst-case scenarios that have been mentioned by discussants is
> not to design moral values and hope, but to build in hard-wired, Three
> Laws-type rules that cannot be overridden. And then, on top of that, build in
> social, competitive systems that use the presence of multiple AIs, dependence
> on humans as suppliers or intermediaries, ethical, legal, and even
> game-theory (remember the movie /War Games/?) strictures, and even punishment
> systems up to and including shut-down capabilities.

The problem with laws is that they are inevitably ambiguous. They are analogous to the expert-system approach to AI, which cannot cope with the messiness of the real world. Human laws require intelligent judges to resolve their ambiguities. Who will supply the intelligent judgement for applying laws to super-intelligent machines?

I agree wholeheartedly that the stakes are high, but think the safer approach is to build ethics into the fundamental driver of super-intelligent machines, which will be their reinforcement values.

Cheers,
Bill

----------------------------------------------------------
Bill Hibbard, SSEC, 1225 W. Dayton St., Madison, WI 53706
[EMAIL PROTECTED]  608-263-4427  fax: 608-263-6738
http://www.ssec.wisc.edu/~billh/vis.html
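To make the shape of that last proposal concrete, here is a minimal reinforcement-learning sketch in Python. It is an editorial illustration, not Hibbard's code: the stub "human happiness" reward and the action names are assumptions, and learning the real happiness signal is exactly the messy part the message describes.

import random

def human_happiness(state, action):
    """Stub reward standing in for a learned measure of human happiness."""
    return 1.0 if action == "assist" else 0.0

ACTIONS = ["assist", "ignore"]
q = {a: 0.0 for a in ACTIONS}        # one-state action-value table, just to show the loop
alpha, epsilon = 0.1, 0.1            # learning rate and exploration rate

for step in range(1000):
    if random.random() < epsilon:
        action = random.choice(ACTIONS)      # occasional exploration
    else:
        action = max(q, key=q.get)           # otherwise exploit current estimates
    reward = human_happiness(state=None, action=action)
    q[action] += alpha * (reward - q[action])    # incremental value update

print(q)   # the value of "assist" climbs toward 1.0; behavior follows the reward signal

The point of the sketch is only that the ethics lives in the reward signal driving the update, not in any rule the agent is forbidden to override.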
Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003
Brad Wyble wrote:
>> 3) A society of selfish AIs may develop certain (not really primatelike)
>> rules for enforcing cooperative interactions among themselves; but you
>> cannot prove for any entropic specification, and I will undertake to
>> *disprove* for any clear specification, that this creates any rational
>> reason to assign a greater probability to the proposition that the AI
>> society will protect human beings.
>
> No, you can't guarantee anything.

Read the above statement carefully. I am saying that selfish AIs as a group are no more likely to protect humans than an individual selfish AI. Because of the apparent popularity of the argument "oh, well, the Singularity means I can say anything I like about what AIs do, and you can't disprove it", I am specifying that this means we have no rational (Bayesian) reason to expect AI groups to be more helpful toward humans than AI individuals, without a specification of AI initial conditions of a degree adequate to create an individual unselfish AI. I.e., we have no rational reason to expect groups to help.

In plain language: I am not saying that a group of selfish AIs is "not guaranteed not to kill you." I am saying that if you give me any clear and complete specification (no room for handwaving) of a group of AIs that are not individually helpful to humans, I will show that the direct extrapolation of that specification shows group behaviors that are also not helpful to humans. Of course no one has ever given a clear specification except Marcus Hutter - but given Hutter's specification it is very straightforward to show that grouping makes no difference.

In even plainer language: if you rely on groups of AIs to police themselves, you *will* get killed unless a miracle happens. A miracle m may be defined as a complex event which we have no Bayesian reason to expect - ergo, one having probability 2^-K(m). The Singularity means that there are unknown unknowns that could potentially supervene on any point in your model - but to expect an unknown unknown to intervene in the form of *specific complexity* that pulls your butt out of the fire is praying for a miracle. In terms of formal reasoning, the Singularity unknown-unknown effect can only add to the formal entropy of a model. This doesn't make the Singularity a net bad thing, because there's more to the Singularity than just the unknown-unknown effect; there are positive known unknowns like "How much moral value can a superintelligence create?", Singularity scenarios that are comparatively more and less tolerant of unknown unknowns, and straightforward extrapolations to more moral value than the human goal system can represent. But it means you've got to be careful.

> But if AI's are in any way grounded in our mentality, protective (as opposed
> to merely not anti-humanitarian) tendencies have a good chance of evolving.
> This is a good argument for modeling AGI's on our brains. While humans are
> not angels by any means, I do believe that we would, *as a community*, look
> after lesser beings once having achieved a comfortable level of
> need-fulfillment. I say this by analogy to our current situation in which
> developed countries are starting to show an interest in the preservation of
> the environment and endangered species, even at great expense and
> inconvenience.

You have to ground the AIs in our mentality in a very specific way, which (as it happens) directly transfers protective tendencies *as well as* transferring the initial conditions from which protective tendencies develop.
Your intuition that you can create a simple AI design that naturally develops protective tendencies is wrong. Protective tendencies turn out to be a hell of a lot more complex than they look to humans, who expect other minds to behave like humans. This is a verrry complex thing that looks to humans like a simple thing. Humans already have this complexity built into them, so we exhibit protective tendencies given a wide variety of simple external conditions, but *that's not the whole dependency*, despite our intuition that it's the external condition that causes the protectiveness. The reason the external condition is seen as causing the protectiveness is that the innate complexity is species-universal and is hence an invisible enabling condition. It'd be like dropping your glass, watching it shatter, and then saying "Damn, too bad I wasn't on the Moon where the gravity is lower" instead of "I wish my hands hadn't been so sweaty."

There are simple external conditions that provoke protective tendencies in humans following chains of logic that seem entirely natural to us. Our intuition that reproducing these simple external conditions serves to provoke protective tendencies in AIs is knowably wrong, failing an unsupported specific complex miracle.

> Eliezer, I think your quest to provide a surefire guarantee against a
> singularity that eliminates mankind is unfulfillable. We will be rolling the
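The miracle probability 2^-K(m) invoked above can be made concrete with a toy calculation. The Python sketch below (mine, not from the thread) uses plain description length in bits as a stand-in for the uncomputable Kolmogorov complexity K(m), which it only upper-bounds; the point is just how fast the prior weight of a *specific* complex rescue shrinks with the complexity it requires.

def miracle_prior(bits_of_specific_complexity):
    """Solomonoff-style prior weight of one specific outcome needing that many bits."""
    return 2.0 ** -bits_of_specific_complexity

for k in (10, 50, 100, 500):
    print("K(m) = %3d bits  ->  prior weight ~ %.3e" % (k, miracle_prior(k)))
# K(m) =  10 bits  ->  prior weight ~ 9.766e-04
# K(m) =  50 bits  ->  prior weight ~ 8.882e-16
# K(m) = 100 bits  ->  prior weight ~ 7.889e-31
# K(m) = 500 bits  ->  prior weight ~ 3.055e-151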
Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003
> There are simple external conditions that provoke protective tendencies in
> humans following chains of logic that seem entirely natural to us. Our
> intuition that reproducing these simple external conditions serves to provoke
> protective tendencies in AIs is knowably wrong, failing an unsupported
> specific complex miracle.

Well said.

> Or to put it another way, you see Friendliness in AIs as pretty likely
> regardless, and you think I'm going to all these lengths to provide a
> guarantee. I'm not. I'm going to all these lengths to create a *significant
> probability* of Friendliness.

You're mischaracterizing my position. I'm certainly not saying we'll get friendliness for free, but was trying to reason by analogy (perhaps in a flawed way) that our best chance of success may be to model AGI's based on our innate tendencies wherever possible. Human behavior is a knowable quality.

I perceived, based on the character of your discussion, that you would be unsatisfied with anything short of a formal, mathematical proof that any given AGI would not destroy us before giving assent to turning it on. If that characterization was incorrect, the fault is mine.

-Brad
Re: [agi] Reply to Bill Hubbard's post: Mon, 10 Feb 2003
Brad Wyble wrote:
>> There are simple external conditions that provoke protective tendencies in
>> humans following chains of logic that seem entirely natural to us. Our
>> intuition that reproducing these simple external conditions serves to
>> provoke protective tendencies in AIs is knowably wrong, failing an
>> unsupported specific complex miracle.
>
> Well said.
>
>> Or to put it another way, you see Friendliness in AIs as pretty likely
>> regardless, and you think I'm going to all these lengths to provide a
>> guarantee. I'm not. I'm going to all these lengths to create a *significant
>> probability* of Friendliness.
>
> You're mischaracterizing my position. I'm certainly not saying we'll get
> friendliness for free, but was trying to reason by analogy (perhaps in a
> flawed way) that our best chance of success may be to model AGI's based on
> our innate tendencies wherever possible. Human behavior is a knowable
> quality.

Okay... what I'm saying, basically, is that connecting AI morality to human morality turns out to be a very complex problem that is not solved by saying "let's copy human nature". You need a very specific description of what you have to copy, how you do the copying, and so on, and this involves all sorts of complex nonobvious concepts within a complex nonobvious theory that completely changes the way you see morality. It would even be fair to say, dismayingly, that in saying "let's build AGI's which reproduce certain human behaviors", you have not even succeeded in stating the problem, let alone the solution.

This isn't intended in any personal way, btw. It's just that, like, the fate of the world *does* actually depend on it and all, so I have to be very precise about how much progress has occurred at a given point of theoretical development, rather than offering encouragement.

> I perceived, based on the character of your discussion, that you would be
> unsatisfied with anything short of a formal, mathematical proof that any
> given AGI would not destroy us before giving assent to turning it on. If that
> characterization was incorrect, the fault is mine.

No! It's *my* fault! You can't have any!

Anyhow, I don't think such a formal proof is possible. The problem with the proposals I see is not that they are not *provably* Friendly, but that a rational extrapolation of them shows that they are *unFriendly* barring a miracle. I'll take a proposal whose rational extrapolation is to Friendliness and which seems to lie at a local optimum relative to the improvements I can imagine; proof is impossible.

--
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence