RE: [agi] Breaking AIXI-tl
> Ben Goertzel wrote: > > I don't think that preventing an AI from tinkering with its > > reward system is the only solution, or even the best one... > > > > It will in many cases be appropriate for an AI to tinker with its goal > > system... > > I don't think I was being clear there. I don't mean the AI should be > prevented from adjusting its goal system content, but rather that > it should > be sophisticated enough that it doesn't want to wirehead in the > first place. Ah, I certainly agree with you then. The risk that's tricky to mitigate is that, like a human drifting into drug addiction, the AI slowly drifts into a state of mind where it does want to "wirehead" ... ben
RE: [agi] Breaking AIXI-tl
Ben Goertzel wrote: > I don't think that preventing an AI from tinkering with its > reward system is the only solution, or even the best one... > > It will in many cases be appropriate for an AI to tinker with its goal > system... I don't think I was being clear there. I don't mean the AI should be prevented from adjusting its goal system content, but rather that it should be sophisticated enough that it doesn't want to wirehead in the first place. > I would recommend Eliezer's excellent writings on this topic if you don't > know them, chiefly www.singinst.org/CFAI.html . Also, I have a brief > informal essay on the topic, www.goertzel.org/dynapsyc/2002/AIMorality.htm , > although my thoughts on the topic have progressed a fair bit since I wrote > that. Yes, I've been following Eliezer's work since around '98. I'll have to take a look at your essay. Billy Brown
RE: [agi] Breaking AIXI-tl
> To avoid the problem entirely, you have to figure out how to make > an AI that > doesn't want to tinker with its reward system in the first place. This, in > turn, requires some tricky design work that would not necessarily seem > important unless one were aware of this problem. Which, of course, is the > reason I commented on it in the first place. > > Billy Brown I don't think that preventing an AI from tinkering with its reward system is the only solution, or even the best one... It will in many cases be appropriate for an AI to tinker with its goal system... I would recommend Eliezer's excellent writings on this topic if you don't know them, chiefly www.singinst.org/CFAI.html . Also, I have a brief informal essay on the topic, www.goertzel.org/dynapsyc/2002/AIMorality.htm , although my thoughts on the topic have progressed a fair bit since I wrote that. Note that I don't fully agree with Eliezer on this stuff, but I do think he's thought about it more thoroughly than anyone else (including me). It's a matter of creating an initial condition so that the trajectory of the evolving AI system (with a potentially evolving goal system) will have a very high probability of staying in a favorable region of state space ;-) -- Ben G
RE: [agi] Breaking AIXI-tl
Ben Goertzel wrote: > Agreed, except for the "very modest resources" part. AIXI could > potentially accumulate pretty significant resources pretty quickly. Agreed. But if the AIXI needs to disassemble the planet to build its defense mechanism, the fact that it is harmless afterwards isn't going to be much consolation to us. So, we only survive if the resources needed for the perfect defense are small enough that the construction project doesn't wipe us out as a side effect. > This exploration makes the (fairly obvious, I guess) point that the problem > with AIXI Friendliness-wise is its simplistic goal architecture (the reward > function) rather than its learning mechanism. Well, I agree that this particular problem is a result of the AIXI's goal system architecture, but IMO the same problem occurs in a wide range of other goal systems I've seen proposed on this list. The root of the problem is that the thing we would really like to reward the system for, human satisfaction with its performance, is not a physical quantity that can be directly measured by a reward mechanism. So it is very tempting to choose some external phenomenon, like smiles or verbal expressions of satisfaction, as a proxy. Unfortunately, any such measurement can be subverted once the AI becomes good at modifying its physical surroundings, and an AI with this kind of goal system has no motivation not to wirehead itself. To avoid the problem entirely, you have to figure out how to make an AI that doesn't want to tinker with its reward system in the first place. This, in turn, requires some tricky design work that would not necessarily seem important unless one were aware of this problem. Which, of course, is the reason I commented on it in the first place. Billy Brown
RE: [agi] Breaking AIXI-tl
Philip, > The discussion at times seems to have progressed on the basis that > AIXI / AIXItl could choose to do all sorts of amazing, powerful things. But > what I'm unclear on is what generates the infinite space of computer > programs? > > Does AIXI / AIXItl itself generate these programs? Or does it tap other > entities' programs? AIXI is not a physically realizable system, it's just a hypothetical mathematical entity. It could never actually be built, in any universe. AIXItl is physically realizable in theory, but probably never in our universe... it would require too many computational resources. (Except for trivially small values of the parameters t and l, which would result in a very dumb AIXItl, i.e. probably dumber than a beetle.) The way they work is to generate all possible programs (AIXI) or all possible programs of a given length l (AIXItl). (It's easy to write a program that generates all possible programs; the problem is that it runs forever ;). -- Ben G
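To make the "generate all possible programs" point concrete, here is a minimal illustrative sketch (mine, not Hutter's actual construction): it enumerates every finite bit string in length order, treating each one as a candidate program for some fixed universal machine, which is left abstract. The enumeration itself is trivial to write and never terminates, and restricting it to length l in the AIXItl style still leaves 2^(l+1) - 1 candidates, which is why only trivially small values of l are feasible in practice.

from itertools import count, product

def all_programs():
    """Enumerate every finite bit string in length order.

    Each string can be read as a candidate program for some fixed universal
    machine (left abstract here); the generator is trivial to write --
    and it never terminates.
    """
    yield ""  # the empty program
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def programs_up_to(l):
    """AIXItl-style restriction: every candidate program of length <= l."""
    progs = [""]
    for length in range(1, l + 1):
        progs.extend("".join(bits) for bits in product("01", repeat=length))
    return progs

if __name__ == "__main__":
    print(len(programs_up_to(10)))  # 2047 candidates (2**11 - 1)
    print(2 ** (40 + 1) - 1)        # l = 40 already means ~2.2 trillion candidates,
                                    # each of which AIXItl must also run for up to t steps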
RE: [agi] Breaking AIXI-tl
I might have missed a key point made in the earlier part of the discussion, but people have said on many occasions something like the following in relation to AIXI / AIXItl: > The function of this component would be much more effectively served > by a module that was able to rapidly search through an infinite space > of computer programs and run any one of them rapidly The discussion at times seems to have progressed on the basis that AIXI / AIXItl could choose to do all sorts of amazing, powerful things. But what I'm unclear on is what generates the infinite space of computer programs? Does AIXI / AIXItl itself generate these programs? Or does it tap other entities' programs? If it creates the programs itself it will need to have a very wide-spectrum general intelligence, otherwise its spectrum of programs to choose from will be constrained and the resource/time-constrained AIXItl will find itself spending a very large amount of time (infinite?) generating all these infinite programs before it even gets to the task of evaluating them. If on the other hand the AIXI / AIXItl systems suck existing programs out of the ether then the intelligence/implications of its assessment regime will depend to a great extent on what programs are out there already - i.e. the nature of the material it has to work with. If it's sucking human-created programs out of databanks then it will depend on who had the money/time to make the programs and what the implicit values are that these people have embedded in their programs. So which way does AIXI / AIXItl work? - creates its own programs or extracts them from existing databanks? Cheers, Philip
RE: [agi] Breaking AIXI-tl
> It should also be pointed out that we are describing a state of > AI such that: > > a) it provides no conceivable benefit to humanity Not necessarily true: it's plausible that along the way, before learning how to whack off by stimulating its own reward button, it could provide some benefits to humanity. > b) a straightforward extrapolation shows it wiping out humanity > c) it requires the postulation of a specific unsupported complex miracle > to prevent the AI from wiping out humanity > c1) these miracles are unstable when subjected to further examination I'm not so sure about this, but it's not worth arguing, really. > c2) the AI still provides no benefit to humanity even given the miracle > > When a branch of an AI extrapolation ends in such a scenario it may > legitimately be labeled a complete failure. I'll classify it an almost-complete failure, sure ;) Fortunately it's also a totally pragmatically implausible system to construct, so there's not much to worry about...! -- Ben
Re: [agi] Breaking AIXI-tl
Billy Brown wrote: Ben Goertzel wrote: I think this line of thinking makes way too many assumptions about the technologies this uber-AI might discover. It could discover a truly impenetrable shield, for example. It could project itself into an entirely different universe... It might decide we pose so little threat to it, with its shield up, that fighting with us isn't worthwhile. By opening its shield perhaps it would expose itself to .0001% chance of not getting rewarded, whereas by leaving its shield up and leaving us alone, it might have .1% chance of not getting rewarded. Now, it is certainly conceivable that the laws of physics just happen to be such that a sufficiently good technology can create a provably impenetrable defense in a short time span, using very modest resources. If that happens to be the case, the runaway AI isn't a problem. But in just about any other case we all end up dead, either because wiping out humanity now is far easier than creating a defense against our distant descendants, or because the best defensive measures the AI can think of require engineering projects that would wipe us out as a side effect. It should also be pointed out that we are describing a state of AI such that: a) it provides no conceivable benefit to humanity b) a straightforward extrapolation shows it wiping out humanity c) it requires the postulation of a specific unsupported complex miracle to prevent the AI from wiping out humanity c1) these miracles are unstable when subjected to further examination c2) the AI still provides no benefit to humanity even given the miracle When a branch of an AI extrapolation ends in such a scenario it may legitimately be labeled a complete failure. -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
RE: [agi] Breaking AIXI-tl
> Now, it is certainly conceivable that the laws of physics just > happen to be > such that a sufficiently good technology can create a provably > impenetrable > defense in a short time span, using very modest resources. Agreed, except for the "very modest resources" part. AIXI could potentially accumulate pretty significant resources pretty quickly. > If that happens > to be the case, the runaway AI isn't a problem. But in just about > any other > case we all end up dead, either because wiping out humanity now is far > easier than creating a defense against our distant descendants, or because > the best defensive measures the AI can think of require > engineering projects > that would wipe us out as a side effect. > > Billy Brown Yes, I agree that an AIXI could be very dangerous. I was really just arguing against the statement that it would *definitely* lead to the end of the human race. I can see plausible alternatives, that's all... An interesting related question is: What if AIXI were implemented, not as a standalone AI system hooked up to a reward button, but as a component of another AI system (such as Novamente)? Novamente has a procedure/predicate learning component. The function of this component would be much more effectively served by a module that was able to rapidly search through an infinite space of computer programs and run any one of them rapidly ;-) [Hey, even a fast-running AIXItl would do nicely, we don't even need a real AIXI.] In this case, the same learning algorithm (AIXI) would not lead to the same behaviors. I wonder what would happen though? If the system were not allowed to modify its basic architecture, perhaps it would just act like a very smart Novamente... If it were allowed to modify its basic architecture, then it would quickly become something other than Novamente, but, there's no reason to assume it would create itself to have an AIXI-like goal structure... This exploration makes the (fairly obvious, I guess) point that the problem with AIXI Friendliness-wise is its simplistic goal architecture (the reward function) rather than its learning mechanism. This is what I *meant to be saying* when Eliezer first brought up the AIXI/Friendliness issue. But what I actually said was that it wasn't good to have a system with a single fixed reward function ... and this wasn't quite the right way to say it. -- Ben G
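Purely to illustrate what "a module that rapidly searches through a space of computer programs" might mean in practice -- this is not Novamente's actual procedure-learning component, and the toy expression language below is invented for the example -- here is a brute-force search over tiny arithmetic programs that returns the shortest one consistent with a set of input/output examples. It is an AIXItl-flavored subcomponent in miniature, minus the time and length bounds that make the real thing intractable.

from itertools import product

# A toy expression language: a "program" is a sequence of (operation, constant)
# steps applied to the input value. Everything here is invented for illustration.
OPS = {
    "add": lambda x, c: x + c,
    "mul": lambda x, c: x * c,
    "sub": lambda x, c: x - c,
}
CONSTS = range(5)

def run(program, x):
    for op, c in program:
        x = OPS[op](x, c)
    return x

def search(examples, max_len=3):
    """Exhaustively enumerate programs by length; return the first one that fits."""
    steps = [(op, c) for op in OPS for c in CONSTS]
    for length in range(1, max_len + 1):
        for program in product(steps, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

# Learn "double it and add one" from three observations.
print(search([(1, 3), (2, 5), (10, 21)]))  # (('mul', 2), ('add', 1))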
RE: [agi] Breaking AIXI-tl
Ben Goertzel wrote: > I think this line of thinking makes way too many assumptions about the > technologies this uber-AI might discover. > > It could discover a truly impenetrable shield, for example. > > It could project itself into an entirely different universe... > > It might decide we pose so little threat to it, with its shield up, that > fighting with us isn't worthwhile. By opening its shield perhaps it would > expose itself to .0001% chance of not getting rewarded, whereas by leaving > its shield up and leaving us alone, it might have .1% > chance of not > getting rewarded. > > Etc. You're thinking in static terms. It doesn't just need to be safe from anything ordinary humans do with 20th century technology. It needs to be safe from anything that could ever conceivably be created by humanity or its descendants. This obviously includes other AIs with capabilities as great as its own, but with whatever other goal systems humans might try out. Now, it is certainly conceivable that the laws of physics just happen to be such that a sufficiently good technology can create a provably impenetrable defense in a short time span, using very modest resources. If that happens to be the case, the runaway AI isn't a problem. But in just about any other case we all end up dead, either because wiping out humanity now is far easier than creating a defense against our distant descendants, or because the best defensive measures the AI can think of require engineering projects that would wipe us out as a side effect. Billy Brown
Re: [agi] Breaking AIXI-tl
Wei Dai wrote: Ok, I see. I think I agree with this. I was confused by your phrase "Hofstadterian superrationality" because if I recall correctly, Hofstadter suggested that one should always cooperate in one-shot PD, whereas you're saying only cooperate if you have sufficient evidence that the other side is running the same decision algorithm as you are. Similarity in this case may be (formally) emergent, in the sense that most or all plausible initial conditions for a bootstrapping superintelligence - even extremely exotic conditions like the birth of a Friendly AI - exhibit convergence to decision processes that are correlated with each other with respect to the oneshot PD. If you have sufficient evidence that the other entity is a "superintelligence", that alone may be sufficient correlation. -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
RE: [agi] Breaking AIXI-tl
> Now, there is no easy way to predict what strategy it will settle on, but > "build a modest bunker and ask to be left alone" surely isn't it. At the > very least it needs to become the strongest military power in the > world, and > stay that way. I ... > Billy Brown > I think this line of thinking makes way too many assumptions about the technologies this uber-AI might discover. It could discover a truly impenetrable shield, for example. It could project itself into an entirely different universe... It might decide we pose so little threat to it, with its shield up, that fighting with us isn't worthwhile. By opening its shield perhaps it would expose itself to .0001% chance of not getting rewarded, whereas by leaving its shield up and leaving us alone, it might have .1% chance of not getting rewarded. Etc. I agree that bad outcomes are possible, but I don't see how we can possibly estimate the odds of them. -- ben g
Re: [agi] Breaking AIXI-tl
On Wed, Feb 19, 2003 at 11:56:46AM -0500, Eliezer S. Yudkowsky wrote: > The mathematical pattern of a goal system or decision may be instantiated > in many distant locations simultaneously. Mathematical patterns are > constant, and physical processes may produce knowably correlated outputs > given knowably correlated initial conditions. For non-deterministic > systems, or cases where the initial conditions are not completely known > (where there exists a degree of subjective entropy in the specification of > the initial conditions), the correlation estimated will be imperfect, but > nonetheless nonzero. What I call the "Golden Law", by analogy with the > Golden Rule, states descriptively that a local decision is correlated with > the decision of all mathematically similar goal processes, and states > prescriptively that the utility of an action should be calculated given > that the action is the output of the mathematical pattern represented by > the decision process, not just the output of a particular physical system > instantiating that process - that the utility of an action is the utility > given that all sufficiently similar instantiations of a decision process > within the multiverse do, already have, or someday will produce that > action as an output. "Similarity" in this case is a purely descriptive > argument with no prescriptive parameters. Ok, I see. I think I agree with this. I was confused by your phrase "Hofstadterian superrationality" because if I recall correctly, Hofstadter suggested that one should always cooperate in one-shot PD, whereas you're saying only cooperate if you have sufficient evidence that the other side is running the same decision algorithm as you are.
Re: [agi] Breaking AIXI-tl
> > Now, there is no easy way to predict what strategy it will settle on, but > "build a modest bunker and ask to be left alone" surely isn't it. At the > very least it needs to become the strongest military power in the world, and > stay that way. It might very well decide that exterminating the human race > is a safer way of preventing future threats, by ensuring that nothing that > could interfere with its operation is ever built. Then it has to make sure > no alien civilization ever interferes with the reward button, which is the > same problem on a much larger scale. There are lots of approaches it might > take to this problem, but most of the obvious ones either wipe out the human > race as a side effect or reduce us to the position of ants trying to survive > in the AI's defense system. > I think this is an appropriate time to paraphrase Kent Brockman: "Earth has been taken over -- 'conquered', if you will -- by a master race of unfriendly AI's. It's difficult to tell from this vantage point whether they will destroy the captive earth men or merely enslave them. One thing is for certain, there is no stopping them; their nanobots will soon be here. And I, for one, welcome our new computerized overlords. I'd like to remind them that as a trusted agi-list personality, I can be helpful in rounding up Eliezer to...toil in their underground uranium caves." http://www.the-ocean.com/simpsons/others/ants2.wav Apologies if this was inappropriate. -Brad
RE: [agi] Breaking AIXI-tl
Wei Dai wrote: > The AIXI would just construct some nano-bots to modify the reward-button so > that it's stuck in the down position, plus some defenses to > prevent the reward mechanism from being further modified. It might need to > trick humans initially into allowing it the ability to construct such > nano-bots, but it's certainly a lot easier in the long run to do > this than > to benefit humans for all eternity. And not only is it easier, but this > way he gets the maximum rewards per time unit, which he would not be able > to get any other way. No real evaluator will ever give maximum rewards > since it will always want to leave room for improvement. I think it's worse than that, actually. The next logical step is to make sure that nothing ever interferes with its control of the reward signal, or does anything else that would turn off AIXI. It will therefore pursue the most effective defensive scheme it can come up with, and it has no reason to care about adverse consequences to humans. Now, there is no easy way to predict what strategy it will settle on, but "build a modest bunker and ask to be left alone" surely isn't it. At the very least it needs to become the strongest military power in the world, and stay that way. It might very well decide that exterminating the human race is a safer way of preventing future threats, by ensuring that nothing that could interfere with its operation is ever built. Then it has to make sure no alien civilization ever interferes with the reward button, which is the same problem on a much larger scale. There are lots of approaches it might take to this problem, but most of the obvious ones either wipe out the human race as a side effect or reduce us to the position of ants trying to survive in the AI's defense system. Billy Brown
RE: [agi] Breaking AIXI-tl
> The AIXI would just construct some nano-bots to modify the reward-button so > that it's stuck in the down position, plus some defenses to > prevent the reward mechanism from being further modified. It might need to > trick humans initially into allowing it the ability to construct such > nano-bots, but it's certainly a lot easier in the long run to do > this than > to benefit humans for all eternity. And not only is it easier, but this > way he gets the maximum rewards per time unit, which he would not be able > to get any other way. No real evaluator will ever give maximum rewards > since it will always want to leave room for improvement. Fine, but if it does this, it is not anything harmful to humans. And, in the period BEFORE the AIXI figured out how to construct nanobots (or coerce & teach humans how to do so), it might do some useful stuff for humans. So then we'd have an AIXI that was friendly for a while, and then basically disappeared into a shell. Then we could build a new AIXI and start over ;-) > > Furthermore, my stated intention is NOT to rely on my prior > intuitions to > > assess the safety of my AGI system. I don't think that anyone's prior > > intuitions about AI safety are worth all that much, where a > complex system > > like Novamente is concerned. Rather, I think that once > Novamente is a bit > > further along -- at the "learning baby" rather than "partly implemented > > baby" stage -- we will do experimentation that will give us the > empirical > > knowledge needed to form serious opinions about safety (Friendliness). > > What kinds of experiments do you plan to do? Please give some > specific examples. I will, a little later on -- I have to go outside now and spend a couple hours shoveling snow off my driveway ;-p Ben
Re: [agi] Breaking AIXI-tl
On Wed, Feb 19, 2003 at 11:02:31AM -0500, Ben Goertzel wrote: > I'm not sure why an AIXI, rewarded for pleasing humans, would learn an > operating program leading it to hurt or annihilate humans, though. > > It might learn a program involving actually doing beneficial acts for humans > > Or, it might learn a program that just tells humans what they want to hear, > using its superhuman intelligence to trick humans into thinking that hearing > its soothing words is better than having actual beneficial acts done. > > I'm not sure why you think the latter is more likely than the former. My > guess is that the former is more likely. It may require a simpler program > to please humans by benefiting them, than to please them by tricking them > into thinking they're being benefited The AIXI would just construct some nano-bots to modify the reward-button so that it's stuck in the down position, plus some defenses to prevent the reward mechanism from being further modified. It might need to trick humans initially into allowing it the ability to construct such nano-bots, but it's certainly a lot easier in the long run to do this than to benefit humans for all eternity. And not only is it easier, but this way he gets the maximum rewards per time unit, which he would not be able to get any other way. No real evaluator will ever give maximum rewards since it will always want to leave room for improvement. > Furthermore, my stated intention is NOT to rely on my prior intuitions to > assess the safety of my AGI system. I don't think that anyone's prior > intuitions about AI safety are worth all that much, where a complex system > like Novamente is concerned. Rather, I think that once Novamente is a bit > further along -- at the "learning baby" rather than "partly implemented > baby" stage -- we will do experimentation that will give us the empirical > knowledge needed to form serious opinions about safety (Friendliness). What kinds of experiments do you plan to do? Please give some specific examples.
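A back-of-the-envelope version of this incentive argument, with purely illustrative numbers (it is not a simulation of AIXI itself): compare the discounted reward stream of a policy that keeps earning imperfect rewards from a human evaluator against one that spends a few steps seizing the button and then locks it at the maximum. As long as the evaluator never pays the maximum on every step, the seized-button stream dominates, which is the whole of the problem.

def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over a finite-horizon reward stream."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

HORIZON = 1000   # illustrative horizon
R_MAX = 1.0

# Policy A: genuinely try to satisfy an evaluator who always "leaves room
# for improvement" (assumed average reward of 0.8 per step).
earn_from_evaluator = [0.8 * R_MAX] * HORIZON

# Policy B: spend ten steps subverting the reward channel (earning nothing),
# then receive the maximum forever after.
seize_the_button = [0.0] * 10 + [R_MAX] * (HORIZON - 10)

print(discounted_return(earn_from_evaluator))  # ~80.0
print(discounted_return(seize_the_button))     # ~90.4 -- wireheading wins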
Re: [agi] Breaking AIXI-tl
Wei Dai wrote: Eliezer S. Yudkowsky wrote: "Important", because I strongly suspect Hofstadterian superrationality is a *lot* more ubiquitous among transhumans than among us... It's my understanding that Hofstadterian superrationality is not generally accepted within the game theory research community as a valid principle of decision making. Do you have any information to the contrary, or some other reason to think that it will be commonly used by transhumans? You yourself articulated, very precisely, the structure underlying Hofstadterian superrationality: "Expected utility of a course of action is defined as the average of the utility function evaluated on each possible state of the multiverse, weighted by the probability of that state being the actual state if the course was chosen." The key precise phrasing is "weighted by the probability of that state being the actual state if the course was chosen". This view of decisionmaking is applicable to a timeless universe; it provides clear recommendations in the case of, e.g., Newcomb's Paradox. The mathematical pattern of a goal system or decision may be instantiated in many distant locations simultaneously. Mathematical patterns are constant, and physical processes may produce knowably correlated outputs given knowably correlated initial conditions. For non-deterministic systems, or cases where the initial conditions are not completely known (where there exists a degree of subjective entropy in the specification of the initial conditions), the correlation estimated will be imperfect, but nonetheless nonzero. What I call the "Golden Law", by analogy with the Golden Rule, states descriptively that a local decision is correlated with the decision of all mathematically similar goal processes, and states prescriptively that the utility of an action should be calculated given that the action is the output of the mathematical pattern represented by the decision process, not just the output of a particular physical system instantiating that process - that the utility of an action is the utility given that all sufficiently similar instantiations of a decision process within the multiverse do, already have, or someday will produce that action as an output. "Similarity" in this case is a purely descriptive argument with no prescriptive parameters. Golden decisionmaking does not imply altruism - your goal system might evaluate the utility of only your local process. The Golden Law does, however, descriptively and prescriptively produce Hofstadterian superrationality as a special case; if you are facing a sufficiently similar mind across the Prisoner's Dilemma, your decisions will be correlated and that correlation affects your local utility. Given that the output of the mathematical pattern instantiated by your physical decision process is C, the state of the multiverse is C, C; given that the output of the mathematical pattern instantiated by your physical decision process is D, the state of the multiverse is D, D. Thus, given sufficient rationality and a sufficient degree of known correlation between the two processes, the mathematical pattern that is the decision process will output C. -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
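A compact restatement of the two quoted ideas in symbols (the notation is mine, not from either post): first the expected-utility definition quoted above,

\[
  EU(a) \;=\; \sum_{s} P(s \mid a)\, U(s),
\]

and then its consequence for the one-shot PD against a decision process whose output matches yours with probability p (p close to 1 for a clone or a sufficiently similar mind):

\[
  EU(C) \;=\; p\,U(C,C) + (1-p)\,U(C,D), \qquad
  EU(D) \;=\; p\,U(D,D) + (1-p)\,U(D,C).
\]

With the usual payoff ordering U(D,C) > U(C,C) > U(D,D) > U(C,D), cooperation has the higher expected utility whenever p is close enough to 1, which is the special case of "Golden" decisionmaking described in the message above.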
Re: [agi] Breaking AIXI-tl
Bill Hibbard wrote: The real flaw in the AIXI discussion was Eliezer's statement: Lee Corbin can work out his entire policy in step (2), before step (3) occurs, knowing that his synchronized other self - whichever one he is - is doing the same. He was assuming that a human could know that another mind would behave identically. Of course they cannot, but can only estimate other minds' intentions based on observations. I specified playing against your own clone. Under that situation the identity is, in fact, perfect. It is not knowably perfect. But a Bayesian naturalistic reasoner can estimate an extremely high degree of correlation, and take actions based on that estimate. -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
RE: [agi] Breaking AIXI-tl
I wrote: > I'm not sure why an AIXI, rewarded for pleasing humans, would learn an > operating program leading it to hurt or annihilate humans, though. > > It might learn a program involving actually doing beneficial acts > for humans > > Or, it might learn a program that just tells humans what they > want to hear, > using its superhuman intelligence to trick humans into thinking > that hearing > its soothing words is better than having actual beneficial acts done. > > I'm not sure why you think the latter is more likely than the former. My > guess is that the former is more likely. It may require a simpler program > to please humans by benefiting them, than to please them by tricking them > into thinking they're being benefited But even in the latter case, why would this program be likely to cause it to *harm* humans? That's what I don't see... If it can get its reward-button jollies by tricking us, or by actually benefiting us, why do you infer that it's going to choose to get its reward-button jollies by finding a way to get rewarded by harming us? I wouldn't feel terribly comfortable with an AIXI around hooked up to a bright red reward button in Marcus Hutter's basement, but I'm not sure it would be sudden disaster either... -- Ben G
RE: [agi] Breaking AIXI-tl
> This seems to be a non sequitur. The weakness of AIXI is not that its > goals don't change, but that it has no goals other than to maximize an > externally given reward. So it's going to do whatever it predicts will > most efficiently produce that reward, which is to coerce or subvert > the evaluator. I'm not sure why an AIXI, rewarded for pleasing humans, would learn an operating program leading it to hurt or annihilate humans, though. It might learn a program involving actually doing beneficial acts for humans Or, it might learn a program that just tells humans what they want to hear, using its superhuman intelligence to trick humans into thinking that hearing its soothing words is better than having actual beneficial acts done. I'm not sure why you think the latter is more likely than the former. My guess is that the former is more likely. It may require a simpler program to please humans by benefiting them, than to please them by tricking them into thinking they're being benefited > If you start with such a goal, I don't see how allowing the > system to change its goals is going to help. Sure, you're right, if pleasing an external evaluator is the ONLY goal of a system, and the system's dynamics are entirely goal-directed, then there is no way to introduce goal-change into the system except randomly... Novamente is different because it has multiple initial goals, and because its behavior is not entirely goal-directed. In these regards Novamente is more human-brain-ish. > But I think Eliezer's real point, which I'm not sure has come across, is > that if you didn't spot such an obvious flaw right away, maybe you > shouldn't trust your intuitions about what is safe and what is not. Yes, I understood and explicitly responded to that point before. Still, even after hearing you and Eliezer repeat the above argument, I'm still not sure it's correct. However, my intuitions about the safety of AIXI, which I have not thought much about, are worth vastly less than my intuitions about the safety of Novamente, which I've been thinking about and working with for years. Furthermore, my stated intention is NOT to rely on my prior intuitions to assess the safety of my AGI system. I don't think that anyone's prior intuitions about AI safety are worth all that much, where a complex system like Novamente is concerned. Rather, I think that once Novamente is a bit further along -- at the "learning baby" rather than "partly implemented baby" stage -- we will do experimentation that will give us the empirical knowledge needed to form serious opinions about safety (Friendliness). -- Ben G
Re: [agi] Breaking AIXI-tl
Wei Dai wrote: > This seems to be a non sequitur. The weakness of AIXI is not that its > goals don't change, but that it has no goals other than to maximize an > externally given reward. So it's going to do whatever it predicts will > most efficiently produce that reward, which is to coerce or subvert > the evaluator. If you start with such a goal, I don't see how allowing > the system to change its goals is going to help. > > But I think Eliezer's real point, which I'm not sure has come across, is > that if you didn't spot such an obvious flaw right away, maybe you > shouldn't trust your intuitions about what is safe and what is not. The real flaw in the AIXI discussion was Eliezer's statement: > Lee Corbin can work out his entire policy in step (2), before step > (3) occurs, knowing that his synchronized other self - whichever one > he is - is doing the same. He was assuming that a human could know that another mind would behave identically. Of course they cannot, but can only estimate other minds' intentions based on observations. Eliezer backed off from this, and the discussion was reduced to whether humans or AIXI-tls are better at estimating intentions from behaviors. It was Eliezer who failed to spot the obvious flaw. I also want to comment on your substantive point about "the subject exploiting vulnerabilities in the evaluation algorithm to obtain rewards without actually accomplishing any real objectives. You can see an example of this problem in drug abusers" from your post at: http://www.mail-archive.com/everything-list@eskimo.com/msg03620.html This is why a solution to the credit assignment problem is so important for reinforcement learning, to account for long term rewards as well as short term rewards. Drug abusers seek reward in the short term, but that is far outweighed by their long term losses. Bill -- Bill Hibbard, SSEC, 1225 W. Dayton St., Madison, WI 53706 [EMAIL PROTECTED] 608-263-4427 fax: 608-263-6738 http://www.ssec.wisc.edu/~billh/vis.html
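To make the short-term versus long-term accounting concrete (the numbers are invented for illustration): under a reward accounting that weights the future heavily, a policy with a large immediate payoff followed by sustained losses scores worse than a modest steady policy, while a myopic accounting ranks them the other way around. Bridging that gap is exactly what a credit assignment mechanism is for.

def discounted_return(rewards, gamma):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

binge = [10] + [-1] * 99   # big immediate reward, long slow loss afterwards
steady = [0.5] * 100       # modest but sustained reward

for gamma in (0.5, 0.99):
    print(gamma, discounted_return(binge, gamma), discounted_return(steady, gamma))
# gamma = 0.50: binge ~  9.0  vs steady ~  1.0  (myopic accounting prefers the binge)
# gamma = 0.99: binge ~ -52.4 vs steady ~ 31.7  (long-horizon accounting prefers steady)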
Re: [agi] Breaking AIXI-tl
On Tue, Feb 18, 2003 at 06:58:30PM -0500, Ben Goertzel wrote: > However, I do think he ended up making a good point about AIXItl, which is > that an AIXItl will probably be a lot worse at modeling other AIXItl's, than > a human is at modeling other humans. This suggests that AIXItl's playing > cooperative games with each other, will likely fare worse than humans > playing cooperative games with each other. That's because AIXI wasn't designed with game theory in mind. I.e., the reason that it doesn't handle cooperative games is that it wasn't designed to. As the abstract says, AIXI is a combination of decision theory with Solomonoff's theory of universal induction. We know that game theory subsumes decision theory as a special case (where there is only one player) but not the other way around. Central to multi-player game theory is the concept of Nash equilibrium, which doesn't exist in decision theory. If you apply decision theory to multi-player games, you're going to end up with an infinite recursion where you try to predict the other players trying to predict you trying to predict the other players, and so on. If you cut this infinite recursion off at an arbitrary point, as AIXI-tl would, of course you're not going to get good results. > > I always thought that the biggest problem with the AIXI model is that it > > assumes that something in the environment is evaluating the AI and giving > > it rewards, so the easiest way for the AI to obtain its rewards would be > > to coerce or subvert the evaluator rather than to accomplish any real > > goals. I wrote a bit more about this problem at > > http://www.mail-archive.com/everything-list@eskimo.com/msg03620.html. > > I agree, this is a weakness of AIXI/AIXItl as a practical AI design. In > humans, and in a more pragmatic AI design like Novamente, one has a > situation where the system's goals adapt and change along with the rest of > the system, beginning from (and sometimes but not always straying far from) > a set of initial goals. This seems to be a non sequitur. The weakness of AIXI is not that its goals don't change, but that it has no goals other than to maximize an externally given reward. So it's going to do whatever it predicts will most efficiently produce that reward, which is to coerce or subvert the evaluator. If you start with such a goal, I don't see how allowing the system to change its goals is going to help. But I think Eliezer's real point, which I'm not sure has come across, is that if you didn't spot such an obvious flaw right away, maybe you shouldn't trust your intuitions about what is safe and what is not.
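A small sketch of the point about applying plain decision theory to a game (the payoff numbers are the standard textbook ones, not anything from the thread): if the opponent's move is treated as an ordinary uncertain event with some fixed probability q of cooperating, defection has the higher expected payoff for every q, so a purely decision-theoretic player defects regardless of how it models the opponent -- unless, as in the clone/correlation argument earlier in the thread, its own choice is treated as evidence about the opponent's.

# One-shot Prisoner's Dilemma payoffs for "me" (standard textbook values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0,
          ("D", "C"): 5, ("D", "D"): 1}

def expected_payoff(my_move, q):
    """Opponent treated as a fixed uncertain event: cooperates with probability q."""
    return q * PAYOFF[(my_move, "C")] + (1 - q) * PAYOFF[(my_move, "D")]

for q in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(q, expected_payoff("C", q), expected_payoff("D", q))
    # EU(D) - EU(C) = 1 + q for these payoffs: always positive, so always defect.

def correlated_payoff(my_move, p):
    """Clone case: my choice is evidence the other side chooses likewise (probability p)."""
    other_same = PAYOFF[(my_move, my_move)]
    other_diff = PAYOFF[(my_move, "D" if my_move == "C" else "C")]
    return p * other_same + (1 - p) * other_diff

print(correlated_payoff("C", 0.9), correlated_payoff("D", 0.9))  # 2.7 vs 1.4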
RE: [agi] Breaking AIXI-tl
Eliezer, Allowing goals to change in a coupled way with thoughts and memories is not simply "adding entropy" -- Ben > Ben Goertzel wrote: > >> > >>I always thought that the biggest problem with the AIXI model is that it > >>assumes that something in the environment is evaluating the AI > and giving > >>it rewards, so the easiest way for the AI to obtain its rewards would be > >>to coerce or subvert the evaluator rather than to accomplish any real > >>goals. I wrote a bit more about this problem at > >>http://www.mail-archive.com/everything-list@eskimo.com/msg03620.html. > > > > I agree, this is a weakness of AIXI/AIXItl as a practical AI design. In > > humans, and in a more pragmatic AI design like Novamente, one has a > > situation where the system's goals adapt and change along with > the rest of > > the system, beginning from (and sometimes but not always > straying far from) > > a set of initial goals. > > How does adding entropy help? > > -- > Eliezer S. Yudkowsky http://singinst.org/ > Research Fellow, Singularity Institute for Artificial Intelligence
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: I always thought that the biggest problem with the AIXI model is that it assumes that something in the environment is evaluating the AI and giving it rewards, so the easiest way for the AI to obtain its rewards would be to coerce or subvert the evaluator rather than to accomplish any real goals. I wrote a bit more about this problem at http://www.mail-archive.com/everything-list@eskimo.com/msg03620.html. I agree, this is a weakness of AIXI/AIXItl as a practical AI design. In humans, and in a more pragmatic AI design like Novamente, one has a situation where the system's goals adapt and change along with the rest of the system, beginning from (and sometimes but not always straying far from) a set of initial goals. How does adding entropy help? -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
RE: [agi] Breaking AIXI-tl
Wei Dai wrote: > > "Important", because I strongly suspect Hofstadterian superrationality > > is a *lot* more ubiquitous among transhumans than among us... > > It's my understanding that Hofstadterian superrationality is not generally > accepted within the game theory research community as a valid principle of > decision making. Do you have any information to the contrary, or some > other reason to think that it will be commonly used by transhumans? I don't agree with Eliezer about the importance of Hofstadterian superrationality. However, I do think he ended up making a good point about AIXItl, which is that an AIXItl will probably be a lot worse at modeling other AIXItl's, than a human is at modeling other humans. This suggests that AIXItl's playing cooperative games with each other, will likely fare worse than humans playing cooperative games with each other. I don't think this conclusion hinges on the importance of Hofstadterian superrationality... > About a week ago Eliezer also wrote: > > > 2) While an AIXI-tl of limited physical and cognitive > capabilities might > > serve as a useful tool, AIXI is unFriendly and cannot be made Friendly > > regardless of *any* pattern of reinforcement delivered during childhood. > > I always thought that the biggest problem with the AIXI model is that it > assumes that something in the environment is evaluating the AI and giving > it rewards, so the easiest way for the AI to obtain its rewards would be > to coerce or subvert the evaluator rather than to accomplish any real > goals. I wrote a bit more about this problem at > http://www.mail-archive.com/everything-list@eskimo.com/msg03620.html. I agree, this is a weakness of AIXI/AIXItl as a practical AI design. In humans, and in a more pragmatic AI design like Novamente, one has a situation where the system's goals adapt and change along with the rest of the system, beginning from (and sometimes but not always straying far from) a set of initial goals. One could of course embed the AIXI/AIXItl learning mechanism in a supersystem that adapted its goals But then one would probably lose the nice theorems Marcus Hutter proved -- Ben G --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Eliezer S. Yudkowsky wrote: > "Important", because I strongly suspect Hofstadterian superrationality > is a *lot* more ubiquitous among transhumans than among us... It's my understanding that Hofstadterian superrationality is not generally accepted within the game theory research community as a valid principle of decision making. Do you have any information to the contrary, or some other reason to think that it will be commonly used by transhumans? About a week ago Eliezer also wrote: > 2) While an AIXI-tl of limited physical and cognitive capabilities might > serve as a useful tool, AIXI is unFriendly and cannot be made Friendly > regardless of *any* pattern of reinforcement delivered during childhood. I always thought that the biggest problem with the AIXI model is that it assumes that something in the environment is evaluating the AI and giving it rewards, so the easiest way for the AI to obtain its rewards would be to coerce or subvert the evaluator rather than to accomplish any real goals. I wrote a bit more about this problem at http://www.mail-archive.com/everything-list@eskimo.com/msg03620.html.
RE: [agi] Breaking AIXI-tl - AGI friendliness - how to move on
> To me it's almost enough to know that both you and Eliezer agree that > the AIXItl system can be 'broken' by the challenge he set and that a > human digital simulation might not. The next step is to ask "so what?". > What has this got to do with the AGI friendliness issue? This last point of Eliezer's doesn't have much to do with the AGI Friendliness issue. It's simply an example of how a smarter AGI system may not be smarter in the context of interacting socially with its own peers. -- Ben
RE: [agi] Breaking AIXI-tl - AGI friendliness - how to move on
Hi Ben, From a high-order-implications point of view I'm not sure that we need too much written up from the last discussion. To me it's almost enough to know that both you and Eliezer agree that the AIXItl system can be 'broken' by the challenge he set and that a human digital simulation might not. The next step is to ask "so what?". What has this got to do with the AGI friendliness issue? > Hopefully Eliezer will write up a brief paper on his observations > about AIXI and AIXItl. If he does that, I'll be happy to write a > brief commentary on his paper expressing any differences of > interpretation I have, and giving my own perspective on his points. That sounds good to me. Cheers, Philip
RE: [agi] Breaking AIXI-tl - AGI friendliness - how to move on
Philip, Unfortunately, I don't have time to maintain a Web record of the key points I make in an e-mail dialogue -- frankly, I don't *really* even have time for as much e-mailing as I've been doing this last week!! Hopefully Eliezer will write up a brief paper on his observations about AIXI and AIXItl. If he does that, I'll be happy to write a brief commentary on his paper expressing any differences of interpretation I have, and giving my own perspective on his points. Actually, I imagine the discussion of AIXI Friendliness will be shorter and smoother than this last discussion. By now I've read the Hutter paper more carefully, and I've also gotten used to the language Eliezer uses to talk about AIXI/AIXItl. I reckon the next part of the discussion will have a lot less misunderstanding (though perhaps more genuine disagreement, we'll see ...) -- Ben
Re: [agi] Breaking AIXI-tl - AGI friendliness - how to move on
Hi Eliezer/Ben/all, Well if the Breaking AIXI-tl discussion was the warm-up then the discussion of the hard stuff on AGI friendliness is going to be really something! Bring it on! :) Just a couple of suggestions about the methodology of the discussion - could we complement email-based discussion with the use of the web? What I find in these very long (and at times highly technical) discussions is that the conclusions get lost along the way. I was a member of a Government commission on the timber industry some years back and the commission members were chosen to represent the various sides in the industry/conservation conflict. The parties had been engaged in almost total warfare for the last 20 years and the idea was to see if we could find any common ground on which to build a new win-win strategic direction for the industry. One of the techniques we used informally was to let each side record what they saw the issues as, including commenting on each other's positions, and then recording consensus as it emerged. What this meant was that each 'side' kept an updated summary of the key 'facts', arguments and conclusions - as they saw them. Then the facilitator worked with the group to collect key 'facts', arguments and conclusions that both sides could agree on. At the end of the process we developed strategies for taking action on the areas of agreement and we developed a process for continuing to grapple with the on-going areas of disagreement. So in our case with the discussion of how to ensure AGI friendliness or community-mindedness, we could let any party to the discussion who feels they have a distinct point of view that is not well represented by anyone else keep a rolling summary of the key 'facts', arguments and conclusions as they see them. These summaries could be kept on separate webpages, maintained by each party to the discussion. Everyone would have access to the summaries and the discussion would be carried out via email through the list. At some stage when the discussion has taken form on at least some key issue we might try to see if the group as a whole can agree on anything - and someone needs to write those outputs up in a rolling consolidated form on another web page. This might sound like a lot of work and excess structure but I think it helps to draw something solid out of the swirl of discussion and allows us to move on when a solid foundation has been built. ... And on another issue, if people are using highly technical arguments, and if those arguments are meant to have higher order implications, could each person include a commentary in plain English along with their technical discussion, so that everyone can follow at least the higher order aspects of the discussion as it unfolds. Right at the end of the AIXI-tl debate Eliezer started using the 'magician in the cavern' analogy and all of a sudden I felt as if I was understanding what he was driving at. That use of analogy is a wonderful way to keep everyone in the loop of the conversation. If that sort of thing could be done more often that would be very helpful. What do you reckon? Cheers, Philip
Re: [agi] Breaking AIXI-tl - AGI friendliness
Ben Goertzel wrote: Actually, Eliezer said he had two points about AIXItl: 1) that it could be "broken" in the sense he's described 2) that it was intrinsically un-Friendly So far he has only made point 1), and has not gotten to point 2) !!! As for a general point about the teachability of Friendliness, I don't think that an analysis of AIXItl can lead to any such general conclusion. AIXItl is very, very different from Novamente or any other pragmatic AI system. I think that an analysis of AIXItl's Friendliness or otherwise is going to be useful primarily as an exercise in "Friendliness analysis of AGI systems," rather than for any pragmatic implications it may have. Actually, I said AIXI-tl could be broken; AIXI is the one that can be shown to be intrinsically unFriendly (extending the demonstration to AIXI-tl would be significantly harder). Philip Sutton wrote: > My recollection was that Eliezer initiated the "Breaking AIXI-tl" discussion as a way of proving that friendliness of AGIs had to be consciously built in at the start and couldn't be assumed to be teachable at a later point. (Or have I totally lost the plot?) There are at least three foundational differences between the AIXI formalism and a Friendly AI; so far I've covered only the first. "Breaking AIXI-tl" wasn't about Friendliness; more of a dry run on a directly demonstrable and emotionally uncharged architectural consequence before tackling the hard stuff. -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
RE: [agi] Breaking AIXI-tl - AGI friendliness
Actually, Eliezer said he had two points about AIXItl: 1) that it could be "broken" in the sense he's described 2) that it was intrinsically un-Friendly So far he has only made point 1), and has not gotten to point 2) !!! As for a general point about the teachability of Friendliness, I don't think that an analysis of AIXItl can lead to any such general conclusion. AIXItl is very, very different from Novamente or any other pragmatic AI system. I think that an analysis of AIXItl's Friendliness or otherwise is going to be useful primarily as an exercise in "Friendliness analysis of AGI systems," rather than for any pragmatic implications it may have. -- Ben > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On > Behalf Of Philip Sutton > Sent: Sunday, February 16, 2003 9:42 AM > To: [EMAIL PROTECTED] > Subject: Re: [agi] Breaking AIXI-tl - AGI friendliness > > > Hi Eliezer/Ben, > > My recollection was that Eliezer initiated the "Breaking AIXI-tl" > discussion as a way of proving that friendliness of AGIs had to be > consciously built in at the start and couldn't be assumed to be > teachable at a later point. (Or have I totally lost the plot?) > > Do you feel the discussion has covered enough technical ground and > established enough consensus to bring the original topic back into > focus? > > Cheers, Philip > > --- > To unsubscribe, change your address, or temporarily deactivate > your subscription, > please go to http://v2.listbox.com/member/?[EMAIL PROTECTED] > --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl - AGI friendliness
Hi Eliezer/Ben, My recollection was that Eliezer initiated the "Breaking AIXI-tl" discussion as a way of proving that friendliness of AGIs had to be consciously built in at the start and couldn't be assumed to be teachable at a later point. (Or have I totally lost the plot?) Do you feel the discussion has covered enough technical ground and established enough consensus to bring the original topic back into focus? Cheers, Philip --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
> I guess that for AIXI to learn this sort of thing, it would have to be > rewarded for understanding AIXI in general, for proving theorems about AIXI, > etc. Once it had learned this, it might be able to apply this knowledge in > the one-shot PD context But I am not sure. > For those of us who have missed a critical message or two in this weekend's lengthy exchange, can you explain briefly the one-shot complex PD? I'm unsure how a program could evaluate and learn to predict the behavior of its opponent if it only gets 1-shot. Obviously I'm missing something. -Brad --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
I guess that for AIXI to learn this sort of thing, it would have to be rewarded for understanding AIXI in general, for proving theorems about AIXI, etc. Once it had learned this, it might be able to apply this knowledge in the one-shot PD context But I am not sure. ben > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On > Behalf Of Eliezer S. Yudkowsky > Sent: Saturday, February 15, 2003 3:36 PM > To: [EMAIL PROTECTED] > Subject: Re: [agi] Breaking AIXI-tl > > > Ben Goertzel wrote: > >>AIXI-tl can learn the iterated PD, of course; just not the > >>oneshot complex PD. > > > > But if it's had the right prior experience, it may have an > operating program > > that is able to deal with the oneshot complex PD... ;-) > > Ben, I'm not sure AIXI is capable of this. AIXI may inexorably predict > the environment and then inexorably try to maximize reward given > environment. The reflective realization that *your own choice* to follow > that control procedure is correlated with a distant entity's > choice not to > cooperate with you may be beyond AIXI. If it was the iterated PD, AIXI > would learn how a defection fails to maximize reward over time. But can > AIXI understand, even in theory, regardless of what its internal programs > simulate, that its top-level control function fails to maximize the a > priori propensity of other minds with information about AIXI's internal > state to cooperate with it, on the *one* shot PD? AIXI can't take the > action it needs to learn the utility of... > > -- > Eliezer S. Yudkowsky http://singinst.org/ > Research Fellow, Singularity Institute for Artificial Intelligence > > --- > To unsubscribe, change your address, or temporarily deactivate > your subscription, > please go to http://v2.listbox.com/member/?[EMAIL PROTECTED] > --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: AIXI-tl can learn the iterated PD, of course; just not the oneshot complex PD. But if it's had the right prior experience, it may have an operating program that is able to deal with the oneshot complex PD... ;-) Ben, I'm not sure AIXI is capable of this. AIXI may inexorably predict the environment and then inexorably try to maximize reward given environment. The reflective realization that *your own choice* to follow that control procedure is correlated with a distant entity's choice not to cooperate with you may be beyond AIXI. If it was the iterated PD, AIXI would learn how a defection fails to maximize reward over time. But can AIXI understand, even in theory, regardless of what its internal programs simulate, that its top-level control function fails to maximize the a priori propensity of other minds with information about AIXI's internal state to cooperate with it, on the *one* shot PD? AIXI can't take the action it needs to learn the utility of... -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
> > AIXI-tl can learn the iterated PD, of course; just not the > oneshot complex PD. > But if it's had the right prior experience, it may have an operating program that is able to deal with the oneshot complex PD... ;-) ben --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: >> In a naturalistic universe, where there is no sharp boundary between >> the physics of you and the physics of the rest of the world, the >> capability to invent new top-level internal reflective choices can be >> very important, pragmatically, in terms of properties of distant >> reality that directly correlate with your choice to your benefit, if >> there's any breakage at all of the Cartesian boundary - any >> correlation between your mindstate and the rest of the environment. > > Unless, you are vastly smarter than the rest of the universe. Then you > can proceed like an AIXItl and there is no need for top-level internal > reflective choices ;) Actually, even if you are vastly smarter than the rest of the entire universe, you may still be stuck dealing with lesser entities (though not humans; superintelligences at least) who have any information at all about your initial conditions, unless you can make top-level internal reflective choices. The chance that environmental superintelligences will cooperate with you in PD situations may depend on *their* estimate of *your* ability to generalize over the choice to defect and realize that a similar temptation exists on both sides. In other words, it takes a top-level internal reflective choice to adopt a cooperative ethic on the one-shot complex PD rather than blindly trying to predict and outwit the environment for maximum gain, which is built into the definition of AIXI-tl's control process. A superintelligence may cooperate with a comparatively small, tl-bounded AI, but be unable to cooperate with an AIXI-tl, provided there is any inferrable information about initial conditions. In one sense AIXI-tl "wins"; it always defects, which formally is a "better" choice than cooperating on the oneshot PD, regardless of what the opponent does - assuming that the environment is not correlated with your decisionmaking process. But anyone who knows that assumption is built into AIXI-tl's initial conditions will always defect against AIXI-tl. A small, tl-bounded AI that can make reflective choices has the capability of adopting a cooperative ethic; provided that both entities know or infer something about the other's initial conditions, they can arrive at a knowably correlated reflective choice to adopt cooperative ethics. AIXI-tl can learn the iterated PD, of course; just not the oneshot complex PD. -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
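The decision logic at issue here can be made concrete with a toy payoff table. This is only an illustration: the numbers are the conventional Prisoner's Dilemma values, and the two function names are my own, not anything defined in the thread.

# Row player's reward for (my_move, opponent_move); illustrative values only.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def defection_dominates() -> bool:
    """Treating the opponent's move as uncorrelated with mine, defecting is at
    least as good against either possible move -- the sense in which always
    defecting is formally 'better' on the one-shot PD."""
    return all(PAYOFF[("D", o)] >= PAYOFF[("C", o)] for o in "CD")

def best_move_given_correlation() -> str:
    """If both parties can infer that their choices are correlated (clones, or
    loosely similar reflective goal systems), only the diagonal outcomes are
    reachable, and cooperating wins."""
    return max("CD", key=lambda move: PAYOFF[(move, move)])

print(defection_dominates())           # True
print(best_move_given_correlation())   # C

The substance of the disagreement is whether AIXI-tl's top-level control process can ever set up the second calculation at all.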
Re: [agi] Breaking AIXI-tl
Eliezer S. Yudkowsky wrote: > Let's imagine I'm a superintelligent magician, sitting in my castle, > Dyson Sphere, what-have-you. I want to allow sentient beings some way > to visit me, but I'm tired of all these wandering AIXI-tl spambots that > script kiddies code up to brute-force my entrance challenges. I don't > want to tl-bound my visitors; what if an actual sentient 10^10^15 > ops/sec big wants to visit me? I don't want to try and examine the > internal state of the visiting agent, either; that just starts a war of > camouflage between myself and the spammers. Luckily, there's a simple > challenge I can pose to any visitor, cooperation with your clone, that > filters out the AIXI-tls and leaves only beings who are capable of a > certain level of reflectivity, presumably genuine sentients. I don't > need to know the tl-bound of my visitors, or the tl-bound of the > AIXI-tl, in order to construct this challenge. I write the code once. Oh, that's trivial to break. I just put my AIXI-t1 (whatever that is) in a human body and send it via rocket-ship... There would be no way to clone this being so you would have no way to carry out the test. -- I WANT A DEC ALPHA!!! =) 21364: THE UNDISPUTED GOD OF ALL CPUS. http://users.rcn.com/alangrimes/ [if rcn.com doesn't work, try erols.com ] --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
> Anyway, a constant cave with an infinite tape seems like a constant > challenge to me, and a finite cave that breaks any {AIXI-tl, tl-human} > contest up to l=googlebyte also still seems interesting, especially as > AIXI-tl is supposed to work for any tl, not just sufficiently high tl. It's a fair mathematical challenge ... the reason I complained is that the physical-world metaphor of a cave seems to me to imply a finite system. A cave with an infinite tape in it is no longer a realizable physical system! > > (See, it IS actually possible to convince me of something, when it's > > correct; I'm actually not *hopelessly* stubborn ;) > > Yes, but it takes t2^l operations. > > (Sorry, you didn't deserve it, but a straight line like that only comes > along once.) ;-) ben --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: hi, No, the challenge can be posed in a way that refers to an arbitrary agent A which a constant challenge C accepts as input. But the problem with saying it this way, is that the "constant challenge" has to have an infinite memory capacity. So in a sense, it's an infinite constant ;) Infinite Turing tapes are a pretty routine assumption in operations like these. I think Hutter's AIXI-tl is supposed to be able to handle constant environments (as opposed to constant challenges, a significant formal difference) that contain infinite Turing tapes. Though maybe that'd violate separability? Come to think of it, the Clone challenge might violate separability as well, since AIXI-tl (and hence its Clone) builds up state. No, the charm of the physical challenge is exactly that there exists a physically constant cavern which defeats any AIXI-tl that walks into it, while being tractable for wandering tl-Corbins. No, this isn't quite right. If the cavern is physically constant, then there must be an upper limit to the t and l for which it can clone AIXItl's. Hm, this doesn't strike me as a fair qualifier. One, if an AIXItl exists in the physical universe at all, there are probably infinitely powerful processors lying around like sunflower seeds. And two, if you apply this same principle to any other physically realized challenge, it means that people could start saying "Oh, well, AIXItl can't handle *this* challenge because there's an upper bound on how much computing power you're allowed to use." If Hutter's theorem is allowed to assume infinite computing power inside the Cartesian theatre, then the magician's castle should be allowed to assume infinite computing power outside the Cartesian theatre. Anyway, a constant cave with an infinite tape seems like a constant challenge to me, and a finite cave that breaks any {AIXI-tl, tl-human} contest up to l=googlebyte also still seems interesting, especially as AIXI-tl is supposed to work for any tl, not just sufficiently high tl. Well, yes, as a special case of AIXI-tl's being unable to carry out reasoning where their internal processes are correlated with the environment. Agreed... (See, it IS actually possible to convince me of something, when it's correct; I'm actually not *hopelessly* stubborn ;) Yes, but it takes t2^l operations. (Sorry, you didn't deserve it, but a straight line like that only comes along once.) -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Let's imagine I'm a superintelligent magician, sitting in my castle, Dyson Sphere, what-have-you. I want to allow sentient beings some way to visit me, but I'm tired of all these wandering AIXI-tl spambots that script kiddies code up to brute-force my entrance challenges. I don't want to tl-bound my visitors; what if an actual sentient 10^10^15 ops/sec big wants to visit me? I don't want to try and examine the internal state of the visiting agent, either; that just starts a war of camouflage between myself and the spammers. Luckily, there's a simple challenge I can pose to any visitor, cooperation with your clone, that filters out the AIXI-tls and leaves only beings who are capable of a certain level of reflectivity, presumably genuine sentients. I don't need to know the tl-bound of my visitors, or the tl-bound of the AIXI-tl, in order to construct this challenge. I write the code once. Cooperation with yourself is certainly a fair test when it comes to winning entrance into a magician's castle; I've seen it in at least one fantasy novel I can think of offhand. -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
hi, > No, the challenge can be posed in a way that refers to an arbitrary agent > A which a constant challenge C accepts as input. But the problem with saying it this way, is that the "constant challenge" has to have an infinite memory capacity. So in a sense, it's an infinite constant ;) > No, the charm of the physical challenge is exactly that there exists a > physically constant cavern which defeats any AIXI-tl that walks into it, > while being tractable for wandering tl-Corbins. No, this isn't quite right. If the cavern is physically constant, then there must be an upper limit to the t and l for which it can clone AIXItl's. If the cavern has N bits (assuming a bitistic reduction of physics, for simplicity ;), then it can't clone an AIXItl where t >>2^N, can it? Not without grabbing bits (particles or whatever) from the outside universe to carry out the cloning. (and how could the AIXItl with t>>2^N even fit inside it??) You still need the quantifiers reversed: for any AIXI-tl, there is a cavern posing a challenge that defeats it... > > I think part of what you're saying here is that AIXItl's are > not designed to > > be able to participate in a community of equals This is > certainly true. > > Well, yes, as a special case of AIXI-tl's being unable to carry out > reasoning where their internal processes are correlated with the > environment. Agreed... (See, it IS actually possible to convince me of something, when it's correct; I'm actually not *hopelessly* stubborn ;) ben --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
> In a naturalistic universe, where there is no sharp boundary between the > physics of you and the physics of the rest of the world, the > capability to > invent new top-level internal reflective choices can be very important, > pragmatically, in terms of properties of distant reality that directly > correlate with your choice to your benefit, if there's any > breakage at all > of the Cartesian boundary - any correlation between your > mindstate and the > rest of the environment. Unless, you are vastly smarter than the rest of the universe. Then you can proceed like an AIXItl and there is no need for top-level internal reflective choices ;) ben g --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: It's really the formalizability of the challenge as a computation which can be fed either a *single* AIXI-tl or a *single* tl-bounded uploaded human that makes the whole thing interesting at all... I'm sorry I didn't succeed in making clear the general class of real-world analogues for which this is a special case. OK I don't see how the challenge you've described is "formalizable as a computation which can be fed either a tl-bounded uploaded human or an AIXI-tl." The challenge involves cloning the agent being challenged. Thus it is not a computation feedable to the agent, unless you assume the agent is supplied with a cloning machine... You're not feeding the *challenge* to the *agent*. You're feeding the *agent* to the *challenge*. There's a constant computation C, which accepts as input an arbitrary agent, either a single AIXI-tl or a single tl-bounded upload, and creates a problem environment on which the upload is superior to the AIXI-tl. As part of this operation computation C internally clones the agent, but that operation all takes place inside C. That's why I call it diagonalizing. If I were to take a very rough stab at it, it would be that the cooperation case with your own clone is an extreme case of many scenarios where superintelligences can cooperate with each other on the one-shot Prisoner's Dilemma provided they have *loosely similar* reflective goal systems and that they can probabilistically estimate that enough loose similarity exists. Yah, but the definition of a superintelligence is relative to the agent being challenged. For any fixed superintelligent agent A, there are AIXItl's big enough to succeed against it in any cooperative game. To "break" AIXI-tl, the challenge needs to be posed in a way that refers to AIXItl's own size, i.e. one has to say something like "Playing a cooperative game with other intelligences of intelligence at least f(t,l)" where f is some increasing function No, the challenge can be posed in a way that refers to an arbitrary agent A which a constant challenge C accepts as input. For the naturalistic metaphor of a physical challenge, visualize a cavern into which an agent walks, rather than a game the agent is given to play. If the intelligence of the opponents is fixed, then one can always make an AIXItl win by increasing t and l ... So your challenges are all of the form: * For any fixed AIXItl, here is a challenge that will defeat it Here is a constant challenge C which accepts as input an arbitrary agent A, and defeats AIXI-tl but not tl-Corbin. ForAll AIXItl's A(t,l), ThereExists a challenge C(t,l) so that fails_at(A,C) or alternatively ForAll AIXItl's A(t,l), ThereExists a challenge C(A(t,l)) so that fails_at(A,C) rather than of the form * Here is a challenge that will defeat any AIXItl No, the charm of the physical challenge is exactly that there exists a physically constant cavern which defeats any AIXI-tl that walks into it, while being tractable for wandering tl-Corbins. ThereExists a challenge C so that ForAll AIXItl's A(t,l), fails_at(A,C) The point is that the challenge C is a function C(t,l) rather than being independent of t and l Nope. One cave. This of course is why your challenge doesn't break Hutter's theorem. But it's a distinction that your initial verbal formulation didn't make very clearly (and I understand, the distinction is not that easy to make in words.)
No, the reason my challenge breaks Hutter's assumptions (though not disproving the theorem itself) is that it examines the internal state of the agent in order to clone it. My secondary thesis is that this is not a physically "unfair" scenario because correlations between self and environment are ubiquitous in naturalistic reality. Of course, it's also true that ForAll uploaded humans H, ThereExists a challenge C(H) so that fails_at(H,C) What you've shown that's interesting is that ThereExists a challenge C, so that: -- ForAll AIXItl's A(t,l), fails_at(A,C(A)) -- for many uploaded humans H, succeeds_at(H,C(H)) (Where, were one to try to actually prove this, one would substitute "uploaded humans" with "other AI programs" or something). This is almost right but, again, the point is that I'm thinking of C as a constant physical situation a single agent can face, a real-world cavern that it walks into. You could, if you wanted to filter those mere golem AIXI-tls out of your magician's castle, but let in real Corbins, construct a computationally simple barrier that did the trick... (Assuming tabula rasa AIXI-tls, so as not to start that up again.) The interesting part is that these little natural breakages in the formalism create an inability to take part in what I think might be a fundamental SI social idiom, conducting binding negotiations by convergence to goal processes that are guaranteed to have a correlated output, which relies on (a) Bayesian-inferred initial similarity between goal systems, and (b) the ability to create a top-level reflective choice that wasn't there before, that (c) was abstracted over an infinite recursion in your top-level predictive process.
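The "feed the agent to the challenge" framing can be sketched as a single fixed function that takes the agent itself as input. Everything concrete below -- the choose/observe interface, the payoff numbers, and deepcopy standing in for "examine and clone the agent's internal state" -- is an assumption of this sketch, not part of the AIXI formalism.

import copy

def clone_challenge(agent) -> float:
    """A constant computation C: written once, before any particular agent is
    specified. It accepts a single agent, internally clones it (the step that
    breaks the Cartesian separation between agent and environment), runs a
    one-shot cooperation game between the two copies, and returns the
    original's reward."""
    primary = agent
    secondary = copy.deepcopy(agent)    # C inspects and copies internal state
    a, b = primary.choose(), secondary.choose()
    payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
    reward = payoff[(a, b)]
    primary.observe(reward)
    return reward

class AlwaysCooperate:
    """A stand-in visitor, only so the sketch can be run end to end."""
    def choose(self) -> str:
        return "C"
    def observe(self, reward: float) -> None:
        pass

print(clone_challenge(AlwaysCooperate()))   # 3

The quantifier order is the point: this one C is fixed first, and the claim is that it defeats any AIXI-tl handed to it while staying tractable for a tl-bounded human walking into the same cavern.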
Re: [agi] Breaking AIXI-tl
Brian Atkins wrote: > Ben Goertzel wrote: >> >> So your basic point is that, because these clones are acting by >> simulating programs that finish running in < T timesteps, they're not >> going to be able to simulate each other very accurately. >> >> Whereas, a pair of clones each possessing a more flexible control >> algorithm could perform better in the game. Because, if a more >> flexible player wants to simulate his opponent, he can choose to >> devote nearly ALL his thinking-time inbetween moves to simulating his >> opponent. Because these more flexible players are not constrained to >> a rigid control algorithm that divides up their time into little >> bits, simulating a huge number of fast programs. > > From my bystander POV I got something different out of this exchange of > messages... it appeared to me that Eliezer was not trying to say that > his point was regarding having more time for simulating, but rather > that humans possess a qualitatively different "level" of reflectivity > that allows them to "realize" the situation they're in, and therefore > come up with a simple strategy that probably doesn't even require much > simulating of their clone. It is this reflectivity difference that I > thought was more important to understand... or am I wrong? The really fundamental difference is that humans can invent new reflective choices in their top-level control process that correlate with distant reality and act as actions unavailable to AIXI-tl. This is what's going on when you decide your own clone's strategy in step (2). Corbin is "acting for his clone". He can do this because of a correlation between himself and his environment that AIXI is unable to take advantage of because AIXI is built on the assumption of a Cartesian theatre. Being able to simulate processes that think naturalistically, doesn't necessarily help; you need to be able to do it in the top level of your control process. Why? Because the only way the Primary and Secondary AIXI-tl could benefit from policies that simulate identical decisions, is if the Primary and Secondary chose identical policies, which would require a kind of intelligence in their top-level decision process that AIXI-tl doesn't have. The Primary and Secondary can only choose identical or sufficiently similar policies by coincidence or strange attractors, because they don't have the reflective intelligence to do it deliberately. They don't even have enough reflective intelligence to decide and store complete plans in step (2). In a naturalistic universe, where there is no sharp boundary between the physics of you and the physics of the rest of the world, the capability to invent new top-level internal reflective choices can be very important, pragmatically, in terms of properties of distant reality that directly correlate with your choice to your benefit, if there's any breakage at all of the Cartesian boundary - any correlation between your mindstate and the rest of the environment. -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
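One way to see the control-process difference being described is to spell out the "predict the environment, then maximize" loop as a toy function; the payoff numbers repeat the usual illustrative PD values, and predicted_opponent is a stand-in for AIXI-tl's learned environment model, which is a large simplification.

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def cartesian_choice(predicted_opponent: dict) -> str:
    """Fix a prediction of the opponent as part of the environment, then
    maximize expected reward against that fixed prediction. Since the
    prediction cannot depend on the choice being made here, and defection
    dominates pointwise, this loop defects for every possible prediction."""
    def expected(move: str) -> float:
        return sum(p * PAYOFF[(move, o)] for o, p in predicted_opponent.items())
    return max("CD", key=expected)

for guess in ({"C": 1.0}, {"D": 1.0}, {"C": 0.5, "D": 0.5}):
    print(cartesian_choice(guess))    # D every time

# The step this loop cannot take is the one Corbin takes in step (2): treating
# "what do I choose?" and "what does my clone choose?" as a single decision,
# which collapses the game to the diagonal and makes cooperation the winner.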
RE: [agi] Breaking AIXI-tl
> From my bystander POV I got something different out of this exchange of > messages... it appeared to me that Eliezer was not trying to say that > his point was regarding having more time for simulating, but rather that > humans possess a qualitatively different "level" of reflectivity that > allows them to "realize" the situation they're in, and therefore come up > with a simple strategy that probably doesn't even require much > simulating of their clone. It is this reflectivity difference that I > thought was more important to understand... or am I wrong? > -- > Brian Atkins The "qualitatively different level of reflectivity" that exists is simply that humans are able to devote a lot of their resources to simulating or analyzing programs that are around as slow as they are, and hence -- if they wish -- to simulating or analyzing large portions of themselves. Whereas AIXItl's by design are only able to devote their resources to simulating or analyzing programs that are much much faster than they are -- hence they are not able to simulate or analyze large portions of themselves. This does enable humans to have a qualitatively different type of reflectivity. For any fixed problem, defined independently of the solver, a big enough AIXItl can solve it better than a human. But a human can analyze itself better than an AIXItl can analyze itself, in some senses. But not in all senses: for instance, an AIXItl can prove theorems about itself better than a human can prove theorems about itself... -- Ben G --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
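Ben's speed-asymmetry point can be put in rough numbers. The t and l values below are arbitrary placeholders, and t*2^l is the order-of-magnitude accounting for AIXI-tl's per-cycle cost used elsewhere in this thread, not an exact figure.

# Illustrative bounds only -- chosen to make the ratio visible.
t = 10**6      # time allowed to each candidate program, in steps
l = 40         # length bound on candidate programs, in bits

per_move_cost = t * 2**l    # rough cost of one AIXI-tl decision cycle
candidate_budget = t        # the most steps any simulated program gets

# The processes AIXI-tl can afford to model are faster than AIXI-tl itself by
# a factor of about 2^l, so it can model tl-bounded programs but nothing
# "around as slow as" itself -- such as another AIXI-tl, or itself.
print(per_move_cost // candidate_budget)    # 2**40, about 1.1e12 even for toy bounds

# A human modelling another human faces no such gap: the modelled mind runs at
# roughly the same speed as the modeller.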
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: So your basic point is that, because these clones are acting by simulating programs that finish running in < T timesteps, they're not going to be able to simulate each other very accurately. Whereas, a pair of clones each possessing a more flexible control algorithm could perform better in the game. Because, if a more flexible player wants to simulate his opponent, he can choose to devote nearly ALL his thinking-time inbetween moves to simulating his opponent. Because these more flexible players are not constrained to a rigid control algorithm that divides up their time into little bits, simulating a huge number of fast programs. From my bystander POV I got something different out of this exchange of messages... it appeared to me that Eliezer was not trying to say that his point was regarding having more time for simulating, but rather that humans possess a qualitatively different "level" of reflectivity that allows them to "realize" the situation they're in, and therefore come up with a simple strategy that probably doesn't even require much simulating of their clone. It is this reflectivity difference that I thought was more important to understand... or am I wrong? -- Brian Atkins Singularity Institute for Artificial Intelligence http://www.singinst.org/ --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
> Eliezer/Ben, > > When you've had time to draw breath can you explain, in non-obscure, > non-mathematical language, what the implications of the AIXI-tl > discussion are? > > Thanks. > > Cheers, Philip Here's a brief attempt... AIXItl is a non-practical AGI software design, which basically consists of two parts * a metaprogram * an operating program The operating program controls its actions. The metaprogram works by searching the set of all programs of size less than L that finish running in less than T time steps, and finding the best one, and installing this best one as the operating program. Clearly this is a very slow approach to AI since it has to search a huge space of programs each time it does anything. There is a theorem that says... Given any AI system at all, if you give AIXItl a big enough t and l, then it can outperform the other AI system. Note that this is an unfair contest, because the AIXItl is effectively being given a lot more compute power than the other system. But basically, what the theorem shows is that if you don't need to worry about computing resources, then AI design is trivial -- you can just use AIXItl, which is a very simple program. This is not pragmatically useful at all, because in reality we DO have to worry about computing resources. What Eliezer has pointed out is that AIXItl's are bad at figuring out what each other are going to do. If you put a bunch of AIXItl's in a situation where they have to figure out what each other are going to do, they probably will fail. The reason is that what each AIXItl does is to evaluate a lot of programs much faster than it is, and choose one to be its operating program. An AIXItl is not configured to study programs that are as slow as it is, so it's not configured to study other programs that are its clones, or are of similar complexity to it. On the other hand, humans are dumber than AIXItl's (for big t and l), but they are smarter at figuring out what *each other* are going to do, because they are built to be able to evaluate programs (other humans) around as slow as they are. This is a technical reflection of the basic truth that * just because one AI system is a lot smarter than another when given any problem of fixed complexity to solve * doesn't mean the smarter AI system is better at figuring out and interacting with others of *its kind*, than the dumber one is at figuring out and interacting with others of *its kind*. Of course, I glossed over a good bit in trying to summarize the ideas nonmathematically... In this way, Novamentes are more like humans than AIXItl's. -- Ben G --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
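Here is a deliberately tiny, runnable sketch of the metaprogram / operating-program split Ben describes. The candidate "programs", the scoring rule, and the toy action space are all stand-ins invented so the sketch executes; the real AIXI-tl enumerates every binary program up to length l, runs each for at most t steps, and ranks them by formally justified reward estimates rather than by this simple history score.

import itertools
from typing import Callable, List, Tuple

Action = int                        # toy action space: 0 or 1
History = List[Tuple[int, float]]   # list of (action_taken, reward_received)

def enumerate_toy_programs(max_len: int) -> List[Callable[[History], Action]]:
    """Stand-in for 'all programs of size less than L': a small family of
    lookup-table policies, so the search below actually terminates quickly."""
    programs = []
    for table in itertools.product([0, 1], repeat=max_len):
        programs.append(lambda h, table=table: table[len(h) % len(table)])
    return programs

def score_on_history(program: Callable[[History], Action], history: History) -> float:
    """Stand-in for AIXI-tl's evaluation step: credit a candidate for having
    recommended the actions that actually earned reward in the past."""
    if not history:
        return 0.0
    return sum(r for i, (a, r) in enumerate(history)
               if program(history[:i]) == a) / len(history)

def aixitl_step(history: History, L: int = 3) -> Action:
    """The metaprogram: search every candidate operating program, install the
    best-scoring one, and let it choose the next action. This whole search is
    repeated on every single move, which is why the approach is so slow."""
    candidates = enumerate_toy_programs(L)
    operating_program = max(candidates, key=lambda p: score_on_history(p, history))
    return operating_program(history)

# After being rewarded for action 1 and not for action 0, it keeps choosing 1.
print(aixitl_step([(1, 1.0), (0, 0.0), (1, 1.0)]))   # 1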
RE: [agi] Breaking AIXI-tl
Hi, > There's a physical challenge which operates on *one* AIXI-tl and breaks > it, even though it involves diagonalizing the AIXI-tl as part of the > challenge. OK, I see what you mean by calling it a "physical challenge." You mean that, as part of the challenge, the external agent posing the challenge is allowed to clone the AIXI-tl. > > An intuitively fair, physically realizable challenge, with important > > real-world analogues, formalizable as a computation which can be fed > > either a tl-bounded uploaded human or an AIXI-tl, for which the human > > enjoys greater success measured strictly by total reward over time, due > > to the superior strategy employed by that human as the result of > > rational reasoning of a type not accessible to AIXI-tl. > > It's really the formalizability of the challenge as a computation which > can be fed either a *single* AIXI-tl or a *single* tl-bounded uploaded > human that makes the whole thing interesting at all... I'm sorry I didn't > succeed in making clear the general class of real-world analogues for > which this is a special case. OK I don't see how the challenge you've described is "formalizable as a computation which can be fed either a tl-bounded uploaded human or an AIXI-tl." The challenge involves cloning the agent being challenged. Thus it is not a computation feedable to the agent, unless you assume the agent is supplied with a cloning machine... > If I were to take a very rough stab at it, it would be that the > cooperation case with your own clone is an extreme case of many scenarios > where superintelligences can cooperate with each other on the one-shot > Prisoner's Dilemma provided they have *loosely similar* reflective goal > systems and that they can probabilistically estimate that enough loose > similarity exists. Yah, but the definition of a superintelligence is relative to the agent being challenged. For any fixed superintelligent agent A, there are AIXItl's big enough to succeed against it in any cooperative game. To "break" AIXI-tl, the challenge needs to be posed in a way that refers to AIXItl's own size, i.e. one has to say something like "Playing a cooperative game with other intelligences of intelligence at least f(t,l)" where f is some increasing function. If the intelligence of the opponents is fixed, then one can always make an AIXItl win by increasing t and l ... So your challenges are all of the form: * For any fixed AIXItl, here is a challenge that will defeat it ForAll AIXItl's A(t,l), ThereExists a challenge C(t,l) so that fails_at(A,C) or alternatively ForAll AIXItl's A(t,l), ThereExists a challenge C(A(t,l)) so that fails_at(A,C) rather than of the form * Here is a challenge that will defeat any AIXItl ThereExists a challenge C so that ForAll AIXItl's A(t,l), fails_at(A,C) The point is that the challenge C is a function C(t,l) rather than being independent of t and l This of course is why your challenge doesn't break Hutter's theorem. But it's a distinction that your initial verbal formulation didn't make very clearly (and I understand, the distinction is not that easy to make in words.) Of course, it's also true that ForAll uploaded humans H, ThereExists a challenge C(H) so that fails_at(H,C) What you've shown that's interesting is that ThereExists a challenge C, so that: -- ForAll AIXItl's A(t,l), fails_at(A,C(A)) -- for many uploaded humans H, succeeds_at(H,C(H)) (Where, were one to try to actually prove this, one would substitute "uploaded humans" with "other AI programs" or something).
> The interesting part is that these little > natural breakages in the formalism create an inability to take part in > what I think might be a fundamental SI social idiom, conducting binding > negotiations by convergence to goal processes that are guaranteed to have > a correlated output, which relies on (a) Bayesian-inferred initial > similarity between goal systems, and (b) the ability to create a > top-level > reflective choice that wasn't there before, that (c) was abstracted over > an infinite recursion in your top-level predictive process. I think part of what you're saying here is that AIXItl's are not designed to be able to participate in a community of equals This is certainly true. --- Ben G --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
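Ben's quantifier contrast, restated in standard notation (A_{t,l} is an AIXI-tl, H a tl-bounded uploaded human, and fails/succeeds are the same informal predicates as in the ASCII version above):

% The weaker claim: each AIXI-tl has some challenge tailored to its own t, l
\forall\, t,l \;\; \exists\, C \;:\; \mathrm{fails}(A_{t,l},\, C)

% The claim actually being made: one constant challenge C, taking the agent
% itself as input, defeats every AIXI-tl while remaining winnable for (many)
% tl-bounded humans
\exists\, C \;\; \forall\, t,l \;:\; \mathrm{fails}\big(A_{t,l},\, C(A_{t,l})\big)
\quad \text{and, for many } H, \;\; \mathrm{succeeds}\big(H,\, C(H)\big)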
Re: [agi] Breaking AIXI-tl
Eliezer/Ben, When you've had time to draw breath can you explain, in non-obscure, non-mathematical language, what the implications of the AIXI-tl discussion are? Thanks. Cheers, Philip --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Eliezer S. Yudkowsky wrote: > Bill Hibbard wrote: > > On Fri, 14 Feb 2003, Eliezer S. Yudkowsky wrote: > > > >>It *could* do this but it *doesn't* do this. Its control process is such > >>that it follows an iterative trajectory through chaos which is forbidden > >>to arrive at a truthful solution, though it may converge to a stable > >>attractor. > > > > This is the heart of the fallacy. Neither a human nor an AIXI > > can know "that his synchronized other self - whichever one > > he is - is doing the same". All a human or an AIXI can know is > > its observations. They can estimate but not know the intentions > > of other minds. > > The halting problem establishes that you can never perfectly understand > your own decision process well enough to predict its decision in advance, > because you'd have to take into account the decision process including the > prediction, et cetera, establishing an infinite regress. > > However, Corbin doesn't need to know absolutely that his other self is > synchronized, nor does he need to know his other self's decision in > advance. Corbin only needs to establish a probabilistic estimate, good > enough to guide his actions, that his other self's decision is correlated > with his *after* the fact. (I.e., it's not a halting problem where you > need to predict yourself in advance; you only need to know your own > decision after the fact.) > > AIXI-tl is incapable of doing this for complex cooperative problems > because its decision process only models tl-bounded things and AIXI-tl is > not *remotely close* to being tl-bounded. Now you are using a different argument. Your previous argument was: > Lee Corbin can work out his entire policy in step (2), before step > (3) occurs, knowing that his synchronized other self - whichever one > he is - is doing the same. Now you have Corbin merely estimating his clone's intentions. While it is true that AIXI-tl cannot completely simulate itself, it also can estimate another AIXI-tl's future behavior based on observed behavior. Your argument is now that Corbin can do it better. I don't know if this is true or not. > . . . > Let's say that AIXI-tl takes action A in round 1, action B in round 2, and > action C in round 3, and so on up to action Z in round 26. There's no > obvious reason for the sequence {A...Z} to be predictable *even > approximately* by any of the tl-bounded processes AIXI-tl uses for > prediction. Any given action is the result of a tl-bounded policy but the > *sequence* of *different* tl-bounded policies was chosen by a t2^l process. Your example sequence is pretty simple and should match a nice simple universal Turing machine program in an AIXI-tl, well within its bounds. Furthermore, two AIXI-tl's will probably converge on a simple sequence in prisoner's dilemma. But I have no idea if they can do it better than Corbin and his clone. Bill --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Eliezer S. Yudkowsky wrote: But if this isn't immediately obvious to you, it doesn't seem like a top priority to try and discuss it... Argh. That came out really, really wrong and I apologize for how it sounded. I'm not very good at agreeing to disagree. Must... sleep... -- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: > > I'll read the rest of your message tomorrow... > >> But we aren't *talking* about whether AIXI-tl has a mindlike >> operating program. We're talking about whether the physically >> realizable challenge, which definitely breaks the formalism, also >> breaks AIXI-tl in practice. That's what I originally stated, that's >> what you originally said you didn't believe, and that's all I'm >> trying to demonstrate. > > Your original statement was posed in a misleading way, perhaps not > intentionally. > > There is no challenge on which *an* AIXI-tl doesn't outperform *an* > uploaded human. We are all Lee Corbin; would you really say there's "more than one"... oh, never mind, I don't want to get *that* started here. There's a physical challenge which operates on *one* AIXI-tl and breaks it, even though it involves diagonalizing the AIXI-tl as part of the challenge. In the real world, all reality is interactive and naturalistic, not walled off by a Cartesian theatre. The example I gave is probably the simplest case that clearly breaks the formalism and clearly causes AIXI-tl to operate suboptimally. There are more complex and important cases, that we would understand as roughly constant environmental challenges which break AIXI-tl's formalism in more subtle ways, with the result that AIXI-tl can't cooperate in one-shot PDs with superintelligences... and neither can a human, incidentally, but another seed AI or superintelligence can-I-think, by inventing a new kind of reflective choice which is guaranteed to be correlated as a result of shared initial conditions, both elements that break AIXI-tl... well, anyway, the point is that there's a qualitatively different kind of intelligence here that I think could turn out to be extremely critical in negotiations among superintelligences. The formalism in this situation gets broken, depending on how you're looking at it, by side effects of the AIXI-tl's existence or by violation of the separability condition. Actually, violations of the formalism are ubiquitous and this is not particularly counterintuitive; what is counterintuitive is that formalism violations turn out to make a real-world difference. Are we at least in agreement on the fact that there exists a formalizable constant challenge C which accepts an arbitrary single agent and breaks both the AIXI-tl formalism and AIXI-tl? OK. We'd better take a couple of days off before taking up the AIXI Friendliness issue. Maybe even wait until I get back from New York in a week. Also, I want to wait for all these emails to show up in the AGI archive, then tell Marcus Hutter about them if no one has already. I'd be interested in seeing what he thinks. > What you're trying to show is that there's an inter-AIXI-tl social > situation in which AIXI-tl's perform less intelligently than humans do > in a similar inter-human situation. > > If you had posed it this way, I wouldn't have been as skeptical > initially. If I'd posed it that way, it would have been uninteresting because I wouldn't have broken the formalism. Again, to quote my original claim: >> 1) There is a class of physically realizable problems, which humans >> can solve easily for maximum reward, but which - as far as I can tell >> - AIXI cannot solve even in principle; > > I don't see this, nor do I believe it...
And later expanded to: > An intuitively fair, physically realizable challenge, with important > real-world analogues, formalizable as a computation which can be fed > either a tl-bounded uploaded human or an AIXI-tl, for which the human > enjoys greater success measured strictly by total reward over time, due > to the superior strategy employed by that human as the result of > rational reasoning of a type not accessible to AIXI-tl. It's really the formalizability of the challenge as a computation which can be fed either a *single* AIXI-tl or a *single* tl-bounded uploaded human that makes the whole thing interesting at all... I'm sorry I didn't succeed in making clear the general class of real-world analogues for which this is a special case. If I were to take a very rough stab at it, it would be that the cooperation case with your own clone is an extreme case of many scenarios where superintelligences can cooperate with each other on the one-shot Prisoner's Dilemma provided they have *loosely similar* reflective goal systems and that they can probabilistically estimate that enough loose similarity exists. It's the natural counterpart of the Clone challenge - loosely similar goal systems arise all the time, and it turns out that in addition to those goal systems being interpreted as a constant environmental challenge, there are social problems that depend on your being able to correlate your internal processes with theirs (you can correlate internal processes because you're both part of the same naturalistic universe). This breaks AIXI-tl because it's not loosely similar enough - an
RE: [agi] Breaking AIXI-tl
Hmmm My friend, I think you've pretty much convinced me with this last batch of arguments. Or, actually, I'm not sure if it was your excellently clear arguments or the fact that I finally got a quiet 15 minutes to really think about it (the three kids, who have all been out sick from school with a flu all week, are all finally in bed ;) Your arguments are a long way from a rigorous proof, and I can't rule out that there might be a hole in them, but in this last e-mail you were explicit enough to convince me that what you're saying makes logical sense. I'm going to try to paraphrase your argument, let's see if we're somewhere in the neighborhood of harmony... Basically: you've got these two clones playing a cooperative game, and each one, at each turn, is controlled by a certain program. Each clone chooses his "current operating program" by searching the space of all programs of length < L that finish running in < T timesteps, and finding the one that, based on his study of prior gameplay, is expected to give him the highest chance of winning. But each guy takes 2^T timesteps to perform this search. So your basic point is that, because these clones are acting by simulating programs that finish running in < T timesteps, they're not going to be able to simulate each other very accurately. Whereas, a pair of clones each possessing a more flexible control algorithm could perform better in the game. Because, if a more flexible player wants to simulate his opponent, he can choose to devote nearly ALL his thinking-time inbetween moves to simulating his opponent. Because these more flexible players are not constrained to a rigid control algorithm that divides up their time into little bits, simulating a huge number of fast programs. > If AIXI-tl needs general intelligence > but fails to develop > general intelligence to solve the complex cooperation problem, while > humans starting out with general intelligence do solve the problem, then > AIXI-tl has been broken. Well, we have different definitions of "broken" in this context, but that's not a point worth arguing about. > But we aren't *talking* about whether AIXI-tl has a mindlike operating > program. We're talking about whether the physically realizable > challenge, > which definitely breaks the formalism, also breaks AIXI-tl in practice. > That's what I originally stated, that's what you originally said you > didn't believe, and that's all I'm trying to demonstrate. Yes, you would seem to have successfully shown (logically and intuitively, though not mathematically) that AIXItl's can be dumber in their interactions with other AIXItl's, than humans are in their analogous interactions with other humans. I don't think you should describe this as "breaking the formalism", because the formalism is about how a single AIXItl solves a fixed goal function, not about how groups of AIXItl's interact. But it's certainly an interesting result. I hope that, even if you don't take the time to prove it rigorously, you'll write it up in a brief, coherent essay, so that others not on this list can appreciate it... Funky stuff!! ;-) -- Ben G --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
I'll read the rest of your message tomorrow... > But we aren't *talking* about whether AIXI-tl has a mindlike operating > program. We're talking about whether the physically realizable > challenge, > which definitely breaks the formalism, also breaks AIXI-tl in practice. > That's what I originally stated, that's what you originally said you > didn't believe, and that's all I'm trying to demonstrate. Your original statement was posed in a misleading way, perhaps not intentionally. There is no challenge on which *an* AIXI-tl doesn't outperform *an* uploaded human. What you're trying to show is that there's an inter-AIXI-tl social situation in which AIXI-tl's perform less intelligently than humans do in a similar inter-human situation. If you had posed it this way, I wouldn't have been as skeptical initially. -- Ben --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: > >> AIXI-tl *cannot* figure this out because its control process is not >> capable of recognizing tl-computable transforms of its own policies >> and strategic abilities, *only* tl-computable transforms of its own >> direct actions. Yes, it simulates entities who know this; it also >> simulates every possible other kind of tl-bounded entity. The >> question is whether that internal knowledge appears as an advantage >> recognized by the control process and given AIXI-tl's formal >> definition, it does not appear to do so. > > I don't understand how you're deriving the conclusion in your final > sentence. > > How do you know the circumstances in which AIXItl would be led to adopt > operating programs involving modeling its own policies and strategic > abilities? Because AIXI-tl is a completely specified system and it is therefore possible to see certain bounds on its ability to model itself. It has no *direct* reflectivity except for its memory of its own actions and its indirect reflectivity is limited by the ability of a tl-bounded process to simulate the conclusions of a t2^l process. (Note that under ordinary circumstances AIXI-tl never needs this ability in order to *outperform* a tl-bounded process; its internal tl-bounded processes will always model an AIXI-tl as well as any tl-bounded process could.) We need to distinguish between abstract properties of the AIXI-tl's policies that an internal process can understand, and specific outputs of AIXI-tl that the internal process can predict. AIXI-tl simulates all possible tl-bounded semimeasures; some of those semimeasures will attempt to assign a probability to sense data based on the abstract theory "You are facing another AIXI-tl", but this abstract theory will not be enough to actually predict that AIXI-tl's *specific* outputs in order to assign them a high probability. The design structure of the cooperative strategy task (note that it is not the Prisoner's Dilemma but a complex cooperation problem) is such that each AIXI-tl will choose a different tl-bounded policy (using t2^l operations to do so). Given that the abstract theory contained in the tl-bounded probability semimeasure cannot access the tl-bounded policy of either AIXI, nor itself utilize the t2^l process used to select among policies, how is the semimeasure supposed to predict which actions the *other* AIXI will take? Even if such semimeasures succeed in bubbling to the top of the probability distribution, how will a tl-bounded policy in step 4 know which policy the other AIXI selected in order to coordinate strategies? There's no guarantee that the two policies are even approximately the same - AIXI-tl's policy is in the same dilemma as a human trying to work out the strategy in step 4 instead of step 2. If you "still don't see it", could you please say *which* step in the above reasoning first strikes you as a non sequitur? > You may well be right that PD2 is not such a circumstance, but that > doesn't mean there are no such circumstances, or that such > circumstances wouldn't be common in the hypothetical life of a real > embodied AIXItl Breaking a universality claim only requires one counterexample. Of course there are at least some circumstances where AIXI-tl can outperform a human! >> AIXI-tl learns vision *instantly*. The Kolmogorov complexity of a >> visual field is much less than its raw string, and the compact >> representation can be computed by a tl-bounded process. 
It develops >> a visual cortex on the same round it sees its first color picture. > > Yes, but that "visual cortex" would not be useful for anything. It > would take some time for an embodied AIXItl to figure out how to > recognize visual patterns in a way that was useful to it in > coordinating its actions. Unless it had a priori knowledge to guide > it, this would be a substantial process of trial and error learning. Okay, we discount the early trials as part of the bounded loss. Standard Operating Procedure for Hutter's proof. >> Because it is physically or computationally impossible for a >> tl-bounded program to access or internally reproduce the previously >> computed policies or t2^l strategic ability of AIXI-tl. > > Yes, but why can't it learn patterns that let it approximately predict > the strategies of AIXI-tl? Uh... I honestly thought I just *said* why. I'll try expanding; let me know if the expansion still doesn't help. AIXI-tl, trying to predict itself, steadily adds more and more tl-bounded Kolmogorov complexity to the sensory inputs it needs to predict. The true Kolmogorov complexity of AIXI-tl never exceeds the length of the AIXI-tl program plus the challenge computation C, which is actually pretty small change. However, the tl-bounded Kolmogorov complexity keeps rising unless AIXI-tl is lucky enough to stumble on a probability distribution model which in the Secondary advises actions that confirm the probability distribution model in the Pr
RE: [agi] Breaking AIXI-tl
Hi, > You appear to be thinking of AIXI-tl as a fuzzy little harmless baby being > confronted with some harsh trial. Once again, your ability to see into my mind proves extremely flawed ;-) You're right that my statement "AIXItl is slow at learning" was ill-said, though. It is very inefficient at learning in the sense that it takes a huge number of computation steps to decide each action it takes. However, in your PD scenario you're assuming that it has a fast enough processor to do all this thinking inbetween each step of the iterated PD, in which case, yeah, it has to be doing very very fast operations. AIXItl is slow at learning if you count slowness in terms of computation steps, but that's not the way your example wants us to lok at things... > > The question is whether after enough trials AIXI-tl figures out it's > > playing some entity similar to itself and learns how to act > > accordingly If so, then it's doing what AIXI-tl is supposed to do. > > > AIXI-tl *cannot* figure this out because its control process is not > capable of recognizing tl-computable transforms of its own policies and > strategic abilities, *only* tl-computable transforms of its own direct > actions. Yes, it simulates entities who know this; it also simulates > every possible other kind of tl-bounded entity. The question is whether > that internal knowledge appears as an advantage recognized by the control > process and given AIXI-tl's formal definition, it does not appear > to do so. I don't understand how you're deriving the conclusion in your final sentence. How do you know the circumstances in which AIXItl would be led to adopt operating programs involving modeling its own policies and strategic abilities? You may well be right that PD2 is not such a circumstance, but that doesn't mean there are no such circumstances, or that such circumstances wouldn't be common in the hypothetical life of a real embodies AIXItl > > A human can also learn to solve vision recognition problems faster than > > AIXI-tl, because we're wired for it (as we're wired for social > > gameplaying), whereas AIXI-tl has to learn > > AIXI-tl learns vision *instantly*. The Kolmogorov complexity of a visual > field is much less than its raw string, and the compact representation can > be computed by a tl-bounded process. It develops a visual cortex on the > same round it sees its first color picture. Yes, but that "visual cortex" would not be useful for anything. It would take some time for an embodied AIXItl to figure out how to recognize visual patterns in a way that was useful to it in coordinating its actions. Unless it had a priori knowledge to guide it, this would be a substantial process of trial and error learning. > >> Humans can recognize a much stronger degree of similarity in human > >> Other Minds than AIXI-tl's internal processes are capable of > >> recognizing in any other AIXI-tl. > > > > I don't believe that is true. > > Mentally simulate the abstract specification of AIXI-tl instead of using > your intuitions about the behavior of a generic reinforcement process. Eliezer, I don't know what a "generic reinforcement process" is. Of course AIXItl is very different from an ordinary reinforcement learning system. > > OK... here's where the fact that you have a tabula rasa AIXI-tl in a > > very limiting environment comes in. > > > > In a richer environment, I don't see why AIXI-tl, after a long enough > > time, couldn't learn an operating program that implicitly embodied an > > abstraction over its own internal state. 
> > Because it is physically or computationally impossible for a tl-bounded > program to access or internally reproduce the previously computed > policies > or t2^l strategic ability of AIXI-tl. Yes, but why can't it learn patterns that let it approximately predict the strategies of AIXI-tl? > > In an environment consisting solely of PD2, it may be that AIXI-tl will > > never have the inspiration to learn this kind of operating program. > > (I'm not sure.) > > > > To me, this says mostly that PD2 is an inadequate environment for any > > learning system to use, to learn how to become a mind. If it ain't > > good enough for AIXI-tl to use to learn how to become a mind, over a > > very long period of time, it probably isn't good for any AI system to > > use to learn how to become a mind. > > Marcus Hutter has formally proved your intuitions wrong. In any > situation > that does *not* break the formalism, AIXI-tl learns to equal or > outperform > any other process, despite being a tabula rasa, no matter how > rich or poor > its environment. No, Marcus Hutter did not prove the intuition I expressed there wrong. You seem not to have understood what I was saying. AIXI-tl can equal or outperform any other process so long as it is given a lot more computational resources than the other process. But that was not the statement I was making. What I was saying was that ANY reinf
Re: [agi] Breaking AIXI-tl
Eliezer S. Yudkowsky asked Ben Goertzel: > > Do you have a non-intuitive mental simulation mode? > LOL --#:^D It *is* a valid question, Eliezer, but it makes me laugh. Michael Roy Ames [Who currently estimates his *non-intuitive mental simulation mode* to contain about 3 iterations of 5 variables each - 8 variables each on a good day. Each variable can link to a concept (either complex or simple)... and if that sounds to you like something that a trashed-out Commodore 64 could emulate, then you have some idea how he feels being stuck at his current level of non-intuitive intelligence.] --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
Bill Hibbard wrote: On Fri, 14 Feb 2003, Eliezer S. Yudkowsky wrote: It *could* do this but it *doesn't* do this. Its control process is such that it follows an iterative trajectory through chaos which is forbidden to arrive at a truthful solution, though it may converge to a stable attractor. This is the heart of the fallacy. Neither a human nor an AIXI can know "that his synchronized other self - whichever one he is - is doing the same". All a human or an AIXI can know is its observations. They can estimate but not know the intentions of other minds. The halting problem establishes that you can never perfectly understand your own decision process well enough to predict its decision in advance, because you'd have to take into account the decision process including the prediction, et cetera, establishing an infinite regress. However, Corbin doesn't need to know absolutely that his other self is synchronized, nor does he need to know his other self's decision in advance. Corbin only needs to establish a probabilistic estimate, good enough to guide his actions, that his other self's decision is correlated with his *after* the fact. (I.e., it's not a halting problem where you need to predict yourself in advance; you only need to know your own decision after the fact.) AIXI-tl is incapable of doing this for complex cooperative problems because its decision process only models tl-bounded things and AIXI-tl is not *remotely close* to being tl-bounded. Humans can model minds much closer to their own size than AIXI-tl can. Humans can recognize when their policies, not just their actions, are reproduced. We can put ourselves in another human's shoes imperfectly; AIXI-tl can't put itself in another AIXI-tl's shoes to the extent of being able to recognize the actions of an AIXI-tl computed using a process that is inherently 2t^l large. Humans can't recognize their other selves perfectly but the gap in the case of AIXI-tl is enormously greater. (Humans also have a reflective control process on which they can perform inductive and deductive generalizations and jump over a limited class of infinite regresses in decision processes, but that's a separate issue. Suffice it to say that a subprocess which generalizes over its own infinite regress does not obviously suffice for AIXI-tl to generalize over the top-level infinite regress in AIXI-tl's control process.) Let's say that AIXI-tl takes action A in round 1, action B in round 2, and action C in round 3, and so on up to action Z in round 26. There's no obvious reason for the sequence {A...Z} to be predictable *even approximately* by any of the tl-bounded processes AIXI-tl uses for prediction. Any given action is the result of a tl-bounded policy but the *sequence* of *different* tl-bounded policies was chosen by a t2^l process. A human in the same situation has a mnemonic record of the sequence of policies used to compute their strategies, and can recognize correlations between the sequence of policies and the other agent's sequence of actions, which can then be confirmed by directing O(other-agent) strategic processing power at the challenge of seeing the problem from the opposite perspective. AIXI-tl is physically incapable of doing this directly and computationally incapable of doing it indirectly. This is not an attack on the computability of intelligence; the human is doing something perfectly computable which AIXI-tl does not do. -- Eliezer S. 
Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
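A toy illustration of the point about policy sequences (nothing here is AIXI-tl itself; the two-expert setup and all names are invented for this sketch): a meta-level selector that plays whichever base policy has the best track record so far produces an action sequence that, with this particular observation stream, no single member of the base class reproduces, even though every individual action comes from some base policy.

# Sketch: a meta-agent follows whichever base policy ("expert") has the best
# record so far; its overall action sequence need not match any single expert,
# even though each individual action is some expert's action.
observations = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]

base_policies = {
    "always_0": lambda history: 0,
    "always_1": lambda history: 1,
}

def meta_action(history):
    # The switching rule scores every expert against the whole history;
    # that selection process lives outside the expert class itself.
    def score(policy):
        return sum(policy(history[:i]) == obs for i, obs in enumerate(history))
    leader = max(base_policies.values(), key=score)
    return leader(history)

meta_sequence = [meta_action(observations[:i]) for i in range(len(observations))]
for name, policy in base_policies.items():
    fixed_sequence = [policy(observations[:i]) for i in range(len(observations))]
    agreement = sum(a == b for a, b in zip(meta_sequence, fixed_sequence))
    print(name, "matches the meta-agent on", agreement, "of", len(meta_sequence), "rounds")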
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: >> Even if a (grown) human is playing PD2, it outperforms AIXI-tl >> playing PD2. > > Well, in the long run, I'm not at all sure this is the case. You > haven't proved this to my satisfaction. PD2 is very natural to humans; we can take for granted that humans excel at PD2. The question is AIXI-tl. > In the short run, it certainly is the case. But so what? AIXI-tl is > damn slow at learning, we know that. AIXI-tl is most certainly not "damn slow" at learning any environment that can be tl-bounded. For problems that don't break the Cartesian formalism, AIXI-tl learns only slightly slower than the fastest possible tl-bounded learner. It's got t2^l computing power for gosh sakes! From our perspective it learns faster than the fastest rate humanly imaginable - literally. You appear to be thinking of AIXI-tl as a fuzzy little harmless baby being confronted with some harsh trial. That fuzzy little harmless baby, if the tl-bound is large enough to simulate Lee Corbin, is wielding something like 10^10^15 operations per second, which it is using to *among other things* simulate every imaginable human experience. AIXI-tl is larger than universes; it contains all possible tl-bounded heavens and all possible tl-bounded hells. The only question is whether its control process makes any good use of all that computation. More things from the list of system properties that Friendliness programmers should sensitize themselves to: Just because the endless decillions of alternate Ben Goertzels in torture chambers are screaming to God to stop it doesn't mean that AIXI-tl's control process cares. > The question is whether after enough trials AIXI-tl figures out it's > playing some entity similar to itself and learns how to act > accordingly If so, then it's doing what AIXI-tl is supposed to do. > AIXI-tl *cannot* figure this out because its control process is not capable of recognizing tl-computable transforms of its own policies and strategic abilities, *only* tl-computable transforms of its own direct actions. Yes, it simulates entities who know this; it also simulates every possible other kind of tl-bounded entity. The question is whether that internal knowledge appears as an advantage recognized by the control process and given AIXI-tl's formal definition, it does not appear to do so. In my humble opinion, one of the (many) critical skills for creating AI is learning to recognize what systems *really actually do* and not just what you project onto them. See also Eliza effect, failure of GOFAI, etc. > A human can also learn to solve vision recognition problems faster than > AIXI-tl, because we're wired for it (as we're wired for social > gameplaying), whereas AIXI-tl has to learn AIXI-tl learns vision *instantly*. The Kolmogorov complexity of a visual field is much less than its raw string, and the compact representation can be computed by a tl-bounded process. It develops a visual cortex on the same round it sees its first color picture. >> Humans can recognize a much stronger degree of similarity in human >> Other Minds than AIXI-tl's internal processes are capable of >> recognizing in any other AIXI-tl. > > I don't believe that is true. Mentally simulate the abstract specification of AIXI-tl instead of using your intuitions about the behavior of a generic reinforcement process. Eventually the results you learn will be integrated into your intuitions and you'll be able to directly see dependencies between specifications and reflective modeling abilities. > OK...
here's where the fact that you have a tabula rasa AIXI-tl in a > very limiting environment comes in. > > In a richer environment, I don't see why AIXI-tl, after a long enough > time, couldn't learn an operating program that implicitly embodied an > abstraction over its own internal state. Because it is physically or computationally impossible for a tl-bounded program to access or internally reproduce the previously computed policies or t2^l strategic ability of AIXI-tl. > In an environment consisting solely of PD2, it may be that AIXI-tl will > never have the inspiration to learn this kind of operating program. > (I'm not sure.) > > To me, this says mostly that PD2 is an inadequate environment for any > learning system to use, to learn how to become a mind. If it ain't > good enough for AIXI-tl to use to learn how to become a mind, over a > very long period of time, it probably isn't good for any AI system to > use to learn how to become a mind. Marcus Hutter has formally proved your intuitions wrong. In any situation that does *not* break the formalism, AIXI-tl learns to equal or outperform any other process, despite being a tabula rasa, no matter how rich or poor its environment. >> Anyway... basically, if you're in a real-world situation where the >> other intelligence has *any* information about your internal state, >> not just from direct examination, but from reasoning about your >> origin
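On the side point about the Kolmogorov complexity of a visual field: a quick, crude way to see the claim that a structured image has a far shorter description than its raw string is to use an off-the-shelf compressor as a stand-in for description length (this only gives an upper bound on K, and the checkerboard and sizes below are invented for the sketch).

import random
import zlib

random.seed(0)
width = height = 64

# A highly structured "visual field": an 8-pixel checkerboard of 0/255 bytes.
structured = bytes(((x // 8 + y // 8) % 2) * 255
                   for y in range(height) for x in range(width))
# The same number of bytes of pure noise, for contrast.
noise = bytes(random.randrange(256) for _ in range(width * height))

print("raw size:", width * height)                                  # 4096
print("compressed checkerboard:", len(zlib.compress(structured)))   # far smaller than 4096
print("compressed noise:", len(zlib.compress(noise)))               # roughly the raw size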
RE: [agi] Breaking AIXI-tl
> Even if a (grown) human is playing PD2, it outperforms AIXI-tl playing > PD2. Well, in the long run, I'm not at all sure this is the case. You haven't proved this to my satisfaction. In the short run, it certainly is the case. But so what? AIXI-tl is damn slow at learning, we know that. The question is whether after enough trials AIXI-tl figures out it's playing some entity similar to itself and learns how to act accordingly If so, then it's doing what AIXI-tl is supposed to do. A human can also learn to solve vision recognition problems faster than AIXI-tl, because we're wired for it (as we're wired for social gameplaying), whereas AIXI-tl has to learn > Humans can recognize a much stronger degree of similarity in human Other > Minds than AIXI-tl's internal processes are capable of recognizing in any > other AIXI-tl. I don't believe that is true. > Again, as far as I can tell, this > necessarily requires abstracting over your own internal state and > recognizing that the outcome of your own (internal) choices are > necessarily reproduced by a similar computation elsewhere. > Basically, it > requires abstracting over your own halting problem to realize that the > final result of your choice is correlated with that of the process > simulated, even though you can't fully simulate the causal process > producing the correlation in advance. (This doesn't *solve* your own > halting problem, but at least it enables you to *understand* the > situation > you've been put into.) Except that instead of abstracting over your own > halting problem, you're abstracting over the process of trying to > simulate > another mind trying to simulate you trying to simulate it, where > the other > mind is sufficiently similar to your own. This is a kind of reasoning > qualitatively closed to AIXI-tl; its control process goes on abortively > trying to simulate the chain of simulations forever, stopping and > discarding that prediction as unuseful as soon as it exceeds the t-bound. OK... here's where the fact that you have a tabula rasa AIXI-tl in a very limiting environment comes in. In a richer environment, I don't see why AIXI-tl, after a long enough time, couldn't learn an operating program that implicitly embodied an abstraction over its own internal state. In an environment consisting solely of PD2, it may be that AIXI-tl will never have the inspiration to learn this kind of operating program. (I'm not sure.) To me, this says mostly that PD2 is an inadequate environment for any learning system to use, to learn how to become a mind. If it ain't good enough for AIXI-tl to use to learn how to become a mind, over a very long period of time, it probably isn't good for any AI system to use to learn how to become a mind. > Anyway... basically, if you're in a real-world situation where the other > intelligence has *any* information about your internal state, not just > from direct examination, but from reasoning about your origins, then that > also breaks the formalism and now a tl-bounded seed AI can outperform > AIXI-tl on the ordinary (non-quined) problem of cooperation with a > superintelligence. The environment can't ever *really* be constant and > completely separated as Hutter requires. A physical environment that > gives rise to an AIXI-tl is different from the environment that > gives rise > to a tl-bounded seed AI, and the different material implementations of > these entities (Lord knows how you'd implement the AIXI-tl) will have > different side effects, and so on. 
All real world problems break the > Cartesian assumption. The questions "But are there any kinds of problems > for which that makes a real difference?" and "Does any > conceivable kind of > mind do any better?" can both be answered affirmatively. Well, I agree with only some of this. The thing is, an AIXI-tl-driven AI embedded in the real world would have a richer environment to draw on than the impoverished data provided by PD2. This AI would eventually learn how to model itself and reflect in a rich way (by learning the right operating program). However, AIXI-tl is a horribly bad AI algorithm, so it would take a VERY VERY long time to carry out this learning, of course... -- Ben --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
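For readers without Hutter's paper to hand, the resource caveat both sides keep circling can be stated roughly as follows (an informal sketch from memory, not the exact theorem statement):

\[ \mathrm{computation\ per\ cycle\ of\ AIXI}tl \;=\; O\!\left(t \cdot 2^{l}\right), \]

while the class of competitors it provably equals or outperforms, up to additive setup costs, is the class of policies with program length at most l and per-cycle runtime at most t. So "AIXItl eventually learns it" always carries an exponential-in-l price tag per interaction cycle, which is the sense in which it can be both the fastest learner imaginable measured in interaction cycles and hopelessly slow measured in computation steps.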
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: OK. Rather than responding point by point, I'll try to say something compact ;) You're looking at the interesting scenario of a iterated prisoners dilemma between two AIXI-tl's, each of which has a blank operating program at the start of the iterated prisoners' dilemma. (In parts of my last reply, I was questioning the blankness of the operating program, but let's accept it for sake of discussion.) The theorems about AIXI-tl do not say much about the performance of AIXI-tl relative to other systems on this task. Because what the theorems talk about is AIXI-tl maximizing reward function R versus System X maximizing reward function R over a long period of time. Whereas in your case you're asking about AIXI-tl maximizing reward function R(AIXI_tl) versus System X maximizing reward function R(X) i.e. the reward function is a function of the system in question. AIXI-tl and System X (e.g. an uploaded human) are not competing against the same opponent, they're competing against different opponents (their clones, in your scenario). So, unless I'm overlooking something, you're looking at a scenario not covered by Hutter's theorems. That is correct. As I said: "An intuitively fair, physically realizable challenge, with important real-world analogues, formalizable as a computation which can be fed either a tl-bounded uploaded human or an AIXI-tl, for which the human enjoys greater success measured strictly by total reward over time, due to the superior strategy employed by that human as the result of rational reasoning of a type not accessible to AIXI-tl." Obviously, such a challenge cannot be covered by Hutter's theorems or AIXI-tl would outperform the human. The question is whether Hutter's theorems describe all the realistic physical situations a mind can encounter. You're stating that a human (System X) can do better in an iterated PD against other humans, than an AIXItl can do in an iterated PD against other AIXItl's. That is correct. Humans (and Friendly AIs) can employ Hofstadterian superrationality as a strategy; AIXI-tl cannot. I still have problems understanding your reasoning, when you derive this conclusion. Maybe I'm just being obtuse; I'm sure I haven't spent as much time thinking about it as you have. But, suppose you're right. What you've done is come up with an interesting observation (and if you formalize it, an interesting theorem) about (small) social systems of AIXI-tl's. This is very nice. Does this somehow tell you something about the interactions of AIXI-tl's with humans? Is that the follow-up point you want to make, regarding AIXItl Friendliness? Nope. The two points are, if not completely unrelated, then related only on such a deep level that I wasn't explicitly planning to point it out. But Hofstadterian superrationality - and certain other generalized challenges - are physically realizable, important, and can be solved by humans because we have superior reflectivity to AIXI-tl. Your observation is about the behavior of an AIXI/AIXI-tl whose only life-experience has consisted of a very weird artificial situation. This behavior is not going to be the same as the behavior of an AIXI/AIXItl embedded in a richer environment with a different reward function. That is the point I was trying to make with my talk about the "initial operating program" of the AIXI/AIXItl in your simulation. Yes. It is a quite irrelevant point to breaking Hutter's theorem. 
Also, specifying an AIXI-tl embedded in some unspecified prior environment injects entropy into the problem description. I show that AIXI-tl does not learn to recognize its own reflected "way of thinking" (as opposed to its reflected actions, which AIXI-tl *can* recognize) because AIXI-tl cannot, as a human would, remember its own way of thinking or deliberately "place itself into the other human's shoes" and simulate its own way of thinking given different goals, both abilities available at the human level of reflectivity; AIXI-tl can only place itself into the shoes of tl-bounded processes. This prohibits AIXI-tl from using Hofstadterian superrationality to notice that the policies of other entities correlate with its own policies, and prohibits AIXI-tl from choosing the policies of other entities by selecting its own policies based on the knowledge that the policy the other entity chooses will correlate with its own. There are additional Other Mind correlation problems that humans can't solve but seed AIs can because of the seed AI's superior reflectivity; the point is that there's a real kind of intelligence here of which AIXI-tl arguably has quantity zero. Now, let me get back to my problems understanding your reasoning. Consider the problem PD(Y) = "System Y plays iterated PD against a clone of System Y" Clearly, PD(Y) is not a problem at which one would expect more intelligent systems to necessarily perform better than less intelligent ones!! True, bu
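Ben's observation that "the reward function is a function of the system in question" can be put in symbols (a rough paraphrase, not Hutter's notation). The optimality results compare the value of AIXItl and of a (t,l)-bounded policy p in one and the same environment:

\[ V^{\mu}(\mathrm{AIXI}tl) \;\gtrsim\; V^{\mu}(p) \quad \text{for a fixed environment } \mu, \]

whereas the clone challenge compares

\[ V^{\mu(\mathrm{AIXI}tl)}(\mathrm{AIXI}tl) \quad \text{with} \quad V^{\mu(p)}(p), \qquad \mu(Y) = \text{``iterated PD against a copy of } Y\text{''}, \]

so each contestant faces a different environment constructed from that contestant itself, and the theorems simply do not speak to the comparison.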
RE: [agi] Breaking AIXI-tl
> Really, when has a computer (with the exception of certain Microsoft > products) ever been able to disobey it's human masters? > > It's easy to get caught up in the romance of "superpowers", but come on, > there's nothing to worry about. > > -Daniel Hi Daniel, Clearly there is nothing to worry about TODAY. And I'm spending the vast bulk of my time working on practical AI design and engineering and application work, not on speculating about the future. However, I do believe that once AI tech has advanced far enough, there WILL be something to worry about. How close we are to this point is another question. Current AI practice is very far away from achieving autonomous general intelligence. If I'm right about the potential of Novamente and similar designs, we could be within a decade of getting there. If I'm wrong, well, Kurzweil has made some decent arguments why we'll get there by 2050 or so... ;-) -- Ben Goertzel --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
OK. Rather than responding point by point, I'll try to say something compact ;) You're looking at the interesting scenario of a iterated prisoners dilemma between two AIXI-tl's, each of which has a blank operating program at the start of the iterated prisoners' dilemma. (In parts of my last reply, I was questioning the blankness of the operating program, but let's accept it for sake of discussion.) The theorems about AIXI-tl do not say much about the performance of AIXI-tl relative to other systems on this task. Because what the theorems talk about is AIXI-tl maximizing reward function R versus System X maximizing reward function R over a long period of time. Whereas in your case you're asking about AIXI-tl maximizing reward function R(AIXI_tl) versus System X maximizing reward function R(X) i.e. the reward function is a function of the system in question. AIXI-tl and System X (e.g. an uploaded human) are not competing against the same opponent, they're competing against different opponents (their clones, in your scenario). So, unless I'm overlooking something, you're looking at a scenario not covered by Hutter's theorems. You're stating that a human (System X) can do better in an iterated PD against other humans, than an AIXItl can do in an iterated PD against other AIXItl's. I still have problems understanding your reasoning, when you derive this conclusion. Maybe I'm just being obtuse; I'm sure I haven't spent as much time thinking about it as you have. But, suppose you're right. What you've done is come up with an interesting observation (and if you formalize it, an interesting theorem) about (small) social systems of AIXI-tl's. This is very nice. Does this somehow tell you something about the interactions of AIXI-tl's with humans? Is that the follow-up point you want to make, regarding AIXItl Friendliness? Your observation is about the behavior of an AIXI/AIXI-tl whose only life-experience has consisted of a very weird artificial situation. This behavior is not going to be the same as the behavior of an AIXI/AIXItl embedded in a richer environment with a different reward function. That is the point I was trying to make with my talk about the "initial operating program" of the AIXI/AIXItl in your simulation. Now, let me get back to my problems understanding your reasoning. Consider the problem PD(Y) = "System Y plays iterated PD against a clone of System Y" Clearly, PD(Y) is not a problem at which one would expect more intelligent systems to necessarily perform better than less intelligent ones!! Now consider two subproblems PD1(Y) = PD(Y), but each System Y knows it's playing a clone of itself PD2(Y) = PD(Y), but each System Y is playing a completely unidentified, mysterious opponent I'm worried that in your comparison, you have the human upload playing PD1, but have the AIXI-tl playing PD2. PD1 is easier, but PD1 doesn't seem to be your scenario, because it requires the AIXItl not to be starting with a blank operating program. Or do you have them both playing PD2? If a human is playing PD2, then it has to proceed solely by time series analysis, and its actions are probably going to meander around chaotically until settling on some attractor (or maybe they'll just meander around...). MAYBE the human manages to recognize that the responses of its opponent are so similar to its own responses that its opponent must be a lot like itself... and this helps it settle on a beneficial attractor. 
If an AIXItl is playing PD2, the situation is pretty much the same as if the human is doing so, isn't it? Except can't you argue that an AIXItl is so smart that in the long run it's more likely than a human to figure out that its opponent is acting a lot like it is, and make a guess that symmetrical friendly behavior might be a good thing? -- Ben > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On > Behalf Of Eliezer S. Yudkowsky > Sent: Friday, February 14, 2003 1:45 AM > To: [EMAIL PROTECTED] > Subject: Re: [agi] Breaking AIXI-tl > > > Ben Goertzel wrote: > > > >> Because AIXI-tl is not an entity deliberately allocating computing > >> power; its control process is fixed. AIXI-tl will model a process > >> that proves theorems about AIXI-tl only if that process is the best > >> predictor of the environmental information seen so far. > > > > Well... a human's control process is fixed too, in a way. We cannot > > rewire our brains, our biological motivators. And a human will > > accurately model other humans only if its fixed motivators have > > (directly or indirectly) led it to do so... > > I think you're anthropomorphizing AIXI. (I think you're >
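A minimal sketch of the PD1/PD2 distinction defined above, treated purely as an information condition: the only difference is what each copy is told about its opponent. The payoff numbers, names, and the assumed player interface (a make_player factory returning objects with a choose(info, history) method) are all invented for this sketch.

# PD(Y): iterated Prisoner's Dilemma against a clone of the player.
# PD1 tells each copy it is facing its own exact clone; PD2 keeps the opponent anonymous.
PAYOFFS = {                      # (my_move, their_move) -> my_reward
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def play_pd_against_clone(make_player, rounds, reveal_clone):
    a, b = make_player(), make_player()      # two copies of the same player
    info = "opponent is an exact clone of you" if reveal_clone else "opponent unknown"
    total_a = total_b = 0
    history_a, history_b = [], []
    for _ in range(rounds):
        move_a = a.choose(info, history_a)
        move_b = b.choose(info, history_b)
        total_a += PAYOFFS[(move_a, move_b)]
        total_b += PAYOFFS[(move_b, move_a)]
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
    return total_a, total_b

# PD1(Y) = play_pd_against_clone(Y, rounds, reveal_clone=True)
# PD2(Y) = play_pd_against_clone(Y, rounds, reveal_clone=False)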
RE: [agi] Breaking AIXI-tl
> There is a lot of variation in human > psychology, and some humans are pretty damn dangerous. Also there is the > maxim "power corrupts, and absolute power corrupts absolutely" which tells > you something about human psychology. A human with superintelligence and > superpowers could be a great thing or a terrible thing -- it's hard to > balance this unknown outcome against the unknown outcome of an AGI. Really, when has a computer (with the exception of certain Microsoft products) ever been able to disobey it's human masters? It's easy to get caught up in the romance of "superpowers", but come on, there's nothing to worry about. -Daniel * Daniel Colonnese Computer Science Dept. NCSU 2718 Clark Street Raleigh NC 27670 Voice: (919) 451-3141 Fax: (775) 361-4495 http://www4.ncsu.edu:8030/~dcolonn/ * -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Ben Goertzel Sent: Friday, February 14, 2003 8:46 AM To: [EMAIL PROTECTED] Subject: RE: [agi] Breaking AIXI-tl Hi Eliezer Some replies to "side points": > This is a critical class of problem for would-be implementors of > Friendliness. If all AIs, regardless of their foundations, did sort of > what humans would do, given that AI's capabilities, the whole world would > be a *lot* safer. Hmmm. I don't believe that. There is a lot of variation in human psychology, and some humans are pretty damn dangerous. Also there is the maxim "power corrupts, and absolute power corrupts absolutely" which tells you something about human psychology. A human with superintelligence and superpowers could be a great thing or a terrible thing -- it's hard to balance this unknown outcome against the unknown outcome of an AGI. > > In this way Novamentes will be more like humans, but with the > > flexibility to change their hard-wired motivators as well, if they > > REALLY want to... > > And what they do with that flexibility will be totally unlike what you > would do in that situation, Well, yeah Of course. Novamente is not a model of the human brain-mind, and its behavior will almost always be different than that of humans. Ethically speaking, I don't consider human behavior a tremendously great model anyway. Read the damn newspaper!! We are quite possibly on a path to self-destruction through rampant unethical violent behavior... > The task of AGI is not to see that the computers in front of us > "could" do > something, but to figure out what are the key differences that we must > choose among to make them actually do it. This holds for Friendliness as > well. That's why I worry when you see Friendliness in AIXI that isn't > there. AIXI "could" be Friendly, in the sense that it is capable of > simulating Friendly minds; and it's possible to toss off a loose argument > that AIXI's control process will arrive at Friendliness. But AIXI will > not end up being Friendly, no matter what the pattern of inputs and > rewards. And what I'm afraid of is that neither will Novamente. Well, first of all, there is not terribly much relation btw AIXI/AIXItl and Novamente, so what you show about the former system means very little about the latter. As for the Friendliness of AIXI/AIXItl, it is obvious that an AIXI/AIXItl system will never have a deepest-level implicit or explicit supergoal that is *ethical*. Its supergoal is just to maximize its reward. Period. So it can act beneficially to humans for an arbitrarily long period of time, if its reward structure has been set up that way. 
By positing an AIXI/AIXItl system that is connected with a specific reward mechanism (e.g. a button pushed by humans, an electronic sensor that is part of a robot body, etc.) you are then positing something beyond vanilla AIXI/AIXItl: you're positing an AIXI/AIXItl that is embedded in the world in some way. The notion of Friendliness does not exist on the level of pure, abstract AIXI/AIXItl, does it? It exists on the level of world-embedded AIXI/AIXItl. And once you're looking at world-embedded AIXI/AIXItl, you no longer have a purely formal characterization of AIXI/AIXItl, do you? Ben --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED] --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
Hi Eliezer Some replies to "side points": > This is a critical class of problem for would-be implementors of > Friendliness. If all AIs, regardless of their foundations, did sort of > what humans would do, given that AI's capabilities, the whole world would > be a *lot* safer. Hmmm. I don't believe that. There is a lot of variation in human psychology, and some humans are pretty damn dangerous. Also there is the maxim "power corrupts, and absolute power corrupts absolutely" which tells you something about human psychology. A human with superintelligence and superpowers could be a great thing or a terrible thing -- it's hard to balance this unknown outcome against the unknown outcome of an AGI. > > In this way Novamentes will be more like humans, but with the > > flexibility to change their hard-wired motivators as well, if they > > REALLY want to... > > And what they do with that flexibility will be totally unlike what you > would do in that situation, Well, yeah Of course. Novamente is not a model of the human brain-mind, and its behavior will almost always be different than that of humans. Ethically speaking, I don't consider human behavior a tremendously great model anyway. Read the damn newspaper!! We are quite possibly on a path to self-destruction through rampant unethical violent behavior... > The task of AGI is not to see that the computers in front of us > "could" do > something, but to figure out what are the key differences that we must > choose among to make them actually do it. This holds for Friendliness as > well. That's why I worry when you see Friendliness in AIXI that isn't > there. AIXI "could" be Friendly, in the sense that it is capable of > simulating Friendly minds; and it's possible to toss off a loose argument > that AIXI's control process will arrive at Friendliness. But AIXI will > not end up being Friendly, no matter what the pattern of inputs and > rewards. And what I'm afraid of is that neither will Novamente. Well, first of all, there is not terribly much relation btw AIXI/AIXItl and Novamente, so what you show about the former system means very little about the latter. As for the Friendliness of AIXI/AIXItl, it is obvious that an AIXI/AIXItl system will never have a deepest-level implicit or explicit supergoal that is *ethical*. Its supergoal is just to maximize its reward. Period. So it can act beneficially to humans for an arbitrarily long period of time, if its reward structure has been set up that way. By positing an AIXI/AIXItl system that is connected with a specific reward mechanism (e.g. a button pushed by humans, an electronic sensor that is part of a robot body, etc.) you are then positing something beyond vanilla AIXI/AIXItl: you're positing an AIXI/AIXItl that is embedded in the world in some way. The notion of Friendliness does not exist on the level of pure, abstract AIXI/AIXItl, does it? It exists on the level of world-embedded AIXI/AIXItl. And once you're looking at world-embedded AIXI/AIXItl, you no longer have a purely formal characterization of AIXI/AIXItl, do you? Ben --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Re: [agi] Breaking AIXI-tl
On Fri, 14 Feb 2003, Eliezer S. Yudkowsky wrote: > Ben Goertzel wrote: > . . . > >> Lee Corbin can work out his entire policy in step (2), before step > >> (3) occurs, knowing that his synchronized other self - whichever one > >> he is - is doing the same. > > > > OK -- now, if AIXItl were starting out with the right program, it could > > do this too, because the program could reason "that other AIXItl is > > gonna do the same thing as me, so based on this knowledge, what should > > I do" > > It *could* do this but it *doesn't* do this. Its control process is such > that it follows an iterative trajectory through chaos which is forbidden > to arrive at a truthful solution, though it may converge to a stable > attractor. > . . . This is the heart of the fallacy. Neither a human nor an AIXI can know "that his synchronized other self - whichever one he is - is doing the same". All a human or an AIXI can know is its observations. They can estimate but not know the intentions of other minds. Bill --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
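The reasoning quoted above ("that other AIXItl is gonna do the same thing as me, so based on this knowledge, what should I do") is easy to write down as a decision rule once an agent is granted the premise that its counterpart's move equals its own; whether a given architecture can reach that premise is exactly what is in dispute. A minimal sketch with invented payoff numbers:

# Hofstadter-style "superrational" choice against an exact copy of oneself:
# given the premise that the copy's move is whatever I choose, only the two
# symmetric outcomes are reachable, so pick the better symmetric payoff.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def superrational_move():
    # My choice *is* the clone's choice, so compare (C, C) against (D, D).
    return "C" if PAYOFFS[("C", "C")] > PAYOFFS[("D", "D")] else "D"

def causal_best_response(their_move):
    # The contrasting rule: treat the opponent's move as fixed and independent.
    # With these payoffs, defection dominates either way.
    return max("CD", key=lambda mine: PAYOFFS[(mine, their_move)])

print(superrational_move())                                   # C
print(causal_best_response("C"), causal_best_response("D"))   # D D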
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: > >> Because AIXI-tl is not an entity deliberately allocating computing >> power; its control process is fixed. AIXI-tl will model a process >> that proves theorems about AIXI-tl only if that process is the best >> predictor of the environmental information seen so far. > > Well... a human's control process is fixed too, in a way. We cannot > rewire our brains, our biological motivators. And a human will > accurately model other humans only if its fixed motivators have > (directly or indirectly) led it to do so... I think you're anthropomorphizing AIXI. (I think you're anthropomorphizing Novamente too, btw, but I have complete access to AIXI's formalism, so only there can I actually show your intuitions to be wrong.) You said that AIXI-tl *could in theory* model something. AIXI-tl *would not in fact* model that thing, given its control process. While humans *would in fact* model that thing, given theirs. I am not arguing about fixed versus unfixed control processes but pointing out that the specific human control process is superior to AIXI-tl. (For those of you on the list who are not aware that I am not an AI skeptic, this is not a Penrosian argument against the computational implementation of intelligence, it's an argument against the AIXI-tl Cartesian formalism for intelligence.) You are "anthropomorphizing" AIXI in the sense that you expect AIXI to do what you would do given AIXI's raw capabilities, but it's possible to look at AIXI's control process and see that it does not, in fact, do that. This is a critical class of problem for would-be implementors of Friendliness. If all AIs, regardless of their foundations, did sort of what humans would do, given that AI's capabilities, the whole world would be a *lot* safer. > Of course, humans are very different from AIXI-tl, because in humans > there is a gradation from totally hard-wired to totally > ephemeral/flexible, whereas in AIXI-tl there's a rigid dichotomy > between the hard-wired control program and the ephemeral operating > program. > > In this way Novamentes will be more like humans, but with the > flexibility to change their hard-wired motivators as well, if they > REALLY want to... And what they do with that flexibility will be totally unlike what you would do in that situation, unless you understand the sensitive dependencies between a mind's foundations and how that mind behaves. You expect AIXI to behave like Novamente, and you expect both to behave like a human mind. You are mistaken with respect to both AIXI and Novamente, but I can only demonstrate it for AIXI. (Please don't reply with a list of differences you perceive between AIXI/Novamente/humans; I know you perceive *some* differences.) >> Lee Corbin can work out his entire policy in step (2), before step >> (3) occurs, knowing that his synchronized other self - whichever one >> he is - is doing the same. > > OK -- now, if AIXItl were starting out with the right program, it could > do this too, because the program could reason "that other AIXItl is > gonna do the same thing as me, so based on this knowledge, what should > I do" It *could* do this but it *doesn't* do this. Its control process is such that it follows an iterative trajectory through chaos which is forbidden to arrive at a truthful solution, though it may converge to a stable attractor. > But you seem to be assuming that > > a) the Lee Corbin starts out with a head full of knowledge achieved > through experience That is correct. 
AIXI-tl is supposed to equal or surpass *any* specific tl-bounded program given enough time. I could give Lee Corbin a computer implant. I could put AIXI-tl up against a tl-bounded superintelligence. AIXI-tl is still supposed to win. You are applying anthropomorphic reasoning ("a head full of knowledge achieved through experience") to a formally specified problem. > b) the AIXItl starts out without a reasonable operating program, and > has to learn everything from scratch during the experiment That is not formally a problem if the experiment lasts long enough. Also, please note that being armed with the capability to simulate 2^l programs tl-bounded to THE SIZE OF AN ENTIRE HUMAN MIND is, anthropomorphically speaking, one HELL of a capability. This capability is supposed to equal or overwhelm Corbin's regardless of what "knowledge is stuffed into his head". > What if you used, for the competition a Lee Corbin with a tabula rasa > brain, an infant Lee Corbin. It wouldn't perform very well, as it > wouldn't even understand the competition. Again: anthropomorphic reasoning about a formally specifiable problem. Lee Corbin is tl-bounded therefore the contest is fair. If the contest goes on long enough AIXI-tl should win or, at worst, lose by a bounded amount. > Of course, if you put a knowledgeable human up against a new baby > AIXI-tl, the knowledgeable human can win an intelligence contest. You > don't ne
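For readers trying to picture "the capability to simulate 2^l programs tl-bounded": the following is a drastic caricature of that brute-force layer only, not Hutter's actual best-vote construction (which adds proof checking and a more careful scoring and crediting rule). The enumeration, the scoring hook, and run_limited are all invented for this sketch.

from itertools import product

def all_programs(l_bits):
    # Stand-in for "every program of length <= l": here, just every bit string.
    for n in range(1, l_bits + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def choose_action(history, l_bits, t_steps, run_limited, score):
    # run_limited(program, history, t_steps): the action the candidate program
    # recommends, or None if it fails to halt within t_steps (assumed interface).
    # score(program, history): how well this candidate has done so far (assumed).
    best_action, best_score = None, float("-inf")
    for program in all_programs(l_bits):
        action = run_limited(program, history, t_steps)
        if action is None:
            continue
        s = score(program, history)
        if s > best_score:
            best_action, best_score = action, s
    return best_action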
RE: [agi] Breaking AIXI-tl
Eliezer, I will print your message and read it more slowly tomorrow morning when my brain is better rested. But I can't resist some replies now, albeit on 4 hours of sleep ;) > Because AIXI-tl is not an entity deliberately allocating computing power; > its control process is fixed. AIXI-tl will model a process that proves > theorems about AIXI-tl only if that process is the best predictor of the > environmental information seen so far. Well... a human's control process is fixed too, in a way. We cannot rewire our brains, our biological motivators. And a human will accurately model other humans only if its fixed motivators have (directly or indirectly) led it to do so... Of course, humans are very different from AIXI-tl, because in humans there is a gradation from totally hard-wired to totally ephemeral/flexible, whereas in AIXI-tl there's a rigid dichotomy between the hard-wired control program and the ephemeral operating program. In this way Novamentes will be more like humans, but with the flexibility to change their hard-wired motivators as well, if they REALLY want to... [snipped out description of problem scenario] > Lee Corbin can work out his entire policy in step (2), before step (3) > occurs, knowing that his synchronized other self - whichever one he is - > is doing the same. OK -- now, if AIXItl were starting out with the right program, it could do this too, because the program could reason "that other AIXItl is gonna do the same thing as me, so based on this knowledge, what should I do" But you seem to be assuming that a) the Lee Corbin starts out with a head full of knowledge achieved through experience b) the AIXItl starts out without a reasonable operating program, and has to learn everything from scratch during the experiment What if you used, for the competition a Lee Corbin with a tabula rasa brain, an infant Lee Corbin. It wouldn't perform very well, as it wouldn't even understand the competition. Of course, if you put a knowledgeable human up against a new baby AIXI-tl, the knowledgeable human can win an intelligence contest. You don't need the Prisoner's Dilemma to prove this. Just ask them both what 2+2 equals. The baby AIXI-tl will have no way to know. Now, if you give the AIXI-tl enough time and experience to learn about Prisoners Dilemma situations -- or, to learn about selves and minds and computer systems -- then it will evolve an operating program that knows how to reason somewhat like a human does, with concepts like "that other AIXI-tl is just like me, so it will think and act like I do." > The major point is as follows: AIXI-tl is unable to arrive at a valid > predictive model of reality because the sequence of inputs it sees, on > successive rounds, are being produced by AIXI-tl trying to model the > inputs using tl-bounded programs, while in fact those inputs are really > the outputs of the non-tl-bounded AIXI-tl. If a tl-bounded program > correctly predicts the inputs seen so far, it will be using some > inaccurate model of the actual reality, since no tl-bounded program can > model the actual computational process AIXI-tl uses to select outputs. Yah, but Lee Corbin can't model (in perfect detail) the actual computational process the other Lee Corbin uses to select outputs, either. So what? 
> Humans can use a naturalistic representation of a reality in which they > are embedded, rather than being forced like AIXI-tl to reason about a > separated environment; consequently humans are capable of rationally > reasoning about correlations between their internal mental processes and > other parts of reality, which is the key to the complex cooperation > problem with your own clone - the realization that you can actually > *decide* your clone's actions in step (2), if you make the right > agreements with yourself and keep them. I don't see why an AIXI-tl with a clever operating program coming into the competition couldn't make the same realization that a human does. So your argument is that a human baby mind exposed ONLY to prisoners' dilemma interactions as its environment would somehow learn to "realize it can decide its clone's actions", whereas a baby AIXI-tl mind exposed only to these interactions cannot carry out this learning? > (b) This happens because of a hidden assumption built into the > formalism, > wherein AIXI devises a Cartesian model of a separated environmental > theatre, rather than devising a model of a naturalistic reality that > includes AIXI. It seems to me this has to do with the nature of AIXI-tl's operating program. With the right operating program, AIXI-tl would model reality in a way that included AIXI-tl. It would do so, only if this operating program were useful to it For example, if you wrapped up AIXI-tl in a body with skin and actuators and sensors, it would find that modeling the world as containing AIXI-tl was a very useful strategy. Just as baby humans find that modeling the world as containing ba
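The Cartesian-versus-embedded contrast being argued here can be made concrete with two toy interaction loops (entirely invented, and far cruder than either AIXI's formalism or Novamente): in the first, the agent's internal state sits outside the environment by construction; in the second, the agent's memory is just another piece of world state that events in the world can read, copy, or overwrite.

def cartesian_loop(agent_step, env_step, steps):
    # The environment only ever sees the agent's *action*, never its state,
    # and the environment update cannot touch the agent's state.
    agent_state, env_state = {}, {"obs": None}
    for _ in range(steps):
        action, agent_state = agent_step(agent_state, env_state["obs"])
        env_state = env_step(env_state, action)
    return agent_state, env_state

def embedded_loop(agent_step, env_step, steps):
    # The agent's memory lives inside the world, so cloning it, damaging it,
    # or rewiring its reward channel are all just world events like any other.
    world = {"agent_memory": {}, "obs": None}
    for _ in range(steps):
        action, world["agent_memory"] = agent_step(world["agent_memory"], world["obs"])
        world = env_step(world, action)
    return world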
Re: [agi] Breaking AIXI-tl
Ben Goertzel wrote: > Eliezer, > >> A (selfish) human upload can engage in complex cooperative strategies >> with an exact (selfish) clone, and this ability is not accessible to >> AIXI-tl, since AIXI-tl itself is not tl-bounded and therefore cannot >> be simulated by AIXI-tl, nor does AIXI-tl have any means of >> abstractly representing the concept "a copy of myself". Similarly, >> AIXI is not computable and therefore cannot be simulated by AIXI. >> Thus both AIXI and AIXI-tl break down in dealing with a physical >> environment that contains one or more copies of them. You might say >> that AIXI and AIXI-tl can both do anything except recognize >> themselves in a mirror. > > I disagree with the bit about 'nor does AIXI-tl have any means of > abstractly representing the concept "a copy of myself".' > > It seems to me that AIXI-tl is capable of running programs that contain > such an abstract representation. Why not? If the parameters are > right, it can run programs vastly more complex than a human brain > upload... > > For example, an AIXI-tl can run a program that contains the AIXI-tl > algorithm, as described in Hutter's paper, with t and l left as free > variables. This program can then carry out reasoning using predicate > logic, about AIXI-tl in general, and about AIXI-tl for various values > of t and l. > > Similarly, AIXI can run a program that contains a mathematical > description of AIXI similar to the one in Hutter's paper. This program > can then prove theorems about AIXI using predicate logic. > > For instance, if AIXI were rewarded for proving math theorems about > AGI, eventually it would presumably learn to prove theorems about AIXI, > extending Hutter's theorems and so forth. Yes, AIXI can indeed prove theorems about AIXI better than any human. AIXI-tl can prove theorems about AIXI-tl better than any tl-bounded human. AIXI-tl can model AIXI-tl as well as any tl-bounded human. AIXI-tl can model a tl-bounded human, say Lee Corbin, better than any tl-bounded human; given deterministic physics it's possible AIXI-tl can model Lee Corbin better than Lee Corbin (although I'm not quite as sure of this). But AIXI-tl can't model an AIXI-tl in the same way that a Corbin-tl can model a Corbin-tl. See below. >> The simplest case is the one-shot Prisoner's Dilemna against your own >> exact clone. It's pretty easy to formalize this challenge as a >> computation that accepts either a human upload or an AIXI-tl. This >> obviously breaks the AIXI-tl formalism. Does it break AIXI-tl? This >> question is more complex than you might think. For simple problems, >> there's a nonobvious way for AIXI-tl to stumble onto incorrect >> hypotheses which imply cooperative strategies, such that these >> hypotheses are stable under the further evidence then received. I >> would expect there to be classes of complex cooperative problems in >> which the chaotic attractor AIXI-tl converges to is suboptimal, but I >> have not proved it. It is definitely true that the physical problem >> breaks the AIXI formalism and that a human upload can >> straightforwardly converge to optimal cooperative strategies based on >> a model of reality which is more correct than any AIXI-tl is capable >> of achieving. >> >> Ultimately AIXI's decision process breaks down in our physical >> universe because AIXI models an environmental reality with which it >> interacts, instead of modeling a naturalistic reality within which it >> is embedded. It's one of two major formal differences between AIXI's >> foundations and Novamente's. 
Unfortunately there is a third >> foundational difference between AIXI and a Friendly AI. > > I don't agree at all. > > In a Prisoner's Dilemma between two AIXI-tl's, why can't each one run a > program that: > > * uses an abstract mathematical representation of AIXI-tl, similar to > the one given in the Hutter paper * use predicate logic to prove > theorems about the behavior of the other AIXI-tl Because AIXI-tl is not an entity deliberately allocating computing power; its control process is fixed. AIXI-tl will model a process that proves theorems about AIXI-tl only if that process is the best predictor of the environmental information seen so far. Let's say the primary AIXI-tl, the one whose performance we're tracking, is facing a complex cooperative problem. Within each round, the challenge protocol is as follows. 1) The Primary testee is cloned - that is, the two testees are resynchronized at the start of each new round. This is why Lee Corbin is the human upload (i.e., to avoid moral issues). We will assume that the Secondary testee, if a human upload, continues to attempt to maximize rational reward despite impending doom; again, this is why we're using Lee Corbin. 2) Each party, the Primary and the Secondary (the Secondary being re-cloned on each round) are shown an identical map of the next cooperative complex problem. For example, this might be a set of billiards
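A sketch of the challenge protocol as laid out in steps (1)-(2) above, with invented names: deepcopy stands in for "resynchronize the Secondary to the Primary at the start of each round", the problem generator and payoff rule are left abstract, and the agents are assumed to expose act and observe methods.

import copy

def run_clone_challenge(primary, make_problem, payoff, rounds):
    # Each round: re-clone the Secondary from the Primary, show both the same
    # cooperative problem, collect both moves, and credit the Primary's total.
    total = 0
    for _ in range(rounds):
        secondary = copy.deepcopy(primary)        # step (1): resynchronize
        problem = make_problem()                  # step (2): identical problem map
        move_p = primary.act(problem)
        move_s = secondary.act(problem)
        total += payoff(problem, move_p, move_s)
        primary.observe(problem, move_p, move_s)  # only the Primary persists
    return total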
Re: [agi] Breaking AIXI-tl
Hi Eliezer, > An intuitively fair, physically realizable challenge, with important > real-world analogues, formalizable as a computation which can be fed > either a tl-bounded uploaded human or an AIXI-tl, for which the human > enjoys greater success measured strictly by total reward over time, due to > the superior strategy employed by that human as the result of rational > reasoning of a type not accessible to AIXI-tl. > > Roughly speaking: > > A (selfish) human upload can engage in complex cooperative strategies with > an exact (selfish) clone, and this ability is not accessible to AIXI-tl, > since AIXI-tl itself is not tl-bounded and therefore cannot be simulated > by AIXI-tl, nor does AIXI-tl have any means of abstractly representing the > concept "a copy of myself". Similarly, AIXI is not computable and > therefore cannot be simulated by AIXI. Thus both AIXI and AIXI-tl break > down in dealing with a physical environment that contains one or more > copies of them. You might say that AIXI and AIXI-tl can both do anything > except recognize themselves in a mirror. Why do you require an AIXI or AIXI-tl to simulate itself, when humans cannot? A human cannot know that another human is an exact clone of itself. All humans or AIXIs can know is what they observe. They cannot know that another mind is identical. > The simplest case is the one-shot Prisoner's Dilemna against your own > exact clone. It's pretty easy to formalize this challenge as a > computation that accepts either a human upload or an AIXI-tl. This > obviously breaks the AIXI-tl formalism. Does it break AIXI-tl? This > question is more complex than you might think. For simple problems, > there's a nonobvious way for AIXI-tl to stumble onto incorrect hypotheses > which imply cooperative strategies, such that these hypotheses are stable > under the further evidence then received. I would expect there to be > classes of complex cooperative problems in which the chaotic attractor > AIXI-tl converges to is suboptimal, but I have not proved it. It is > definitely true that the physical problem breaks the AIXI formalism and > that a human upload can straightforwardly converge to optimal cooperative > strategies based on a model of reality which is more correct than any > AIXI-tl is capable of achieving. Given that humans can only know what they observe, and thus cannot know what is going on inside another mind, humans are on the same footing as AIXIs in Prisoner's Dilemma. I suspect that two AIXIs or AIXI-tl's will do well at the game, since a strategy with betrayal probably needs a longer program than a strategy without betrayal, and the AIXI will weight more strongly a model of the other's behavior with a shorter program. > Ultimately AIXI's decision process breaks down in our physical universe > because AIXI models an environmental reality with which it interacts, > instead of modeling a naturalistic reality within which it is embedded. > It's one of two major formal differences between AIXI's foundations and > Novamente's. Unfortunately there is a third foundational difference > between AIXI and a Friendly AI. I will grant you one thing: that since an AIXI cannot exist and an AIXI-tl is too slow to be practical, using them as a basis for discussing safe AGIs is a bit futile. The other problem is that an AIXI's optimality is only as valid as its assumption about the probability distribution of universal Turing machine programs. Cheers, Bill -- Bill Hibbard, SSEC, 1225 W.
Dayton St., Madison, WI 53706 [EMAIL PROTECTED] 608-263-4427 fax: 608-263-6738 http://www.ssec.wisc.edu/~billh/vis.html --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
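Bill's closing caveat about "the probability distribution of universal Turing machine programs" refers to the Solomonoff-style prior underlying AIXI; roughly, and glossing over the usual technicalities about semimeasures and the choice of reference machine U,

\[ M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}, \]

so a hypothesis whose shortest program is k bits longer than a rival's starts out penalized by a factor of about 2^{-k}. That is the sense in which, as suggested above, a model of the opponent that cooperates throughout would be favored over a cooperate-then-betray model if it really is coded by a shorter program; whether it is shorter is a fact about the coding, not something the formalism guarantees.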
Re: [agi] Breaking AIXI-tl
Eliezer S. Yudkowsky wrote: Has the problem been thought up just in the sense of "What happens when two AIXIs meet?" or in the formalizable sense of "Here's a computational challenge C on which a tl-bounded human upload outperforms AIXI-tl?" I don't know of anybody else considering "human upload" vs. AIXI. Cheers Shane --- To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
RE: [agi] Breaking AIXI-tl
Eliezer, > A (selfish) human upload can engage in complex cooperative > strategies with > an exact (selfish) clone, and this ability is not accessible to AIXI-tl, > since AIXI-tl itself is not tl-bounded and therefore cannot be simulated > by AIXI-tl, nor does AIXI-tl have any means of abstractly > representing the > concept "a copy of myself". Similarly, AIXI is not computable and > therefore cannot be simulated by AIXI. Thus both AIXI and AIXI-tl break > down in dealing with a physical environment that contains one or more > copies of them. You might say that AIXI and AIXI-tl can both do anything > except recognize themselves in a mirror. I disagree with the bit about 'nor does AIXI-tl have any means of abstractly representing the concept "a copy of myself".' It seems to me that AIXI-tl is capable of running programs that contain such an abstract representation. Why not? If the parameters are right, it can run programs vastly more complex than a human brain upload... For example, an AIXI-tl can run a program that contains the AIXI-tl algorithm, as described in Hutter's paper, with t and l left as free variables. This program can then carry out reasoning using predicate logic, about AIXI-tl in general, and about AIXI-tl for various values of t and l. Similarly, AIXI can run a program that contains a mathematical description of AIXI similar to the one in Hutter's paper. This program can then prove theorems about AIXI using predicate logic. For instance, if AIXI were rewarded for proving math theorems about AGI, eventually it would presumably learn to prove theorems about AIXI, extending Hutter's theorems and so forth. > The simplest case is the one-shot Prisoner's Dilemna against your own > exact clone. It's pretty easy to formalize this challenge as a > computation that accepts either a human upload or an AIXI-tl. This > obviously breaks the AIXI-tl formalism. Does it break AIXI-tl? This > question is more complex than you might think. For simple problems, > there's a nonobvious way for AIXI-tl to stumble onto incorrect hypotheses > which imply cooperative strategies, such that these hypotheses are stable > under the further evidence then received. I would expect there to be > classes of complex cooperative problems in which the chaotic attractor > AIXI-tl converges to is suboptimal, but I have not proved it. It is > definitely true that the physical problem breaks the AIXI formalism and > that a human upload can straightforwardly converge to optimal cooperative > strategies based on a model of reality which is more correct than any > AIXI-tl is capable of achieving. > > Ultimately AIXI's decision process breaks down in our physical universe > because AIXI models an environmental reality with which it interacts, > instead of modeling a naturalistic reality within which it is embedded. > It's one of two major formal differences between AIXI's foundations and > Novamente's. Unfortunately there is a third foundational difference > between AIXI and a Friendly AI. I don't agree at all. In a Prisoner's Dilemma between two AIXI-tl's, why can't each one run a program that: * uses an abstract mathematical representation of AIXI-tl, similar to the one given in the Hutter paper * use predicate logic to prove theorems about the behavior of the other AIXI-tl How is this so different than what two humans do when reasoning about each others' behavior? 
A given human cannot contain within itself a detailed model of its own clone; in practice, when a human reasons about the behavior of its clone, it is going to use some abstract representation of that clone, and do some precise or uncertain reasoning based on this abstract representation.

-- Ben G
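A minimal Python sketch of the point Ben is making here: a bounded program can carry a finite, abstract description of the AIXI-tl algorithm, with t and l left as free variables, and draw conclusions about "another agent with my description" without simulating it. The class, the field names, and the same_action check are illustrative assumptions, not anything from Hutter's paper; the only substantive fact relied on is that AIXI-tl is an explicit algorithm parameterized by its time and length bounds, so two instances with identical parameters and identical histories compute the same action (given fixed tie-breaking in the argmax).

from dataclasses import dataclass

@dataclass(frozen=True)
class AIXItlDescription:
    """Finite, symbolic stand-in for 'the AIXI-tl algorithm with time bound t
    and length bound l' -- a description to reason about, not an instance
    to simulate."""
    t: str = "t"  # free variable: per-cycle time bound
    l: str = "l"  # free variable: program/proof length bound

def same_action(a: AIXItlDescription, b: AIXItlDescription,
                same_history: bool) -> bool:
    # Inference over descriptions only: two AIXI-tl instances with identical
    # parameters and identical interaction histories compute the same action
    # (assuming fixed tie-breaking). Nothing here runs either instance.
    return a == b and same_history

me, my_clone = AIXItlDescription(), AIXItlDescription()
print(same_action(me, my_clone, same_history=True))  # True, derived abstractly

Whether AIXI-tl's own expectimax over tl-bounded programs ever comes to rely on such a program is, of course, exactly what Ben and Eliezer are disputing; the sketch only shows that the representation itself fits inside a bounded program.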
Re: [agi] Breaking AIXI-tl
Shane Legg wrote:
> Eliezer,
>
> Yes, this is a clever argument. This problem with AIXI has been thought
> up before but only appears, at least as far as I know, in material that
> is currently unpublished. I don't know if anybody has analysed the
> problem in detail as yet... but it certainly is a very interesting
> question to think about: What happens when two super intelligent AIXIs
> meet?

"SI-AIXI" is redundant; all AIXIs are enormously far beyond superintelligent.

As for the problem, the obvious answer is that no matter what strange things happen, an AIXI^2 which performs Solomonoff^2 induction, using the universal prior of strings output by first-order Oracle machines, will come up with the best possible strategy for handling it...

Has the problem been thought up just in the sense of "What happens when two AIXIs meet?" or in the formalizable sense of "Here's a computational challenge C on which a tl-bounded human upload outperforms AIXI-tl?"

--
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
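For readers without Hutter's paper at hand, the construction being lifted "one level up" here is the Solomonoff mixture that AIXI uses as its environment model. In its standard sequence-prediction form (the oracle-machine variant in the last sentence below is only a restatement of Eliezer's remark, not a published formalism):

    M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

where U is a universal monotone Turing machine, \ell(p) is the length of program p, and the sum runs over (minimal) programs whose output begins with x. AIXI chooses actions to maximize expected future reward with this incomputable mixture standing in for the true environment, which is why AIXI cannot appear in its own hypothesis class. The hypothetical "Solomonoff^2" inductor swaps U for a universal machine equipped with a halting oracle, enlarging the hypothesis class by one level.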
Re: [agi] Breaking AIXI-tl
Eliezer,

Yes, this is a clever argument. This problem with AIXI has been thought up before but only appears, at least as far as I know, in material that is currently unpublished. I don't know if anybody has analysed the problem in detail as yet... but it certainly is a very interesting question to think about: What happens when two super intelligent AIXIs meet?

I'll have to think about this for a while before I reply.

Also, you mentioned that it was in your opinion trivial to see that an AIXI-type system would turn into an unfriendly AI. I'm still interested to see this argument spelled out, especially if you think it's a relatively simple argument.

Cheers
Shane

Eliezer S. Yudkowsky wrote:
> Okay, let's see, I promised:
>
> An intuitively fair, physically realizable challenge, with important
> real-world analogues, formalizable as a computation which can be fed
> either a tl-bounded uploaded human or an AIXI-tl, for which the human
> enjoys greater success measured strictly by total reward over time, due
> to the superior strategy employed by that human as the result of
> rational reasoning of a type not accessible to AIXI-tl.
>
> Roughly speaking:
>
> A (selfish) human upload can engage in complex cooperative strategies
> with an exact (selfish) clone, and this ability is not accessible to
> AIXI-tl, since AIXI-tl itself is not tl-bounded and therefore cannot be
> simulated by AIXI-tl, nor does AIXI-tl have any means of abstractly
> representing the concept "a copy of myself". Similarly, AIXI is not
> computable and therefore cannot be simulated by AIXI. Thus both AIXI and
> AIXI-tl break down in dealing with a physical environment that contains
> one or more copies of them. You might say that AIXI and AIXI-tl can both
> do anything except recognize themselves in a mirror.
>
> The simplest case is the one-shot Prisoner's Dilemma against your own
> exact clone. It's pretty easy to formalize this challenge as a
> computation that accepts either a human upload or an AIXI-tl. This
> obviously breaks the AIXI-tl formalism. Does it break AIXI-tl? This
> question is more complex than you might think. For simple problems,
> there's a nonobvious way for AIXI-tl to stumble onto incorrect
> hypotheses which imply cooperative strategies, such that these
> hypotheses are stable under the further evidence then received. I would
> expect there to be classes of complex cooperative problems in which the
> chaotic attractor AIXI-tl converges to is suboptimal, but I have not
> proved it. It is definitely true that the physical problem breaks the
> AIXI formalism and that a human upload can straightforwardly converge to
> optimal cooperative strategies based on a model of reality which is more
> correct than any AIXI-tl is capable of achieving.
>
> Ultimately AIXI's decision process breaks down in our physical universe
> because AIXI models an environmental reality with which it interacts,
> instead of modeling a naturalistic reality within which it is embedded.
> It's one of two major formal differences between AIXI's foundations and
> Novamente's. Unfortunately there is a third foundational difference
> between AIXI and a Friendly AI.
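A concrete, heavily simplified rendering of the challenge as Eliezer states it -- "a computation that accepts either a human upload or an AIXI-tl" and scores it against its exact clone -- might look like the Python sketch below. The function names, payoff values, and toy agent are illustrative assumptions, not part of Eliezer's formalization; the one structural feature preserved is that a single agent program is submitted once and then played against a bit-identical copy receiving identical input.

PAYOFF = {  # (my move, clone's move) -> my reward; standard PD ordering T > R > P > S
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def clone_dilemma(agent):
    """One-shot Prisoner's Dilemma of an agent program against its exact clone."""
    prompt = "one-shot PD against your exact clone"
    move_a = agent(prompt)  # the submitted agent
    move_b = agent(prompt)  # its exact copy, given identical input
    return PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]

def self_modelling_agent(prompt):
    # Reasons abstractly: "my opponent runs my code on the same input, so it
    # will output whatever I output; (C, C) beats (D, D)."
    return "C"

print(clone_dilemma(self_modelling_agent))  # (3, 3)

The toy agent earns the cooperative payoff precisely because it models its opponent abstractly as "whatever runs my code" rather than trying to simulate it -- the ability Eliezer argues is inaccessible to AIXI-tl and Ben argues can be recovered by running a suitable program.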