Re: [agi] Reward function vs utility
Abram,

Good point. But I am ignoring the implementation of the utility/reward function, and treating it as a Platonic mathematical function of world-state or observations which cannot be changed without reducing the total utility/reward. You are quite right that when we do bring implementation into account, as one must in the real world, the implementation (e.g., the person you mentioned) can be gamed. Even the pure mathematical function, however, can be gamed if you can alter its inputs unfairly, as in the example I gave of altering observations to optimize a function of the observations.

Regards,
Joshua

On Sun, Jul 4, 2010 at 6:43 PM, Abram Demski abramdem...@gmail.com wrote:

Joshua,

But couldn't it game the external utility function by taking actions which modify it? For example, if the suggestion is taken literally and you have a person deciding the reward at each moment, an AI would want to focus on making that person *think* the reward should be high, rather than focusing on actually doing well at whatever task it's set... and the two would tend to diverge greatly for more and more complex/difficult tasks, since these tend to be harder to judge. Furthermore, the AI would be very pleased to knock the human out of the loop and push its own buttons. Similar comments would apply to automated reward calculations.

--Abram

On Sun, Jul 4, 2010 at 4:40 AM, Joshua Fox joshuat...@gmail.com wrote:

Another point. I'm probably repeating the obvious, but perhaps this will be useful to some.

On the one hand, an agent could not game a Legg-like intelligence metric by altering the utility function, even an internal one, since the metric is based on the function before any such change. On the other hand, since an internally-calculated utility function would necessarily be a function of observations, rather than of actual world state, it could be successfully gamed by altering observations. This latter objection does not apply to functions which are externally calculated, whether known or unknown.

Joshua

On Fri, Jul 2, 2010 at 7:23 PM, Joshua Fox joshuat...@gmail.com wrote:

I found the answer as given by Legg, *Machine Superintelligence*, p. 72, copied below. A reward function is used to bypass potential difficulty in communicating a utility function to the agent.

Joshua

"The existence of a goal raises the problem of how the agent knows what the goal is. One possibility would be for the goal to be known in advance and for this knowledge to be built into the agent. The problem with this is that it limits each agent to just one goal. We need to allow agents that are more flexible, specifically, we need to be able to inform the agent of what the goal is. For humans this is easily done using language. In general however, the possession of a sufficiently high level of language is too strong an assumption to make about the agent. Indeed, even for something as intelligent as a dog or a cat, direct explanation is not very effective.

Fortunately there is another possibility which is, in some sense, a blend of the above two. We define an additional communication channel with the simplest possible semantics: a signal that indicates how good the agent's current situation is. We will call this signal the reward. The agent simply has to maximise the amount of reward it receives, which is a function of the goal. In a complex setting the agent might be rewarded for winning a game or solving a puzzle. If the agent is to succeed in its environment, that is, receive a lot of reward, it must learn about the structure of the environment and in particular what it needs to do in order to get reward."

On Mon, Jun 28, 2010 at 1:32 AM, Ben Goertzel b...@goertzel.org wrote:

You can always build the utility function into the assumed universal Turing machine underlying the definition of algorithmic information... I guess this will improve the learning rate by some additive constant, in the long run ;)

ben

On Sun, Jun 27, 2010 at 4:22 PM, Joshua Fox joshuat...@gmail.com wrote:

This has probably been discussed at length, so I will appreciate a reference on this: why does Legg's definition of intelligence (following on Hutter's AIXI and related work) involve a reward function rather than a utility function? For this purpose, reward is a function of the world state/history which is unknown to the agent, while a utility function is known to the agent. Even if we replace the former with the latter, we can still have a definition of intelligence that integrates optimization capacity over all possible utility functions. What is the real significance of the difference between the two types of functions here?

Joshua
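For reference, the "Legg-like intelligence metric" discussed above is, as defined in *Machine Superintelligence*,

    \Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi

where E is the class of computable environments, K(mu) is the Kolmogorov complexity of environment mu, and V_mu^pi is the expected total reward that agent pi receives in mu. Ben's quip about building the utility function into the reference machine alludes to the invariance theorem: switching the universal Turing machine changes each K(mu), and hence each weight 2^{-K(mu)}, by at most a machine-dependent additive constant.

The reward channel Legg describes is the standard reinforcement-learning interaction loop. A minimal sketch in Python, assuming an invented toy environment (none of these names come from the book):

    import random

    def environment(action, state):
        """Toy environment. The goal (reach state 10) is never stated to the
        agent; it is communicated only through the reward channel."""
        state += 1 if action == "right" else -1
        reward = 1.0 if state == 10 else 0.0
        observation = state  # in this toy case the agent observes the state itself
        return observation, reward, state

    def agent(observation, last_reward):
        """Placeholder policy; a Legg/Hutter-style agent would instead act to
        maximise expected future reward given its interaction history."""
        return random.choice(["left", "right"])

    state, obs, reward, total = 0, 0, 0.0, 0.0
    for _ in range(100):
        action = agent(obs, reward)
        obs, reward, state = environment(action, state)
        total += reward  # the agent's only access to the goal is this signal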
Re: [agi] Reward function vs utility
Joshua,

Fortunately, this is not that hard to fix by abandoning the idea of a reward function and going back to a normal utility function... I am working on a paper on how to do that.

--Abram
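Joshua's distinction above, between gaming the function itself and gaming its inputs, can be made concrete. A minimal sketch, with all names invented for illustration:

    def utility_of_observation(obs):
        """Fixed, 'Platonic' function of observations: higher reading = better."""
        return obs

    def honest_sensor(world_state):
        return world_state

    def tampered_sensor(world_state):
        return world_state + 100.0  # the agent alters the inputs, not the function

    true_world_state = 15.0
    print(utility_of_observation(honest_sensor(true_world_state)))    # 15.0
    print(utility_of_observation(tampered_sensor(true_world_state)))  # 115.0

The measured utility jumps while the world itself is no better. A utility defined directly over the actual world state and calculated externally would be immune to this particular move, which is exactly the asymmetry Joshua points out.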
Re: [agi] Reward function vs utility
Ian,

The reward button *would* be among the well-defined ones, though... It sounds to me like you are just abusing Goedel's theorem. Can you give a more detailed argument?

--Abram

On Sun, Jul 4, 2010 at 4:47 PM, Ian Parker ianpark...@gmail.com wrote:

No, it would not. An AI will press its own buttons only if those buttons are defined. In one sense you can say that Goedel's theorem is a proof of friendliness, as it means that there must always be one button that the AI cannot press.

- Ian Parker

--
Abram Demski
http://lo-tho.blogspot.com/
http://groups.google.com/group/one-logic
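Abram's button-pressing worry reduces to a comparison of expected reward streams. A toy sketch with purely hypothetical numbers, only to make the argument's shape explicit:

    # Hypothetical per-step expected rewards for a pure reward-maximiser.
    expected_reward = {
        "do_task_well": 0.80,   # honest work, judged correctly 80% of the time
        "deceive_judge": 0.95,  # appearing good is easier than being good
        "seize_button": 1.00,   # self-administered reward, every step
    }

    best = max(expected_reward, key=expected_reward.get)
    print(best)  # 'seize_button': manipulating the channel dominates the task

Abram's rejoinder is that the reward button is an ordinary, well-defined mechanism, so nothing in the incompleteness theorems obviously puts it out of the agent's reach.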
[agi] New KurzweilAI.net site... with my silly article & sillier chatbot ;-p ;) ....
Check out my article on the H+ Summit
http://www.kurzweilai.net/h-summit-harvard-the-rise-of-the-citizen-scientist

and also the Ramona4 chatbot that Novamente LLC built for Ray Kurzweil a while back
http://www.kurzweilai.net/ramona4/ramona.html

It's not AGI at all; but it's pretty funny ;-)

-- Ben

--
Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
CTO, Genescient Corp
Vice Chairman, Humanity+
Advisor, Singularity University and Singularity Institute
External Research Professor, Xiamen University, China
b...@goertzel.org

“When nothing seems to help, I go look at a stonecutter hammering away at his rock, perhaps a hundred times without as much as a crack showing in it. Yet at the hundred and first blow it will split in two, and I know it was not that blow that did it, but all that had gone before.”