Re: [agi] Occam's Razor and its abuse
Let's try this . . . . In "Universal Algorithmic Intelligence", on page 20, Hutter uses Occam's razor in the definition of ξ. Then, at the bottom of the page, he merely claims that using ξ as an estimate for μ "may be a reasonable thing to do." That's not a proof of Occam's Razor.

= = = = = =

He also references Occam's Razor on page 33, where he says: "We believe the answer to be negative, which on the positive side would show the necessity of Occam's razor assumption, and the distinguishedness of AIXI." That's calling Occam's razor a necessary assumption and basing that upon a *belief*.

= = = = = =

Where do you believe that he proves Occam's razor?

- Original Message -
From: Matt Mahoney [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Wednesday, October 29, 2008 10:46 PM
Subject: Re: [agi] Occam's Razor and its abuse

--- On Wed, 10/29/08, Mark Waser [EMAIL PROTECTED] wrote:

Hutter *defined* the measure of correctness using simplicity as a component. Of course, they're correlated when you do such a thing. That's not a proof, that's an assumption.

Hutter defined the measure of correctness as the accumulated reward by the agent in AIXI.

-- Matt Mahoney, [EMAIL PROTECTED]
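For readers following the ξ discussion, here is a toy sketch of the kind of Occam-weighted mixture at issue: candidate models weighted by 2^(-description length), with the normalized mixture standing in for the unknown true environment. The models and their description lengths are invented for illustration; this is not Hutter's actual construction.

# Toy illustration (not Hutter's construction): an Occam-style prior
# that weights each candidate model by 2^(-description length), so
# simpler models dominate the mixture until data says otherwise.

from fractions import Fraction

# Hypothetical candidate models for a binary sequence, each given as
# (name, description_length_in_bits, probability_that_next_bit_is_1).
models = [
    ("all zeros",  3, Fraction(1, 100)),
    ("all ones",   3, Fraction(99, 100)),
    ("fair coin",  5, Fraction(1, 2)),
    ("biased 3/4", 9, Fraction(3, 4)),
]

# Occam prior: weight proportional to 2^-length, then normalize.
weights = {name: Fraction(1, 2**length) for name, length, _ in models}
total = sum(weights.values())
prior = {name: w / total for name, w in weights.items()}

# Mixture prediction for "next bit is 1" -- analogous in spirit to
# using a weighted mixture as an estimate of the true environment.
p_one = sum(prior[name] * p for name, _, p in models)
print(float(p_one))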
Re: [agi] Occam's Razor and its abuse
I think Hutter is being modest.

-- Matt Mahoney, [EMAIL PROTECTED]

--- On Fri, 10/31/08, Mark Waser [EMAIL PROTECTED] wrote:

From: Mark Waser [EMAIL PROTECTED]
Subject: Re: [agi] Occam's Razor and its abuse
To: agi@v2.listbox.com
Date: Friday, October 31, 2008, 5:41 PM

Let's try this . . . . In "Universal Algorithmic Intelligence", on page 20, Hutter uses Occam's razor in the definition of ξ. Then, at the bottom of the page, he merely claims that using ξ as an estimate for μ "may be a reasonable thing to do." That's not a proof of Occam's Razor.

= = = = = =

He also references Occam's Razor on page 33, where he says: "We believe the answer to be negative, which on the positive side would show the necessity of Occam's razor assumption, and the distinguishedness of AIXI." That's calling Occam's razor a necessary assumption and basing that upon a *belief*.

= = = = = =

Where do you believe that he proves Occam's razor?

- Original Message -
From: Matt Mahoney [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Wednesday, October 29, 2008 10:46 PM
Subject: Re: [agi] Occam's Razor and its abuse

--- On Wed, 10/29/08, Mark Waser [EMAIL PROTECTED] wrote:

Hutter *defined* the measure of correctness using simplicity as a component. Of course, they're correlated when you do such a thing. That's not a proof, that's an assumption.

Hutter defined the measure of correctness as the accumulated reward by the agent in AIXI.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Occam's Razor and its abuse
I think Hutter is being modest.

Huh? So . . . . are you going to continue claiming that Occam's Razor is proved, or are you going to stop (or are you going to point me to the proof)?

- Original Message -
From: Matt Mahoney [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Friday, October 31, 2008 5:54 PM
Subject: Re: [agi] Occam's Razor and its abuse

I think Hutter is being modest.

-- Matt Mahoney, [EMAIL PROTECTED]

--- On Fri, 10/31/08, Mark Waser [EMAIL PROTECTED] wrote:

From: Mark Waser [EMAIL PROTECTED]
Subject: Re: [agi] Occam's Razor and its abuse
To: agi@v2.listbox.com
Date: Friday, October 31, 2008, 5:41 PM

Let's try this . . . . In "Universal Algorithmic Intelligence", on page 20, Hutter uses Occam's razor in the definition of ξ. Then, at the bottom of the page, he merely claims that using ξ as an estimate for μ "may be a reasonable thing to do." That's not a proof of Occam's Razor.

= = = = = =

He also references Occam's Razor on page 33, where he says: "We believe the answer to be negative, which on the positive side would show the necessity of Occam's razor assumption, and the distinguishedness of AIXI." That's calling Occam's razor a necessary assumption and basing that upon a *belief*.

= = = = = =

Where do you believe that he proves Occam's razor?

- Original Message -
From: Matt Mahoney [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Wednesday, October 29, 2008 10:46 PM
Subject: Re: [agi] Occam's Razor and its abuse

--- On Wed, 10/29/08, Mark Waser [EMAIL PROTECTED] wrote:

Hutter *defined* the measure of correctness using simplicity as a component. Of course, they're correlated when you do such a thing. That's not a proof, that's an assumption.

Hutter defined the measure of correctness as the accumulated reward by the agent in AIXI.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Occam's Razor and its abuse
(1) Simplicity (in conclusions, hypotheses, theories, etc.) is preferred. (2) The preference for simplicity does not need a reason or justification. (3) Simplicity is preferred because it is correlated with correctness. I agree with (1), but not (2) and (3).

I concur, but would add that (4) Simplicity is preferred because it is correlated with correctness *of implementation* (or ease of implementing correctly :-)

- Original Message -
From: Pei Wang [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Tuesday, October 28, 2008 10:15 PM
Subject: Re: [agi] Occam's Razor and its abuse

Eric,

I highly respect your work, though we clearly have different opinions on what intelligence is, as well as on how to achieve it. For example, though learning and generalization play central roles in my theory about intelligence, I don't think PAC learning (or the other learning algorithms proposed so far) provides a proper conceptual framework for the typical situation of this process. Generally speaking, I'm not building some system that learns about the world in the sense that there is a correct way to describe the world waiting to be discovered, which can be captured by some algorithm. Instead, learning to me is a non-algorithmic, open-ended process by which the system summarizes its own experience and uses it to predict the future. I fully understand that most people in this field probably consider this opinion wrong, though I haven't been convinced yet by the arguments I've seen so far.

Instead of addressing all of the relevant issues, in this discussion I have a very limited goal. To rephrase what I said initially, I see that under the term "Occam's Razor", currently there are three different statements:

(1) Simplicity (in conclusions, hypotheses, theories, etc.) is preferred.
(2) The preference for simplicity does not need a reason or justification.
(3) Simplicity is preferred because it is correlated with correctness.

I agree with (1), but not (2) and (3). I know many people have different opinions, and I don't attempt to argue with them here --- these problems are too complicated to be settled by email exchanges. However, I do hope to convince people in this discussion that the three statements are not logically equivalent, and (2) and (3) are not implied by (1), so to use "Occam's Razor" to refer to all of them is not a good idea, because it is going to mix different issues. Therefore, I suggest people use "Occam's Razor" in its original and basic sense, that is, (1), and use other terms to refer to (2) and (3). Otherwise, when people talk about Occam's Razor, I just don't know what to say.

Pei

On Tue, Oct 28, 2008 at 8:09 PM, Eric Baum [EMAIL PROTECTED] wrote:

Pei Triggered by several recent discussions, I'd like to make the
Pei following position statement, though won't commit myself to long
Pei debate on it. ;-)

Pei Occam's Razor, in its original form, goes like "entities must not
Pei be multiplied beyond necessity", and it is often stated as "All
Pei other things being equal, the simplest solution is the best" or
Pei "when multiple competing theories are equal in other respects,
Pei the principle recommends selecting the theory that introduces the
Pei fewest assumptions and postulates the fewest entities" --- all
Pei from http://en.wikipedia.org/wiki/Occam's_razor

Pei I fully agree with all of the above statements.

Pei However, to me, there are two common misunderstandings associated
Pei with it in the context of AGI and philosophy of science.

Pei (1) To take this statement as self-evident or a stand-alone
Pei postulate

Pei To me, it is derived or implied by the insufficiency of
Pei resources. If a system has sufficient resources, it has no good
Pei reason to prefer a simpler theory.

With all due respect, this is mistaken. Occam's Razor, in some form, is the heart of Generalization, which is the essence (and the G) of GI. For example, if you study concept learning from examples, say in the PAC learning context (related theorems hold in some other contexts as well), there are theorems to the effect that if you find a hypothesis from a simple enough class of hypotheses, it will with very high probability accurately classify new examples chosen from the same distribution, and conversely, theorems that state (roughly speaking) that any method that chooses a hypothesis from too expressive a class of hypotheses will have a probability that can be bounded below by some reasonable number, like 1/7, of having large error in its predictions on new examples -- in other words, it is impossible to PAC-learn without respecting Occam's Razor. For discussion of the above paragraphs, I'd refer you to Chapter 4 of What is Thought? (MIT Press, 2004).

In other words, if you are building some system that learns about the world, it had better respect Occam's razor if you want whatever it learns to apply to new experience. (I use the term Occam's razor loosely; using hypotheses that are highly constrained
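The "theorems to the effect that..." Eric mentions can be made concrete with the standard Occam bound for a finite hypothesis class: a hypothesis consistent with m i.i.d. examples has true error below epsilon with probability at least 1 - delta, provided m >= (ln|H| + ln(1/delta)) / epsilon. A small sketch with illustrative numbers (the bit counts below are invented, not from the thread):

# A worked instance of the flavor of theorem Eric describes: the
# standard Occam bound for a finite hypothesis class H, where |H| is
# determined by the description length of the class in bits.

import math

def occam_sample_size(log2_H: float, epsilon: float, delta: float) -> int:
    """Examples needed so a consistent hypothesis generalizes
    to error < epsilon with probability >= 1 - delta."""
    ln_H = log2_H * math.log(2.0)
    return math.ceil((ln_H + math.log(1.0 / delta)) / epsilon)

# A "simple" class describable in 20 bits vs. a richer 200-bit class:
for bits in (20, 200):
    print(bits, "bits ->", occam_sample_size(bits, epsilon=0.05, delta=0.01))
# The simpler class generalizes from far fewer examples -- the sense in
# which respecting Occam's Razor matters for PAC-style guarantees.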
Re: [agi] Occam's Razor and its abuse
Hutter proved (3), although as a general principle it was already a well-established practice in machine learning. Also, I agree with (4) but this is not the primary reason to prefer simplicity.

Hutter *defined* the measure of correctness using simplicity as a component. Of course, they're correlated when you do such a thing. That's not a proof, that's an assumption.

Regarding (4), I was deliberately ambiguous as to whether I meant implementation of the thinking system or implementation of the thought itself.

- Original Message -
From: Matt Mahoney [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Wednesday, October 29, 2008 11:11 AM
Subject: Re: [agi] Occam's Razor and its abuse

--- On Wed, 10/29/08, Mark Waser [EMAIL PROTECTED] wrote:

(1) Simplicity (in conclusions, hypotheses, theories, etc.) is preferred. (2) The preference for simplicity does not need a reason or justification. (3) Simplicity is preferred because it is correlated with correctness. I agree with (1), but not (2) and (3).

I concur, but would add that (4) Simplicity is preferred because it is correlated with correctness *of implementation* (or ease of implementing correctly :-)

Occam said (1) but had no proof. Hutter proved (3), although as a general principle it was already a well-established practice in machine learning. Also, I agree with (4) but this is not the primary reason to prefer simplicity.

-- Matt Mahoney, [EMAIL PROTECTED]
RE: [agi] Occam's Razor and its abuse
Pei,

My understanding is that when you reason from data, you often want the ability to extrapolate, which requires some sort of assumption about the type of mathematical model to be used. How do you deal with that in NARS?

Ed Porter

-Original Message-
From: Pei Wang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2008 9:40 PM
To: agi@v2.listbox.com
Subject: Re: [agi] Occam's Razor and its abuse

Ed,

Since NARS doesn't follow the Bayesian approach, there are no initial priors to be assumed. If we use a more general term, such as "initial knowledge" or "innate beliefs", then yes, you can add them into the system, which will improve the system's performance. However, they are optional. In NARS, all object-level (i.e., not meta-level) innate beliefs can be learned by the system afterward.

Pei

On Tue, Oct 28, 2008 at 5:37 PM, Ed Porter [EMAIL PROTECTED] wrote:

It appears to me that the assumptions about initial priors used by a self-learning AGI or an evolutionary line of AGIs could be quite minimal. My understanding is that once a probability distribution starts receiving random samples from its distribution, the effect of the original prior becomes rapidly lost, unless it is a rather rare one. Such rare, problematic priors would get selected against quickly by evolution. Evolution would tend to tune for the most appropriate priors for the success of subsequent generations (either for computing in the same system, if it is capable of enough change, or in descendant systems). Probably the best priors would generally be ones that could be trained moderately rapidly by data.

So it seems an evolutionary system or line could initially learn priors without any assumptions for priors other than a random picking of priors. Over time and multiple generations it might develop hereditary priors, and perhaps even different hereditary priors for parts of its network connected to different inputs, outputs, or internal controls.

The use of priors in an AGI could be greatly improved by having a gen/comp hierarchy in which models for a given concept could be inherited from the priors of sets of models for similar concepts, and in which the set of appropriate priors could change contextually. It would also seem that the notion of a prior could be improved by blending information from episodic and probabilistic models.

It would appear that in almost any generally intelligent system, being able to approximate reality in a manner sufficient for evolutionary success, with the most efficient representations, would be a characteristic greatly preferred by evolution, because it would allow systems to better model more of their environment sufficiently well for evolutionary success with whatever current modeling capacity they have. So, although a completely accurate description of virtually anything may not find much use for Occam's Razor, as a practically useful representation it often will. It seems to me that Occam's Razor is more oriented to deriving meaningful generalizations than to exact descriptions of anything.

Furthermore, it would seem to me that a simpler set of preconditions is generally more probable than a more complex one, because it requires less coincidence. It would seem to me this would be true under most random sets of priors for the probabilities of the possible sets of components involved and Occam's Razor type selection.

These are the musings of an untrained mind, since I have not spent much time studying philosophy, because such a high percentage of it was so obviously stupid (such as what was commonly said when I was young, that you can't have intelligence without language), and my understanding of math is much less than that of many on this list. But nonetheless I think much of what I have said above is true. I think its gist is not totally dissimilar to what Abram has said.

Ed Porter

-Original Message-
From: Pei Wang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2008 3:05 PM
To: agi@v2.listbox.com
Subject: Re: [agi] Occam's Razor and its abuse

Abram,

I agree with your basic idea in the following, though I usually put it in a different form.

Pei

On Tue, Oct 28, 2008 at 2:52 PM, Abram Demski [EMAIL PROTECTED] wrote:

Ben,

You assert that Pei is forced to make an assumption about the regularity of the world to justify adaptation. Pei could also take a different argument. He could try to show that *if* a strategy exists that can be implemented given the finite resources, NARS will eventually find it. Thus, adaptation is justified on a sort of "we might as well try" basis. (The proof would involve showing that NARS searches the space of finite-state machines that can be implemented with the resources at hand, and is more likely to stay for longer periods of time in configurations that give more reward, such that NARS would eventually
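Ed's point that the original prior "becomes rapidly lost" under random samples can be illustrated with a Beta-Bernoulli model (my example, not his): two sharply different priors converge to nearly the same posterior mean as evidence accumulates.

# Sketch of the prior-washout point: two very different Beta priors
# over a coin's bias end up with nearly the same posterior mean once
# enough samples arrive. The true bias and priors are invented.

import random

random.seed(0)
true_p = 0.7
priors = {"optimistic Beta(8,2)": (8, 2), "pessimistic Beta(2,8)": (2, 8)}

heads = tails = 0
for n in range(1, 1001):
    if random.random() < true_p:
        heads += 1
    else:
        tails += 1
    if n in (10, 100, 1000):
        for name, (a, b) in priors.items():
            mean = (a + heads) / (a + b + heads + tails)  # posterior mean
            print(n, name, round(mean, 3))
# After 1000 flips both posterior means sit near 0.7: the prior's
# contribution is fixed while the evidence grows without bound.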
Re: [agi] Occam's Razor and its abuse
Ed,

When NARS extrapolates its past experience to the current and the future, it is indeed based on the assumption that its future experience will be similar to its past experience (otherwise any prediction would be equally valid); however, it does not assume the world can be captured by any specific mathematical model, such as a Turing machine or a probability distribution defined on a propositional space.

Concretely speaking, when a statement S has been tested N times, and in M times it is true, but in N-M times it is false, then NARS's expectation value for it to be true in the next testing is E(S) = (M+0.5)/(N+1) [if there is no other relevant knowledge], and the system will use this value to decide whether to accept a bet on S. However, neither the system nor its designer assumes that there is a "true probability" for S to occur, for which the above expectation is an approximation. Also, it is not assumed that E(S) will converge when the testing on S continues.

Pei

On Wed, Oct 29, 2008 at 11:33 AM, Ed Porter [EMAIL PROTECTED] wrote:

Pei,

My understanding is that when you reason from data, you often want the ability to extrapolate, which requires some sort of assumption about the type of mathematical model to be used. How do you deal with that in NARS?

Ed Porter

-Original Message-
From: Pei Wang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2008 9:40 PM
To: agi@v2.listbox.com
Subject: Re: [agi] Occam's Razor and its abuse

Ed,

Since NARS doesn't follow the Bayesian approach, there are no initial priors to be assumed. If we use a more general term, such as "initial knowledge" or "innate beliefs", then yes, you can add them into the system, which will improve the system's performance. However, they are optional. In NARS, all object-level (i.e., not meta-level) innate beliefs can be learned by the system afterward.

Pei

On Tue, Oct 28, 2008 at 5:37 PM, Ed Porter [EMAIL PROTECTED] wrote:

It appears to me that the assumptions about initial priors used by a self-learning AGI or an evolutionary line of AGIs could be quite minimal. My understanding is that once a probability distribution starts receiving random samples from its distribution, the effect of the original prior becomes rapidly lost, unless it is a rather rare one. Such rare, problematic priors would get selected against quickly by evolution. Evolution would tend to tune for the most appropriate priors for the success of subsequent generations (either for computing in the same system, if it is capable of enough change, or in descendant systems). Probably the best priors would generally be ones that could be trained moderately rapidly by data.

So it seems an evolutionary system or line could initially learn priors without any assumptions for priors other than a random picking of priors. Over time and multiple generations it might develop hereditary priors, and perhaps even different hereditary priors for parts of its network connected to different inputs, outputs, or internal controls.

The use of priors in an AGI could be greatly improved by having a gen/comp hierarchy in which models for a given concept could be inherited from the priors of sets of models for similar concepts, and in which the set of appropriate priors could change contextually. It would also seem that the notion of a prior could be improved by blending information from episodic and probabilistic models.

It would appear that in almost any generally intelligent system, being able to approximate reality in a manner sufficient for evolutionary success, with the most efficient representations, would be a characteristic greatly preferred by evolution, because it would allow systems to better model more of their environment sufficiently well for evolutionary success with whatever current modeling capacity they have. So, although a completely accurate description of virtually anything may not find much use for Occam's Razor, as a practically useful representation it often will. It seems to me that Occam's Razor is more oriented to deriving meaningful generalizations than to exact descriptions of anything.

Furthermore, it would seem to me that a simpler set of preconditions is generally more probable than a more complex one, because it requires less coincidence. It would seem to me this would be true under most random sets of priors for the probabilities of the possible sets of components involved and Occam's Razor type selection.

These are the musings of an untrained mind, since I have not spent much time studying philosophy, because such a high percentage of it was so obviously stupid (such as what was commonly said when I was young, that you can't have intelligence without language), and my understanding of math is much less than that of many on this list. But nonetheless I think much of what I have said above is true. I think its gist is not totally dissimilar to what Abram has said.
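A minimal sketch of the expectation rule Pei states above, with a hypothetical betting rule added for illustration (the betting threshold is my invention, not part of NARS):

# NARS-style expectation from the thread: after N tests of statement S
# with M successes, E(S) = (M + 0.5)/(N + 1), used to decide whether
# to accept a bet on S.

def expectation(m_positive: int, n_total: int) -> float:
    """Expectation for S, assuming no other relevant knowledge."""
    return (m_positive + 0.5) / (n_total + 1)

def accept_bet(m_positive: int, n_total: int, payout_odds: float) -> bool:
    # Hypothetical rule: accept when the bet's expected value is positive.
    e = expectation(m_positive, n_total)
    return e * payout_odds > (1 - e)

print(expectation(0, 0))   # 0.5: total ignorance, not certainty
print(expectation(9, 10))  # ~0.864: strong but revisable support
# Note E(S) need not converge as testing continues -- no "true
# probability" of S is assumed, matching Pei's description.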
Re: [agi] Occam's Razor and its abuse
But, NARS as an overall software system will perform more effectively (i.e., learn more rapidly) in some environments than in others, for a variety of reasons. There are many biases built into the NARS architecture in various ways ... it's just not obvious to spell out what they are, because the NARS system was not explicitly designed based on that sort of thinking... The same is true of every other complex AGI architecture...

ben g

On Wed, Oct 29, 2008 at 12:07 PM, Pei Wang [EMAIL PROTECTED] wrote:

Ed,

When NARS extrapolates its past experience to the current and the future, it is indeed based on the assumption that its future experience will be similar to its past experience (otherwise any prediction would be equally valid); however, it does not assume the world can be captured by any specific mathematical model, such as a Turing machine or a probability distribution defined on a propositional space.

Concretely speaking, when a statement S has been tested N times, and in M times it is true, but in N-M times it is false, then NARS's expectation value for it to be true in the next testing is E(S) = (M+0.5)/(N+1) [if there is no other relevant knowledge], and the system will use this value to decide whether to accept a bet on S. However, neither the system nor its designer assumes that there is a "true probability" for S to occur, for which the above expectation is an approximation. Also, it is not assumed that E(S) will converge when the testing on S continues.

Pei

On Wed, Oct 29, 2008 at 11:33 AM, Ed Porter [EMAIL PROTECTED] wrote:

Pei,

My understanding is that when you reason from data, you often want the ability to extrapolate, which requires some sort of assumption about the type of mathematical model to be used. How do you deal with that in NARS?

Ed Porter

-Original Message-
From: Pei Wang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2008 9:40 PM
To: agi@v2.listbox.com
Subject: Re: [agi] Occam's Razor and its abuse

Ed,

Since NARS doesn't follow the Bayesian approach, there are no initial priors to be assumed. If we use a more general term, such as "initial knowledge" or "innate beliefs", then yes, you can add them into the system, which will improve the system's performance. However, they are optional. In NARS, all object-level (i.e., not meta-level) innate beliefs can be learned by the system afterward.

Pei

On Tue, Oct 28, 2008 at 5:37 PM, Ed Porter [EMAIL PROTECTED] wrote:

It appears to me that the assumptions about initial priors used by a self-learning AGI or an evolutionary line of AGIs could be quite minimal. My understanding is that once a probability distribution starts receiving random samples from its distribution, the effect of the original prior becomes rapidly lost, unless it is a rather rare one. Such rare, problematic priors would get selected against quickly by evolution. Evolution would tend to tune for the most appropriate priors for the success of subsequent generations (either for computing in the same system, if it is capable of enough change, or in descendant systems). Probably the best priors would generally be ones that could be trained moderately rapidly by data.

So it seems an evolutionary system or line could initially learn priors without any assumptions for priors other than a random picking of priors. Over time and multiple generations it might develop hereditary priors, and perhaps even different hereditary priors for parts of its network connected to different inputs, outputs, or internal controls.

The use of priors in an AGI could be greatly improved by having a gen/comp hierarchy in which models for a given concept could be inherited from the priors of sets of models for similar concepts, and in which the set of appropriate priors could change contextually. It would also seem that the notion of a prior could be improved by blending information from episodic and probabilistic models.

It would appear that in almost any generally intelligent system, being able to approximate reality in a manner sufficient for evolutionary success, with the most efficient representations, would be a characteristic greatly preferred by evolution, because it would allow systems to better model more of their environment sufficiently well for evolutionary success with whatever current modeling capacity they have. So, although a completely accurate description of virtually anything may not find much use for Occam's Razor, as a practically useful representation it often will. It seems to me that Occam's Razor is more oriented to deriving meaningful generalizations than to exact descriptions of anything.

Furthermore, it would seem to me that a simpler set of preconditions is generally more probable than a more complex one, because it requires less coincidence. It would seem to me this would be true
Re: [agi] Occam's Razor and its abuse
Ben,

I never claimed that NARS is not based on assumptions (or call them biases), but only on truths. It surely is, and many of the assumptions are my beliefs and intuitions, which I cannot convince other people to accept very soon. However, it does not mean that all assumptions are equally acceptable, or that as soon as something is called an assumption, the author will be released from the duty of justifying it.

Going back to the original topic: since "the simplicity/complexity of a description is correlated with its prior probability" is the core assumption of certain research paradigms, it should be justified. Calling it "Occam's Razor" so as to suggest it is self-evident is not the proper way to do the job. This is all I want to argue in this discussion.

Pei

On Wed, Oct 29, 2008 at 12:10 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

But, NARS as an overall software system will perform more effectively (i.e., learn more rapidly) in some environments than in others, for a variety of reasons. There are many biases built into the NARS architecture in various ways ... it's just not obvious to spell out what they are, because the NARS system was not explicitly designed based on that sort of thinking... The same is true of every other complex AGI architecture...

ben g

On Wed, Oct 29, 2008 at 12:07 PM, Pei Wang [EMAIL PROTECTED] wrote:

Ed,

When NARS extrapolates its past experience to the current and the future, it is indeed based on the assumption that its future experience will be similar to its past experience (otherwise any prediction would be equally valid); however, it does not assume the world can be captured by any specific mathematical model, such as a Turing machine or a probability distribution defined on a propositional space.

Concretely speaking, when a statement S has been tested N times, and in M times it is true, but in N-M times it is false, then NARS's expectation value for it to be true in the next testing is E(S) = (M+0.5)/(N+1) [if there is no other relevant knowledge], and the system will use this value to decide whether to accept a bet on S. However, neither the system nor its designer assumes that there is a "true probability" for S to occur, for which the above expectation is an approximation. Also, it is not assumed that E(S) will converge when the testing on S continues.

Pei

On Wed, Oct 29, 2008 at 11:33 AM, Ed Porter [EMAIL PROTECTED] wrote:

Pei,

My understanding is that when you reason from data, you often want the ability to extrapolate, which requires some sort of assumption about the type of mathematical model to be used. How do you deal with that in NARS?

Ed Porter

-Original Message-
From: Pei Wang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2008 9:40 PM
To: agi@v2.listbox.com
Subject: Re: [agi] Occam's Razor and its abuse

Ed,

Since NARS doesn't follow the Bayesian approach, there are no initial priors to be assumed. If we use a more general term, such as "initial knowledge" or "innate beliefs", then yes, you can add them into the system, which will improve the system's performance. However, they are optional. In NARS, all object-level (i.e., not meta-level) innate beliefs can be learned by the system afterward.

Pei

On Tue, Oct 28, 2008 at 5:37 PM, Ed Porter [EMAIL PROTECTED] wrote:

It appears to me that the assumptions about initial priors used by a self-learning AGI or an evolutionary line of AGIs could be quite minimal.

My understanding is that once a probability distribution starts receiving random samples from its distribution, the effect of the original prior becomes rapidly lost, unless it is a rather rare one. Such rare, problematic priors would get selected against quickly by evolution. Evolution would tend to tune for the most appropriate priors for the success of subsequent generations (either for computing in the same system, if it is capable of enough change, or in descendant systems). Probably the best priors would generally be ones that could be trained moderately rapidly by data.

So it seems an evolutionary system or line could initially learn priors without any assumptions for priors other than a random picking of priors. Over time and multiple generations it might develop hereditary priors, and perhaps even different hereditary priors for parts of its network connected to different inputs, outputs, or internal controls.

The use of priors in an AGI could be greatly improved by having a gen/comp hierarchy in which models for a given concept could be inherited from the priors of sets of models for similar concepts, and in which the set of appropriate priors could change contextually. It would also seem that the notion of a prior could be improved by blending information from episodic and probabilistic models.

It would appear that in almost any generally intelligent system, being able to approximate reality in a manner
Re: [agi] Occam's Razor and its abuse
However, it does not mean that all assumptions are equally acceptable, or that as soon as something is called an assumption, the author will be released from the duty of justifying it.

Hume argued that at the basis of any approach to induction, there will necessarily lie some assumption that is *not* inductively justified, but must in essence be taken on faith or as an unjustified assumption. He claimed that humans make certain unjustified assumptions of this nature automatically, due to human nature.

This is an argument that not all assumptions can be expected to be justified ...

Comments?

ben g
Re: [agi] Occam's Razor and its abuse
Ben,

It goes back to what kind of justification we are talking about. To prove it is a strong version; to show supporting evidence is a weak version.

Hume pointed out that induction cannot be justified in the sense that there is no way to guarantee that all inductive conclusions will be confirmed. I don't think Hume can be cited to support the assumption that complexity is correlated with probability, or that this assumption does not need justification, just because inductive conclusions can be wrong. There are many more reasons to accept induction than to accept the above assumption.

Pei

On Wed, Oct 29, 2008 at 12:31 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

However, it does not mean that all assumptions are equally acceptable, or that as soon as something is called an assumption, the author will be released from the duty of justifying it.

Hume argued that at the basis of any approach to induction, there will necessarily lie some assumption that is *not* inductively justified, but must in essence be taken on faith or as an unjustified assumption. He claimed that humans make certain unjustified assumptions of this nature automatically, due to human nature.

This is an argument that not all assumptions can be expected to be justified ...

Comments?

ben g
Re: [agi] Occam's Razor and its abuse
--- On Tue, 10/28/08, Pei Wang [EMAIL PROTECTED] wrote:

Whenever someone proves something outside mathematics, it is always based on certain assumptions. If the assumptions are not well justified, there is no strong reason for people to accept the conclusion, even though the proof process is correct.

My assumption is that the physics of the observable universe is computable (which is widely believed to be true). If it is true, then AIXI proves that Occam's Razor holds.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Occam's Razor and its abuse
--- On Wed, 10/29/08, Mark Waser [EMAIL PROTECTED] wrote:

Hutter *defined* the measure of correctness using simplicity as a component. Of course, they're correlated when you do such a thing. That's not a proof, that's an assumption.

Hutter defined the measure of correctness as the accumulated reward by the agent in AIXI.

-- Matt Mahoney, [EMAIL PROTECTED]
[agi] Occam's Razor and its abuse
Triggered by several recent discussions, I'd like to make the following position statement, though won't commit myself to long debate on it. ;-)

Occam's Razor, in its original form, goes like "entities must not be multiplied beyond necessity", and it is often stated as "All other things being equal, the simplest solution is the best" or "when multiple competing theories are equal in other respects, the principle recommends selecting the theory that introduces the fewest assumptions and postulates the fewest entities" --- all from http://en.wikipedia.org/wiki/Occam's_razor

I fully agree with all of the above statements. However, to me, there are two common misunderstandings associated with it in the context of AGI and philosophy of science.

(1) To take this statement as self-evident or a stand-alone postulate

To me, it is derived or implied by the insufficiency of resources. If a system has sufficient resources, it has no good reason to prefer a simpler theory.

(2) To take it to mean "The simplest answer is usually the correct answer."

This is a very different statement, which cannot be justified either analytically or empirically. When theory A is an approximation of theory B, usually the former is simpler than the latter, but less correct or accurate, in terms of its relation with all available evidence. When we are short on resources and have a low demand on accuracy, we often prefer A over B, but it does not mean that by doing so we judge A as more correct than B.

In summary, in choosing among alternative theories or conclusions, the preference for simplicity comes from shortage of resources, though simplicity and correctness are logically independent of each other.

Pei
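Pei's theory-A-versus-theory-B point can be made concrete with a toy model comparison (my example, not his): a line is simpler than a cubic fit to the same data, but fits the available evidence less accurately; preferring the line under resource pressure is not a judgment that it is more correct.

# Toy example of "simpler theory A approximates richer theory B":
# fit the same (invented) data with a line and with a cubic. The line
# is cheaper to store and evaluate but fits the evidence less
# accurately -- simplicity and accuracy pull apart.

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
y = 1.0 + 0.5 * x - 2.0 * x**3 + rng.normal(0, 0.05, x.size)

for degree, label in ((1, "theory A (line)"), (3, "theory B (cubic)")):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    print(label, "params:", degree + 1,
          "mean squared error:", round(float(np.mean(resid**2)), 4))
# A uses fewer parameters; B matches the evidence better. Choosing A
# under resource limits does not make A "more correct" than B.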
Re: [agi] Occam's Razor and its abuse
Ben,

Thanks. So the other people now see that I'm not attacking a straw man.

My solution to Hume's problem, as embedded in the experience-grounded semantics, is to assume no predictability, but to justify induction as adaptation. However, it is a separate topic which I've explained in my other publications.

Here I just want to point out that the original and basic meaning of Occam's Razor and those two common (mis)usages of it are not necessarily the same. I fully agree with the former, but not the latter, and I haven't seen any convincing justification of the latter. Instead, they are often taken for granted, under the name of Occam's Razor.

Pei

On Tue, Oct 28, 2008 at 12:37 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

Hi Pei,

This is an interesting perspective; I just want to clarify for others on the list that it is a particular and controversial perspective, and contradicts the perspectives of many other well-informed research professionals and deep thinkers on relevant topics.

Many serious thinkers in the area *do* consider Occam's Razor a standalone postulate. This fits in naturally with the Bayesian perspective, in which one needs to assume *some* prior distribution, so one often assumes some sort of Occam prior (e.g. the Solomonoff-Levin prior, the speed prior, etc.) as a standalone postulate.

Hume pointed out that induction (in the old sense of extrapolating from the past into the future) is not solvable except by introducing some kind of a priori assumption. Occam's Razor, in one form or another, is a suitable a priori assumption to plug into this role.

If you want to replace the Occam's Razor assumption with the assumption that the world is predictable by systems with limited resources, and that we will prefer explanations that consume less resources, that seems unproblematic, as it's basically equivalent to assuming an Occam prior. On the other hand, I just want to point out that to get around Hume's complaint you do need to make *some* kind of assumption about the regularity of the world. What kind of assumption of this nature underlies your work on NARS (if any)?

ben

On Tue, Oct 28, 2008 at 8:58 AM, Pei Wang [EMAIL PROTECTED] wrote:

Triggered by several recent discussions, I'd like to make the following position statement, though won't commit myself to long debate on it. ;-)

Occam's Razor, in its original form, goes like "entities must not be multiplied beyond necessity", and it is often stated as "All other things being equal, the simplest solution is the best" or "when multiple competing theories are equal in other respects, the principle recommends selecting the theory that introduces the fewest assumptions and postulates the fewest entities" --- all from http://en.wikipedia.org/wiki/Occam's_razor

I fully agree with all of the above statements. However, to me, there are two common misunderstandings associated with it in the context of AGI and philosophy of science.

(1) To take this statement as self-evident or a stand-alone postulate

To me, it is derived or implied by the insufficiency of resources. If a system has sufficient resources, it has no good reason to prefer a simpler theory.

(2) To take it to mean "The simplest answer is usually the correct answer."

This is a very different statement, which cannot be justified either analytically or empirically. When theory A is an approximation of theory B, usually the former is simpler than the latter, but less correct or accurate, in terms of its relation with all available evidence. When we are short on resources and have a low demand on accuracy, we often prefer A over B, but it does not mean that by doing so we judge A as more correct than B.

In summary, in choosing among alternative theories or conclusions, the preference for simplicity comes from shortage of resources, though simplicity and correctness are logically independent of each other.

Pei

--
Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
Director of Research, SIAI
[EMAIL PROTECTED]

"A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects." -- Robert Heinlein
Re: [agi] Occam's Razor and its abuse
Ben,

You assert that Pei is forced to make an assumption about the regularity of the world to justify adaptation. Pei could also take a different argument. He could try to show that *if* a strategy exists that can be implemented given the finite resources, NARS will eventually find it. Thus, adaptation is justified on a sort of "we might as well try" basis. (The proof would involve showing that NARS searches the space of finite-state machines that can be implemented with the resources at hand, and is more likely to stay for longer periods of time in configurations that give more reward, such that NARS would eventually settle on a configuration if that configuration consistently gave the highest reward.)

So, some form of learning can take place with no assumptions. The problem is that the search space is exponential in the resources available, so there is some maximum point where the system would perform best (because the amount of resources matches the problem), but giving the system more resources would hurt performance (because the system searches the unnecessarily large search space). So, in this sense, the system's behavior seems counterintuitive -- it does not seem to be taking advantage of the increased resources. I'm not claiming NARS would have that problem, of course, just that a theoretical no-assumption learner would.

--Abram

On Tue, Oct 28, 2008 at 2:12 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

On Tue, Oct 28, 2008 at 10:00 AM, Pei Wang [EMAIL PROTECTED] wrote:

Ben,

Thanks. So the other people now see that I'm not attacking a straw man.

My solution to Hume's problem, as embedded in the experience-grounded semantics, is to assume no predictability, but to justify induction as adaptation. However, it is a separate topic which I've explained in my other publications.

Right, but justifying induction as adaptation only works if the environment is assumed to have certain regularities which can be adapted to. In a random environment, adaptation won't work. So, still, to justify induction as adaptation you have to make *some* assumptions about the world.

The Occam prior gives one such assumption: that (to give just one form) sets of observations in the world tend to be producible by short computer programs. For adaptation to successfully carry out induction, *some* vaguely comparable property to this must hold, and I'm not sure if you have articulated which one you assume, or if you leave this open.

In effect, you implicitly assume something like an Occam prior, because you're saying that a system with finite resources can successfully adapt to the world ... which means that sets of observations in the world *must* be approximately summarizable via subprograms that can be executed within this system. So I argue that, even though it's not your preferred way to think about it, your own approach to AI theory and practice implicitly assumes some variant of the Occam prior holds in the real world.

Here I just want to point out that the original and basic meaning of Occam's Razor and those two common (mis)usages of it are not necessarily the same. I fully agree with the former, but not the latter, and I haven't seen any convincing justification of the latter. Instead, they are often taken for granted, under the name of Occam's Razor.

I agree that the notion of an Occam prior is a significant conceptual step beyond the original Occam's Razor precept enunciated long ago. Also, I note that, for those who posit the Occam prior as a **prior assumption**, there is not supposed to be any convincing justification for it. The idea is simply that: one must make *some* assumption (explicitly or implicitly) if one wants to do induction, and this is the assumption that some people choose to make.

-- Ben G
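A sketch of the no-assumption learner Abram describes, under invented details (the environment, the dwell rule, and the eight-configuration space are mine): the system dwells longer in configurations that give more reward, and so eventually settles on the best one it can represent.

# Sketch of Abram's thought experiment: random search over a small
# space of configurations, leaving a configuration with probability
# inversely tied to its reward, so high-reward configurations
# accumulate dwell time and are eventually settled on.

import random

random.seed(0)

def reward(config: int) -> float:
    # Hypothetical fixed environment: one configuration is best.
    return 1.0 if config == 5 else 0.2

N_CONFIGS = 8  # in Abram's argument this space is exponential in resources
current = random.randrange(N_CONFIGS)
time_in = [0] * N_CONFIGS

for step in range(10_000):
    time_in[current] += 1
    # Leave with probability (1 - reward); stay put when reward is high.
    if random.random() > reward(current):
        current = random.randrange(N_CONFIGS)

best = max(range(N_CONFIGS), key=lambda c: time_in[c])
print("most-dwelt configuration:", best)  # settles on configuration 5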
Re: [agi] Occam's Razor and its abuse
Ben,

It seems that you agree the issue I pointed out really exists, but just take it as a necessary evil. Furthermore, you think I also assumed the same thing, though I failed to see it.

I won't argue against the "necessary evil" part --- as long as you agree that those postulates (such as "the universe is computable") are not convincingly justified. I won't try to disprove them.

As for the latter part, I don't think you can convince me that you know me better than I know myself. ;-)

The following is from http://nars.wang.googlepages.com/wang.semantics.pdf , page 28:

"If the answers provided by NARS are fallible, in what sense these answers are better than arbitrary guesses? This leads us to the concept of rationality. When infallible predictions cannot be obtained (due to insufficient knowledge and resources), answers based on past experience are better than arbitrary guesses, if the environment is relatively stable. To say an answer is only a summary of past experience (thus no future confirmation guaranteed) does not make it equal to an arbitrary conclusion --- it is what adaptation means. Adaptation is the process in which a system changes its behaviors as if the future is similar to the past. It is a rational process, even though individual conclusions it produces are often wrong. For this reason, valid inference rules (deduction, induction, abduction, and so on) are the ones whose conclusions correctly (according to the semantics) summarize the evidence in the premises. They are truth-preserving in this sense, not in the model-theoretic sense that they always generate conclusions which are immune from future revision."

--- so you see, I don't assume adaptation will always be successful, or even successful to a certain probability. You can dislike this conclusion, though you cannot say it is the same as what is assumed by Novamente and AIXI.

Pei

On Tue, Oct 28, 2008 at 2:12 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

On Tue, Oct 28, 2008 at 10:00 AM, Pei Wang [EMAIL PROTECTED] wrote:

Ben,

Thanks. So the other people now see that I'm not attacking a straw man.

My solution to Hume's problem, as embedded in the experience-grounded semantics, is to assume no predictability, but to justify induction as adaptation. However, it is a separate topic which I've explained in my other publications.

Right, but justifying induction as adaptation only works if the environment is assumed to have certain regularities which can be adapted to. In a random environment, adaptation won't work. So, still, to justify induction as adaptation you have to make *some* assumptions about the world.

The Occam prior gives one such assumption: that (to give just one form) sets of observations in the world tend to be producible by short computer programs. For adaptation to successfully carry out induction, *some* vaguely comparable property to this must hold, and I'm not sure if you have articulated which one you assume, or if you leave this open.

In effect, you implicitly assume something like an Occam prior, because you're saying that a system with finite resources can successfully adapt to the world ... which means that sets of observations in the world *must* be approximately summarizable via subprograms that can be executed within this system. So I argue that, even though it's not your preferred way to think about it, your own approach to AI theory and practice implicitly assumes some variant of the Occam prior holds in the real world.

Here I just want to point out that the original and basic meaning of Occam's Razor and those two common (mis)usages of it are not necessarily the same. I fully agree with the former, but not the latter, and I haven't seen any convincing justification of the latter. Instead, they are often taken for granted, under the name of Occam's Razor.

I agree that the notion of an Occam prior is a significant conceptual step beyond the original Occam's Razor precept enunciated long ago. Also, I note that, for those who posit the Occam prior as a **prior assumption**, there is not supposed to be any convincing justification for it. The idea is simply that: one must make *some* assumption (explicitly or implicitly) if one wants to do induction, and this is the assumption that some people choose to make.

-- Ben G
Re: [agi] Occam's Razor and its abuse
Most certainly ... and the human mind seems to make a lot of other, more specialized assumptions about the environment also ... so that unless the environment satisfies a bunch of these other more specialized assumptions, its adaptation will be very slow and resource-inefficient...

ben g

On Tue, Oct 28, 2008 at 12:05 PM, Pei Wang [EMAIL PROTECTED] wrote:

We can say the same thing for the human mind, right?

Pei

On Tue, Oct 28, 2008 at 2:54 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

Sure ... but my point is that unless the environment satisfies a certain Occam-prior-like property, NARS will be useless...

ben

On Tue, Oct 28, 2008 at 11:52 AM, Abram Demski [EMAIL PROTECTED] wrote:

Ben,

You assert that Pei is forced to make an assumption about the regularity of the world to justify adaptation. Pei could also take a different argument. He could try to show that *if* a strategy exists that can be implemented given the finite resources, NARS will eventually find it. Thus, adaptation is justified on a sort of "we might as well try" basis. (The proof would involve showing that NARS searches the space of finite-state machines that can be implemented with the resources at hand, and is more likely to stay for longer periods of time in configurations that give more reward, such that NARS would eventually settle on a configuration if that configuration consistently gave the highest reward.)

So, some form of learning can take place with no assumptions. The problem is that the search space is exponential in the resources available, so there is some maximum point where the system would perform best (because the amount of resources matches the problem), but giving the system more resources would hurt performance (because the system searches the unnecessarily large search space). So, in this sense, the system's behavior seems counterintuitive -- it does not seem to be taking advantage of the increased resources. I'm not claiming NARS would have that problem, of course, just that a theoretical no-assumption learner would.

--Abram

On Tue, Oct 28, 2008 at 2:12 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

On Tue, Oct 28, 2008 at 10:00 AM, Pei Wang [EMAIL PROTECTED] wrote:

Ben,

Thanks. So the other people now see that I'm not attacking a straw man.

My solution to Hume's problem, as embedded in the experience-grounded semantics, is to assume no predictability, but to justify induction as adaptation. However, it is a separate topic which I've explained in my other publications.

Right, but justifying induction as adaptation only works if the environment is assumed to have certain regularities which can be adapted to. In a random environment, adaptation won't work. So, still, to justify induction as adaptation you have to make *some* assumptions about the world.

The Occam prior gives one such assumption: that (to give just one form) sets of observations in the world tend to be producible by short computer programs. For adaptation to successfully carry out induction, *some* vaguely comparable property to this must hold, and I'm not sure if you have articulated which one you assume, or if you leave this open.

In effect, you implicitly assume something like an Occam prior, because you're saying that a system with finite resources can successfully adapt to the world ... which means that sets of observations in the world *must* be approximately summarizable via subprograms that can be executed within this system. So I argue that, even though it's not your preferred way to think about it, your own approach to AI theory and practice implicitly assumes some variant of the Occam prior holds in the real world.

Here I just want to point out that the original and basic meaning of Occam's Razor and those two common (mis)usages of it are not necessarily the same. I fully agree with the former, but not the latter, and I haven't seen any convincing justification of the latter. Instead, they are often taken for granted, under the name of Occam's Razor.

I agree that the notion of an Occam prior is a significant conceptual step beyond the original Occam's Razor precept enunciated long ago. Also, I note that, for those who posit the Occam prior as a **prior assumption**, there is not supposed to be any convincing justification for it. The idea is simply that: one must make *some* assumption (explicitly or implicitly) if one wants to do induction, and this is the assumption that some people choose to make.

-- Ben G
Re: [agi] Occam's Razor and its abuse
We can say the same thing for the human mind, right? Pei On Tue, Oct 28, 2008 at 2:54 PM, Ben Goertzel [EMAIL PROTECTED] wrote: Sure ... but my point is that unless the environment satisfies a certain Occam-prior-like property, NARS will be useless... ben [snip]
Re: [agi] Occam's Razor and its abuse
Abram, I agree with your basic idea in the following, though I usually put it in different form. Pei On Tue, Oct 28, 2008 at 2:52 PM, Abram Demski [EMAIL PROTECTED] wrote: Ben, You assert that Pei is forced to make an assumption about the regularity of the world to justify adaptation. Pei could also make a different argument. He could try to show that *if* a strategy exists that can be implemented given the finite resources, NARS will eventually find it. Thus, adaptation is justified on a sort of "we might as well try" basis. [snip]
Re: [agi] Occam's Razor and its abuse
2008/10/28 Ben Goertzel [EMAIL PROTECTED]: On the other hand, I just want to point out that to get around Hume's complaint you do need to make *some* kind of assumption about the regularity of the world. What kind of assumption of this nature underlies your work on NARS (if any)? Not directed to me, but here is my take on this interesting question. The initial architecture would have limited assumptions about the world; the programming in the architecture would then form the bias. Initially the system would divide up the world into the simple (inanimate) and the highly complex (animate). Why should the system expect animate things to be complex? Because it applies the intentional stance and thinks that they are optimal problem solvers. Optimal problem solvers in a social environment tend toward high complexity, since there is an arms race over who can predict the others while not being predicted and exploited by them. Thinking "there are other things like me out here", when you are yourself a complex entity, entails thinking that things are complex, even when there might be simpler explanations (e.g., of what causes the weather). Will Pearson
Re: [agi] Occam's Razor and its abuse
What Hutter proved is (very roughly) that given massive computational resources, following Occam's Razor will be -- within some possibly quite large constant -- the best way to achieve goals in a computable environment... That's not exactly proving Occam's Razor, though it is a proof related to Occam's Razor... One could easily argue it is totally irrelevant to AI due to its assumption of massive computational resources ben g On Tue, Oct 28, 2008 at 2:23 PM, Matt Mahoney [EMAIL PROTECTED] wrote: Hutter proved Occam's Razor (AIXI) for the case of any environment with a computable probability distribution. It applies to us because the observable universe is Turing computable according to currently known laws of physics. Specifically, the observable universe has a finite description length (approximately 2.91 x 10^122 bits, the Bekenstein bound of the Hubble radius). AIXI has nothing to do with insufficiency of resources. Given unlimited resources we would still prefer the (algorithmically) simplest explanation because it is the most likely under a Solomonoff distribution of possible environments. Also, AIXI does not state that the simplest answer is the best answer. It says that the simplest answer consistent with observation so far is the best answer. When we are short on resources (and we always are, because AIXI is not computable), then we may choose a different explanation than the simplest one. However this does not make the alternative correct. -- Matt Mahoney, [EMAIL PROTECTED] --- On Tue, 10/28/08, Pei Wang [EMAIL PROTECTED] wrote: From: Pei Wang [EMAIL PROTECTED] Subject: [agi] Occam's Razor and its abuse To: agi@v2.listbox.com Date: Tuesday, October 28, 2008, 11:58 AM Triggered by several recent discussions, I'd like to make the following position statement, though won't commit myself to long debate on it. ;-) Occam's Razor, in its original form, goes like "entities must not be multiplied beyond necessity", and it is often stated as "All other things being equal, the simplest solution is the best" or "when multiple competing theories are equal in other respects, the principle recommends selecting the theory that introduces the fewest assumptions and postulates the fewest entities" --- all from http://en.wikipedia.org/wiki/Occam%27s_razor I fully agree with all of the above statements. However, to me, there are two common misunderstandings associated with it in the context of AGI and philosophy of science. (1) To take this statement as self-evident or a stand-alone postulate. To me, it is derived or implied by the insufficiency of resources. If a system has sufficient resources, it has no good reason to prefer a simpler theory. (2) To take it to mean "The simplest answer is usually the correct answer." This is a very different statement, which cannot be justified either analytically or empirically. When theory A is an approximation of theory B, usually the former is simpler than the latter, but less correct or accurate, in terms of its relation with all available evidence. When we are short on resources and have a low demand on accuracy, we often prefer A over B, but it does not mean that by doing so we judge A as more correct than B. In summary, in choosing among alternative theories or conclusions, the preference for simplicity comes from shortage of resources, though simplicity and correctness are logically independent of each other.
Pei -- Ben Goertzel, PhD CEO, Novamente LLC and Biomind LLC Director of Research, SIAI [EMAIL PROTECTED] A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects. -- Robert Heinlein
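As a concrete illustration of the decision rule Matt is describing, here is a heavily simplified, hypothetical sketch of an AIXI-flavored agent: it weights every environment in a tiny hand-made class by 2^-complexity, discards environments contradicted by history, and picks the action with the highest mixture-expected reward. Real AIXI ranges over all computable environments and is incomputable; nothing below is from Hutter's paper.

# Toy AIXI-flavored decision rule: a Bayesian mixture over a tiny, hand-made
# environment class, each environment weighted 2**-complexity; the agent picks
# the action with the highest mixture-expected reward.

envs = [
    # (complexity in "bits", reward function: action -> reward)
    (2, lambda a: 1.0 if a == "left" else 0.0),
    (3, lambda a: 1.0 if a == "right" else 0.0),
    (5, lambda a: 0.5),  # an environment indifferent to the action
]

def best_action(actions, history):
    # Keep only environments consistent with the (action, reward) history.
    def consistent(reward_fn):
        return all(abs(reward_fn(a) - obs) < 1e-9 for a, obs in history)
    live = [(2.0 ** -k, r) for k, r in envs if consistent(r)]
    z = sum(w for w, _ in live)
    value = {a: sum(w / z * r(a) for w, r in live) for a in actions}
    return max(value, key=value.get)

print(best_action(["left", "right"], []))               # -> left (simplest env dominates)
print(best_action(["left", "right"], [("left", 0.0)]))  # -> right (evidence overrules the prior)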
Re: [agi] Occam's Razor and its abuse
Au contraire, I suspect that the fact that biological organisms grow via the same sorts of processes as the biological environment in which they live, causes the organisms' minds to be built with **a lot** of implicit bias that is useful for surviving in the environment... Some have argued that this kind of bias is **all you need** for evolution... see "Evolution Without Selection" by A. Lima de Faria. I think that is wrong, but it's interesting that there's enough evidence to even try to make the argument... ben g On Tue, Oct 28, 2008 at 2:37 PM, Ed Porter [EMAIL PROTECTED] wrote: It appears to me that the assumptions about initial priors used by a self learning AGI or an evolutionary line of AGI's could be quite minimal. My understanding is that once a probability distribution starts receiving random samples from its distribution, the effect of the original prior becomes rapidly lost, unless it is a rather rare one. Such rare, problematic priors would get selected against quickly by evolution. Evolution would tend to tune for the most appropriate priors for the success of subsequent generations (either for continued computation in the same system, if it is capable of enough change, or in descendant systems). Probably the best priors would generally be ones that could be trained moderately rapidly by data. So it seems an evolutionary system or line could initially learn priors without any assumptions for priors other than a random picking of priors. Over time and multiple generations it might develop hereditary priors, and perhaps even different hereditary priors for parts of its network connected to different inputs, outputs or internal controls. The use of priors in an AGI could be greatly improved by having a gen/comp hierarchy in which models for a given concept could be inherited from the priors of sets of models for similar concepts, and the set of appropriate priors could change contextually. It would also seem that the notion of a prior could be improved by blending information from episodic and probabilistic models. It would appear that in almost any generally intelligent system, being able to approximate reality in a manner sufficient for evolutionary success with the most efficient representations would be a characteristic that would be greatly preferred by evolution, because it would allow systems to better model more of their environment sufficiently well for evolutionary success with whatever current modeling capacity they have. So, although a completely accurate description of virtually anything may not find much use for Occam's Razor, as a practically useful representation it often will. It seems to me that Occam's Razor is more oriented to deriving meaningful generalizations than it is to exact descriptions of anything. Furthermore, it would seem to me that a simpler set of preconditions is generally more probable than a more complex one, because it requires less coincidence. It would seem to me this would be true under most random sets of priors for the probabilities of the possible sets of components involved and Occam's Razor type selection. These are the musings of an untrained mind, since I have not spent much time studying philosophy, because such a high percent of it was so obviously stupid (such as what was commonly said when I was young, that "you can't have intelligence without language") and my understanding of math is much less than that of many on this list. But nonetheless I think much of what I have said above is true.
I think its gist is not totally dissimilar to what Abram has said. Ed Porter -Original Message- From: Pei Wang [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 28, 2008 3:05 PM To: agi@v2.listbox.com Subject: Re: [agi] Occam's Razor and its abuse Abram, I agree with your basic idea in the following, though I usually put it in different form. Pei [snip]
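Ed's claim that the influence of the initial prior washes out as samples accumulate is easy to check in the simplest Bayesian setting. A minimal sketch, my own toy example rather than anything from the systems discussed here: two sharply different Beta priors over a coin's bias converge to nearly the same posterior mean after a few hundred flips.

# Prior washout: Beta(a, b) prior + Bernoulli data -> Beta(a + heads, b + tails).
# Two sharply different priors end up with almost identical posterior means.

import random

random.seed(0)
true_p = 0.7
flips = [1 if random.random() < true_p else 0 for _ in range(500)]

priors = {"optimist": (9.0, 1.0), "pessimist": (1.0, 9.0)}
for n in (0, 10, 100, 500):
    heads = sum(flips[:n])
    means = {name: (a + heads) / (a + b + n) for name, (a, b) in priors.items()}
    print(n, {k: round(v, 3) for k, v in means.items()})
# By n = 500 both posterior means sit near the true bias 0.7,
# regardless of which prior the system started with.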
Re: [agi] Occam's Razor and its abuse
Matt, The currently known laws of physics are a *description* of the universe at a certain level, which is fundamentally different from the universe itself. Also, "All human knowledge can be reduced to physics" is not a viewpoint accepted by everyone. Furthermore, "computable" is a property of a mathematical function. It takes a bunch of assumptions to be applied to a statement, and some additional ones to be applied to an object --- Is the Earth computable? Does the previous question ever make sense? Whenever someone proves something outside mathematics, it is always based on certain assumptions. If the assumptions are not well justified, there is no strong reason for people to accept the conclusion, even though the proof process is correct. Pei On Tue, Oct 28, 2008 at 5:23 PM, Matt Mahoney [EMAIL PROTECTED] wrote: Hutter proved Occam's Razor (AIXI) for the case of any environment with a computable probability distribution. It applies to us because the observable universe is Turing computable according to currently known laws of physics. Specifically, the observable universe has a finite description length (approximately 2.91 x 10^122 bits, the Bekenstein bound of the Hubble radius). [snip]
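As an aside, Matt's ~10^122-bit figure can be sanity-checked as an order of magnitude. A back-of-the-envelope sketch, assuming the holographic bound (bits roughly equal to horizon area / (4 * Planck area * ln 2)) and round values for the Hubble constant and Planck length:

# Order-of-magnitude check of the ~10^122-bit bound on the observable universe.
# Holographic bound: information <= horizon area / (4 * Planck area * ln 2).

import math

c   = 2.998e8        # speed of light, m/s
H0  = 2.3e-18        # Hubble constant, ~71 km/s/Mpc expressed in 1/s
l_p = 1.616e-35      # Planck length, m

R = c / H0                         # Hubble radius, ~1.3e26 m
A = 4 * math.pi * R**2             # horizon area, m^2
bits = A / (4 * l_p**2 * math.log(2))
print(f"{bits:.2e}")               # ~3e122, same ballpark as Matt's 2.91e122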
Re: [agi] Occam's Razor and its abuse
--- On Tue, 10/28/08, Ben Goertzel [EMAIL PROTECTED] wrote: What Hutter proved is (very roughly) that given massive computational resources, following Occam's Razor will be -- within some possibly quite large constant -- the best way to achieve goals in a computable environment... That's not exactly proving Occam's Razor, though it is a proof related to Occam's Razor... No, that's AIXI^tl. I was talking about AIXI. Hutter proved both. One could easily argue it is totally irrelevant to AI due to its assumption of massive computational resources. If you mean AIXI^tl, I agree. However, it is AIXI that proves Occam's Razor. AIXI is useful to AGI exactly because it proves noncomputability. We can stop looking for a neat solution. -- Matt Mahoney, [EMAIL PROTECTED]
[agi] Occam's Razor and its abuse
Pei Triggered by several recent discussions, I'd like to make the Pei following position statement, though won't commit myself to long Pei debate on it. ;-) Pei Occam's Razor, in its original form, goes like "entities must not Pei be multiplied beyond necessity", and it is often stated as "All Pei other things being equal, the simplest solution is the best" or Pei "when multiple competing theories are equal in other respects, Pei the principle recommends selecting the theory that introduces the Pei fewest assumptions and postulates the fewest entities" --- all Pei from http://en.wikipedia.org/wiki/Occam%27s_razor Pei I fully agree with all of the above statements. Pei However, to me, there are two common misunderstandings associated Pei with it in the context of AGI and philosophy of science. Pei (1) To take this statement as self-evident or a stand-alone Pei postulate Pei To me, it is derived or implied by the insufficiency of Pei resources. If a system has sufficient resources, it has no good Pei reason to prefer a simpler theory. With all due respect, this is mistaken. Occam's Razor, in some form, is the heart of Generalization, which is the essence (and the G) of GI. For example, if you study concept learning from examples, say in the PAC learning context (related theorems hold in some other contexts as well), there are theorems to the effect that if you find a hypothesis from a simple enough class of hypotheses, it will with very high probability accurately classify new examples chosen from the same distribution, and conversely theorems that state (roughly speaking) that any method that chooses a hypothesis from too expressive a class of hypotheses will have a probability, bounded below by some reasonable number like 1/7, of having large error in its predictions on new examples -- in other words, it is impossible to PAC learn without respecting Occam's Razor. For discussion of the above paragraphs, I'd refer you to Chapter 4 of What is Thought? (MIT Press, 2004). In other words, if you are building some system that learns about the world, it had better respect Occam's razor if you want whatever it learns to apply to new experience. (I use the term Occam's razor loosely; using hypotheses that are highly constrained in ways other than just being concise may work, but you'd better respect simplicity broadly defined. See Chap 6 of WIT? for more discussion of this point.) The core problem of GI is generalization: you want to be able to figure out new problems as they come along that you haven't seen before. In order to do that, you basically must implicitly or explicitly employ some version of Occam's Razor, independent of how much resources you have. In my view, the first and most important question to ask about any proposal for AGI is, in what way is it going to produce Occam hypotheses. If you can't answer that, don't bother implementing a huge system in hopes of capturing your many insights, because the bigger your implementation gets, the less likely it is to get where you want in the end.
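Eric's point can be made quantitative with the standard Occam bound for a finite hypothesis class (a textbook PAC result, not something specific to What is Thought?): a learner that finds a hypothesis consistent with m >= (1/eps)(ln|H| + ln(1/delta)) examples has true error at most eps with probability at least 1 - delta. The sketch below shows why restricting to short descriptions keeps the sample cost manageable:

# Occam/PAC sample-complexity bound for a finite hypothesis class H:
# with m >= (1/eps) * (ln|H| + ln(1/delta)) examples, any hypothesis in H
# consistent with the data has true error <= eps with probability >= 1 - delta.

import math

def pac_sample_size(hypothesis_count, eps=0.05, delta=0.01):
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / eps)

# Treat hypotheses as bit-programs: all programs up to n bits number ~2^(n+1).
for n_bits in (10, 20, 40, 80):
    m = pac_sample_size(2 ** (n_bits + 1))
    print(f"programs up to {n_bits:>2} bits: ~{m} examples suffice")
# Doubling the allowed description length only adds linearly to the sample
# cost, but an unrestricted (infinite) class gives no such guarantee at all,
# which is Eric's point: generalization needs a constrained, "simple" class.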
RE: [agi] Occam's Razor and its abuse
===Below Ben wrote=== I suspect that the fact that biological organisms grow via the same sorts of processes as the biological environment in which they live, causes the organisms' minds to be built with **a lot** of implicit bias that is useful for surviving in the environment... ===My Response=== Au Similaire. That was one of the points I was trying to make! And that arguably supports at least part of what Pei was arguing. I am not arguing it is all you need. You at least need some mechanism for exploring at least some subspace of possible priors, but you don't need any specific pre-selected set of priors. Ed Porter -Original Message- From: Ben Goertzel [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 28, 2008 5:50 PM To: agi@v2.listbox.com Subject: Re: [agi] Occam's Razor and its abuse [snip]
Re: [agi] Occam's Razor and its abuse
Eric: The core problem of GI is generalization: you want to be able to figure out new problems as they come along that you haven't seen before. In order to do that, you basically must implicitly or explicitly employ some version of Occam's Razor. It all depends on the subject matter of the generalization. It's a fairly good principle, but there is such a thing as simple-mindedness. For example, what is the cluster of associations evoked in the human brain by any given idea, and what is the principle [or principles] that determines how many associations in how many domains and how many brain areas? The answers to these questions are unlikely to be simple. IOW, if the subject matter is complex, the generalization may also have to be complex.
Re: [agi] Occam's Razor and its abuse
Ed, Since NARS doesn't follow the Bayesian approach, there are no "initial priors" to be assumed. If we use a more general term, such as "initial knowledge" or "innate beliefs", then yes, you can add them into the system, which will improve the system's performance. However, they are optional. In NARS, all object-level (i.e., not meta-level) innate beliefs can be learned by the system afterward. Pei On Tue, Oct 28, 2008 at 5:37 PM, Ed Porter [EMAIL PROTECTED] wrote: It appears to me that the assumptions about initial priors used by a self learning AGI or an evolutionary line of AGI's could be quite minimal. [snip]
Re: [agi] Occam's Razor and its abuse
Eric, I highly respect your work, though we clearly have different opinions on what intelligence is, as well as on how to achieve it. For example, though learning and generalization play central roles in my theory about intelligence, I don't think PAC learning (or the other learning algorithms proposed so far) provides a proper conceptual framework for the typical situation of this process. Generally speaking, I'm not building some system that learns about the world, in the sense that there is a correct way to describe the world waiting to be discovered, which can be captured by some algorithm. Instead, learning to me is a non-algorithmic open-ended process by which the system summarizes its own experience, and uses it to predict the future. I fully understand that most people in this field probably consider this opinion wrong, though I haven't been convinced yet by the arguments I've seen so far. Instead of addressing all of the relevant issues, in this discussion I have a very limited goal. To rephrase what I said initially, I see that under the term "Occam's Razor", currently there are three different statements: (1) Simplicity (in conclusions, hypotheses, theories, etc.) is preferred. (2) The preference for simplicity does not need a reason or justification. (3) Simplicity is preferred because it is correlated with correctness. I agree with (1), but not (2) and (3). I know many people have different opinions, and I don't attempt to argue with them here --- these problems are too complicated to be settled by email exchanges. However, I do hope to convince people in this discussion that the three statements are not logically equivalent, and (2) and (3) are not implied by (1), so to use "Occam's Razor" to refer to all of them is not a good idea, because it is going to mix different issues. Therefore, I suggest that people use "Occam's Razor" in its original and basic sense, that is, (1), and use other terms to refer to (2) and (3). Otherwise, when people talk about Occam's Razor, I just don't know what to say. Pei On Tue, Oct 28, 2008 at 8:09 PM, Eric Baum [EMAIL PROTECTED] wrote: [snip]
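Pei's separation of statements (1) and (3) can be illustrated with a toy model-selection example (my own construction, not from this thread): when the data come from a more complex source, the simpler model stays cheaper to store and evaluate, yet it is plainly the less correct one, so preferring it is a resource decision rather than a correctness judgment.

# Simplicity vs. correctness: fit a cheap linear model and a richer cubic model
# to data generated by a cubic law. The linear model is simpler (2 parameters,
# cheaper to use) but systematically less accurate -- Pei's (1) vs (3).

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200)
y = 0.5 * x**3 - x + rng.normal(0, 0.1, x.size)   # a "theory B"-like world

for degree in (1, 3):
    coeffs = np.polyfit(x, y, degree)
    rmse = np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    print(f"degree {degree}: {degree + 1} parameters, RMSE = {rmse:.3f}")
# degree 1 is the "theory A" approximation: fewer parameters, worse fit.
# Choosing it can be rational under resource pressure, but that choice
# does not make it the more *correct* description of the data.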
Re: [agi] Occam's Razor and its abuse
If not verify, what about falsify? To me, Occam's Razor has always been a tool for selecting the first hypothesis to attempt to falsify. If you can't, or haven't, falsified it, then it's usually the best assumption to go on (presuming that the costs of failing are evenly distributed). OTOH, Occam's Razor clearly isn't quantitative, and it doesn't always pick the right answer, just one that's good enough based on what we know at the moment. (Again presuming evenly distributed costs of failure.) (And actually that's an oversimplification. I've been assuming that the cost of presuming the theory chosen by Occam's Razor is equal to or lower than the costs of the alternatives. Whoops! The simplest workable approach isn't always the cheapest, and given that all not-yet-falsified approaches have closely equal plausibility ... perhaps one should instead choose the cheapest to presume of all theories that have been vetted against current knowledge.) Occam's Razor is fine for its original purposes, but when you try to apply it to practical rather than logical problems, you start needing to evaluate relative costs: both costs of presuming and costs of failure. And actually it often turns out that a solution based on a theory known to be incorrect (e.g., Newton's laws) is good enough, so you don't need to decide about the correct answer. NASA uses Newton, not Einstein, even though Einstein might be correct and Newton is known to be wrong. Pei Wang wrote: Ben, It seems that you agree the issue I pointed out really exists, but just take it as a necessary evil. Furthermore, you think I also assumed the same thing, though I failed to see it. I won't argue against the "necessary evil" part --- as far as you agree that those postulates (such as "the universe is computable") are not convincingly justified. I won't try to disprove them. As for the latter part, I don't think you can convince me that you know me better than I know myself. ;-) The following is from http://nars.wang.googlepages.com/wang.semantics.pdf , page 28: "If the answers provided by NARS are fallible, in what sense these answers are better than arbitrary guesses? This leads us to the concept of rationality. When infallible predictions cannot be obtained (due to insufficient knowledge and resources), answers based on past experience are better than arbitrary guesses, if the environment is relatively stable. To say an answer is only a summary of past experience (thus no future confirmation guaranteed) does not make it equal to an arbitrary conclusion — it is what adaptation means. Adaptation is the process in which a system changes its behaviors as if the future is similar to the past. It is a rational process, even though individual conclusions it produces are often wrong. For this reason, valid inference rules (deduction, induction, abduction, and so on) are the ones whose conclusions correctly (according to the semantics) summarize the evidence in the premises. They are truth-preserving in this sense, not in the model-theoretic sense that they always generate conclusions which are immune from future revision." --- so you see, I don't assume adaptation will always be successful, even successful to a certain probability. You can dislike this conclusion, though you cannot say it is the same as what is assumed by Novamente and AIXI. Pei On Tue, Oct 28, 2008 at 2:12 PM, Ben Goertzel [EMAIL PROTECTED] wrote: [snip]
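The cost-based selection rule suggested above is easy to state as code. A hypothetical toy with invented numbers, purely to pin down the idea: among theories not yet falsified by current knowledge, pick the one minimizing cost-of-use plus expected cost of failure, which is how Newton can beat Einstein for engineering purposes despite being known to be wrong.

# Choosing among unfalsified theories by cost, not truth: score each theory
# by (cost of using it) + (chance it misleads in this domain) * (cost of
# being misled). All numbers are invented purely for illustration.

theories = {
    # name: (cost_of_use, p_failure_in_this_domain, cost_of_failure)
    "Newtonian mechanics": (1.0, 1e-6, 1e4),   # cheap; fine at low speeds
    "General relativity":  (50.0, 1e-9, 1e4),  # more accurate, costly to apply
}

def presumption_cost(use, p_fail, fail):
    return use + p_fail * fail

best = min(theories, key=lambda t: presumption_cost(*theories[t]))
print(best)  # "Newtonian mechanics": known-wrong, but cheapest to presume here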