Re: [agi] Re: Huge Progress on the Core of AGI
Abram, I haven't found a method that I think works consistently yet. Basically I was trying methods like the one you suggested, which measures the number of correct predictions or expectations. But, then I ran into the problem of, what if the predictions you are counting are more of the same? Do you count them or not? For example, lets say that we see a piece of paper on a table in an image and we see that the paper looks different but moves with the table. So, we can hypothesize that they are attached. Now what if it is not a piece of paper, but a mural. Do you count every little piece of the mural that moves with the desk as a correct prediction? Is it a single prediction? What about the number of times they move together? It doesn't seem right to count each and every time, but we also have to be careful about coincidental movement together. Just because it seems to move together in one frame out of 1000 does not mean we should consider them temporarily attached. So, quantitatively defining simpler and predictive is quite challenging. I am honestly a bit stumped at how to do it at the moment. I will keep trying to find ways to at least approximate it, but I'm really not sure the best way. Of course, I haven't been working on this specific problem long, but other people have tried to quantify our explanatory methods in other areas and have also failed. I think part of the failure has to do with the fact that the things they want to explain using the same method should probably use different methods and should be more heuristic than mathematically precise. It's all quite overwhelming to analyze sometimes. I may have thought about fractions correct vs. incorrect also. The truth is, I haven't locked on and carefully analyzed the different ideas I've come up with because they all seem to have issues and it is difficult to analyze. I definitely need to try some out and just see what the results are and document them better. Dave On Thu, Jul 22, 2010 at 10:23 PM, Abram Demski abramdem...@gmail.comwrote: David, What are the different ways you are thinking of for measuring the predictiveness? I can think of a few different possibilities (such as measuring number incorrect vs measuring fraction incorrect, et cetera) but I'm wondering which variations you consider significant/troublesome/etc. --Abram On Thu, Jul 22, 2010 at 7:12 PM, David Jones davidher...@gmail.comwrote: It's certainly not as simple as you claim. First, assigning a probability is not always possible, nor is it easy. The factors in calculating that probability are unknown and are not the same for every instance. Since we do not know what combination of observations we will see, we cannot have a predefined set of probabilities, nor is it any easier to create a probability function that generates them for us. That is just as exactly what I meant by quantitatively define the predictiveness... it would be proportional to the probability. Second, if you can define a program ina way that is always simpler when it is smaller, then you can do the same thing without a program. I don't think it makes any sense to do it this way. It is not that simple. If it was, we could solve a large portion of agi easily. On Thu, Jul 22, 2010 at 3:16 PM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. It isn't hard. To measure predictiveness, you assign a probability to each possible outcome. If the actual outcome has probability p, you score a penalty of log(1/p) bits. To measure simplicity, use the compressed size of the code for your prediction algorithm. Then add the two scores together. That's how it is done in the Calgary challenge http://www.mailcom.com/challenge/ and in my own text compression benchmark. -- Matt Mahoney, matmaho...@yahoo.com *From:* David Jones davidher...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Thu, July 22, 2010 3:11:46 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Because simpler is not better if it is less predictive. On Thu, Jul 22, 2010 at 1:21 PM, Abram Demski abramdem...@gmail.com wrote: Jim, Why more predictive *and then* simpler? --Abram On Thu, Jul 22, 2010 at 11:49 AM, David Jones davidher...@gmail.com wrote: An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've come to the conclusion lately that the best hypothesis is better because it is more predictive and then simpler than other hypotheses (in that order more predictive... then simpler). But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems
Re: [agi] Re: Huge Progress on the Core of AGI
Abram, I should also mention that I ran into problems mainly because I was having a hard time deciding how to identify objects and determine what is really going on in a scene. This adds a whole other layer of complexity to hypotheses. It's not just about what is more predictive of the observations, it is about deciding what exactly you are observing in the first place. (although you might say its the same problem). I ran into this problem when my algorithm finds matches between items that are not the same. Or it may not find any matches between items that are the same, but have changed. So, how do you decide whether it is 1) the same object, 2) a different object or 3) the same object but it has changed. And how do you decide its relationship to something else... is it 1) dependently attached 2) semi-dependently attached(can move independently, but only in certain ways. Yet also moves dependently) 3) independent 4) sometimes dependent 5) was dependent, but no longer is, 6) was dependent on something else, but then was independent, but now is dependent on something new. These hypotheses are different ways of explaining the same observations, but are complicated by the fact that we aren't sure of the identity of the objects we are observing in the first place. Multiple hypotheses may fit the same observations, and its hard to decide why one is simpler or better than the other. The object you were observing at first may have disappeared. A new object may have appeared at the same time (this is why screenshots are a bit malicious). Or the object you were observing may have changed. In screenshots, sometimes the objects that you are trying to identify as different never appear at the same time because they always completely occlude each other. So, that can make it extremely difficult to decide whether they are the same object that has changed or different objects. Such ambiguities are common in AGI. It is unclear to me yet how to deal with them effectively, although I am continuing to work hard on it. I know its a bit of a mess, but I'm just trying to demonstrate the trouble I've run into. I hope that makes it more clear why I'm having so much trouble finding a way of determining what hypothesis is most predictive and simplest. Dave On Thu, Jul 22, 2010 at 10:23 PM, Abram Demski abramdem...@gmail.comwrote: David, What are the different ways you are thinking of for measuring the predictiveness? I can think of a few different possibilities (such as measuring number incorrect vs measuring fraction incorrect, et cetera) but I'm wondering which variations you consider significant/troublesome/etc. --Abram On Thu, Jul 22, 2010 at 7:12 PM, David Jones davidher...@gmail.comwrote: It's certainly not as simple as you claim. First, assigning a probability is not always possible, nor is it easy. The factors in calculating that probability are unknown and are not the same for every instance. Since we do not know what combination of observations we will see, we cannot have a predefined set of probabilities, nor is it any easier to create a probability function that generates them for us. That is just as exactly what I meant by quantitatively define the predictiveness... it would be proportional to the probability. Second, if you can define a program ina way that is always simpler when it is smaller, then you can do the same thing without a program. I don't think it makes any sense to do it this way. It is not that simple. If it was, we could solve a large portion of agi easily. On Thu, Jul 22, 2010 at 3:16 PM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. It isn't hard. To measure predictiveness, you assign a probability to each possible outcome. If the actual outcome has probability p, you score a penalty of log(1/p) bits. To measure simplicity, use the compressed size of the code for your prediction algorithm. Then add the two scores together. That's how it is done in the Calgary challenge http://www.mailcom.com/challenge/ and in my own text compression benchmark. -- Matt Mahoney, matmaho...@yahoo.com *From:* David Jones davidher...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Thu, July 22, 2010 3:11:46 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Because simpler is not better if it is less predictive. On Thu, Jul 22, 2010 at 1:21 PM, Abram Demski abramdem...@gmail.com wrote: Jim, Why more predictive *and then* simpler? --Abram On Thu, Jul 22, 2010 at 11:49 AM, David Jones davidher...@gmail.com wrote: An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've
Re: [agi] Re: Huge Progress on the Core of AGI
David Jones wrote: I should also mention that I ran into problems mainly because I was having a hard time deciding how to identify objects and determine what is really going on in a scene. I think that your approach makes the problem harder than it needs to be (not that it is easy). Natural language processing is hard, so researchers in an attempt to break down the task into simpler parts, focused on steps like lexical analysis, parsing, part of speech resolution, and semantic analysis. While these problems went unsolved, Google went directly to a solution by skipping them. Likewise, parsing an image into physically separate objects and then building a 3-D model makes the problem harder, not easier. Again, look at the whole picture. You input an image and output a response. Let the system figure out which features are important. If your goal is to count basketball passes, then it is irrelevant whether the AGI recognizes that somebody is wearing a gorilla suit. -- Matt Mahoney, matmaho...@yahoo.com From: David Jones davidher...@gmail.com To: agi agi@v2.listbox.com Sent: Sat, July 24, 2010 2:25:49 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI Abram, I should also mention that I ran into problems mainly because I was having a hard time deciding how to identify objects and determine what is really going on in a scene. This adds a whole other layer of complexity to hypotheses. It's not just about what is more predictive of the observations, it is about deciding what exactly you are observing in the first place. (although you might say its the same problem). I ran into this problem when my algorithm finds matches between items that are not the same. Or it may not find any matches between items that are the same, but have changed. So, how do you decide whether it is 1) the same object, 2) a different object or 3) the same object but it has changed. And how do you decide its relationship to something else... is it 1) dependently attached 2) semi-dependently attached(can move independently, but only in certain ways. Yet also moves dependently) 3) independent 4) sometimes dependent 5) was dependent, but no longer is, 6) was dependent on something else, but then was independent, but now is dependent on something new. These hypotheses are different ways of explaining the same observations, but are complicated by the fact that we aren't sure of the identity of the objects we are observing in the first place. Multiple hypotheses may fit the same observations, and its hard to decide why one is simpler or better than the other. The object you were observing at first may have disappeared. A new object may have appeared at the same time (this is why screenshots are a bit malicious). Or the object you were observing may have changed. In screenshots, sometimes the objects that you are trying to identify as different never appear at the same time because they always completely occlude each other. So, that can make it extremely difficult to decide whether they are the same object that has changed or different objects. Such ambiguities are common in AGI. It is unclear to me yet how to deal with them effectively, although I am continuing to work hard on it. I know its a bit of a mess, but I'm just trying to demonstrate the trouble I've run into. I hope that makes it more clear why I'm having so much trouble finding a way of determining what hypothesis is most predictive and simplest. Dave On Thu, Jul 22, 2010 at 10:23 PM, Abram Demski abramdem...@gmail.com wrote: David, What are the different ways you are thinking of for measuring the predictiveness? I can think of a few different possibilities (such as measuring number incorrect vs measuring fraction incorrect, et cetera) but I'm wondering which variations you consider significant/troublesome/etc. --Abram On Thu, Jul 22, 2010 at 7:12 PM, David Jones davidher...@gmail.com wrote: It's certainly not as simple as you claim. First, assigning a probability is not always possible, nor is it easy. The factors in calculating that probability are unknown and are not the same for every instance. Since we do not know what combination of observations we will see, we cannot have a predefined set of probabilities, nor is it any easier to create a probability function that generates them for us. That is just as exactly what I meant by quantitatively define the predictiveness... it would be proportional to the probability. Second, if you can define a program ina way that is always simpler when it is smaller, then you can do the same thing without a program. I don't think it makes any sense to do it this way. It is not that simple. If it was, we could solve a large portion of agi easily. On Thu, Jul 22, 2010 at 3:16 PM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: But, I am amazed at how difficult it is to quantitatively define more
Re: [agi] Re: Huge Progress on the Core of AGI
Huh, Matt? What examples of this holistic scene analysis are there (or are you thinking about)? From: Matt Mahoney Sent: Saturday, July 24, 2010 10:25 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI David Jones wrote: I should also mention that I ran into problems mainly because I was having a hard time deciding how to identify objects and determine what is really going on in a scene. I think that your approach makes the problem harder than it needs to be (not that it is easy). Natural language processing is hard, so researchers in an attempt to break down the task into simpler parts, focused on steps like lexical analysis, parsing, part of speech resolution, and semantic analysis. While these problems went unsolved, Google went directly to a solution by skipping them. Likewise, parsing an image into physically separate objects and then building a 3-D model makes the problem harder, not easier. Again, look at the whole picture. You input an image and output a response. Let the system figure out which features are important. If your goal is to count basketball passes, then it is irrelevant whether the AGI recognizes that somebody is wearing a gorilla suit. --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Mike Tintner wrote: Huh, Matt? What examples of this holistic scene analysis are there (or are you thinking about)? I mean a neural model with increasingly complex features, as opposed to an algorithmic 3-D model (like video game graphics in reverse). Of course David rejects such ideas ( http://practicalai.org/Prize/Default.aspx ) even though the one proven working vision model uses it. -- Matt Mahoney, matmaho...@yahoo.com From: Mike Tintner tint...@blueyonder.co.uk To: agi agi@v2.listbox.com Sent: Sat, July 24, 2010 6:16:07 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI Huh, Matt? What examples of this holistic scene analysis are there (or are you thinking about)? From: Matt Mahoney Sent: Saturday, July 24, 2010 10:25 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI David Jones wrote: I should also mention that I ran into problems mainly because I was having a hard time deciding how to identify objects and determine what is really going on in a scene. I think that your approach makes the problem harder than it needs to be (not that it is easy). Natural language processing is hard, so researchers in an attempt to break down the task into simpler parts, focused on steps like lexical analysis, parsing, part of speech resolution, and semantic analysis. While these problems went unsolved, Google went directly to a solution by skipping them. Likewise, parsing an image into physically separate objects and then building a 3-D model makes the problem harder, not easier. Again, look at the whole picture. You input an image and output a response. Let the system figure out which features are important. If your goal is to count basketball passes, then it is irrelevant whether the AGI recognizes that somebody is wearing a gorilla suit. agi | Archives | Modify Your Subscription --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Matt, Any method must deal with similar, if not the same, ambiguities. You need to show how neural nets solve this problem or how they solve agi goals while completely skipping the problem. Until then, it is not a successful method. Dave On Jul 24, 2010 7:18 PM, Matt Mahoney matmaho...@yahoo.com wrote: Mike Tintner wrote: Huh, Matt? What examples of this holistic scene analysis are there (or are y... I mean a neural model with increasingly complex features, as opposed to an algorithmic 3-D model (like video game graphics in reverse). Of course David rejects such ideas ( http://practicalai.org/Prize/Default.aspx ) even though the one proven working vision model uses it. -- Matt Mahoney, matmaho...@yahoo.com -- *From:* Mike Tintner tint...@blueyonder.co.uk To: agi agi@v2.listbox.com *Sent:* Sat, July 24, 2010 6:16:07 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI Huh, Matt? What examples of this holistic scene analysis are there (or are you thinking about)? ... *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Matt: I mean a neural model with increasingly complex features, as opposed to an algorithmic 3-D model (like video game graphics in reverse). Of course David rejects such ideas ( http://practicalai.org/Prize/Default.aspx ) even though the one proven working vision model uses it. Which is? and does what? (I'm starting to consider that vision and visual perception - or perhaps one should say common sense, since no sense in humans works independent of the others - may well be considerably *more* complex than language. The evolutionary time required to develop our common sense perception and conception of the world was vastly greater than that required to develop language. And we are as a culture merely in our babbling infancy in beginning to understand how sensory images work and are processed). --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Mike Tintner wrote: Which is? The one right behind your eyes. -- Matt Mahoney, matmaho...@yahoo.com From: Mike Tintner tint...@blueyonder.co.uk To: agi agi@v2.listbox.com Sent: Sat, July 24, 2010 9:00:42 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI Matt: I mean a neural model with increasingly complex features, as opposed to an algorithmic 3-D model (like video game graphics in reverse). Of course David rejects such ideas ( http://practicalai.org/Prize/Default.aspx ) even though the one proven working vision model uses it. Which is? and does what? (I'm starting to consider that vision and visual perception - or perhaps one should say common sense, since no sense in humans works independent of the others - may well be considerably *more* complex than language. The evolutionary time required to develop our common sense perception and conception of the world was vastly greater than that required to develop language. And we are as a culture merely in our babbling infancy in beginning to understand how sensory images work and are processed). agi | Archives | Modify Your Subscription --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Check this out! The title Space and time, not surface features, guide object persistence says it all. http://pbr.psychonomic-journals.org/content/14/6/1199.full.pdf Over just the last couple days I have begun to realize that they are so right. My idea before of using high frame rates is also spot on. The brain does not use features as much as we think. First we construct a model of the object, then we probably decide what features to index it with for future search. If we know that the object occurs at a particular location in space, then we can learn a great deal about it with very little ambiguity! Of course, processing images at all is hard, but that's besides the point... The point is that we can automatically learn about the world using high frame rates and a simple heuristic for identifying specific objects in a scene. Because we can reliably identify them, we can learn an extremely large amount in a very short period of time. We can learn about how lighting affects the colors, noise, size, shape, components, attachment relationships, etc. etc. So, it is very likely that screenshots are not simpler than real images! lol. The objects in real images usually don't change as much, as drastically or as quickly as the objects in screenshots. That means that we can use the simple heuristics of size, shape, location and continuity of time to match objects and learn about them. Dave On Sat, Jul 24, 2010 at 9:10 PM, Matt Mahoney matmaho...@yahoo.com wrote: Mike Tintner wrote: Which is? The one right behind your eyes. -- Matt Mahoney, matmaho...@yahoo.com -- *From:* Mike Tintner tint...@blueyonder.co.uk *To:* agi agi@v2.listbox.com *Sent:* Sat, July 24, 2010 9:00:42 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Matt: I mean a neural model with increasingly complex features, as opposed to an algorithmic 3-D model (like video game graphics in reverse). Of course David rejects such ideas ( http://practicalai.org/Prize/Default.aspx ) even though the one proven working vision model uses it. Which is? and does what? (I'm starting to consider that vision and visual perception - or perhaps one should say common sense, since no sense in humans works independent of the others - may well be considerably *more* complex than language. The evolutionary time required to develop our common sense perception and conception of the world was vastly greater than that required to develop language. And we are as a culture merely in our babbling infancy in beginning to understand how sensory images work and are processed). *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
This is absolutely incredible. The answer was right there in the last paragraph: The present experiments suggest that the computation of object persistence appears to rely so heavily upon spatiotemporal information that it will not (or at least is unlikely to) use otherwise available surface feature information, particularly when there is conflicting spatiotemporal information. This reveals a striking limitation, given various theories that visual perception uses whatever shortcuts, or heuristics, it can to simplify processing, as well as the theory that perception evolves out of a buildup of the statistical nature of our environment (e.g., Purves Lotto, 2003). Instead, it appears that the object file system has “tunnel vision” and turns a blind eye to surface feature information, focusing on spatiotemporal information when computing persistence. So much for Matt's claim that the brain uses hierarchical features LOL Dave On Sat, Jul 24, 2010 at 11:52 PM, David Jones davidher...@gmail.com wrote: Check this out! The title Space and time, not surface features, guide object persistence says it all. http://pbr.psychonomic-journals.org/content/14/6/1199.full.pdf Over just the last couple days I have begun to realize that they are so right. My idea before of using high frame rates is also spot on. The brain does not use features as much as we think. First we construct a model of the object, then we probably decide what features to index it with for future search. If we know that the object occurs at a particular location in space, then we can learn a great deal about it with very little ambiguity! Of course, processing images at all is hard, but that's besides the point... The point is that we can automatically learn about the world using high frame rates and a simple heuristic for identifying specific objects in a scene. Because we can reliably identify them, we can learn an extremely large amount in a very short period of time. We can learn about how lighting affects the colors, noise, size, shape, components, attachment relationships, etc. etc. So, it is very likely that screenshots are not simpler than real images! lol. The objects in real images usually don't change as much, as drastically or as quickly as the objects in screenshots. That means that we can use the simple heuristics of size, shape, location and continuity of time to match objects and learn about them. Dave On Sat, Jul 24, 2010 at 9:10 PM, Matt Mahoney matmaho...@yahoo.comwrote: Mike Tintner wrote: Which is? The one right behind your eyes. -- Matt Mahoney, matmaho...@yahoo.com -- *From:* Mike Tintner tint...@blueyonder.co.uk *To:* agi agi@v2.listbox.com *Sent:* Sat, July 24, 2010 9:00:42 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Matt: I mean a neural model with increasingly complex features, as opposed to an algorithmic 3-D model (like video game graphics in reverse). Of course David rejects such ideas ( http://practicalai.org/Prize/Default.aspx ) even though the one proven working vision model uses it. Which is? and does what? (I'm starting to consider that vision and visual perception - or perhaps one should say common sense, since no sense in humans works independent of the others - may well be considerably *more* complex than language. The evolutionary time required to develop our common sense perception and conception of the world was vastly greater than that required to develop language. And we are as a culture merely in our babbling infancy in beginning to understand how sensory images work and are processed). *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Jim, Why more predictive *and then* simpler? --Abram On Thu, Jul 22, 2010 at 11:49 AM, David Jones davidher...@gmail.com wrote: An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've come to the conclusion lately that the best hypothesis is better because it is more predictive and then simpler than other hypotheses (in that order more predictive... then simpler). But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. This is why I have sometimes doubted the truth of the statement. In addition, the observations that the AI gets are not representative of all observations! This means that if your measure of predictiveness depends on the number of certain observations, it could make mistakes! So, the specific observations you are aware of may be unrepresentative of the predictiveness of a hypothesis relative to the truth. If you try to calculate which hypothesis is more predictive and you don't have the critical observations that would give you the right answer, you may get the wrong answer! This all depends of course on your method of calculation, which is quite elusive to define. Visual input from screenshots, for example, can be somewhat malicious. Things can move, appear, disappear or occlude each other suddenly. So, without sufficient knowledge it is hard to decide whether matches you find between such large changes are because it is the same object or a different object. This may indicate that bias and preprogrammed experience should be introduced to the AI before training. Either that or the training inputs should be carefully chosen to avoid malicious input and to make them nice for learning. This is the correspondence problem that is typical of computer vision and has never been properly solved. Such malicious input also makes it difficult to learn automatically because the AI doesn't have sufficient experience to know which changes or transformations are acceptable and which are not. It is immediately bombarded with malicious inputs. I've also realized that if a hypothesis is more explanatory, it may be better. But quantitatively defining explanatory is also elusive and truly depends on the specific problems you are applying it to because it is a heuristic. It is not a true measure of correctness. It is not loyal to the truth. More explanatory is really a heuristic that helps us find hypothesis that are more predictive. The true measure of whether a hypothesis is better is simply the most accurate and predictive hypothesis. That is the ultimate and true measure of correctness. Also, since we can't measure every possible prediction or every last prediction (and we certainly can't predict everything), our measure of predictiveness can't possibly be right all the time! We have no choice but to use a heuristic of some kind. So, its clear to me that the right hypothesis is more predictive and then simpler. But, it is also clear that there will never be a single measure of this that can be applied to all problems. I hope to eventually find a nice model for how to apply it to different problems though. This may be the reason that so many people have tried and failed to develop general AI. Yes, there is a solution. But there is no silver bullet that can be applied to all problems. Some methods are better than others. But I think another major reason of the failures is that people think they can predict things without sufficient information. By approaching the problem this way, we compound the need for heuristics and the errors they produce because we simply don't have sufficient information to make a good decision with limited evidence. If approached correctly, the right solution would solve many more problems with the same efforts than a poor solution would. It would also eliminate some of the difficulties we currently face if sufficient data is available to learn from. In addition to all this theory about better hypotheses, you have to add on the need to solve problems in reasonable time. This also compounds the difficulty of the problem and the complexity of solutions. I am always fascinated by the extraordinary difficulty and complexity of this problem. The more I learn about it, the more I appreciate it. Dave *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com -- Abram Demski http://lo-tho.blogspot.com/ http://groups.google.com/group/one-logic --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed:
Re: [agi] Re: Huge Progress on the Core of AGI
Predicting the old and predictable [incl in shape and form] is narrow AI. Squaresville. Adapting to the new and unpredictable [incl in shape and form] is AGI. Rock on. From: David Jones Sent: Thursday, July 22, 2010 4:49 PM To: agi Subject: [agi] Re: Huge Progress on the Core of AGI An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've come to the conclusion lately that the best hypothesis is better because it is more predictive and then simpler than other hypotheses (in that order more predictive... then simpler). But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. This is why I have sometimes doubted the truth of the statement. In addition, the observations that the AI gets are not representative of all observations! This means that if your measure of predictiveness depends on the number of certain observations, it could make mistakes! So, the specific observations you are aware of may be unrepresentative of the predictiveness of a hypothesis relative to the truth. If you try to calculate which hypothesis is more predictive and you don't have the critical observations that would give you the right answer, you may get the wrong answer! This all depends of course on your method of calculation, which is quite elusive to define. Visual input from screenshots, for example, can be somewhat malicious. Things can move, appear, disappear or occlude each other suddenly. So, without sufficient knowledge it is hard to decide whether matches you find between such large changes are because it is the same object or a different object. This may indicate that bias and preprogrammed experience should be introduced to the AI before training. Either that or the training inputs should be carefully chosen to avoid malicious input and to make them nice for learning. This is the correspondence problem that is typical of computer vision and has never been properly solved. Such malicious input also makes it difficult to learn automatically because the AI doesn't have sufficient experience to know which changes or transformations are acceptable and which are not. It is immediately bombarded with malicious inputs. I've also realized that if a hypothesis is more explanatory, it may be better. But quantitatively defining explanatory is also elusive and truly depends on the specific problems you are applying it to because it is a heuristic. It is not a true measure of correctness. It is not loyal to the truth. More explanatory is really a heuristic that helps us find hypothesis that are more predictive. The true measure of whether a hypothesis is better is simply the most accurate and predictive hypothesis. That is the ultimate and true measure of correctness. Also, since we can't measure every possible prediction or every last prediction (and we certainly can't predict everything), our measure of predictiveness can't possibly be right all the time! We have no choice but to use a heuristic of some kind. So, its clear to me that the right hypothesis is more predictive and then simpler. But, it is also clear that there will never be a single measure of this that can be applied to all problems. I hope to eventually find a nice model for how to apply it to different problems though. This may be the reason that so many people have tried and failed to develop general AI. Yes, there is a solution. But there is no silver bullet that can be applied to all problems. Some methods are better than others. But I think another major reason of the failures is that people think they can predict things without sufficient information. By approaching the problem this way, we compound the need for heuristics and the errors they produce because we simply don't have sufficient information to make a good decision with limited evidence. If approached correctly, the right solution would solve many more problems with the same efforts than a poor solution would. It would also eliminate some of the difficulties we currently face if sufficient data is available to learn from. In addition to all this theory about better hypotheses, you have to add on the need to solve problems in reasonable time. This also compounds the difficulty of the problem and the complexity of solutions. I am always fascinated by the extraordinary difficulty and complexity of this problem. The more I learn about it, the more I appreciate it. Dave agi | Archives | Modify Your Subscription --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Because simpler is not better if it is less predictive. On Thu, Jul 22, 2010 at 1:21 PM, Abram Demski abramdem...@gmail.com wrote: Jim, Why more predictive *and then* simpler? --Abram On Thu, Jul 22, 2010 at 11:49 AM, David Jones davidher...@gmail.comwrote: An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've come to the conclusion lately that the best hypothesis is better because it is more predictive and then simpler than other hypotheses (in that order more predictive... then simpler). But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. This is why I have sometimes doubted the truth of the statement. In addition, the observations that the AI gets are not representative of all observations! This means that if your measure of predictiveness depends on the number of certain observations, it could make mistakes! So, the specific observations you are aware of may be unrepresentative of the predictiveness of a hypothesis relative to the truth. If you try to calculate which hypothesis is more predictive and you don't have the critical observations that would give you the right answer, you may get the wrong answer! This all depends of course on your method of calculation, which is quite elusive to define. Visual input from screenshots, for example, can be somewhat malicious. Things can move, appear, disappear or occlude each other suddenly. So, without sufficient knowledge it is hard to decide whether matches you find between such large changes are because it is the same object or a different object. This may indicate that bias and preprogrammed experience should be introduced to the AI before training. Either that or the training inputs should be carefully chosen to avoid malicious input and to make them nice for learning. This is the correspondence problem that is typical of computer vision and has never been properly solved. Such malicious input also makes it difficult to learn automatically because the AI doesn't have sufficient experience to know which changes or transformations are acceptable and which are not. It is immediately bombarded with malicious inputs. I've also realized that if a hypothesis is more explanatory, it may be better. But quantitatively defining explanatory is also elusive and truly depends on the specific problems you are applying it to because it is a heuristic. It is not a true measure of correctness. It is not loyal to the truth. More explanatory is really a heuristic that helps us find hypothesis that are more predictive. The true measure of whether a hypothesis is better is simply the most accurate and predictive hypothesis. That is the ultimate and true measure of correctness. Also, since we can't measure every possible prediction or every last prediction (and we certainly can't predict everything), our measure of predictiveness can't possibly be right all the time! We have no choice but to use a heuristic of some kind. So, its clear to me that the right hypothesis is more predictive and then simpler. But, it is also clear that there will never be a single measure of this that can be applied to all problems. I hope to eventually find a nice model for how to apply it to different problems though. This may be the reason that so many people have tried and failed to develop general AI. Yes, there is a solution. But there is no silver bullet that can be applied to all problems. Some methods are better than others. But I think another major reason of the failures is that people think they can predict things without sufficient information. By approaching the problem this way, we compound the need for heuristics and the errors they produce because we simply don't have sufficient information to make a good decision with limited evidence. If approached correctly, the right solution would solve many more problems with the same efforts than a poor solution would. It would also eliminate some of the difficulties we currently face if sufficient data is available to learn from. In addition to all this theory about better hypotheses, you have to add on the need to solve problems in reasonable time. This also compounds the difficulty of the problem and the complexity of solutions. I am always fascinated by the extraordinary difficulty and complexity of this problem. The more I learn about it, the more I appreciate it. Dave *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com -- Abram Demski http://lo-tho.blogspot.com/
Re: [agi] Re: Huge Progress on the Core of AGI
David Jones wrote: But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. It isn't hard. To measure predictiveness, you assign a probability to each possible outcome. If the actual outcome has probability p, you score a penalty of log(1/p) bits. To measure simplicity, use the compressed size of the code for your prediction algorithm. Then add the two scores together. That's how it is done in the Calgary challenge http://www.mailcom.com/challenge/ and in my own text compression benchmark. -- Matt Mahoney, matmaho...@yahoo.com From: David Jones davidher...@gmail.com To: agi agi@v2.listbox.com Sent: Thu, July 22, 2010 3:11:46 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI Because simpler is not better if it is less predictive. On Thu, Jul 22, 2010 at 1:21 PM, Abram Demski abramdem...@gmail.com wrote: Jim, Why more predictive *and then* simpler? --Abram On Thu, Jul 22, 2010 at 11:49 AM, David Jones davidher...@gmail.com wrote: An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've come to the conclusion lately that the best hypothesis is better because it is more predictive and then simpler than other hypotheses (in that order more predictive... then simpler). But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. This is why I have sometimes doubted the truth of the statement. In addition, the observations that the AI gets are not representative of all observations! This means that if your measure of predictiveness depends on the number of certain observations, it could make mistakes! So, the specific observations you are aware of may be unrepresentative of the predictiveness of a hypothesis relative to the truth. If you try to calculate which hypothesis is more predictive and you don't have the critical observations that would give you the right answer, you may get the wrong answer! This all depends of course on your method of calculation, which is quite elusive to define. Visual input from screenshots, for example, can be somewhat malicious. Things can move, appear, disappear or occlude each other suddenly. So, without sufficient knowledge it is hard to decide whether matches you find between such large changes are because it is the same object or a different object. This may indicate that bias and preprogrammed experience should be introduced to the AI before training. Either that or the training inputs should be carefully chosen to avoid malicious input and to make them nice for learning. This is the correspondence problem that is typical of computer vision and has never been properly solved. Such malicious input also makes it difficult to learn automatically because the AI doesn't have sufficient experience to know which changes or transformations are acceptable and which are not. It is immediately bombarded with malicious inputs. I've also realized that if a hypothesis is more explanatory, it may be better. But quantitatively defining explanatory is also elusive and truly depends on the specific problems you are applying it to because it is a heuristic. It is not a true measure of correctness. It is not loyal to the truth. More explanatory is really a heuristic that helps us find hypothesis that are more predictive. The true measure of whether a hypothesis is better is simply the most accurate and predictive hypothesis. That is the ultimate and true measure of correctness. Also, since we can't measure every possible prediction or every last prediction (and we certainly can't predict everything), our measure of predictiveness can't possibly be right all the time! We have no choice but to use a heuristic of some kind. So, its clear to me that the right hypothesis is more predictive and then simpler. But, it is also clear that there will never be a single measure of this that can be applied to all problems. I hope to eventually find a nice model for how to apply it to different problems though. This may be the reason that so many people have tried and failed to develop general AI. Yes, there is a solution. But there is no silver bullet that can be applied to all problems. Some methods are better than others. But I think another major reason of the failures is that people think they can predict things without sufficient information. By approaching the problem this way, we compound the need for heuristics and the errors they produce because we simply don't have sufficient information to make a good decision with limited evidence. If approached correctly, the right solution would solve many more
Re: [agi] Re: Huge Progress on the Core of AGI
It's certainly not as simple as you claim. First, assigning a probability is not always possible, nor is it easy. The factors in calculating that probability are unknown and are not the same for every instance. Since we do not know what combination of observations we will see, we cannot have a predefined set of probabilities, nor is it any easier to create a probability function that generates them for us. That is just as exactly what I meant by quantitatively define the predictiveness... it would be proportional to the probability. Second, if you can define a program ina way that is always simpler when it is smaller, then you can do the same thing without a program. I don't think it makes any sense to do it this way. It is not that simple. If it was, we could solve a large portion of agi easily. On Thu, Jul 22, 2010 at 3:16 PM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. It isn't hard. To measure predictiveness, you assign a probability to each possible outcome. If the actual outcome has probability p, you score a penalty of log(1/p) bits. To measure simplicity, use the compressed size of the code for your prediction algorithm. Then add the two scores together. That's how it is done in the Calgary challenge http://www.mailcom.com/challenge/ and in my own text compression benchmark. -- Matt Mahoney, matmaho...@yahoo.com *From:* David Jones davidher...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Thu, July 22, 2010 3:11:46 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Because simpler is not better if it is less predictive. On Thu, Jul 22, 2010 at 1:21 PM, Abram Demski abramdem...@gmail.com wrote: Jim, Why more predictive *and then* simpler? --Abram On Thu, Jul 22, 2010 at 11:49 AM, David Jones davidher...@gmail.com wrote: An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've come to the conclusion lately that the best hypothesis is better because it is more predictive and then simpler than other hypotheses (in that order more predictive... then simpler). But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. This is why I have sometimes doubted the truth of the statement. In addition, the observations that the AI gets are not representative of all observations! This means that if your measure of predictiveness depends on the number of certain observations, it could make mistakes! So, the specific observations you are aware of may be unrepresentative of the predictiveness of a hypothesis relative to the truth. If you try to calculate which hypothesis is more predictive and you don't have the critical observations that would give you the right answer, you may get the wrong answer! This all depends of course on your method of calculation, which is quite elusive to define. Visual input from screenshots, for example, can be somewhat malicious. Things can move, appear, disappear or occlude each other suddenly. So, without sufficient knowledge it is hard to decide whether matches you find between such large changes are because it is the same object or a different object. This may indicate that bias and preprogrammed experience should be introduced to the AI before training. Either that or the training inputs should be carefully chosen to avoid malicious input and to make them nice for learning. This is the correspondence problem that is typical of computer vision and has never been properly solved. Such malicious input also makes it difficult to learn automatically because the AI doesn't have sufficient experience to know which changes or transformations are acceptable and which are not. It is immediately bombarded with malicious inputs. I've also realized that if a hypothesis is more explanatory, it may be better. But quantitatively defining explanatory is also elusive and truly depends on the specific problems you are applying it to because it is a heuristic. It is not a true measure of correctness. It is not loyal to the truth. More explanatory is really a heuristic that helps us find hypothesis that are more predictive. The true measure of whether a hypothesis is better is simply the most accurate and predictive hypothesis. That is the ultimate and true measure of correctness. Also, since we can't measure every possible prediction or every last prediction (and we certainly can't predict everything), our measure of predictiveness can't possibly be right all the time! We have no choice but to use a heuristic of some kind. So, its clear to me that the right hypothesis is more predictive and then simpler. But, it is also clear
Re: [agi] Re: Huge Progress on the Core of AGI
ps-- Sorry for accidentally calling you Jim! On Thu, Jul 22, 2010 at 1:21 PM, Abram Demski abramdem...@gmail.com wrote: Jim, Why more predictive *and then* simpler? --Abram On Thu, Jul 22, 2010 at 11:49 AM, David Jones davidher...@gmail.comwrote: An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've come to the conclusion lately that the best hypothesis is better because it is more predictive and then simpler than other hypotheses (in that order more predictive... then simpler). But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. This is why I have sometimes doubted the truth of the statement. In addition, the observations that the AI gets are not representative of all observations! This means that if your measure of predictiveness depends on the number of certain observations, it could make mistakes! So, the specific observations you are aware of may be unrepresentative of the predictiveness of a hypothesis relative to the truth. If you try to calculate which hypothesis is more predictive and you don't have the critical observations that would give you the right answer, you may get the wrong answer! This all depends of course on your method of calculation, which is quite elusive to define. Visual input from screenshots, for example, can be somewhat malicious. Things can move, appear, disappear or occlude each other suddenly. So, without sufficient knowledge it is hard to decide whether matches you find between such large changes are because it is the same object or a different object. This may indicate that bias and preprogrammed experience should be introduced to the AI before training. Either that or the training inputs should be carefully chosen to avoid malicious input and to make them nice for learning. This is the correspondence problem that is typical of computer vision and has never been properly solved. Such malicious input also makes it difficult to learn automatically because the AI doesn't have sufficient experience to know which changes or transformations are acceptable and which are not. It is immediately bombarded with malicious inputs. I've also realized that if a hypothesis is more explanatory, it may be better. But quantitatively defining explanatory is also elusive and truly depends on the specific problems you are applying it to because it is a heuristic. It is not a true measure of correctness. It is not loyal to the truth. More explanatory is really a heuristic that helps us find hypothesis that are more predictive. The true measure of whether a hypothesis is better is simply the most accurate and predictive hypothesis. That is the ultimate and true measure of correctness. Also, since we can't measure every possible prediction or every last prediction (and we certainly can't predict everything), our measure of predictiveness can't possibly be right all the time! We have no choice but to use a heuristic of some kind. So, its clear to me that the right hypothesis is more predictive and then simpler. But, it is also clear that there will never be a single measure of this that can be applied to all problems. I hope to eventually find a nice model for how to apply it to different problems though. This may be the reason that so many people have tried and failed to develop general AI. Yes, there is a solution. But there is no silver bullet that can be applied to all problems. Some methods are better than others. But I think another major reason of the failures is that people think they can predict things without sufficient information. By approaching the problem this way, we compound the need for heuristics and the errors they produce because we simply don't have sufficient information to make a good decision with limited evidence. If approached correctly, the right solution would solve many more problems with the same efforts than a poor solution would. It would also eliminate some of the difficulties we currently face if sufficient data is available to learn from. In addition to all this theory about better hypotheses, you have to add on the need to solve problems in reasonable time. This also compounds the difficulty of the problem and the complexity of solutions. I am always fascinated by the extraordinary difficulty and complexity of this problem. The more I learn about it, the more I appreciate it. Dave *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com -- Abram Demski http://lo-tho.blogspot.com/ http://groups.google.com/group/one-logic
Re: [agi] Re: Huge Progress on the Core of AGI
David, What are the different ways you are thinking of for measuring the predictiveness? I can think of a few different possibilities (such as measuring number incorrect vs measuring fraction incorrect, et cetera) but I'm wondering which variations you consider significant/troublesome/etc. --Abram On Thu, Jul 22, 2010 at 7:12 PM, David Jones davidher...@gmail.com wrote: It's certainly not as simple as you claim. First, assigning a probability is not always possible, nor is it easy. The factors in calculating that probability are unknown and are not the same for every instance. Since we do not know what combination of observations we will see, we cannot have a predefined set of probabilities, nor is it any easier to create a probability function that generates them for us. That is just as exactly what I meant by quantitatively define the predictiveness... it would be proportional to the probability. Second, if you can define a program ina way that is always simpler when it is smaller, then you can do the same thing without a program. I don't think it makes any sense to do it this way. It is not that simple. If it was, we could solve a large portion of agi easily. On Thu, Jul 22, 2010 at 3:16 PM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. It isn't hard. To measure predictiveness, you assign a probability to each possible outcome. If the actual outcome has probability p, you score a penalty of log(1/p) bits. To measure simplicity, use the compressed size of the code for your prediction algorithm. Then add the two scores together. That's how it is done in the Calgary challenge http://www.mailcom.com/challenge/ and in my own text compression benchmark. -- Matt Mahoney, matmaho...@yahoo.com *From:* David Jones davidher...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Thu, July 22, 2010 3:11:46 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Because simpler is not better if it is less predictive. On Thu, Jul 22, 2010 at 1:21 PM, Abram Demski abramdem...@gmail.com wrote: Jim, Why more predictive *and then* simpler? --Abram On Thu, Jul 22, 2010 at 11:49 AM, David Jones davidher...@gmail.com wrote: An Update I think the following gets to the heart of general AI and what it takes to achieve it. It also provides us with evidence as to why general AI is so difficult. With this new knowledge in mind, I think I will be much more capable now of solving the problems and making it work. I've come to the conclusion lately that the best hypothesis is better because it is more predictive and then simpler than other hypotheses (in that order more predictive... then simpler). But, I am amazed at how difficult it is to quantitatively define more predictive and simpler for specific problems. This is why I have sometimes doubted the truth of the statement. In addition, the observations that the AI gets are not representative of all observations! This means that if your measure of predictiveness depends on the number of certain observations, it could make mistakes! So, the specific observations you are aware of may be unrepresentative of the predictiveness of a hypothesis relative to the truth. If you try to calculate which hypothesis is more predictive and you don't have the critical observations that would give you the right answer, you may get the wrong answer! This all depends of course on your method of calculation, which is quite elusive to define. Visual input from screenshots, for example, can be somewhat malicious. Things can move, appear, disappear or occlude each other suddenly. So, without sufficient knowledge it is hard to decide whether matches you find between such large changes are because it is the same object or a different object. This may indicate that bias and preprogrammed experience should be introduced to the AI before training. Either that or the training inputs should be carefully chosen to avoid malicious input and to make them nice for learning. This is the correspondence problem that is typical of computer vision and has never been properly solved. Such malicious input also makes it difficult to learn automatically because the AI doesn't have sufficient experience to know which changes or transformations are acceptable and which are not. It is immediately bombarded with malicious inputs. I've also realized that if a hypothesis is more explanatory, it may be better. But quantitatively defining explanatory is also elusive and truly depends on the specific problems you are applying it to because it is a heuristic. It is not a true measure of correctness. It is not loyal to the truth. More explanatory is really a heuristic that helps us find hypothesis that are more predictive. The true measure of whether a hypothesis is better is simply the most
Re: [agi] Re: Huge Progress on the Core of AGI
PS-- I am not denying that statistics is applied probability theory. :) When I say they are different, what I mean is that saying I'm going to use probability theory and I'm going to use statistics tend to indicate very different approaches. Probability is a set of axioms, whereas statistics is a set of methods. The probability theory camp tends to be bayesian, whereas the stats camp tends to be frequentist. Your complaint that probability theory doesn't try to figure out why it was wrong in the 30% (or whatever) it misses is a common objection. Probability theory glosses over important detail, it encourages lazy thinking, etc. However, this all depends on the space of hypotheses being examined. Statistical methods will be prone to this objection because they are essentially narrow-AI methods: they don't *try* to search in the space of all hypotheses a human might consider. An AGI setup can and should have such a large hypothesis space. Note that AIXI is typically formulated as using a space of crisp (non-probabilistic) hypotheses, though probability theory is used to reason about them. This means no theory it considers will gloss over detail in this way: every theory completely explains the data. (I use AIXI as a convenient example, not because I agree with it.) --Abram On Mon, Jul 12, 2010 at 2:42 PM, Abram Demski abramdem...@gmail.com wrote: David, I tend to think of probability theory and statistics as different things. I'd agree that statistics is not enough for AGI, but in contrast I think probability theory is a pretty good foundation. Bayesianism to me provides a sound way of integrating the elegance/utility tradeoff of explanation-based reasoning into the basic fabric of the uncertainty calculus. Others advocate different sorts of uncertainty than probabilities, but so far what I've seen indicates more a lack of ability to apply probability theory than a need for a new type of uncertainty. What other methods do you favor for dealing with these things? --Abram On Sun, Jul 11, 2010 at 12:30 PM, David Jones davidher...@gmail.comwrote: Thanks Abram, I know that probability is one approach. But there are many problems with using it in actual implementations. I know a lot of people will be angered by that statement and retort with all the successes that they have had using probability. But, the truth is that you can solve the problems many ways and every way has its pros and cons. I personally believe that probability has unacceptable cons if used all by itself. It must only be used when it is the best tool for the task. I do plan to use some probability within my approach. But only when it makes sense to do so. I do not believe in completely statistical solutions or completely Bayesian machine learning alone. A good example of when I might use it is when a particular hypothesis predicts something with 70% accuracy, well it may be better than any other hypothesis we can come up with so far. So, we may use that hypothesis. But, the 30% unexplained errors should be explained if possible with the resources and algorithms available, if at all possible. This is where my method differs from statistical methods. I want to build algorithms that resolve the 30% and explain it. For many problems, there are rules and knowledge that will solve them effectively. Probability should only be used when you cannot find a more accurate solution. Basically we should use probability when we don't know the factors involved, can't find any rules to explain the phenomena or we don't have the time and resources to figure it out. So you must simply guess at the most probable event without any rules for figuring out which event is more applicable under the current circumstances. So, in summary, probability definitely has its place. I just think that explanatory reasoning and other more accurate methods should be preferred whenever possible. Regarding learning the knowledge being the bigger problem, I completely agree. That is why I think it is so important to develop machine learning that can learn by direct observation of the environment. Without that, it is practically impossible to gather the knowledge required for AGI-type applications. We can learn this knowledge by analyzing the world automatically and generally through video. My step by step approach for learning and then applying the knowledge for agi is as follows: 1) Understand and learn about the environment(through Computer Vision for now and other sensory perceptions in the future) 2) learn about your own actions and how they affect the environment 3) learn about language and how it is associated with or related to the environment. 4) learn goals from language(such as through dedicated inputs). 5) Goal pursuit 6) Other Miscellaneous capabilities as needed Dave On Sat, Jul 10, 2010 at 8:40 PM, Abram Demski abramdem...@gmail.comwrote: David, Sorry for the slow response. I agree
Re: [agi] Re: Huge Progress on the Core of AGI
On Tue, Jul 13, 2010 at 2:29 AM, Abram Demski abramdem...@gmail.com wrote: [The]complaint that probability theory doesn't try to figure out why it was wrong in the 30% (or whatever) it misses is a common objection. Probability theory glosses over important detail, it encourages lazy thinking, etc. However, this all depends on the space of hypotheses being examined. Statistical methods will be prone to this objection because they are essentially narrow-AI methods: they don't *try* to search in the space of all hypotheses a human might consider. An AGI setup can and should have such a large hypothesis space. --- That is the thing. We cannot search all possible hypotheses because we could not even write all possible hypotheses down. This is why hypotheses have to be formed creatively in response to an analysis of a situation. In my arrogant opinion, this is best done through a method that creatively uses discreet representations. Of course it can use statistical or probabilistic data in making those creative hypotheses when there is good data to be used. But the best way to do this is through categorization based creativity. But this is an imaginative method, one which creates imaginative explanations (or other co-relations) for observed or conjectured events. Those imaginative hypotheses then have to be compared to a situation through some trial and error methods. Then the tentative conjectures that seem to withstand initial tests have to be further integrated into other hypotheses, conjectures and explanations that are related to the subject of the hypotheses. This process of conceptual integration, a process which has to rely on both creative methods and rational methods, is a fundamental part of the process which does not seem to be clearly understood. Conceptual Integration cannot be accomplished by reducing a concept to True or False or to some number from 0 to 1 and then combined with other concepts that were also so reduced. Ideas take on roles when combined with other ideas. Basically, a new idea has to be fit into a complex of other ideas that are strongly related to it. Jim Bromer On Tue, Jul 13, 2010 at 2:29 AM, Abram Demski abramdem...@gmail.com wrote: PS-- I am not denying that statistics is applied probability theory. :) When I say they are different, what I mean is that saying I'm going to use probability theory and I'm going to use statistics tend to indicate very different approaches. Probability is a set of axioms, whereas statistics is a set of methods. The probability theory camp tends to be bayesian, whereas the stats camp tends to be frequentist. Your complaint that probability theory doesn't try to figure out why it was wrong in the 30% (or whatever) it misses is a common objection. Probability theory glosses over important detail, it encourages lazy thinking, etc. However, this all depends on the space of hypotheses being examined. Statistical methods will be prone to this objection because they are essentially narrow-AI methods: they don't *try* to search in the space of all hypotheses a human might consider. An AGI setup can and should have such a large hypothesis space. Note that AIXI is typically formulated as using a space of crisp (non-probabilistic) hypotheses, though probability theory is used to reason about them. This means no theory it considers will gloss over detail in this way: every theory completely explains the data. (I use AIXI as a convenient example, not because I agree with it.) --Abram On Mon, Jul 12, 2010 at 2:42 PM, Abram Demski abramdem...@gmail.comwrote: David, I tend to think of probability theory and statistics as different things. I'd agree that statistics is not enough for AGI, but in contrast I think probability theory is a pretty good foundation. Bayesianism to me provides a sound way of integrating the elegance/utility tradeoff of explanation-based reasoning into the basic fabric of the uncertainty calculus. Others advocate different sorts of uncertainty than probabilities, but so far what I've seen indicates more a lack of ability to apply probability theory than a need for a new type of uncertainty. What other methods do you favor for dealing with these things? --Abram On Sun, Jul 11, 2010 at 12:30 PM, David Jones davidher...@gmail.comwrote: Thanks Abram, I know that probability is one approach. But there are many problems with using it in actual implementations. I know a lot of people will be angered by that statement and retort with all the successes that they have had using probability. But, the truth is that you can solve the problems many ways and every way has its pros and cons. I personally believe that probability has unacceptable cons if used all by itself. It must only be used when it is the best tool for the task. I do plan to use some probability within my approach. But only when it makes sense to do so. I do not believe in completely
Re: [agi] Re: Huge Progress on the Core of AGI
Abram, Thanks for the clarification Abram. I don't have a single way to deal with uncertainty. I try not to decide on a method ahead of time because what I really want to do is analyze the problems and find a solution. But, at the same time. I have looked at the probabilistic approaches and they don't seem to be sufficient to solve the problem as they are now. So, my inclination is to use methods that don't gloss over important details. For me, the most important way of dealing with uncertainty is through explanatory-type reasoning. But, explanatory reasoning has not been well defined yet. So, the implementation is not yet clear. That's where I am now. I've begun to approach problems as follows. I try to break the problem down and answer the following questions: 1) How do we come up with or construct possible hypotheses. 2) How do we compare hypotheses to determine which is better. 3) How do we lower the uncertainty of hypotheses. 4) How do we determine the likelihood or strength of a single hypothesis all by itself. Is it sufficient on its own? With those questions in mind, the solution seems to be to break possible hypotheses down into pieces that are generally applicable. For example, in image analysis, a particular type of hypothesis might be related to 1) motion or 2) attachment relationships or 3) change or movement behavior of an object, etc. By breaking the possible hypotheses into very general pieces, you can apply them to just about any problem. With that as a tool, you can then develop general methods for resolving uncertainty of such hypotheses using explanatory scoring, consistency, and even statistical analysis. Does that make sense to you? Dave On Tue, Jul 13, 2010 at 2:29 AM, Abram Demski abramdem...@gmail.com wrote: PS-- I am not denying that statistics is applied probability theory. :) When I say they are different, what I mean is that saying I'm going to use probability theory and I'm going to use statistics tend to indicate very different approaches. Probability is a set of axioms, whereas statistics is a set of methods. The probability theory camp tends to be bayesian, whereas the stats camp tends to be frequentist. Your complaint that probability theory doesn't try to figure out why it was wrong in the 30% (or whatever) it misses is a common objection. Probability theory glosses over important detail, it encourages lazy thinking, etc. However, this all depends on the space of hypotheses being examined. Statistical methods will be prone to this objection because they are essentially narrow-AI methods: they don't *try* to search in the space of all hypotheses a human might consider. An AGI setup can and should have such a large hypothesis space. Note that AIXI is typically formulated as using a space of crisp (non-probabilistic) hypotheses, though probability theory is used to reason about them. This means no theory it considers will gloss over detail in this way: every theory completely explains the data. (I use AIXI as a convenient example, not because I agree with it.) --Abram --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
You seem to be reaching for something important here, but it isn't at all clear what you mean. I would say that any creative activity (incl. pure problemsolving) begins from a *conceptual paradigm* - a v. rough outline - of the form of that activity and the form of its end-product or -procedure. As distinct from rational activities where a formula (and algorithm) define the form of the product (and activity) with complete precision. You have a conceptual paradigm of writing a post or shopping for groceries or having a conversation. You couldn't possibly have a formula or algorithm completely defining every step - every word and sentence, every food, every topic - you may have or want to take. And programs as we know them, don't and can't handle *concepts* - despite the misnomers of conceptual graphs/spaces etc wh are not concepts at all. They can't for example handle writing or shopping because these can only be expressed as flexible outlines/schemas as per ideograms. What do you mean? . From: Jim Bromer Sent: Tuesday, July 13, 2010 2:50 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Tue, Jul 13, 2010 at 2:29 AM, Abram Demski abramdem...@gmail.com wrote: [The]complaint that probability theory doesn't try to figure out why it was wrong in the 30% (or whatever) it misses is a common objection. Probability theory glosses over important detail, it encourages lazy thinking, etc. However, this all depends on the space of hypotheses being examined. Statistical methods will be prone to this objection because they are essentially narrow-AI methods: they don't *try* to search in the space of all hypotheses a human might consider. An AGI setup can and should have such a large hypothesis space. --- That is the thing. We cannot search all possible hypotheses because we could not even write all possible hypotheses down. This is why hypotheses have to be formed creatively in response to an analysis of a situation. In my arrogant opinion, this is best done through a method that creatively uses discreet representations. Of course it can use statistical or probabilistic data in making those creative hypotheses when there is good data to be used. But the best way to do this is through categorization based creativity. But this is an imaginative method, one which creates imaginative explanations (or other co-relations) for observed or conjectured events. Those imaginative hypotheses then have to be compared to a situation through some trial and error methods. Then the tentative conjectures that seem to withstand initial tests have to be further integrated into other hypotheses, conjectures and explanations that are related to the subject of the hypotheses. This process of conceptual integration, a process which has to rely on both creative methods and rational methods, is a fundamental part of the process which does not seem to be clearly understood. Conceptual Integration cannot be accomplished by reducing a concept to True or False or to some number from 0 to 1 and then combined with other concepts that were also so reduced. Ideas take on roles when combined with other ideas. Basically, a new idea has to be fit into a complex of other ideas that are strongly related to it. Jim Bromer On Tue, Jul 13, 2010 at 2:29 AM, Abram Demski abramdem...@gmail.com wrote: PS-- I am not denying that statistics is applied probability theory. :) When I say they are different, what I mean is that saying I'm going to use probability theory and I'm going to use statistics tend to indicate very different approaches. Probability is a set of axioms, whereas statistics is a set of methods. The probability theory camp tends to be bayesian, whereas the stats camp tends to be frequentist. Your complaint that probability theory doesn't try to figure out why it was wrong in the 30% (or whatever) it misses is a common objection. Probability theory glosses over important detail, it encourages lazy thinking, etc. However, this all depends on the space of hypotheses being examined. Statistical methods will be prone to this objection because they are essentially narrow-AI methods: they don't *try* to search in the space of all hypotheses a human might consider. An AGI setup can and should have such a large hypothesis space. Note that AIXI is typically formulated as using a space of crisp (non-probabilistic) hypotheses, though probability theory is used to reason about them. This means no theory it considers will gloss over detail in this way: every theory completely explains the data. (I use AIXI as a convenient example, not because I agree with it.) --Abram On Mon, Jul 12, 2010 at 2:42 PM, Abram Demski abramdem...@gmail.com wrote: David, I tend to think of probability theory and statistics as different things. I'd agree that statistics is not enough for AGI, but in contrast I think probability
Re: [agi] Re: Huge Progress on the Core of AGI
On Tue, Jul 13, 2010 at 10:07 AM, Mike Tintner tint...@blueyonder.co.ukwrote: And programs as we know them, don't and can't handle *concepts* - despite the misnomers of conceptual graphs/spaces etc wh are not concepts at all. They can't for example handle writing or shopping because these can only be expressed as flexible outlines/schemas as per ideograms. I disagree with this, and so this is proper focus for our disagreement. Although there are other aspects of the problem that we probably disagree on, this is such a fundamental issue, that nothing can get past it. Either programs can deal with flexible outlines/schema or they can't. If they can't then AGI is probably impossible. If they can, then AGI is probably possible. I think that we both agree that creativity and imagination is absolutely necessary aspects of intelligence. Jim Bromer --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
I meant, I think that we both agree that creativity and imagination are absolutely necessary aspects of intelligence. of course! On Tue, Jul 13, 2010 at 12:46 PM, Jim Bromer jimbro...@gmail.com wrote: On Tue, Jul 13, 2010 at 10:07 AM, Mike Tintner tint...@blueyonder.co.ukwrote: And programs as we know them, don't and can't handle *concepts* - despite the misnomers of conceptual graphs/spaces etc wh are not concepts at all. They can't for example handle writing or shopping because these can only be expressed as flexible outlines/schemas as per ideograms. I disagree with this, and so this is proper focus for our disagreement. Although there are other aspects of the problem that we probably disagree on, this is such a fundamental issue, that nothing can get past it. Either programs can deal with flexible outlines/schema or they can't. If they can't then AGI is probably impossible. If they can, then AGI is probably possible. I think that we both agree that creativity and imagination is absolutely necessary aspects of intelligence. Jim Bromer --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
The first thing is to acknowledge that programs *don't* handle concepts - if you think they do, you must give examples. The reasons they can't, as presently conceived, is a) concepts encase a more or less *infinite diversity of forms* (even if only applying at first to a species of object) - *chair* for example as I've demonstrated embraces a vast open-ended diversity of radically different chair forms; higher order concepts like furniture embrace ... well, it's hard to think even of the parameters, let alone the diversity of forms, here. b) concepts are *polydomain*- not just multi- but open-endedly extensible in their domains; chair for example, can also refer to a person, skin in French, two humans forming a chair to carry s.o., a prize, etc. Basically concepts have a freeform realm or sphere of reference, and you can't have a setform, preprogrammed approach to defining that realm. There's no reason however why you can't mechanically and computationally begin to instantiate the kind of freeform approach I'm proposing. The most important obstacle is the setform mindset of AGI-ers - epitomised by Dave looking at squares, moving along set lines - setform objects in setform motion - when it would be more appropriate to look at something like snakes.- freeform objects in freeform motion. Concepts also - altho this is a huge subject - are *the* language of the general programs (as distinct from specialist programs, wh. is all we have right now) that must inform an AGI. Anyone proposing a grandscale AGI project like Ben's (wh. I def. wouldn't recommend) must crack the problem of conceptualisation more or less from the beginning. I'm not aware of anyone who has any remotely viable proposals here, are you? From: Jim Bromer Sent: Tuesday, July 13, 2010 5:46 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Tue, Jul 13, 2010 at 10:07 AM, Mike Tintner tint...@blueyonder.co.uk wrote: And programs as we know them, don't and can't handle *concepts* - despite the misnomers of conceptual graphs/spaces etc wh are not concepts at all. They can't for example handle writing or shopping because these can only be expressed as flexible outlines/schemas as per ideograms. I disagree with this, and so this is proper focus for our disagreement. Although there are other aspects of the problem that we probably disagree on, this is such a fundamental issue, that nothing can get past it. Either programs can deal with flexible outlines/schema or they can't. If they can't then AGI is probably impossible. If they can, then AGI is probably possible. I think that we both agree that creativity and imagination is absolutely necessary aspects of intelligence. Jim Bromer agi | Archives | Modify Your Subscription --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Mike, you are so full of it. There is a big difference between *can* and *don't*. You have no proof that programs can't handle anything you say that can't. On Tue, Jul 13, 2010 at 2:36 PM, Mike Tintner tint...@blueyonder.co.ukwrote: The first thing is to acknowledge that programs *don't* handle concepts - if you think they do, you must give examples. The reasons they can't, as presently conceived, is a) concepts encase a more or less *infinite diversity of forms* (even if only applying at first to a species of object) - *chair* for example as I've demonstrated embraces a vast open-ended diversity of radically different chair forms; higher order concepts like furniture embrace ... well, it's hard to think even of the parameters, let alone the diversity of forms, here. b) concepts are *polydomain*- not just multi- but open-endedly extensible in their domains; chair for example, can also refer to a person, skin in French, two humans forming a chair to carry s.o., a prize, etc. Basically concepts have a freeform realm or sphere of reference, and you can't have a setform, preprogrammed approach to defining that realm. There's no reason however why you can't mechanically and computationally begin to instantiate the kind of freeform approach I'm proposing. The most important obstacle is the setform mindset of AGI-ers - epitomised by Dave looking at squares, moving along set lines - setform objects in setform motion - when it would be more appropriate to look at something like snakes.- freeform objects in freeform motion. Concepts also - altho this is a huge subject - are *the* language of the general programs (as distinct from specialist programs, wh. is all we have right now) that must inform an AGI. Anyone proposing a grandscale AGI project like Ben's (wh. I def. wouldn't recommend) must crack the problem of conceptualisation more or less from the beginning. I'm not aware of anyone who has any remotely viable proposals here, are you? *From:* Jim Bromer jimbro...@gmail.com *Sent:* Tuesday, July 13, 2010 5:46 PM *To:* agi agi@v2.listbox.com *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI On Tue, Jul 13, 2010 at 10:07 AM, Mike Tintner tint...@blueyonder.co.ukwrote: And programs as we know them, don't and can't handle *concepts* - despite the misnomers of conceptual graphs/spaces etc wh are not concepts at all. They can't for example handle writing or shopping because these can only be expressed as flexible outlines/schemas as per ideograms. I disagree with this, and so this is proper focus for our disagreement. Although there are other aspects of the problem that we probably disagree on, this is such a fundamental issue, that nothing can get past it. Either programs can deal with flexible outlines/schema or they can't. If they can't then AGI is probably impossible. If they can, then AGI is probably possible. I think that we both agree that creativity and imagination is absolutely necessary aspects of intelligence. Jim Bromer *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Mike, see below. On Tue, Jul 13, 2010 at 2:36 PM, Mike Tintner tint...@blueyonder.co.ukwrote: The first thing is to acknowledge that programs *don't* handle concepts - if you think they do, you must give examples. The reasons they can't, as presently conceived, is a) concepts encase a more or less *infinite diversity of forms* (even if only applying at first to a species of object) - *chair* for example as I've demonstrated embraces a vast open-ended diversity of radically different chair forms; higher order concepts like furniture embrace ... well, it's hard to think even of the parameters, let alone the diversity of forms, here. invoking infinity is insufficient argument to say that a program can't recognize an infinite number of forms. In fact, I can prove it. Lets say that all numbers are made of digits 0,1,2,3...9. If you can recognize just 9 digits, you can read infinitely large numbers. Another example, you can create an infinite number of very diverse shapes and forms out of clay. But, I can represent every last one of them using simple mesh models. The mesh models are made of a very small number of concepts: lines, points, distance constraints, etc. So, an infinite number of diverse concepts or forms can be modeled using a very small number of concepts. In conclusion, you have no proof at all that programs can't handle these things. You just THINK they can't. But, I for one, know you're dead wrong. b) concepts are *polydomain*- not just multi- but open-endedly extensible in their domains; chair for example, can also refer to a person, skin in French, two humans forming a chair to carry s.o., a prize, etc. A chair is defined by anything you can sit on. Anything you can sit on is defined by a certain type of form that you can actually learn inductively. It is not impossible to teach a computer to recognize things that could be sat on or even things that seem like they have the form of something that might be sat on. To say that a computer can never learn this is impossible for you to claim. You see, very diverse concepts can be represented by a small number of other concepts such as time, space, 3D form, etc. You claim is completely baseless. Basically concepts have a freeform realm or sphere of reference, and you can't have a setform, preprogrammed approach to defining that realm. you can if it covers base concepts which can represent larger concepts. There's no reason however why you can't mechanically and computationally begin to instantiate the kind of freeform approach I'm proposing. The most important obstacle is the setform mindset of AGI-ers - epitomised by Dave looking at squares, moving along set lines - setform objects in setform motion - when it would be more appropriate to look at something like snakes.- freeform objects in freeform motion. squares can move in an infinite number of ways. It is an experiment An exercise... to learn how AGI deals with uncertainty, even if the uncertainty is very limited. Clearly you have no imagination to understand why doing such experiments might be useful. You think moving squares is simple just because they are squares. But, you fail to realize that uncertainty can be generated out of even very simple systems. And so far you have never stated how you would deal with such uncertainty. --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
On Tue, Jul 13, 2010 at 2:36 PM, Mike Tintner tint...@blueyonder.co.ukwrote: The first thing is to acknowledge that programs *don't* handle concepts - if you think they do, you must give examples. The reasons they can't, as presently conceived, is a) concepts encase a more or less *infinite diversity of forms* (even if only applying at first to a species of object) - *chair* for example as I've demonstrated embraces a vast open-ended diversity of radically different chair forms; higher order concepts like furniture embrace ... well, it's hard to think even of the parameters, let alone the diversity of forms, here. b) concepts are *polydomain*- not just multi- but open-endedly extensible in their domains; chair for example, can also refer to a person, skin in French, two humans forming a chair to carry s.o., a prize, etc. Basically concepts have a freeform realm or sphere of reference, and you can't have a setform, preprogrammed approach to defining that realm. There's no reason however why you can't mechanically and computationally begin to instantiate the kind of freeform approach I'm proposing. So here you are saying that programs don't handle concepts but they could begin to instantiate the kind of freeform approach that you are proposing. Are you sure you are not saying that programs can't handle concepts unless we do exactly what you are suggesting that we should do. Because a lot of us say that. Jim Bromer --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Thanks Abram, I'll read up on it when I get a chance. On Tue, Jul 13, 2010 at 12:03 PM, Abram Demski abramdem...@gmail.comwrote: David, Yes, this makes sense to me. To go back to your original query, I still think you will find a rich community relevant to your work if you look into the MDL literature (which additionally does not rely on probability theory, though as I said I'd say it's equivalent). Perhaps this book might be helpful: http://www.amazon.com/Description-Principle-Adaptive-Computation-Learning/dp/0262072815/ref=sr_1_1?ie=UTF8s=booksqid=1279036776sr=8-1 It includes a (short-ish?) section comparing the pros/cons of MDL and Bayesianism, and examining some of the mathematical linkings between them-- with the aim of showing that MDL is a broader principle. I disagree there, of course. :) --Abram On Tue, Jul 13, 2010 at 10:01 AM, David Jones davidher...@gmail.comwrote: Abram, Thanks for the clarification Abram. I don't have a single way to deal with uncertainty. I try not to decide on a method ahead of time because what I really want to do is analyze the problems and find a solution. But, at the same time. I have looked at the probabilistic approaches and they don't seem to be sufficient to solve the problem as they are now. So, my inclination is to use methods that don't gloss over important details. For me, the most important way of dealing with uncertainty is through explanatory-type reasoning. But, explanatory reasoning has not been well defined yet. So, the implementation is not yet clear. That's where I am now. I've begun to approach problems as follows. I try to break the problem down and answer the following questions: 1) How do we come up with or construct possible hypotheses. 2) How do we compare hypotheses to determine which is better. 3) How do we lower the uncertainty of hypotheses. 4) How do we determine the likelihood or strength of a single hypothesis all by itself. Is it sufficient on its own? With those questions in mind, the solution seems to be to break possible hypotheses down into pieces that are generally applicable. For example, in image analysis, a particular type of hypothesis might be related to 1) motion or 2) attachment relationships or 3) change or movement behavior of an object, etc. By breaking the possible hypotheses into very general pieces, you can apply them to just about any problem. With that as a tool, you can then develop general methods for resolving uncertainty of such hypotheses using explanatory scoring, consistency, and even statistical analysis. Does that make sense to you? Dave On Tue, Jul 13, 2010 at 2:29 AM, Abram Demski abramdem...@gmail.comwrote: PS-- I am not denying that statistics is applied probability theory. :) When I say they are different, what I mean is that saying I'm going to use probability theory and I'm going to use statistics tend to indicate very different approaches. Probability is a set of axioms, whereas statistics is a set of methods. The probability theory camp tends to be bayesian, whereas the stats camp tends to be frequentist. Your complaint that probability theory doesn't try to figure out why it was wrong in the 30% (or whatever) it misses is a common objection. Probability theory glosses over important detail, it encourages lazy thinking, etc. However, this all depends on the space of hypotheses being examined. Statistical methods will be prone to this objection because they are essentially narrow-AI methods: they don't *try* to search in the space of all hypotheses a human might consider. An AGI setup can and should have such a large hypothesis space. Note that AIXI is typically formulated as using a space of crisp (non-probabilistic) hypotheses, though probability theory is used to reason about them. This means no theory it considers will gloss over detail in this way: every theory completely explains the data. (I use AIXI as a convenient example, not because I agree with it.) --Abram *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com -- Abram Demski http://lo-tho.blogspot.com/ http://groups.google.com/group/one-logic *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
start making standard specification cherry cakes with standard ingredients, and standard mathematical sums with standard numbers and operations, and standard logical variables with standard meanings [and cut out any kind of et cetera] ** (And for much the same reason programs can't - aren't meant to - handle concepts. Every concept , like chair has to refer to a general class of objects embracing et ceteras - new, unspecified, yet-to-be-invented kinds of objects and ones that you simply haven't heard of yet, as well as specified, known kinds of object . Concepts are wonderful cognitive tools for embracing unspecified objects. Concepts, for example, like things, objects, actions, do something - anything all sorts of things - everything you can possibly think of even write totally new kinds of programs - anti-programs - et cetera - such concepts endow humans with massive creative freedom and scope of reference. You along with the whole of AI/AGI are effectively claiming that there is or can be a formula/program for dealing with the unknown - i.e. unknown kinds of objects. It's patent absurdity - and counter to the whole spirit of logic and rationality - in fact lunacy. You'll wonder in years to come how so smart people could be so dumb. Could think they're producing programs that can make anything - can make cars or cakes - any car or cake - when the rest of the world and his uncle can see that they're only producing very specific brands of car and cake (with very specific objects/parts). VW Beetles not cars let alone vehicles let alone forms of transportation let alone means of travel let alone universal programs. . I'm full of it? AI/AGI is full of the most amazing hype about its generality and creativity wh. you have clearly swallowed whole . Programs are simply specialist procedures for producing specialist products and procedures with specified kinds of actions and objects - they cannot deal with unspecified kinds of actions and objects, period. You won't produce any actual examples to the contrary. From: David Jones Sent: Tuesday, July 13, 2010 8:00 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI Correction: Mike, you are so full of it. There is a big difference between *can* and *don't*. You have no proof that programs can't handle anything you say [they] can't. On Tue, Jul 13, 2010 at 2:59 PM, David Jones davidher...@gmail.com wrote: Mike, you are so full of it. There is a big difference between *can* and *don't*. You have no proof that programs can't handle anything you say that can't. On Tue, Jul 13, 2010 at 2:36 PM, Mike Tintner tint...@blueyonder.co.uk wrote: The first thing is to acknowledge that programs *don't* handle concepts - if you think they do, you must give examples. The reasons they can't, as presently conceived, is a) concepts encase a more or less *infinite diversity of forms* (even if only applying at first to a species of object) - *chair* for example as I've demonstrated embraces a vast open-ended diversity of radically different chair forms; higher order concepts like furniture embrace ... well, it's hard to think even of the parameters, let alone the diversity of forms, here. b) concepts are *polydomain*- not just multi- but open-endedly extensible in their domains; chair for example, can also refer to a person, skin in French, two humans forming a chair to carry s.o., a prize, etc. Basically concepts have a freeform realm or sphere of reference, and you can't have a setform, preprogrammed approach to defining that realm. There's no reason however why you can't mechanically and computationally begin to instantiate the kind of freeform approach I'm proposing. The most important obstacle is the setform mindset of AGI-ers - epitomised by Dave looking at squares, moving along set lines - setform objects in setform motion - when it would be more appropriate to look at something like snakes.- freeform objects in freeform motion. Concepts also - altho this is a huge subject - are *the* language of the general programs (as distinct from specialist programs, wh. is all we have right now) that must inform an AGI. Anyone proposing a grandscale AGI project like Ben's (wh. I def. wouldn't recommend) must crack the problem of conceptualisation more or less from the beginning. I'm not aware of anyone who has any remotely viable proposals here, are you? From: Jim Bromer Sent: Tuesday, July 13, 2010 5:46 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Tue, Jul 13, 2010 at 10:07 AM, Mike Tintner tint...@blueyonder.co.uk wrote: And programs as we know them, don't and can't handle *concepts* - despite the misnomers of conceptual graphs/spaces etc wh are not concepts at all. They can't for example handle writing or shopping because these can only be expressed as flexible outlines/schemas
Re: [agi] Re: Huge Progress on the Core of AGI
, find one of them that works with *unspecified kinds of actions and objects.* (Or you can always try and explain how formulae that are clearly designed to be setform can somehow simultaneously be freeform and embrace et cetera ). There are by the same token no branches of logic and maths that work with *unspecified kinds of actions and objects.* (Mathematicians who invent new formulae have to work with and develop new kinds of objects - but normal maths can't help them do this). The whole of rationality - incl. all rational technology - only works with specified kinds of actions and objects. **One of the most basic rationales of rationality is let's stop everyone farting around making their own versions of products with their own differently specified actions and objects; let's specify/standardise the actions and objects that everyone must use. Let's start making standard specification cherry cakes with standard ingredients, and standard mathematical sums with standard numbers and operations, and standard logical variables with standard meanings [and cut out any kind of et cetera] ** (And for much the same reason programs can't - aren't meant to - handle concepts. Every concept , like chair has to refer to a general class of objects embracing et ceteras - new, unspecified, yet-to-be-invented kinds of objects and ones that you simply haven't heard of yet, as well as specified, known kinds of object . Concepts are wonderful cognitive tools for embracing unspecified objects. Concepts, for example, like things, objects, actions, do something - anything all sorts of things - everything you can possibly think of even write totally new kinds of programs - anti-programs - et cetera - such concepts endow humans with massive creative freedom and scope of reference. You along with the whole of AI/AGI are effectively claiming that there is or can be a formula/program for dealing with the unknown - i.e. unknown kinds of objects. It's patent absurdity - and counter to the whole spirit of logic and rationality - in fact lunacy. You'll wonder in years to come how so smart people could be so dumb. Could think they're producing programs that can make anything - can make cars or cakes - any car or cake - when the rest of the world and his uncle can see that they're only producing very specific brands of car and cake (with very specific objects/parts). VW Beetles not cars let alone vehicles let alone forms of transportation let alone means of travel let alone universal programs. . I'm full of it? AI/AGI is full of the most amazing hype about its generality and creativity wh. you have clearly swallowed whole . Programs are simply specialist procedures for producing specialist products and procedures with specified kinds of actions and objects - they cannot deal with unspecified kinds of actions and objects, period. You won't produce any actual examples to the contrary. *From:* David Jones davidher...@gmail.com *Sent:* Tuesday, July 13, 2010 8:00 PM *To:* agi agi@v2.listbox.com *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Correction: Mike, you are so full of it. There is a big difference between *can* and *don't*. You have no proof that programs can't handle anything you say [they] can't. On Tue, Jul 13, 2010 at 2:59 PM, David Jones davidher...@gmail.comwrote: Mike, you are so full of it. There is a big difference between *can* and *don't*. You have no proof that programs can't handle anything you say that can't. On Tue, Jul 13, 2010 at 2:36 PM, Mike Tintner tint...@blueyonder.co.ukwrote: The first thing is to acknowledge that programs *don't* handle concepts - if you think they do, you must give examples. The reasons they can't, as presently conceived, is a) concepts encase a more or less *infinite diversity of forms* (even if only applying at first to a species of object) - *chair* for example as I've demonstrated embraces a vast open-ended diversity of radically different chair forms; higher order concepts like furniture embrace ... well, it's hard to think even of the parameters, let alone the diversity of forms, here. b) concepts are *polydomain*- not just multi- but open-endedly extensible in their domains; chair for example, can also refer to a person, skin in French, two humans forming a chair to carry s.o., a prize, etc. Basically concepts have a freeform realm or sphere of reference, and you can't have a setform, preprogrammed approach to defining that realm. There's no reason however why you can't mechanically and computationally begin to instantiate the kind of freeform approach I'm proposing. The most important obstacle is the setform mindset of AGI-ers - epitomised by Dave looking at squares, moving along set lines - setform objects in setform motion - when it would be more appropriate to look at something like snakes.- freeform objects in freeform motion. Concepts also
Re: [agi] Re: Huge Progress on the Core of AGI
Dave: The goal of the formula is to scan any unknown object How does the program define and therefore recognize object ? (And why then are you dealing with just squares if it can deal with this apparently vast and unlimited range of objects? ) If you go into detail, you'll find no program can deal with or define object. Jeez, none can recognize a chair - but now apparently they can recognize objects. What exactly does the program do? Your description is confusing. What forms are input and output? Specific examples. If I put in a drawing of overlaid circles or a cartoon face, or a Jackson Pollock, or a photo of any scene, this program will give me 3-d versions? Here's a bet - you're giving me yet more hype. From: David Jones Sent: Wednesday, July 14, 2010 1:32 AM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI I'm not even going to read your whole email. I'll give you a great example of a formula handling unknown objects. The goal of the formula is to scan any unknown object and produce a 3D model of it using laser scanning. The objects are unknown, but that doesn't mean you can't handle unknown inputs. They all have things in common. Objects all have surfaces (at least the vast majority). So, whatever methods you can apply to analyze object surfaces, will work for the vast majority of objects. So, you *CAN* handle unknown objects. The same type of solution can be applied to many other problems, including AGI. The complete properties of the object or concept may be unknown, but the components that can be used to describe it are usually known. Your claim is baseless. Dave On Tue, Jul 13, 2010 at 7:34 PM, Mike Tintner tint...@blueyonder.co.uk wrote: Dave:You have no proof that programs can't handle anything you say that can't Sure I do. **There is no such thing as a formula (or program as we currently understand it) that can or is meant to handle UNSPECIFIED, (ESP NEW, UNKNOWN) KINDS OF ACTIONS AND OBJECTS** Every program is essentially a formula for a set form activity which directs how to take a closed set of **specified kinds of actions and objects** - e,g, a + b + c + d + = [take an a and a b and a c and a d ..] in order to produce set forms of products and procedures - (set combinations of those a,b,c,and d actions and objects) A recipe that specifies a set kind of cherry cake with set ingredients. [GA's, if you're wondering, are merely glorified recipes for mixing and endlessly remixing the same set of specific ingredients. Even random programs work with specified actions and objects.] There is no formula or program that says: take an a and a b and a c oh, and something else - a certain 'je ne sais quoi' - I don't know what it is, but you may be able to recognize it when you find it.Just keep looking There is no formula of the form A + B + C + D + ETC. = [ETC.= et cetera/some other unspecified things ] still less A + B + C + D + ETC ^ETC = [some other things x some other operations] That, I trust you will agree, is a contradiction of a formula and a program - more like an anti-formula/program. There are no et cetera formulas, and no logical or mathematical symbols for etc are there? But to be creative and produce new kinds of products and procedures, small and trivial as well as large, you have to be able to work with and find just such **unspecified (and esp. new) kinds of actions and objects.** - et ceteras. If you want to develop a new kind of fruit cake or new kind of cherry cake or even make a slightly different stew or more or less the same cherry cake but without the maraschinos wh. have gone missing, then you have to be able to work with and find new kinds of ingredients and mix/prepare them in new kinds of ways - new exotic kinds of fruit and other foods in new mixes and mashes and fermentations - et cetera x et cetera. If you want to develop a new kind of word or alphabet, (or develop a new kind of formula as I just did above, then you have to be able to work with and find new kinds of letters and symbols and abbreviations (as I just did) - etc. If you even want to engage with any part of the real world at the most mundane level - walk down a street say - you have to be able to be creative and deal with new unspecified kinds of actions and objects that you may find there - because you can't predict what that street will contain. And to be creative, you do indeed have to start not from a perfectly, fully specified formula, but something more like an et cetera anti-formula -a v. imperfectly and partially specified *conceptual paradigm*, such as -: if you want to make a new different kind of cake/ house/ structure, you'll probably need an a and a b and a c but you'll also need some other things - some 'je ne sais quoi's - I don't know what they are, -- but you may be able to recognize them when you
Re: [agi] Re: Huge Progress on the Core of AGI
David, I tend to think of probability theory and statistics as different things. I'd agree that statistics is not enough for AGI, but in contrast I think probability theory is a pretty good foundation. Bayesianism to me provides a sound way of integrating the elegance/utility tradeoff of explanation-based reasoning into the basic fabric of the uncertainty calculus. Others advocate different sorts of uncertainty than probabilities, but so far what I've seen indicates more a lack of ability to apply probability theory than a need for a new type of uncertainty. What other methods do you favor for dealing with these things? --Abram On Sun, Jul 11, 2010 at 12:30 PM, David Jones davidher...@gmail.com wrote: Thanks Abram, I know that probability is one approach. But there are many problems with using it in actual implementations. I know a lot of people will be angered by that statement and retort with all the successes that they have had using probability. But, the truth is that you can solve the problems many ways and every way has its pros and cons. I personally believe that probability has unacceptable cons if used all by itself. It must only be used when it is the best tool for the task. I do plan to use some probability within my approach. But only when it makes sense to do so. I do not believe in completely statistical solutions or completely Bayesian machine learning alone. A good example of when I might use it is when a particular hypothesis predicts something with 70% accuracy, well it may be better than any other hypothesis we can come up with so far. So, we may use that hypothesis. But, the 30% unexplained errors should be explained if possible with the resources and algorithms available, if at all possible. This is where my method differs from statistical methods. I want to build algorithms that resolve the 30% and explain it. For many problems, there are rules and knowledge that will solve them effectively. Probability should only be used when you cannot find a more accurate solution. Basically we should use probability when we don't know the factors involved, can't find any rules to explain the phenomena or we don't have the time and resources to figure it out. So you must simply guess at the most probable event without any rules for figuring out which event is more applicable under the current circumstances. So, in summary, probability definitely has its place. I just think that explanatory reasoning and other more accurate methods should be preferred whenever possible. Regarding learning the knowledge being the bigger problem, I completely agree. That is why I think it is so important to develop machine learning that can learn by direct observation of the environment. Without that, it is practically impossible to gather the knowledge required for AGI-type applications. We can learn this knowledge by analyzing the world automatically and generally through video. My step by step approach for learning and then applying the knowledge for agi is as follows: 1) Understand and learn about the environment(through Computer Vision for now and other sensory perceptions in the future) 2) learn about your own actions and how they affect the environment 3) learn about language and how it is associated with or related to the environment. 4) learn goals from language(such as through dedicated inputs). 5) Goal pursuit 6) Other Miscellaneous capabilities as needed Dave On Sat, Jul 10, 2010 at 8:40 PM, Abram Demski abramdem...@gmail.comwrote: David, Sorry for the slow response. I agree completely about expectations vs predictions, though I wouldn't use that terminology to make the distinction (since the two terms are near-synonyms in English, and I'm not aware of any technical definitions that are common in the literature). This is why I think probability theory is necessary: to formalize this idea of expectations. I also agree that it's good to utilize previous knowledge. However, I think existing AI research has tackled this over and over; learning that knowledge is the bigger problem. --Abram *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com -- Abram Demski http://lo-tho.blogspot.com/ http://groups.google.com/group/one-logic --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Thanks Abram, I know that probability is one approach. But there are many problems with using it in actual implementations. I know a lot of people will be angered by that statement and retort with all the successes that they have had using probability. But, the truth is that you can solve the problems many ways and every way has its pros and cons. I personally believe that probability has unacceptable cons if used all by itself. It must only be used when it is the best tool for the task. I do plan to use some probability within my approach. But only when it makes sense to do so. I do not believe in completely statistical solutions or completely Bayesian machine learning alone. A good example of when I might use it is when a particular hypothesis predicts something with 70% accuracy, well it may be better than any other hypothesis we can come up with so far. So, we may use that hypothesis. But, the 30% unexplained errors should be explained if possible with the resources and algorithms available, if at all possible. This is where my method differs from statistical methods. I want to build algorithms that resolve the 30% and explain it. For many problems, there are rules and knowledge that will solve them effectively. Probability should only be used when you cannot find a more accurate solution. Basically we should use probability when we don't know the factors involved, can't find any rules to explain the phenomena or we don't have the time and resources to figure it out. So you must simply guess at the most probable event without any rules for figuring out which event is more applicable under the current circumstances. So, in summary, probability definitely has its place. I just think that explanatory reasoning and other more accurate methods should be preferred whenever possible. Regarding learning the knowledge being the bigger problem, I completely agree. That is why I think it is so important to develop machine learning that can learn by direct observation of the environment. Without that, it is practically impossible to gather the knowledge required for AGI-type applications. We can learn this knowledge by analyzing the world automatically and generally through video. My step by step approach for learning and then applying the knowledge for agi is as follows: 1) Understand and learn about the environment(through Computer Vision for now and other sensory perceptions in the future) 2) learn about your own actions and how they affect the environment 3) learn about language and how it is associated with or related to the environment. 4) learn goals from language(such as through dedicated inputs). 5) Goal pursuit 6) Other Miscellaneous capabilities as needed Dave On Sat, Jul 10, 2010 at 8:40 PM, Abram Demski abramdem...@gmail.com wrote: David, Sorry for the slow response. I agree completely about expectations vs predictions, though I wouldn't use that terminology to make the distinction (since the two terms are near-synonyms in English, and I'm not aware of any technical definitions that are common in the literature). This is why I think probability theory is necessary: to formalize this idea of expectations. I also agree that it's good to utilize previous knowledge. However, I think existing AI research has tackled this over and over; learning that knowledge is the bigger problem. --Abram --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Mike, Using the image itself as a template to match is possible. In fact it has been done before. But it suffers from several problems that would also need solving. 1) Images are 2D. I assume you are also referring to 2D outlines. Real objects are 3D. So, you're going to have to infer the shape of the object... which means you are no longer actually transforming the image itself. You are transforming a model of the image, which would have points, curves, dimensions, etc. Basically, a mathematical shape :) No matter how much you disapprove of encoding info, sometimes it makes sense to do it. 2) Creating the first outline and figuring out what to outline is not trivial at all. So, this method can only be used after you can do that. There is a lot more uncertainty involved here than you seem to realize. First, how do you even determine the outline? That is an unsolved problem. So, not only will you have to try many transformations with the right outline, you have to try many with wrong outlines, increase the possibilities (exponentially?). It looks like you need a way to score possibilities and decide which ones to try. 3) rock is a word and words are always learned by induction along with other types of reasoning before we can even consider it a type of object. So, you are starting with a somewhat unrepresentative or artificial problem. 4) Even the same rock can look very different from different perspectives. In fact, how do you even match the same rock? Please describe how your system would do this. It is not trivial at all. And you will soon see that there is an extremely large amount of uncertainty. Dealing with this type of uncertainty is the central problem of AGI. The central problem is not fluid schemas.Even if I used this method, I would be stuck with the same exact uncertainty problems. So, you're not going to get passed them like this. The same research on explanatory and non-monotonic type reasoning must still be done. 5) what is a fluid transform? You can't just throw out words. Please define it. You are going to realize that your representation is pretty much geometric anyway. Regardless, it will likely be equivalent. Are you going to try every possible transformation? Nope. That would be impossible. So, how do you decide what transformations to try? When is a transformation too large of a change to consider it the same rock? When is it too large to consider it a different rock? 6) Are you seriously going to transform every object you've every tried to outline? This is going to be prohibitively costly in terms of processing. Especially because you haven't defined how you're going to decide what to transform and what not to. So, before you can even use this algorithm, you're going to have to use something else to decide what is a possible candidate and what is not. On Fri, Jul 9, 2010 at 6:42 PM, Mike Tintner tint...@blueyonder.co.ukwrote: Now let's see **you** answer a question. Tell me how any algorithmic/mathematical approach of any kind actual or in pure principle can be applied to recognize raindrops falling down a pane - and to predict their movement? Like I've said many times before, we can't predict everything, and we certainly shouldn't try. But http://www.pond5.com/stock-footage/263778/beautiful-rain-drops.html or to recognize a rock? http://www.handprint.com/HP/WCL/IMG/LPR/adams.jpg or a [filled] shopping bag? http://www.abc.net.au/reslib/200801/r215609_837743.jpg http://www.sustainableisgood.com/photos/uncategorized/2007/03/29/shoppingbags.jpg http://thegogreenblog.com/wp-content/uploads/2007/12/plastic_shopping_bag.jpg or if you want a real killer, google some vid clips of amoebas in oozing motion? PS In every case, I suggest, the brain observes different principles of transformation - for the most part unconsciously. And they can be of various kinds not just direct natural transformations, of course. It's possible, it occurs to me, that the principle that applies to rocks might just be something like whatever can be carved out of stone --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
I accidentally pressed something and it sent it early... this is a finished version: Mike, Using the image itself as a template to match is possible. In fact it has been done before. But it suffers from several problems that would also need solving. 1) Images are 2D. I assume you are also referring to 2D outlines. Real objects are 3D. So, you're going to have to infer the shape of the object... which means you are no longer actually transforming the image itself. You are transforming a model of the image, which would have points, curves, dimensions, etc. Basically, a mathematical shape :) No matter how much you disapprove of encoding info, sometimes it makes sense to do it. 2) Creating the first outline and figuring out what to outline is not trivial at all. So, this method can only be used after you can do that. There is a lot more uncertainty involved here than you seem to realize. First, how do you even determine the outline? That is an unsolved problem. So, not only will you have to try many transformations with the right outline, you have to try many with wrong outlines, increase the possibilities (exponentially?). It looks like you need a way to score possibilities and decide which ones to try. 3) rock is a word and words are always learned by induction along with other types of reasoning before we can even consider it a type of object. So, you are starting with a somewhat unrepresentative or artificial problem. 4) Even the same rock can look very different from different perspectives. In fact, how do you even match the same rock? Please describe how your system would do this. It is not trivial at all. And you will soon see that there is an extremely large amount of uncertainty. Dealing with this type of uncertainty is the central problem of AGI. The central problem is not fluid schemas.Even if I used this method, I would be stuck with the same exact uncertainty problems. So, you're not going to get passed them like this. The same research on explanatory and non-monotonic type reasoning must still be done. 5) what is a fluid transform? You can't just throw out words. Please define it. You are going to realize that your representation is pretty much geometric anyway. Regardless, it will likely be equivalent. Are you going to try every possible transformation? Nope. That would be impossible. So, how do you decide what transformations to try? When is a transformation too large of a change to consider it the same rock? When is it too large to consider it a different rock? 6) Are you seriously going to transform every object you've every tried to outline? This is going to be prohibitively costly in terms of processing. Especially because you haven't defined how you're going to decide what to transform and what not to. So, before you can even use this algorithm, you're going to have to use something else to decide what is a possible candidate and what is not. On Fri, Jul 9, 2010 at 6:42 PM, Mike Tintner tint...@blueyonder.co.ukwrote: Now let's see **you** answer a question. Tell me how any algorithmic/mathematical approach of any kind actual or in pure principle can be applied to recognize raindrops falling down a pane - and to predict their movement? Like I've said many times before, we can't predict everything, and we certainly shouldn't try. But we should expect what might happen. Raindrops are probably recognized as an unexpected distortion when it occurs on a window. When its not on a window, it is still a sort of distortion of brightness and just a small object with different contrast. You're right that geometric definitions are not the right way to recognize that. It would have to use a different method to remember the features/properties of raindrops and how they appeared, such as the contrast, size, quantity, location, context, etc. http://www.pond5.com/stock-footage/263778/beautiful-rain-drops.html or to recognize a rock? A specific rock could be recognized with geometric definitions. Texture is certainly important, size, context (very important), etc. If we are talking about the category rock, that's different than the instance of a rock. The category of a rock probably needs a description of the types of properties that rocks have, such as the types of curves, texture, sizes, interactions, behavior, etc. Exactly how you do it, I haven't decided. I'm not at that point yet. http://www.handprint.com/HP/WCL/IMG/LPR/adams.jpg or a [filled] shopping bag? same as the rock. http://www.abc.net.au/reslib/200801/r215609_837743.jpg http://www.sustainableisgood.com/photos/uncategorized/2007/03/29/shoppingbags.jpg http://thegogreenblog.com/wp-content/uploads/2007/12/plastic_shopping_bag.jpg or if you want a real killer, google some vid clips of amoebas in oozing motion? same. PS In every case, I suggest, the brain observes different principles of transformation - for the most part unconsciously. And they can be of various kinds not just direct natural
Re: [agi] Re: Huge Progress on the Core of AGI
. You are ironically misunderstanding the very foundations and rationale of geometry. Geometry - with its set form forms - was invented precisely because mathematicians didn't like the freeform nature of the world - wanted to create set forms (in the footsteps of the rational technologists who preceded them) - that they could control and reduce to formulae and reproduce with ease. Freeform rocks are a lot more complex to draw and make and reproduce than set form rectangular bricks. Set forms are not free forms. They are the opposite. (And while you and others will continue to *claim* in theory absolute setform=freeform nonsense, you will in practice always, always stick to setform objects. Some part of you knows the v.obvious truth ). From: David Jones Sent: Saturday, July 10, 2010 3:51 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI Mike, Using the image itself as a template to match is possible. In fact it has been done before. But it suffers from several problems that would also need solving. 1) Images are 2D. I assume you are also referring to 2D outlines. Real objects are 3D. So, you're going to have to infer the shape of the object... which means you are no longer actually transforming the image itself. You are transforming a model of the image, which would have points, curves, dimensions, etc. Basically, a mathematical shape :) No matter how much you disapprove of encoding info, sometimes it makes sense to do it. 2) Creating the first outline and figuring out what to outline is not trivial at all. So, this method can only be used after you can do that. There is a lot more uncertainty involved here than you seem to realize. First, how do you even determine the outline? That is an unsolved problem. So, not only will you have to try many transformations with the right outline, you have to try many with wrong outlines, increase the possibilities (exponentially?). It looks like you need a way to score possibilities and decide which ones to try. 3) rock is a word and words are always learned by induction along with other types of reasoning before we can even consider it a type of object. So, you are starting with a somewhat unrepresentative or artificial problem. 4) Even the same rock can look very different from different perspectives. In fact, how do you even match the same rock? Please describe how your system would do this. It is not trivial at all. And you will soon see that there is an extremely large amount of uncertainty. Dealing with this type of uncertainty is the central problem of AGI. The central problem is not fluid schemas.Even if I used this method, I would be stuck with the same exact uncertainty problems. So, you're not going to get passed them like this. The same research on explanatory and non-monotonic type reasoning must still be done. 5) what is a fluid transform? You can't just throw out words. Please define it. You are going to realize that your representation is pretty much geometric anyway. Regardless, it will likely be equivalent. Are you going to try every possible transformation? Nope. That would be impossible. So, how do you decide what transformations to try? When is a transformation too large of a change to consider it the same rock? When is it too large to consider it a different rock? 6) Are you seriously going to transform every object you've every tried to outline? This is going to be prohibitively costly in terms of processing. Especially because you haven't defined how you're going to decide what to transform and what not to. So, before you can even use this algorithm, you're going to have to use something else to decide what is a possible candidate and what is not. On Fri, Jul 9, 2010 at 6:42 PM, Mike Tintner tint...@blueyonder.co.uk wrote: Now let's see **you** answer a question. Tell me how any algorithmic/mathematical approach of any kind actual or in pure principle can be applied to recognize raindrops falling down a pane - and to predict their movement? Like I've said many times before, we can't predict everything, and we certainly shouldn't try. But http://www.pond5.com/stock-footage/263778/beautiful-rain-drops.html or to recognize a rock? http://www.handprint.com/HP/WCL/IMG/LPR/adams.jpg or a [filled] shopping bag? http://www.abc.net.au/reslib/200801/r215609_837743.jpg http://www.sustainableisgood.com/photos/uncategorized/2007/03/29/shoppingbags.jpg http://thegogreenblog.com/wp-content/uploads/2007/12/plastic_shopping_bag.jpg or if you want a real killer, google some vid clips of amoebas in oozing motion? PS In every case, I suggest, the brain observes different principles of transformation - for the most part unconsciously. And they can be of various kinds not just direct natural transformations, of course. It's possible, it occurs to me, that the principle that applies to rocks might just be something like whatever can be carved out of stone
Re: [agi] Re: Huge Progress on the Core of AGI
or reason. Here is a graphic demonstration of what you're trying to claim - in effect, you're saying geometry can define 'a piece of plasticine' [and by extension any standard transformation of a piece of plasticine as in a playroom] That's a nonsense. A piece of plasticine is a **freeform** object - it can be transformed into an unlimited diversity of shapes/forms (albeit with constraints). Formulae - the formulae of geometry - can only define **set form** objects, with a precise form and structure. There are no exceptions. Black is not white. Homogeneous is not heterogeneous. Set form is not freeform. All the objects I list - all irregular objects - are freeform objects. You are ironically misunderstanding the very foundations and rationale of geometry. Geometry - with its set form forms - was invented precisely because mathematicians didn't like the freeform nature of the world - wanted to create set forms (in the footsteps of the rational technologists who preceded them) - that they could control and reduce to formulae and reproduce with ease. Freeform rocks are a lot more complex to draw and make and reproduce than set form rectangular bricks. Set forms are not free forms. They are the opposite. (And while you and others will continue to *claim* in theory absolute setform=freeform nonsense, you will in practice always, always stick to setform objects. Some part of you knows the v.obvious truth ). *From:* David Jones davidher...@gmail.com *Sent:* Saturday, July 10, 2010 3:51 PM *To:* agi agi@v2.listbox.com *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Mike, Using the image itself as a template to match is possible. In fact it has been done before. But it suffers from several problems that would also need solving. 1) Images are 2D. I assume you are also referring to 2D outlines. Real objects are 3D. So, you're going to have to infer the shape of the object... which means you are no longer actually transforming the image itself. You are transforming a model of the image, which would have points, curves, dimensions, etc. Basically, a mathematical shape :) No matter how much you disapprove of encoding info, sometimes it makes sense to do it. 2) Creating the first outline and figuring out what to outline is not trivial at all. So, this method can only be used after you can do that. There is a lot more uncertainty involved here than you seem to realize. First, how do you even determine the outline? That is an unsolved problem. So, not only will you have to try many transformations with the right outline, you have to try many with wrong outlines, increase the possibilities (exponentially?). It looks like you need a way to score possibilities and decide which ones to try. 3) rock is a word and words are always learned by induction along with other types of reasoning before we can even consider it a type of object. So, you are starting with a somewhat unrepresentative or artificial problem. 4) Even the same rock can look very different from different perspectives. In fact, how do you even match the same rock? Please describe how your system would do this. It is not trivial at all. And you will soon see that there is an extremely large amount of uncertainty. Dealing with this type of uncertainty is the central problem of AGI. The central problem is not fluid schemas.Even if I used this method, I would be stuck with the same exact uncertainty problems. So, you're not going to get passed them like this. The same research on explanatory and non-monotonic type reasoning must still be done. 5) what is a fluid transform? You can't just throw out words. Please define it. You are going to realize that your representation is pretty much geometric anyway. Regardless, it will likely be equivalent. Are you going to try every possible transformation? Nope. That would be impossible. So, how do you decide what transformations to try? When is a transformation too large of a change to consider it the same rock? When is it too large to consider it a different rock? 6) Are you seriously going to transform every object you've every tried to outline? This is going to be prohibitively costly in terms of processing. Especially because you haven't defined how you're going to decide what to transform and what not to. So, before you can even use this algorithm, you're going to have to use something else to decide what is a possible candidate and what is not. On Fri, Jul 9, 2010 at 6:42 PM, Mike Tintner tint...@blueyonder.co.ukwrote: Now let's see **you** answer a question. Tell me how any algorithmic/mathematical approach of any kind actual or in pure principle can be applied to recognize raindrops falling down a pane - and to predict their movement? Like I've said many times before, we can't predict everything, and we certainly shouldn't try. But http://www.pond5.com/stock-footage/263778/beautiful-rain-drops.html
Re: [agi] Re: Huge Progress on the Core of AGI
Dave:You can't solve the problems with your approach either This is based on knowledge of what examples? Zero? I have given you one instance of s.o. [a technologist not a philosopher like me] who is if only in broad principle, trying to proceed in a non-encoding, analog-comparison direction. There must be others who are however crudely trying and considering what can be broadly classified as analog approaches. How much do you know, or have you even thought about such approaches? [Of course, computing doesn't have to be either/or analog-digital but can be both] My point 6) BTW is irrefutable, completely irrefutable, and puts a finger bang on why geometry obviously cannot cope with real objects, ( although I can and must, do a much more extensive job of exposition). From: David Jones Sent: Saturday, July 10, 2010 5:44 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI Mike, Your claim that you have to reject encoded and simpler descriptions of the world to solve AGI is unfounded. You can't solve the problems with your approach either. So, this is argument is going no where. You won't admit that you're faced with the same problems no matter how you approach it. I do admit that your ideas on transformations can be useful, but not at all by themselves and definitely not in the absense of math or geometry. They also are certainly not a solution to any of the problems I'm considering. Regardless, we both face the same problems of uncertainty and encoding. Dave On Sat, Jul 10, 2010 at 12:09 PM, Mike Tintner tint...@blueyonder.co.uk wrote: General point: you keep talking as if algorithms *work* for visual AGI - they don't - they simply haven't. Unless you take a set of objects carefully chosen to be closely aligned and close in overall form- and then it's not AGI. But in general the algorithmic patterned approach has been a bust - because natural objects as well as clusters of diverse artificial objects are not patterned. You can see this. It's actually obvious if you care to look. Re 2) It may well be that you've gotta have a body to move around to different POV's for objects, and to touch those objects and use another sense or two to determine the outlines. I haven't thought all this through at all, but you've got to realise that the whole of evolution tells you that sensing the world is a *multi*-*common*-sense affair, and not a single one. You're trying to isolate a sense - and insisting that that's the only way things can be done, even while you along with others are continuously failing. Respect and learn from evolution. Re 1) I again haven't thought this through, but it sounds like you're again assuming that your AGI vision must automatically meet adult, educated criteria. Presumably it takes time to perceive and appreciate the 3-d ness of objects.And 3-d is a mathematical, highly evolved idea. Yes, objects are solid, but they were never 3-d until geometry was invented a mere 2,000 or so years ago. Primitive people see very differently from modern people. Read McLuhan on this (v. worthwhile generally for s.o. like you). And no, rocks are simply *not* mathematical objects. There are no rocks in geometry period. *You* can use a mathematically-based program to draw a rock, but that's down to your AGI mind, not the mathematics. [Look BTW how you approach all these things - you always start mathematically - but it is a simple fact that maths. was invented only a few thousand years ago, animals and humans happily existed and saw the world without it, and maths objects are **abstract fictions** - they do not exist in the real world, as maths itself will tell you - and you have to be able to *see* that - to see and know that there is a diff. between a postulated math square and any concrete, real object version of a square. What visual processing are you going to use to tell the difference between a math and a real object? Are you saying you can use maths to do that? Non-sense. 3) I am starting with simple natural irregular objects. I can recognize that rocks may have too large a range of irregularity for first visual perception. (It'd be v.g. to know how soon infants recognize them). Maybe then you need something with a narrower range like shopping bags. I'd again study the development of infant perception - that will give you the best ideas re what to start with. But what's vital is that your objects be natural and irregular, not narrow AI formulaic squares. 5) A fluid transform is er a fluid transform. What are all the ways a raindrop as per the vid can transform into a different form - all the ways that the outline of the drop can continuously reshape. Jeez they're pretty well infinite, except that they're constrained. The drop isn't suddenly going to become a square or rectilinear. And you can presumably invent new lines/fields of transformation wh. could turn out to be true. But if you think
Re: [agi] Re: Huge Progress on the Core of AGI
On Sat, Jul 10, 2010 at 5:02 PM, Mike Tintner tint...@blueyonder.co.ukwrote: Dave:You can't solve the problems with your approach either This is based on knowledge of what examples? Zero? It is based on the fact that you have refused to show how you deal with uncertainty. You haven't even conceded that there is uncertainty. I know for a fact that your method cannot solve the uncertainty, because it doesn't even consider that there might be any uncertainty. It is not a solution to anything. It is a mere suggestion of a way to compare objects. It isn't even a way to match them! So, when you're done comparing, your method only says it is different by this much. Well, what the hell does that do for you? Nothing at all. So, clearly my statement that your approach doesn't solve anything is well based. Yet, your claim that my approach is wrong is very poorly based. Your main disagreement is my simplification of the problem. That doesn't mean anything. I can go back and forth between the simple version and the more complex version whenever I want to after I've gained understanding through experiments on the simpler version. There is nothing wrong with the approach I am taking. It is completely necessary to study the nature of the problems and the principles that can solve the problems. I have given you one instance of s.o. [a technologist not a philosopher like me] who is if only in broad principle, trying to proceed in a non-encoding, analog-comparison direction. There must be others who are however crudely trying and considering what can be broadly classified as analog approaches. How much do you know, or have you even thought about such approaches? [Of course, computing doesn't have to be either/or analog-digital but can be both] the approaches are equivalent. I don't even say that my approach is digital. If I find a reason to use an analog approach, I'll use it. But so far, I can't find any reason to do so. BTW, you would be wiser to realize that analog can likely be well represented by digital encoding for the problems we are discussing. I see absolutely no reason to think analog is better than digital for any of these problems. You simply have a bias against my approach. And bias is not sufficient reason to disagree with me. My point 6) BTW is irrefutable, completely irrefutable, and puts a finger bang on why geometry obviously cannot cope with real objects, ( although I can and must, do a much more extensive job of exposition). That is ridiculous. First of all, a plastic bag can easily be represented geometrically as a mesh with length constraints and connectivity constraints. Of course it doesn't represent every possible transformation of the bag. It doesn't even make sense to store such a representation. In fact, its not possible. Your claim that geometry can't represent a plastic bag is downright dumb and trivially refutable. You could easily use your own ideas then to transform the mesh for matching, although I still claim this is not the right way to always match objects. In fact, I would dare say it is often the wrong way to match objects because of the processing and time cost. --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
David, Sorry for the slow response. I agree completely about expectations vs predictions, though I wouldn't use that terminology to make the distinction (since the two terms are near-synonyms in English, and I'm not aware of any technical definitions that are common in the literature). This is why I think probability theory is necessary: to formalize this idea of expectations. I also agree that it's good to utilize previous knowledge. However, I think existing AI research has tackled this over and over; learning that knowledge is the bigger problem. --Abram On Thu, Jul 8, 2010 at 6:32 PM, David Jones davidher...@gmail.com wrote: Abram, Yeah, I would have to object for a couple reasons. First, prediction requires previous knowledge. So, even if you make that your primary goal, you're still going to have my research goals as the prerequisite: which are to process visual information in a more general way and learn about the environment in a more general way. Second, not everything is predictable. Certainly, we should not try to predict everything. Only after we have experience, can we actually predict anything. Even then, it's not precise prediction, like predicting the next frame of a video. It's more like having knowledge of what is quite likely to occur, or maybe an approximate prediction, but not guaranteed in the least. For example, based on previous experience, striking a match will light it. But, sometimes it doesn't light, and that too is expected to occur sometimes. We definitely don't predict the next image we'll see when it lights though. We just have expectations for what we might see and this helps us interpret the image effectively. We should try to expect certain outcomes or possible outcomes though. You could call that prediction, but it's not quite the same. The things we are more likely to see should be attempted as an explanation first and preferred if not given a reason to think otherwise. Dave On Thu, Jul 8, 2010 at 5:51 PM, Abram Demski abramdem...@gmail.comwrote: David, How I'd present the problem would be predict the next frame, or more generally predict a specified portion of video given a different portion. Do you object to this approach? --Abram On Thu, Jul 8, 2010 at 5:30 PM, David Jones davidher...@gmail.comwrote: It may not be possible to create a learning algorithm that can learn how to generally process images and other general AGI problems. This is for the same reason that completely general vision algorithms are likely impossible. I think that figuring out how to process sensory information intelligently requires either 1) impossible amounts of processing or 2) intelligent design and understanding by us. Maybe you could be more specific about how general learning algorithms would solve problems such as the one I'm tackling. But, I am extremely doubtful it can be done because the problems cannot be effectively described to such an algorithm. If you can't describe the problem, it can't search for solutions. If it can't search for solutions, you're basically stuck with evolution type algorithms, which require prohibitory amounts of processing. The reason that vision is so important for learning is that sensory perception is the foundation required to learn everything else. If you don't start with a foundational problem like this, you won't be representing the real nature of general intelligence problems that require extensive knowledge of the world to solve properly. Sensory perception is required to learn the information needed to understand everything else. Text and language for example, require extensive knowledge about the world to understand and especially to learn about. If you start with general learning algorithms on these unrepresentative problems, you will get stuck as we already have. So, it still makes a lot of sense to start with a concrete problem that does not require extensive amounts of previous knowledge to start learning. In fact, AGI requires that you not pre-program the AI with such extensive knowledge. So, lots of people are working on general learning algorithms that are unrepresentative of what is required for AGI because the algorithms don't have the knowledge needed to learn what they are trying to learn about. Regardless of how you look at it, my approach is definitely the right approach to AGI in my opinion. On Thu, Jul 8, 2010 at 5:02 PM, Abram Demski abramdem...@gmail.comwrote: David, That's why, imho, the rules need to be *learned* (and, when need be, unlearned). IE, what we need to work on is general learning algorithms, not general visual processing algorithms. As you say, there's not even such a thing as a general visual processing algorithm. Learning algorithms suffer similar environment-dependence, but (by their nature) not as severe... --Abram On Thu, Jul 8, 2010 at 3:17 PM, David Jones davidher...@gmail.comwrote: I've learned something really
Re: [agi] Re: Huge Progress on the Core of AGI
Mike, On Thu, Jul 8, 2010 at 6:52 PM, Mike Tintner tint...@blueyonder.co.ukwrote: Isn't the first problem simply to differentiate the objects in a scene? Well, that is part of the movement problem. If you say something moved, you are also saying that the objects in the two or more video frames are the same instance. (Maybe the most important movement to begin with is not the movement of the object, but of the viewer changing their POV if only slightly - wh. won't be a factor if you're looking at a screen) Maybe, but this problem becomes kind of trivial in a 2D environment, assuming you don't allow rotation of the POV. Moving the POV would simply translate all the objects linearly. If you make it a 3D environment, it becomes significantly more complicated. I could work on 3D, which I will, but I'm not sure I should start there. I probably should consider it though and see what complications it adds to the problem and how they might be solved. And that I presume comes down to being able to put a crude, highly tentative, and fluid outline round them (something that won't be neces. if you're dealing with squares?) . Without knowing v. little if anything about what kind of objects they are. As an infant most likely does. {See infants' drawings and how they evolve v. gradually from a v. crude outline blob that at first can represent anything - that I'm suggesting is a replay of how visual perception developed). The fluid outline or image schema is arguably the basis of all intelligence - just about everything AGI is based on it. You need an outline for instance not just of objects, but of where you're going, and what you're going to try and do - if you want to survive in the real world. Schemas connect everything AGI. And it's not a matter of choice - first you have to have an outline/sense of the whole - whatever it is - before you can start filling in the parts. Well, this is the question. The solution is underdetermined, which means that a right solution is not possible to know with complete certainty. So, you may take the approach of using contours to match objects, but that is certainly not the only way to approach the problem. Yes, you have to use local features in the image to group pixels together in some way. I agree with you there. Is using contours the right way? Maybe, but not by itself. You have to define the problem a little better than just saying that we need to construct an outline. The real problem/question is this: How do you determine the uncertainty of a hypothesis, lower it and also determine how good a hypothesis is, especially in comparison to other hypotheses? So, in this case, we are trying to use an outline comparison to determine the best match hypotheses between objects. But, that doesn't define how you score alternative hypotheses. That also is certainly not the only way to do it. You could use the details within the outline too. In fact, in some situations, this would be required to disambiguate between the possible hypotheses. P.S. It would be mindblowingly foolish BTW to think you can do better than the way an infant learns to see - that's an awfully big visual section of the brain there, and it works. I'm not trying to do better than the human brain. I am trying to solve the same problems that the brain solves in a different way, sometimes better than the brain, sometimes worse, sometimes equivalently. What would be foolish is to assume the only way to duplicate general intelligence is to copy the human brain. By taking this approach, you are forced to reverse engineer and understand something that is extremely difficult to reverse engineer. In addition, a solution that using the brain's design may not be economically feasible. So, approaching the problem by copying the human brain has additional risks. You may end up figuring out how the brain works and not be able to use it. In addition might not end up with a good understanding of what other solutions might be possible. Dave --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Couple of quick comments (I'm still thinking about all this - but I'm confident everything AGI links up here). A fluid schema is arguably by its v. nature a method - a trial and error, arguably universal method. It links vision to the hand or any effector. Handling objects also is based on fluid schemas - you put out a fluid adjustably-shaped hand to grasp things. And even if you don't have hands, like a worm, and must grasp things with your body, and must grasp the ground under which you move, then too you must use fluid body schemas/maps. All concepts - the basis of language and before language, all intelligence - are also almost certainly fluid schemas (and not as you suggested, patterns). All creative problemsolving begins from concepts of what you want to do (and not formulae or algorithms as in rational problemsolving). Any suggestion to the contrary will not, I suggest, bear the slightest serious examination. **Fluid schemas/concepts/fluid outlines are attempts-to-grasp-things - gropings.** Point 2 : I'd relook at your assumptions in all your musings - my impression is they all assume, unwittingly, an *adult* POV - the view of s.o. who already knows how to see - as distinct from an infant who is just learning to see and get to grips with an extremely blurred world, (even more blurred and confusing, I wouldn't be surprised, than that Prakash video). You're unwittingly employing top down, fully-formed-intelligence assumptions even while overtly trying to produce a learning system - you're looking for what an adult wants to know, rather than what an infant starting-from-almost-no-knowledge-of-the-world wants to know. If you accept the point in any way, major philosophical rethinking is required. From: David Jones Sent: Friday, July 09, 2010 1:56 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI Mike, On Thu, Jul 8, 2010 at 6:52 PM, Mike Tintner tint...@blueyonder.co.uk wrote: Isn't the first problem simply to differentiate the objects in a scene? Well, that is part of the movement problem. If you say something moved, you are also saying that the objects in the two or more video frames are the same instance. (Maybe the most important movement to begin with is not the movement of the object, but of the viewer changing their POV if only slightly - wh. won't be a factor if you're looking at a screen) Maybe, but this problem becomes kind of trivial in a 2D environment, assuming you don't allow rotation of the POV. Moving the POV would simply translate all the objects linearly. If you make it a 3D environment, it becomes significantly more complicated. I could work on 3D, which I will, but I'm not sure I should start there. I probably should consider it though and see what complications it adds to the problem and how they might be solved. And that I presume comes down to being able to put a crude, highly tentative, and fluid outline round them (something that won't be neces. if you're dealing with squares?) . Without knowing v. little if anything about what kind of objects they are. As an infant most likely does. {See infants' drawings and how they evolve v. gradually from a v. crude outline blob that at first can represent anything - that I'm suggesting is a replay of how visual perception developed). The fluid outline or image schema is arguably the basis of all intelligence - just about everything AGI is based on it. You need an outline for instance not just of objects, but of where you're going, and what you're going to try and do - if you want to survive in the real world. Schemas connect everything AGI. And it's not a matter of choice - first you have to have an outline/sense of the whole - whatever it is - before you can start filling in the parts. Well, this is the question. The solution is underdetermined, which means that a right solution is not possible to know with complete certainty. So, you may take the approach of using contours to match objects, but that is certainly not the only way to approach the problem. Yes, you have to use local features in the image to group pixels together in some way. I agree with you there. Is using contours the right way? Maybe, but not by itself. You have to define the problem a little better than just saying that we need to construct an outline. The real problem/question is this: How do you determine the uncertainty of a hypothesis, lower it and also determine how good a hypothesis is, especially in comparison to other hypotheses? So, in this case, we are trying to use an outline comparison to determine the best match hypotheses between objects. But, that doesn't define how you score alternative hypotheses. That also is certainly not the only way to do it. You could use the details within the outline too. In fact, in some situations, this would be required to disambiguate between the possible hypotheses. P.S. It would
Re: [agi] Re: Huge Progress on the Core of AGI
, 2010 1:56 PM *To:* agi agi@v2.listbox.com *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Mike, On Thu, Jul 8, 2010 at 6:52 PM, Mike Tintner tint...@blueyonder.co.ukwrote: Isn't the first problem simply to differentiate the objects in a scene? Well, that is part of the movement problem. If you say something moved, you are also saying that the objects in the two or more video frames are the same instance. (Maybe the most important movement to begin with is not the movement of the object, but of the viewer changing their POV if only slightly - wh. won't be a factor if you're looking at a screen) Maybe, but this problem becomes kind of trivial in a 2D environment, assuming you don't allow rotation of the POV. Moving the POV would simply translate all the objects linearly. If you make it a 3D environment, it becomes significantly more complicated. I could work on 3D, which I will, but I'm not sure I should start there. I probably should consider it though and see what complications it adds to the problem and how they might be solved. And that I presume comes down to being able to put a crude, highly tentative, and fluid outline round them (something that won't be neces. if you're dealing with squares?) . Without knowing v. little if anything about what kind of objects they are. As an infant most likely does. {See infants' drawings and how they evolve v. gradually from a v. crude outline blob that at first can represent anything - that I'm suggesting is a replay of how visual perception developed). The fluid outline or image schema is arguably the basis of all intelligence - just about everything AGI is based on it. You need an outline for instance not just of objects, but of where you're going, and what you're going to try and do - if you want to survive in the real world. Schemas connect everything AGI. And it's not a matter of choice - first you have to have an outline/sense of the whole - whatever it is - before you can start filling in the parts. Well, this is the question. The solution is underdetermined, which means that a right solution is not possible to know with complete certainty. So, you may take the approach of using contours to match objects, but that is certainly not the only way to approach the problem. Yes, you have to use local features in the image to group pixels together in some way. I agree with you there. Is using contours the right way? Maybe, but not by itself. You have to define the problem a little better than just saying that we need to construct an outline. The real problem/question is this: How do you determine the uncertainty of a hypothesis, lower it and also determine how good a hypothesis is, especially in comparison to other hypotheses? So, in this case, we are trying to use an outline comparison to determine the best match hypotheses between objects. But, that doesn't define how you score alternative hypotheses. That also is certainly not the only way to do it. You could use the details within the outline too. In fact, in some situations, this would be required to disambiguate between the possible hypotheses. P.S. It would be mindblowingly foolish BTW to think you can do better than the way an infant learns to see - that's an awfully big visual section of the brain there, and it works. I'm not trying to do better than the human brain. I am trying to solve the same problems that the brain solves in a different way, sometimes better than the brain, sometimes worse, sometimes equivalently. What would be foolish is to assume the only way to duplicate general intelligence is to copy the human brain. By taking this approach, you are forced to reverse engineer and understand something that is extremely difficult to reverse engineer. In addition, a solution that using the brain's design may not be economically feasible. So, approaching the problem by copying the human brain has additional risks. You may end up figuring out how the brain works and not be able to use it. In addition might not end up with a good understanding of what other solutions might be possible. Dave *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303/ | Modifyhttps://www.listbox.com/member/?;Your Subscription http://www.listbox.com --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
If fluid schemas - speaking broadly - are what is needed, (and I'm pretty sure they are), it's n.g. trying for something else. You can't substitute a square approach for a fluid amoeba outline approach. (And you will certainly need exactly such an approach to recognize amoeba's). If it requires a new kind of machine, or a radically new kind of instruction set for computers, then that's what it requires - Stan Franklin, BTW, is one person who does recognize, and is trying to deal with this problem - might be worth checking up on him. This is partly BTW why my instinct is that it may be better to start with tasks for robot hands*, because it should be possible to get them to apply a relatively flexible and fluid grip/handshape and grope for and experiment with differently shaped objects And if you accept the broad philosophy I've been outlining, then it does make sense that evolution should have started with touch as a more primary sense, well before it got to vision. *Or perhaps it may prove better to start with robot snakes/bodies or somesuch. From: David Jones Sent: Friday, July 09, 2010 3:22 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 9, 2010 at 10:04 AM, Mike Tintner tint...@blueyonder.co.uk wrote: Couple of quick comments (I'm still thinking about all this - but I'm confident everything AGI links up here). A fluid schema is arguably by its v. nature a method - a trial and error, arguably universal method. It links vision to the hand or any effector. Handling objects also is based on fluid schemas - you put out a fluid adjustably-shaped hand to grasp things. And even if you don't have hands, like a worm, and must grasp things with your body, and must grasp the ground under which you move, then too you must use fluid body schemas/maps. All concepts - the basis of language and before language, all intelligence - are also almost certainly fluid schemas (and not as you suggested, patterns). fluid schemas is not an actual algorithm. It is not clear how to go about implementing such a design. Even so, when you get into the details of actually implementing it, you will find yourself faced with the exact same problems I'm trying to solve. So, lets say you take the first frame and generate an initial fluid schema. What if an object disappears? What if the object changes? What if the object moves a little or a lot? What if a large number of changes occur at once, like one new thing suddenly blocking a bunch of similar stuff that is behind it? How far does your fluid schema have to be distorted for the algorithm to realize that it needs a new schema and can't use the same old one? You can't just say that all objects are always present and just distort the schema. What if two similar objects appear or both move and one disappears? How does your schema handle this? Regardless of whether you talk about hypotheses or schemas, it is the SAME problem. You can't avoid the fact that the whole thing is underdetermined and you need a way to score and compare hypotheses. If you disagree, please define your schema algorithm a bit more specifically. Then we would be able to analyze its pros and cons better. All creative problemsolving begins from concepts of what you want to do (and not formulae or algorithms as in rational problemsolving). Any suggestion to the contrary will not, I suggest, bear the slightest serious examination. Sure. I would point out though that children do stuff just to learn in the beginning. A good example is our desire to play. Playing is a strategy by which children learn new things even though they don't have a need for those things yet. It motivates us to learn for the future and not for any pressing present needs. No matter how you look at it, you will need algorithms for general intelligence. To say otherwise makes zero sense. No algorithms, no design. No matter what design you come up with, I call that an algorithm. Algorithms don't have to be formulaic or narrow. Keep an open mind about the world algorithm, unless you can suggest a better term to describe general AI algorithms. **Fluid schemas/concepts/fluid outlines are attempts-to-grasp-things - gropings.** Point 2 : I'd relook at your assumptions in all your musings - my impression is they all assume, unwittingly, an *adult* POV - the view of s.o. who already knows how to see - as distinct from an infant who is just learning to see and get to grips with an extremely blurred world, (even more blurred and confusing, I wouldn't be surprised, than that Prakash video). You're unwittingly employing top down, fully-formed-intelligence assumptions even while overtly trying to produce a learning system - you're looking for what an adult wants to know, rather than what an infant starting-from-almost-no-knowledge-of-the-world wants to know. If you accept the point in any way, major philosophical rethinking
Re: [agi] Re: Huge Progress on the Core of AGI
Mike, Please outline your algorithm for fluid schemas though. It will be clear when you do that you are faced with the exact same uncertainty problems I am dealing with and trying to solve. The problems are completely equivalent. Yours is just a specific approach that is not sufficiently defined. You have to define how you deal with uncertainty when using fluid schemas or even how to approach the task of figuring it out. Until then, its not a solution to anything. Dave On Fri, Jul 9, 2010 at 10:59 AM, Mike Tintner tint...@blueyonder.co.ukwrote: If fluid schemas - speaking broadly - are what is needed, (and I'm pretty sure they are), it's n.g. trying for something else. You can't substitute a square approach for a fluid amoeba outline approach. (And you will certainly need exactly such an approach to recognize amoeba's). If it requires a new kind of machine, or a radically new kind of instruction set for computers, then that's what it requires - Stan Franklin, BTW, is one person who does recognize, and is trying to deal with this problem - might be worth checking up on him. This is partly BTW why my instinct is that it may be better to start with tasks for robot hands*, because it should be possible to get them to apply a relatively flexible and fluid grip/handshape and grope for and experiment with differently shaped objects And if you accept the broad philosophy I've been outlining, then it does make sense that evolution should have started with touch as a more primary sense, well before it got to vision. *Or perhaps it may prove better to start with robot snakes/bodies or somesuch. *From:* David Jones davidher...@gmail.com *Sent:* Friday, July 09, 2010 3:22 PM *To:* agi agi@v2.listbox.com *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 9, 2010 at 10:04 AM, Mike Tintner tint...@blueyonder.co.ukwrote: Couple of quick comments (I'm still thinking about all this - but I'm confident everything AGI links up here). A fluid schema is arguably by its v. nature a method - a trial and error, arguably universal method. It links vision to the hand or any effector. Handling objects also is based on fluid schemas - you put out a fluid adjustably-shaped hand to grasp things. And even if you don't have hands, like a worm, and must grasp things with your body, and must grasp the ground under which you move, then too you must use fluid body schemas/maps. All concepts - the basis of language and before language, all intelligence - are also almost certainly fluid schemas (and not as you suggested, patterns). fluid schemas is not an actual algorithm. It is not clear how to go about implementing such a design. Even so, when you get into the details of actually implementing it, you will find yourself faced with the exact same problems I'm trying to solve. So, lets say you take the first frame and generate an initial fluid schema. What if an object disappears? What if the object changes? What if the object moves a little or a lot? What if a large number of changes occur at once, like one new thing suddenly blocking a bunch of similar stuff that is behind it? How far does your fluid schema have to be distorted for the algorithm to realize that it needs a new schema and can't use the same old one? You can't just say that all objects are always present and just distort the schema. What if two similar objects appear or both move and one disappears? How does your schema handle this? Regardless of whether you talk about hypotheses or schemas, it is the SAME problem. You can't avoid the fact that the whole thing is underdetermined and you need a way to score and compare hypotheses. If you disagree, please define your schema algorithm a bit more specifically. Then we would be able to analyze its pros and cons better. All creative problemsolving begins from concepts of what you want to do (and not formulae or algorithms as in rational problemsolving). Any suggestion to the contrary will not, I suggest, bear the slightest serious examination. Sure. I would point out though that children do stuff just to learn in the beginning. A good example is our desire to play. Playing is a strategy by which children learn new things even though they don't have a need for those things yet. It motivates us to learn for the future and not for any pressing present needs. No matter how you look at it, you will need algorithms for general intelligence. To say otherwise makes zero sense. No algorithms, no design. No matter what design you come up with, I call that an algorithm. Algorithms don't have to be formulaic or narrow. Keep an open mind about the world algorithm, unless you can suggest a better term to describe general AI algorithms. **Fluid schemas/concepts/fluid outlines are attempts-to-grasp-things - gropings.** Point 2 : I'd relook at your assumptions in all your musings - my impression is they all assume, unwittingly
Re: [agi] Re: Huge Progress on the Core of AGI
There isn't an algorithm. It's basically a matter of overlaying shapes to see if they fit - much as you put one hand against another to see if they fit - much as you can overlay a hand to see if it fits and is capable of grasping an object - except considerably more fluid/ rougher. There has to be some instruction generating the process, but it's not an algorithm. How can you have an algorithm for recognizing amoebas - or rocks or a drop of water? They are not patterned entities - or by extension reducible to algorithms. You don't need to think too much about internal visual processes - you can just look,at the external objects-to-be-classified , the objects that make up this world, and see this. Just as you can look at a set of diverse patterns and see that they too are not reducible to any single formula/pattern/algorithm. We're talking about the fundamental structure of the universe and its contents. If this is right and God is an artist before he is a mathematician, then it won't do any good screaming about it, you're going to have to invent a way to do art, so to speak, on computers . Or you can pretend that dealing with mathematical squares will somehow help here - but it hasn't and won't. Do you think that a creative process like creating http://www.apocalyptic-theories.com/gallery/lastjudge/bosch.jpg started with an algorithm? There are other ways of solving problems than algorithms - the person who created each algorithm in the first place certainly didn't have one. From: David Jones Sent: Friday, July 09, 2010 4:20 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI Mike, Please outline your algorithm for fluid schemas though. It will be clear when you do that you are faced with the exact same uncertainty problems I am dealing with and trying to solve. The problems are completely equivalent. Yours is just a specific approach that is not sufficiently defined. You have to define how you deal with uncertainty when using fluid schemas or even how to approach the task of figuring it out. Until then, its not a solution to anything. Dave On Fri, Jul 9, 2010 at 10:59 AM, Mike Tintner tint...@blueyonder.co.uk wrote: If fluid schemas - speaking broadly - are what is needed, (and I'm pretty sure they are), it's n.g. trying for something else. You can't substitute a square approach for a fluid amoeba outline approach. (And you will certainly need exactly such an approach to recognize amoeba's). If it requires a new kind of machine, or a radically new kind of instruction set for computers, then that's what it requires - Stan Franklin, BTW, is one person who does recognize, and is trying to deal with this problem - might be worth checking up on him. This is partly BTW why my instinct is that it may be better to start with tasks for robot hands*, because it should be possible to get them to apply a relatively flexible and fluid grip/handshape and grope for and experiment with differently shaped objects And if you accept the broad philosophy I've been outlining, then it does make sense that evolution should have started with touch as a more primary sense, well before it got to vision. *Or perhaps it may prove better to start with robot snakes/bodies or somesuch. From: David Jones Sent: Friday, July 09, 2010 3:22 PM To: agi Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 9, 2010 at 10:04 AM, Mike Tintner tint...@blueyonder.co.uk wrote: Couple of quick comments (I'm still thinking about all this - but I'm confident everything AGI links up here). A fluid schema is arguably by its v. nature a method - a trial and error, arguably universal method. It links vision to the hand or any effector. Handling objects also is based on fluid schemas - you put out a fluid adjustably-shaped hand to grasp things. And even if you don't have hands, like a worm, and must grasp things with your body, and must grasp the ground under which you move, then too you must use fluid body schemas/maps. All concepts - the basis of language and before language, all intelligence - are also almost certainly fluid schemas (and not as you suggested, patterns). fluid schemas is not an actual algorithm. It is not clear how to go about implementing such a design. Even so, when you get into the details of actually implementing it, you will find yourself faced with the exact same problems I'm trying to solve. So, lets say you take the first frame and generate an initial fluid schema. What if an object disappears? What if the object changes? What if the object moves a little or a lot? What if a large number of changes occur at once, like one new thing suddenly blocking a bunch of similar stuff that is behind it? How far does your fluid schema have to be distorted for the algorithm to realize that it needs a new schema and can't use the same old one? You can't just say that all objects are always
Re: [agi] Re: Huge Progress on the Core of AGI
The way I define algorithms encompasses just about any intelligently designed system. So, call it what you want. I really wish you would stop avoiding the word. But, fine. I'll play your word game... Define your system please. And justify why or how it handles uncertainty. You said overlay a hand to see if it fits. How do you define fits? The truth is that it will never fit perfectly, so how do you define a good fit and a bad one? You will find that you end up with the same exact problems I am working on. You keep avoiding the need to define the system of fluid schemas. You're avoiding it because it's not a solution to anything and you can't define it without realizing that your idea doesn't pan out. So, I dare you. Define your fluid schemas without revealing the fatal flaw in your reasoning. Dave On Fri, Jul 9, 2010 at 12:05 PM, Mike Tintner tint...@blueyonder.co.ukwrote: There isn't an algorithm. It's basically a matter of overlaying shapes to see if they fit - much as you put one hand against another to see if they fit - much as you can overlay a hand to see if it fits and is capable of grasping an object - except considerably more fluid/ rougher. There has to be some instruction generating the process, but it's not an algorithm. How can you have an algorithm for recognizing amoebas - or rocks or a drop of water? They are not patterned entities - or by extension reducible to algorithms. You don't need to think too much about internal visual processes - you can just look,at the external objects-to-be-classified , the objects that make up this world, and see this. Just as you can look at a set of diverse patterns and see that they too are not reducible to any single formula/pattern/algorithm. We're talking about the fundamental structure of the universe and its contents. If this is right and God is an artist before he is a mathematician, then it won't do any good screaming about it, you're going to have to invent a way to do art, so to speak, on computers . Or you can pretend that dealing with mathematical squares will somehow help here - but it hasn't and won't. Do you think that a creative process like creating http://www.apocalyptic-theories.com/gallery/lastjudge/bosch.jpg started with an algorithm? There are other ways of solving problems than algorithms - the person who created each algorithm in the first place certainly didn't have one. *From:* David Jones davidher...@gmail.com *Sent:* Friday, July 09, 2010 4:20 PM *To:* agi agi@v2.listbox.com *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI Mike, Please outline your algorithm for fluid schemas though. It will be clear when you do that you are faced with the exact same uncertainty problems I am dealing with and trying to solve. The problems are completely equivalent. Yours is just a specific approach that is not sufficiently defined. You have to define how you deal with uncertainty when using fluid schemas or even how to approach the task of figuring it out. Until then, its not a solution to anything. Dave On Fri, Jul 9, 2010 at 10:59 AM, Mike Tintner tint...@blueyonder.co.ukwrote: If fluid schemas - speaking broadly - are what is needed, (and I'm pretty sure they are), it's n.g. trying for something else. You can't substitute a square approach for a fluid amoeba outline approach. (And you will certainly need exactly such an approach to recognize amoeba's). If it requires a new kind of machine, or a radically new kind of instruction set for computers, then that's what it requires - Stan Franklin, BTW, is one person who does recognize, and is trying to deal with this problem - might be worth checking up on him. This is partly BTW why my instinct is that it may be better to start with tasks for robot hands*, because it should be possible to get them to apply a relatively flexible and fluid grip/handshape and grope for and experiment with differently shaped objects And if you accept the broad philosophy I've been outlining, then it does make sense that evolution should have started with touch as a more primary sense, well before it got to vision. *Or perhaps it may prove better to start with robot snakes/bodies or somesuch. *From:* David Jones davidher...@gmail.com *Sent:* Friday, July 09, 2010 3:22 PM *To:* agi agi@v2.listbox.com *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 9, 2010 at 10:04 AM, Mike Tintner tint...@blueyonder.co.ukwrote: Couple of quick comments (I'm still thinking about all this - but I'm confident everything AGI links up here). A fluid schema is arguably by its v. nature a method - a trial and error, arguably universal method. It links vision to the hand or any effector. Handling objects also is based on fluid schemas - you put out a fluid adjustably-shaped hand to grasp things. And even if you don't have hands, like a worm, and must grasp things with your body, and must grasp the ground under
Re: [agi] Re: Huge Progress on the Core of AGI
David, That's why, imho, the rules need to be *learned* (and, when need be, unlearned). IE, what we need to work on is general learning algorithms, not general visual processing algorithms. As you say, there's not even such a thing as a general visual processing algorithm. Learning algorithms suffer similar environment-dependence, but (by their nature) not as severe... --Abram On Thu, Jul 8, 2010 at 3:17 PM, David Jones davidher...@gmail.com wrote: I've learned something really interesting today. I realized that general rules of inference probably don't really exists. There is no such thing as complete generality for these problems. The rules of inference that work for one environment would fail in alien environments. So, I have to modify my approach to solving these problems. As I studied over simplified problems, I realized that there are probably an infinite number of environments with their own behaviors that are not representative of the environments we want to put a general AI in. So, it is not ok to just come up with any case study and solve it. The case study has to actually be representative of a problem we want to solve in an environment we want to apply AI. Otherwise the solution required will take too long to develop because of it tries to accommodate too much generality. As I mentioned, such a general solution is likely impossible. So, someone could easily get stuck trying to solve an impossible task of creating one general solution to too many problems that don't allow a general solution. The best course is a balance between the time required to write a very general solution and the time required to write less general solutions for multiple problem types and environments. The best way to do this is to choose representative case studies to solve and make sure the solutions are truth-tropic and justified for the environments they are to be applied. Dave On Sun, Jun 27, 2010 at 1:31 AM, David Jones davidher...@gmail.comwrote: A method for comparing hypotheses in explanatory-based reasoning: * We prefer the hypothesis or explanation that ***expects* more observations. If both explanations expect the same observations, then the simpler of the two is preferred (because the unnecessary terms of the more complicated explanation do not add to the predictive power).* *Why are expected events so important?* They are a measure of 1) explanatory power and 2) predictive power. The more predictive and the more explanatory a hypothesis is, the more likely the hypothesis is when compared to a competing hypothesis. Here are two case studies I've been analyzing from sensory perception of simplified visual input: The goal of the case studies is to answer the following: How do you generate the most likely motion hypothesis in a way that is general and applicable to AGI? *Case Study 1)* Here is a link to an example: animated gif of two black squares move from left to righthttp://practicalai.org/images/CaseStudy1.gif. *Description: *Two black squares are moving in unison from left to right across a white screen. In each frame the black squares shift to the right so that square 1 steals square 2's original position and square two moves an equal distance to the right. *Case Study 2) *Here is a link to an example: the interrupted squarehttp://practicalai.org/images/CaseStudy2.gif. *Description:* A single square is moving from left to right. Suddenly in the third frame, a single black square is added in the middle of the expected path of the original black square. This second square just stays there. So, what happened? Did the square moving from left to right keep moving? Or did it stop and then another square suddenly appeared and moved from left to right? *Here is a simplified version of how we solve case study 1: *The important hypotheses to consider are: 1) the square from frame 1 of the video that has a very close position to the square from frame 2 should be matched (we hypothesize that they are the same square and that any difference in position is motion). So, what happens is that in each two frames of the video, we only match one square. The other square goes unmatched. 2) We do the same thing as in hypothesis #1, but this time we also match the remaining squares and hypothesize motion as follows: the first square jumps over the second square from left to right. We hypothesize that this happens over and over in each frame of the video. Square 2 stops and square 1 jumps over it over and over again. 3) We hypothesize that both squares move to the right in unison. This is the correct hypothesis. So, why should we prefer the correct hypothesis, #3 over the other two? Well, first of all, #3 is correct because it has the most explanatory power of the three and is the simplest of the three. Simpler is better because, with the given evidence and information, there is no reason to desire a more complicated hypothesis such
Re: [agi] Re: Huge Progress on the Core of AGI
It may not be possible to create a learning algorithm that can learn how to generally process images and other general AGI problems. This is for the same reason that completely general vision algorithms are likely impossible. I think that figuring out how to process sensory information intelligently requires either 1) impossible amounts of processing or 2) intelligent design and understanding by us. Maybe you could be more specific about how general learning algorithms would solve problems such as the one I'm tackling. But, I am extremely doubtful it can be done because the problems cannot be effectively described to such an algorithm. If you can't describe the problem, it can't search for solutions. If it can't search for solutions, you're basically stuck with evolution type algorithms, which require prohibitory amounts of processing. The reason that vision is so important for learning is that sensory perception is the foundation required to learn everything else. If you don't start with a foundational problem like this, you won't be representing the real nature of general intelligence problems that require extensive knowledge of the world to solve properly. Sensory perception is required to learn the information needed to understand everything else. Text and language for example, require extensive knowledge about the world to understand and especially to learn about. If you start with general learning algorithms on these unrepresentative problems, you will get stuck as we already have. So, it still makes a lot of sense to start with a concrete problem that does not require extensive amounts of previous knowledge to start learning. In fact, AGI requires that you not pre-program the AI with such extensive knowledge. So, lots of people are working on general learning algorithms that are unrepresentative of what is required for AGI because the algorithms don't have the knowledge needed to learn what they are trying to learn about. Regardless of how you look at it, my approach is definitely the right approach to AGI in my opinion. On Thu, Jul 8, 2010 at 5:02 PM, Abram Demski abramdem...@gmail.com wrote: David, That's why, imho, the rules need to be *learned* (and, when need be, unlearned). IE, what we need to work on is general learning algorithms, not general visual processing algorithms. As you say, there's not even such a thing as a general visual processing algorithm. Learning algorithms suffer similar environment-dependence, but (by their nature) not as severe... --Abram On Thu, Jul 8, 2010 at 3:17 PM, David Jones davidher...@gmail.com wrote: I've learned something really interesting today. I realized that general rules of inference probably don't really exists. There is no such thing as complete generality for these problems. The rules of inference that work for one environment would fail in alien environments. So, I have to modify my approach to solving these problems. As I studied over simplified problems, I realized that there are probably an infinite number of environments with their own behaviors that are not representative of the environments we want to put a general AI in. So, it is not ok to just come up with any case study and solve it. The case study has to actually be representative of a problem we want to solve in an environment we want to apply AI. Otherwise the solution required will take too long to develop because of it tries to accommodate too much generality. As I mentioned, such a general solution is likely impossible. So, someone could easily get stuck trying to solve an impossible task of creating one general solution to too many problems that don't allow a general solution. The best course is a balance between the time required to write a very general solution and the time required to write less general solutions for multiple problem types and environments. The best way to do this is to choose representative case studies to solve and make sure the solutions are truth-tropic and justified for the environments they are to be applied. Dave On Sun, Jun 27, 2010 at 1:31 AM, David Jones davidher...@gmail.comwrote: A method for comparing hypotheses in explanatory-based reasoning: * We prefer the hypothesis or explanation that ***expects* more observations. If both explanations expect the same observations, then the simpler of the two is preferred (because the unnecessary terms of the more complicated explanation do not add to the predictive power).* *Why are expected events so important?* They are a measure of 1) explanatory power and 2) predictive power. The more predictive and the more explanatory a hypothesis is, the more likely the hypothesis is when compared to a competing hypothesis. Here are two case studies I've been analyzing from sensory perception of simplified visual input: The goal of the case studies is to answer the following: How do you generate the most likely motion hypothesis in a way that is general
Re: [agi] Re: Huge Progress on the Core of AGI
David, How I'd present the problem would be predict the next frame, or more generally predict a specified portion of video given a different portion. Do you object to this approach? --Abram On Thu, Jul 8, 2010 at 5:30 PM, David Jones davidher...@gmail.com wrote: It may not be possible to create a learning algorithm that can learn how to generally process images and other general AGI problems. This is for the same reason that completely general vision algorithms are likely impossible. I think that figuring out how to process sensory information intelligently requires either 1) impossible amounts of processing or 2) intelligent design and understanding by us. Maybe you could be more specific about how general learning algorithms would solve problems such as the one I'm tackling. But, I am extremely doubtful it can be done because the problems cannot be effectively described to such an algorithm. If you can't describe the problem, it can't search for solutions. If it can't search for solutions, you're basically stuck with evolution type algorithms, which require prohibitory amounts of processing. The reason that vision is so important for learning is that sensory perception is the foundation required to learn everything else. If you don't start with a foundational problem like this, you won't be representing the real nature of general intelligence problems that require extensive knowledge of the world to solve properly. Sensory perception is required to learn the information needed to understand everything else. Text and language for example, require extensive knowledge about the world to understand and especially to learn about. If you start with general learning algorithms on these unrepresentative problems, you will get stuck as we already have. So, it still makes a lot of sense to start with a concrete problem that does not require extensive amounts of previous knowledge to start learning. In fact, AGI requires that you not pre-program the AI with such extensive knowledge. So, lots of people are working on general learning algorithms that are unrepresentative of what is required for AGI because the algorithms don't have the knowledge needed to learn what they are trying to learn about. Regardless of how you look at it, my approach is definitely the right approach to AGI in my opinion. On Thu, Jul 8, 2010 at 5:02 PM, Abram Demski abramdem...@gmail.comwrote: David, That's why, imho, the rules need to be *learned* (and, when need be, unlearned). IE, what we need to work on is general learning algorithms, not general visual processing algorithms. As you say, there's not even such a thing as a general visual processing algorithm. Learning algorithms suffer similar environment-dependence, but (by their nature) not as severe... --Abram On Thu, Jul 8, 2010 at 3:17 PM, David Jones davidher...@gmail.comwrote: I've learned something really interesting today. I realized that general rules of inference probably don't really exists. There is no such thing as complete generality for these problems. The rules of inference that work for one environment would fail in alien environments. So, I have to modify my approach to solving these problems. As I studied over simplified problems, I realized that there are probably an infinite number of environments with their own behaviors that are not representative of the environments we want to put a general AI in. So, it is not ok to just come up with any case study and solve it. The case study has to actually be representative of a problem we want to solve in an environment we want to apply AI. Otherwise the solution required will take too long to develop because of it tries to accommodate too much generality. As I mentioned, such a general solution is likely impossible. So, someone could easily get stuck trying to solve an impossible task of creating one general solution to too many problems that don't allow a general solution. The best course is a balance between the time required to write a very general solution and the time required to write less general solutions for multiple problem types and environments. The best way to do this is to choose representative case studies to solve and make sure the solutions are truth-tropic and justified for the environments they are to be applied. Dave On Sun, Jun 27, 2010 at 1:31 AM, David Jones davidher...@gmail.comwrote: A method for comparing hypotheses in explanatory-based reasoning: * We prefer the hypothesis or explanation that ***expects* more observations. If both explanations expect the same observations, then the simpler of the two is preferred (because the unnecessary terms of the more complicated explanation do not add to the predictive power).* *Why are expected events so important?* They are a measure of 1) explanatory power and 2) predictive power. The more predictive and the more explanatory a hypothesis is, the more likely the
Re: [agi] Re: Huge Progress on the Core of AGI
Abram, Yeah, I would have to object for a couple reasons. First, prediction requires previous knowledge. So, even if you make that your primary goal, you're still going to have my research goals as the prerequisite: which are to process visual information in a more general way and learn about the environment in a more general way. Second, not everything is predictable. Certainly, we should not try to predict everything. Only after we have experience, can we actually predict anything. Even then, it's not precise prediction, like predicting the next frame of a video. It's more like having knowledge of what is quite likely to occur, or maybe an approximate prediction, but not guaranteed in the least. For example, based on previous experience, striking a match will light it. But, sometimes it doesn't light, and that too is expected to occur sometimes. We definitely don't predict the next image we'll see when it lights though. We just have expectations for what we might see and this helps us interpret the image effectively. We should try to expect certain outcomes or possible outcomes though. You could call that prediction, but it's not quite the same. The things we are more likely to see should be attempted as an explanation first and preferred if not given a reason to think otherwise. Dave On Thu, Jul 8, 2010 at 5:51 PM, Abram Demski abramdem...@gmail.com wrote: David, How I'd present the problem would be predict the next frame, or more generally predict a specified portion of video given a different portion. Do you object to this approach? --Abram On Thu, Jul 8, 2010 at 5:30 PM, David Jones davidher...@gmail.com wrote: It may not be possible to create a learning algorithm that can learn how to generally process images and other general AGI problems. This is for the same reason that completely general vision algorithms are likely impossible. I think that figuring out how to process sensory information intelligently requires either 1) impossible amounts of processing or 2) intelligent design and understanding by us. Maybe you could be more specific about how general learning algorithms would solve problems such as the one I'm tackling. But, I am extremely doubtful it can be done because the problems cannot be effectively described to such an algorithm. If you can't describe the problem, it can't search for solutions. If it can't search for solutions, you're basically stuck with evolution type algorithms, which require prohibitory amounts of processing. The reason that vision is so important for learning is that sensory perception is the foundation required to learn everything else. If you don't start with a foundational problem like this, you won't be representing the real nature of general intelligence problems that require extensive knowledge of the world to solve properly. Sensory perception is required to learn the information needed to understand everything else. Text and language for example, require extensive knowledge about the world to understand and especially to learn about. If you start with general learning algorithms on these unrepresentative problems, you will get stuck as we already have. So, it still makes a lot of sense to start with a concrete problem that does not require extensive amounts of previous knowledge to start learning. In fact, AGI requires that you not pre-program the AI with such extensive knowledge. So, lots of people are working on general learning algorithms that are unrepresentative of what is required for AGI because the algorithms don't have the knowledge needed to learn what they are trying to learn about. Regardless of how you look at it, my approach is definitely the right approach to AGI in my opinion. On Thu, Jul 8, 2010 at 5:02 PM, Abram Demski abramdem...@gmail.comwrote: David, That's why, imho, the rules need to be *learned* (and, when need be, unlearned). IE, what we need to work on is general learning algorithms, not general visual processing algorithms. As you say, there's not even such a thing as a general visual processing algorithm. Learning algorithms suffer similar environment-dependence, but (by their nature) not as severe... --Abram On Thu, Jul 8, 2010 at 3:17 PM, David Jones davidher...@gmail.comwrote: I've learned something really interesting today. I realized that general rules of inference probably don't really exists. There is no such thing as complete generality for these problems. The rules of inference that work for one environment would fail in alien environments. So, I have to modify my approach to solving these problems. As I studied over simplified problems, I realized that there are probably an infinite number of environments with their own behaviors that are not representative of the environments we want to put a general AI in. So, it is not ok to just come up with any case study and solve it. The case study has to actually be representative of a problem we
Re: [agi] Re: Huge Progress on the Core of AGI
Isn't the first problem simply to differentiate the objects in a scene? (Maybe the most important movement to begin with is not the movement of the object, but of the viewer changing their POV if only slightly - wh. won't be a factor if you're looking at a screen) And that I presume comes down to being able to put a crude, highly tentative, and fluid outline round them (something that won't be neces. if you're dealing with squares?) . Without knowing v. little if anything about what kind of objects they are. As an infant most likely does. {See infants' drawings and how they evolve v. gradually from a v. crude outline blob that at first can represent anything - that I'm suggesting is a replay of how visual perception developed). The fluid outline or image schema is arguably the basis of all intelligence - just about everything AGI is based on it. You need an outline for instance not just of objects, but of where you're going, and what you're going to try and do - if you want to survive in the real world. Schemas connect everything AGI. And it's not a matter of choice - first you have to have an outline/sense of the whole - whatever it is - before you can start filling in the parts. P.S. It would be mindblowingly foolish BTW to think you can do better than the way an infant learns to see - that's an awfully big visual section of the brain there, and it works. David, How I'd present the problem would be predict the next frame, or more generally predict a specified portion of video given a different portion. Do you object to this approach? --Abram On Thu, Jul 8, 2010 at 5:30 PM, David Jones davidher...@gmail.com wrote: It may not be possible to create a learning algorithm that can learn how to generally process images and other general AGI problems. This is for the same reason that completely general vision algorithms are likely impossible. I think that figuring out how to process sensory information intelligently requires either 1) impossible amounts of processing or 2) intelligent design and understanding by us. Maybe you could be more specific about how general learning algorithms would solve problems such as the one I'm tackling. But, I am extremely doubtful it can be done because the problems cannot be effectively described to such an algorithm. If you can't describe the problem, it can't search for solutions. If it can't search for solutions, you're basically stuck with evolution type algorithms, which require prohibitory amounts of processing. The reason that vision is so important for learning is that sensory perception is the foundation required to learn everything else. If you don't start with a foundational problem like this, you won't be representing the real nature of general intelligence problems that require extensive knowledge of the world to solve properly. Sensory perception is required to learn the information needed to understand everything else. Text and language for example, require extensive knowledge about the world to understand and especially to learn about. If you start with general learning algorithms on these unrepresentative problems, you will get stuck as we already have. So, it still makes a lot of sense to start with a concrete problem that does not require extensive amounts of previous knowledge to start learning. In fact, AGI requires that you not pre-program the AI with such extensive knowledge. So, lots of people are working on general learning algorithms that are unrepresentative of what is required for AGI because the algorithms don't have the knowledge needed to learn what they are trying to learn about. Regardless of how you look at it, my approach is definitely the right approach to AGI in my opinion. On Thu, Jul 8, 2010 at 5:02 PM, Abram Demski abramdem...@gmail.com wrote: David, That's why, imho, the rules need to be *learned* (and, when need be, unlearned). IE, what we need to work on is general learning algorithms, not general visual processing algorithms. As you say, there's not even such a thing as a general visual processing algorithm. Learning algorithms suffer similar environment-dependence, but (by their nature) not as severe... --Abram On Thu, Jul 8, 2010 at 3:17 PM, David Jones davidher...@gmail.com wrote: I've learned something really interesting today. I realized that general rules of inference probably don't really exists. There is no such thing as complete generality for these problems. The rules of inference that work for one environment would fail in alien environments. So, I have to modify my approach to solving these problems. As I studied over simplified problems, I realized that there are probably an infinite number of environments with their own behaviors that are not representative of the environments we want to put a general AI in. So, it is not ok to just come up with any case study and solve it. The case
Re: [agi] Re: Huge Progress on the Core of AGI
I figured out a way to make the Solomonoff Induction iteratively infinite, so I guess I was wrong. Thanks for explaining it to me. However, I don't accept that it is feasible to make those calculations since an examination of the infinite programs that could output each individual string would be required. My sense is that the statistics of a examination of a finite number of programs that output a finite number of strings could be used in Solomonff Induction to to give a reliable probability of what the next bit (or next sequence of bits) might be based on the sampling, under the condition that only those cases that had previously occurred would occur again and at the same frequencyy during the samplings. However, the attempt to figure the probabilities of concatenation of these strings or sub strings would be unreliable and void whatever benefit the theoretical model might appear to offer. Logic, probability and compression methods are all useful in AGI even though we are constantly violating the laws of logic and probability because it is necessary, and we sometimes need to use more complicated models (anti-compression so to speak) so that we can consider other possibilities based on what we have previously learned. So, I still don't see how Kolmogrov Complexity and Solomonoff Induction are truly useful except as theoretical methods that are interesting to consider. And, Occam's Razor is not reliable as an axiom of science. If we were to abide by it we would come to conclusions like a finding that describes an event by saying that it occurs some of the time, since it would be simpler than trying to describe the greater circumstances of the event in an effort to see if we can find something out about why the event occurred or didn't occur. In this sense Occam's Razor is anti-science since it implies that the status quo should be maintained since simpler is better. All things being equal, simpler is better. I think we all get that. However, the human mind is capable of re weighting the conditions and circumstances of a system to reconsider other possibilities and that seems to be an important and necessary method in research (and in planning). Jim Bromer On Sat, Jul 3, 2010 at 11:39 AM, Matt Mahoney matmaho...@yahoo.com wrote: Jim Bromer wrote: You can't assume a priori that the diagonal argument is not relevant. When I say infinite in my proof of Solomonoff induction, I mean countably infinite, as in aleph-null, as in there is a 1 to 1 mapping between the set and N, the set of natural numbers. There are a countably infinite number of finite strings, or of finite programs, or of finite length descriptions of any particular string. For any finite length string or program or description x with nonzero probability, there are a countably infinite number of finite length strings or programs or descriptions that are longer and less likely than x, and a finite number of finite length strings or programs or descriptions that are either shorter or more likely or both than x. Aleph-null is larger than any finite integer. This means that for any finite set and any countably infinite set, there is not a 1 to 1 mapping between the elements, and if you do map all of the elements of the finite set to elements of the infinite set, then there are unmapped elements of the infinite set left over. Cantor's diagonalization argument proves that there are infinities larger than aleph-null, such as the cardinality of the set of real numbers, which we call uncountably infinite. But since I am not using any uncountably infinite sets, I don't understand your objection. -- Matt Mahoney, matmaho...@yahoo.com -- *From:* Jim Bromer jimbro...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Sat, July 3, 2010 9:43:15 AM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 2, 2010 at 6:08 PM, Matt Mahoney matmaho...@yahoo.com wrote: Jim, to address all of your points, Solomonoff induction claims that the probability of a string is proportional to the number of programs that output the string, where each program M is weighted by 2^-|M|. The probability is dominated by the shortest program (Kolmogorov complexity), but it is not exactly the same. The difference is small enough that we may neglect it, just as we neglect differences that depend on choice of language. The infinite number of programs that could output the infinite number of strings that are to be considered (for example while using Solomonoff induction to predict what string is being output) lays out the potential for the diagonal argument. You can't assume a priori that the diagonal argument is not relevant. I don't believe that you can prove that it isn't relevant since as you say, Kolmogorov Complexity is not computable, and you cannot be sure that you have listed all the programs that were able to output a particular string. This creates a situation
Re: [agi] Re: Huge Progress on the Core of AGI
On Fri, Jul 2, 2010 at 6:08 PM, Matt Mahoney matmaho...@yahoo.com wrote: Jim, to address all of your points, Solomonoff induction claims that the probability of a string is proportional to the number of programs that output the string, where each program M is weighted by 2^-|M|. The probability is dominated by the shortest program (Kolmogorov complexity), but it is not exactly the same. The difference is small enough that we may neglect it, just as we neglect differences that depend on choice of language. The infinite number of programs that could output the infinite number of strings that are to be considered (for example while using Solomonoff induction to predict what string is being output) lays out the potential for the diagonal argument. You can't assume a priori that the diagonal argument is not relevant. I don't believe that you can prove that it isn't relevant since as you say, Kolmogorov Complexity is not computable, and you cannot be sure that you have listed all the programs that were able to output a particular string. This creates a situation in which the underlying logic of using Solmonoff induction is based on incomputable reasoning which can be shown using the diagonal argument. This kind of criticism cannot be answered with the kinds of presumptions that you used to derive the conclusions that you did. It has to be answered directly. I can think of other infinity to infinity relations in which the potential mappings can be countably derived from the formulas or equations, but I have yet to see any analysis which explains why this usage can be. Although you may imagine that the summation of the probabilities can be used just like it was an ordinary number, the unchecked usage is faulty. In other words the criticism has to be considered more carefully by someone capable of dealing with complex mathematical problems that involve the legitimacy of claims between infinite to infinite mappings. Jim Bromer On Fri, Jul 2, 2010 at 6:08 PM, Matt Mahoney matmaho...@yahoo.com wrote: Jim, to address all of your points, Solomonoff induction claims that the probability of a string is proportional to the number of programs that output the string, where each program M is weighted by 2^-|M|. The probability is dominated by the shortest program (Kolmogorov complexity), but it is not exactly the same. The difference is small enough that we may neglect it, just as we neglect differences that depend on choice of language. Here is the proof that Kolmogorov complexity is not computable. Suppose it were. Then I could test the Kolmogorov complexity of strings in increasing order of length (breaking ties lexicographically) and describe the first string that cannot be described in less than a million bits, contradicting the fact that I just did. (Formally, I could write a program that outputs the first string whose Kolmogorov complexity is at least n bits, choosing n to be larger than my program). Here is the argument that Occam's Razor and Solomonoff distribution must be true. Consider all possible probability distributions p(x) over any infinite set X of possible finite strings x, i.e. any X = {x: p(x) 0} that is infinite. All such distributions must favor shorter strings over longer ones. Consider any x in X. Then p(x) 0. There can be at most a finite number (less than 1/p(x)) of strings that are more likely than x, and therefore an infinite number of strings which are less likely than x. Of this infinite set, only a finite number (less than 2^|x|) can be shorter than x, and therefore there must be an infinite number that are longer than x. So for each x we can partition X into 4 subsets as follows: - shorter and more likely than x: finite - shorter and less likely than x: finite - longer and more likely than x: finite - longer and less likely than x: infinite. So in this sense, any distribution over the set of strings must favor shorter strings over longer ones. -- Matt Mahoney, matmaho...@yahoo.com -- *From:* Jim Bromer jimbro...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Fri, July 2, 2010 4:09:38 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 2, 2010 at 2:25 PM, Jim Bromer jimbro...@gmail.com wrote: There cannot be a one to one correspondence to the representation of the shortest program to produce a string and the strings that they produce. This means that if the consideration of the hypotheses were to be put into general mathematical form it must include the potential of many to one relations between candidate programs (or subprograms) and output strings. But, there is also no way to determine what the shortest program is, since there may be different programs that are the same length. That means that there is a many to one relation between programs and program length. So the claim that you could just iterate through programs *by length* is false
Re: [agi] Re: Huge Progress on the Core of AGI
This group, as in most AGI discussions, will use logic and statistical theory loosely. We have to. One is that we - thinking entities - do not know everything and so our reasoning is based on fragmentary knowledge. In this situation the boundaries of logical reasoning in thought, both natural and artificial, are going to be transgressed. However, knowing that is going to be the case in AGI, we can acknowledge it and try to figure out algorithms that will tend to ground our would-be programs. Now Solomonoff Induction and Algorithmic Information Theory are a little different. They deal with concrete data spaces. We can and should question how relevant those concrete sample spaces might be to general reasoning about the greater universe of knowledge, but the fact that they deal with concrete spaces means that they might be logically bound. But are they? If an idealism is both concrete (too concrete for our uses) and not logically computable then we have to really be wary of trying to use it. If using Solomonoff Induction is incomputable it does not prove that it is illogical. But if it is incomputable, it would be illogical to believe that it can be used reliably. Solomonoff Induction has been around long enough for serious mathematicians to examine its validity. If it was a genuinely sound method, mathematicians would have accepted it. However, if Solomonoff Induction is incomputable in practice it would be so unreliable that top mathematicians would tend to choose more productive and interesting subjects to study. As far as I can tell, Solomonoff Induction exists today within the backwash of AI communities. It has found new life in these kinds of discussion groups where most of us do not have the skill or the time to critically examine the basis of every theory that is put forward. The one test that we can make is whether or not some method that is being presented has some reliability in our programs which constitute mini experiments. Logic and probability pass the smell test, even though we know that our use of them in AGI is not ideal. Jim Bromer --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Jim Bromer wrote: You can't assume a priori that the diagonal argument is not relevant. When I say infinite in my proof of Solomonoff induction, I mean countably infinite, as in aleph-null, as in there is a 1 to 1 mapping between the set and N, the set of natural numbers. There are a countably infinite number of finite strings, or of finite programs, or of finite length descriptions of any particular string. For any finite length string or program or description x with nonzero probability, there are a countably infinite number of finite length strings or programs or descriptions that are longer and less likely than x, and a finite number of finite length strings or programs or descriptions that are either shorter or more likely or both than x. Aleph-null is larger than any finite integer. This means that for any finite set and any countably infinite set, there is not a 1 to 1 mapping between the elements, and if you do map all of the elements of the finite set to elements of the infinite set, then there are unmapped elements of the infinite set left over. Cantor's diagonalization argument proves that there are infinities larger than aleph-null, such as the cardinality of the set of real numbers, which we call uncountably infinite. But since I am not using any uncountably infinite sets, I don't understand your objection. -- Matt Mahoney, matmaho...@yahoo.com From: Jim Bromer jimbro...@gmail.com To: agi agi@v2.listbox.com Sent: Sat, July 3, 2010 9:43:15 AM Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 2, 2010 at 6:08 PM, Matt Mahoney matmaho...@yahoo.com wrote: Jim, to address all of your points, Solomonoff induction claims that the probability of a string is proportional to the number of programs that output the string, where each program M is weighted by 2^-|M|. The probability is dominated by the shortest program (Kolmogorov complexity), but it is not exactly the same. The difference is small enough that we may neglect it, just as we neglect differences that depend on choice of language. The infinite number of programs that could output the infinite number of strings that are to be considered (for example while using Solomonoff induction to predict what string is being output) lays out the potential for the diagonal argument. You can't assume a priori that the diagonal argument is not relevant. I don't believe that you can prove that it isn't relevant since as you say, Kolmogorov Complexity is not computable, and you cannot be sure that you have listed all the programs that were able to output a particular string. This creates a situation in which the underlying logic of using Solmonoff induction is based on incomputable reasoning which can be shown using the diagonal argument. This kind of criticism cannot be answered with the kinds of presumptions that you used to derive the conclusions that you did. It has to be answered directly. I can think of other infinity to infinity relations in which the potential mappings can be countably derived from the formulas or equations, but I have yet to see any analysis which explains why this usage can be. Although you may imagine that the summation of the probabilities can be used just like it was an ordinary number, the unchecked usage is faulty. In other words the criticism has to be considered more carefully by someone capable of dealing with complex mathematical problems that involve the legitimacy of claims between infinite to infinite mappings. Jim Bromer On Fri, Jul 2, 2010 at 6:08 PM, Matt Mahoney matmaho...@yahoo.com wrote: Jim, to address all of your points, Solomonoff induction claims that the probability of a string is proportional to the number of programs that output the string, where each program M is weighted by 2^-|M|. The probability is dominated by the shortest program (Kolmogorov complexity), but it is not exactly the same. The difference is small enough that we may neglect it, just as we neglect differences that depend on choice of language. Here is the proof that Kolmogorov complexity is not computable. Suppose it were. Then I could test the Kolmogorov complexity of strings in increasing order of length (breaking ties lexicographically) and describe the first string that cannot be described in less than a million bits, contradicting the fact that I just did. (Formally, I could write a program that outputs the first string whose Kolmogorov complexity is at least n bits, choosing n to be larger than my program). Here is the argument that Occam's Razor and Solomonoff distribution must be true. Consider all possible probability distributions p(x) over any infinite set X of possible finite strings x, i.e. any X = {x: p(x) 0} that is infinite. All such distributions must favor shorter strings over longer ones. Consider any x in X. Then p(x) 0. There can be at most a finite number (less than 1/p(x
Re: [agi] Re: Huge Progress on the Core of AGI
On Wed, Jun 30, 2010 at 5:13 PM, Matt Mahoney matmaho...@yahoo.com wrote: Jim, what evidence do you have that Occam's Razor ... is wrong, besides your own opinions? It is well established that elegant (short) theories are preferred in all branches of science because they have greater predictive power. -- Matt Mahoney, matmaho...@yahoo.com When a heuristic is used as if it were an axiom of truth, it will interfere in the development of reasonable insight just because an heuristic is not an axiom. Now to apply this heuristic (which does have value) as an unquestionable axiom of mind, you are making a more egregious claim because you are multiplying the force of the error. Occam's razor has greater predictive power within the boundaries of the isolation experiments which have the greatest potential to enhance its power. If simplest theories are preferred because they have the greater predictive power, then it would follow that isolation experiments would be the preferred vehicles of science just because they can produce theories that had the most predictive power. Whether this is the case or not (the popular opinion), it does not answer the question of whether narrow AI (for example) should be the preferred child of computer science just because the theorems of narrow AI are so much better at predicting their (narrow) events than the theorems of AGI are at comprehending their (more complicated) events. Jim Bromer --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
On Wed, Jun 30, 2010 at 5:13 PM, Matt Mahoney matmaho...@yahoo.com wrote: Jim, what evidence do you have that Occam's Razor or algorithmic information theory is wrong, Also, what does this have to do with Cantor's diagonalization argument? AIT considers only the countably infinite set of hypotheses. -- Matt Mahoney, matmaho...@yahoo.com There cannot be a one to one correspondence to the representation of the shortest program to produce a string and the strings that they produce. This means that if the consideration of the hypotheses were to be put into general mathematical form it must include the potential of many to one relations between candidate programs (or subprograms) and output strings. --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
On Fri, Jul 2, 2010 at 2:09 PM, Jim Bromer jimbro...@gmail.com wrote: On Wed, Jun 30, 2010 at 5:13 PM, Matt Mahoney matmaho...@yahoo.comwrote: Jim, what evidence do you have that Occam's Razor or algorithmic information theory is wrong, Also, what does this have to do with Cantor's diagonalization argument? AIT considers only the countably infinite set of hypotheses. -- Matt Mahoney, matmaho...@yahoo.com There cannot be a one to one correspondence to the representation of the shortest program to produce a string and the strings that they produce. This means that if the consideration of the hypotheses were to be put into general mathematical form it must include the potential of many to one relations between candidate programs (or subprograms) and output strings. But, there is also no way to determine what the shortest program is, since there may be different programs that are the same length. That means that there is a many to one relation between programs and program length. So the claim that you could just iterate through programs *by length* is false. This is the goal of algorithmic information theory not a premise of a methodology that can be used. So you have the diagonalization problem. --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
On Fri, Jul 2, 2010 at 2:25 PM, Jim Bromer jimbro...@gmail.com wrote: There cannot be a one to one correspondence to the representation of the shortest program to produce a string and the strings that they produce. This means that if the consideration of the hypotheses were to be put into general mathematical form it must include the potential of many to one relations between candidate programs (or subprograms) and output strings. But, there is also no way to determine what the shortest program is, since there may be different programs that are the same length. That means that there is a many to one relation between programs and program length. So the claim that you could just iterate through programs *by length* is false. This is the goal of algorithmic information theory not a premise of a methodology that can be used. So you have the diagonalization problem. A counter argument is that there are only a finite number of Turing Machine programs of a given length. However, since you guys have specifically designated that this theorem applies to any construction of a Turing Machine it is not clear that this counter argument can be used. And there is still the specific problem that you might want to try a program that writes a longer program to output a string (or many strings). Or you might want to write a program that can be called to write longer programs on a dynamic basis. I think these cases, where you might consider a program that outputs a longer program, (or another instruction string for another Turing Machine) constitutes a serious problem, that at the least, deserves to be answered with sound analysis. Part of my original intuitive argument, that I formed some years ago, was that without a heavy constraint on the instructions for the program, it will be practically impossible to test or declare that some program is indeed the shortest program. However, I can't quite get to the point now that I can say that there is definitely a diagonalization problem. Jim Bromer --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Jim, to address all of your points, Solomonoff induction claims that the probability of a string is proportional to the number of programs that output the string, where each program M is weighted by 2^-|M|. The probability is dominated by the shortest program (Kolmogorov complexity), but it is not exactly the same. The difference is small enough that we may neglect it, just as we neglect differences that depend on choice of language. Here is the proof that Kolmogorov complexity is not computable. Suppose it were. Then I could test the Kolmogorov complexity of strings in increasing order of length (breaking ties lexicographically) and describe the first string that cannot be described in less than a million bits, contradicting the fact that I just did. (Formally, I could write a program that outputs the first string whose Kolmogorov complexity is at least n bits, choosing n to be larger than my program). Here is the argument that Occam's Razor and Solomonoff distribution must be true. Consider all possible probability distributions p(x) over any infinite set X of possible finite strings x, i.e. any X = {x: p(x) 0} that is infinite. All such distributions must favor shorter strings over longer ones. Consider any x in X. Then p(x) 0. There can be at most a finite number (less than 1/p(x)) of strings that are more likely than x, and therefore an infinite number of strings which are less likely than x. Of this infinite set, only a finite number (less than 2^|x|) can be shorter than x, and therefore there must be an infinite number that are longer than x. So for each x we can partition X into 4 subsets as follows: - shorter and more likely than x: finite - shorter and less likely than x: finite - longer and more likely than x: finite - longer and less likely than x: infinite. So in this sense, any distribution over the set of strings must favor shorter strings over longer ones. -- Matt Mahoney, matmaho...@yahoo.com From: Jim Bromer jimbro...@gmail.com To: agi agi@v2.listbox.com Sent: Fri, July 2, 2010 4:09:38 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 2, 2010 at 2:25 PM, Jim Bromer jimbro...@gmail.com wrote: There cannot be a one to one correspondence to the representation of the shortest program to produce a string and the strings that they produce. This means that if the consideration of the hypotheses were to be put into general mathematical form it must include the potential of many to one relations between candidate programs (or subprograms) and output strings. But, there is also no way to determine what the shortest program is, since there may be different programs that are the same length. That means that there is a many to one relation between programs and program length. So the claim that you could just iterate through programs by length is false. This is the goal of algorithmic information theory not a premise of a methodology that can be used. So you have the diagonalization problem. A counter argument is that there are only a finite number of Turing Machine programs of a given length. However, since you guys have specifically designated that this theorem applies to any construction of a Turing Machine it is not clear that this counter argument can be used. And there is still the specific problem that you might want to try a program that writes a longer program to output a string (or many strings). Or you might want to write a program that can be called to write longer programs on a dynamic basis. I think these cases, where you might consider a program that outputs a longer program, (or another instruction string for another Turing Machine) constitutes a serious problem, that at the least, deserves to be answered with sound analysis. Part of my original intuitive argument, that I formed some years ago, was that without a heavy constraint on the instructions for the program, it will be practically impossible to test or declare that some program is indeed the shortest program. However, I can't quite get to the point now that I can say that there is definitely a diagonalization problem. Jim Bromer agi | Archives | Modify Your Subscription --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Nice Occam's Razor argument. I understood it simply because I knew there are always an infinite number of possible explanations for every observation that are more complicated than the simplest explanation. So, without a reason to choose one of those other interpretations, then why choose it? You could look for reasons in complex environments, but it would likely be more efficient to wait for a reason to need a better explanation. It's more efficient to wait for an inconsistency than to search an infinite set without a reason to do so. Dave On Fri, Jul 2, 2010 at 6:08 PM, Matt Mahoney matmaho...@yahoo.com wrote: Jim, to address all of your points, Solomonoff induction claims that the probability of a string is proportional to the number of programs that output the string, where each program M is weighted by 2^-|M|. The probability is dominated by the shortest program (Kolmogorov complexity), but it is not exactly the same. The difference is small enough that we may neglect it, just as we neglect differences that depend on choice of language. Here is the proof that Kolmogorov complexity is not computable. Suppose it were. Then I could test the Kolmogorov complexity of strings in increasing order of length (breaking ties lexicographically) and describe the first string that cannot be described in less than a million bits, contradicting the fact that I just did. (Formally, I could write a program that outputs the first string whose Kolmogorov complexity is at least n bits, choosing n to be larger than my program). Here is the argument that Occam's Razor and Solomonoff distribution must be true. Consider all possible probability distributions p(x) over any infinite set X of possible finite strings x, i.e. any X = {x: p(x) 0} that is infinite. All such distributions must favor shorter strings over longer ones. Consider any x in X. Then p(x) 0. There can be at most a finite number (less than 1/p(x)) of strings that are more likely than x, and therefore an infinite number of strings which are less likely than x. Of this infinite set, only a finite number (less than 2^|x|) can be shorter than x, and therefore there must be an infinite number that are longer than x. So for each x we can partition X into 4 subsets as follows: - shorter and more likely than x: finite - shorter and less likely than x: finite - longer and more likely than x: finite - longer and less likely than x: infinite. So in this sense, any distribution over the set of strings must favor shorter strings over longer ones. -- Matt Mahoney, matmaho...@yahoo.com -- *From:* Jim Bromer jimbro...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Fri, July 2, 2010 4:09:38 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI On Fri, Jul 2, 2010 at 2:25 PM, Jim Bromer jimbro...@gmail.com wrote: There cannot be a one to one correspondence to the representation of the shortest program to produce a string and the strings that they produce. This means that if the consideration of the hypotheses were to be put into general mathematical form it must include the potential of many to one relations between candidate programs (or subprograms) and output strings. But, there is also no way to determine what the shortest program is, since there may be different programs that are the same length. That means that there is a many to one relation between programs and program length. So the claim that you could just iterate through programs *by length* is false. This is the goal of algorithmic information theory not a premise of a methodology that can be used. So you have the diagonalization problem. A counter argument is that there are only a finite number of Turing Machine programs of a given length. However, since you guys have specifically designated that this theorem applies to any construction of a Turing Machine it is not clear that this counter argument can be used. And there is still the specific problem that you might want to try a program that writes a longer program to output a string (or many strings). Or you might want to write a program that can be called to write longer programs on a dynamic basis. I think these cases, where you might consider a program that outputs a longer program, (or another instruction string for another Turing Machine) constitutes a serious problem, that at the least, deserves to be answered with sound analysis. Part of my original intuitive argument, that I formed some years ago, was that without a heavy constraint on the instructions for the program, it will be practically impossible to test or declare that some program is indeed the shortest program. However, I can't quite get to the point now that I can say that there is definitely a diagonalization problem. Jim Bromer *agi* | Archives https://www.listbox.com/member/archive/303/=now https://www.listbox.com/member/archive/rss/303
Re: [agi] Re: Huge Progress on the Core of AGI
Cantor's diagonal argument is (in all likelihood) mathematically correct. However the attempt to use Cantor's methodology to derive an irrational number that is the next greater irrational number from a given irrational number (to a degree of precision sufficient to distinguish the two numbers) is not mathematically correct. If you were to say that Cantor's argument was mathematically correct, I would agree with you. As far as I can tell it is. However, if you were then to use his method of enumerating irrational numbers as a means to discover subsequent irrational numbers, I would not conclude that you understand what it means to say that Cantor's diagonal argument was mathematically correct. (However, I am not a mathematician and I might be wrong in some ways.) Jim Bromer --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=8660244-6e7fb59c Powered by Listbox: http://www.listbox.com
Re: [agi] Re: Huge Progress on the Core of AGI
Jim, Well, like I said, it'll only probably lead you to accept AIT. :) In my case, it led me to accept AIT but not AIXI, with reasons somewhat similar to the ones Steve recently mentioned. I agree that there is not a perfect equivalence; the math here is subtle. Just saying it's equivalent glosses over many details... --Abram On Wed, Jun 30, 2010 at 9:13 AM, Jim Bromer jimbro...@gmail.com wrote: On Tue, Jun 29, 2010 at 11:46 PM, Abram Demski abramdem...@gmail.comwrote: In brief, the answer to your question is: we formalize the description length heuristic by assigning lower probabilities to longer hypotheses, and we apply Bayes law to update these probabilities given the data we observe. This updating captures the idea that we should reward theories which explain/expect more of the observations; it also provides a natural way to balance simplicity vs explanatory power, so that we can compare any two theories with a single scoring mechanism. Bayes Law automatically places the right amount of pressure to avoid overly elegant explanations which don't get much right, and to avoid overly complex explanations which fit the observations perfectly but which probably won't generalize to new data. ... If you go down this path, you will eventually come to understand (and, probably, accept) algorithmic information theory. Matt may be tring to force it on you too soon. :) --Abram David was asking about theories of explanation, and here you are suggesting that following a certain path of reasoning will lead to accepting AIT. What nonsense. Even assuming that Baye's law can be used to update probabilities of idealized utility, the connection between description length and explanatory power in general AI is tenuous. And when you realize that AIT is an unattainable idealism that lacks mathematical power (I do not believe that it is a valid mathematical method because it is incomputable and therefore innumerable and cannot be used to derive probability distributions even as ideals) you have to accept that the connection between explanatory theories and AIT is not established except as a special case based on the imagination that a similarities between a subclass of practical examples is the same as a powerful generalization of those examples. The problem is that while compression seems to be related to intelligence, it is not equivalent to intelligence. A much stronger but similarly false argument is that memory is intelligence. Of course memory is a major part of intelligence, but it is not everything. The argument that AIT is a reasonable substitute for developing more sophisticated theories about conceptual explanation is not well founded, it lacks any experimental evidence other than a spattering of results on simplistic cases, and it is just wrong to suggest that there is no reason to consider other theories of explanation. Yes compression has something to do with intelligence and, in some special cases it can be shown to act as an idealism for numerical rationality. And yes unattainable theories that examine the boundaries of productive mathematical systems is a legitimate subject for mathematics. But there is so much more to theories of explanatory reasoning that I genuinely feel sorry for those of you, who originally motivated to develop better AGI programs, would get caught in the obvious traps of AIT and AIXI. Jim Bromer On Tue, Jun 29, 2010 at 11:46 PM, Abram Demski abramdem...@gmail.comwrote: David, What Matt is trying to explain is all right, but I think a better way of answering your question would be to invoke the mighty mysterious Bayes' Law. I had an epiphany similar to yours (the one that started this thread) about 5 years ago now. At the time I did not know that it had all been done before. I think many people feel this way about MDL. Looking into the MDL (minimum description length) literature would be a good starting point. In brief, the answer to your question is: we formalize the description length heuristic by assigning lower probabilities to longer hypotheses, and we apply Bayes law to update these probabilities given the data we observe. This updating captures the idea that we should reward theories which explain/expect more of the observations; it also provides a natural way to balance simplicity vs explanatory power, so that we can compare any two theories with a single scoring mechanism. Bayes Law automatically places the right amount of pressure to avoid overly elegant explanations which don't get much right, and to avoid overly complex explanations which fit the observations perfectly but which probably won't generalize to new data. Bayes' Law and MDL have strong connections, though sometimes they part ways. There are deep theorems here. For me it's good enough to note that if we're using a maximally efficient code for our knowledge representation, they are equivalent. (This in itself involves some deep
Re: [agi] Re: Huge Progress on the Core of AGI
Jim, what evidence do you have that Occam's Razor or algorithmic information theory is wrong, besides your own opinions? It is well established that elegant (short) theories are preferred in all branches of science because they have greater predictive power. Also, what does this have to do with Cantor's diagonalization argument? AIT considers only the countably infinite set of hypotheses. -- Matt Mahoney, matmaho...@yahoo.com From: Jim Bromer jimbro...@gmail.com To: agi agi@v2.listbox.com Sent: Wed, June 30, 2010 9:13:44 AM Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Tue, Jun 29, 2010 at 11:46 PM, Abram Demski abramdem...@gmail.com wrote: In brief, the answer to your question is: we formalize the description length heuristic by assigning lower probabilities to longer hypotheses, and we apply Bayes law to update these probabilities given the data we observe. This updating captures the idea that we should reward theories which explain/expect more of the observations; it also provides a natural way to balance simplicity vs explanatory power, so that we can compare any two theories with a single scoring mechanism. Bayes Law automatically places the right amount of pressure to avoid overly elegant explanations which don't get much right, and to avoid overly complex explanations which fit the observations perfectly but which probably won't generalize to new data. ... If you go down this path, you will eventually come to understand (and, probably, accept) algorithmic information theory. Matt may be tring to force it on you too soon. :) --Abram David was asking about theories of explanation, and here you are suggesting that following a certain path of reasoning will lead to accepting AIT. What nonsense. Even assuming that Baye's law can be used to update probabilities of idealized utility, the connection between description length and explanatory power in general AI is tenuous. And when you realize that AIT is an unattainable idealism that lacks mathematical power (I do not believe that it is a valid mathematical method because it is incomputable and therefore innumerable and cannot be used to derive probability distributions even as ideals) you have to accept that the connection between explanatory theories and AIT is not established except as a special case based on the imagination that a similarities between a subclass of practical examples is the same as a powerful generalization of those examples. The problem is that while compression seems to be related to intelligence, it is not equivalent to intelligence. A much stronger but similarly false argument is that memory is intelligence. Of course memory is a major part of intelligence, but it is not everything. The argument that AIT is a reasonable substitute for developing more sophisticated theories about conceptual explanation is not well founded, it lacks any experimental evidence other than a spattering of results on simplistic cases, and it is just wrong to suggest that there is no reason to consider other theories of explanation. Yes compression has something to do with intelligence and, in some special cases it can be shown to act as an idealism for numerical rationality. And yes unattainable theories that examine the boundaries of productive mathematical systems is a legitimate subject for mathematics. But there is so much more to theories of explanatory reasoning that I genuinely feel sorry for those of you, who originally motivated to develop better AGI programs, would get caught in the obvious traps of AIT and AIXI. Jim Bromer On Tue, Jun 29, 2010 at 11:46 PM, Abram Demski abramdem...@gmail.com wrote: David, What Matt is trying to explain is all right, but I think a better way of answering your question would be to invoke the mighty mysterious Bayes' Law. I had an epiphany similar to yours (the one that started this thread) about 5 years ago now. At the time I did not know that it had all been done before. I think many people feel this way about MDL. Looking into the MDL (minimum description length) literature would be a good starting point. In brief, the answer to your question is: we formalize the description length heuristic by assigning lower probabilities to longer hypotheses, and we apply Bayes law to update these probabilities given the data we observe. This updating captures the idea that we should reward theories which explain/expect more of the observations; it also provides a natural way to balance simplicity vs explanatory power, so that we can compare any two theories with a single scoring mechanism. Bayes Law automatically places the right amount of pressure to avoid overly elegant explanations which don't get much right, and to avoid overly complex explanations which fit the observations perfectly but which probably won't generalize to new data. Bayes' Law and MDL have strong connections, though
Re: [agi] Re: Huge Progress on the Core of AGI
David Jones wrote: If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? The simplest explanation of the past is the best predictor of the future. http://en.wikipedia.org/wiki/Occam's_razor http://www.scholarpedia.org/article/Algorithmic_probability -- Matt Mahoney, matmaho...@yahoo.com From: David Jones davidher...@gmail.com To: agi agi@v2.listbox.com Sent: Tue, June 29, 2010 9:05:45 AM Subject: [agi] Re: Huge Progress on the Core of AGI If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? I've read some through google, but I'm not really satisfied with anything I've found. Thanks, Dave On Sun, Jun 27, 2010 at 1:31 AM, David Jones davidher...@gmail.com wrote: A method for comparing hypotheses in explanatory-based reasoning: We prefer the hypothesis or explanation that *expects* more observations. If both explanations expect the same observations, then the simpler of the two is preferred (because the unnecessary terms of the more complicated explanation do not add to the predictive power). Why are expected events so important? They are a measure of 1) explanatory power and 2) predictive power. The more predictive and the more explanatory a hypothesis is, the more likely the hypothesis is when compared to a competing hypothesis. Here are two case studies I've been analyzing from sensory perception of simplified visual input: The goal of the case studies is to answer the following: How do you generate the most likely motion hypothesis in a way that is general and applicable to AGI? Case Study 1) Here is a link to an example: animated gif of two black squares move from left to right. Description: Two black squares are moving in unison from left to right across a white screen. In each frame the black squares shift to the right so that square 1 steals square 2's original position and square two moves an equal distance to the right. Case Study 2) Here is a link to an example: the interrupted square. Description: A single square is moving from left to right. Suddenly in the third frame, a single black square is added in the middle of the expected path of the original black square. This second square just stays there. So, what happened? Did the square moving from left to right keep moving? Or did it stop and then another square suddenly appeared and moved from left to right? Here is a simplified version of how we solve case study 1: The important hypotheses to consider are: 1) the square from frame 1 of the video that has a very close position to the square from frame 2 should be matched (we hypothesize that they are the same square and that any difference in position is motion). So, what happens is that in each two frames of the video, we only match one square. The other square goes unmatched. 2) We do the same thing as in hypothesis #1, but this time we also match the remaining squares and hypothesize motion as follows: the first square jumps over the second square from left to right. We hypothesize that this happens over and over in each frame of the video. Square 2 stops and square 1 jumps over it over and over again. 3) We hypothesize that both squares move to the right in unison. This is the correct hypothesis. So, why should we prefer the correct hypothesis, #3 over the other two? Well, first of all, #3 is correct because it has the most explanatory power of the three and is the simplest of the three. Simpler is better because, with the given evidence and information, there is no reason to desire a more complicated hypothesis such as #2. So, the answer to the question is because explanation #3 expects the most observations, such as: 1) the consistent relative positions of the squares in each frame are expected. 2) It also expects their new positions in each from based on velocity calculations. 3) It expects both squares to occur in each frame. Explanation 1 ignores 1 square from each frame of the video, because it can't match it. Hypothesis #1 doesn't have a reason for why the a new square appears in each frame and why one disappears. It doesn't expect these observations. In fact, explanation 1 doesn't expect anything that happens because something new happens in each frame, which doesn't give it a chance to confirm its hypotheses in subsequent frames. The power of this method is immediately clear. It is general and it solves the problem very cleanly. Here is a simplified version of how we solve case study 2: We expect the original square to move at a similar velocity from left to right because we hypothesized that it did move from left to right and we calculated its velocity. If this expectation is confirmed, then it is more likely than saying that the square suddenly stopped and another started moving
Re: [agi] Re: Huge Progress on the Core of AGI
Thanks Matt, Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler? So, maybe what I want to know from the state of the art in research is: 1) how precisely do other people define simpler and 2) More importantly, how do you compare competing explanations/hypotheses that have more or less explanatory power. Simpler does not apply unless you are comparing equally explanatory hypotheses. For example, the simplest hypothesis for all visual interpretation is that everything in the first image is gone in the second image, and everything in the second image is a new object. Simple. Done. Solved :) right? Well, clearly a more complicated explanation is warranted because a more complicated explanation is more *explanatory* and a better explanation. So, why is it better? Can it be defined as better in a precise way so that you can compare arbitrary hypotheses or explanations? That is what I'm trying to learn about. I don't think much progress has been made in this area, but I'd like to know what other people have done and any successes they've had. Dave On Tue, Jun 29, 2010 at 10:29 AM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? The simplest explanation of the past is the best predictor of the future. http://en.wikipedia.org/wiki/Occam's_razorhttp://en.wikipedia.org/wiki/Occam%27s_razor http://en.wikipedia.org/wiki/Occam%27s_razor http://www.scholarpedia.org/article/Algorithmic_probability http://www.scholarpedia.org/article/Algorithmic_probability -- Matt Mahoney, matmaho...@yahoo.com -- *From:* David Jones davidher...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Tue, June 29, 2010 9:05:45 AM *Subject:* [agi] Re: Huge Progress on the Core of AGI If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? I've read some through google, but I'm not really satisfied with anything I've found. Thanks, Dave On Sun, Jun 27, 2010 at 1:31 AM, David Jones davidher...@gmail.comwrote: A method for comparing hypotheses in explanatory-based reasoning: * We prefer the hypothesis or explanation that ***expects* more observations. If both explanations expect the same observations, then the simpler of the two is preferred (because the unnecessary terms of the more complicated explanation do not add to the predictive power).* *Why are expected events so important?* They are a measure of 1) explanatory power and 2) predictive power. The more predictive and the more explanatory a hypothesis is, the more likely the hypothesis is when compared to a competing hypothesis. Here are two case studies I've been analyzing from sensory perception of simplified visual input: The goal of the case studies is to answer the following: How do you generate the most likely motion hypothesis in a way that is general and applicable to AGI? *Case Study 1)* Here is a link to an example: animated gif of two black squares move from left to righthttp://practicalai.org/images/CaseStudy1.gif. *Description: *Two black squares are moving in unison from left to right across a white screen. In each frame the black squares shift to the right so that square 1 steals square 2's original position and square two moves an equal distance to the right. *Case Study 2) *Here is a link to an example: the interrupted squarehttp://practicalai.org/images/CaseStudy2.gif. *Description:* A single square is moving from left to right. Suddenly in the third frame, a single black square is added in the middle of the expected path of the original black square. This second square just stays there. So, what happened? Did the square moving from left to right keep moving? Or did it stop and then another square suddenly appeared and moved from left to right? *Here is a simplified version of how we solve case study 1: *The important hypotheses to consider are: 1) the square from frame 1 of the video that has a very close position to the square from frame 2 should be matched (we hypothesize that they are the same square and that any difference in position is motion). So, what happens is that in each two frames of the video, we only match one square. The other square goes unmatched. 2) We do the same thing as in hypothesis #1, but this time we also match the remaining squares and hypothesize motion as follows: the first square jumps over the second square from left to right. We hypothesize that this happens over and over in each frame of the video. Square 2 stops and square 1 jumps over it over and over again. 3) We hypothesize that both squares move to the right in unison. This is the correct hypothesis. So, why should we prefer
Re: [agi] Re: Huge Progress on the Core of AGI
Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler? A hypothesis is a program that outputs the observed data. It explains the data if its output matches what is observed. The simpler hypothesis is the shorter program, measured in bits. The language used to describe the data can be any Turing complete programming language (C, Lisp, etc) or any natural language such as English. It does not matter much which language you use, because for any two languages there is a fixed length procedure, described in either of the languages, independent of the data, that translates descriptions in one language to the other. For example, the simplest hypothesis for all visual interpretation is that everything in the first image is gone in the second image, and everything in the second image is a new object. Simple. Done. Solved :) right? The hypothesis is not the simplest. The program that outputs the two frames as if independent cannot be smaller than the two frames compressed independently. The program could be made smaller if it only described how the second frame is different than the first. It would be more likely to correctly predict the third frame if it continued to run and described how it would be different than the second frame. I don't think much progress has been made in this area, but I'd like to know what other people have done and any successes they've had. Kolmogorov proved that the solution is not computable. Given a hypothesis (a description of the observed data, or a program that outputs the observed data), there is no general procedure or test to determine whether a shorter (simpler, better) hypothesis exists. Proof: suppose there were. Then I could describe the first data set that cannot be described in less than a million bits even though I just did. (By first I mean the first data set encoded by a string from shortest to longest, breaking ties lexicographically). That said, I believe the state of the art in both language and vision are based on hierarchical neural models, i.e. pattern recognition using learned weighted combinations of simpler patterns. I am more familiar with language. The top ranked programs can be found at http://mattmahoney.net/dc/text.html -- Matt Mahoney, matmaho...@yahoo.com From: David Jones davidher...@gmail.com To: agi agi@v2.listbox.com Sent: Tue, June 29, 2010 10:44:41 AM Subject: Re: [agi] Re: Huge Progress on the Core of AGI Thanks Matt, Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler? So, maybe what I want to know from the state of the art in research is: 1) how precisely do other people define simpler and 2) More importantly, how do you compare competing explanations/hypotheses that have more or less explanatory power. Simpler does not apply unless you are comparing equally explanatory hypotheses. For example, the simplest hypothesis for all visual interpretation is that everything in the first image is gone in the second image, and everything in the second image is a new object. Simple. Done. Solved :) right? Well, clearly a more complicated explanation is warranted because a more complicated explanation is more *explanatory* and a better explanation. So, why is it better? Can it be defined as better in a precise way so that you can compare arbitrary hypotheses or explanations? That is what I'm trying to learn about. I don't think much progress has been made in this area, but I'd like to know what other people have done and any successes they've had. Dave On Tue, Jun 29, 2010 at 10:29 AM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? The simplest explanation of the past is the best predictor of the future. http://en.wikipedia.org/wiki/Occam's_razor http://www.scholarpedia.org/article/Algorithmic_probability -- Matt Mahoney, matmaho...@yahoo.com From: David Jones davidher...@gmail.com To: agi agi@v2.listbox.com Sent: Tue, June 29, 2010 9:05:45 AM Subject: [agi] Re: Huge Progress on the Core of AGI If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? I've read some through google, but I'm not really satisfied with anything I've found. Thanks, Dave On Sun, Jun 27, 2010 at 1:31 AM, David Jones davidher...@gmail.com wrote: A method for comparing hypotheses in explanatory-based reasoning: We prefer the hypothesis or explanation that *expects* more observations. If both explanations expect the same observations, then the simpler of the two
Re: [agi] Re: Huge Progress on the Core of AGI
David Jones wrote: I really don't think this is the right way to calculate simplicity. I will give you an example, because examples are more convincing than proofs. Suppose you perform a sequence of experiments whose outcome can either be 0 or 1. In the first 10 trials you observe 00. What do you expect to observe in the next trial? Hypothesis 1: the outcome is always 0. Hypothesis 2: the outcome is 0 for the first 10 trials and 1 thereafter. Hypothesis 1 is shorter than 2, so it is more likely to be correct. If I describe the two hypotheses in French or Chinese, then 1 is still shorter than 2. If I describe the two hypotheses in C, then 1 is shorter than 2. void hypothesis_1() { while (1) printf(0); } void hypothesis_2() { int i; for (i=0; i10; ++i) printf(0); while (1) printf(1); } If I translate these programs into Perl or Lisp or x86 assembler, then 1 will still be shorter than 2. I realize there might be smaller equivalent programs. But I think you could find a smaller program equivalent to hypothesis_1 than hypothesis_2. I realize there are other hypotheses than 1 or 2. But I think that the smallest one you can find that outputs eleven bits of which the first ten are zeros will be a program that outputs another zero. I realize that you could rewrite 1 so that it is longer than 2. But it is the shortest version that counts. More specifically consider all programs in which the first 10 outputs are 0. Then weight each program by 2^-length. So the shortest programs dominate. I realize you could make up a language where the shortest encoding of hypothesis 2 is shorter than 1. You could do this for any pair of hypotheses. However, I think if you stick to simple languages (and I realize this is a circular definition), then 1 will usually be shorter than 2. -- Matt Mahoney, matmaho...@yahoo.com From: David Jones davidher...@gmail.com To: agi agi@v2.listbox.com Sent: Tue, June 29, 2010 1:31:01 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Tue, Jun 29, 2010 at 11:26 AM, Matt Mahoney matmaho...@yahoo.com wrote: Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler? A hypothesis is a program that outputs the observed data. It explains the data if its output matches what is observed. The simpler hypothesis is the shorter program, measured in bits. I can't be confident that bits is the right way to do it. I suspect bits is an approximation of a more accurate method. I also suspect that you can write a more complex explanation program with the same number of bits. So, there are some flaws with this approach. It is an interesting idea to consider though. The language used to describe the data can be any Turing complete programming language (C, Lisp, etc) or any natural language such as English. It does not matter much which language you use, because for any two languages there is a fixed length procedure, described in either of the languages, independent of the data, that translates descriptions in one language to the other. Hypotheses don't have to be written in actual computer code and probably shouldn't be because hypotheses are not really meant to be run per say. And outputs are not necessarily the right way to put it either. Outputs imply prediction. And as mike has often pointed out, things cannot be precisely predicted. We can, however, determine whether a particular observation fits expectations, rather than equals some prediction. There may be multiple possible outcomes that we expect and which would be consistent with a hypothesis, which is why actual prediction should not be used. For example, the simplest hypothesis for all visual interpretation is that everything in the first image is gone in the second image, and everything in the second image is a new object. Simple. Done. Solved :) right? The hypothesis is not the simplest. The program that outputs the two frames as if independent cannot be smaller than the two frames compressed independently. The program could be made smaller if it only described how the second frame is different than the first. It would be more likely to correctly predict the third frame if it continued to run and described how it would be different than the second frame. I really don't think this is the right way to calculate simplicity. I don't think much progress has been made in this area, but I'd like to know what other people have done and any successes they've had. Kolmogorov proved that the solution is not computable. Given a hypothesis (a description of the observed data, or a program that outputs the observed data), there is no general procedure or test to determine whether a shorter (simpler, better) hypothesis exists. Proof: suppose there were. Then I could describe
Re: [agi] Re: Huge Progress on the Core of AGI
Such an example is no where near sufficient to accept the assertion that program size is the right way to define simplicity of a hypothesis. Here is a counter example. It requires a slightly more complex example because all zeros doesn't leave any room for alternative hypotheses. Here is the sequence: 10, 21, 32 void hypothesis_1() { int ten = 10; int counter = 0; while (1) { print(ten+counter) ten = ten + 10; counter = counter + 1; } } void hypothesis_2() { while (1) print(10 21 32) } Hypothesis 2 is simpler, yet clearly wrong. These examples don't really show anything. Dave On Tue, Jun 29, 2010 at 3:15 PM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: I really don't think this is the right way to calculate simplicity. I will give you an example, because examples are more convincing than proofs. Suppose you perform a sequence of experiments whose outcome can either be 0 or 1. In the first 10 trials you observe 00. What do you expect to observe in the next trial? Hypothesis 1: the outcome is always 0. Hypothesis 2: the outcome is 0 for the first 10 trials and 1 thereafter. Hypothesis 1 is shorter than 2, so it is more likely to be correct. If I describe the two hypotheses in French or Chinese, then 1 is still shorter than 2. If I describe the two hypotheses in C, then 1 is shorter than 2. void hypothesis_1() { while (1) printf(0); } void hypothesis_2() { int i; for (i=0; i10; ++i) printf(0); while (1) printf(1); } If I translate these programs into Perl or Lisp or x86 assembler, then 1 will still be shorter than 2. I realize there might be smaller equivalent programs. But I think you could find a smaller program equivalent to hypothesis_1 than hypothesis_2. I realize there are other hypotheses than 1 or 2. But I think that the smallest one you can find that outputs eleven bits of which the first ten are zeros will be a program that outputs another zero. I realize that you could rewrite 1 so that it is longer than 2. But it is the shortest version that counts. More specifically consider all programs in which the first 10 outputs are 0. Then weight each program by 2^-length. So the shortest programs dominate. I realize you could make up a language where the shortest encoding of hypothesis 2 is shorter than 1. You could do this for any pair of hypotheses. However, I think if you stick to simple languages (and I realize this is a circular definition), then 1 will usually be shorter than 2. -- Matt Mahoney, matmaho...@yahoo.com -- *From:* David Jones davidher...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Tue, June 29, 2010 1:31:01 PM *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI On Tue, Jun 29, 2010 at 11:26 AM, Matt Mahoney matmaho...@yahoo.comwrote: Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler? A hypothesis is a program that outputs the observed data. It explains the data if its output matches what is observed. The simpler hypothesis is the shorter program, measured in bits. I can't be confident that bits is the right way to do it. I suspect bits is an approximation of a more accurate method. I also suspect that you can write a more complex explanation program with the same number of bits. So, there are some flaws with this approach. It is an interesting idea to consider though. The language used to describe the data can be any Turing complete programming language (C, Lisp, etc) or any natural language such as English. It does not matter much which language you use, because for any two languages there is a fixed length procedure, described in either of the languages, independent of the data, that translates descriptions in one language to the other. Hypotheses don't have to be written in actual computer code and probably shouldn't be because hypotheses are not really meant to be run per say. And outputs are not necessarily the right way to put it either. Outputs imply prediction. And as mike has often pointed out, things cannot be precisely predicted. We can, however, determine whether a particular observation fits expectations, rather than equals some prediction. There may be multiple possible outcomes that we expect and which would be consistent with a hypothesis, which is why actual prediction should not be used. For example, the simplest hypothesis for all visual interpretation is that everything in the first image is gone in the second image, and everything in the second image is a new object. Simple. Done. Solved :) right? The hypothesis is not the simplest. The program that outputs the two frames as if independent cannot be smaller than the two frames compressed independently. The program could be made smaller
Re: [agi] Re: Huge Progress on the Core of AGI
You can always find languages that favor either hypothesis. Suppose that you want to predict the sequence 10, 21, 32, ? and we write our hypothesis as a function that takes the trial number (0, 1, 2, 3...) and returns the outcome. The sequence 10, 21, 32, 43, 54... would be coded: int hypothesis_1(int trial) { return trial*11+10; } The sequence 10, 21, 32, 10, 21, 32... would be coded int hypothesis_2(int trial) { return trial%3*11+10; } which is longer and therefore less likely. Here is another example: predict the sequence 0, 1, 4, 9, 16, 25, 36, 49, ? Can you find a program shorter than this that doesn't predict 64? int hypothesis_1(int trial) { return trial*trial; } -- Matt Mahoney, matmaho...@yahoo.com From: David Jones davidher...@gmail.com To: agi agi@v2.listbox.com Sent: Tue, June 29, 2010 3:48:01 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI Such an example is no where near sufficient to accept the assertion that program size is the right way to define simplicity of a hypothesis. Here is a counter example. It requires a slightly more complex example because all zeros doesn't leave any room for alternative hypotheses. Here is the sequence: 10, 21, 32 void hypothesis_1() { int ten = 10; int counter = 0; while (1) { print(ten+counter) ten = ten + 10; counter = counter + 1; } } void hypothesis_2() { while (1) print(10 21 32) } Hypothesis 2 is simpler, yet clearly wrong. These examples don't really show anything. Dave On Tue, Jun 29, 2010 at 3:15 PM, Matt Mahoney matmaho...@yahoo.com wrote: David Jones wrote: I really don't think this is the right way to calculate simplicity. I will give you an example, because examples are more convincing than proofs. Suppose you perform a sequence of experiments whose outcome can either be 0 or 1. In the first 10 trials you observe 00. What do you expect to observe in the next trial? Hypothesis 1: the outcome is always 0. Hypothesis 2: the outcome is 0 for the first 10 trials and 1 thereafter. Hypothesis 1 is shorter than 2, so it is more likely to be correct. If I describe the two hypotheses in French or Chinese, then 1 is still shorter than 2. If I describe the two hypotheses in C, then 1 is shorter than 2. void hypothesis_1() { while (1) printf(0); } void hypothesis_2() { int i; for (i=0; i10; ++i) printf(0); while (1) printf(1); } If I translate these programs into Perl or Lisp or x86 assembler, then 1 will still be shorter than 2. I realize there might be smaller equivalent programs. But I think you could find a smaller program equivalent to hypothesis_1 than hypothesis_2. I realize there are other hypotheses than 1 or 2. But I think that the smallest one you can find that outputs eleven bits of which the first ten are zeros will be a program that outputs another zero. I realize that you could rewrite 1 so that it is longer than 2. But it is the shortest version that counts. More specifically consider all programs in which the first 10 outputs are 0. Then weight each program by 2^-length. So the shortest programs dominate. I realize you could make up a language where the shortest encoding of hypothesis 2 is shorter than 1. You could do this for any pair of hypotheses. However, I think if you stick to simple languages (and I realize this is a circular definition), then 1 will usually be shorter than 2. -- Matt Mahoney, matmaho...@yahoo.com From: David Jones davidher...@gmail.com To: agi agi@v2.listbox.com Sent: Tue, June 29, 2010 1:31:01 PM Subject: Re: [agi] Re: Huge Progress on the Core of AGI On Tue, Jun 29, 2010 at 11:26 AM, Matt Mahoney matmaho...@yahoo.com wrote: Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler? A hypothesis is a program that outputs the observed data. It explains the data if its output matches what is observed. The simpler hypothesis is the shorter program, measured in bits. I can't be confident that bits is the right way to do it. I suspect bits is an approximation of a more accurate method. I also suspect that you can write a more complex explanation program with the same number of bits. So, there are some flaws with this approach. It is an interesting idea to consider though. The language used to describe the data can be any Turing complete programming language (C, Lisp, etc) or any natural language such as English. It does not matter much which language you use, because for any two languages there is a fixed length procedure, described in either of the languages, independent of the data, that translates descriptions in one language to the other. Hypotheses don't have to be written in actual computer code and probably shouldn't
Re: [agi] Re: Huge Progress on the Core of AGI
David, What Matt is trying to explain is all right, but I think a better way of answering your question would be to invoke the mighty mysterious Bayes' Law. I had an epiphany similar to yours (the one that started this thread) about 5 years ago now. At the time I did not know that it had all been done before. I think many people feel this way about MDL. Looking into the MDL (minimum description length) literature would be a good starting point. In brief, the answer to your question is: we formalize the description length heuristic by assigning lower probabilities to longer hypotheses, and we apply Bayes law to update these probabilities given the data we observe. This updating captures the idea that we should reward theories which explain/expect more of the observations; it also provides a natural way to balance simplicity vs explanatory power, so that we can compare any two theories with a single scoring mechanism. Bayes Law automatically places the right amount of pressure to avoid overly elegant explanations which don't get much right, and to avoid overly complex explanations which fit the observations perfectly but which probably won't generalize to new data. Bayes' Law and MDL have strong connections, though sometimes they part ways. There are deep theorems here. For me it's good enough to note that if we're using a maximally efficient code for our knowledge representation, they are equivalent. (This in itself involves some deep math; I can explain if you're interested, though I believe I've already posted a writeup to this list in the past.) Bayesian updating is essentially equivalent to scoring hypotheses as: hypothesis size + size of data's description using hypothesis. Lower scores are better (as the score is approximately -log(probability)). If you go down this path, you will eventually come to understand (and, probably, accept) algorithmic information theory. Matt may be tring to force it on you too soon. :) --Abram On Tue, Jun 29, 2010 at 10:44 AM, David Jones davidher...@gmail.com wrote: Thanks Matt, Right. But Occam's Razor is not complete. It says simpler is better, but 1) this only applies when two hypotheses have the same explanatory power and 2) what defines simpler? So, maybe what I want to know from the state of the art in research is: 1) how precisely do other people define simpler and 2) More importantly, how do you compare competing explanations/hypotheses that have more or less explanatory power. Simpler does not apply unless you are comparing equally explanatory hypotheses. For example, the simplest hypothesis for all visual interpretation is that everything in the first image is gone in the second image, and everything in the second image is a new object. Simple. Done. Solved :) right? Well, clearly a more complicated explanation is warranted because a more complicated explanation is more *explanatory* and a better explanation. So, why is it better? Can it be defined as better in a precise way so that you can compare arbitrary hypotheses or explanations? That is what I'm trying to learn about. I don't think much progress has been made in this area, but I'd like to know what other people have done and any successes they've had. Dave On Tue, Jun 29, 2010 at 10:29 AM, Matt Mahoney matmaho...@yahoo.comwrote: David Jones wrote: If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? The simplest explanation of the past is the best predictor of the future. http://en.wikipedia.org/wiki/Occam's_razorhttp://en.wikipedia.org/wiki/Occam%27s_razor http://en.wikipedia.org/wiki/Occam%27s_razor http://www.scholarpedia.org/article/Algorithmic_probability http://www.scholarpedia.org/article/Algorithmic_probability -- Matt Mahoney, matmaho...@yahoo.com -- *From:* David Jones davidher...@gmail.com *To:* agi agi@v2.listbox.com *Sent:* Tue, June 29, 2010 9:05:45 AM *Subject:* [agi] Re: Huge Progress on the Core of AGI If anyone has any knowledge of or references to the state of the art in explanation-based reasoning, can you send me keywords or links? I've read some through google, but I'm not really satisfied with anything I've found. Thanks, Dave On Sun, Jun 27, 2010 at 1:31 AM, David Jones davidher...@gmail.comwrote: A method for comparing hypotheses in explanatory-based reasoning: * We prefer the hypothesis or explanation that ***expects* more observations. If both explanations expect the same observations, then the simpler of the two is preferred (because the unnecessary terms of the more complicated explanation do not add to the predictive power).* *Why are expected events so important?* They are a measure of 1) explanatory power and 2) predictive power. The more predictive and the more explanatory a hypothesis is, the more likely the hypothesis is when compared to a competing hypothesis. Here are two case