Re: [agi] A question on the symbol-system hypothesis
Matt Mahoney wrote: I will try to answer several posts here. I said that the knowledge base of an AGI must be opaque because it has 10^9 bits of information, which is more than a person can comprehend. By opaque, I mean that you can't do any better by examining or modifying the internal representation than you could by examining or modifying the training data. For a text based AI with natural language ability, the 10^9 bits of training data would be about a gigabyte of text, about 1000 books. Of course you can sample it, add to it, edit it, search it, run various tests on it, and so on. What you can't do is read, write, or know all of it. There is no internal representation that you could convert it to that would allow you to do these things, because you still have 10^9 bits of information. It is a limitation of the human brain that it can't store more information than this.

Understanding 10^9 bits of information is not the same as storing 10^9 bits of information. A typical painting in the Louvre might be 1 meter on a side. At roughly 16 pixels per square millimeter and a perceivable color depth of about 20 bits, that would be about 10^8 bits. If an art specialist knew all about, say, 1000 paintings in the Louvre, that specialist would understand a total of about 10^11 bits. You might be inclined to say that not all of those bits count, that many are redundant to understanding. Exactly. People can easily comprehend 10^9 bits. It makes no sense to argue about degree of comprehension by quoting numbers of bits.

Richard Loosemore
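Loosemore's back-of-the-envelope numbers can be checked directly (a minimal sketch in Python; the resolution and color-depth figures are the perceptual limits assumed above, and the results land within half an order of magnitude of his 10^8 per painting):

```python
# Back-of-the-envelope check of the painting estimate above.
side_mm = 1000                       # a 1 m x 1 m painting
pixels = 16 * side_mm ** 2           # ~16 perceivable pixels per square millimeter
bits_per_pixel = 20                  # perceivable color depth
bits_per_painting = pixels * bits_per_pixel
print(f"{bits_per_painting:.1e} bits per painting")           # ~3.2e+08, order 10^8
print(f"{1000 * bits_per_painting:.1e} bits for 1000 works")  # ~3.2e+11, order 10^11
```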
Re: [agi] A question on the symbol-system hypothesis
Mark Waser wrote: Given sufficient time, anything should be able to be understood and debugged. Give me *one* counter-example to the above . . . .

Matt Mahoney replied: Google. You cannot predict the results of a search. It does not help that you have full access to the Internet. It would not help even if Google gave you full access to their server.

This is simply not correct. Google uses a single non-random algorithm against a database to determine what results it returns. As long as you don't update the database, the same query will return the exact same results and, with knowledge of the algorithm, looking at the database manually will also return the exact same results. Full access to the Internet is a red herring. Access to Google's database at the time of the query will give the exact precise answer. This is also exactly analogous to an AGI, since access to the AGI's internal state will explain the AGI's decision (with appropriate caveats for systems that deliberately introduce randomness -- i.e. when the probability is 60/40, the AGI flips a weighted coin -- but even in those cases, the answer will still be of the form that the AGI ended up with a 60% probability of X and 40% probability of Y and the weighted coin landed on the 40% side).

When we build AGI, we will understand it the way we understand Google. We know how a search engine works. We will understand how learning works. But we will not be able to predict or control what we build, even if we poke inside.

I agree with your first three statements but again, the fourth is simply not correct (as well as a blatant invitation to UFAI). Google currently exercises numerous forms of control over their search engine. It is known that they do successfully exclude sites (for visibly trying to game PageRank, etc.). They constantly tweak their algorithms to change/improve the behavior and results. Note also that there is a huge difference between saying that something is/can be exactly controlled (or able to be exactly predicted without knowing its exact internal state) and that something's behavior is bounded (i.e. that you can be sure that something *won't* happen -- like all of the air in a room suddenly deciding to occupy only half the room). No complex and immense system is precisely controlled but many complex and immense systems are easily bounded.
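Mark's determinism claim is easy to illustrate with a toy sketch (hypothetical data and scoring, nothing like Google's actual ranking): with a frozen index and a deterministic scoring function, the same query always returns the same results, and inspecting the index by hand reproduces them.

```python
# Toy sketch: a deterministic ranking over a frozen index is fully reproducible.
INDEX = {
    "doc1": "agi symbol system hypothesis",
    "doc2": "google search ranking algorithm",
    "doc3": "neural networks and language models",
}

def search(query, index):
    terms = query.lower().split()
    # Score = number of query terms contained in the document text;
    # ties broken by document id, so the ordering is completely deterministic.
    scored = [(sum(t in text for t in terms), doc) for doc, text in index.items()]
    return [doc for score, doc in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]

# Same frozen index + same query => same results, every single time.
assert search("search algorithm", INDEX) == search("search algorithm", INDEX)
print(search("search algorithm", INDEX))   # ['doc2']
```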
Re: [agi] A question on the symbol-system hypothesis
Matt, I would also note that you continue not to understand the difference between knowledge and data, and I contend that your 10^9 number is both entirely spurious and incorrect besides. I've read well over 1,000 books. I retain the vast majority of the *knowledge* in those books. I can't reproduce those books word for word from memory, but that's not what intelligence is about AT ALL.

It doesn't matter if you agree with the number 10^9 or not. Whatever the number, either the AGI stores less information than the brain, in which case it is not AGI, or it stores more, in which case you can't know everything it does.

Information storage also has absolutely nothing to do with AGI (other than the fact that there probably is a minimum below which AGI can't fit). I know that my brain has far more information than is necessary for AGI (so the first part of your last statement is wrong). Further, I don't need to store everything that you know -- particularly if I have access to outside resources. My brain doesn't store all of the information in a phone book, yet, effectively, I have total use of all of that information. Similarly, an AGI doesn't need to store 100% of the information that it uses. It simply needs to know where to find it upon need and how to use it.

- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 14, 2006 10:34 PM Subject: Re: [agi] A question on the symbol-system hypothesis

I will try to answer several posts here. I said that the knowledge base of an AGI must be opaque because it has 10^9 bits of information, which is more than a person can comprehend. By opaque, I mean that you can't do any better by examining or modifying the internal representation than you could by examining or modifying the training data. For a text based AI with natural language ability, the 10^9 bits of training data would be about a gigabyte of text, about 1000 books. Of course you can sample it, add to it, edit it, search it, run various tests on it, and so on. What you can't do is read, write, or know all of it. There is no internal representation that you could convert it to that would allow you to do these things, because you still have 10^9 bits of information. It is a limitation of the human brain that it can't store more information than this. It doesn't matter if you agree with the number 10^9 or not. Whatever the number, either the AGI stores less information than the brain, in which case it is not AGI, or it stores more, in which case you can't know everything it does.

Mark Waser wrote: I certainly don't buy the mystical approach that says that sufficiently large neural nets will come up with sufficiently complex discoveries that we can't understand them.

James Ratcliff wrote: Having looked at the neural network type AI algorithms, I don't see any fathomable way that that type of architecture could create a full AGI by itself.

Nobody has created an AGI yet. Currently the only working model of intelligence we have is based on neural networks. Just because we can't understand it doesn't mean it is wrong.

James Ratcliff wrote: Also it is a critical task for expert systems to explain why they are doing what they are doing, and for business applications, I for one am not going to blindly trust what the AI says, without a little background.

I expect this ability to be part of a natural language model. However, any explanation will be based on the language model, not the internal workings of the knowledge representation. That remains opaque.
For example: Q: Why did you turn left here? A: Because I need gas. There is no need to explain that there is an opening in the traffic, that you can see a place where you can turn left without going off the road, that the gas gauge reads E, and that you learned that turning the steering wheel counterclockwise makes the car turn left, even though all of this is part of the thought process. The language model is responsible for knowing that you already know this. There is no need either (or even the ability) to explain the sequence of neuron firings from your eyes to your arm muscles.

And this is one of the requirements for the Project Halo contest (it took and passed the AP chemistry exam): http://www.projecthalo.com/halotempl.asp?cid=30

This is a perfect example of why a transparent KR does not scale. The expert system described was coded from 70 pages of a chemistry textbook in 28 person-months. Assuming 1K bits per page, this is a rate of 4 minutes per bit, or 2500 times slower than transmitting the same knowledge as natural language.

Mark Waser wrote: Given sufficient time, anything should be able to be understood and debugged. ... Give me *one* counter-example to the above . . . .

Google. You cannot predict the results of a search. It does not help that you have full access to the Internet. It would not help even if Google gave you full access to their server.
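Mahoney's rate claim can be reproduced under plausible assumptions (a sketch; the 160-hour person-month and the ~10 bits per second natural-language rate are my assumptions, not figures he states):

```python
# Rough check of the knowledge-entry rate claimed for the Halo system.
pages = 70
bits_per_page = 1000                  # Mahoney's assumption
total_bits = pages * bits_per_page    # 70,000 bits of textbook knowledge

person_months = 28
minutes_per_month = 160 * 60          # assumed: one person-month = 160 working hours
total_minutes = person_months * minutes_per_month

minutes_per_bit = total_minutes / total_bits
print(f"{minutes_per_bit:.1f} minutes per bit")       # ~3.8, i.e. "4 minutes per bit"

# Compare with absorbing natural language at an assumed ~10 bits per second.
seconds_per_bit_reading = 1 / 10
slowdown = minutes_per_bit * 60 / seconds_per_bit_reading
print(f"~{slowdown:.0f}x slower than natural language")   # ~2300, i.e. "2500 times"
```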
Re: [agi] One grammar parser URL
1. No can do. The algorithmic complexity of parsing natural language as well as an average adult human is around 10^9 bits. There is no small grammar for English.

2. You need semantics to parse natural language. This is part of what makes it hard. Or do you want a parser that gives you wrong answers? I can do that.

3. If translating natural language to a structured representation is not hard, then do it. People have been working on this for 50 years without success. Doing logical inference is the easy part.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 8:59:45 AM Subject: Re: [agi] One grammar parser URL

Several things:

1. Someone suggested these parsers to me: Eugene Charniak's http://www.cog.brown.edu/Research/nlp/resources.html and Dan Bikel's http://www.cis.upenn.edu/~dbikel/software.html Demos for both are at: http://lfg-demo.computing.dcu.ie/lfgparser.html It seems that they are similar in function to the Stanford parser. I'd prefer smaller grammars and parsers with smaller memory footprints.

2. That "I ate pizza with {pepperoni|George|chopsticks}" yields the same parse should be expected. The difference between those sentences is in semantics, and the word "with" is overloaded with several meanings. The parser is only responsible for syntactic aspects.

3. Translating English sentences to Geniform or some other logical form may not be that hard, but after the translation we have to store the facts in a generic memory and use them for inference. For those, we need a canonical form, to organize the facts via clustering, and to keep track of which facts support other facts. All of these are big problems. I'm looking for someone to do the translating so I can work on inference and generic memory. It is easier for one person to focus on one task, such as translation, across several formats. Another can focus on inference for several formats, etc. Then we can help each other while still exploring different ideas. YKY
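YKY's point 2 is easy to demonstrate: a purely syntactic parser has no basis for distinguishing the three attachments. A minimal sketch using NLTK's chart parser with a toy grammar of my own invention (not one of the parsers linked above; assumes NLTK is installed):

```python
import nltk

# Toy grammar in which "with X" can attach to the VP (instrument reading)
# or to the object NP (modifier reading).
grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | VP PP
NP -> NP PP | 'I' | 'pizza' | 'pepperoni' | 'George' | 'chopsticks'
PP -> P NP
V -> 'ate'
P -> 'with'
""")
parser = nltk.ChartParser(grammar)

for obj in ("pepperoni", "George", "chopsticks"):
    tokens = f"I ate pizza with {obj}".split()
    # Syntax alone produces the same two attachment parses for every object;
    # only semantics can pick the topping, companion, or instrument reading.
    for tree in parser.parse(tokens):
        print(tree)
```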
Re: [agi] A question on the symbol-system hypothesis
Richard Loosemore [EMAIL PROTECTED] wrote: Understanding 10^9 bits of information is not the same as storing 10^9 bits of information.

That is true. Understanding n bits is the same as compressing some larger training set that has an algorithmic complexity of n bits. Once you have done this, you can use your probability model to make predictions about unseen data generated by the same (unknown) Turing machine as the training data. The closer to n you can compress, the better your predictions will be.

I am not sure what it means to understand a painting, but let's say that you understand art if you can identify the artists of paintings you haven't seen before with better accuracy than random guessing. The relevant quantity of information is not the number of pixels and resolution, which depend on the limits of the eye, but the (much smaller) number of features that the high level perceptual centers of the brain are capable of distinguishing and storing in memory. (Experiments by Standing and Landauer suggest it is a few bits per second for long term memory, the same rate as language.) Then you guess the shortest program that generates a list of feature-artist pairs consistent with your knowledge of art and use it to predict artists given new features.

My estimate of 10^9 bits for a language model is based on 4 lines of evidence, one of which is the amount of language you process in a lifetime. This is a rough estimate of course. I estimate 1 GB (8 x 10^9 bits) compressed to 1 bpc (Shannon) and assume you remember a significant fraction of that.

Landauer, T. K. (1986), "How much do people remember? Some estimates of the quantity of learned information in long-term memory", Cognitive Science 10, pp. 477-493.
Shannon, C. E. (1951), "Prediction and Entropy of Printed English", Bell System Technical Journal 30, pp. 50-64.
Standing, L. (1973), "Learning 10,000 Pictures", Quarterly Journal of Experimental Psychology 25, pp. 207-222.

-- Matt Mahoney, [EMAIL PROTECTED]
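Matt's operational test (identify the source of unseen data better than chance, using a compressed model) has a cheap, well-known approximation: classification by compression. A sketch using zlib as a stand-in for a real model (my illustration with invented "feature descriptions", not Matt's implementation):

```python
import zlib

def compressed_size(data):
    return len(zlib.compress(data, 9))

def classify(sample, corpora):
    # Attribute the sample to the reference corpus it compresses best
    # against: a small C(corpus + sample) - C(corpus) means a model built
    # from that corpus "predicts" the sample well.
    def extra_cost(corpus):
        return compressed_size(corpus + sample) - compressed_size(corpus)
    return min(corpora, key=lambda name: extra_cost(corpora[name]))

# Invented stand-in feature descriptions for two artists.
corpora = {
    "artist_A": b"short brush strokes, pastel palette, rural scenes " * 50,
    "artist_B": b"hard geometric forms, primary colors, urban motifs " * 50,
}
print(classify(b"pastel palette, rural scenes", corpora))   # artist_A
```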
Re: [agi] A question on the symbol-system hypothesis
Sorry if I did not make clear the distinction between knowing the learning algorithm for AGI (which we can do) and knowing what was learned (which we can't). My point about Google is to illustrate that distinction. The Google database is about 10^14 bits. (It keeps a copy of the searchable part of the Internet in RAM.) The algorithm is deterministic. You could, in principle, model the Google server in a more powerful machine and use it to predict the result of a search. But where does this get you? You can't predict the result of the simulation any more than you could predict the result of the query you are simulating. In practice the human brain has finite limits just like any other computer.

My point about AGI is that constructing an internal representation that allows debugging the learned knowledge is pointless. A more powerful AGI could do it, but you can't. You can't do any better than to manipulate the input and observe the output. If you tell your robot to do something and it sits in a corner instead, you can't do any better than to ask it why, hope for a sensible answer, and retrain it. Trying to debug the reasoning for its behavior would be like trying to understand why a driver made a left turn by examining the neural firing patterns in the driver's brain.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] A question on the symbol-system hypothesis
Matt Mahoney wrote: Richard Loosemore [EMAIL PROTECTED] wrote: Understanding 10^9 bits of information is not the same as storing 10^9 bits of information. That is true. Understanding n bits is the same as compressing some larger training set that has an algorithmic complexity of n bits. Once you have done this, you can use your probability model to make predictions about unseen data generated by the same (unknown) Turing machine as the training data. The closer to n you can compress, the better your predictions will be. I am not sure what it means to understand a painting, but let's say that you understand art if you can identify the artists of paintings you haven't seen before with better accuracy than random guessing. The relevant quantity of information is not the number of pixels and resolution, which depend on the limits of the eye, but the (much smaller) number of features that the high level perceptual centers of the brain are capable of distinguishing and storing in memory. (Experiments by Standing and Landauer suggest it is a few bits per second for long term memory, the same rate as language.) Then you guess the shortest program that generates a list of feature-artist pairs consistent with your knowledge of art and use it to predict artists given new features. My estimate of 10^9 bits for a language model is based on 4 lines of evidence, one of which is the amount of language you process in a lifetime. This is a rough estimate of course. I estimate 1 GB (8 x 10^9 bits) compressed to 1 bpc (Shannon) and assume you remember a significant fraction of that.

Matt, So long as you keep redefining understand to mean something trivial (or at least, something different in different circumstances), all you do is reinforce the point I was trying to make. In your definition of understanding in the context of art, above, you specifically choose an interpretation that enables you to pick a particular bit rate. But if I chose a different interpretation (and I certainly would - an art historian would never say they understood a painting just because they could tell the artist's style better than a random guess!), I might come up with a different bit rate. And if I chose a sufficiently subtle concept of understand, I would be unable to come up with *any* bit rate, because that concept of understand would not lend itself to any easy bit rate analysis. The lesson? Talking about bits and bit rates is completely pointless, which was my point.

You mainly identify the meaning of understand as a variant of the meaning of compress. I completely reject this - this is the most idiotic development in AI research since the early attempts to do natural language translation using word-by-word lookup tables - and I challenge you to say why anyone could justify reducing the term in such an extreme way. Why have you thrown out the real meaning of understand and substituted another meaning? What have we gained by dumbing the concept down? As I said previously, this is as crazy as redefining the complex concept of happiness to be a warm puppy.

Richard Loosemore
Re: [agi] A question on the symbol-system hypothesis
You're drifting off topic . . . . Let me remind you of the flow of the conversation.

You said: Models that are simple enough to debug are too simple to scale. The contents of a knowledge base for AGI will be beyond our ability to comprehend.

I said: Given sufficient time, anything should be able to be understood and debugged. Give me *one* counter-example to the above . . . .

You said: Google. You cannot predict the results of a search. and It would not help even if Google gave you full access to their server.

I said: This is simply not correct. Google uses a single non-random algorithm against a database to determine what results it returns. As long as you don't update the database, the same query will return the exact same results and, with knowledge of the algorithm, looking at the database manually will also return the exact same results.

You are now changing the argument from your quote ("You cannot predict the results of a search ... even if Google gave you full access to their server") to now say that you can't know what was learned (which I also believe is incorrect but will debate in the next e-mail). Are you conceding that you can predict the results of a Google search? Are you now conceding that it is not true that models that are simple enough to debug are too simple to scale? And, if the former but not the latter, would you care to attempt to offer another counter-example, or would you prefer to retract your initial statements?
Re: [agi] A question on the symbol-system hypothesis
Richard, what is your definition of understanding? How would you test whether a person understands art? Turing offered a behavioral test for intelligence. My understanding of understanding is that it is something that requires intelligence. The connection between intelligence and compression is not obvious. I have summarized the arguments here. http://cs.fit.edu/~mmahoney/compression/rationale.html

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] A question on the symbol-system hypothesis
"It keeps a copy of the searchable part of the Internet in RAM." Sometimes I wonder why I argue with you when you throw around statements like this that are this massively incorrect. Would you care to retract this?

"You could, in principle, model the Google server in a more powerful machine and use it to predict the result of a search." What is this model-the-Google-server BS? Google search results are a *rat-simple* database query. Building the database involves a much more sophisticated algorithm, but its results are *entirely* predictable if you know the order of the sites that are going to be imported. There is *NO* mystery or magic here. It is all eminently debuggable if you know the initial conditions.

"My point about AGI is that constructing an internal representation that allows debugging the learned knowledge is pointless." Huh? This is absolutely ridiculous. If the learned knowledge can't be debugged (either by you or by the AGI), then it's going to be *a lot* more difficult to unlearn/correct incorrect knowledge. How can that possibly be pointless? Not to mention the fact that teaching knowledge to others is much easier . . . .

"A more powerful AGI could do it, but you can't." Why can't I -- particularly if I were given infinite time (or even a moderately decent set of tools)?

"You can't do any better than to manipulate the input and observe the output." This is absolute and total BS, and the last two sentences in your e-mail (If you tell your robot to do something and it sits in a corner instead, you can't do any better than to ask it why, hope for a sensible answer, and retrain it. Trying to debug the reasoning for its behavior would be like trying to understand why a driver made a left turn by examining the neural firing patterns in the driver's brain.) are even worse. The human brain *is* relatively opaque in its operation, but there is no good reason that I know of why this is advantageous and *many* reasons why it is disadvantageous -- and I know of no reasons why opacity is required for intelligence.
Re: [agi] A question on the symbol-system hypothesis
The connection between intelligence and compression is not obvious.

The connection between intelligence and compression *is* obvious -- but compression, particularly lossless compression, is clearly *NOT* intelligence. Intelligence compresses knowledge to ever simpler rules because that is an effective way of dealing with the world. Discarding ineffective/unnecessary knowledge to make way for more effective/necessary knowledge is an effective way of dealing with the world. Blindly maintaining *all* knowledge at tremendous cost is *not* an effective way of dealing with the world (i.e. it is *not* intelligent).

1. What Hutter proved is that the optimal behavior of an agent is to guess that the environment is controlled by the shortest program that is consistent with all of the interaction observed so far. The problem of finding this program is known as AIXI.

2. The general problem is not computable [11], although Hutter proved that if we assume time bounds t and space bounds l on the environment, then this restricted problem, known as AIXItl, can be solved in O(t·2^l) time.

Very nice -- except that O(t·2^l) time is basically equivalent to incomputable for any real scenario. Hutter's proof is useless because it relies upon the assumption that you have adequate resources (i.e. time) to calculate AIXI -- which you *clearly* do not. And like any other proof, once you invalidate the assumptions, the proof becomes equally invalid. Except as an interesting but unobtainable edge case, why do you believe that Hutter has any relevance at all?
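To see why O(t·2^l) is "basically equivalent to incomputable", it is enough to plug in small numbers (the parameter values below are illustrative only):

```python
# How fast t * 2^l grows, for an illustrative time bound t and space bound l.
t = 10**6                        # illustrative environment time bound, in steps
for l in (20, 100, 1000):        # environment space bound, in bits
    ops = t * 2**l
    print(f"l = {l:>4}: about 10^{len(str(ops)) - 1} operations")
# l =   20: about 10^12  -- feasible
# l =  100: about 10^36  -- hopeless on any physical computer
# l = 1000: about 10^307 -- astronomically beyond physical computation
```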
Re: [agi] A question on the symbol-system hypothesis
Matt Mahoney wrote: Richard, what is your definition of understanding? How would you test whether a person understands art? Turing offered a behavioral test for intelligence. My understanding of understanding is that it is something that requires intelligence. The connection between intelligence and compression is not obvious. I have summarized the arguments here. http://cs.fit.edu/~mmahoney/compression/rationale.html

1) There will probably never be a compact definition of understanding. Nevertheless, it is possible for us (being understanding systems) to know some of its features. I could produce a shopping list of typical features of understanding, but that would not be the same as a definition, so I will not. See my paper in the forthcoming proceedings of the 2006 AGIRI workshop for arguments. (I will make a version of this available this week, after final revisions.)

3) One tiny, almost-too-obvious-to-be-worth-stating fact about understanding is that it compresses information in order to do its job.

4) To mistake this tiny little facet of understanding for the whole is to say that a hurricane IS rotation, rather than that rotation is a facet of what a hurricane is.

5) I have looked at your paper and my feelings are exactly the same as Mark's: theorems developed on erroneous assumptions are worthless.

Richard Loosemore
Re: [agi] A question on the symbol-system hypothesis
Mark Waser wrote: Are you conceding that you can predict the results of a Google search?

OK, you are right. You can type the same query twice. Or if you live long enough you can do it the hard way. But you won't.

Are you now conceding that it is not true that models that are simple enough to debug are too simple to scale?

OK, you are right again. Plain text is a simple way to represent knowledge. I can search and edit terabytes of it. But this is not the point I wanted to make. I am sure I expressed it badly. The point is there are two parts to AGI, a learning algorithm and a knowledge base. The learning algorithm has low complexity. You can debug it, meaning you can examine the internals to test it and verify it is working the way you want. The knowledge base has high complexity. You can't debug it. You can examine it and edit it but you can't verify its correctness. An AGI with a correct learning algorithm might still behave badly. You can't examine the knowledge base to find out why. You can't manipulate the knowledge base data to fix it. At least you can't do these things any better than manipulating the inputs and observing the outputs. The reason is that the knowledge base is too complex. In theory you could do these things if you lived long enough, but you won't. For practical purposes, the AGI knowledge base is a black box. You need to design your goals, learning algorithm, data set and test program with this in mind. Trying to build transparency into the data structure would be pointless. Information theory forbids it. Opacity is not advantageous or desirable. It is just unavoidable.

I am sure I won't convince you, so maybe you have a different explanation why 50 years of building structured knowledge bases has not worked, and what you think can be done about it?

And Google DOES keep the searchable part of the Internet in memory http://blog.topix.net/archives/11.html because they have enough hardware to do it. http://en.wikipedia.org/wiki/Supercomputer#Quasi-supercomputing

-- Matt Mahoney, [EMAIL PROTECTED]
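Matt's two-part split can be made concrete with a toy example (mine, not his): the learning algorithm below is a few verifiable lines, while the knowledge base it would produce from a gigabyte of text would hold millions of entries that no one could audit in full.

```python
from collections import defaultdict

# The "learning algorithm": a bigram counter. Small enough to verify by eye.
def train(tokens):
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

# The "knowledge base": on a large corpus this table would hold millions
# of entries -- each one readable, but no one could ever read all of them.
model = train("the cat sat on the mat the cat ran".split())
print(dict(model["the"]))   # {'cat': 2, 'mat': 1} -- learned next-word statistics
```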
Re: [agi] One grammar parser URL
3. If translating natural language to a structured representation is not hard, then do it. People have been working on this for 50 years without success. Doing logical inference is the easy part.

Actually, a more accurate statement would be: Doing individual logical inference steps is the easy part. Appropriately constructing useful long chains of inference steps is an unsolved problem, just as is mapping NLP into a structured logical representation. Hence the fact that nearly all automated theorem-provers are currently used in interactive mode, where the automated system does a few inference steps and then appeals to a human for help in search tree pruning (aka choosing what to do next), and then the automated system does a few more steps, etc.

If you buy Lakoff and Nunez's theory of the cognitive underpinning of mathematics (and logic) in everyday embodied physical experience, then it follows that these two problems (semantic interpretation and inference control) have a lot of overlap. If human logical inference is based on metaphors of embodied experience, then inference control in humans is largely based on metaphors of control processes carried out in choosing actions in the everyday life context. In this case, the common sense knowledge deficit experienced by AIs underlies the difficulty that AIs experience with both inference control and semantic interpretation.

Our approach in the Novamente project is to give our AI common sense knowledge via embedding it and interacting with it in the AGISim simulation world. This approach has yet to be proven, of course. However, it has not yet been as convincingly disproven as the Cyc-type approach of feeding an AI commonsense knowledge encoded in a formal language ;-)

-- Ben G
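Ben's distinction between easy individual steps and hard chain construction shows up even in a toy forward chainer (a sketch; the rules and facts are invented for illustration): each step is a trivial lookup, but nothing in the code says which of the combinatorially many derivable facts are worth deriving.

```python
# Toy forward chainer: each rule is (premise1, premise2) -> conclusion.
# A single inference step is a trivial membership test; deciding which of
# the exploding set of derivable facts to pursue is the unsolved control problem.
rules = {
    ("socrates_is_a_man", "men_are_mortal"): "socrates_is_mortal",
    ("socrates_is_mortal", "mortals_die"): "socrates_dies",
}
facts = {"socrates_is_a_man", "men_are_mortal", "mortals_die"}

changed = True
while changed:
    changed = False
    for (p1, p2), conclusion in rules.items():
        if p1 in facts and p2 in facts and conclusion not in facts:
            facts.add(conclusion)   # the "easy" individual step
            changed = True
print(sorted(facts))
```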
Re: [agi] A question on the symbol-system hypothesis
1. The fact that AIXI^tl is intractable is not relevant to the proof that compression = intelligence, any more than the fact that AIXI is not computable. In fact it is supporting evidence, because it says that both are hard problems, in agreement with observation.

2. Do not confuse the two compressions. AIXI proves that the optimal behavior of a goal seeking agent is to guess the shortest program consistent with its interaction with the environment so far. This is lossless compression. A typical implementation is to perform some pattern recognition on the inputs to identify features that are useful for prediction. We sometimes call this lossy compression because we are discarding irrelevant data. If we anthropomorphise the agent, then we say that we are replacing the input with perceptually indistinguishable data, which is what we typically do when we compress video or sound.

-- Matt Mahoney, [EMAIL PROTECTED]
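The two senses of compression Matt distinguishes can be put side by side in a few lines (a minimal sketch; zlib stands in for the lossless model and crude quantization for perceptual feature extraction):

```python
import zlib

signal = bytes(range(0, 200, 2))    # toy "sensory input": 100 distinct bytes

# Lossless compression, the AIXI sense: the input is exactly recoverable.
packed = zlib.compress(signal)
assert zlib.decompress(packed) == signal

# Lossy "compression", the perceptual sense: discard distinctions the agent
# cannot use (here, crude quantization stands in for feature extraction).
quantized = bytes((b // 16) * 16 for b in signal)
assert quantized != signal                           # information was discarded
print(len(packed), len(zlib.compress(quantized)))    # the lossy version packs smaller
```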
Re: [agi] A question on the symbol-system hypothesis
Richard Loosemore [EMAIL PROTECTED] wrote: 5) I have looked at your paper and my feelings are exactly the same as Mark's: theorems developed on erroneous assumptions are worthless.

Which assumptions are erroneous?

-- Matt Mahoney, [EMAIL PROTECTED]