Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
A common objection to compression as a test for AI is that humans can't do compression, so it has nothing to do with AI. The reason people can't compress is that compression requires both AI and deterministic computation. The human brain is not deterministic because it is made of neurons, which are noisy analog devices.

In order to compress text well, the compressor must be able to estimate probabilities over text strings, i.e. predict text. If you take a text fragment from a book or article and ask someone to guess the next character, most people could do so more accurately than any program now in existence. This is clearly an AI problem. It takes intelligence to predict text like 6+8=__ or roses are ___. If a data compressor had such knowledge, then it could assign the shortest codes to the most likely answers. Specifically, if the next symbol has probability p, it is assigned a code of length log2(1/p) bits. Such knowledge is useless for human compression because the decompressor must have exactly the same knowledge to generate the same codes. This requires deterministic computation, which is no problem for a machine. Therefore I believe that compression is a valid test for AI, and desirable because it is totally objective, unlike the Loebner prize.

Some known problems with the test:

- To pass the Turing test, a machine must have a model of interactive conversation. The Wikipedia training data lacks examples of dialogs. My argument is that noninteractive text is appropriate for tasks such as OCR, language translation, and broadcast speech recognition, which are more useful than a machine that deliberately makes mistakes and slows its responses to appear human. I think that the problems of learning interactive and noninteractive models have a lot of overlap.

- A language model is insufficient for problems with nontextual I/O such as vision or robotics that require symbols to be grounded. True, but a language model should be a part of such systems.
- We do not know how much compression is equivalent to AI. Shannon gave a very rough estimate of 0.6 to 1.3 bits per character entropy for written English in 1950. There has not been much progress since then in getting better numbers. The best compressors are near the high end of this range. (I did some research in 2000 to try to pin down a better number. http://cs.fit.edu/~mmahoney/dissertation/entropy1.html )

- It has not been shown that AI can learn from just a lot of text. I believe that lexical, semantic, and syntactic models can be learned from unlabeled text. Children seem to do this. I doubt that higher level problem solving abilities can be learned without some coaching from the programmer, but this is allowed.

- The Wikipedia text has a lot of nontext like hypertext links, tables, foreign words, XML, etc. True, but it is 75% text, so a better compressor still needs to compress text better.

Most of these issues were brought up in other newsgroups.
http://groups.google.com/group/comp.ai.nat-lang/browse_frm/thread/9411183ccde5f7a1/#
http://groups.google.com/group/comp.compression/browse_frm/thread/3f096aea993273cb/#
http://groups.google.com/group/Hutter-Prize?lnk=li

I also discuss them here.
http://cs.fit.edu/~mmahoney/compression/rationale.html

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Cc: Bruce J. Klein [EMAIL PROTECTED] Sent: Saturday, August 12, 2006 12:28:30 PM Subject: [agi] Marcus Hutter's lossless compression of human knowledge prize

Hi,

About the Hutter Prize (see the end of this email for a quote of the post I'm responding to, which was posted a week or two ago)...

While I have the utmost respect for Marcus Hutter's theoretical work on AGI, and I do think this prize is an interesting one, I also want to state that I don't think questing to win the Hutter Prize is a very effective path to follow if one's goal is to create pragmatic AGI.
Look at it this way: WinZip could compress that dataset further than *I* could, given a brief period of time to do it. Yet, who is more generally intelligent, me or WinZip?? And who understands the data better?

Or, consider my 9 year old daughter, who is a pretty bright girl but does not yet know how to write computer programs (she seems to have picked up some recessive genes for social well-adjustedness, and is not as geeky as her dad or big brothers...). Without a bunch of further education, she might NEVER be able to compress that dataset further than WinZip (whereas I could do it by writing a better compression algorithm than WinZip, given a bit of time). Yet I submit that she has significantly greater general intelligence than WinZip.

In short: compression as a measure of intelligence is only valid if you leave both processing time, and the contextuality of intelligence, out of the picture.

Similarly, my Novamente AI system is not made to perform rapid
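Mahoney's code-length argument earlier in the thread (a symbol with probability p gets a code of log2(1/p) bits) can be sketched in a few lines. The distribution below is invented for illustration; a real compressor would estimate it from context.

```python
import math

# Hypothetical next-character probabilities after the context "roses are ",
# as a predictive model might estimate them (the numbers are made up).
p = {"r": 0.60, "b": 0.15, "w": 0.10, "y": 0.05, "other": 0.10}

# Ideal (arithmetic-coding) code length for each symbol: log2(1/p) bits.
for sym, prob in sorted(p.items(), key=lambda kv: -kv[1]):
    print(f"{sym!r}: {math.log2(1 / prob):.2f} bits")

# The likeliest continuation gets the shortest code.
assert math.log2(1 / p["r"]) < math.log2(1 / p["b"])
```

Since the probabilities sum to 1, an arithmetic coder can realize these lengths to within a fraction of a bit over a whole file.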
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
First, the compression problem is not in NP. The general problem of encoding strings as the smallest programs that output them is undecidable.

Second, given a model, compression is the same as prediction. A model is a function that maps any string s to an estimated probability p(s). A compressor then maps s to a code of length log2(1/p(s)). The decompressor does the inverse mapping. The compressor and decompressor only need to agree on the model p(), and can then use identical algorithms to assign a mapping. (This step is deterministic, so not possible by humans.) Modeling is the same as prediction because

  p(s) = PROD_i p(s_i | s_1 s_2 ... s_{i-1})

which is the product of conditional probabilities over the next symbol s_i given all of the previous symbols s_1 through s_{i-1} in s.

Third, given a fixed test set, it would be trivial to write a decompressor that memorized it verbatim and compressed it to 0 bytes if we did not include the size of the decompressor in the contest. Instead you have to start with a small amount of knowledge coded into the decompressor and learn the rest from the data itself. This is a test of language learning ability.

It is easy to dismiss compression as unrelated to AGI. How do you test if a machine with only text I/O knows that roses are red? Suppose it sees "red roses", then later "roses are" and predicts "red". An LSA or distant-bigram model will do this.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Russell Wallace [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, August 12, 2006 5:30:21 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

On 8/12/06, Matt Mahoney [EMAIL PROTECTED] wrote: In order to compress text well, the compressor must be able to estimate probabilities over text strings, i.e. predict text.

Um no, the compressor doesn't need to predict anything - it has the entire file already at hand. The _de_compressor would benefit from being able to predict, e.g.
"roses are red" - the third word need not be sent if the decompressor can be relied on to know what it will be, given the first two. However, this is not permitted by the terms of the prize: the compressed file cannot depend on a knowledge base at the receiving end; it must run on a bare PC. Therefore the challenge is a purely mathematical one (in class NP, given a limit on decompression time), and not related to AGI. Even if the terms of the prize did allow a knowledge base at the receiving end (which would be problematic for a compression benchmark; it would be very difficult to make the test objective), it still wouldn't really be related to AGI. A good decompressor would know that the words "roses are" tend to be followed by the word "red" - but it would not know that the three words in sequence mean that roses are red.

To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/[EMAIL PROTECTED]
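The chain-rule decomposition above (modeling equals prediction) can be illustrated with a toy adaptive model. This is a sketch I'm adding, not code from the thread: the Laplace-smoothed order-0 byte model is far weaker than any competitive compressor, but its deterministic update is exactly what lets a decompressor mirror the compressor.

```python
import math
from collections import Counter

def code_length_bits(s: str) -> float:
    """Ideal compressed size of s, in bits, under a toy adaptive model.

    Uses the chain rule p(s) = PROD_i p(s_i | s_1 ... s_{i-1}), with each
    conditional estimated by Laplace-smoothed byte counts seen so far.
    The model update is deterministic, so a decompressor can mirror it.
    """
    counts = Counter()
    total_bits = 0.0
    for i, c in enumerate(s.encode()):
        p = (counts[c] + 1) / (i + 256)     # conditional probability of byte c
        total_bits += math.log2(1 / p)      # ideal code length for this byte
        counts[c] += 1
    return total_bits

# Predictable text gets a shorter code than random-looking text:
assert code_length_bits("ababababab") < code_length_bits("q7#xZ!mK2p")
```

A real compressor would turn these code lengths into actual bits with an arithmetic coder; the total is within two bits of sum(log2(1/p)).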
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
Hutter's only assumption about AIXI is that the environment can be simulated by a Turing machine.

With regard to forgetting, I think it plays a minor role in language modeling compared to vision and hearing. To model those, you need to understand what the brain filters out. Lossy compression formats like JPEG and MP3 exploit this by discarding what cannot be seen or heard. However, text doesn't work this way. How much can you discard from a text file before it differs noticeably?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, August 12, 2006 8:53:40 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

So you mean we should leave forgetting out of the picture, just because we don't know how to objectively measure it. Though objectiveness is indeed desired for almost all measurements, it is not the only requirement for a good measurement of intelligence. Someone can objectively measure a wrong property of a system. I haven't been convinced why lossless compression can be taken as an indicator of intelligence, except that it is objective and easy to check. You wrote on your website that "Hutter [21,22] proved that finding the optimal behavior of a rational agent is equivalent to compressing its observations.", but his proof is under certain assumptions about the agent and its environment. Do these assumptions hold for the human mind or AGI in general?

Pei

On 8/12/06, Matt Mahoney [EMAIL PROTECTED] wrote: Forgetting is an important function in human intelligence because the storage capacity of the brain is finite. This is a form of lossy compression, discarding the least important information. Unfortunately, lossy compression cannot be evaluated objectively. We can compare an image compressed with JPEG with an equal sized image compressed by discarding the low order bits of each pixel, and judge the JPEG image to be of higher quality.
JPEG uses a better model of the human visual system by discarding the same information that the human visual perception process does. It is more intelligent. Lossy image compression is a valid but subjective evaluation of models of human vision. There is no objective algorithm to test for image quality. It has to be done by humans.

A lossless image compression contest would not measure intelligence because you are modeling the physics of light and matter, not something that comes from humans. Also, the vast majority of information in a raw image is useless noise, which is not compressible. A good model of the compressible parts would have only a small effect. It is better to discard the noise. We are a long way from understanding vision.

Standing (1973) measured subjects' ability to memorize 10,000 pictures, viewed for 5 seconds each; in a recall test 2 days later he showed pictures and asked whether they were in the earlier set, which subjects answered correctly much of the time [1]. You could achieve the same result if you compressed each picture to about 30 bits and compared Hamming distances. This is a long term learning rate of 6 bits per second for images, or 2 x 10^9 bits over a lifetime, assuming we don't forget anything after 2 days. Likewise, Landauer [2] estimated human long term memory at 10^9 bits based on rates of learning and forgetting. It is also about how much information you can absorb as speech or writing in a lifetime, assuming 150 words per minute at 1 bpc entropy. It seems that the long term learning rate of the brain is independent of the medium. This is why I chose 1 GB of text for the benchmark.

Text compression measures intelligence because it models information that comes from the human brain, not an external source. Also, there is very little noise in text. If a paragraph can be rephrased in 1000 different ways without changing its meaning, it only adds 10 more bits to code which representation was chosen.
That is why lossless compression makes sense.

[1] Standing, L. (1973), Learning 10,000 Pictures, Quarterly Journal of Experimental Psychology (25) pp. 207-222.
[2] Landauer, Tom (1986), How much do people remember? Some estimates of the quantity of learned information in long term memory, Cognitive Science (10) pp. 477-493.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, August 12, 2006 4:03:55 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

To summarize and generalize data and to use the summary to predict the future is no doubt at the core of intelligence. However, I do not call this process compressing, because the result is not faultless, that is, there is information loss. It is not only because the human brains are noisy analog devices, but because the future
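The arithmetic behind the learning-rate estimates above can be checked in a few lines. All figures are the assumptions quoted in the post (~30 bits per remembered picture at 5 seconds each; 150 words per minute at roughly 1 bit per character, taking ~5 characters per word), not measurements of mine.

```python
# Long-term learning rates implied by the figures quoted above.
image_rate = 30 / 5                # bits/s: ~30 bits/picture, 5 s viewing each
speech_rate = 150 * 5 * 1 / 60     # bits/s: 150 wpm, ~5 chars/word, 1 bpc

print(f"image learning rate: {image_rate:.1f} bits/s")
print(f"speech/reading rate: {speech_rate:.1f} bits/s")

# The two rates are within an order of magnitude of each other, consistent
# with the claim that the learning rate is roughly independent of medium.
assert 0.1 < image_rate / speech_rate < 10
```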
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
I will try to answer several posts here.

First, I said that there is no knowledge that you can demonstrate verbally that cannot also be learned verbally. For simple cases, this is easy to show. If you test for knowledge X by asking question Q, expecting answer A, then you can train a machine "the answer to Q is A". I realize that for many practical cases there could be many questions testing X and you can't anticipate them all. In other words, X could be a procedure or algorithm for generating answers from an intractably large set of questions. For example, X could be the rules for addition or playing chess. In this case, you could train the machine by giving it the algorithm in the form of natural language text (here is how you play chess...).

Humans possess a lot of knowledge that cannot be demonstrated verbally. Examples: how to ride a bicycle, how to catch a ball, what a banana tastes like, what my face looks like. The English language is inadequate to convey such knowledge fully, although some partial knowledge transfer is possible (I have brown hair). Now try to think of questions to test for the parts of the knowledge that cannot be conveyed verbally. Sure, you could ask what color my hair is. Try to ask a question about knowledge that cannot be conveyed verbally to the machine at all. If you can't convey this knowledge to the machine, it can't convey it to you.

An important question is: how much information does a machine need to pass the Turing test? The machine only needs knowledge that can be verbally tested. Information theory says that this quantity cannot exceed the entropy of the training data plus the algorithmic complexity (length of the program) of the machine prior to training. From my argument above, all of the training data can be in the form of text. I estimate that the average adult has been exposed to about 1 GB of speech (transcribed) and writing since birth. This is why I chose 1 GB for the large text benchmark.
I do not claim that the Wikipedia data is the *right* text to train an AI system, but I think it is the right amount, and I believe that the algorithms we would use on the right training set would be very similar to the ones we would use on this training set.

Second, on lossy vs. lossless compression. It would be a good demonstration of AI if we could compress text using lossy techniques and uncompress to different text that had the same meaning. We can already do this at a simple level, e.g. swapping spaces and linefeeds, or substituting synonyms, or swapping the order of articles. We can't yet do this in the more conceptual way that humans could, but I think that a lossless model could demonstrate this capability. For example, an AI-level language model would recognize the similarity of "I ate a Big Mac" and "I ate at McDonalds" by compressing the concatenated pair of strings to a size only slightly larger than either string compressed by itself. This ability could then be used to generate conceptually similar strings (in O(n) time as I described earlier).

Third, on AIXI: this is a mathematically proven result, so there is no need to test it experimentally. The purpose of the Hutter prize is to encourage research in human intelligence with regard to verbally expressible knowledge, not the more general case. The general case is known to be undecidable, or at least intractable in environments controlled by a finite state machine. AIXI requires the assumption that the environment be computable by a Turing machine. I think this is reasonable. People actually do behave like rational agents. If they didn't, we would not have Occam's razor.

Here is an example: you draw 100 marbles from an urn. All of them are red. What do you predict will be the color of the next marble? Answer this way: what is the shortest program you could write that outputs 101 words, where the first 100 are "red"?

Fourth, a program that downloads the Wikipedia benchmark violates the rules of the prize.
The decompressor must run on a computer without a network connection. Rules are here:
http://cs.fit.edu/~mmahoney/compression/textrules.html

-- Matt Mahoney, [EMAIL PROTECTED]
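The "about 1 GB since birth" estimate above can be sanity-checked with back-of-envelope arithmetic. The exposure figures below (hours per day, years, bytes per word) are my own assumptions for illustration, not numbers from the thread.

```python
# Back-of-envelope check of the 1 GB lifetime-language estimate.
bytes_per_word = 6        # ~5 letters plus a space
words_per_minute = 150
hours_per_day = 2.5       # assumed average exposure to speech and text
years = 20

total = words_per_minute * 60 * hours_per_day * 365 * years * bytes_per_word
print(f"{total / 1e9:.1f} GB")   # on the order of 1 GB
```

With these assumptions the total comes out just under 10^9 bytes, which is consistent with the choice of a 1 GB benchmark.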
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
I can't disagree that an AGI with vision and motor control is more useful than one without. Also I agree that humans learn language by integrating a lot of nonverbal knowledge, that humans verbally demonstrate some knowledge learned nonverbally, and nonverbally demonstrate some knowledge learned verbally. Humans reason and solve problems using a model of a world that is learned both verbally and nonverbally. It would be difficult to construct such a model without nonverbal data.

In the last Loebner contest, one of the judges asked the machines, "Which is bigger, a 747 or my big toe?" None of the machines could answer. They lacked a good model of the real world. Cyc has a lot of hand-coded common sense knowledge and an inference engine, so it could probably answer such a question if it was phrased in CycL. However, Cyc lacks natural language ability and had no entry in the Loebner contest. It lacks a complete language model.

I think to get the training data for a good model of the real world as humans experience it, you need to build a humanoid robot so that it can experience all of the things that real people experience. Without this model, you could not pass the Turing test. Not that you couldn't build a world model by other means; it would just take enormous effort, like Cyc.

The goal of AI should not be to pass the Turing test. Who needs a machine that makes arithmetic mistakes and slows down its responses? What we need are machines that know enough natural language to do their jobs. An automated travel agent needs to know that a 747 is an airplane, and it needs to understand you when you say you want to change your reservation to leave next Tuesday. It does not need to know about toes.

I think the Hutter prize will lead to a better understanding of how we learn semantics and syntax. It will lead to language models that enable applications and your operating system to have a working natural language interface.
It will improve the accuracy of text scanning, handwriting recognition, speech recognition, and language translation. It will lead to better spam detection. It will automate a lot of work now done by people on phones. Language modeling is short of AGI, but I think it is an important goal.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, August 13, 2006 3:25:41 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

You've stated that any knowledge that can be demonstrated verbally CAN in principle be taught verbally. I don't agree that this is necessarily true for ANY learning system, but that's not the point I want to argue. My larger point is that this doesn't imply that this is how humans do it. So, if a human has learned a verbal behavior, and has been exposed to 1GB of text, it does not imply that said human has learned said behavior from said text. In fact there is much evidence that this is NOT the case -- this is what the whole literature on symbol grounding is about. Humans happen to learn a lot of their verbal behaviors based on non-verbal stimuli and actions.

But, this is not to say that some other AI system couldn't learn to IMITATE human verbal behaviors based only on studying human behaviors, of course.

IMO, focusing AI narrowly on text processing is a bad direction for near-term AGI research. I think that focusing on symbol grounding and perception/action/cognition integration is a better approach. But this better approach is not likely, in the immediate term, to be the best approach to excelling at the Hutter Prize task. Which gets back to my point that seeking to win the Hutter Prize is probably not a good guide for near-term AGI development.

-- Ben G

On 8/13/06, Matt Mahoney [EMAIL PROTECTED] wrote: I will try to answer several posts here.
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
Semantic learning from unlabeled text has already been demonstrated and used to improve both text compression (perplexity) and word error rates for speech recognition [1], and to pass the word analogy section of the SAT exams [2]. Semantic models exploit the fact that related words like moon and star tend to appear near each other, forming a fuzzy identity relation. Syntactic learning is possible from unlabeled text because words with the same grammatical role tend to appear in the same immediate context. For example, "the X is" tells you that X is a noun, allowing you to predict sequences like "a X was".

[1] Bellegarda, Jerome R., "Speech recognition experiments using multi-span statistical language models", IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 717-720, 1999.
[2] Turney, Peter D., Measuring Semantic Similarity by Latent Relational Analysis. In Proceedings Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 1136-1141, Edinburgh, Scotland, 2005.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, August 13, 2006 5:25:19 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

I think the Hutter prize will lead to a better understanding of how we learn semantics and syntax.

I have to disagree strongly. As long as you are requiring recreation at the bit level as opposed to the semantic or logical level, you aren't going to learn much at all about semantics or syntax (other than, possibly, relative frequency of various constructs which you can then use to *slightly* better optimize -- maybe well enough to win some money but not well enough to win enough to make it worthwhile, since it is a definite sidetrack from AGI).

Mark
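The co-occurrence idea above (related words like moon and star appearing near each other) can be shown with a toy counter. The three-sentence corpus is invented, and a sentence stands in for a proper context window; real systems use far larger corpora and measures like LSA or mutual information.

```python
from collections import Counter
from itertools import combinations

# Toy corpus: related words co-occur, unrelated words do not.
corpus = [
    "the moon and a star shone in the night sky",
    "a star near the moon was bright last night",
    "the recipe calls for flour and sugar",
]

window = Counter()
for sentence in corpus:
    words = sentence.split()
    for a, b in combinations(words, 2):       # all word pairs in the sentence
        window[frozenset((a, b))] += 1        # unordered co-occurrence count

# "moon" and "star" co-occur; "moon" and "sugar" never do.
assert window[frozenset(("moon", "star"))] > window[frozenset(("moon", "sugar"))]
```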
Re: Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
I've read Charniak's book, Statistical Language Learning. A lot of researchers in language modeling are using perplexity (compression ratio) to compare models. But there are some problems with the way this is done.

1. Many evaluations are done on corpora from the LDC which are not free, like TREC, WSJ, Brown, etc.

2. Many evaluations use offline models. They train on a portion of the data set and evaluate on the rest, or use leave-one-out, or maybe divide into 3 parts including a validation set. This makes it difficult to compare work by different researchers because there is no consistency in the details of these experiments.

3. The input is usually preprocessed in various ways. Normally, case is folded, the words are converted to tokens from a fixed vocabulary, and punctuation is removed. Again there is no consistency in the details, like the size of the vocabulary, whether to include numbers, etc. Also this filtering removes useful information, so it is difficult to evaluate the true perplexity of the model.

I think a good language model will need to combine many techniques in lexical modeling (vocabulary acquisition, stemming, recognizing multiword phrases and compound words, dealing with rare words, misspelled words, capitalization, punctuation and various nontext forms of junk), semantics (distant bigrams, LSA), and syntax (statistical parsers, hidden Markov models) in a uniform framework. Most work is usually in the form of a word trigram model plus one other technique on cleaned up text. Nobody has put all this stuff together. As a result, the best compressors still use byte-level n-gram statistics and at most some crude lexical parsing. I think we can do better.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: J. Storrs Hall, PhD.
[EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 9:37:32 AM Subject: Re: Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

On Tuesday 15 August 2006 00:55, Matt Mahoney wrote: ... To improve compression further, you will need to model semantics and/or syntax. No compressor currently does this.

Has anyone looked at the statistical parsers? There is a big subfield of computational linguistics doing exactly this, cf e.g. Charniak (down the page to statistical parsing) http://www.cs.brown.edu/%7Eec/

I would speculate, btw, that the decompressor should be a virtual machine for some powerful macro-expander (which are equivalent to the lambda calculus, ergo Turing machines) and the probabilistic regularities in the source be reflected in the encoding -- which would be implemented by the executable compressed file.

Josh
Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
I realize it is tempting to use lossy text compression as a test for AI because that is what the human brain does when we read text and recall it in paraphrased fashion. We remember the ideas and discard details about the expression of those ideas. A lossy text compressor that did the same thing would certainly demonstrate AI. But there are two problems with using lossy compression as a test of AI:

1. The test is subjective.
2. Lossy compression does not imply AI.

Let's assume we solve the subjectivity problem by having human judges evaluate whether the decompressed output is "close enough" to the input. We already do this with lossy image, audio and video compression (without much consensus).

The second problem remains: ideal lossy compression does not imply passing the Turing test. For lossless compression, it can be proven that it does. Let p(s) be the (unknown) probability that s will be the prefix of a text dialog. Then a machine that can compute p(s) exactly is able to generate response A to question Q with the distribution p(QA)/p(Q), which is indistinguishable from human. The same model minimizes the compressed size, E[log 1/p(s)].

This proof does not hold for lossy compression because different lossless models map to identical lossy models. The desired property of a lossy compressor C is that if and only if s1 and s2 have the same meaning (to most people), then the encodings C(s1) = C(s2). This code will ideally have length log 1/(p(s1)+p(s2)). But this does not imply that the decompressor knows p(s1) or p(s2). Thus, the decompressor may decompress to s1 or s2 or choose randomly between them.
In general, the output distribution will be different from the true distribution p(s1), p(s2), so it will be distinguishable from human even if the compression ratio is ideal.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 9:28:26 AM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

I don't see any point in this debate over lossless vs. lossy compression. Let's see if I can simplify it. The stated goal is compressing human knowledge. The exact same knowledge can always be expressed in a *VERY* large number of different bit strings. Not being able to reproduce the exact bit string is lossy compression when viewed from the bit viewpoint but can be lossless from the knowledge viewpoint. Therefore, reproducing the bit string is an additional requirement above and beyond the stated goal. I strongly believe that this additional requirement will necessitate a *VERY* large amount of additional work not necessary for the stated goal. In addition, by information theory, reproducing the exact bit string will require additional information beyond the knowledge contained in it (since numerous different strings can encode the same knowledge). Assuming optimal compression, also by information theory, additional information will add to the compressed size (i.e. lead to a less optimal result). So the question is "Given that bit-level reproduction is harder, not necessary for knowledge compression/intelligence, and doesn't allow for the same degree of compression, why make life tougher when it isn't necessary for your stated purposes and makes your results (i.e. compression) worse?"

- Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 12:55 AM Subject: Re: Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Where will the knowledge to compress text come from? There are 3 possibilities.

1.
externally supplied, like the lexical models (dictionaries) for paq8h and WinRK.
2. learned from the input in a separate pass, like xml-wrt|ppmonstr.
3. learned online in one pass, like paq8f and slim.

These all have the same effect on compressed size. In the first case, you increase the size of the decompressor. In the second, you have to append the model you learned from the first pass to the compressed file so it is available to the decompressor. In the third case, compression is poor at the beginning. From the viewpoint of information theory, there is no difference in these three approaches. The penalty is the same.

To improve compression further, you will need to model semantics and/or syntax. No compressor currently does this. I think the reason is that it is not worthwhile unless you have hundreds of megabytes of natural language text. In fact, only the top few compressors even have lexical models. All the rest are byte oriented n-gram models.

A semantic model would know what words are related, like "star" and "moon". It would learn this by their tendency to appear together. You can build a dictionary of such knowledge from the data set i
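The claim earlier in the thread that an exact model p(s) yields a dialog model via p(QA)/p(Q) can be sketched with a made-up toy distribution, where three short strings stand in for all possible dialogs (the strings and probabilities are invented for illustration).

```python
# Toy illustration of generating responses from a lossless model.
p = {
    "Q: 6+8? A: 14": 0.7,
    "Q: 6+8? A: 15": 0.1,
    "Q: hi A: hello": 0.2,
}

def p_prefix(prefix: str) -> float:
    """p(Q): total probability of all dialogs starting with the prefix."""
    return sum(prob for s, prob in p.items() if s.startswith(prefix))

q = "Q: 6+8? A:"
# Response distribution p(QA)/p(Q), conditioned on the question:
cond = {s: prob / p_prefix(q) for s, prob in p.items() if s.startswith(q)}
best = max(cond, key=cond.get)
assert best.endswith("14")   # the model's most likely answer
```

Sampling from cond instead of taking the argmax gives the indistinguishable-from-human response distribution the argument describes.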
Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
You could use Keogh's compression dissimilarity measure to test for inconsistency. http://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf

CDM(x,y) = C(xy)/(C(x)+C(y)), where x and y are strings, and C(x) means the compressed size of x (lossless). The measure ranges from about 0.5 if x = y to about 1.0 if x and y do not share any information. Then, CDM("it is hot", "it is very warm") < CDM("it is hot", "it is cold"), assuming your compressor uses a good language model.

Now if only we had some test to tell which compressors have the best language models...

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 3:22:10 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Could you please write a test program to objectively test for lossy text compression using your algorithm?

Writing the test program for the decompressing program is relatively easy. Since the requirement was that the decompressing program be able to recognize when a piece of knowledge is in the corpus, when its negation is in the corpus, when an incorrect substitution has been made, and when a correct substitution has been made -- all you/I would need to do is invent (or obtain -- see two paragraphs down) a reasonably sized set of knowledge pieces to test, put them in a file, feed them to the decompressing program, and automatically grade its answers as to which category each falls into. A reasonably small number of test cases should suffice as long as you don't advertise exactly which test cases are in the final test, but once you're having competitors generate each other's tests, you can go hog-wild with the number.

Writing the test program for the compressing program is also easy, but developing the master list of inconsistencies is going to be a real difficulty -- unless you use the various contenders themselves to generate various versions of the list.
I strongly doubt that most contenders will get false positives, but strongly suspect that finding all of the inconsistencies will be a major area for improvement as the systems become more sophisticated. Note also that minor modifications of any decompressing program should also be able to create test cases for your decompressor test. Simply ask it for a random sampling of knowledge, for the negations of a random sampling of knowledge, for some incorrect substitutions, and some hierarchical substitutions of each type. Any *real* contenders should be able to easily generate the tests for you.

You can start by listing all of the inconsistencies in Wikipedia.

See paragraph 2 above.

To make the test objective, you will either need a function to test whether two strings are inconsistent or not, or else you need to show that people will never disagree on this matter.

It is impossible to show that people will never disagree on a matter. On the other hand, a knowledge compressor is going to have to recognize when two pieces of knowledge conflict (i.e. when two strings parse into knowledge statements that cannot coexist). You can always have a contender evaluate whether a competitor's "inconsistencies" are incorrect and then do some examination by hand on a representative sample where the contender says it can't tell (since, again, I suspect you'll find few misidentified inconsistencies -- but that finding all of the inconsistencies will be ever subject to improvement).

Lossy compression does not imply AI. A lossy text compressor that did the same thing (recall it in paraphrased fashion) would certainly demonstrate AI.

I disagree that these are inconsistent. Demonstrating and implying are different things.

I didn't say that they were inconsistent. What I meant to say was that a decompressing program that is able to output all of the compressed file's knowledge in ordinary English would, in your words, "certainly demonstrate AI".
Given statement 1, it's not a problem that "lossy compression does not imply AI" since the decompressing program would still "certainly demonstrate AI".

----- Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 2:23 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Mark, could you please write a test program to objectively test for lossy text compression using your algorithm? You can start by listing all of the inconsistencies in Wikipedia. To make the test objective, you will either need a function to test whether two strings are inconsistent or not, or else you need to show that people will never disagree on this matter. Lossy compression does not imply AI. A lossy text compressor that did the same thing (recall it in paraphrased fashion) would certainly demonstrate AI.
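The CDM discussed in this thread is easy to try with an off-the-shelf compressor. The sketch below uses zlib as a crude stand-in for a real language model, so the absolute numbers are rough, but the ordering behaves as described: near 0.5 for identical strings, approaching 1.0 for unrelated ones.

```python
import zlib

def C(x: bytes) -> int:
    """Compressed size of x under a generic lossless model (zlib as a stand-in)."""
    return len(zlib.compress(x, 9))

def cdm(x: bytes, y: bytes) -> float:
    """Keogh's compression dissimilarity measure: C(xy) / (C(x) + C(y))."""
    return C(x + y) / (C(x) + C(y))

a = b"it is hot " * 20
b = b"quarterly earnings rose sharply " * 20
print(cdm(a, a))  # near 0.5: a string shares all of its information with itself
print(cdm(a, b))  # closer to 1.0: the two strings share little
```

A compressor with an actual language model would also pull CDM("it is hot", "it is very warm") below CDM("it is hot", "it is cold"); zlib, which only matches byte strings, cannot make that semantic distinction.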
Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
Mark wrote: Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.

I'm glad we finally agree :-)

You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

To answer Phil's post: Text compression is only near the theoretical limits for small files. For large files, there is progress to be made integrating known syntactic and semantic modeling techniques into general purpose compressors. The theoretical limit is about 1 bpc and we are not there yet. See the graph at http://cs.fit.edu/~mmahoney/dissertation/

The proof that I gave that a language model implies passing the Turing test is for the ideal case where all people share identical models. The ideal case is deterministic. For the real case where models differ, passing the test is easier because a judge will attribute some machine errors to normal human variation. I discuss this in more detail at http://cs.fit.edu/~mmahoney/compression/rationale.html (text compression is equivalent to AI).

It is really hard to get funding for text compression research (or AI). I had to change my dissertation topic to network security in 1999 because my advisor had funding for that. As a postdoc I applied for a $50K NSF grant for a text compression contest. It was rejected, so I started one without funding (which we now have).
The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 5:00:47 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that? The problem is that Keogh's measure is intended for data-mining where you have separate instances, not one big entwined Gordian knot.

Now if only we had some test to tell which compressors have the best language models...

Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.

----- Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 3:54 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

You could use Keogh's compression dissimilarity measure to test for inconsistency. http://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf CDM(x,y) = C(xy)/(C(x)+C(y)), where x and y are strings, and C(x) means the compressed size of x (lossless). The measure ranges from about 0.5 if x = y to about 1.0 if x and y do not share any information. Then, CDM("it is hot", "it is very warm") < CDM("it is hot", "it is cold"), assuming your compressor uses a good language model. Now if only we had some test to tell which compressors have the best language models...
-- Matt Mahoney, [EMAIL PROTECTED] To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/[EMAIL PROTECTED]
Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
If dumb models kill smart ones in text compression, then how do you know they are dumb? What is your objective test of "smart"? The fact is that in speech recognition research, language models with a lower perplexity also have lower word error rates.

We have "smart" statistical parsers that are 60% accurate when trained and tested on manually labeled text. So why haven't we solved the AI problem? Meanwhile, a "dumb" model like matching query words to document words enables Google to answer natural language queries, while our smart parsers choke when you misspell a word. Who is smart and who is dumb?

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, August 16, 2006 9:17:52 AM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

Very nice example of "homunculus"/"turtles-all-the-way-down" reasoning.

The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

I believe that it's related to AI . . . . but that the dumbest models kill intelligent models every time . . . . which then makes AI useless for text compression. And bit-level text storage and reproduction is unnecessary for AI (and adds a lot of needless complexity) . . . . So why are we combining the two?

----- Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 6:02 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Mark wrote: Huh?
By definition, the compressor with the best language model is the one with the highest compression ratio.

I'm glad we finally agree :-)

You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

To answer Phil's post: Text compression is only near the theoretical limits for small files. For large files, there is progress to be made integrating known syntactic and semantic modeling techniques into general purpose compressors. The theoretical limit is about 1 bpc and we are not there yet. See the graph at http://cs.fit.edu/~mmahoney/dissertation/

The proof that I gave that a language model implies passing the Turing test is for the ideal case where all people share identical models. The ideal case is deterministic. For the real case where models differ, passing the test is easier because a judge will attribute some machine errors to normal human variation. I discuss this in more detail at http://cs.fit.edu/~mmahoney/compression/rationale.html (text compression is equivalent to AI).

It is really hard to get funding for text compression research (or AI). I had to change my dissertation topic to network security in 1999 because my advisor had funding for that. As a postdoc I applied for a $50K NSF grant for a text compression contest. It was rejected, so I started one without funding (which we now have).
The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

-- Matt Mahoney, [EMAIL PROTECTED]
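Since perplexity keeps coming up in this thread: it is just the exponential of the per-token cross-entropy, so ranking models by perplexity and ranking them by compressed bits per token are the same thing (an ideal coder spends log2(1/p) bits on a symbol of probability p). A toy sketch with made-up probabilities:

```python
import math

def perplexity(probs):
    """Perplexity of a model that assigned probability p to each token it observed."""
    bits = -sum(math.log2(p) for p in probs) / len(probs)  # cross-entropy, bits/token
    return 2 ** bits

# hypothetical per-token probabilities assigned by a sharp and a vague model
sharp = [0.5, 0.25, 0.5, 0.25]
vague = [0.1, 0.1, 0.1, 0.1]
print(perplexity(sharp))  # about 2.83 = 2**1.5
print(perplexity(vague))  # about 10, i.e. 1/p for a uniform model
```

This is why the speech recognition community's perplexity benchmarks and a compression contest measure the same underlying quantity.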
Re: [agi] Lossy ** lossless compression
The argument for lossy vs. lossless compression as a test for AI seems to be motivated by the fact that humans use lossy compression to store memory, and cannot do lossless compression at all. The reason is that lossless compression requires the ability to do deterministic computation. Lossy compression does not. So this distinction is not important for machines.

The proof that an ideal language model implies passing the Turing test requires a lossless model. A lossy model has only partial knowledge of the distribution of strings in natural language dialogs. Without full knowledge, it is not possible to duplicate the same distribution of equivalent representations of the same idea, allowing such expressions to be recognized as not human, even if the compression is ideal. For example, a lossy compressor might compress all of the following to the same code: "it is hot", "it is quite warm", "it is 107 degrees", "the burning desert sun seared my skin", etc. This distribution of expressions of equivalent (or almost equivalent) ideas is not uniform. Humans recognize that some expressions are more common than others, but an ideal lossy compressor is unable to regenerate the same distribution. (If it could, it would be a lossless model.) It only needs to know the sum of the probabilities for ideal compression.

This example brings up another issue. Who is to say if two expressions represent the same idea? The problem itself requires AI.

The proper way to avoid coding equivalent representations in an objective way is to remove all noise (e.g. misspelled words, grammatical errors, arbitrary line breaks) from the data set and put it in a canonical form, so there can only be one way to represent the ideas within. This would remove any distinction between lossy and lossless compression. However, it would be a gargantuan task. It would take a lifetime to read 1 GB of text. But by using Wikipedia, most of this work has already been done.
There are very few spelling or grammar errors due to extensive review, and there is a rather uniform style. Line breaks only occur on paragraph boundaries.

Uncompressed video would be the absolute worst type of test data. Uncompressed video is about 10^8 to 10^9 bits per second. The human brain has a long term learning rate of around 10 bits per second. So all the rest is noise. How are you going to remove that prior to compression?

There are no objective functions to compare the quality of lossy decompression. For images, we have PSNR, which is the RMS error of the pixel differences between the original and reconstructed images. But this is a poor measure. For example, if I increased the brightness of all pixels by 1%, you would not see any difference. However, if I increased the brightness of just the top half of the image by 1%, then the PSNR would be reduced by 50% but there would be an obvious horizontal line across the image. Any test of lossy quality has to be subjective.

This is not to say that investigating how humans do lossy compression isn't an important field of study. I think it is essential to understanding how vision, hearing, and the other senses work and how that data is processed. We currently do not have good models to describe how humans decide what to remember and what to discard.

But the Hutter prize is to motivate better language models, not vision or hearing or robotics. For that task, I think lossless text compression is the right approach.

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: boris [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, August 19, 2006 10:25:58 PM Subject: [agi] Lossy ** lossless compression

It's been said that we have to go after lossless compression because there's no way to objectively measure the quality of lossy compression. That makes sense only in the context of dumb indiscriminate transforms conventionally used for compression.
If compression is produced by pattern recognition, we can quantify lossless compression of individual patterns, which is a perfectly objective criterion for selectively *losing* insufficiently compressed patterns. To make Hutter's prize meaningful it must be awarded for compression of the *best* patterns, rather than of the whole data set. And, of course, linguistic/semantic data is a lousy place to start; it's already been heavily compressed by "algorithms" unknown to any autonomous system. An uncompressed movie would be a far, far better data sample. Also, the real criterion of intelligence is prediction, which is a *projected* compression of future data. The difference is that current compression is time-symmetrical, while prediction obviously isn't.
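Matt's point about a lossy model knowing only the sum of probabilities can be made concrete. With invented probabilities for four paraphrases of one idea, an ideal lossless coder assigns each expression its own log2(1/p)-bit code, while a lossy coder that merges the whole equivalence class spends only log2(1/sum) bits — a shorter code, but one from which the within-class distribution can never be regenerated:

```python
import math

# invented probabilities for four expressions of roughly the same idea
p = {
    "it is hot": 0.050,
    "it is quite warm": 0.030,
    "it is 107 degrees": 0.010,
    "the burning desert sun seared my skin": 0.001,
}

# lossless: each expression gets its own ideal code of log2(1/p) bits
for s, pi in p.items():
    print(f"{math.log2(1 / pi):5.2f} bits  {s!r}")

# lossy: one code for the whole equivalence class, log2(1/sum) bits
total = sum(p.values())
print(f"{math.log2(1 / total):5.2f} bits  (one code for the whole class)")
```

The gap between the class code and the individual codes is exactly the information the lossy model discards, and exactly what it would need in order to imitate the human distribution of paraphrases.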
Re: [agi] Lossy ** lossless compression
Humans can do lossless compression but do it badly. Since human memory is inherently lossy, we must add error correcting information, which increases mental effort (storage cost, and thus learning time). Recalling text verbatim is harder than paraphrasing. It requires the mental equivalent of storing several identical copies.

Humans can also execute arbitrary algorithms, but not efficiently. So it is possible to do things like send Morse code, which compresses text by using shorter codes for the most common letters. But this is not making use of our built-in language model. The sender and receiver have to agree on a learned, predefined code (although the code is based on a crude model). Learning such codes requires extra effort (storage) so that the signal can be decoded without errors. Ironically, the receiver still uses his language model for error correction outside the scope of the Morse code decompression algorithm. If a signal is ambiguous as to whether a beep is a dot or a dash, the receiver can usually guess correctly by considering context. Machines can't do this. Decoding telegraph signals sent by humans is a hard problem for machines.

Now, one may interpret this as an argument that lossless compression is unrelated to AI and we should use lossy compression as a test instead. No, I am not arguing that. Humans make very good use of their imprecise language models for text prediction and error correction. Those are the qualities that we want to emulate in AI. A machine can make a model precise at no extra cost, enabling us to use text compression to objectively measure these qualities. Researchers in speech recognition have been using this approach for the last 15 years.

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: J. Andrew Rogers [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 22, 2006 12:58:04 AM Subject: Re: [agi] Lossy ** lossless compression

On Aug 20, 2006, at 11:15 AM, Matt Mahoney wrote: The argument for lossy vs.
lossless compression as a test for AI seems to be motivated by the fact that humans use lossy compression to store memory, and cannot do lossless compression at all. The reason is that lossless compression requires the ability to do deterministic computation. Lossy compression does not.

I think this needs to be qualified a bit more strictly in real (read: finite) cases. There is no evidence that humans are incapable of lossless compression, only that lossless compression is far from efficient and humans have resource bounds that generally encourage efficiency. A distinction with a difference. Being able to recite a text verbatim is a different process than reciting a summary of its semantic content, and humans can do both. Even a probabilistic (e.g. Bayesian) computational model can reinforce some patterns to the point where all references to that pattern will be perfect in all contexts over some finite interval. I expect it would be trivial to prove a decent probabilistic model has just such a property over any arbitrary finite interval for any given pattern with proper reinforcement.

I do not disagree that measurement of lossy models is a significant practical issue for the purposes of a contest. But on the other hand, lossless models demand certain levels of inefficiency that a useful intelligent system would not exhibit, and which impacts the solution space by how poorly these types of algorithms scale generally. If we knew an excellent lossless algorithm could fit within the resource constraints common today such that a lossy algorithm was irrelevant to the contest, I would expect a contest would be unnecessary. Which is not to say that I think the rules should be changed, just that this is quite relevant to the bigger question.

Cheers, J.
Andrew Rogers
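The Morse-code point in Matt's message — shorter codes for the most common letters — is exactly what Huffman coding automates. A minimal sketch, not tied to any compressor in the thread, that builds prefix codes from symbol frequencies:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build prefix codes; frequent symbols get shorter codes (the Morse idea)."""
    freq = Counter(text)
    # heap entries: (count, tiebreak, symbols-in-subtree)
    heap = [(n, i, [c]) for i, (c, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    codes = {c: "" for c in freq}
    i = len(heap)
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, syms2 = heapq.heappop(heap)
        for c in syms1:
            codes[c] = "0" + codes[c]       # extend codes as the subtrees merge
        for c in syms2:
            codes[c] = "1" + codes[c]
        heapq.heappush(heap, (n1 + n2, i, syms1 + syms2))
        i += 1
    return codes

codes = huffman_codes("eeeeeeeeettttaaz")
print(codes)  # 'e' is most frequent, so its code is shortest
```

Unlike Morse, whose code is fixed in advance, the code here is derived from the data — which is why, as Matt notes, sender and receiver must share the model deterministically to decode without errors.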
Re: [agi] Lossy ** lossless compression
As I stated earlier, the fact that there is normal variation in human language models makes it easier for a machine to pass the Turing test. However, a machine with a lossless model will still outperform one with a lossy model because the lossless model has more knowledge. I agree it is important to understand how the human brain filters information (lossy compression), especially vision and hearing. This does not change the fact that lossless compression is the right way to evaluate a language model. A lossy model cannot be evaluated objectively. I guess we will have to agree to disagree.

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Philip Goetz [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, August 25, 2006 12:31:06 PM Subject: Re: [agi] Lossy ** lossless compression

On 8/20/06, Matt Mahoney [EMAIL PROTECTED] wrote: The argument for lossy vs. lossless compression as a test for AI seems to be motivated by the fact that humans use lossy compression to store memory, and cannot do lossless compression at all. The reason is that lossless compression requires the ability to do deterministic computation. Lossy compression does not. So this distinction is not important for machines.

No; the main argument is that lossy compression allows the use of much, much more sophisticated, and much, much more powerful compression algorithms, achieving much higher compression ratios. Also, lossless compression is already nearly as good as it can be. Statistical methods will probably outperform intelligent methods on lossless compression, especially if the size of the compressor is included.

The proof that an ideal language model implies passing the Turing test requires a lossless model. A lossy model has only partial knowledge of the distribution of strings in natural language dialogs.
Without full knowledge, it is not possible to duplicate the same distribution of equivalent representations of the same idea, allowing such expressions to be recognized as not human, even if the compression is ideal.

By this argument, no human can pass the Turing test, since none of us have the same distributions, either. Or perhaps just one human can pass it. Presumably Turing. You will never, never, never, never recreate the same exact language model in a computer as resides in any particular human. Losslessness is relevant only when you need to recreate it exactly, and you can't, so it's irrelevant.
Re: [agi] Lossy ** lossless compression
----- Original Message ----- From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, August 25, 2006 5:58:02 PM Subject: Re: [agi] Lossy ** lossless compression

However, a machine with a lossless model will still outperform one with a lossy model because the lossless model has more knowledge.

PKZip has a lossless model. Are you claiming that it has more knowledge? More data/information *might* be arguable but certainly not knowledge -- and PKZip certainly can't use any knowledge that you claim that it has.

DEL has a lossy model, and nothing compresses smaller. Is it smarter than PKZip? Let me state one more time why a lossless model has more knowledge. If x and x' have the same meaning to a lossy compressor (they compress to identical codes), then the lossy model only knows p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can argue that if x and x' are not distinguishable then this extra knowledge is not important. But all text strings are distinguishable to humans.

But let me give an example of what we have already learned from lossless compression tests.

1. PKZip, bzip2, ppmd, etc. model text at the character (n-gram) level.
2. WinRK and paq8h model text at the lexical level using static dictionaries. They compress better than (1).
3. xml-wrt|ppmonstr and paq8hp1 model text at the lexical level using dictionaries learned from the input. They compress better than (2).

I think you can see the pattern. There has been research in semantic models using distant bigrams and LSA. These compress cleaned text (restricted vocabulary, no punctuation) better than models without these capabilities, as measured by word perplexity. Currently there are no general purpose compressors that model syntax or semantics, probably because such models are only useful on large text corpora, not the kind of files people normally compress. I think that will change if there is a financial incentive.
This does not change the fact that lossless compression is the right way to evaluate a language model.

. . . . in *your* opinion. I might argue that it is the *easiest* way to evaluate a language model but certainly NOT the best -- and I would then argue, therefore, not the right way either.

Also in the opinion of speech recognition researchers studying language models since the early 1990's.

A lossy model cannot be evaluated objectively.

Bullsh*t. I've given you several examples of how. You've discarded them because you felt that they were too difficult and/or you didn't understand them.

Deciding if a lossy decompression is close enough is an AI problem, or it requires subjective judging by humans. Look at benchmarks for video or audio codecs. Which sounds better, AAC or Ogg?

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Lossy ** lossless compression
First let me respond to Boris and Mark. I agree. Mark suggested putting Wikipedia in a canonical form, which would remove the distinction between lossless and lossy compression. This will be hard, but Boris made an important observation that useful data is generally compressible and useless data (noise) is not. I don't think the problem can be solved completely, but there is clearly room for improvement.

Eliezer suggests putting a model of the universe on a USB drive and then running the model to predict how many fingers he is holding up. Let's assume that is possible. Stephen Wolfram suggests the model, if one exists, might only be a few lines of code. http://en.wikipedia.org/wiki/A_New_Kind_of_Science But we must solve a few other problems first.

1. It may be hard to find such a model. We cannot tell whether the apparent randomness of quantum mechanics is truly random or generated by a deterministic, but random-appearing, process. This happens in cryptography. The only way to distinguish between true random data and an encrypted block of zero bits is to break the encryption. The former is not compressible, the latter is.

2. Assuming we solve this mystery of the universe and it turns out to be deterministic, we still have the problem of running the code on a computer that resides within the universe. If the universe is infinite, then it is possible because one Turing machine can simulate another. If the universe is finite (as quantum theory and the Big Bang suggest, also the lack of real Turing machines), then it is not possible because a state machine cannot simulate itself. Having the USB drive simulate all of the universe except itself would resolve this problem, but then if the USB drive resides outside the universe, how do we read the result?

3. Assuming we overcome this obstacle, it may be that the program will say how many fingers, but in that case the program also completely determines my behavior and might not allow me to answer.
-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Eliezer S. Yudkowsky [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, August 25, 2006 8:08:02 PM Subject: Re: [agi] Lossy ** lossless compression

Matt Mahoney wrote: DEL has a lossy model, and nothing compresses smaller. Is it smarter than PKZip? Let me state one more time why a lossless model has more knowledge. If x and x' have the same meaning to a lossy compressor (they compress to identical codes), then the lossy model only knows p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can argue that if x and x' are not distinguishable then this extra knowledge is not important. But all text strings are distinguishable to humans.

Suppose I give you a USB drive that contains a lossless model of the entire universe excluding the USB drive - a bitwise copy of all quark positions and field strengths. (Because deep in your heart, you know that underneath the atoms, underneath the quarks, at the uttermost bottom of reality, are tiny little XML files...) Let's say that you've got the entire database, and a Python interpreter that can process it at any finite speed you care to specify. Now write a program that looks at those endless fields of numbers, and says how many fingers I'm holding up behind my back. Looks like you'll have to compress that data first.

-- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
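Boris's observation, endorsed above, that useful data is generally compressible while noise is not, is easy to demonstrate with any lossless compressor (with the cryptography caveat from point 1: an encrypted block would also look incompressible). A sketch using zlib:

```python
import os
import zlib

def compressibility(data: bytes) -> float:
    """Compressed/original size ratio; near (or above) 1.0 means noise-like."""
    return len(zlib.compress(data, 9)) / len(data)

structured = b"the quick brown fox jumps over the lazy dog " * 100
noise = os.urandom(len(structured))  # stands in for true randomness

print(compressibility(structured))  # far below 1.0: redundant, useful structure
print(compressibility(noise))       # about 1.0: nothing for the model to exploit
```

This ratio is the crude version of the test that separates Wikipedia text (highly compressible) from uncompressed video noise in Matt's earlier argument.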
Re: [agi] Lossy ** lossless compression
I think that either putting Wikipedia in canonical form, or recognizing that it is in canonical form, are two equally difficult problems. So the problem does not go away easily.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: Mark Waser [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Saturday, August 26, 2006 4:51:07 PM
Subject: Re: [agi] Lossy ** lossless compression

 Mark suggested putting Wikipedia in a canonical form, which would remove the distinction between lossless and lossy compression.

Hmmm. Interesting . . . . Actually, I didn't suggest exactly that -- though I can see how you got that impression. I suggested that the decompression program should output the Wikipedia in canonical form, meaning that it would be lossy as far as information is concerned (i.e. it loses the exact bit sequence of the input) but it would be lossless as far as knowledge is concerned. Putting the Wikipedia in a canonical form (or -- developing a good canonical form to put the Wikipedia into) strikes me as the largest part of the challenge (and thus, not something that you want to -- or should -- take on as contest organizers).
Re: [agi] Lossy ** lossless compression
Suppose I claim that text8.zip, available at http://cs.fit.edu/~mmahoney/compression/textdata.html, is in canonical form. The procedure and a program for generating it are described at the bottom of that page. The output consists of only the lowercase letters a-z and spaces. If you claim that this is not in canonical form, then prove it. Specify a criterion for canonical form, a pass/fail test. I want an algorithm or a program, no hand waving or generalities. Input an arbitrary string, output yes or no. Do you see my point now?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: Mark Waser [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Saturday, August 26, 2006 8:52:27 PM
Subject: Re: [agi] Lossy ** lossless compression

 I think that either putting Wikipedia in canonical form, or recognizing that it is in canonical form, are two equally difficult problems. So the problem does not go away easily.

Um. I think you missed my point. The compression program should be able to take the Wikipedia in its current form and the decompression program should be able to output it in canonical form. Make the contestants do all the difficult work, not the judges. (And recognizing canonical form should be easy; ensuring its completeness is likely to be a real problem, but that's what you have the other contestants for . . . . :-)
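The a-z-and-spaces output format can be sketched in a few lines. This is only an illustration of that output format, not the actual program described on the textdata page (which also handles digits, wiki markup, and more):

```python
import re

def normalize(text: str) -> str:
    """Reduce text to lowercase letters a-z separated by single spaces.

    A simplified sketch of the output format described above; the real
    preprocessing script on the textdata page does considerably more
    (digit spelling, markup removal, etc.).
    """
    text = text.lower()
    text = re.sub(r"[^a-z]+", " ", text)  # every run of non-letters becomes one space
    return text.strip()

print(normalize("Hello, World!  123"))  # -> "hello world"
```

Note that this mapping is many-to-one: distinct inputs can produce the same output, which is exactly why whether such a form counts as "canonical" is the point under dispute.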
Re: [agi] Lossy ** lossless compression
Mark, I didn't get your attachment, the program that tells me if an arbitrary text string is in canonical form or not. Actually, if it will make it any easier, I really only need to know if a string is a canonical representation of Wikipedia.

Oh, wait... there can only be one canonical form. I guess then all you have to do is store the canonical form and compare the input with it.

After you solve this simple, easy problem and send me the program, I will solve the much harder problem of converting Wikipedia to canonical form.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: Mark Waser [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Sunday, August 27, 2006 11:30:44 AM
Subject: Re: [agi] Lossy ** lossless compression

 Suppose I claim that text8.zip available at http://cs.fit.edu/~mmahoney/compression/textdata.html is in canonical form.

I reject your nonsensical claim.

 If you claim that this is not in canonical form, then prove it. Specify a criterion for canonical form, a pass/fail test.

By definition, a canonical form should not have duplication. Your data has massive duplication (particularly when looked at on the knowledge level) and is therefore not canonical. Simple enough for you?

 Do you see my point now?

No, all I see is that you're so invested in lossless (at the bit-level) compression that you're not even willing to try to work to get past it.
Re: [agi] Lossy ** lossless compression
In showing that compression implies AI, I first make the simplifying assumption that everyone shares the same language model. Then I relax that assumption and argue that this makes it easier for a machine to pass the Turing test.

But I see your point. I argued that a lossless model knows everything that a lossy model does, plus more, because the lossless model knows p(x) and p(x'), while a lossy model only knows p(x) + p(x'). However, I missed that the lossy model knows that x and x' are equivalent, while the lossless model does not. However, I think that a lossless model can reasonably derive this information by observing that p(x, x') is approximately equal to p(x) or p(x'). In other words, knowing both x and x' does not tell you any more than x or x' alone, or CDM(x, x') ~ 0.5. I think this is a reasonable way to model lossy behavior in humans.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: Philip Goetz [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Sunday, August 27, 2006 9:23:25 PM
Subject: Re: [agi] Lossy ** lossless compression

On 8/25/06, Matt Mahoney [EMAIL PROTECTED] wrote:
 As I stated earlier, the fact that there is normal variation in human language models makes it easier for a machine to pass the Turing test. However, a machine with a lossless model will still outperform one with a lossy model because the lossless model has more knowledge.

That would be true only if there were one correct language model, AND you knew what it was. Besides which, every human has a lossy model. It seems to me that by your argument, a machine with a lossless model would out-perform a human, and thus /fail/ the Turing test.

- Phil
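The message does not spell out the CDM formula, so as an assumption take the compression-based dissimilarity measure C(xy) / (C(x) + C(y)), which approaches 0.5 when two strings carry the same information and 1.0 when they are unrelated. A sketch using zlib (a byte-level compressor only sees literal overlap, so this illustrates the mechanics rather than semantic equivalence):

```python
import zlib

def cdm(x: bytes, y: bytes) -> float:
    """Compression dissimilarity measure: C(xy) / (C(x) + C(y)).

    Near 0.5 when y adds no information beyond x (compressing the
    concatenation costs little more than compressing x alone);
    near 1.0 when the two strings are unrelated.
    """
    comp = lambda s: len(zlib.compress(s, 9))
    return comp(x + y) / (comp(x) + comp(y))

a = b"jim is extremely fat " * 100
b = b"james continues to be morbidly obese " * 60

print(cdm(a, a) < 0.7)        # duplicated information: ratio near 0.5
print(cdm(a, a) < cdm(a, b))  # literally different strings score as more dissimilar
```

A semantically adequate model would also score the two paraphrases above near 0.5; zlib cannot, which is exactly the gap between current compressors and the knowledge-level equivalence under discussion.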
Re: [agi] Lossy ** lossless compression
On 8/28/06, Mark Waser wrote:
 How does a lossless model observe that "Jim is extremely fat" and "James continues to be morbidly obese" are approximately equal?

I realize this is far beyond the capabilities of current data compression programs, which typically predict the next byte in the context of the last few bytes using learned statistics. Of course we must do better. The model has to either know, or be able to learn, the relationships between "Jim" and "James", "is" and "continues to be", "fat" and "obese", etc. I think a 1 GB corpus is big enough to learn most of this knowledge using statistical methods.

C:\res\data\wiki>grep -c . enwik9
File enwik9: 10920493 lines match
grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c fat enwik9
File enwik9: 1312 lines match
grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c obese enwik9
File enwik9: 111 lines match
grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i obese enwik9 |grep -c fat
File STDIN: 14 lines match

So we know that "obese" occurs in about 0.001% of all paragraphs, but in 1% of paragraphs containing "fat". This is an example of a distant bigram model, which has been shown to improve word perplexity in offline models [1]. We can improve on this method using e.g. latent semantic analysis [2] to exploit the transitive property of semantics: if A appears near (means) B and B appears near C, then A predicts C. Likewise, syntax is learnable. For example, if you encounter "the X is" you know that X is a noun, so you can predict "a X was" or "Xs" rather than "he X" or "Xed". This type of knowledge can be exploited using similarity modeling [3] to improve word perplexity. (Thanks to Rob Freeman for pointing me to this.)

Let me give one more example using the same learning mechanism by which syntax is learned:

All men are mortal. Socrates is a man. Therefore Socrates is mortal.
All insects have 6 legs. Ants are insects. Therefore ants have 6 legs.
Now predict: All frogs are green. Kermit is a frog. Therefore...

[1] Rosenfeld, Ronald, A Maximum Entropy Approach to Adaptive Statistical Language Modeling, Computer Speech and Language, 10, 1996.
[2] Bellegarda, Jerome R., Speech recognition experiments using multi-span statistical language models, IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 717-720, 1999.
[3] Ido Dagan, Lillian Lee, Fernando C. N. Pereira, Similarity-Based Models of Word Cooccurrence Probabilities, Machine Learning, 1999. http://citeseer.ist.psu.edu/dagan99similaritybased.html

-- Matt Mahoney, [EMAIL PROTECTED]
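The grep counts above amount to estimating P(obese) and P(obese | fat) from paragraph-level co-occurrence. A toy sketch of the same computation (the eight "documents" below are invented; the real experiment used enwik9):

```python
# Toy paragraph collection; the documents are made up for illustration only.
docs = [
    "jim is extremely fat",
    "the fat cat is obese",
    "a thin dog runs",
    "birds fly south",
    "the sun is bright",
    "rain falls on the hills",
    "fat and obese mean much the same",
    "a quiet library",
]

def contains(doc, word):
    return word in doc.split()

n = len(docs)

# Unconditional probability that a document mentions "obese".
p_obese = sum(contains(d, "obese") for d in docs) / n

# Conditional probability, restricted to documents mentioning "fat".
fat_docs = [d for d in docs if contains(d, "fat")]
p_obese_given_fat = sum(contains(d, "obese") for d in fat_docs) / len(fat_docs)

print(p_obese)            # 2/8 = 0.25
print(p_obese_given_fat)  # 2/3: seeing "fat" raises the probability of "obese"
```

The jump from the unconditional to the conditional probability is the signal a distant bigram model exploits.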
Re: [agi] Vision
I would like to make some very general comments on AGI. Marcus Hutter's AIXI shows that in a very general sense, the optimal behavior of a goal seeking agent at each point in time is to guess that the environment is simulated by the shortest Turing machine consistent with all observations so far. This is not a solution to AGI because the problem is not computable, and it is intractable even in the restricted domain of space and time bounded environments. Rather, AIXI is a unifying framework for all machine learning algorithms, such as neural networks, SVM, Bayes, decision trees, GA, clustering, whatever. The implied goal of each is to find the simplest hypothesis that fits the data. More formally, they all use tractable algorithms to search a subset of short Turing machines for those consistent with the training data.

Let us consider the subset of learning problems that are important to humans (vision, language, robotics, etc). Ben Goertzel made an important observation, that AGI = pattern recognition + goals. These correspond to the two learning mechanisms in animal brains, classical conditioning + operant conditioning. AGI is unsolved, so we cannot yet say which method should be used. But I should comment that the one working model we do have (the human brain) is best modeled by neural networks. I believe the reason that neural networks have so far failed to solve the problem is lack of computing power (10^13 connections, 10^14 connections/second) and lack of corresponding training data (several years of video, audio, etc). Google has this much computing power and data. It can already answer simple natural language queries like "how many days until Xmas?".

There have been lots of smart people working on AI for the last 50 years. By 1960 we had already seen language translation, handwriting recognition, natural language query answering in restricted domains, chess playing, automatic theorem proving, etc.
If there was a shortcut to the general problem that didn't require massive computing power, I think we would have found it by now.

On 9/4/06, YKY (Yan King Yin) [EMAIL PROTECTED] wrote:
 I think the essence of Hawkins' theory (his HTM [hierarchical temporal memory] model) is the compression of sensory experience via pattern recognition. Sensory experience goes in; condensed episodic memory comes out. He does this with neural networks. I worked with NNs for a while along exactly the same line of thought. After a while I just decided that NN is too difficult to work with, so I switched to (predicate) logic as the substrate for pattern recognition. Take an example: John hits Mary. Mary kicks John. Mary kicks John again. John hits Mary again. Etc, etc. The point is to recognize that John and Mary are fighting, thus achieving compression. The fighting pattern can be irregular, consisting of X hit Y, Y kick X, etc. With logic I can write down a rule for recognizing this pretty easily, mainly due to the use of symbolic variables. So you see the compressive power of logic. NN is just too clumsy to work with. Although we know that the brain somehow must perform this information compression with neurons, we just don't understand the mechanisms yet. Let's say the goal is to compress visual inputs to the "John hits Mary" level. I think it can be done using my vision scheme plus a logical knowledge representation. But with NN, this still seems very very remote.

There is a statistical language modeling solution to this problem. Counting Google hits:

  the: 25,250,000,000 (at least this many English web pages)
  fight: 603,000,000
  hit kick: 49,700,000
  hit kick fight: 18,500,000

So "fight" occurs on about 2.3% of all web pages, but 37% of web pages containing "hit" and "kick". In fact, you could get similar numbers if your training sample had 1/1,000,000 as many pages.
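The arithmetic behind those percentages, using the figures quoted above (2006-era page counts, so treat them as illustrative):

```python
# Page counts quoted in the message above.
total_pages    = 25_250_000_000   # pages containing "the"
fight          =    603_000_000
hit_kick       =     49_700_000
hit_kick_fight =     18_500_000

# Fraction of all pages mentioning "fight": a few percent.
p_fight = fight / total_pages

# Fraction of "hit kick" pages that also mention "fight": much higher.
p_fight_given_hit_kick = hit_kick_fight / hit_kick

print(round(100 * p_fight_given_hit_kick))  # 37
```

The order-of-magnitude jump from the unconditional to the conditional frequency is what licenses the inference that hitting and kicking suggest fighting.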
But to use a smaller training set than that, you would have to use techniques like LSA to exploit the transitive property of semantics. If there are no documents containing both "hit" and "fight", then you could still infer the relationship from documents containing both "hit" and "punch" plus documents containing both "punch" and "fight".

LSA (latent semantic analysis) is described as the factoring of a word-document matrix into 3 matrices by SVD (singular value decomposition), where the middle matrix is diagonal, then discarding all but a few hundred of the largest diagonal terms. This greatly reduces the storage requirement (i.e. a simpler model). Furthermore, the SVD is equivalent to a 3 layer linear neural network with the layers representing words, an abstract semantic space, and documents. Not that SVD is fast...

-- Matt Mahoney, [EMAIL PROTECTED]
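The factoring described above can be sketched with numpy on a tiny example. The word-document counts are invented; the point is only the mechanics: factor the matrix, keep the k largest singular values, and compare words in the reduced semantic space:

```python
import numpy as np

# Tiny word-document count matrix (rows = words, columns = documents).
# The counts are made up for illustration.
words = ["hit", "kick", "punch", "fight", "green"]
X = np.array([
    [2, 1, 0, 0],   # hit
    [1, 2, 0, 0],   # kick
    [0, 1, 1, 0],   # punch
    [1, 1, 2, 0],   # fight
    [0, 0, 0, 3],   # green
], dtype=float)

# Factor X = U S Vt, then keep only the k largest singular values,
# as in the LSA description above.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Word vectors in the reduced semantic space (rows of U scaled by S).
emb = U[:, :k] * S[:k]
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "hit" should be closer to "fight" than to "green" in the reduced space.
print(cos(emb[0], emb[3]) > cos(emb[0], emb[4]))
```

Words that never co-occur directly can still end up close in the reduced space through shared neighbors, which is the transitive property being exploited.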
Re: [agi] G0 theory completed
The issue of control over an AGI was discussed in the singularity mailing list. The question was whether it is possible to guarantee that an AGI will be friendly. It was hotly debated with no consensus.

My position is that once you make machines that are smarter than humans, and they do the same, you cannot guarantee anything. This limitation is fundamental, in the same way that you cannot predict if a Turing machine will halt. I cited two papers by Hutter and Legg to support this. Hutter's paper on AIXI proves that the optimal behavior of a rational agent (as a Turing machine) with the goal of maximizing the accumulated reward signal from an unknown interactive environment is to guess that the environment is simulated by the shortest Turing machine consistent with past observation. Legg's paper on the limits of learnability proves that the shortest Turing machine capable of learning to predict the output of another machine of Kolmogorov complexity n is between n and n + log n.

Taken together, the papers explain a lot about the nature of uncertainty in a deterministic universe (as Einstein asserted, in spite of quantum mechanics). Hutter's proof requires the assumption that the universe be computable by a Turing machine. I think his paper (which essentially proves Occam's Razor) would not be so compelling if the universe were not in fact computable, or a simulation. The source of uncertainty is therefore due to the universe having greater Kolmogorov complexity than your brain.

Your programming example illustrates this nicely. You can't understand a 30,000 line program all at once, so you divide it into modules with well defined interfaces. You can develop, test, debug, model, predict, etc. one small module while treating the rest of the program as unpredictable, even though you know it is really deterministic. If you didn't model the program this way, you wouldn't need to check function arguments or throw exceptions.
So you are really supporting my argument that you cannot predict (and therefore cannot control) an AGI.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: David Clark [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Friday, October 6, 2006 6:03:58 PM
Subject: Re: [agi] G0 theory completed

Matt Mahoney said:
 If you can't model an AGI in your head, then you can't program it, understand it, test it, control it, or predict what it will do either.

Can't model / can't program - I have programmed a number of large (30,000+ lines of C) systems and I couldn't disagree more with the above statement. At any one time, I can only think about 10 or fewer details of any program at a time. It is only because of the organization of my code that I am able to make a program that actually works. As the program grows, the levels of abstraction grow so that (from my personal experience) I see no limit to the size or complexity of code that I can create or work on. I rely, for much of the detail memory, on the source code and I don't try to model all the details at once at all. I have other tools that help with higher levels of abstraction.

Understand - I "understand" all parts (in isolation) of all programs that I write, but for some of the largest ones, I can't predict (without tools) what the system will always do without actually running the program. If I wanted to know exactly, then I would probably just run the program and find out for sure.

Testing - There are many methods of testing and I see no limit to the size of code that can be tested. Microsoft Windows is the largest set of programs in the world and it can be argued that they are not fully tested, but obviously enough for most people to use the programs.

Control - Control has special meaning when talking about an AGI. If you truly had an AGI, would you have any more control over it if you could totally model or understand the AGI versus not?
Predict - We can't predict how most people will react and think even if we have known them our whole lives. If an AGI had intelligence on par with humans, how could we expect to always predict what they would think or do when we can't do that with ourselves?

What limit is there to knowledge in general? What tiny fraction of that knowledge can any single person embody? I don't know, but it must be a tiny fraction of the knowledge we currently have, and our rate of acquiring new knowledge is increasing exponentially. What limit is there to a database given an increasingly growing memory store? I think the answer, if not infinite, must be many orders of magnitude larger than what a single human is capable of.

Using a human mind as an example of the only way an AGI could possibly be created is flawed. Humans might have a huge number of limits to learning that AGIs do not. The hardware for an AGI is not limited to any specific amount, and it is also not limited to current hardware or algorithms. A human is limited to the brain in his skull and only
Re: [agi] G0 theory completed
My concern about G0 is that the problem of integrating first order logic or structured symbolic knowledge with language and sensory/motor data is unsolved, even when augmented with weighted connections to represent probability and/or confidence (e.g. fuzzy logic, Bayesian systems, connectionist systems). I think such weighting is an improvement but something else is still missing. People have been working on weighted graphs in various forms for over 20 years. If there was an easy solution, we should have found it by now. I did not see any proposed solution in G0.

First order logic is powerful, but that does not mean it is correct. I think it is an oversimplification, and we are discarding something essential for the sake of computational efficiency. The fact that you can represent Kicks(x,y) means that you can represent nonsense statements like "ball kicks boy". This is not how people think. A person reading such a statement will probably reverse the order of the words because it makes more sense that way. How would a symbolic system do that?

I think AGI will be solved when we do two things. First, we must understand what is going on in the human brain. Second, we must build a system with enough hardware to simulate it properly.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: YKY (Yan King Yin) [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Monday, October 9, 2006 2:23:59 PM
Subject: Re: [agi] G0 theory completed

Matt: (Sorry about the delay... I was busy advertising in other groups..)

 But now that you have completed your theory on how to build AGI, what do you do next? Which parts will you write yourself and which parts will you contract out?

Ideally, any part that can be "out-sourced" should be out-sourced. At this stage let's see who are interested in this approach...?
 I still think there are some fundamental problems to be solved. Your system is based on first order logic. (You said that is not a fixed design feature, but without a data structure you don't have a design.) I am not aware of any system that has successfully integrated FOL (or its augmented variants) with sensory/motor data or language.

For sensory processing, I think the main reason is that FOL is not probabilistic. We need to combine probability with FOL, which is not that hard. A Bayesian network can be viewed as propositional logic + probability. For natural language, perhaps the reason is that they have only focused on inference and ignored pattern recognition, which, as I argued, is the basis of dealing with the semantics of words.

 All such systems require human programmers to explicitly encode knowledge. You have many examples of how various types of knowledge can be represented. Books and papers on knowledge representation are full of similar examples. What these examples all lack is an explicit algorithm for acquiring such knowledge. Sure, humans can do it easily, but if you make your learning mechanism this smart, then you have already solved AGI. If it took anything less than human knowledge to do it, then surely systems like Cyc would have been built this way. Why spend 20 years hand coding millions of rules instead of a few days crunching a terabyte of text off the Internet? I think there is no shortcut to knowledge acquisition. Doug Lenat has argued that much of common sense knowledge is missing from the Internet. For example, a Google search of "water flowing downhill" returns 987 hits versus 1480 hits for "water flowing uphill". He argued that adult speech assumes common sense knowledge and thus is a bad source of common sense.

There are multiple reasons why Cyc is not yet successful -- lack of sensory input (vision), not good enough to converse in natural language, inference engine not advanced enough, no probabilities or fuzzy logic, etc.
It's like the failure of early gliders to fly, which doesn't mean that flying with planes is impossible. Successful language models like Google are based on statistical models, not FOL. Likewise, successful applications in vision or robotics generally use numerical/signal processing/neural models. A FOL formula such as "Sexy(x) ^ Intelligent(x)" is not unlike a neuron that detects 2 weighted inputs. If you add probabilities to FOL then they are even more similar. But logic is more powerful because it can use variables. For example, "Kicks(x,y)" is very, very difficult to express with statistical models or NNs, because it has to match "John kicks Mary", "boy kicks dog", "robot kicks ball", etc. I think to succeed at AGI, we need to understand the theoretical limits of learning [1,2], then develop a system not based on methods that have already been shown not to work. Then build a system that can learn, give it enough raw data to do so, and set it loose. First of all, we have to have a wine cup that is capable of holding the wine (knowledge). In other words, we need an architecture.
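The point about variables can be made concrete with a toy sketch. Everything here is illustrative (the matcher, the lowercase-means-variable convention, and the facts are invented for this example, not taken from G0 or any system discussed above): a single pattern with variables matches an open-ended set of ground facts, where a purely statistical model would need separate statistics for every word combination.

```python
def match(pattern, fact):
    """Unify a pattern like ('Kicks', 'x', 'y') with a ground fact.
    By convention here, all-lowercase symbols are variables.
    Returns a binding dict on success, or None on failure."""
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if p.islower():                    # variable
            if bindings.get(p, f) != f:    # same variable must bind consistently
                return None
            bindings[p] = f
        elif p != f:                       # constant must match exactly
            return None
    return bindings

# One pattern covers arbitrarily many agent/patient pairs.
facts = [("Kicks", "John", "Mary"),
         ("Kicks", "boy", "dog"),
         ("Kicks", "robot", "ball")]
for fact in facts:
    print(match(("Kicks", "x", "y"), fact))
```

A statistical model would have to see (or smooth over) each subject-verb-object triple separately; the pattern generalizes for free, which is the expressive power being claimed for FOL.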
Re: [agi] G0 theory completed
Do you really write 30,000 line programs without writing any error handling code? My argument for the unpredictability of AGI is based on Legg's paper [1]. It proves that a Turing machine cannot predict another machine with greater Kolmogorov complexity. Here I am equating Kolmogorov complexity with intelligence. I think that is reasonable. We already cannot predict what a 30,000 line program will do. During development, we break it down into small modules and work on them one at a time while modeling the rest of the program abstractly. Any simplified, abstract model (one whose Kolmogorov complexity is less than that of the system modeled) must be probabilistic, an approximation. This is easily proven: if the model were exact, then our original assumption about the complexity of the system must have been wrong. [1] Legg, Shane (2006), Is There an Elegant Universal Theory of Prediction?, Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: David Clark [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, October 10, 2006 11:20:16 AM Subject: Re: [agi] G0 theory completed You misinterpret my response. I never said I didn't understand the 30,000 line program. I said I can't think about the details embodied by the code at the same time. These are quite different things. If remembering huge numbers of details at one time is your definition of understanding, then humans don't understand almost anything in our world. We overcome our small short term memory by organization and external augmentation (making a few notes on paper or computer, memorizing known solutions, etc). I think you define understanding much too narrowly. I don't treat the rest of my program as unpredictable when I am working on just one small module.
In fact, I have a very clear view of what every part of the program will do when I decide to concentrate on that module or the bigger structures I have created. I don't check function arguments and have never used exceptions except where I want to catch a known condition with that structure. It makes no sense to have exceptions in general if you don't know what the error will be (so much for unpredictability), because you wouldn't be able to handle it in any meaningful way. Your comments don't support your conclusion of the non-predictability of an AGI, but as I stated in my email, an AGI with close to human intelligence would be no less/more predictable than humans are. I don't think humans are all that predictable, do you? My biggest complaint with your email was the idea of stating flat out that a brain cannot model something more complicated than itself. I disagree with this view. I think the answer is in the details. I can model the whole world easily in my limited short term memory, but how detailed and how useful that model is, is open to debate. You wrote: I think to succeed at AGI, we need to understand the theoretical limits of learning [1,2], then develop a system not based on methods that have already been shown not to work. Then build a system that can learn, give it enough raw data to do so, and set it loose. Understanding learning in humans is helpful, but why would all possible methods of gaining experience and divining solutions be embodied in humans already? If an AGI is embodied in a computer, and computers have significantly different attributes than humans, then why would human "theoretical limits of learning" be the final say for creating an AGI? I think most people on this list would agree with the "build a learning system, add data and go" theory, but the disagreement is always over how much building needs to be done to get the most optimal set of learning capabilities.
We have seen many failures of potential AI programs that tried a single learning method and got nowhere. If the answer to AGI were simple or easy, I think that solution would already have been found. The nuances of any past failed attempt at AGI also make your statement about "not based on methods that have already been shown not to work" not very useful. I don't know of anyone working on AI who thinks that their particular approach is exactly like the ones that have failed. Many outsiders to those projects might say that their basic approach has been shown not to work, but the people on that project disagree. An example would be the many NN projects that have not produced any intelligence, even though we know that human intelligence is basically based on NNs. If there is a simple message in these facts, I fail to see it, other than that no approach should be fully discounted until somebody actually succeeds at building an AGI. David Clark - Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Friday, October 06, 2006
Re: A Mind Ontology Project? [Re: [agi] method for joining efforts]
YKY, it looks like you removed the G0 page. Is this proprietary now too? http://www.geocities.com/genericai/ -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Monday, October 16, 2006 9:37:23 PM Subject: Re: A Mind Ontology Project? [Re: [agi] method for joining efforts] Re the Mind Ontology page: I have written a "glossary of terms" pertinent to our discussions, including Ben's suggestion of the terms: -- perception -- emergence -- symbol grounding -- logic and I also added many of the terms in my architecture (which is not meant to be final, only as a proposal for further discussion). I find no use of "emergence" so I left it undefined =P I suggest that web page's content should be proprietary to Novamente, because it contains some of my ideas of G0 in it. [I'd be happy to let Ben use all these ideas, perhaps in exchange for a small amount of Novamente shares. Anyway, many of my ideas came from discussions with Ben and members of this list.] Secondly, I'm not sure what the "ontology" is supposed to mean except as a clarification of terms. If so, I guess a few web pages would suffice for this purpose. Thirdly -- an important point -- I think Ben should focus on dividing the architecture into modules that can be researched and developed (relatively) independently. This is very important because even if Ben's brain has ideas that are better than each of our brains', his ideas will not be better than all of ours combined. So it would be a tremendous step forward if we could decide on a broad way of separating the modules. This I will propose on the page. Personally, I wish to specialize in pattern recognition and perhaps vision (since Ben is also doing some vision).
YKY - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/[EMAIL PROTECTED]
Re: [agi] SOTA
- Comprehensive (common-sense) knowledge-bases and/or ontologies: Cyc/OpenCyc, WordNet, etc., but there seems to be no good way for applications to use this information and no good alternative to hand coding knowledge.
- Inference engines, etc.
- Adaptive expert systems: A dead end. There has been little progress since the 1970's.
- Question answering systems: Google.
- NLP components such as parsers, translators, grammar-checkers: Parsing is unsolved. Translators like Babelfish have progressed little since the 1959 Russian-English project. Microsoft Word's grammar checker catches some mistakes but is clearly not AI.
- Interactive robotics systems (sensing/actuation), physical or virtual: The Mars Rovers and the DARPA Grand Challenge (robotic auto race) are impressive, but we clearly have a long way to go before your car drives itself.
- Vision, voice, pattern recognition, etc.: It is difficult to judge face recognition systems; because of their use in security, accuracy rates are secret. I believe they have been oversold. Voice recognition is limited to words and short phrases until we develop better language models with AI behind them. A keyboard is still faster than a microphone.
- Interactive learning systems
- Integrated intelligent systems: Lots of theoretical results, but no real applications.
-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] SOTA
- Original Message From: BillK [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, October 19, 2006 11:43:46 AM Subject: Re: [agi] SOTA On 10/19/06, Matt Mahoney wrote: - NLP components such as parsers, translators, grammar-checkers Parsing is unsolved. Translators like Babelfish have progressed little since the 1959 Russian-English project. Microsoft Word's grammar checker catches some mistakes but is clearly not AI. http://www.charlotte.com/mld/charlotte/news/nation/15783022.htm I think the problem will eventually be solved. There was a long period of stagnation since the 1959 Russian-English project but I think this period will soon end thanks to better language models due to the recent availability of large text databases, fast hardware, and cheap memory. Once we solve the language modeling problem, we will remove the main barrier to many NLP problems such as speech recognition, translation, OCR, handwriting recognition, and question answering. Google has made good progress in this area using statistical modeling methods and was top ranked in a recent competition. Google has access to terabytes of text in many languages and a custom operating system for running programs in parallel on thousands of PCs. Here is Google's translation of the above article into Arabic and back to English. But as you can see, the job isn't finished. American soldiers heading to Iraq with a laptop translators from Stephanie Hinatz daily newspapers (Newport News,va. (ethnic)نورفولكVa. army-star trip now using similar instrument in Iraq to help the forces of language training without contact with Iraqi civilians and the training of the country's emerging police and military forces. the name of a double discourse to address Albernamjoho translator, which uses computers to convert spoken English Iraqi pwmound and vice versa. while the program is still technically in the research and development stage,Norfolk-based U.S. 
Joint Forces Command,in conjunction with the Defense Advanced Research projects Agency,some models has been sent to Iraq, 70 troops is used in tactical environments to evaluate its effectiveness. and so far is fine and said Wayne Richards,Commander leadership in the implementation section. the need for such a device for the first time in April 2004 when the joint forces command received an urgent request from commanders on the ground in Abragherichards. soldiers on the ground needed to improve communication with the Iraqi people. But because of the shortage of linguists and translators throughout the Department of Defense do not come from the difficult,even some of the forces of the so-called most important work in Iraq today in Iraq, the training of police and military forces. get those troops trained and capable of maintaining the security of the country itself is a reminder of return for service members to continue der inside and outside the war zone. experts are trying to develop this kind of technical translation for 10 years,He said that Richards. today, in its current form,The translator is the rugged laptop with the plugs are two or loudspeakers and Alsmaatrichards pointing to a model and convert. It is also easy to use Talking on the phone,as evidenced shortly after the Norfolk demonstration Tuesday. I tell you, an Iraqi withdrawal on a computer. you put the microphone up to your mouth. when he said :We are here to provide food and water for your family, You held by the E key to security in a painting keys. you,I wrote to you the text of what we discussed to delight on the screen. you wipe the words to make sure you get exactly. If you can change it manually. when you are convinced you to the t key to the interpretation and sentence looming on the screen once Achrihzh time in Arab Iraq. the computer also says his loud speakers through. the process is the same Balanceof those who did not talk to you. 
I repeat what you have and the Arab computer will spit on you, the words in the English language. as do translator rights,the program assumes some meanings. not 100% Richards. when I ask,For example,Can the newspaper today, the Arab-language Alanklizihaltrgmeh direct Can the newspaper today. because in any act made in every conversation with the translator is taken. any translation is not due to the past program. Defense Language Institute in California also true of all the translations and Richards. now,because of its size,the best place to use the translator is at the center of command and control or a classroom. It is unlikely that the average Navy will be overseeing the cart with 100 pounds of equipment to implement that attacks in Baghdad, in Sadr City. We hope if the days will be small enough that the sergeant to be implemented in a skirt. Think about it and Richards. sergeant beating on the door of the house formulateseen in Fallujah. a woman answers the door. The soldier's weapon. because it is afraid. the soldier immediately to the effects translator
Re: [agi] SOTA
- Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, October 20, 2006 10:14:09 AM Subject: Re: [agi] SOTA We have been searching for decades to find shortcuts to fit our machines? When you send a child into her bedroom to search for a missing library book, she will often come back 15 seconds after entering the room with the statement "I searched for it and it's not there." Drawing definite conclusions is about equally reliable in both these cases. If you have figured out how to implement AI on a PC, please share it with us. Until then, you will need a more convincing argument that we aren't limited by hardware. A lot of people smarter than you or me have been working on this problem for a lot longer than 15 seconds. James first proposed association models of thought in 1890, about 90 years before connectionist neural models were popular. Hebb proposed a model of classical conditioning in which memory is stored in the synapse in 1949, decades before the phenomenon was actually observed in living organisms. By the early 1960s we had programs that could answer natural language queries (the 1959 BASEBALL program), translate Russian to English, prove theorems in geometry, solve arithmetic word problems, and recognize handwritten digits. It is not that we can't come up with the right algorithms. It's that we don't have the computing power to implement them. The most successful AI applications today, like Google, require vast computing power. If the brain used its hardware in such a way that (say) a million neurons were required to implement a function that, on a computer, required a few hundred gates, your comparisons would be meaningless. I doubt the brain is that inefficient. There are lower animals that crawl with just a couple hundred neurons. In higher animals, neural processing is expensive, so there is evolutionary pressure to compute efficiently. Most of the energy you burn at rest is used by your brain.
Humans had to evolve larger bodies than other primates to support our larger brains. In most neural models, it takes only one neuron to implement a logic gate and only one synapse to store a bit of memory. It used to be a standing joke in AI that researchers would claim there was nothing wrong with their basic approach, they just needed more computing power to make it work. That was two decades ago: has this lesson been forgotten already? I don't see why this should not still be true. The problem is we still do not know just how much computing power is needed. There is still no good estimate of the number of synapses in the human brain. We only know it is probably between 10^12 and 10^15, and we aren't even sure of that. So when AI is solved, it will probably be a surprise. -- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] SOTA
- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, October 20, 2006 3:35:57 PM Subject: Re: [agi] SOTA On 10/20/06, Matt Mahoney [EMAIL PROTECTED] wrote: It is not that we can't come up with the right algorithms. It's that we don't have the computing power to implement them. Can you give us an example? I hope you don't mean algorithms like exhaustive search. For example, neural networks which perform rudimentary pattern detection and control for vision, speech, language, robotics, etc. Most of the theory had been worked out by the 1980's, but applications have been limited by CPU speed, memory, and training data. The basic building blocks were worked out much earlier. There are only two types of learning in animals: classical (association) and operant (reinforcement) conditioning. Hebb's rule for classical conditioning, proposed in 1949, is the basis for most neural network learning algorithms today. Models of operant conditioning date back to W. Ross Ashby's 1960 Design for a Brain, where he used randomized weight adjustments to stabilize a 4 neuron system built from vacuum tubes and mechanical components. Neural algorithms are not intractable. They run in polynomial time. Neural networks can recognize arbitrarily complex patterns by adding more layers and training them one at a time. This parallels the way people learn complex behavior. We learn simple patterns first, then build on them. The most successful AI applications today like Google require vast computing power. In what sense do you call Google an AI application? Google does pretty well with natural language questions like "how many days until xmas?" even though they don't advertise it that way (like Ask Jeeves did) and most people don't use it that way. Of course you might say that Google isn't doing AI, it is just matching query terms to documents. But it is always that way. Once you solve the problem, it's not AI any more. Deep Blue isn't AI.
It just implements a chess playing algorithm in fast hardware. Suppose we decide the easiest way to build a huge neural network is to use real neurons and some genetic engineering. Is that AI? -- Matt Mahoney, [EMAIL PROTECTED]
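Hebb's rule, referenced above as the basis for most neural learning algorithms, can be sketched in a few lines. This is the generic textbook form (the learning rate and activity values are arbitrary illustrative choices, not a model of any specific system):

```python
def hebb_update(w, pre, post, lr=0.1):
    """One Hebbian step for a single synapse: the weight grows in
    proportion to the product of pre- and post-synaptic activity
    ("cells that fire together wire together")."""
    return w + lr * pre * post

# Pair a stimulus with a response a few times and the connection
# strengthens -- a crude analogue of classical conditioning.
w = 0.0
for _ in range(5):
    w = hebb_update(w, pre=1.0, post=1.0)
print(w)  # grows to approximately 0.5 after five coincident firings
```

Note that this bare form only strengthens connections; practical variants add decay or normalization to keep weights bounded.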
Re: [agi] SOTA
With regard to the computational requirements of AI, there is a very clear relation showing that the quality of a language model improves by adding time and memory, as shown in the following table: http://cs.fit.edu/~mmahoney/compression/text.html And with the size of the training set, as shown in this graph: http://cs.fit.edu/~mmahoney/dissertation/ Before you argue that text compression has nothing to do with AI, please read http://cs.fit.edu/~mmahoney/compression/rationale.html I recognize that language modeling is just one small aspect of AGI. But compression gives us hard numbers to compare the work of over 80 researchers spanning decades. The best performing systems push the hardware to its limits. This, and the evolutionary arguments I gave earlier lead me to believe that AGI will require a lot of computing power. Exactly how much, nobody knows. Whether or not AGI can be accomplished most efficiently with neural networks is an open question. But the one working system we know of is based on it, and we ought to study it. One critical piece of missing knowledge is the density of synapses in the human brain. I think this could be resolved by putting some brain tissue under an electron microscope, but I guess that the number is not important to neurobiologists. I read Pei Wang's paper, http://nars.wang.googlepages.com/wang.AGI-CNN.pdf Some of the shortcomings of neural networks mentioned only apply to classical (feedforward or symmetric) neural networks, not to asymmetric networks with recurrent circuits and time delay elements, as exist in the brain. Such circuits allow for short term stable or oscillating states which overcome some shortcomings such as the inability to train on multiple goals, which could be accomplished by turning parts of the network on or off. Also, it is not true that training has to be offline using multiple passes, as with backpropagation. 
Human language is structured so that layers can be trained progressively without the need to search over hidden units. Word associations like sun-moon or to-from are linear. Some of the top compressors mentioned above (paq8, WinRK) use online, single-pass neural networks to combine models, alternating prediction and training. But it is interesting that most of the remaining shortcomings are also shortcomings of human thought, such as the inability to insert or represent structured knowledge accurately. This is evidence that our models are correct. That does not mean they are the best answer. We don't want to duplicate the shortcomings of humans. We do not want to slow down our responses and insert errors in order to pass the Turing test (as in Turing's 1950 example). -- Matt Mahoney, [EMAIL PROTECTED]
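The online model-combining just described can be illustrated with a simplified sketch. This is a toy version of the general idea (combining model probabilities in the logit domain with weights adjusted by online gradient steps after each bit), not the actual paq8 code; the learning rate, toy models, and bit sequence are all contrived:

```python
import math

def squash(x):
    """Logistic function: map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def stretch(p):
    """Inverse logistic: map a model's probability to the logit domain."""
    return math.log(p / (1.0 - p))

def mix(weights, probs):
    """Weighted combination of model predictions in the logit domain."""
    return squash(sum(w * stretch(p) for w, p in zip(weights, probs)))

def update(weights, probs, bit, lr=0.02):
    """One online gradient step after observing the actual bit (0 or 1)."""
    p = mix(weights, probs)
    err = bit - p
    return [w + lr * err * stretch(pi) for w, pi in zip(weights, probs)]

# Two toy models: one predicts 1s are likely (p=0.9), one that they
# are rare (p=0.2). On a mostly-1 source, the mixer learns to trust
# the model whose predictions fit the data.
weights = [0.0, 0.0]
for bit in [1, 1, 1, 0, 1, 1, 1, 1]:
    weights = update(weights, [0.9, 0.2], bit)
print(weights[0] > weights[1])  # the better model gets the larger weight
```

Prediction and training alternate on every symbol, so the mixer adapts in a single pass, which is the property claimed above for the top compressors.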
Re: [agi] SOTA
- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, October 21, 2006 5:25:13 PM Subject: Re: [agi] SOTA For example, the human mind and some other AI techniques handle structured knowledge much better than NN does. Is this because the brain is representing the knowledge differently than a classical neural network, or because the brain has a lot more memory and can afford to represent structured knowledge inefficiently? I agree with the conclusion of your paper that a classical neural network is not sufficient to solve AGI. The brain is much more complex than that. But I think a neural architecture or a hybrid system that includes neural networks of some type is the right direction. For example, Novamente (if I understand correctly, a weighted hypergraph) has some resemblance to a neural network. -- Matt Mahoney, [EMAIL PROTECTED]
[agi] Language modeling
- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, October 21, 2006 7:03:39 PM Subject: Re: [agi] SOTA Well, in that sense NARS also has some resemblance to a neural network, as well as many other AI systems. Also to Novamente, if I understand correctly. Terms are linked by a probability and confidence. This seems to me to be an optimization of a neural network or connectionist model, which is restricted to one number per link, representing probability. To model confidence you would have to make redundant copies of the input and output units and their connections. This would be inefficient, of course. One aspect of NARS and many other structured or semi-structured knowledge representations that concerns me is the direct representation of concepts such as is-a, equivalence, logic (if-then, and, or, not), quantifiers (all, some), time (before and after), etc. These things seem fundamental to knowledge but are very hard to represent in a neural network, so it seems expedient to add them directly. My concern is that the direct encoding of such knowledge greatly complicates attempts to use natural language, which is still an unsolved problem. Language is the only aspect of intelligence that separates humans from other animals. Without language, you do not have AGI (IMHO). A related concern is that structured knowledge is inconsistent with the development of language in children. As I mentioned earlier, natural language has a structure that allows direct training in neural networks using fast, online algorithms such as perceptron learning, rather than slow algorithms with hidden units such as back propagation. Each feature is a linear combination of previously learned features followed by a nonlinear clamping or threshold operation. Working in this fashion, we can represent arbitrarily complex concepts. In a connectionist model, we have, for example:
- pixels
- line segments
- letters
- words
- phrases, parts of speech
- sentences
etc.
Children also learn language as a progression toward increasingly complex patterns:
- phonemes, beginning at 2-4 weeks
- phonological rules for segmenting continuous speech, at 7-10 months [1]
- words (semantics), beginning at 12 months
- simple sentences (syntax), at 2-3 years
- compound sentences, around 5-6 years
Attempts to change the modeling order are generally unsuccessful. For example, attempting to parse a sentence first and then extract its meaning does not work. You cannot parse a sentence without semantics. For example, the correct parse of "I ate pizza with NP" depends on whether NP is pepperoni, a fork, or Sam. Now when we hard code knowledge about logic, quantifiers, time, and other concepts and then try to retrofit NLP to it, we are modeling language in the worst possible order. Such concepts, needed to form compound sentences, are learned at the last stage of language development. In fact, some tribal languages such as Piraha [2] do not ever reach this stage, even for adults. My caution is that any language model we develop has to be trainable in order from simple to complex. The model has to be able to first learn simple sentences in the absence of any knowledge of logical relations, and then there must be a mechanism for learning such relations. I realize that human models of logical relations must be horribly inefficient, given how long it takes children to learn them. I think to solve AGI, we need to develop a better understanding of such models. I do not hold out too much hope for a computationally efficient solution, given our long past record of failure. [1] Jusczyk, Peter W. (1996), "Investigations of the word segmentation abilities of infants", 4th Intl. Conf. on Speech and Language Processing, Vol. 3, 1561-1564. [2] The Piraha challenge: an Amazonian tribe takes grammar to a strange place, Science News, Dec. 10, 2005, http://www.findarticles.com/p/articles/mi_m1200/is_24_168/ai_n16029317/pg_1 -- Matt Mahoney, [EMAIL PROTECTED]
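The fast, online perceptron learning mentioned above can be sketched as follows. The task (learning logical OR) is an arbitrary toy choice; the point is the training style: a single thresholded linear unit, updated one example at a time, with no hidden layers and no search:

```python
def predict(w, b, x):
    """A feature unit: linear combination of inputs, then a threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(examples, epochs=10, lr=1.0):
    """Online perceptron rule: nudge weights only when the unit errs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in examples:
            err = target - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Toy task: the unit learns logical OR of two binary inputs.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train(examples)
print([predict(w, b, x) for x, _ in examples])  # [0, 1, 1, 1]
```

Each learned unit can then serve as an input to higher-level units, giving the layered progression (pixels, line segments, letters, words, ...) described above without ever needing backpropagation through hidden units.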
Re: [agi] Language modeling
I am interested in identifying barriers to language modeling and how to overcome them. I have no doubt that probabilistic models such as NARS and Novamente can adequately represent human knowledge. Also, I have no doubt they can learn, e.g., relations such as "all frogs are green" from examples of green frogs. My question relates to solving the language problem: how to convert natural language statements like "frogs are green" and equivalent variants into the formal internal representation without the need for humans to encode stuff like (for all X, frog(X) => green(X)). This problem is hard because there might not be terms that exactly correspond to frog or green, and also because interpreting natural language statements is not always straightforward, e.g. "I know it was either a frog or a leaf because it was green." Converting natural language to a formal representation requires language modeling at the highest level. The levels from lowest to highest are: phonemes, word segmentation rules, semantics, simple sentences, compound sentences. Regardless of whether your child learned to read at age 3 or not at all, children always learn language in this order. The state of the art in language modeling is at the level of simple sentences, modeling syntax using n-grams (usually trigrams) or hidden Markov models, generally without recursion (flat), and modeling semantics as word associations, possibly generalizing via LSA or clustering to exploit the transitive property (if A means B and B means C, then A means C). This is the level of modeling of the top text compressors on the large text benchmark and of the lowest perplexity models used in speech recognition. I gave an example of a Google translation of English to Arabic and back. You may have noticed that strings of up to about 6 words looked grammatically correct, but that longer sequences contained errors. This is a characteristic of trigram models.
Shannon noted in 1949 that random sequences that fit the n-gram (letter or word) statistics of English appear correct up to about 2n. All of these models have the property that they are trained in the same order that children learn language. For example, parsing sentences without semantics is difficult, but extracting semantics without parsing (text search) is easy. As a second example, it is possible to build a lexicon from text only if you know the rules for word segmentation. However, the reverse is not true. It is not necessary to have a lexicon to segment continuous text (spaces removed). The segmentation rules can be derived from n-gram statistics, analogous to learning the phonological rules for segmenting continuous speech. This was first demonstrated in text by Hutchens and Alder, which I improved on in 1999. http://cs.fit.edu/~mmahoney/dissertation/lex1.html With this observation, it seems that hard coding rules for inheritance, equivalence, logical, temporal, etc. relations into a knowledge representation will not help in learning these relations from text. The language model still has to learn these relations from previously learned, simpler concepts. In other words, the model has to learn the meanings of is, and, not, if-then, all, before, etc. without any help from the structure of the knowledge representation or explicit encoding. The model has to first learn how to convert compound sentences into a formal representation and back, and only then can it start using or adding to the knowledge base. So my question is: what is needed to extend language models to the level of compound sentences? More training data? Different training data? A new theory of language acquisition? More hardware? How much? -- Matt Mahoney, [EMAIL PROTECTED]
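As a toy illustration of deriving segmentation from n-gram statistics alone: the sketch below is a simplification in the spirit of the Hutchens and Alder approach, not the actual algorithm from the 1999 paper, and the corpus and threshold are contrived. A boundary is guessed wherever the two-character context has many distinct successors, i.e. where the next character is unpredictable:

```python
from collections import defaultdict

# Train on continuous text with spaces removed: no lexicon is used.
corpus = "thecatsatonthematthecatran" * 20

# For each two-character context, record which characters can follow it.
successors = defaultdict(set)
for i in range(len(corpus) - 2):
    successors[corpus[i:i+2]].add(corpus[i+2])

def segment(text, threshold=2):
    """Insert a boundary wherever the preceding context has at least
    `threshold` distinct successors (high successor variety)."""
    out = [text[0], text[1]]
    for i in range(2, len(text)):
        if len(successors[text[i-2:i]]) >= threshold:
            out.append(" ")
        out.append(text[i])
    return "".join(out)

print(segment("thecatsatonthemat"))  # -> "the cat sat onthe mat"
```

It recovers most word boundaries but misses on|the, where this tiny corpus happens to make the transition fully predictable. That kind of partial, probabilistic success, improving with more data, is characteristic of segmentation learned purely from statistics.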
Re: [agi] Language modeling
- Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, October 24, 2006 12:37:16 PM Subject: Re: [agi] Language modeling

Matt Mahoney wrote: "Converting natural language to a formal representation requires language modeling at the highest level. The levels from lowest to highest are: phonemes, word segmentation rules, semantics, simple sentences, compound sentences. Regardless of whether your child learned to read at age 3 or not at all, children always learn language in this order."

"And the evidence for this would be what?"

Um, any textbook on psycholinguistics or developmental psychology; also the paper by Jusczyk I cited earlier. Ben pointed me to a book by Tomasello which I haven't read, but here is a good summary of his work on language acquisition in children. http://email.eva.mpg.de/~tomas/pdf/Mussen_chap_proofs.pdf

I realize that the stages of language learning overlap, but they do not all start at the same time. It is a simple fact that children learn words with semantic content like "ball" or "milk" before they learn function words like "the" or "of", in spite of the higher frequency of the latter. Likewise, successful language models used for information retrieval ignore function words and word order. Furthermore, children learn word segmentation rules before they learn words, again consistent with statistical language models. (The fact that children can learn sign language at 6 months is not inconsistent with these models; sign language does not have the word segmentation problem.)

We can learn from these observations. One conclusion that I draw is that you can't build an AGI and tack on language modeling later. You have to integrate language modeling and train it in parallel with nonverbal skills such as vision and motor control, similar to training a child. We don't know today whether this will turn out to be true. Another important question is: how much will this cost?
How much CPU, memory, and training data do you need? Again we can use cognitive models to help answer these questions. According to Tomasello, children are exposed to about 5000 to 7000 utterances per day, or about 20,000 words. This is equivalent to about 100 MB of text in 3 years. Children learn to use simple sentences of the form subject-verb-object and recognize word order in these sentences at about 22-24 months. For example, they respond correctly to "make the bunny push the horse". However, such models are word specific. At about age 3 1/2, children are able to generalize novel words used in context as a verb to other syntactic constructs, e.g. to construct transitive sentences given examples where the verb is used only intransitively. This is about the state of the art with statistical models trained on hundreds of megabytes of text. Such experiments suggest that adult-level modeling, which will be needed to interface with structured knowledge bases, will require about a gigabyte of training data.

-- Matt Mahoney, [EMAIL PROTECTED]
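The 100 MB figure follows from simple arithmetic, assuming an average English word is about 5 bytes including the trailing space (an assumed figure, not stated in the message):

```python
# Back-of-envelope check of the training-data estimate above.
words_per_day = 20_000
days = 3 * 365
bytes_per_word = 5  # assumption: average word length plus a space
total_bytes = words_per_day * days * bytes_per_word
print(total_bytes / 1e6)  # about 110 MB, the same order as the 100 MB cited
```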
Re: [agi] Motivational Systems that are stable
My comment on Richard Loosemore's proposal: we should not be confident in our ability to produce a stable motivational system. We observe that motivational systems are highly stable in animals (including humans). This is only because if an animal can manipulate its motivations in any way, it is quickly removed by natural selection. Examples of manipulation might be to turn off pain or hunger or reproductive drive, or to stimulate its pleasure center. Humans can do this to some extent by using drugs, but this leads to self-destructive behavior. In experiments where a mouse can stimulate its pleasure center via an electrode in its brain by pressing a lever, it will press the lever, foregoing food and water until it dies.

So we should not take the existence of stable motivational systems in nature as evidence that we can get it right. These systems are complex, have evolved over a long time, and even then don't always work in the face of technology or a rapidly changing environment.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Motivational Systems that are stable
- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, October 28, 2006 10:23:58 AM Subject: Re: [agi] Motivational Systems that are stable

"I disagree that humans really have a 'stable motivational system', or would have to have a much more strict interpretation of that phrase. Overall, humans as a society have in general a stable system (discounting war, etc.), but as individuals, too many humans are unstable in many small if not totally self-destructive ways."

I think we are misunderstanding. By "motivational system" I mean the part of the brain (or AGI) that provides the reinforcement signal (reward or penalty). By "stable", I mean that you have no control over the logic of this system. You cannot train it like you can train the other parts of your brain. You cannot learn to turn off pain or hunger or fear or fatigue or the need for sleep, etc. You cannot alter your emotional state. You cannot make yourself feel happy on demand. You cannot make yourself like what you don't like, and vice versa. The pathways from your senses to the pain/pleasure centers of your brain are hardwired, determined by genetics and not alterable through learning.

For an AGI it is very important that the motivational system be stable. The AGI should not be able to reprogram it. If it could, it could simply program itself for maximum pleasure and enter a degenerate state where it ceases to learn through reinforcement. It would be like the mouse that presses a lever to stimulate the pleasure center of its brain until it dies.

It is also very important that the motivational system be correct. If the goal is that an AGI be friendly or obedient (whatever that means), then there needs to be a fixed function of some inputs that reliably detects friendliness or obedience. Maybe this is as simple as a human user pressing a button to signal pain or pleasure to the AGI.
Maybe it is something more complex, like a visual system that recognizes facial expressions to tell if the user is happy or mad. If the AGI is autonomous, it is likely to be extremely complex. Whatever it is, it has to be correct.

To answer your other question, I am working on natural language processing, although my approach is somewhat unusual. http://cs.fit.edu/~mmahoney/compression/text.html

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Natural versus formal AI interface languages
I guess the AI problem is solved, then. I can already communicate with my computer using formal, unambiguous languages. It already does a lot of things better than most humans, like arithmetic, chess, memorizing long lists and recalling them perfectly... If a machine can't pass the Turing test, then what is your definition of intelligence?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: John Scanlon [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, October 31, 2006 8:48:43 AM Subject: [agi] Natural versus formal AI interface languages

One of the major obstacles to real AI is the belief that knowledge of a natural language is necessary for intelligence. A human-level intelligent system should be expected to have the ability to learn a natural language, but it is not necessary. It is better to start with a formal language, with unambiguous formal syntax, as the primary interface between human beings and AI systems. This type of language could be called a "para-natural formal language." It eliminates all of the syntactical ambiguity that makes competent use of a natural language so difficult to implement in an AI system. Such a language would also be a member of the class "fifth-generation computer language."
Re: [agi] Natural versus formal AI interface languages
Artificial languages that remove ambiguity, like Lojban, do not bring us any closer to solving the AI problem. It is straightforward to convert between artificial languages and structured knowledge (e.g., first-order logic), but it is still a hard (AI-complete) problem to convert between natural and artificial languages. If you could translate English -> Lojban -> English, then you could just as well translate, e.g., English -> Lojban -> Russian. Without a natural language model, you have no access to the vast knowledge base of the Internet, or to most of the human race. I know people can learn Lojban, just like they can learn CycL or LISP. Let's not repeat these mistakes. This is not training, it is programming a knowledge base. This is narrow AI.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Natural versus formal AI interface languages
- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, October 31, 2006 9:26:15 PM Subject: Re: Re: [agi] Natural versus formal AI interface languages

"Here is how I intend to use Lojban++ in teaching Novamente. When Novamente is controlling a humanoid agent in the AGISim simulation world, the human teacher talks to it about what it is doing. I would like the human teacher to talk to it in both Lojban++ and English, at the same time. According to my understanding of Novamente's learning and reasoning methods, this will be the optimal way of getting the system to understand English. At once, the system will get a perceptual-motor grounding for the English sentences, plus an understanding of the logical meaning of the sentences. I can think of no better way to help a system understand English. Yes, this is not the way humans do it. But so what? Novamente does not have a human brain, it has a different sort of infrastructure with different strengths and weaknesses."

What about using baby English instead of an artificial language? By this I mean simple English at the level of a 2 or 3 year old child. Baby English has many of the properties that make artificial languages desirable, such as a small vocabulary, simple syntax and lack of ambiguity. Adult English is ambiguous because adults can use vast knowledge and context to resolve ambiguity in complex sentences. Children lack these abilities.

I don't believe it is possible to map between natural and structured language without solving the natural language modeling problem first. I don't believe that having structured knowledge or a structured language available makes the problem any easier. It is just something else to learn. Humans learn natural language without having to learn structured languages, grammar rules, knowledge representation, etc. I realize that Novamente is different from the human brain.
My argument is based on the structure of natural language, which is vastly different from artificial languages used for knowledge representation. To wit:

- Artificial languages are designed to be processed (translated or compiled) in the order: lexical tokenization, syntactic parsing, semantic extraction. This does not work for natural language. The correct order is the order in which children learn: lexical, semantics, syntax. Thus we have successful language models that extract semantics without syntax (such as information retrieval and text categorization), but not vice versa.

- Artificial language has a structure optimized for serial processing. Natural language is optimized for parallel processing. We resolve ambiguity and errors using context. Context detection is a type of parallel pattern recognition. Patterns can be letters, groups of letters, words, word categories, phrases, and syntactic structures. We recognize and combine perhaps tens or hundreds of patterns simultaneously by matching against perhaps 10^5 or more from memory. Artificial languages have no such mechanism and cannot tolerate ambiguity or errors.

- Natural language has a structure that allows incremental learning. We can add words to the vocabulary one at a time. Likewise for phrases, idioms, classes of words, and syntactic structures. Artificial languages must be processed by fixed algorithms. Learning algorithms are unknown.

- Natural languages evolve slowly in a social environment. Artificial languages are fixed according to some specification.

- Children can learn natural languages. Artificial languages are difficult to learn, even for adults.

- Writing in an artificial language is an iterative process in which the output is checked for errors by a computer and the utterance is revised. Natural language uses both iterative and forward error correction.

By natural language I include man-made languages like Esperanto.
Esperanto was designed for communication between humans and has all the other properties of natural language. It lacks irregular verbs and such, but this is really a tiny part of a language's complexity. A natural language like English has a complexity of about 10^9 bits. How much information does it take to list all the irregularities in English like swim-swam, mouse-mice, etc.?

-- Matt Mahoney, [EMAIL PROTECTED]
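The rhetorical question invites a rough estimate. The counts and per-entry cost below are illustrative assumptions, not measured values:

```python
# Rough estimate of the information needed to list English irregular forms.
# Assumptions: a few hundred irregular verbs plus ~100 irregular plurals,
# each entry costing ~100 bits (roughly two short words of text).
irregular_entries = 300 + 100
bits_per_entry = 100
irregular_bits = irregular_entries * bits_per_entry   # 40,000 bits
language_bits = 1e9                                   # estimate cited above
print(irregular_bits / language_bits)  # about 4e-05: a tiny fraction
```

Even if these assumed counts are off by an order of magnitude, the irregularities remain a negligible part of the 10^9-bit total.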
Re: [agi] Natural versus formal AI interface languages
I don't know enough about Novamente to say if your approach would work. Using an artificial language as part of the environment (as opposed to a substitute for natural language) does seem to make sense.

I think an interesting goal would be to teach an AGI to write software. If I understand your explanation, this is the same problem. I want to teach the AGI two languages (English and x86-64 machine code), one to talk to me and the other to define its environment. I would like to say to the AGI, "write a program to print the numbers 1 through 100", "are there any security flaws in this web browser?", and ultimately, "write a program like yourself, but smarter". This is obviously a hard problem, even if I substitute a more English-like programming language like COBOL. To solve the first example, the AGI needs an adult-level understanding of English and arithmetic. To solve the second, it needs a comprehensive world model, including an understanding of how people think and the things they can experience. (If an embedded image can set a cookie, is this a security flaw?) When it can solve the third, we are in trouble (topic for another list). How could such an AGI be built? What would be its architecture? What learning algorithm? What training data? What computational cost?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 2, 2006 3:45:42 PM Subject: Re: Re: [agi] Natural versus formal AI interface languages

Yes, teaching an AI in Esperanto would make more sense than teaching it in English ... but, would not serve the same purpose as teaching it in Lojban++ and a natural language in parallel... In fact, an ideal educational programme would probably be to use, in parallel -- an Esperanto-based, rather than English-based, version of Lojban++ -- Esperanto. However, I hasten to emphasize that this whole discussion is (IMO) largely peripheral to AGI.
The main point is to get the learning algorithms and knowledge representation mechanisms right. (Or if the learning algorithm learns its own KRs, that's fine too...) Once one has what seems like a workable learning/representation framework, THEN one starts talking about the right educational programme. Discussing education in the absence of an understanding of internal learning algorithms is perhaps confusing...

Before developing Novamente in detail, I would not have liked the idea of using Lojban++ to help teach an AGI, for much the same reasons that you are now complaining. But now, given the specifics of the Novamente system, it turns out that this approach may actually make teaching the system considerably easier -- and make the system more rapidly approach the point where it can rapidly learn natural language on its own. To use Eric Baum's language, it may be that by interacting with the system in Lojban++, we human teachers can supply the baby Novamente with much of the inductive bias that humans are born with, and that helps us humans to learn natural languages so relatively easily.

I guess that's a good way to put it. Not that learning Lojban++ is a substitute for learning English, rather that the knowledge gained via interaction in Lojban++ may be a substitute for human babies' language-focused and spacetime-focused inductive bias. Of course, Lojban++ can be used in this way **only** with AGI systems that combine -- a robust reinforcement learning capability -- an explicitly logic-based knowledge representation. But Novamente does combine these two factors. I don't expect to convince you that this approach is a good one, but perhaps I have made my motivations clearer, at any rate. I am appreciating this conversation, as it is pushing me to verbally articulate my views more clearly than I had done before.
-- Ben G
Re: Re: [agi] Natural versus formal AI interface languages
- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 3, 2006 9:28:24 PM Subject: Re: Re: [agi] Natural versus formal AI interface languages

"I do not agree that having precise quantitative measures of system intelligence is critical, or even important to AGI."

The reason I ask is not just to compare different systems (which you can't really do if they serve different purposes), but also to measure progress. When I experiment with language models, I often try many variations, tune parameters, etc., so I need a quick test to see if what I did worked. I can do that very quickly using text compression. I can test tens or hundreds of slightly different models per day and make very precise measurements. Of course it is also useful that I can tell if my model works better or worse than somebody else's model that uses a completely different method.

There does not seem to be much cooperation on this list toward the goal of achieving AGI. Everyone has their own ideas. That's OK. The purpose of having a metric is not to make it a race, but to help us communicate what works and what doesn't so we can work together while still pursuing our own ideas. Papers on language modeling do this by comparing different algorithms and reporting the results by word perplexity. So you don't have to re-experiment with various n-gram backoff models, LSA, statistical parsers, etc. You already know a lot about what works and what doesn't.

Another reason for measurements is that it makes your goals concrete. How do you define general intelligence? Turing gave us a well defined goal, but there are some shortcomings. The Turing test is subjective, time consuming, isn't appropriate for robotics, and really isn't a good goal if it means deliberately degrading performance in order to appear human. So I am looking for better tests. I don't believe the approach of "let's just build it and see what it does" is going to produce anything useful.
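The two metrics mentioned here, compression and word perplexity, are directly related: a model with perplexity P costs log2(P) bits per word under arithmetic coding, so compressed size and perplexity rank models identically. A small sketch of the conversion:

```python
import math

def bits_per_word(perplexity):
    """Cross-entropy in bits per word for a model with the given perplexity."""
    return math.log2(perplexity)

def perplexity(bits_per_word):
    """Inverse mapping: perplexity is 2 raised to the bits per word."""
    return 2.0 ** bits_per_word

print(bits_per_word(256))  # 8.0 bits per word
```

This is why a compression benchmark can substitute for perplexity reporting when comparing language models.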
-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Natural versus formal AI interface languages
Ben, the test you described (Easter Egg Hunt) is a perfectly good example of the type of test I was looking for. When you run the experiment you will no doubt repeat it many times, adjusting various parameters. Then you will evaluate by how many eggs are found, how fast, and the extent to which it helps the system learn to play Hide and Seek (also a measurable quantity). Two other good qualities are that the test is easy to describe and obviously relevant to intelligence. For text compression, the relevance is not so obvious. I look forward to seeing a paper on the outcome of the tests.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 3, 2006 10:51:16 PM Subject: Re: Re: Re: Re: [agi] Natural versus formal AI interface languages

I am happy enough with the long-term goal of independent scientific and mathematical discovery... And, in the short term, I am happy enough with the goals of carrying out the (AGISim versions of) the standard tasks used by developmental psychologists to study children's cognitive behavior... I don't see a real value to precisely quantifying these goals, though...

To give an example of the kind of short-term goal that I think is useful, though, consider the following. We are in early 2007 (if all goes according to plan) going to teach Novamente to carry out a game called iterated Easter Egg hunt -- basically, to carry out an Easter Egg hunt in a room full of other agents ... and then do so over and over again, modeling what the other agents do and adjusting its behavior accordingly. Now, this task has a bit in common with the game Hide-and-Seek. So, you'd expect that a Novamente instance that had been taught iterated Easter Egg Hunt would also be good at hide-and-seek. So, we want to see that the time required for an NM system to learn hide-and-seek will be less if the NM system has previously learned to play iterated Easter Egg hunt...
This sort of goal is, I feel, good for infant-stage AGI education. However, I wouldn't want to try to turn it into an objective IQ test. Our goal is not to make the best possible system for playing Easter Egg hunt or hide and seek or fetch or whatever. And, in terms of language learning, our initial goal will not be to make the best possible system for conversing in baby-talk... Rather, our goal will be to make a system that can adequately fulfill these early-stage tasks, but in a way that we feel will be indefinitely generalizable to more complex tasks.

This, I'm afraid, highlights a general issue with formal quantitative intelligence measures as applied to immature AGI systems/minds. Often the best way to achieve some early-developmental-stage task is going to be an overfitted, narrow-AI type of algorithm, which is not easily extendable to address more complex tasks. This is similar to my complaint about the Hutter Prize. Yah, a superhuman AGI will be an awesome text compressor. But this doesn't mean that the best way to achieve slightly better text compression than current methods is going to be **at all** extensible in the direction of AGI.

Matt, you have yet to convince me that seeking to optimize interim quantitative milestones is a meaningful path to AGI. I think it is probably just a path to creating milestone-task-overfit narrow-AI systems without any real AGI-related expansion potential...

-- Ben
Re: [agi] Natural versus formal AI interface languages
Another important lesson from SHRDLU, aside from discovering that the approach of hand-coding knowledge doesn't work, was how long it took to discover this. It was not at all obvious from the initial success. Cycorp still hasn't figured it out after over 20 years.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Charles D Hixson [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, November 5, 2006 4:46:12 PM Subject: Re: [agi] Natural versus formal AI interface languages

Richard Loosemore wrote: "... This is a question directed at this whole thread, about simplifying language to communicate with an AI system, so we can at least get something working, and then go from there. This rationale is the very same rationale that drove researchers into Blocks World programs. Winograd and SHRDLU, etc. It was a mistake then; it is surely just as much of a mistake now. Richard Loosemore."

Not surely. It's definitely a defensible position, but I don't see any evidence that it has even a 50% probability of being correct. Also I'm not certain that SHRDLU and Blocks World were mistakes. They didn't succeed in their goals, but they remain as important markers. At each step we have limitations imposed by both our knowledge and our resources. These limits aren't constant. (P.S.: I'd throw Eliza into this same category... even though the purpose behind Eliza was different.) Think of the various approaches taken as being experiments with the user interface... since that's a large part of what they were. They are, of course, also experiments with how far one can push a given technique before encountering a combinatorial explosion. People don't seem very good at understanding that intuitively. In neural nets this same problem re-appears as saturation, the point at which, as you learn new things, old things become fuzzier and less certain. This may have some relevance to the way that people are continually re-writing their memories whenever they remember something.
Re: [agi] Natural versus formal AI interface languages
- Original Message From: BillK [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Monday, November 6, 2006 10:08:09 AM Subject: Re: [agi] Natural versus formal AI interface languages

"Ogden said that it would take seven years to learn English, seven months for Esperanto, and seven weeks for Basic English, comparable with Ido."

Basic English = 850 words = about 17 words per day. Esperanto = 900 root forms or 17,000 words (http://www.freelang.net/dictionary/esperanto.html) = 4 to 80 words per day. English = 30,000 to 80,000 words = 12 to 30 words per day. SHRDLU = 200 words? = 0.3 words per day for 2 years.

-- Matt Mahoney, [EMAIL PROTECTED]
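As a check on the arithmetic behind these learning rates (assuming 7 weeks = 49 days, 7 months ≈ 213 days, and 7 years ≈ 2557 days):

```python
# Implied vocabulary-acquisition rates, in words per day.
weeks7 = 49
months7 = 7 * 30.4
years7 = 7 * 365.25

basic_english = 850 / weeks7            # ~17 words/day
esperanto_roots = 900 / months7         # ~4 words/day
esperanto_words = 17_000 / months7      # ~80 words/day
english_low = 30_000 / years7           # ~12 words/day
english_high = 80_000 / years7          # ~31 words/day
shrdlu = 200 / (2 * 365)                # ~0.3 words/day over 2 years
```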
Re: [agi] The concept of a KBMS
- Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Monday, November 6, 2006 7:49:06 PM Subject: Re: [agi] The concept of a KBMS

"This is the specification of my logic: http://www.geocities.com/genericai/GI-Geniform.htm I conjecture that NL sentences can be easily translated to/from this form."

I conjecture it will be hard. Here is why. If it were easy to translate between natural language and an unambiguous, structured form, then it would be easy to translate between two natural languages, e.g. Russian -> Geniform -> English. This problem is known to be hard.

What does the prepositional phrase "with" modify in "I ate pizza with {pepperoni, a fork, gusto, George}"?

What does "they" refer to in (from Lenat) "The police arrested the demonstrators because they {feared, advocated} violence"?

What does "it" refer to in "it is raining"?

Is the following sentence correct: "The cat caught a moose"?

What is the structured representation of "What?"

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] The crux of the problem
James Ratcliff [EMAIL PROTECTED] wrote: "Many of these examples actually aren't hard, if you use some statistical information and a common sense knowledge base."

The problem is not that these examples are hard, but that there are millions of them. To parse English you have to know that pizzas have pepperoni, that demonstrators advocate violence, that cats chase mice, and so on. There is no neat, tidy algorithm that will generate all of this knowledge. You can't do any better than to just write down all of these facts. The data is not compressible. I said millions, but we really don't know; maybe 10^9 bits. We have a long history of underestimating the complexity of natural language, going back to SHRDLU, Eliza, and the 1959 BASEBALL program, all of which could parse simple sentences. Cycorp is the only group that has actually collected this much common human knowledge in a structured form. They probably did not expect it would take 20 years of manual coding, only to discover you can't build the knowledge base first and then tack on a natural language interface later. Something is still wrong.

We have many ways to represent knowledge: LISP lists, frame-slot, augmented first-order logic, term logic, Bayesian, connectionist, NARS, Novamente, etc. Humans can easily take sentences and convert them into the internal representation of any of these systems. Yet none of these systems has solved the natural language interface problem. Why is this?

You can't ignore information theory. A Turing machine can't model another machine with greater Kolmogorov complexity. The brain can't understand itself. We want to build data structures where we can see how knowledge is represented so we can test and debug our systems. Sorry, information theory doesn't allow it. You can't have your AGI and understand it too. We need to think about opaque representations, systems we can train and test without looking inside, systems that work but we don't know how.
This will be hard, but we have already tried the easy ways.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 7, 2006 9:38:54 AM Subject: Re: [agi] The concept of a KBMS

Matt Mahoney wrote: "What does the prepositional phrase "with" modify in "I ate pizza with {pepperoni, a fork, gusto, George}"?"

It is simple to show that there is a type of pizza that is a pepperoni pizza, but not a fork pizza, etc. The others all have different roles that are recognizable by the word type they have. This would create frames similar to:

ate(Person, pepperoni pizza)
ate(Person, pizza, with Utensil)
ate(Person, pizza, with Feeling)
ate(Person, pizza, with Person)

So the eat action would show the different types of modifiers it would expect, and when it saw something different it would try to fit it into one of the expected slots, or a new slot/frame definition would need to be created.

"What does "they" refer to in (from Lenat) "The police arrested the demonstrators because they {feared, advocated} violence"?"

This one is harder, but... Statistically, on the first pass, "police feared violence" has 62 instances, and "demonstrators feared violence" has 0. Then if we expand the "violence" term to attacks and riots, we see police: 50+40, demonstrators: 0. So we have overwhelming evidence there for the police fearing it.
Grammatically we assume the closest match, which is demonstrators, so those have to be reconciled together to come up with police.

Is the following sentence correct: "The cat caught a moose"?

This can actually be handled fairly well. Looking at a frame of cat and moose, we can statistically see that it is a rare if not non-existent event that a cat can catch a moose. Now in theory this could be a sci-fi book where a huge cat did catch the moose, but that would have to be learned with more context information. A frame for Cat catching would show about
15% mouse
5% rats
5% bird
3% others
A general statement can be made that "cats catch small animals" and that matches most items. It is mentioned once on the net, by an unreliable quotes page, that a "cat caught a moose", and once in a fairy tale (The Violet Fairy Book - The Nunda) a cat caught a donkey. But for general common sense these types of sources would be too far from the norm and are
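The hit-count disambiguation James sketches above can be written down directly. This is a minimal sketch, not his implementation; the count table below is a hypothetical stand-in for the search-engine hit counts he quotes.

```python
# Resolve "they" in "The police arrested the demonstrators because they
# feared violence" by scoring each candidate antecedent against corpus
# co-occurrence counts. The counts are hypothetical stand-ins for the
# thread's search-engine figures.
counts = {
    ("police", "feared violence"): 62,
    ("demonstrators", "feared violence"): 0,
    ("police", "feared attacks"): 50,
    ("police", "feared riots"): 40,
    ("demonstrators", "feared attacks"): 0,
    ("demonstrators", "feared riots"): 0,
}

def resolve(candidates, predicates):
    """Pick the candidate with the most corpus support for the predicates."""
    scores = {c: sum(counts.get((c, p), 0) for p in predicates)
              for c in candidates}
    return max(scores, key=scores.get), scores

antecedent, scores = resolve(
    ["police", "demonstrators"],
    ["feared violence", "feared attacks", "feared riots"])
print(antecedent, scores)  # police wins, 152 to 0
```

As the thread notes, the purely grammatical heuristic (closest noun phrase) picks "demonstrators", so the statistical score has to override it.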
Re: Re: RE: [agi] Natural versus formal AI interface languages
Ben Goertzel [EMAIL PROTECTED] wrote: I am afraid that it may not be possible to find an initial project that is both
* small
* clearly a meaningfully large step along the path to AGI
* of significant practical benefit

I'm afraid you're right. It is especially difficult because there is a long history of small (i.e. narrow AI) projects that appear superficially to be meaningful steps toward AGI. Sometimes it is decades before we discover that they don't scale.
-- Matt Mahoney, [EMAIL PROTECTED]

- This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Natural versus formal AI interface languages
I think that natural language and the human genome have about the same order of magnitude complexity. The genome is 6 x 10^9 bits (2 bits per base pair) uncompressed, but there is a lot of noncoding DNA and some redundancy. By decoding, I assume you mean building a model and understanding the genome to the point where you could modify it and predict what will happen.

The complexity of natural language is probably 10^9 bits. This is supported by:
- Turing's 1950 estimate, which he did not explain.
- Landauer's estimate of human long term memory capacity.
- The quantity of language processed by an average adult, times Shannon's estimate of the entropy of written English of 1 bit per character.
- Extrapolating the relationship between language model training set size and compression ratio in this graph: http://cs.fit.edu/~mmahoney/dissertation/

I don't think the encryption of the genome is any worse. Complex systems (that have high Kolmogorov complexity, are incrementally updatable, and do useful computation) tend to converge to the boundary between stability and chaos, where some perturbations decay while others grow. A characteristic of such systems (as studied by Kauffman) is that the number of stable states or attractors tends to the square root of the size. The number of human genes is about the same as the size of the human vocabulary, about 30,000. Neither system is encrypted in the mathematical sense. Encryption cannot be an emergent property because it is at the extreme chaotic end of the spectrum. Changing one bit of the key or plaintext affects every bit of the ciphertext. The difference is that it is easier (faster and more ethical) to experiment with language models than the human genome.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Eliezer S.
Yudkowsky [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 8, 2006 3:23:10 PM Subject: Re: [agi] Natural versus formal AI interface languages

Eric Baum wrote: (Why should producing a human-level AI be cheaper than decoding the genome?)

Because the genome is encrypted even worse than natural language.
-- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
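Two of the round-number claims above are easy to sanity-check. These are back-of-envelope checks on the thread's own estimates, not new data:

```python
import math

# Kauffman: the number of stable states or attractors in a complex system
# tends toward the square root of its size. For a 10^9-bit system:
attractors = math.sqrt(1e9)
print(attractors)  # ~31,623: the same order as the ~30,000 genes/words

# Lifetime language exposure: roughly 1 GB of text (~10^9 characters) at
# Shannon's ~1 bit per character of entropy:
language_bits = 1e9 * 1.0
print(language_bits)  # ~10^9 bits, matching the language-model estimate
```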
Re: [agi] The crux of the problem
James,

Many of the solutions you describe can use information gathered from statistical models, which are opaque. I need to elaborate on this, because I think opaque models will be fundamental to solving AGI. We need to build models in a way that doesn't require access to the internals. This requires a different approach than traditional knowledge representation. It will require black box testing and performance metrics. It will be less of an engineering approach, and more of an experimental one.

Information retrieval is a good example. It is really simple. You type a question, and the system matches the words in your query to words in the document and ranks the documents by TF*IDF (term frequency times log inverse document frequency). This is an opaque model. We normally build an index, but this is really just an optimization. The language model is just the documents themselves. There is no good theory to explain why it works. It just does.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 8, 2006 10:14:43 AM Subject: Re: [agi] The crux of the problem

Matt: To parse English you have to know that pizzas have pepperoni, that demonstrators advocate violence, that cats chase mice, and so on. There is no neat, tidy algorithm that will generate all of this knowledge. You can't do any better than to just write down all of these facts. The data is not compressible.

James: You CAN actually, simply because there are patterns; anytime there are patterns, there is regularity, and the ability to compress things. And those things are limited, even if on a super-large scale.
The problem with that is the irregular parts, which have to be handled, and the amount of bad data, which has to be handled. But a simple example is:
ate a pepperoni pizza
ate a tuna pizza
ate a VEGAN SUPREME pizza
ate a Mexican pizza
ate a pineapple pizza
And we can see right off that these are different types of pizza topping, and we can compress that into a frame easily:
Frame Pizza:
  can have Toppings: pepperoni, tuna, pineapple
  can be Type: vegan supreme, Mexican
This does take some work, and does require some good data, but can be done. We can take that further to gather probabilities and confidences about the Pizza frame, such that we can determine that a pepperoni pizza is the most likely if a random pizza is ordered. This does not give a perfect collection of information, but a lot can be garnered just from this. This does not solve the AI problem, but does give us a nice building block of Knowledge to start working with. This is a much preferred method over hand-coding each piece, as Cyc has seen, and they are currently coding and using many algorithms now that take advantage of statistical NLP and Google to assist and suggest answers, and check the answers they have in place.

There is a simple pattern between Nouns and Verbs as well that can be taken out and extracted with relative ease, and also between Adj and Nouns, and Adv and Verbs. Ex: The dog eats, barks, growls, sniffs, attacks, alerts. That gives us an initial store of information about a dog frame. Then if given Rover barked at the mailmen.
we can programmatically narrow the possibilities about what Actor can fulfill the "bark" role, and see that dogs bark, and are most likely to bark at the mailman, and give a probability and confidence.

One problem I have with your task of text compression is the stricture that it retain exactly the same text, as opposed to exactly the same Information. For a computer science data transmission issue the first is important, but for an AI issue the latter is more important. "The dog sniffed the shoes." and "The dog smelled the shoes." are so very close in meaning as to be an acceptable representation of the event, and many things can be reduced to their component parts, or even use a more common synonym or word root. And it is much more important that the system would be able to answer the question "What did the dog sniff/smell?" as opposed to keeping the data exactly the same. As long as the answers come out the same, the internal representation could be in Chinese or marks in the sand.

James Ratcliff
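The TF*IDF ranking Matt describes above (term frequency times log inverse document frequency) fits in a few lines. This is a minimal sketch with three invented toy documents, not a production retrieval system:

```python
import math
from collections import Counter

# Score each document by sum over query terms of
# (term frequency in document) * log(N / number of documents with the term).
docs = {
    "d1": "the cat ate a mouse",
    "d2": "the cat sat on the mat",
    "d3": "dogs bark at the mailman",
}

def tfidf_rank(query, docs):
    words = {d: Counter(text.split()) for d, text in docs.items()}
    n = len(docs)
    def score(d):
        s = 0.0
        for t in query.split():
            df = sum(1 for c in words.values() if t in c)  # document frequency
            if df:
                s += words[d][t] * math.log(n / df)
        return s
    return sorted(docs, key=score, reverse=True)

print(tfidf_rank("cat mouse", docs))  # d1 ranks first
```

Note how opaque this is in exactly the sense Matt means: there is no explicit knowledge structure anywhere, just the documents and a scoring rule.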
Re: [agi] The crux of the problem
James Ratcliff [EMAIL PROTECTED] wrote:
Matt, expand upon the first part as you said there please.

I argued earlier that a natural language model has a complexity of about 10^9 bits. To be precise, let p(s) be a function that outputs an estimate of the probability that string s will appear as a prefix in human discourse, such as might occur in a Turing test between a judge and human confederate. If p(s) is a good estimate of the true probability for most s, then this model could be used to pass the Turing test as follows: if Q is the dialog so far, then the machine will respond with answer A by selecting randomly from the distribution p(A|Q) = p(QA)/p(Q). I argue that the Kolmogorov complexity of a function p() which is sufficiently accurate to pass the Turing test is about 10^9 bits.

My argument that a language model must be opaque is based on the premise that the human brain cannot understand itself, for the same reason that a Turing machine cannot simulate another Turing machine with greater Kolmogorov complexity. This is not to say we can't build a brain. There are simple learning algorithms that can store vast knowledge. We can understand enough of the brain to describe its development, to write an algorithm for the learning mechanism and simulate its behavior. But we cannot know all of the knowledge it has learned. So we will be able to build an AGI and train it, but after we train it we cannot know everything that it knows. A transparent representation that implies otherwise is not possible.

Most AGI designs have the form of a data structure to represent knowledge, and functions to convert input to knowledge and knowledge to output: input -- knowledge representation -- output

Many knowledge representations have been proposed: frame-slot, first order logic, connectionist systems, etc.
These generally have the form of labeled graphs, where the vertices generally correspond to words, concepts, or system states, and the edges correspond to relations such as "is-a" or "contains", implications, probabilities, confidences, etc. We argue for the correctness of these models by showing how facts such as "the cat ate a mouse" can be easily represented, and give many examples.

Here is the problem. We know that the knowledge representation must have a complexity of 10^9 bits. Anything smaller cannot work. When we give examples, we usually draw graphs with just a few edges per vertex, but this is not how it will look when training is complete. Suppose there are 10^5 vertices, enough to represent a large vocabulary. Then your trained system must have about 10^4 edges per vertex. Building such a model by hand, or even trying to understand or debug it, would be hopeless. I would call such a model opaque.

It is natural for us to seek simple solutions, a "theory of everything". After all, we are agents in the sense of Hutter's AIXI following the provably optimal strategy of Occam's Razor. But in our drive to simplify and understand, we are trying to compress the language model to an impossibly small size, always misled down a dead end path by our initial successes with low complexity toy systems.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 10, 2006 9:56:00 AM Subject: Re: [agi] The crux of the problem

Matt, expand upon the first part as you said there please.
James

Matt Mahoney [EMAIL PROTECTED] wrote: James,

Many of the solutions you describe can use information gathered from statistical models, which are opaque. I need to elaborate on this, because I think opaque models will be fundamental to solving AGI. We need to build models in a way that doesn't require access to the internals. This requires a different approach than traditional knowledge representation.
It will require black box testing and performance metrics. It will be less of an engineering approach, and more of an experimental one. Information retrieval is a good example. It is really simple. You type a question, and the system matches the words in your query to words in the document and ranks the documents by TF*IDF (term frequency times log inverse document frequency). This is an opaque model. We normally build an index, but this is really just an optimization. The language model is just the documents themselves. There is no good theory to explain why it works. It just does.

-- Matt Mahoney, [EMAIL PROTECTED]
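The response rule discussed in this thread, p(A|Q) = p(QA)/p(Q), can be illustrated with a toy prefix-probability table. The table and the dialog strings below are hypothetical; a real p() would be the learned 10^9-bit model the thread describes:

```python
import random

# Toy prefix probabilities p(s). A reply A to dialog Q is sampled from
# p(A|Q) = p(QA)/p(Q), where p(Q) is found by marginalizing over replies.
p = {  # hypothetical values for illustration only
    "Q:2+2? A:4": 0.08,
    "Q:2+2? A:5": 0.01,
    "Q:2+2? A:fish": 0.01,
}
q = "Q:2+2? "
p_q = sum(v for s, v in p.items() if s.startswith(q))  # p(Q) = 0.10

def sample_reply(q):
    """Sample A from p(A|Q) = p(QA)/p(Q) by inverse transform sampling."""
    answers = [(s[len(q):], v / p_q) for s, v in p.items() if s.startswith(q)]
    r = random.random()
    for a, pr in answers:
        r -= pr
        if r <= 0:
            return a
    return answers[-1][0]

print(sample_reply(q))  # "A:4" with probability 0.8
```

The point of sampling rather than taking the most likely reply is that a Turing-test machine should show human-like variability, not always give the single modal answer.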
Re: [agi] Natural versus formal AI interface languages
The security of Enigma depended on the secrecy of the algorithm in addition to the key. This violated Kerckhoffs' principle, the requirement that a system be secure against an adversary who has everything except the key. This mistake has been repeated many times by amateur cryptographers who thought that keeping the algorithm secret improved security. Such systems are invariably broken. Secure systems are built by publishing the algorithm so that people can try to break them before they are used for anything important. It has to be done this way because there is no provably secure system (regardless of whether P = NP), except the one time pad, which is impractical because it lacks message integrity, and the key has to be as large as the plaintext and can't be reused.

Anyway, my point is that decoding the human genome or natural language is not as hard as breaking encryption. It cannot be because these systems are incrementally updatable, unlike ciphers. This allows you to use search strategies that run in polynomial time. A key search requires exponential time, or else the cipher is broken. Modeling language or the genome in O(n) time or even O(n^2) time with n = 10^9 is much faster than brute force cryptanalysis in O(2^n) time with n = 128.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Eric Baum [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 9, 2006 12:18:34 PM Subject: Re: [agi] Natural versus formal AI interface languages

Eric Baum [EMAIL PROTECTED] wrote: Matt wrote: Changing one bit of the key or plaintext affects every bit of the ciphertext.

That is simply not true of most encryptions. For example, Enigma.

Matt: Enigma is laughably weak compared to modern encryption, such as AES, RSA, SHA-256, ECC, etc. Enigma was broken with primitive mechanical computers and pencil and paper.
Enigma was broken without modern computers, *given access to the machine.* I chose Enigma as an example, because to break language it may be necessary to pay attention to the machine-- namely examining the genomics. But that is more work than you envisage ;^)

It is true that much modern encryption is based on simple algorithms. However, some crypto-experts would advise more primitive approaches. RSA is not known to be hard; even if P!=NP, someone may find a number-theoretic trick tomorrow that factors. (Or maybe they already have it, and choose not to publish). If you use a mess machine like a modern version of Enigma, that is much less likely to get broken, even though you may not have the theoretical results.

Your response admits that for stream ciphers changing a bit of the plaintext doesn't affect many bits of the ciphertext, which was what I was mainly responding to. You may prefer other kinds of cipher, but your arguments about chaos are clearly not germane to concluding language is easy to decode. Incidentally, while no encryption scheme is provably hard to break (even assuming P!=NP), more is known about grammars: they are provably hard to decode given P!=NP.
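The running-time comparison in Matt's message above is easy to check directly:

```python
# Modeling in O(n^2) time with n = 10^9 versus brute-force key search in
# O(2^n) time with n = 128, as claimed in the message above.
n_model = 10**9
model_steps = n_model**2        # 10^18 steps
key_steps = 2**128              # ~3.4 x 10^38 steps

print(model_steps)
print(key_steps > model_steps)          # True
print(key_steps // model_steps)         # ~3.4 x 10^20 times more work
```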
Re: [agi] One grammar parser URL
http://josie.stanford.edu:8080/parser/Fails the Turing test :-) "I ate pizza with {pepperoni|George|chopsticks}" all have the same parse.-- Matt Mahoney, [EMAIL PROTECTED]- Original Message From: James Ratcliff [EMAIL PROTECTED]To: agi@v2.listbox.comSent: Sunday, November 12, 2006 1:11:32 PMSubject: [agi] One grammar parser URLDuring the grammar NLP discussion, someone asked about various parsers, well here is one that I am looking at now Download http://nlp.stanford.edu/software/lex-parser.shtmlStanfords parser, and a online version is here http://josie.stanford.edu:8080/parser/James ___James Ratcliff - http://falazar.comNew Torrent Site, Has TV and Movie Downloads! http://www.falazar.com/projects/Torrents/tvtorrents_show.php Everyone is raving about the all-new Yahoo! Mail beta. This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303 This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Natural versus formal AI interface languages
Eric, can you give an example of a one way function (such as a cryptographic hash or cipher) produced by evolution or by a genetic algorithm? A one-way function f has the property that y = f(x) is easy to compute, but it is hard to find x given f and y. Other examples might be modular exponentiation in large finite groups, or multiplication of prime numbers with thousands of digits. By incrementally updatable, I mean that you can make a small change to a system and the result will be a small change in behavior. For example, most DNA mutations have a small effect. We try to design software systems with this property so we can modify them without breaking them. However, as the system gets bigger, there is more interaction between components, until it reaches the point where every change introduces more bugs than it fixes and the code becomes unmaintainable. This is what happens when the system crosses the boundary from stability to chaotic. My argument for Kauffman's observation that complex systems sit on this boundary is that stable systems are less useful, but chaotic systems can't be developed as a long sequence of small steps. We are able to produce cryptosystems only because they are relatively simple, and even then it is hard. I don't dispute that learning some simple grammars is NP-hard. However, I don't believe that natural language is one of these grammars. It certainly is not simple. The human brain is less powerful than a Turing machine, so it has no special ability to solve NP-hard problems. The fact that humans can learn natural language is proof enough that it can be done. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Eric Baum [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, November 12, 2006 9:29:13 AM Subject: Re: [agi] Natural versus formal AI interface languages Matt wrote: Anyway, my point is that decoding the human genome or natural language is n= ot as hard as breaking encryption. 
It cannot be because these systems are incrementally updatable, unlike ciphers. This allows you to use search strategies that run in polynomial time. A key search requires exponential time, or else the cipher is broken. Modeling language or the genome in O(n) time or even O(n^2) time with n = 10^9 is much faster than brute force cryptanalysis in O(2^n) time with n = 128.

I don't know what you mean by incrementally updateable, but if you look up the literature on language learning, you will find that learning various sorts of relatively simple grammars from examples, or even if memory serves examples and queries, is NP-hard. Try looking for Dana Angluin's papers back in the 80's. If your claim is that evolution can not produce a 1-way function, that's crazy.
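For the one-way function discussion above, multiplication of primes is the standard candidate Matt mentions: the forward direction is a single multiply, while the only generic inverse sketched here is brute-force trial division. The primes below are tiny toy values, nowhere near cryptographic sizes:

```python
# Candidate one-way function: y = p * q for primes p, q.
# Easy direction: one multiplication. Hard direction: factoring.
p, q = 1000003, 1000033   # small primes, for illustration only
n = p * q                 # easy: instant

def factor(n):
    """Brute-force trial division: feasible here only because n is tiny."""
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2
    return None

print(n)          # 1000036000099
print(factor(n))  # (1000003, 1000033), after ~500,000 trial divisions
```

For primes with thousands of digits the forward direction stays trivial while no known algorithm inverts it in feasible time, which is the asymmetry the definition requires.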
Re: Re: [agi] A question on the symbol-system hypothesis
James Ratcliff [EMAIL PROTECTED] wrote:
Well, words and language based ideas/terms adequately describe much of the upper levels of human interaction and seem appropriate in that case. It fails of course when it devolves down to the physical level, i.e. vision or motor cortex skills, but other than that, using language internally would seem natural, and be much easier to look inside the box and see what is going on and correct the system's behaviour.

No, no, no, that is why AI failed. You can't look inside the box because it's 10^9 bits. Models that are simple enough to debug are too simple to scale. How many times will we repeat this mistake? The contents of a knowledge base for AGI will be beyond our ability to comprehend. Get over it. It will require a different approach:
1. Develop a quantifiable criterion for success, a test score.
2. Develop a theory of learning.
3. Develop a training and test set (about 10^9 bits compressed).
4. Tune the learning model to improve the score.

Example:
1. Criterion: SAT analogy test score.
2. Theory: word association matrix reduced by singular value decomposition (SVD).
3. Data: 50M word corpus of news articles.
4. Results: http://iit-iti.nrc-cnrc.gc.ca/iit-publications-iti/docs/NRC-48255.pdf

An SVD factored word association matrix seems pretty opaque to me. You can't point to which matrix elements represent associations like cat-dog, moon-star, etc., nor will you be inserting such knowledge for testing. If you want to understand it, you have to look at the learning algorithm. It turns out that there is an efficient neural model for SVD. http://gen.gorrellville.com/gorrell06.pdf

It should not take decades to develop a knowledge base like Cyc. Statistical approaches can do this in a matter of minutes or hours.

-- Matt Mahoney, [EMAIL PROTECTED]
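A minimal sketch of the SVD-factored word association matrix discussed above. The co-occurrence counts are invented toy data (the NRC paper used a 50M-word corpus), and it shows the opacity point directly: no single matrix element "is" the cat-dog association; it is spread across the factored representation.

```python
import numpy as np

words = ["cat", "dog", "moon", "star"]
contexts = ["pet", "vet", "night", "sky"]
# Hypothetical word-by-context co-occurrence counts:
counts = np.array([[10, 6, 0, 1],
                   [9, 7, 1, 0],
                   [0, 1, 9, 7],
                   [1, 0, 8, 8]], dtype=float)

# Factor and keep k latent dimensions; word vectors are rows of U scaled
# by the singular values.
u, s, vt = np.linalg.svd(counts)
k = 2
vecs = u[:, :k] * s[:k]

def sim(a, b):
    """Cosine similarity of two words in the reduced space."""
    va, vb = vecs[words.index(a)], vecs[words.index(b)]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(sim("cat", "dog"), sim("moon", "star"), sim("cat", "star"))
```

Running it shows cat-dog and moon-star come out far more similar than cat-star, even though no individual number in `u`, `s`, or `vt` can be pointed to as that fact.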
Re: [agi] A question on the symbol-system hypothesis
I will try to answer several posts here.

I said that the knowledge base of an AGI must be opaque because it has 10^9 bits of information, which is more than a person can comprehend. By opaque, I mean that you can't do any better by examining or modifying the internal representation than you could by examining or modifying the training data. For a text based AI with natural language ability, the 10^9 bits of training data would be about a gigabyte of text, about 1000 books. Of course you can sample it, add to it, edit it, search it, run various tests on it, and so on. What you can't do is read, write, or know all of it. There is no internal representation that you could convert it to that would allow you to do these things, because you still have 10^9 bits of information. It is a limitation of the human brain that it can't store more information than this. It doesn't matter if you agree with the number 10^9 or not. Whatever the number, either the AGI stores less information than the brain, in which case it is not AGI, or it stores more, in which case you can't know everything it does.

Mark Waser wrote: I certainly don't buy the mystical approach that says that sufficiently large neural nets will come up with sufficiently complex discoveries that we can't understand them.

James Ratcliff wrote: Having looked at the neural network type AI algorithms, I don't see any fathomable way that that type of architecture could create a full AGI by itself.

Nobody has created an AGI yet. Currently the only working model of intelligence we have is based on neural networks. Just because we can't understand it doesn't mean it is wrong.

James Ratcliff wrote: Also it is a critical task for expert systems to explain why they are doing what they are doing, and for business applications, I for one am not going to blindly trust what the AI says, without a little background.

I expect this ability to be part of a natural language model.
However, any explanation will be based on the language model, not the internal workings of the knowledge representation. That remains opaque. For example: Q: Why did you turn left here? A: Because I need gas. There is no need to explain that there is an opening in the traffic, that you can see a place where you can turn left without going off the road, that the gas gauge reads E, and that you learned that turning the steering wheel counterclockwise makes the car turn left, even though all of this is part of the thought process. The language model is responsible for knowing that you already know this. There is no need either (or even the ability) to explain the sequence of neuron firings from your eyes to your arm muscles.

and this is one of the requirements for the Project Halo contest (took and passed the AP chemistry exam) http://www.projecthalo.com/halotempl.asp?cid=30

This is a perfect example of why a transparent KR does not scale. The expert system described was coded from 70 pages of a chemistry textbook in 28 person-months. Assuming 1K bits per page, this is a rate of 4 minutes per bit, or 2500 times slower than transmitting the same knowledge as natural language.

Mark Waser wrote: Given sufficient time, anything should be able to be understood and debugged. ... Give me *one* counter-example to the above . . . .

Google. You cannot predict the results of a search. It does not help that you have full access to the Internet. It would not help even if Google gave you full access to their server. When we build AGI, we will understand it the way we understand Google. We know how a search engine works. We will understand how learning works. But we will not be able to predict or control what we build, even if we poke inside.

-- Matt Mahoney, [EMAIL PROTECTED]
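The Project Halo rate claimed above can be recomputed under an assumed 160 working hours per person-month; the 70 pages, 1K bits/page, and 28 person-months figures are the thread's own:

```python
# Knowledge-entry rate for the Project Halo expert system, as estimated
# in the message above. The 160 hours/person-month figure is an assumption.
pages, bits_per_page = 70, 1000
person_months = 28
minutes = person_months * 160 * 60          # 268,800 working minutes
min_per_bit = minutes / (pages * bits_per_page)
print(round(min_per_bit, 1))                # ~3.8 minutes per bit

# Versus reading text at ~150 words/min, ~5 chars/word, ~1 bit/char:
reading_bits_per_min = 150 * 5 * 1
print(min_per_bit * reading_bits_per_min)   # ~2900x slower, same order as the 2500x claim
```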
Re: [agi] One grammar parser URL
1. No can do. The algorithmic complexity of parsing natural language as well as an average adult human is around 10^9 bits. There is no small grammar for English. 2. You need semantics to parse natural language. This is part of what makes it hard. Or do you want a parser that gives you wrong answers? I can do that. 3. If translating natural language to a structured representation is not hard, then do it. People have been working on this for 50 years without success. Doing logical inference is the easy part. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 8:59:45 AM Subject: Re: [agi] One grammar parser URL Several things: 1. Someone suggested these parsers to me: Eugene Charniak's http://www.cog.brown.edu/Research/nlp/resources.html Dan Bikel's http://www.cis.upenn.edu/~dbikel/software.html Demos for both are at: http://lfg-demo.computing.dcu.ie/lfgparser.html It seems that they are similar in function to the Stanford parser. I'd prefer smaller grammars and parsers with smaller memory footprints. 2. I ate pizza with {pepperoni|George|chopsticks} yielding the same parse should be expected. The difference of those sentences is in semantics, and the word with is overloaded with several meanings. The parser is only responsible for syntactic aspects. 3. Translating English sentences to Geniform or some other logical form may not be that hard, but after the translation we have to store the facts in a generic memory and use them for inference. For those, we need a canonical form, to organize the facts via clustering, and to keep track of what facts support other facts. All these are big problems. I'm looking for someone to do the translating so I can work on inference and generic memory. It is easier for one person to focus on one task, such as translation, for several formats. Another can focus on inference for several formats, etc. 
Then we can help each other while still exploring different ideas.
YKY
Re: [agi] A question on the symbol-system hypothesis
Richard Loosemore [EMAIL PROTECTED] wrote: Understanding 10^9 bits of information is not the same as storing 10^9 bits of information.

That is true. Understanding n bits is the same as compressing some larger training set that has an algorithmic complexity of n bits. Once you have done this, you can use your probability model to make predictions about unseen data generated by the same (unknown) Turing machine as the training data. The closer to n you can compress, the better your predictions will be.

I am not sure what it means to understand a painting, but let's say that you understand art if you can identify the artists of paintings you haven't seen before with better accuracy than random guessing. The relevant quantity of information is not the number of pixels and resolution, which depend on the limits of the eye, but the (much smaller) number of features that the high level perceptual centers of the brain are capable of distinguishing and storing in memory. (Experiments by Standing and Landauer suggest it is a few bits per second for long term memory, the same rate as language). Then you guess the shortest program that generates a list of feature-artist pairs consistent with your knowledge of art and use it to predict artists given new features.

My estimate of 10^9 bits for a language model is based on 4 lines of evidence, one of which is the amount of language you process in a lifetime. This is a rough estimate of course. I estimate 1 GB (8 x 10^9 bits) compressed to 1 bpc (Shannon) and assume you remember a significant fraction of that.

Landauer, Tom (1986), “How much do people remember? Some estimates of the quantity of learned information in long term memory”, Cognitive Science (10) pp. 477-493.
Shannon, Claude E. (1950), “Prediction and Entropy of Printed English”, Bell Sys. Tech. J (3) pp. 50-64.
Standing, L. (1973), “Learning 10,000 Pictures”, Quarterly Journal of Experimental Psychology (25) pp. 207-222.
-- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 9:33:04 AM Subject: Re: [agi] A question on the symbol-system hypothesis Matt Mahoney wrote: I will try to answer several posts here. I said that the knowledge base of an AGI must be opaque because it has 10^9 bits of information, which is more than a person can comprehend. By opaque, I mean that you can't do any better by examining or modifying the internal representation than you could by examining or modifying the training data. For a text based AI with natural language ability, the 10^9 bits of training data would be about a gigabyte of text, about 1000 books. Of course you can sample it, add to it, edit it, search it, run various tests on it, and so on. What you can't do is read, write, or know all of it. There is no internal representation that you could convert it to that would allow you to do these things, because you still have 10^9 bits of information. It is a limitation of the human brain that it can't store more information than this. Understanding 10^9 bits of information is not the same as storing 10^9 bits of information. A typical painting in the Louvre might be 1 meter on a side. At roughly 16 pixels per millimeter, and a perceivable color depth of about 20 bits that would be about 10^8 bits. If an art specialist knew all about, say, 1000 paintings in the Louvre, that specialist would understand a total of about 10^11 bits. You might be inclined to say that not all of those bits count, that many are redundant to understanding. Exactly. People can easily comprehend 10^9 bits. It makes no sense to argue about degree of comprehension by quoting numbers of bits. 
Richard Loosemore

- This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] A question on the symbol-system hypothesis
Sorry if I did not make clear the distinction between knowing the learning algorithm for AGI (which we can do) and knowing what was learned (which we can't). My point about Google is to illustrate that distinction. The Google database is about 10^14 bits. (It keeps a copy of the searchable part of the Internet in RAM.) The algorithm is deterministic. You could, in principle, model the Google server in a more powerful machine and use it to predict the result of a search. But where does this get you? You can't predict the result of the simulation any more than you could predict the result of the query you are simulating. In practice the human brain has finite limits just like any other computer.

My point about AGI is that constructing an internal representation that allows debugging the learned knowledge is pointless. A more powerful AGI could do it, but you can't. You can't do any better than to manipulate the input and observe the output. If you tell your robot to do something and it sits in a corner instead, you can't do any better than to ask it why, hope for a sensible answer, and retrain it. Trying to debug the reasoning for its behavior would be like trying to understand why a driver made a left turn by examining the neural firing patterns in the driver's brain.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 9:39:14 AM Subject: Re: [agi] A question on the symbol-system hypothesis

Mark Waser wrote: Given sufficient time, anything should be able to be understood and debugged. Give me *one* counter-example to the above . . . .

Matt Mahoney replied: Google. You cannot predict the results of a search. It does not help that you have full access to the Internet. It would not help even if Google gave you full access to their server.

This is simply not correct. Google uses a single non-random algorithm against a database to determine what results it returns.
As long as you don't update the database, the same query will return the exact same results and, with knowledge of the algorithm, looking at the database manually will also return the exact same results. Full access to the Internet is a red herring. Access to Google's database at the time of the query will give the exact precise answer.

This is also exactly analogous to an AGI, since access to the AGI's internal state will explain the AGI's decision (with appropriate caveats for systems that deliberately introduce randomness -- i.e. when the probability is 60/40, the AGI flips a weighted coin -- but even in those cases, the answer will still be of the form that the AGI ended up with a 60% probability of X and 40% probability of Y and the weighted coin landed on the 40% side).

When we build AGI, we will understand it the way we understand Google. We know how a search engine works. We will understand how learning works. But we will not be able to predict or control what we build, even if we poke inside.

I agree with your first three statements but again, the fourth is simply not correct (as well as a blatant invitation to UFAI). Google currently exercises numerous forms of control over their search engine. It is known that they do successfully exclude sites (for visibly trying to game PageRank, etc.). They constantly tweak their algorithms to change/improve the behavior and results.

Note also that there is a huge difference between saying that something is/can be exactly controlled (or able to be exactly predicted without knowing its exact internal state) and that something's behavior is bounded (i.e. that you can be sure that something *won't* happen -- like all of the air in a room suddenly deciding to occupy only half the room). No complex and immense system is precisely controlled but many complex and immense systems are easily bounded.
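Mark's determinism point is easy to make concrete: with a frozen database and a fixed ranking rule, the same query always returns the same results. A toy sketch (the index and ranking rule are invented for illustration and have nothing to do with Google's actual system):

```python
# Toy deterministic search over a frozen index: identical query against
# an identical database returns identical results, every time.
index = {
    "compression": ["hutter-prize.html", "paq.html", "rationale.html"],
    "agi": ["rationale.html", "listbox.html"],
}

def search(query, db):
    # Rank alphabetically; any fixed rule works -- the point is that
    # with no database update and no randomness, output is reproducible.
    return sorted(db.get(query, []))

r1 = search("compression", index)
r2 = search("compression", index)
print(r1 == r2)   # True: same query, same database, same results
```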
- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 14, 2006 10:34 PM Subject: Re: [agi] A question on the symbol-system hypothesis
Re: [agi] A question on the symbol-system hypothesis
Richard, what is your definition of understanding? How would you test whether a person understands art? Turing offered a behavioral test for intelligence. My understanding of understanding is that it is something that requires intelligence. The connection between intelligence and compression is not obvious. I have summarized the arguments here. http://cs.fit.edu/~mmahoney/compression/rationale.html

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 2:38:49 PM Subject: Re: [agi] A question on the symbol-system hypothesis
My estimate of 10^9 bits for a language model is based on 4 lines of evidence, one of which is the amount of language you process in a lifetime. This is a rough estimate of course. I estimate 1 GB (8 x 10^9 bits) compressed to 1 bpc (Shannon) and assume you remember a significant fraction of that.

Matt,

So long as you keep redefining "understand" to mean something trivial (or at least, something different in different circumstances), all you do is reinforce the point I was trying to make. In your definition of understanding in the context of art, above, you specifically choose an interpretation that enables you to pick a particular bit rate. But if I chose a different interpretation (and I certainly would - an art historian would never say they understood a painting just because they could tell the artist's style better than a random guess!), I might come up with a different bit rate. And if I chose a sufficiently subtle concept of "understand", I would be unable to come up with *any* bit rate, because that concept of "understand" would not lend itself to any easy bit rate analysis. The lesson? Talking about bits and bit rates is completely pointless, which was my point.

You mainly identify the meaning of "understand" as a variant of the meaning of "compress". I completely reject this - this is the most idiotic development in AI research since the early attempts to do natural language translation using word-by-word lookup tables - and I challenge you to say why anyone could justify reducing the term in such an extreme way. Why have you thrown out the real meaning of "understand" and substituted another meaning? What have we gained by dumbing the concept down? As I said previously, this is as crazy as redefining the complex concept of happiness to be a warm puppy.

Richard Loosemore
Re: [agi] A question on the symbol-system hypothesis
Mark Waser wrote: Are you conceding that you can predict the results of a Google search?

OK, you are right. You can type the same query twice. Or if you live long enough you can do it the hard way. But you won't.

Are you now conceding that it is not true that "Models that are simple enough to debug are too simple to scale"?

OK, you are right again. Plain text is a simple way to represent knowledge. I can search and edit terabytes of it. But this is not the point I wanted to make. I am sure I expressed it badly. The point is there are two parts to AGI, a learning algorithm and a knowledge base. The learning algorithm has low complexity. You can debug it, meaning you can examine the internals to test it and verify it is working the way you want. The knowledge base has high complexity. You can't debug it. You can examine it and edit it but you can't verify its correctness. An AGI with a correct learning algorithm might still behave badly. You can't examine the knowledge base to find out why. You can't manipulate the knowledge base data to fix it. At least you can't do these things any better than manipulating the inputs and observing the outputs. The reason is that the knowledge base is too complex. In theory you could do these things if you lived long enough, but you won't. For practical purposes, the AGI knowledge base is a black box. You need to design your goals, learning algorithm, data set and test program with this in mind. Trying to build transparency into the data structure would be pointless. Information theory forbids it. Opacity is not advantageous or desirable. It is just unavoidable.

I am sure I won't convince you, so maybe you have a different explanation why 50 years of building structured knowledge bases has not worked, and what you think can be done about it?

And Google DOES keep the searchable part of the Internet in memory http://blog.topix.net/archives/11.html because they have enough hardware to do it.
http://en.wikipedia.org/wiki/Supercomputer#Quasi-supercomputing

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] A question on the symbol-system hypothesis
1. The fact that AIXItl is intractable is not relevant to the proof that compression = intelligence, any more than the fact that AIXI is not computable. In fact it is supporting, because it says that both are hard problems, in agreement with observation.

2. Do not confuse the two compressions. AIXI proves that the optimal behavior of a goal seeking agent is to guess the shortest program consistent with its interaction with the environment so far. This is lossless compression. A typical implementation is to perform some pattern recognition on the inputs to identify features that are useful for prediction. We sometimes call this lossy compression because we are discarding irrelevant data. If we anthropomorphise the agent, then we say that we are replacing the input with perceptually indistinguishable data, which is what we typically do when we compress video or sound.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 3:48:37 PM Subject: Re: [agi] A question on the symbol-system hypothesis

The connection between intelligence and compression is not obvious.

The connection between intelligence and compression *is* obvious -- but compression, particularly lossless compression, is clearly *NOT* intelligence. Intelligence compresses knowledge to ever simpler rules because that is an effective way of dealing with the world. Discarding ineffective/unnecessary knowledge to make way for more effective/necessary knowledge is an effective way of dealing with the world. Blindly maintaining *all* knowledge at tremendous costs is *not* an effective way of dealing with the world (i.e. it is *not* intelligent).

1. What Hutter proved is that the optimal behavior of an agent is to guess that the environment is controlled by the shortest program that is consistent with all of the interaction observed so far. The problem of finding this program is known as AIXI.

2.
The general problem is not computable [11], although Hutter proved that if we assume time bounds t and space bounds l on the environment, then this restricted problem, known as AIXItl, can be solved in O(t·2^l) time.

Very nice -- except that O(t·2^l) time is basically equivalent to incomputable for any real scenario. Hutter's proof is useless because it relies upon the assumption that you have adequate resources (i.e. time) to calculate AIXI -- which you *clearly* do not. And like any other proof, once you invalidate the assumptions, the proof becomes equally invalid. Except as an interesting but unobtainable edge case, why do you believe that Hutter has any relevance at all?

- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 2:54 PM Subject: Re: [agi] A question on the symbol-system hypothesis
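To make the O(t·2^l) objection concrete: the bound comes from enumerating every environment program of up to l bits and simulating each for t steps. A toy sketch of that brute-force search (hypothetical bitstring "programs", not Hutter's actual construction):

```python
from itertools import product

# Brute-force flavor of AIXItl: test all 2^l candidate l-bit programs,
# simulating each for up to t steps.
def search_cost(l, t):
    steps = 0
    for program in product([0, 1], repeat=l):   # 2^l candidates
        steps += t                              # t simulation steps each
    return steps

print(search_cost(l=10, t=100))   # 100 * 2**10 = 102400
# Each extra bit of program length doubles the work; realistic l is
# in the thousands, so the bound is astronomically beyond reach:
print(search_cost(l=20, t=100) // search_cost(l=10, t=100))  # 1024
```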
Re: [agi] A question on the symbol-system hypothesis
Richard Loosemore [EMAIL PROTECTED] wrote: 5) I have looked at your paper and my feelings are exactly the same as Mark's: theorems developed on erroneous assumptions are worthless.

Which assumptions are erroneous?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 4:09:23 PM Subject: Re: [agi] A question on the symbol-system hypothesis

Matt Mahoney wrote: Richard, what is your definition of understanding? How would you test whether a person understands art? Turing offered a behavioral test for intelligence. My understanding of understanding is that it is something that requires intelligence. The connection between intelligence and compression is not obvious. I have summarized the arguments here. http://cs.fit.edu/~mmahoney/compression/rationale.html

1) There will probably never be a compact definition of understanding. Nevertheless, it is possible for us (being understanding systems) to know some of its features. I could produce a shopping list of typical features of understanding, but that would not be the same as a definition, so I will not. 2) See my paper in the forthcoming proceedings of the 2006 AGIRI workshop, for arguments. (I will make a version of this available this week, after final revisions.) 3) One tiny, almost-too-obvious-to-be-worth-stating fact about understanding is that it compresses information in order to do its job. 4) To mistake this tiny little facet of understanding for the whole is to say that a hurricane IS rotation, rather than that rotation is a facet of what a hurricane is. 5) I have looked at your paper and my feelings are exactly the same as Mark's: theorems developed on erroneous assumptions are worthless.
Richard Loosemore
Re: [agi] A question on the symbol-system hypothesis
Mark Waser [EMAIL PROTECTED] wrote: So *prove* to me why information theory forbids transparency of a knowledge base.

Isn't this pointless? I mean, if I offer any proof you will just attack the assumptions. Without assumptions, you can't even prove the universe exists. I have already stated reasons why I believe this is true. An AGI will have greater algorithmic complexity than the human brain (assumption). Transparency implies that you can examine the knowledge base and deterministically predict its output given some input (assumption about the definition of transparency). Legg proved [1] that a Turing machine cannot predict another machine of greater algorithmic complexity.

Aside from that, I can only give examples as supporting evidence. 1. The relative success of statistical language learning (opaque) compared to structured knowledge, parsing, etc. 2. It would be (presumably) easier to explain human behavior by asking questions than by examining neurons (assuming we had the technology to do this).

In your argument for transparency, you assume that individual pieces of knowledge can be isolated. Prove it. In the brain, knowledge is distributed. We make decisions by integrating many sources of evidence from all parts of the brain.

[1] Legg, Shane (2006), "Is There an Elegant Universal Theory of Prediction?", Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 9:57:40 AM Subject: Re: [agi] A question on the symbol-system hypothesis

The knowledge base has high complexity. You can't debug it. You can examine it and edit it but you can't verify its correctness.

While the knowledge base is complex, I disagree with the way in which you're attempting to use the first sentence.
The knowledge base *isn't* so complex that it causes a truly insoluble problem. The true problem is that the knowledge base will have a large enough size and will grow and change quickly enough that you can't maintain 100% control over the contents or even the integrity of it. I disagree with the second but believe that it may just be your semantics because of the third sentence. The question is what we mean by "debug". If you mean remove all incorrect knowledge, then the answer is obviously yes, we can't remove all incorrect knowledge, because odd sequences of observed events and incomplete knowledge mean that globally incorrect knowledge *is* the correct deduction from experience. On the other hand, we certainly should be able to debug how the knowledge base operates, make sure that it maintains an acceptable degree of internal integrity, and responds correctly when it detects a major integrity problem. The *process* and global behavior of the knowledge base is what is important and it *can* be debugged. Minor mistakes and errors are just the cost of being limited in an erratic world.

An AGI with a correct learning algorithm might still behave badly.

No! An AGI with a correct learning algorithm may, through an odd sequence of events and incomplete knowledge, come to an incorrect conclusion and take an action that it would not have taken if it had perfect knowledge -- BUT -- this is entirely correct behavior, not bad behavior. Calling it bad behavior dramatically obscures what you are trying to do.

You can't examine the knowledge base to find out why.

No, no, no, no, NO! If you (or the AI) can't go back through the causal chain and explain exactly why an action was taken, then you have created an unsafe AI. A given action depends upon a small part of the knowledge base (which may then depend upon ever larger sections in an ongoing pyramid) and you can debug an action and see what led to an action (that you believe is incorrect but the AI believes is correct).
You can't manipulate the knowledge base data to fix it.

Bull. You should be able to track down a piece of incorrect knowledge that led to an incorrect decision. You should be able to find the supporting knowledge structures. If the knowledge is truly incorrect, you should be able to provide evidence/experiences to the AI that lead it to correct the incorrect knowledge (or you could even just tack the correct knowledge in the knowledge base, fix it so that it temporarily can't be altered, and run your integrity repair routines -- which, I contend, any AI that is going to go anywhere must have).

At least you can't do these things any better than manipulating the inputs and observing the outputs.

No. I can find structures in the knowledge base and alter them. I would
Re: [agi] A question on the symbol-system hypothesis
My point is that humans make decisions based on millions of facts, and we do this every second. Every fact depends on other facts. The chain of reasoning covers the entire knowledge base. I said millions, but we really don't know. This is an important number. Historically we have tended to underestimate it. If the number is small, then we *can* follow the reasoning, make changes to the knowledge base and predict the outcome (provided the representation is transparent and accessible through a formal language). But this leads us down a false path. We are not so smart that we can build a machine smarter than us, and still be smarter than it. Either the AGI has more algorithmic complexity than you do, or it has less. If it has less, then you have failed. If it has more, and you try to explore the chain of reasoning, you will exhaust the memory in your brain before you finish.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 3:16:54 PM Subject: Re: [agi] A question on the symbol-system hypothesis

I consider the last question in each of your examples to be unreasonable (though for very different reasons). In the first case, "What do you see?" is a nonsensical and unnecessary extension on a rational chain of logic. The visual subsystem, which is not part of the AGI, has reported something and, unless there is a good reason not to, the AGI should believe it as a valid fact and the root of a knowledge chain. Extending past this point to ask a spurious, open question is silly. Doing so is entirely unnecessary. This knowledge chain is isolated.

In the second case, I don't know why you're doing any sort of search (particularly since there wasn't any sort of question preceding it). The AI needed gas, it found a gas station, and it headed for it. You asked why it waited until a given time and it told you. How is this not isolated?
- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 3:01 PM Subject: Re: [agi] A question on the symbol-system hypothesis

Mark Waser [EMAIL PROTECTED] wrote: Give me a counter-example of knowledge that can't be isolated.

Q. Why did you turn left here?
A. Because I need gas.
Q. Why do you need gas?
A. Because the tank is almost empty.
Q. How do you know?
A. Because the needle is on E.
Q. How do you know?
A. Because I can see it.
Q. What do you see?
(depth first search)

Q. Why did you turn left here?
A. Because I need gas.
Q. Why did you turn left *here*?
A. Because there is a gas station.
Q. Why did you turn left now?
A. Because there is an opening in the traffic.
(breadth first search)

It's not that we can't do it in theory. It's that we can't do it in practice. The human brain is not a Turing machine. It has finite time and memory limits.

-- Matt Mahoney, [EMAIL PROTECTED]
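The two questioning orders in the dialog correspond to depth-first and breadth-first traversal of a "why" graph. A toy sketch (the graph and its node names are illustrative only, not any real knowledge representation):

```python
from collections import deque

# Hypothetical causal chain behind "Why did you turn left?"
why = {
    "turn left": ["need gas", "gas station here", "opening in traffic"],
    "need gas": ["tank almost empty"],
    "tank almost empty": ["needle on E"],
    "needle on E": ["I can see it"],
}

def explain_dfs(node, depth=0):
    """Depth-first: keep asking 'why?' down one chain of reasons."""
    lines = ["  " * depth + node]
    for reason in why.get(node, []):
        lines += explain_dfs(reason, depth + 1)
    return lines

def explain_bfs(node):
    """Breadth-first: list all immediate reasons before going deeper."""
    order, queue = [], deque([node])
    while queue:
        n = queue.popleft()
        order.append(n)
        queue.extend(why.get(n, []))
    return order

print("\n".join(explain_dfs("turn left")))
print(explain_bfs("turn left"))
```

Matt's objection is about scale, not mechanism: the traversal is trivial on a seven-node toy but hopeless when every fact depends on millions of others.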
Re: [agi] A question on the symbol-system hypothesis
Again, do not confuse the two compressions. In paq8f (on which paq8hp5 is based) I use lossy pattern recognition (like you describe, but at a lower level) to extract features to use as context for text prediction. The lossless compression is used to evaluate the quality of the prediction.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 1:41:41 PM Subject: Re: [agi] A question on the symbol-system hypothesis

The main first subtitle: Compression is Equivalent to General Intelligence. Unless your definition of Compression is not the simple large amount of text turning into the small amount of text. And likewise with General Intelligence. I don't think, under any of the many many definitions I have seen or created, that text or a compressed thing can possibly be considered general intelligence. Another way: data != knowledge != intelligence. Intelligence requires something else. I would say an actor.

Now I would agree that highly compressed, lossless data could represent a good knowledge base. Yeah, that goes good. But quite simply, a lossy one provides a *better* knowledge base, with two examples:

1. "Poison ivy causes an itching rash for most people" and "poison oak: The common effect is an irritating, itchy rash" can be generalized or combined to: "poison oak and poison ivy cause an itchy rash." Which is shorter, and lossy, yet better for this fact.

2. If I see something in the road with four legs, and I'm about to run it over, if I only have rules that say "if a deer or dog runs in the road, don't hit it", then I can't correctly act, because I only know there is something with 4 legs in the road. However, if I have a generalized rule in my mind that says "If something with four legs is in the road, avoid it", then I have a better rule. This better rule cannot be gathered without generalization, and we have to have lots of generalization.
The generalizations can be invalidated with exceptions, and we do it all the time; that's how we can tell not to pet a skunk instead of a cat.

James Ratcliff - http://falazar.com
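Matt's "two compressions" split can be sketched in a few lines: a lossy step reduces the recent input to a context (a crude stand-in for paq8f's pattern recognition; the real program mixes many models), and the lossless code length log2(1/p) scores the resulting predictions. An illustrative toy, not paq8f's actual algorithm:

```python
import math
from collections import defaultdict

def compressed_size_bits(text, order=3):
    """Adaptive order-n byte model; returns total code length in bits."""
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]      # lossy feature: last n chars
        seen = counts[ctx]
        n = sum(seen.values())
        # Laplace-smoothed probability of the actual next character
        p = (seen[ch] + 1) / (n + 256)
        total_bits += math.log2(1.0 / p)     # lossless cost of coding ch
        seen[ch] += 1
    return total_bits

boring = "abcabcabc" * 50
print(compressed_size_bits(boring) / len(boring))   # well under 8 bpc
```

The better the (lossy) features predict the next symbol, the smaller the (lossless) total; that is the sense in which compression measures prediction quality.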
Re: [agi] RSI - What is it and how fast?
I think this is a topic for the singularity list, but I agree it could happen very quickly. Right now there is more than enough computing power on the Internet to support superhuman AGI. One possibility is that it could take the form of a worm. http://en.wikipedia.org/wiki/SQL_slammer_(computer_worm) An AGI of this type would be far more dangerous because it could analyze code, discover large numbers of vulnerabilities and exploit them all at once. As the Internet gets bigger, faster, and more complex, the risk increases. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Hank Conn [EMAIL PROTECTED] To: agi agi@v2.listbox.com Sent: Thursday, November 16, 2006 3:33:08 PM Subject: [agi] RSI - What is it and how fast? Here are some of my attempts at explaining RSI... (1) As a given instance of intelligence, as defined as an algorithm of an agent capable of achieving complex goals in complex environments, approaches the theoretical limits of efficiency for this class of algorithms, intelligence approaches infinity. Since increasing computational resources available for an algorithm is a complex goal in a complex environment, the more intelligent an instance of intelligence becomes, the more capable it is in increasing the computational resources for the algorithm, as well as more capable in optimizing the algorithm for maximum efficiency, thus increasing its intelligence in a positive feedback loop. (2) Suppose an instance of a mind has direct access to some means of both improving and expanding both the hardware and software capability of its particular implementation. Suppose also that the goal system of this mind elicits a strong goal that directs its behavior to aggressively take advantage of these means. 
Given each increase in capability of the mind's implementation, it could (1) increase the speed at which its hardware is upgraded and expanded, (2) more quickly, cleverly, and elegantly optimize its existing software base to maximize capability, (3) develop better cognitive tools and functions more quickly and in more quantity, and (4) optimize its implementation on successively lower levels by researching and developing better, smaller, more advanced hardware. This would create a positive feedback loop: the more capable its implementation, the more capable it is of improving its implementation. How fast could RSI plausibly happen? Is RSI inevitable, and how soon will it be? How do we truly maximize the benefit to humanity? It is my opinion that this could happen extremely quickly once a completely functional AGI is achieved. I think it's plausible it could happen against the will of the designers (and go on to pose an existential risk), and quite likely that it would move along quite well with the designers' intentions; however, this opens the door to existential disasters in the form of so-called Failures of Friendliness. I think it's fairly implausible the designers would suppress this process, except those that are concerned about completely working out issues of Friendliness in the AGI design.
Re: [agi] One grammar parser URL
YKY (Yan King Yin) [EMAIL PROTECTED] wrote: Any suggestions on how to make my project more popular? Clearly state the problem you want to solve. Don't just build AGI for the sake of building it. Do you think it is good practice to attach frames to *words*, or rather to *situations*? Neither. I think logical inference should be built into the language model. The hard part is not the inference (if you keep the chain of reasoning short) but converting natural language statements about facts, relations, and queries into mathematical form and back. But I don't think this is even necessary. Most people can reason informally without converting statements into first order logic. A language model should develop this capability first. Learning logic is similar to learning grammar. A statistical model can classify words into syntactic categories by context, e.g. "the X is" tells you that X is a noun, and that it can be used in novel contexts where other nouns have been observed, such as "a X was". At a somewhat higher level, you can teach logical inference by giving examples such as: All men are mortal. Socrates is a man. Therefore Socrates is mortal. All birds have wings. Tweety is a bird. Therefore Tweety has wings. These fit a pattern allowing you to complete the paragraph: All X are Y. Z is a X. Therefore... And likewise for other patterns that are taught in a logic class, e.g. If X then Y. Y is false. Therefore... Finally you give examples in formal notation and their English equivalents, such as (X => Y) ^ ~Y, and again use statistical modeling to learn the substitution rules to do these conversions. To get to this point I think you will first need to train the language model to detect higher level grammatical structures such as phrases and sentences, not just word categories. I believe this can be done using a neural model.
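The idea of completing inference paragraphs by pattern substitution can be sketched in a few lines. This is only an illustration of the template-matching step, not a statistical learner; the pattern strings and function names are invented for the example.

```python
import re

# Hypothetical inference templates, as would be learned from worked examples.
# Each pairs a surface pattern with a rule for producing the conclusion.
PATTERNS = [
    # "All X are Y. Z is a X." -> "Therefore Z is Y."
    (r"All (?P<x>\w+) are (?P<y>\w+)\. (?P<z>\w+) is an? (?:\w+)\.",
     lambda m: f"Therefore {m['z']} is {m['y']}."),
    # "All X have Y. Z is a X." -> "Therefore Z has Y."
    (r"All (?P<x>\w+) have (?P<y>\w+)\. (?P<z>\w+) is an? (?:\w+)\.",
     lambda m: f"Therefore {m['z']} has {m['y']}."),
]

def complete(paragraph: str) -> str:
    """Complete the 'Therefore ...' sentence for a known pattern."""
    for pattern, conclude in PATTERNS:
        m = re.match(pattern, paragraph)
        if m:
            return conclude(m)
    return "(no known pattern)"

print(complete("All men are mortal. Socrates is a man."))
# -> Therefore Socrates is mortal.
```

A real language model would have to induce such templates statistically from many examples rather than have them hand-coded, which is the hard part the post is pointing at.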
This has been attempted using connectionist models, where neurons represent features at different levels of abstraction, such as letters, words, parts of speech, phrases, and sentence structures, in addition to time delayed copies of these. A problem with connectionist models is that each word or concept is assigned to a single neuron, so there is no biologically plausible mechanism for learning new words. A more accurate model is one in which each concept is correlated with many neurons to varying degrees, and each neuron is correlated with many concepts. Then we have a mechanism, which is to shift a large number of neurons slightly toward a new concept. Except for this process, we can still use the connectionist model as an approximation to help us understand the true model, with the understanding that a single weight in the model actually represents a large number of connections. I believe the language learning algorithm is essentially Hebb's model of classical conditioning, plus some stability constraints in the form of lateral inhibition and fatigue. Right now this is still research. I have no experimental results to show that this model would work. It is far from developed. I hope to test it eventually by putting it into a text compressor. If it does work, I don't know if it will train to a high enough level to solve logical inference, at least not without some hand written training data or a textbook on logic. If it does reach this point, we would show that examples of correct inference compress smaller than incorrect examples. To have it answer questions I would need to add a model of discourse, but that is a long way off. Most training text is not interactive, and I would need about 1 GB. Maybe you have some ideas?
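A toy sketch of the learning rule described above: Hebbian strengthening of the most active unit, with lateral inhibition pushing rival units away for stability. The sizes, rates, and inputs below are arbitrary illustrations (not from any experiment), and concepts are distributed over several units rather than one neuron each.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 8, 4
# Small random weights; each unit participates in many "concepts"
W = rng.normal(0.0, 0.1, (n_units, n_inputs))

def step(x, lr=0.05):
    """One Hebbian update with lateral inhibition."""
    y = W @ x
    winner = int(np.argmax(y))
    # Hebb: pull the winner's weights toward the input; the decay term
    # (x - W[winner]) keeps weights bounded, playing the role of fatigue
    W[winner] += lr * (x - W[winner])
    # Lateral inhibition: push the losing units slightly away from this input
    for i in range(n_units):
        if i != winner:
            W[i] -= 0.1 * lr * x
    return winner

# Two orthogonal input patterns standing in for two concepts
a = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])
b = np.array([0.0, 0, 0, 0, 1, 1, 1, 1])
for _ in range(50):
    step(a)
    step(b)
print("winner for a:", np.argmax(W @ a), "winner for b:", np.argmax(W @ b))
```

After training, the winning unit for each pattern responds strongly to it while inhibited units do not, which is the competitive specialization the post's stability constraints are meant to produce.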
-- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 7:17:55 PM Subject: Re: [agi] One grammar parser URL On 11/16/06, James Ratcliff [EMAIL PROTECTED] wrote: Correct: using inferences only works in toy or small, well-understood domains, as inevitably when it goes 2+ steps away from direct knowledge it will be making large assumptions and be wrong. My thoughts have been on an AISim as well, but I am laying out the works for it to be massively available to many users. How many people are actively working with the AGISim, or do you expect to be, and do you feel that small set of user interaction with it will produce enough experience to advance the AI knowledge base? My presumption is to make the final AISim a simple enough but interesting interface to allow any number of potential users to interact, teach, and play with the bots inside. I have a very very basic, open structure for the knowledge base, and allow users to tweak and change the actual action functions available and create and remove items in the world to interact with. I wish to create a massively popular AI platform as well =) But my take would
Re: [agi] One grammar parser URL
I don't see that the requirement for deterministic computation should be an obstacle to building a language model on a computer. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 17, 2006 9:40:41 AM Subject: Re: [agi] One grammar parser URL Not quite gonna work that way, unfortunately. (I think) The 10^9 figure you used was the compressed amount of data for a lifetime of a human. You can't just give an NN that much data; you have to give it 10^9*X amount of data. The NN will need many times that amount of training data to get the final results, to boil down to a final 10^9 amount of data. You may even be able to show that a computer can still process the new larger amount in a reasonable lifetime. Now the hard part: how do you generate that much information? Real, correct experiential information? You can't, unfortunately. I have over 600 novels right now for mine. But the amount of compressed actual knowledge in all those... maybe a couple novels' worth, and it is very very hard to even compress and learn that small amount of information. This NN line of reasoning I haven't seen proven effective on the total task of AGI, though it is wonderful for the smaller component modules like vision / motor control, and is necessary there. Humans learn with a very small amount of information, and I really think we must model the AGI after this in some fashion, with the caveat that it is possible to train the AGI with 1,000 or a million people instead of just a small team of one or two, by distributing the learning activities throughout the internet. But that still gives us a very small sample size of the entirety of world experience. James Ratcliff
Re: [agi] A question on the symbol-system hypothesis
When I refer to a quantity of information, I mean its algorithmic complexity, the size of the smallest program that generates it. So yes, the Mandelbrot set contains very little information. I realize that algorithmic complexity is not computable in general. When I express AI or language modeling in terms of compression, I mean that the goal is to get as close to this unobtainable limit as possible. Algorithmic complexity can apply to either finite or infinite series. For example, the algorithmic complexity of a string of n zero bits is log n + C for some constant C that depends on your choice of universal Turing machine. The complexity of an infinite string of zero bits is a (small) constant C. When I talk about Kauffman's assertion that complex systems evolve toward the boundary between stability and chaos, I mean a discrete approximation of these concepts. These are defined for dynamic systems in real vector spaces controlled by differential equations. (For continuous systems, chaos requires at least 3 dimensions.) A system is chaotic if its Lyapunov exponent is positive, and stable if it is negative. Extensions to discrete systems have been described. For example, the logistic map x := rx(1 - x), 0 < x < 1, goes from stable to chaotic as r grows from 0 to 4. For discrete spaces, pseudo random number generators are simple examples of chaotic systems. Kauffman studied chaos in large discrete systems (state machines with randomly connected logic gates) and found that the systems transition from stable to chaotic as the number of inputs per gate is increased from 2 to 3. At the boundary, the number of discrete attractors (repeating cycles) is about the square root of the number of variables. Kauffman noted that gene regulation can be modeled this way (gene combinations turn other genes on or off) and that the number of human cell types (254) is about the square root of the number of genes (he estimated 100K, but actually about 30K). I noted (coincidentally?)
that vocabulary size is about the square root of the size of a language model. The significance of this to AI is that I believe it bounds the degree of interconnectedness of knowledge. It cannot be so great that small updates to the AI result in large changes in behavior. This places limits on what we can build. For example, in a neural network with feedback loops, the weights would have to be kept small. We should not confuse symbols with meaning. A language model associates patterns of symbols with other patterns of symbols. It is not grounded. A model does not need vision to know that the sky is blue. They are just words. I believe that an ungrounded model (plus a discourse model, which has a sense of time and who is speaking) can pass the Turing test. I don't believe all of the conditions are in place for a hard takeoff yet. You need: 1. Self replicating computers. 2. AI smart enough to write programs from natural language specifications. 3. Enough hardware on the Internet to support AGI. 4. Execute access. Where we stand on each: 1. Computer manufacturing depends heavily on computer automation, but you still need humans to make it all work. 2. AI language models are now at the level of a toddler, able to recognize simple sentences of a few words, but they can already learn in hours or days what takes a human years. 3. I estimate an adult level language model will fit on a PC but it would take 3 years to train it. A massively parallel architecture like Google's MapReduce could do it in an hour, but it would require a high speed network. A distributed implementation like GIMPS or SETI would not have enough interconnection speed to support a language model. I think you need about a 1 Gb/s connection with low latency to distribute it over a few hundred PCs. 4. Execute access is one buffer overflow away.
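The stable-to-chaotic transition of the logistic map mentioned above can be checked numerically. A rough sketch (parameter choices are illustrative): estimate the Lyapunov exponent by averaging log |f'(x)| along an orbit; it comes out negative in a stable regime and positive in a chaotic one.

```python
import math

def lyapunov(r, x=0.5, n=2000, burn=200):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x)."""
    total = 0.0
    for i in range(n):
        x = r * x * (1 - x)
        if i >= burn:  # skip the transient before averaging
            # the derivative of the map is r*(1 - 2x)
            total += math.log(abs(r * (1 - 2 * x)) + 1e-12)
    return total / (n - burn)

# Stable regime (r=2.5): orbit settles on a fixed point, exponent < 0.
# Chaotic regime (r=3.9): nearby orbits diverge, exponent > 0.
print(lyapunov(2.5) < 0 < lyapunov(3.9))  # -> True
```

The same sign test extends to Kauffman's random Boolean networks, where divergence of nearby states plays the role of the derivative.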
-- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Mike Dougherty [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, November 18, 2006 1:32:05 AM Subject: Re: [agi] A question on the symbol-system hypothesis I'm not sure I follow every twist in this thread. No... I'm sure I don't follow every twist in this thread. I have a question about this compression concept. Compute the number of pixels required to graph the Mandelbrot set at whatever detail you feel to be sufficient for the sake of example. Now describe how this 'pattern' is compressed. Of course the ideal compression is something like 6 bytes. Show me a 6 byte jpg of a Mandelbrot :) Is there a concept of compression of an infinite series? Or was the term bounding being used to describe the attractor around which the values tend to fall? Chaotic attractor, statistical median, etc. seem to be describing the same tendency of human pattern recognition across different types of data. Is a 'symbol' an idea, or a handle on an idea? Does this support the mechanics of how concepts can be built from agreed-upon ideas
Re: [agi] A question on the symbol-system hypothesis
I think your definition of understanding is in agreement with what Hutter calls intelligence, although he stated it more formally in AIXI. An agent and an environment are modeled as a pair of interactive Turing machines that pass symbols back and forth. In addition, the environment passes a reward signal to the agent, and the agent has the goal of maximizing the accumulated reward. The agent does not, in general, have a model of the environment, but must learn it. Intelligence is presumed to be correlated with a greater accumulated reward (perhaps averaged over a Solomonoff distribution of all environments). -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, November 18, 2006 7:42:19 AM Subject: Re: [agi] A question on the symbol-system hypothesis Have to amend that to acts or replies, and it could react unpredictably depending on the human's level of understanding. If it sees a nice neat answer (like jumping through the window because the door was blocked) that the human wasn't aware of, or was surprised about, it would be equally good. And this doesn't cover the opposite, of what other actions can be done and what the consequences are; that is also important. And lastly this is for a situation only; we also have the more general case about understanding a thing: when it sees, has, or is told about a thing, it understands it if it knows about general properties, and actions that can be done with, or using, the thing. The main thing being we can't and aren't really defining understanding, but the effect of the understanding, either in action or in a language reply. And it should be a level of understanding, not just a y/n. So if one AI saw an apple and said, I can throw / cut / eat it, and weighted those ideas, and the second had the same list but weighted eat as more likely, and/or knew people sometimes cut it before eating it, then the second AI would understand to a higher level.
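Hutter's agent-environment formalism described above can be caricatured in a few lines. This is only a toy illustration of the interaction protocol (action out, percept and reward back), not AIXI itself; the echo environment and the agent's policy are invented for the example.

```python
import random

class Environment:
    """Toy environment: rewards the agent for repeating the last symbol sent."""
    def __init__(self):
        self.last = 0
    def step(self, action):
        reward = 1.0 if action == self.last else 0.0
        self.last = random.randint(0, 1)   # emit the next observation symbol
        return self.last, reward

class Agent:
    """Trivial agent with no built-in model: it echoes its last observation."""
    def __init__(self):
        self.obs = 0
    def act(self):
        return self.obs
    def observe(self, obs, reward):
        self.obs = obs

random.seed(0)
env, agent = Environment(), Agent()
total = 0.0
for _ in range(100):
    a = agent.act()              # agent -> environment
    obs, r = env.step(a)         # environment -> agent (symbol + reward)
    agent.observe(obs, r)
    total += r
print(total)  # echo policy is optimal for this environment: 100.0
```

In Hutter's framework, intelligence corresponds to how much reward a single agent can accumulate averaged over a Solomonoff-weighted distribution of such environments, not just one.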
Likewise if instead one knew you could bake an apple pie, or that apples come from apple trees, he would understand more. So it starts looking like a knowledge test then. Maybe we could extract simple facts from wiki, and start creating a test there, then add in more complicated things. James Charles D Hixson [EMAIL PROTECTED] wrote: Ben Goertzel wrote: ... On the other hand, the notions of intelligence and understanding and so forth being bandied about on this list obviously ARE intended to capture essential aspects of the commonsense notions that share the same word with them. ... Ben Given that purpose, I propose the following definition: A system understands a situation that it encounters if it predictably acts in such a way as to maximize the probability of achieving its goals in that situation. I'll grant that it's a bit fuzzy, but I believe that it captures the essence of the visible evidence of understanding. This doesn't say what understanding is, merely how you can recognize it.
Re: [agi] new paper: What Do You Mean by AI?
Pei, you classified NARS as a principle-based AI. Are there any others in that category? What about Novamente? -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 17, 2006 11:51:58 AM Subject: [agi] new paper: What Do You Mean by AI? Hi, A new paper of mine is put online for comment. English corrections are also welcome. You can either post to this mailing list or send me private emails. Thanks in advance. Pei --- TITLE: What Do You Mean by AI? ABSTRACT: Many problems in AI study can be traced back to the confusion of different research goals. In this paper, five typical ways to define AI are clarified, analyzed, and compared. It is argued that though they are all legitimate research goals, they lead the research in very different directions. Furthermore, most of them have trouble giving AI a proper identity. URL: http://nars.wang.googlepages.com/wang.AI_Definitions.pdf
Re: Re: [agi] Understanding Natural Language
My point about artificial languages is that I don't believe they are of much use in helping to understand or solve the natural language modeling problem, which is a central problem of AGI. Ben mentioned one use, which is to use Lojban++ in combination with English to train an AGI in English. In this case, Lojban++ serves to help ground the language, just as a 3-D modeling language could also be used to describe the environment. Here, any language which is expressive enough to do this and is familiar to the developer will do. It is a different case where we require users to learn an artificial language because we don't know how to model natural language. I don't see how this can lead to any significant insights. There are already many examples of unambiguous and easy to parse programming languages (including superficially English-like languages such as COBOL and SQL) and formal knowledge representation languages (CycL, Prolog, etc.). An AGI has to deal with ambiguity and errors in language. Consider the following sentence which I used earlier: I could even invent a new branch of mathematics, introduce appropriate notation, and express ideas in it. What does "it" refer to? The solution in an artificial language would be either to forbid pronouns (as in most programming languages) or to explicitly label "it" to make the meaning explicit. But people don't want or need to do this. They can figure it out by context. If your AGI can't use context to solve such problems then you haven't solved the natural language modeling problem, and a vast body of knowledge will be inaccessible. I think you will find that writing a Lojban parser is trivial compared to writing an English to Lojban translator. Andrii (lOkadin) Zvorygin [EMAIL PROTECTED] wrote: My initial reasoning was that right now many programs don't use AI, because programmers don't know, and the ones that do can't easily add code. It is because language modeling is unsolved.
Computers would be much easier to use if we could talk to them in English. But they do not understand. We don't know how to make them understand. But we are making progress. Google will answer simple, natural language questions (although they don't advertise it). The fact that others haven't done it suggests the problem requires vast computational resources and training data. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Andrii (lOkadin) Zvorygin [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, November 26, 2006 4:37:02 PM Subject: Re: Re: [agi] Understanding Natural Language On 11/25/06, Matt Mahoney [EMAIL PROTECTED] wrote: Andrii (lOkadin) Zvorygin [EMAIL PROTECTED] wrote: Even if we were able to constrain the grammar, you still have the problem that people will still make ungrammatical statements, misspell words, omit words, and so on. Amazing you should mention such valid points against natural languages. This misses the point. Where are you going to get 1 GB of Lojban text to train your language model? Well, A: I could just get IRC logs and mailing lists of the current Lojban community. B: the point is to translate English into Lojban. C: I'm not training a language model. I'm creating a parser, then a translator, then other things. The translator will have some elements of an AI; probably Bayesian probability will be involved, but it's too early to say. I may be on the wrong list discussing this. If you require that all text pass through a syntax checker for errors, you will greatly increase the cost of generating your training data. Well, A: There are rarely any errors -- unlike in a natural language like, say, English. B: Addressed above. This is not a trivial problem. Which one? Maybe as a whole it's not trivial, but when you break it down the little pieces are all individually trivial. It is a big part of why programmers can only write 10 lines of code per day on projects 1/1000 the size of a language model.
Monolithic programming is the paradigm of the past, which is one of the reasons I'm creating this new development model. Then when you have built the model, you will still have a system that is intolerant of errors and hard to use. Because of the nature of the development model -- designed after functional programming languages -- I'm going to be able to add functions anywhere in the process without interrupting the rest of the functions, as it won't be changing the input other functions receive (unless that is the intent). Hard to use? Well, we'll see when I have a basic implementation; the whole point is so that it will be easy to use. Maybe it won't work out, though -- I can't see how. .iacu'i(skepticism) Your language model needs to have a better way to deal with inconsistency than to report errors and make more work for the user. It can easily just check what the previous response of this user, or someone else that has made a similar error, was when correcting. Trivial once we get
Re: [agi] Understanding Natural Language
Philip Goetz [EMAIL PROTECTED] wrote: The use of predicates for representation, and the use of logic for reasoning, are separate issues. I think it's pretty clear that English sentences translate neatly into predicate logic statements, and that such a transformation is likely a useful first step for any sentence-understanding process. I don't think it is clear at all. Try translating some poetry. Even for sentences that do have a clear representation in first order logic, the translation from English is not straightforward at all. It is an unsolved problem. I also dispute that it is even useful for sentence understanding. Google understands simple questions, and its model is just a bag of words. Attempts to apply parsing or reasoning to information retrieval have generally been a failure. It would help to define what sentence understanding means. I say a computer understands English if it can correctly assign probabilities to long strings, where "correct" means ranked in the same order as judged by humans. So a program that recognizes the error in the string "the cat caught a moose" could be said to understand English. Thus, the grammar checker in Microsoft Word would have more understanding of a text document than a simple spell checker, but less understanding than most humans. Maybe you have a different definition. A reasonable definition for AI should be close to the conventional meaning and also be testable without making any assumption about the internals of the machine. Now it seems to me that you need to understand sentences before you can translate them into FOL, not the other way around. Before you can translate to FOL you have to parse the sentence, and before you can parse it you have to understand it, e.g. "I ate pizza with pepperoni." "I ate pizza with a fork." Using my definition of understanding, you have to recognize that "ate with a fork" and "pizza with pepperoni" rank higher than "ate with pepperoni" and "pizza with a fork".
A parser needs to know millions of rules like this. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Philip Goetz [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 28, 2006 2:47:41 PM Subject: Re: [agi] Understanding Natural Language On 11/24/06, J. Storrs Hall, PhD. [EMAIL PROTECTED] wrote: On Friday 24 November 2006 06:03, YKY (Yan King Yin) wrote: You talked mainly about how sentences require vast amounts of external knowledge to interpret, but it does not imply that those sentences cannot be represented in (predicate) logical form. Substitute "bit string" for "predicate logic" and you'll have a sentence that is just as true and not a lot less useful. I think there should be a working memory in which sentences under attention would bring up other sentences by association. For example, if a person is being kicked is in working memory, that fact would bring up other facts such as being kicked causes a person to feel pain and possibly to get angry, etc. All this is orthogonal to *how* the facts are represented. Oh, I think the representation is quite important. In particular, logic lets you in for gazillions of inferences that are totally inapropos, with no good way to say which is better. Logic also has the enormous disadvantage that you tend to have frozen the terms and levels of abstraction. Actual word meanings are a lot more plastic, and I'd bet internal representations are damn near fluid. The use of predicates for representation, and the use of logic for reasoning, are separate issues. I think it's pretty clear that English sentences translate neatly into predicate logic statements, and that such a transformation is likely a useful first step for any sentence-understanding process. Whether those predicates are then used to draw conclusions according to a standard logic system, or are used as inputs to a completely different process, is a different matter.
The open questions are representation -- I'm leaning towards CSG in Hilbert spaces at the moment, but that may be too computationally demanding -- and how to form abstractions. Does CSG = context-sensitive grammar in this case? How would you use Hilbert spaces?
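Matt's operational test of understanding above (ranking strings the way humans would) can be sketched with a toy bigram model; note the connection to compression, since an ideal coder spends log2(1/p) bits on a string of probability p. The counts below are made up purely for illustration of the mechanism.

```python
import math

# Hypothetical toy bigram counts standing in for a trained language model
counts = {
    ("the", "cat"): 90, ("cat", "caught"): 50, ("caught", "a"): 80,
    ("a", "mouse"): 60, ("a", "moose"): 1,
}
total = sum(counts.values())

def logprob(sentence):
    """Score a sentence by its summed smoothed bigram log2-probabilities."""
    words = sentence.lower().split()
    lp = 0.0
    for pair in zip(words, words[1:]):
        p = (counts.get(pair, 0) + 1) / (total + 1000)  # add-one smoothing
        lp += math.log2(p)
    return lp  # an ideal compressor would spend -lp bits on this sentence

likely = logprob("the cat caught a mouse")
odd = logprob("the cat caught a moose")
print(likely > odd)  # model ranks the plausible sentence higher: True
```

A model that reproduces human rankings this way passes the understanding test regardless of its internals, which is exactly why it also compresses text well.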
Re: [agi] Understanding Natural Language
First order logic (FOL) is good for expressing simple facts like "all birds have wings" or "no bird has hair", but not for statements like "most birds can fly". To do that you have to at least extend it with fuzzy logic (probability and confidence). A second problem is, how do you ground the terms? If you have for all X, bird(X) => has(X, wings), where do "bird", "wings", and "has" get their meanings? The terms do not map 1-1 to English words, even though we may use the same notation. For example, you can talk about the wings of a building, or the idiom "wing it". Most words in the dictionary list several definitions that depend on context. Also, words gradually change their meaning over time. I think FOL represents complex ideas poorly. Try translating what you just wrote into FOL and you will see what I mean. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Philip Goetz [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 28, 2006 5:45:51 PM Subject: Re: [agi] Understanding Natural Language Oops, Matt actually is making a different objection than Josh. Now it seems to me that you need to understand sentences before you can translate them into FOL, not the other way around. Before you can translate to FOL you have to parse the sentence, and before you can parse it you have to understand it, e.g. "I ate pizza with pepperoni." "I ate pizza with a fork." Using my definition of understanding, you have to recognize that "ate with a fork" and "pizza with pepperoni" rank higher than "ate with pepperoni" and "pizza with a fork". A parser needs to know millions of rules like this. Yes, this is true. When I said neatly, I didn't mean easily. I meant that the correct representation in predicate logic is very similar to the English, and doesn't lose much meaning. It was misleading of me to say that it's a good starting point, though, since you do have to do a lot to get those predicates. A predicate representation can be very useful.
This doesn't mean that you have to represent all of the predications that could be extracted from a sentence. The NLP system I'm working on does not, in fact, use a parse tree, for essentially the reasons Matt just gave. It doesn't want to make commitments about grammatical structure, so instead it just groups things into phrases, without deciding what the dependencies are between those phrases, and then has a bunch of different demons that scan those phrases looking for particular predications. As you find predications in the text, you can eliminate certain choices of lexical or semantic category for words, and eliminate arguments so that they can't be re-used in other predications. You never actually find the correct parse in our system, but you could if you wanted to. It's just that, we've already extracted the meaning that we're interested in by the time we have enough information to get the right parse, so the parse tree isn't of much use. We get the predicates that we're interested in, for the purposes at hand. We might never have to figure out whether pepperoni is a part or an instrument, because we don't care.
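Matt's point earlier in the thread, that FOL must at least be extended with probability and confidence to express statements like "most birds can fly", can be sketched as data plus one naive inference step. The rule table, the numbers, and the chaining scheme are all invented for illustration, not taken from any existing reasoner.

```python
# Hypothetical sketch: FOL-style implications annotated with
# (probability, confidence) pairs instead of being simply true or false.
rules = {
    # (category, predicate): (strength, confidence)
    ("bird", "fly"): (0.95, 0.9),       # "most birds can fly"
    ("penguin", "bird"): (1.0, 0.99),   # penguins are birds
    ("penguin", "fly"): (0.02, 0.9),    # specific rule overrides the general
    ("robin", "bird"): (1.0, 0.95),
}

def query(category, predicate):
    """Return (probability, confidence), preferring the most specific rule."""
    if (category, predicate) in rules:
        return rules[(category, predicate)]
    # one-step chaining through an intermediate category,
    # naively multiplying strengths and confidences
    for (a, b), (p1, c1) in rules.items():
        if a == category and (b, predicate) in rules:
            p2, c2 = rules[(b, predicate)]
            return (p1 * p2, c1 * c2)
    return (0.5, 0.0)  # total ignorance

print(query("penguin", "fly"))  # specific rule wins: (0.02, 0.9)
print(query("robin", "fly"))    # chained through bird(robin)
```

Even this toy version shows the two problems Matt raises: the numbers have to come from somewhere (grounding), and the chaining rule is an arbitrary choice that plain FOL never has to make.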
Re: [agi] A question on the symbol-system hypothesis
So what is your definition of understanding? -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Philip Goetz [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 29, 2006 5:36:39 PM Subject: Re: [agi] A question on the symbol-system hypothesis On 11/19/06, Matt Mahoney [EMAIL PROTECTED] wrote: I don't think it is possible to extend the definition of understanding to machines in a way that would be generally acceptable, in the sense that humans understand understanding. Humans understand language. We don't generally say that animals in the wild understand their environment, although we do say that animals can be trained to understand commands. I generally say that animals in the wild understand their environment. If you don't, you are using a definition of understand that I don't understand. - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] A question on the symbol-system hypothesis
--- Philip Goetz [EMAIL PROTECTED] wrote: On 11/30/06, James Ratcliff [EMAIL PROTECTED] wrote: One good one: Consciousness is a quality of the mind generally regarded to comprise qualities such as subjectivity, self-awareness, sentience, sapience, and the ability to perceive the relationship between oneself and one's environment. (Block 2004). Compressed: Consciousness = intelligence + autonomy I don't think that definition says anything about intelligence or autonomy. All it is is a lot of words that are synonyms for consciousness, none of which really mean anything. I think if you insist on an operational definition of consciousness you will be confronted with a disturbing lack of evidence that it even exists. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: Motivational Systems of an AI [WAS Re: [agi] RSI - What is it and how fast?]
--- Hank Conn [EMAIL PROTECTED] wrote: On 12/1/06, Matt Mahoney [EMAIL PROTECTED] wrote: The goal of humanity, like that of every other species, was determined by evolution. It is to propagate the species. That's not the goal of humanity. That's the goal of the evolution of humanity, which has been defunct for a while. We have slowed evolution through medical advances, birth control and genetic engineering, but I don't think we have stopped it completely yet. You are confusing this abstract idea of an optimization target with the actual motivation system. You can change your motivation system all you want, but you wouldn't (intentionally) change the fundamental specification of the optimization target which is maintained by the motivation system as a whole. I guess we are arguing terminology. I mean that the part of the brain which generates the reward/punishment signal for operant conditioning is not trainable. It is programmed only through evolution. To some extent you can do this. When rats can electrically stimulate their nucleus accumbens by pressing a lever, they do so nonstop in preference to food and water until they die. I suppose the alternative is to not scan brains, but then you still have death, disease and suffering. I'm sorry it is not a happy picture either way. Or you have no death, disease, or suffering, but not wireheading. How do you propose to reduce the human mortality rate from 100%? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: Re: [agi] Language acquisition in humans: How bound up is it with tonal pattern recognition...?
--- Ben Goertzel [EMAIL PROTECTED] wrote: I think that our propensity for music is pretty damn simple: it's a side-effect of the general skill-learning machinery that makes us memetic substrates. Tunes are trajectories in n-space as are the series of motor signals involved in walking, throwing, hitting, cracking nuts, chipping stones, etc, etc. Once we evolved a general learn-to-imitate-by-observing ability it will get used for imitating just about anything. Well, Steve Mithen argues otherwise in his book, based on admittedly speculative interpretations of anthropological/archaeological evidence... He argues for the presence of a specialized tonal pattern recognition module in the human brain, and the specific consequences for language learning of the existence of such a module... -- Ben I believe that Desmond Morris (The Naked Ape) argued that we like music because babies that liked to listen to their mother's heartbeat had a survival advantage. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: Motivational Systems of an AI [WAS Re: [agi] RSI - What is it and how fast?]
--- Hank Conn [EMAIL PROTECTED] wrote: On 12/1/06, Matt Mahoney [EMAIL PROTECTED] wrote: --- Hank Conn [EMAIL PROTECTED] wrote: On 12/1/06, Matt Mahoney [EMAIL PROTECTED] wrote: I suppose the alternative is to not scan brains, but then you still have death, disease and suffering. I'm sorry it is not a happy picture either way. Or you have no death, disease, or suffering, but not wireheading. How do you propose to reduce the human mortality rate from 100%? Why do you ask? You seemed to imply you knew an alternative to brain scanning, or did I misunderstand? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] A question on the symbol-system hypothesis
a connection was an attack, because the only information to tell that a connection was an attack was in the TCP packet contents, while my system looked only at packet headers. And yet, the system succeeded in placing about 50% of all attacks in the top 1% of suspicious connections. To this day, I don't know how it did it. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Re: Motivational Systems of an AI
--- Richard Loosemore [EMAIL PROTECTED] wrote: Matt Mahoney wrote: --- Richard Loosemore [EMAIL PROTECTED] wrote: I am disputing the very idea that monkeys (or rats or pigeons or humans) have a part of the brain which generates the reward/punishment signal for operant conditioning. This is behaviorism. I find myself completely at a loss to know where to start, if I have to explain what is wrong with behaviorism. Call it what you want. I am arguing that there are parts of the brain (e.g. the nucleus accumbens) responsible for reinforcement learning, and furthermore, that the synapses along the input paths to these regions are not trainable. I argue this has to be the case because an intelligent system cannot be allowed to modify its motivational system. Our most fundamental models of intelligent agents require this (e.g. AIXI -- the reward signal is computed by the environment). You cannot turn off hunger or pain. You cannot control your emotions. Since the synaptic weights cannot be altered by training (classical or operant conditioning), they must be hardwired as determined by your DNA. Pei has already spoken eloquently on many of these questions. Yes, and I agree with most of his comments. I need to clarify that the part of the motivational system that is not trainable is the one that computes top level goals such as hunger, thirst, pain, the need for sleep, reproductive drive, etc. I think we can agree on this. Regardless of training, everyone will get hungry if they don't eat. You can temporarily distract yourself from hunger, but a healthy person can't change this top level goal. If this were not true, obesity would not be such a problem, and instead you would see a lot of people starving themselves to death. I think the confusion is over learned secondary goals, such as seeking money to buy food, or education to get a better job. So in that context, I agree with most of your comments too. 
That all human learning can be reduced to classical and operant conditioning? Of course I am disputing this. This is the behaviorist idea that has been completely rejected by the cognitive science community since 1956. If you are willing to bend the meaning of the terms classical and operant conditioning sufficiently far from their origins, you might be able to make the idea more plausible, but that kind of redefinition is a little silly, and I don't see you trying to do that. How about if I call them supervised and unsupervised learning? Of course this is not helpful. What I am trying to do is understand how learning works in humans so it can be modeled in AGI. Classical conditioning (e.g. Pavlov) has a simple model proposed by Hebb in 1949. If neuron A fires followed by B after time t, then the weight from A to B is increased in proportion to AB/t (where A and B are activation levels). The dependence on A and B has been used in neural models long before synaptic weight changes were observed in animal brains. The factor 1/t (for t greater than a few hundred milliseconds) is supported by animal experiments. The model for reinforcement learning is not so clear. We can imagine several possibilities.

1. The weights of a neural network are randomly and temporarily varied. After a positive reinforcement, the changes become permanent. If negative, the changes are undone or made in the opposite direction.

2. The neuron activation level of B is varied by adding random noise, dB. After reinforcement r after time t, the weight change from A to B is proportional to A(dB)r/t.

3. There is no noise. Let dB be the rate of increase of B. The weight change is proportional to A(dB)r/t.

4. (as pointed out by Philip Goetz) http://www.iro.umontreal.ca/~lisa/pointeurs/RivestNIPS2004.pdf The weight change is proportional to AB(r-p), where p is the predicted reinforcement (trained by classical conditioning) and r is the actual reinforcement (tri-Hebbian model). 
And many other possibilities. We don't know what the brain uses. It might be a combination of these. From animal experiments we know that the learning rate is proportional to r/t, but not much else. From computer simulations, we know there is no best solution because it depends on the problem. So I would like to see an answer to this question. How does it work in the brain? How should it be done in AGI? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
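For concreteness, the Hebbian rule and model 4 (the tri-Hebbian rule) can be written out in a few lines. This is a toy sketch of the update equations as stated above, not a claim about what the brain actually does; the learning rate and all values are arbitrary:

```python
def hebbian_update(w, A, B, t, lr=0.01):
    """Classical conditioning (Hebb 1949): weight change proportional to
    A*B/t, where A and B are activation levels and t is the delay."""
    return w + lr * A * B / t

def tri_hebbian_update(w, A, B, r, p, lr=0.01):
    """Model 4 (tri-Hebbian): weight change proportional to A*B*(r - p),
    where r is actual reinforcement and p is the predicted reinforcement."""
    return w + lr * A * B * (r - p)

w = 0.5
w = hebbian_update(w, A=1.0, B=0.8, t=2.0)             # A fired, then B: strengthen
w = tri_hebbian_update(w, A=1.0, B=0.8, r=1.0, p=0.2)  # outcome better than predicted
```

Note that in the tri-Hebbian rule the weight change vanishes when reinforcement is fully predicted (r = p), which is what makes it a plausible account of why unexpected rewards drive learning.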
Re: Motivational Systems of an AI [WAS Re: [agi] RSI - What is it and how fast?]
--- Mark Waser [EMAIL PROTECTED] wrote: You cannot turn off hunger or pain. You cannot control your emotions. Huh? Matt, can you really not ignore hunger or pain? Are you really 100% at the mercy of your emotions? Why must you argue with everything I say? Is this not a sensible statement? Since the synaptic weights cannot be altered by training (classical or operant conditioning) Who says that synaptic weights cannot be altered? And there's endless irrefutable evidence that the sum of synaptic weights is certainly constantly altering by the directed die-off of neurons. But not by training. You don't decide to be hungry or not, because animals that could do so were removed from the gene pool. Is this not a sensible way to program the top level goals for an AGI? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] A question on the symbol-system hypothesis
Mark, Philip Goetz gave an example of an intrusion detection system that learned information that was not comprehensible to humans. You argued that he could have understood it if he tried harder. I disagreed and argued that an explanation would be useless even if it could be understood. If you use a computer to add up a billion numbers, do you check the math, or do you trust it to give you the right answer? My point is that when AGI is built, you will have to trust its answers based on the correctness of the learning algorithms, and not by examining the internal data or tracing the reasoning. I believe this is the fundamental flaw of all AI systems based on structured knowledge representations, such as first order logic, frames, connectionist systems, term logic, rule based systems, and so on. The evidence supporting my assertion is: 1. The relative success of statistical models vs. structured knowledge. 2. Arguments based on algorithmic complexity. (The brain cannot model a more complex machine). 3. The two examples above. I'm afraid that's all the arguments I have. Until we build AGI, we really won't know. I realize I am repeating (summarizing) what I have said before. If you want to tear down my argument line by line, please do it privately because I don't think the rest of the list will be interested. --- Mark Waser [EMAIL PROTECTED] wrote: Matt, Why don't you try addressing my points instead of simply repeating things that I acknowledged and answered and then trotting out tired old red herrings. As I said, your network intrusion anomaly detector is a pattern matcher. It is a stupid pattern matcher that can't explain its reasoning and can't build upon what it has learned. You, on the other hand, gave a very good explanation of how it works. Thus, you have successfully proved that you are an explaining intelligence and it is not. If anything, you've further proved my point that an AGI is going to have to be able to explain/be explained. 
- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, December 02, 2006 5:17 PM Subject: Re: [agi] A question on the symbol-system hypothesis --- Mark Waser [EMAIL PROTECTED] wrote: A nice story but it proves absolutely nothing . . . . . I know a little about network intrusion anomaly detection (it was my dissertation topic), and yes it is an important lesson. Network traffic containing attacks has a higher algorithmic complexity than traffic without attacks. It is less compressible. The reason has nothing to do with the attacks, but with arbitrary variations in protocol usage made by the attacker. For example, the Code Red worm fragments the TCP stream after the HTTP GET command, making it detectable even before the buffer overflow code is sent in the next packet. A statistical model will learn that this is unusual (even though legal) in normal HTTP traffic, but offer no explanation why such an event should be hostile. The reason such anomalies occur is that when attackers craft exploits, they follow enough of the protocol to make it work but often don't care about the undocumented conventions followed by normal servers and clients. For example, they may use lower case commands where most software uses upper case, or they may put unusual but legal values in the TCP or IP-ID fields or a hundred other things that make the attack stand out. Even if they are careful, many exploits require unusual commands or combinations of options that rarely appear in normal traffic and are therefore less carefully tested. So my point is that it is pointless to try to make an anomaly detection system explain its reasoning, because the only explanation is that the traffic is unusual. The best you can do is have it estimate the probability of a false alarm based on the information content. So the lesson is that AGI is not the only intelligent system where you should not waste your time trying to understand what it has learned. 
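The claim that the only explanation is "the traffic is unusual" can be made quantitative: under any probability model, the natural anomaly score of an event is its code length, -log2 p, in bits. A minimal sketch, assuming a simple frequency model over one header field (the class and field values here are illustrative, not any real detector):

```python
import math
from collections import Counter

class AnomalyScorer:
    """Scores events by surprisal: rare values get long codes (high scores)."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def train(self, value):
        self.counts[value] += 1
        self.total += 1

    def score(self, value):
        # Laplace-smoothed probability; score = -log2 p, in bits.
        p = (self.counts[value] + 1) / (self.total + len(self.counts) + 1)
        return -math.log2(p)

scorer = AnomalyScorer()
for v in ["GET"] * 98 + ["POST"] * 2:   # normal traffic: mostly upper-case GET
    scorer.train(v)

# A never-seen lower-case command scores as more suspicious than a rare
# but known one, which in turn scores higher than the common case.
assert scorer.score("get") > scorer.score("POST") > scorer.score("GET")
```

This also shows why the false-alarm probability estimate mentioned above is the best one can do: the score says only how improbable the event is under the model, never why it is hostile.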
Even if you understood it, it would not tell you anything. Would you understand why a person made some decision if you knew the complete state of every neuron and synapse in his brain? You developed a pattern-matcher. The pattern matcher worked (and I would dispute that it worked better than it had a right to). Clearly, you do not understand how it worked. So what does that prove? Your contention (or, at least, the only one that continues the previous thread) seems to be that you are too stupid to ever understand the pattern that it found. Let me offer you several alternatives: 1) You missed something obvious 2) You would have understood it if the system could have explained it to you 3) You would have understood it if the system had managed to losslessly convert
Re: Motivational Systems of an AI [WAS Re: [agi] RSI - What is it and how fast?]
--- Eric Baum [EMAIL PROTECTED] wrote: Matt --- Hank Conn [EMAIL PROTECTED] wrote: On 12/1/06, Matt Mahoney [EMAIL PROTECTED] wrote: The goal of humanity, like that of every other species, was determined by evolution. It is to propagate the species. That's not the goal of humanity. That's the goal of the evolution of humanity, which has been defunct for a while. Matt We have slowed evolution through medical advances, birth control Matt and genetic engineering, but I don't think we have stopped it Matt completely yet. I don't know what reason there is to think we have slowed evolution, rather than speeded it up. I would hazard to guess, for example, that since the discovery of birth control, we have been selecting very rapidly for people who choose to have more babies. In fact, I suspect this is one reason why the US (which became rich before most of the rest of the world) has a higher birth rate than Europe. Yes, but actually most of the population increase in the U.S. is from immigration. Population is growing the fastest in the poorest countries, especially Africa. Likewise, I expect medical advances in childbirth etc. are selecting very rapidly for multiple births (which once upon a time often killed off mother and child.) I expect this, rather than or in addition to the effects of fertility drugs, is the reason for the rise in multiple births. The main effect of medical advances is to keep children alive who would otherwise have died from genetic weaknesses, allowing these weaknesses to be propagated. Genetic engineering has not yet had much effect on human evolution, as it has in agriculture. We have the technology to greatly speed up human evolution, but it is suppressed for ethical reasons. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] The Singularity
--- John Scanlon [EMAIL PROTECTED] wrote: Alright, I have to say this. I don't believe that the singularity is near, or that it will even occur. I am working very hard at developing real artificial general intelligence, but from what I know, it will not come quickly. It will be slow and incremental. The idea that very soon we can create a system that can understand its own code and start programming itself is ludicrous. Any arguments? Not very soon, maybe 10 or 20 years. General programming skills will first require an adult level language model and intelligence, something that could pass the Turing test. Currently we can write program-writing programs only in very restricted environments with simple, well defined goals (e.g. genetic algorithms). This is not sufficient for recursive self improvement. The AGI will first need to be at the intellectual level of the humans who built it. This means sufficient skills to do research, and to write programs from ambiguous natural language specifications and have enough world knowledge to figure out what the customer really wanted. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: Re: [agi] A question on the symbol-system hypothesis
--- Ben Goertzel [EMAIL PROTECTED] wrote: Matt Mahoney wrote: My point is that when AGI is built, you will have to trust its answers based on the correctness of the learning algorithms, and not by examining the internal data or tracing the reasoning. Agreed... I believe this is the fundamental flaw of all AI systems based on structured knowledge representations, such as first order logic, frames, connectionist systems, term logic, rule based systems, and so on. I have a few points in response to this: 1) Just because a system is based on logic (in whatever sense you want to interpret that phrase) doesn't mean its reasoning can in practice be traced by humans. As I noted in recent posts, probabilistic logic systems will regularly draw conclusions based on synthesizing (say) tens of thousands or more weak conclusions into one moderately strong one. Tracing this kind of inference trail in detail is pretty tough for any human, pragmatically speaking... 2) IMO the dichotomy between logic based and statistical AI systems is fairly bogus. The dichotomy serves to separate extremes on either side, but my point is that when a statistical AI system becomes really serious it becomes effectively logic-based, and when a logic-based AI system becomes really serious it becomes effectively statistical ;-) I see your point that there is no sharp boundary between structured knowledge and statistical approaches. What I mean is that the normal software engineering practice of breaking down a hard problem into components with well defined interfaces does not work for AGI. We usually try things like: input text -- parser -- semantic extraction -- inference engine -- output text. The fallacy is believing that the intermediate representation would be more comprehensible than the input or output. That isn't possible because of the huge amount of data. In a toy system you might have 100 facts that you can compress down to a diagram that fits on a sheet of paper. 
In reality you might have a gigabyte of text that you can compress down to 10^9 bits. Whatever form this takes can't be more comprehensible than the input or output text. I think it is actually liberating to remove the requirement for transparency that was typical of GOFAI. For example, your knowledge representation could still be any of the existing forms but it could also be a huge matrix with billions of elements. But it will require a different approach to build, not so much engineering, but more of an experimental science, where you test different learning algorithms at the inputs and outputs only. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
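The numbers here are worth checking: a gigabyte of ASCII text is 8 x 10^9 bits, so compressing it to 10^9 bits is an 8:1 ratio, about 1 bit per character, which is in line with Shannon's estimate of the entropy of English. The arithmetic:

```python
text_bytes = 10**9               # one gigabyte of text, one byte per character
text_bits = text_bytes * 8
compressed_bits = 10**9          # the compressed representation above

ratio = text_bits / compressed_bits          # compression ratio
bits_per_char = compressed_bits / text_bytes # entropy estimate, bits/character
print(ratio, bits_per_char)                  # 8.0 1.0
```

Even at that compression, 10^9 bits is vastly more than a person can inspect, which is the point: the representation cannot be more comprehensible than the text it encodes.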
Re: [agi] Brain memory Map Article -
Prior to reading this article it was my belief that the purpose of dreaming (REM sleep) was to copy medium term (daily) memories from the hippocampus to long term memory in the cerebral cortex. REM sleep occurs in only 2 of the 3 orders of mammals: placentals (which include humans and rodents) and marsupials. Egg laying mammals such as the spiny anteater do not dream and have a much different brain structure. I find it a mystery why memories are played back at high speed in reverse order, with excitations in the cortex preceding those in the hippocampus, and that this occurs during non REM sleep. Perhaps this is part of a feedback loop to erase memories from the hippocampus after they have been copied. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Bob Mottram [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, December 19, 2006 8:45:34 AM Subject: Re: [agi] Brain memory Map Article - I was also reading that article. The place cell phenomenon has been known for many years. For a long time I've thought that sleep might be used for something other than just down time and cellular repair, and this research does seem to confirm that sleep has some functional role. It's interesting that memories are played back in reverse, which might suggest some form of back-propagation in which the brain is searching for the most likely causes of interesting events. On 18/12/06, James Ratcliff [EMAIL PROTECTED] wrote: Interesting article on how they really exhaustively mapped a rat's brain... The researchers could interpret the memories through electrodes inserted into the rats' brains, including into special neurons in the hippocampus. These neurons are known as place cells because each is activated when the rat passes a specific location, as if they were part of a map in the brain. The activation is so reliable that one can tell where a rat is in its cage by seeing which of its place cells is firing. 
http://www.nytimes.com/2006/12/18/science/18memory.html?ref=us James Ratcliff - http://falazar.com - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] teleoperated robots
One problem with teleoperation is latency delays. If the robot and operator are on opposite sides of the earth, then there is a round trip speed of light latency of 133 ms, enough to impair some operations like driving a car. As a practical matter, latency will be longer because of routing delays, slower than light transmission (fiber optic speed is 3/4 of c), and satellite links. One solution is a system that anticipates teleoperator commands using local sensory data, for example, a car that anticipates a steering or braking command when seeing an obstacle in the road. Notice what this gives us. First, teleoperation is a source of high quality training data. Second, we have a straightforward means of evaluating nonverbal AI control systems, just as we have text compression (perplexity) to evaluate language models. This gives us a continuous pathway to AGI, not just a pass/fail test like the Turing test. Third, as AI systems improve, we have an unobtrusive means of evaluating human teleoperators. Those whose commands are most predictable are rated highest. --- Bob Mottram [EMAIL PROTECTED] wrote: There should also be a rating facility, where the person receiving the telerobot service can provide feedback on how well the job had been done. High scoring teleoperators would be more likely to get work than ones who just picked your tools up and threw them around. Within a few years I think there will be much money to be made - not out of the robots themselves which will be fairly dumb devices - but in the teleoperation services which act as relays between the teleoperator and the service consumer. The teleop service provides a convenient mechanism for people to get paid for carrying out jobs remotely via the robot. One side effect of this is a truly global labour market. - Bob On 07/01/07, Neil H. 
[EMAIL PROTECTED] wrote: On 1/5/07, Olie Lamb [EMAIL PROTECTED] wrote: Well, I for one want a job assistant who can fetch things - what apprentices or surgical nurse-assistanty things are often called to do. Assistant: Please get me a Phillips head screwdriver and half-a-dozen 10mm screws A robot that could 1) Voice recognise instructions 2) Understand simple commands like Get me X, Hold this still, Return this... 3) Manoeuvre from your work space to your tool-store 4) Grab items from an appropriately set-up tool-store etc Would be pretty damn useful, and I see most of this as being feasible with current day tech. Sure, such an assistant would be pretty damn expensive, and less useful than a high-school-dropout apprentice/assistant (who can also run down the street and get you a sandwich), but this is a real, possible application for a robot. Actually, this makes me think that in the near-term (until automation catches up) there's a market for teleoperated robots. You could issue a request to the robot, which would get routed to a teleoperation company. Using an infrastructure somewhat like a call center (but hopefully with shorter delays) somebody would then be designated to handle your teleoperation until the task was complete. If teleoperation latency was important you could pay a premium to have the request routed to someplace in-country or in-state, otherwise you could have it routed to India or someplace else with lower labor costs. As tech progresses you could add in more automation, for things like grabbing specific objects, pathfinding, or handling other simple requests. Eventually it could get to the point where a human is only required for highly advanced procedures. Of course, there's potential privacy issues, but I'm sure somebody could figure out a solution for that. Thoughts? 
-- Neil -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
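The 133 ms round-trip figure from the teleoperation post above is easy to verify: antipodal points are half the Earth's circumference apart along the surface, and the signal must travel there and back. A quick check (constants approximate; the 3/4 c fiber speed is the figure used in the post):

```python
C_EARTH_KM = 40_075       # Earth's circumference, km
C_LIGHT_KM_S = 299_792    # speed of light in vacuum, km/s

one_way_km = C_EARTH_KM / 2              # surface distance between antipodes
rtt_s = 2 * one_way_km / C_LIGHT_KM_S    # round trip at c
print(round(rtt_s * 1000))               # 134 (the ~133 ms figure)

# In fiber at 3/4 the speed of light, the floor is higher still:
rtt_fiber_s = 2 * one_way_km / (0.75 * C_LIGHT_KM_S)
print(round(rtt_fiber_s * 1000))         # 178
```

Real routes add switching and queuing delay on top of this floor, which is why local anticipation of operator commands matters for tasks like driving.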
Re: [agi] SOTA
--- Bob Mottram [EMAIL PROTECTED] wrote: Ah, but is a thermostat conscious ? :-) Are humans conscious? It depends on your definition of consciousness, which is really hard to define. Does a thermostat want to keep the room at a constant temperature? Or does it just behave as if that is what it wants? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Project proposal: MindPixel 2
--- Gabriel R [EMAIL PROTECTED] wrote: Also, if you can think of any way to turn the knowledge-entry process into a fun game or competition, go for it. I've been told by a few people working on similar projects that making the knowledge-providing process engaging and fun for visitors ended up being a lot more important (and difficult) than they'd expected. Cyc has a game like this called FACTory at http://www.cyc.com/ Its purpose is to help refine its knowledge base. It presents statements and asks you to rate them as true, false, don't know or doesn't make sense. For example:

- Most shirts are heavier than most appendixes.
- Pages are typically located in HVAC Chem Bio facilities.
- Terminals are typically located in studies.
- People perform or are involved in paying a mortgage more frequently than they perform or are involved in overbearing.
- Most BTU dozer blades are wider than most T-64 medium tanks.

The game exposes Cyc's shortcomings pretty quickly. Cyc seems to lack a world model and a language model. Sentences seem to be constructed by relating common properties of unrelated objects. The set of common properties is fairly small: size, weight, cost, frequency (for events), containment, etc. There does not seem to be any sense that Cyc understands the purpose or function of objects. The result is that context is no help in disambiguating terms that have more than one meaning, such as appendix, page, or terminal. A language model would allow a more natural grammar, such as People pay mortgages more often than they are overbearing. This example also exposes the fallacy of logical inference. Inference allows you to draw conclusions such as this, but why would you? Inference is not a good model of human thought. A good model would compare related objects. It might ask instead whether people make mortgage payments more frequently than they receive paychecks. The game gives no hint that Cyc understands such relations. Cyc has millions of hand coded assertions. 
It has taken over 20 years to get this far, and it seems we are not even close. This seems to be a problem with every knowledge representation based on labeled graphs (frame-slot, first order logic, connectionist, expert system, etc). Using English words to label the elements of your data structure does not substitute for a language model. Also, this labeling tempts you to examine and update the knowledge manually. We should know by now that there is just too much data to do this. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Project proposal: MindPixel 2
--- Stephen Reed [EMAIL PROTECTED] wrote: I worked at Cycorp when the FACTory game was developed. The examples below do not reveal Cyc's knowledge of the assertions connecting these disparate concepts; rather, most show that the argument constraints of the compared terms are overly generalized. The exception is the example "Most BTU dozer blades are wider than most T-64 medium tanks", in which both concepts are specializations of Platform-Military. Download and examine concepts in OpenCyc, and Cyc's world model (or lack thereof by your standards) will be readily apparent. You need ResearchCyc, which has no license fee for research purposes, in order to evaluate its language model. -Steve

Thanks. I did take another look at Cyc, at least this talk by Lenat at Google. http://video.google.com/videoplay?docid=-7704388615049492068 In spite of Cyc's lack of success at AGI (so far), it is still the biggest repository of common sense knowledge. He explains how Cyc had tried machine learning approaches to acquiring such knowledge and why they failed. They knew early on that it would require a 1000 person-year effort to develop the knowledge base and proceeded anyway. Cyc has 3.2 million assertions, 300,000 concepts and 16,000 relations (is-a, contains, etc). They tried very hard to simplify the knowledge base, to keep these numbers small. Cyc is planning a Web interface to its knowledge base. If they make something useful, a 1000 person-year effort is nothing. Lenat briefly mentions Sergey's (one of Google's founders) goal of solving AI by 2020. I think if Google and Cyc work together on this, they will succeed.
Re: [agi] Project proposal: MindPixel 2
--- YKY (Yan King Yin) [EMAIL PROTECTED] wrote: I'm not an academic (left uni a couple years ago) so I can't get academic funding for this. If I can't start an AI business I'd have to entirely give up AI as a career. I hope you can understand these circumstances.

Aren't there companies looking for AI researchers? Google? Maybe another approach (the one I took) is to publish something innovative, and let people come to you. It won't make you rich, but I have so far gotten 3 small consulting jobs designing and writing data compression software or doing research, all from home, simply because people have seen my work on my website (PAQ compressor, large text benchmark, Hutter prize) or they just saw my posts on comp.compression. I never looked for any of this work. I make enough teaching at a nearby university as an adjunct, with lots of time off. I'm sure I could make more money if I wanted to work long hours in an office, but I don't need to.

PAQ introduced a new compression algorithm (context mixing) when PPM algorithms were the best known. PAQ would not have made it to the top of the benchmarks without the ideas, coding, and testing efforts of others working on it with no reward except name recognition. That would not have happened if it wasn't free (GPL open source). Even now, I'm sure nobody would pay even $20/copy when there is so much free competition. Other good compressors (Compressia, WinRK) have failed with this business model.

I think if you want to make a business out of AI, you are in for a lot of work. First you need something that is truly innovative, that does something that nobody else can do. What will that be? A search engine better than Google? A new operating system that understands natural language? A car that drives itself? A household servant robot? A program that can manage a company? A better spam detector? Text compression? Write down a well defined goal. Do research. What is your competition? How are your ideas better than what's been done?
Prove it (with benchmarks), and the opportunities will come. -- Matt Mahoney, [EMAIL PROTECTED]
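The context mixing idea mentioned above can be illustrated with a toy sketch (my own simplification for illustration, not PAQ's actual code): several models each predict the probability of the next bit, and a mixer combines their predictions in the logistic domain, adjusting weights by online gradient descent so that better models gain influence.

```python
import math

def stretch(p):
    """Inverse logistic: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Logistic: map a stretched value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    """Toy logistic mixer over several bit-prediction models."""
    def __init__(self, n_models, lr=0.01):
        self.w = [0.0] * n_models   # one weight per model
        self.lr = lr
        self.st = [0.0] * n_models  # last stretched predictions

    def predict(self, probs):
        # Combine stretched model probabilities with a weighted sum.
        self.st = [stretch(p) for p in probs]
        return squash(sum(w * s for w, s in zip(self.w, self.st)))

    def update(self, p, bit):
        # Online gradient step on coding loss: error = actual - predicted.
        err = bit - p
        self.w = [w + self.lr * err * s for w, s in zip(self.w, self.st)]

# Two fixed "models": one expects mostly 1s, the other mostly 0s.
mixer = Mixer(2)
for bit in [1, 1, 1, 0, 1, 1, 1, 1]:     # data is mostly 1s
    p = mixer.predict([0.9, 0.3])
    mixer.update(p, bit)

# The mixer learns to trust the model whose predictions fit the data.
assert mixer.w[0] > mixer.w[1]
```

A real context-mixing compressor would feed the mixed probability to an arithmetic coder and select model predictions by context; this sketch only shows the weight-learning step.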
Re: [agi] Chaitin randomness
--- gts [EMAIL PROTECTED] wrote: We can imagine ourselves parsing the sequence, dividing it into two groups: 1) complex/disorderly subsequences not amenable to simple algorithmic derivation and 2) simple/orderly subsequences such as those above that are so amenable. Now, if I understand Chaitin's information-theoretic compressibility definition of randomness correctly (and I very likely do not), the simple/orderly subsequences in group 2) are compressible and so would count against the larger sequence in any compressibility measure of its randomness. If that is so then a maximally random sequence might be best considered as one that is at least slightly compressible. But this definition would be contrary to Chaitin's idea that maximally random sequences are incompressible! Not so. Any information you save by compressing the compressible bits of a random sequence is lost because you also have to specify the location of those bits. (You can use the counting argument to prove this). Also I don't believe there are two types of randomness (algorithmic and process-based). Process based randomness (flipping a coin, quantum mechanics, etc) exists only in the context of an observer, something with memory such as a sensor, computer, or brain. If an observer has insufficient knowledge (or memory) to model its environment exactly, then it must use a probabilistic model. Algorithmic theory places hard limits on what is computable. An observer cannot model an environment more (algorithmically) complex than itself. If an observer is part of the universe that it observes, then it must have fewer states than the universe that includes it. Therefore the universe must appear probabilistic to any observer within it, even if the universe is deterministic. I think Einstein's view of quantum mechanics (God does not play dice) makes more sense when viewed in this light. 
-- Matt Mahoney, [EMAIL PROTECTED]
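The counting argument invoked above can be checked directly for small n (this is the standard pigeonhole reasoning, sketched here for illustration): there are 2^n binary strings of length n but only 2^n - 1 binary strings shorter than n bits, so no lossless code can shorten every string.

```python
# Pigeonhole check of the counting argument.
def n_strings(n):
    """Number of binary strings of length exactly n."""
    return 2 ** n

def n_shorter_descriptions(n):
    """Number of binary strings of length < n: 1 + 2 + ... + 2^(n-1)."""
    return sum(2 ** k for k in range(n))

for n in range(1, 25):
    # Always one fewer short description than strings to describe.
    assert n_shorter_descriptions(n) == n_strings(n) - 1
```

Hence at least one string of every length is incompressible, and on average any bits saved on the compressible parts of a random sequence must be paid back in specifying where those parts are.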
Re: [agi] (video)The Future of Cognitive Computing
--- Eugen Leitl [EMAIL PROTECTED] wrote: On Mon, Jan 22, 2007 at 05:26:43PM -0800, Matt Mahoney wrote: The issues of consciousness have been discussed on the singularity list. These are hard questions. I'm not sure questions about anything as ill-defined as consciousness are meaningful. The question arises when we need to make moral decisions, such as is it moral to upload a human brain into software, then manipulate that data in arbitrary ways, e.g. simulate pain? I think consciousness is poorly defined because any attempt to define it leads to the conclusion that it does not exist. You know what consciousness is, but try to define it. 1. Consciousness is the little person in your head that observes everything you sense and decides everything you do. 2. Consciousness (or self awareness) is what makes you different than everyone else. 3. Consciousness is what makes the world today different than before you were born. 4. If an exact copy of you was made, atom for atom, replicating all of your memories and behavior, then the only distinction between you and your copy would be that you have a consciousness. But with any of these definitions, it becomes clear that there is no physical justification for consciousness. You believe that other people have consciousnesses because you know that you do, and others are like you. But there is no way to know for sure. How do you distinguish between a person who has self awareness and one who only behaves as if he or she does? Perhaps we can drop the insistence that consciousness exists. Then a possible definition would be any behavior consistent with a belief in self awareness or free will. But this has problems too. - Does a thermostat want to keep the room at a constant temperature, or does it only behave as if that is what it wants? (Ask this question about human behavior). I don't understand your question. It depends on your definition of want. 
I mean that if an agent has goal-directed behavior, then it behaves as if it wants to satisfy its goals. I use this example to show that goal-directed behavior is not a criterion for consciousness. Do animals have consciousness? Does an embryo? These questions are controversial. AGI will raise new controversies. -- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Project proposal: MindPixel 2
--- YKY (Yan King Yin) [EMAIL PROTECTED] wrote: On 1/25/07, Ben Goertzel [EMAIL PROTECTED] wrote: If there is a major problem with Cyc, it is not the choice of basic KR language. Predicate logic is precise and relatively simple. I agree mostly, though I think even Cyc's simple predicate logic language can be made even simpler and better. For example, Cyc uses the classical quantifiers #$forAll and #$exists. In my version I don't use Frege-style quantifiers but I allow generalized modifiers like many, a few, in addition to all, exists.

IMHO the problem with Cyc is they tried to go directly to adult level intelligence with no theory on how people learn. This is why they are having such difficulty adding a natural language interface. Children learn semantics first, then simple sentences, and then the elements of logic such as and, or, not, all, some, etc. Cyc went straight to adult level logic and math, and now they can't add in the stuff that should have been learned as children. They should have built the language model first.

Another problem is that n-th order logic (even probabilistic) is not how people think. Logic does not model inductive reasoning, e.g. Kermit is a frog. Kermit is green. Therefore frogs are green. Where is the theory that explains why people reason this way? This is what happens when you ignore the cognitive side of AI.

Rather, the main problem is the impracticality of encoding a decent percentage of the needed commonsense knowledge! Now I see why we disagree here. You believe we should acquire all knowledge via experiential learning. IMO we can do even better than the experiential route. We can let the internet crowd enter the commonsense corpus for us. This should allow us to reach a functioning, usable AGI sooner.

How much knowledge you need depends on what problem you are trying to solve. Building an AGI to run a corporation is not the same as building a better spam detector.
-- Matt Mahoney, [EMAIL PROTECTED]
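The Kermit example above can be made concrete with a toy sketch (my own formulation, purely illustrative): the inductive leap people actually make generalizes from a single observed instance to a rule about the whole category, which deduction would never license.

```python
# Toy knowledge base of (subject, predicate) observations.
facts = {("kermit", "is_frog"), ("kermit", "is_green")}

def induce(facts):
    """Naive induction: if an instance of a category has a property,
    guess that the whole category has it. Logically unsound, but it
    captures the 'frogs are green' leap that logic does not model."""
    rules = set()
    for (x, cat) in facts:
        for (y, prop) in facts:
            if y == x and prop != cat:
                rules.add((cat, prop))  # e.g. ("is_frog", "is_green")
    return rules

# One example is enough to support the generalization.
assert ("is_frog", "is_green") in induce(facts)
```

A deductive system would need "all frogs are green" as a premise before concluding anything about frogs in general; the open question raised above is what theory explains why this unsound-but-useful inference is the one humans make.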
Re: [agi] Enumeration of useful genetic biases for AGI
I don't think there is a simple answer to this problem. We observe very complex behavior in much simpler organisms that lack long term memory or the ability to learn. For example, bees are born knowing how to fly, build hives, gather food, and communicate its location. The complexity of inductive bias is bounded by the complexity of your DNA, about 6 x 10^9 bits. This is probably too high by a few orders of magnitude, just as the number of synapses overestimates the complexity of AGI.

Nevertheless, we risk repeating the error of GOFAI. Early AI researchers were led astray by the successes of explicitly coding knowledge into toy systems. Now we know to use statistical and machine learning techniques, but we may still be led astray by oversimplified models of inductive bias. Certain aspects of the cerebral cortex are highly uniform, which suggests a simple model. But the rest of the brain has a complex structure that is poorly understood. AGI might still be harder than we think. It has happened before. -- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, February 13, 2007 9:28:53 PM Subject: [agi] Enumeration of useful genetic biases for AGI Hi, In a recent offlist email dialogue with an AI researcher, he made the following suggestion regarding the inductive bias that DNA supplies to the human brain to aid it in learning: * What is encoded in the DNA may include a starting ontology (as proposed, with exasperating vagueness, by developmental psychologists, though much more complex than anything they have thought of) but the more important thing is an implicit set of constraints on ontologies that can be discovered by systematic 'scientific' investigation. So it might not work in an arbitrary universe, including some simulated universes, e.g. 'tileworld' universes. One such constraint (as Kant pointed out in 1780) is the assumption that everything physical happens in 3-D space and time.
Another is the requirement for causal determinism (for most processes). There may also be constraints on kinds of information-processing entities that can be learnt about in the environment, e.g. other humans, other animals, dead ancestors, gods, spirits, computer games. The major, substantive ontology extensions have to happen in (partially ordered) stages, each stage building on previous stages, and brain development is staggered accordingly. **

My response to him was that these genetic biases are indeed encoded in the Novamente design, but in a somewhat unsystematic and scattered way. For instance, in the Novamente system,

- the restriction to 3D space is implicit in the set of elementary predicates and procedures supplied to the system for preprocessing perceptual data on its way to abstract cognition
- the bias toward causal determinism is implicit in an inference control mechanism that specifically tries to build PredictiveAttractionLink relationships that embody likely causal relationships

etc. I have actually never gone through the design with an eye towards identifying exactly how each important genetic bias of cognition is encoded in the system. However, this would be an interesting and worthwhile thing to do. Toward that end, it would be interesting to have a systematic list somewhere of the genetic biases that are thought to be most important for structuring human cognition. Does anyone know of a well-thought-out list of this sort? Of course I could make one by surveying the cognitive psych literature, but why reinvent the wheel? -- Ben G
[agi] Re: Languages for AGI
I think choosing an architecture for AGI is a much more important problem than choosing a language. But there are some things we already know about AGI. First, AGI requires a vast amount of knowledge, and therefore a vast amount of computation. Therefore, at least part of the AGI will have to be implemented in a fast (perhaps parallel) language. Second, if you plan on having a team of programmers do the work (rather than all by yourself) then you will have to choose a widely known language. Early work in AI used languages like Lisp or Prolog to directly express knowledge. Now we all know (except at Cycorp) that this does not work. There is too much knowledge to code directly. You will need a learning algorithm and training and test data. The minimum requirement for AGI is a language model, which requires about 10^9 bits of information (based on estimates by Turing and Landauer, and the amount of language processed by adulthood). When you add vision, speech, robotics, etc., it will be more. We don't know how much, but if we use the human brain as a model, then one estimate is the number of synapses (about 10^13) multiplied by the access rate (10 Hz) = 10^14 operations per second. But these numbers are really just guesses. Perhaps they are high, but people have been working on computational shortcuts for the last 50 years without success. My work is in data compression, which I believe is an AI problem. (You might disagree, but first see my argument at http://cs.fit.edu/~mmahoney/compression/rationale.html ). Whether or not you agree, compression, like AGI, requires a great deal of memory and CPU. Many of the top compressors ranked in my benchmark are open source, and of those, the top languages are C++ followed by C and assembler. I don't know of any written in Java, C#, Python, or any interpreted languages, or any that use relational databases. AGI is amenable to parallel computation. 
Language, vision, speech, and robotics all involve combining thousands of soft constraints. This requires vector operations. The fastest way to do this on a PC is to use the parallel MMX and SSE2 instructions (or a GPU) that are not accessible in high level languages. The 16-bit vector dot product that I implemented in MMX as part of the neural network used in the PAQ compressor is 6 times faster than optimized C. Fortunately you do not need a lot of assembler, maybe a couple hundred lines of code to do most of the work. AGI is still an area of research. Not only do you need fast implementations so your experiments finish in reasonable time, but you will need to change your code many times. Train, test, modify, repeat. Your code has to be both optimized and structured so that it can be easily changed in ways you can't predict. This is hard, but unfortunately we do not know yet what will work. -- Matt Mahoney, [EMAIL PROTECTED]
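The 16-bit dot product described above can be sketched for illustration (NumPy's vectorized routines standing in for the MMX/SSE2 instructions; the 6x figure quoted above is for hand-written assembly versus C, not for this Python sketch):

```python
import numpy as np

def dot_scalar(a, b):
    """One multiply-accumulate per step, like unvectorized C code."""
    total = 0
    for x, y in zip(a, b):
        total += int(x) * int(y)
    return total

rng = np.random.default_rng(0)
a = rng.integers(-100, 100, size=4096, dtype=np.int16)
b = rng.integers(-100, 100, size=4096, dtype=np.int16)

# Vectorized version: widen before multiplying so products don't
# overflow int16, analogous to how MMX's pmaddwd widens 16-bit
# products into 32-bit accumulators.
vec = int(np.dot(a.astype(np.int64), b.astype(np.int64)))

assert vec == dot_scalar(a, b)
```

The vectorized call processes many elements per underlying instruction instead of one, which is the same reason the hand-coded MMX loop beats a scalar C loop.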
Re: [agi] Do AGIs dream of electric sheep?
I believe the purpose of sleep in placental and marsupial mammals (the only animals with REM sleep) is to copy medium term (daily) memories from the hippocampus to long term memory in the cortex. In humans, only visual and verbal memories are transferred (as dreams). During deep sleep between dreams, memories in the cortex are played back in reverse and fed back to the hippocampus*, which I believe is the process of erasing medium term memories as part of a feedback loop. An AGI should have a hierarchy of short and long term memory, but I don't believe it is necessary to mimic sleeping and dreaming. I think there are more efficient ways to implement a cache when you remove the limitations of neurons. *discussed on this list. Sorry, I don't remember the reference.

--- Chuck Esterbrook [EMAIL PROTECTED] wrote: This is a light article about the purpose and value of sleep in humans: http://www.dailymail.co.uk/pages/live/articles/technology/technology.html?in_article_id=437683&in_page_id=1965 The article is nothing earth shattering, but it reminded me that I've thought for a long time that an AGI would likely have a sleep cycle to perform various functions such as optimizing memory retrieval, learning new associations, solving problems, etc. What about the AGIs that people are building or working towards, such as those from Novamente, AdaptiveAI, Hall, etc.? Do/Will your systems have sleep periods for internal maintenance and improvement? If so, what types of activities do they perform during sleep? Or feel free to chime in with thoughts on AGI and sleep even if you haven't begun building yet... -Chuck

-- Matt Mahoney, [EMAIL PROTECTED]
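One "more efficient way to implement a cache" might be an ordinary bounded LRU store (my own assumption about what a short-term memory tier could look like; this is not from any actual AGI design):

```python
from collections import OrderedDict

class ShortTermMemory:
    """Bounded LRU store. On overflow the least recently used item is
    evicted; a consolidation step (the analogue of sleep) would copy
    evicted items to long-term storage instead of discarding them."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # refresh recency on access
        return self.items[key]

stm = ShortTermMemory(2)
stm.put("a", 1)
stm.put("b", 2)
stm.get("a")     # touch "a", making "b" the least recently used
stm.put("c", 3)  # capacity exceeded: "b" is evicted
assert stm.get("b") is None and stm.get("a") == 1
```

Unlike the hippocampus-to-cortex replay described above, nothing here requires an offline phase; eviction and consolidation can happen incrementally on every access.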
Re: [agi] The Missing Piece
--- Andrii (lOkadin) Zvorygin [EMAIL PROTECTED] wrote: Hmmm, if you could put some basic rules on the randomness (in a database of Lojban that gives a random statement or series of statements), say to accept logical statements that could then be applied onto input. So say you say something like le MLAtu cu GLEki (the cat is happy) and later make a statement le MLAtu and press return, it could ask you cu GLEki gi'a mo (is happy or is what function?). If it was to be a chat bot, it could wait for a reply and if it believes no one is interested it could offer a random phrase as a topic such as le MLAtu cu GLEki. So maybe some can try approaching AI from the other way around? Instead of going bottom up from purely unambiguous code to restricted randomness of interaction, go from pure randomness to restricted randomness of interaction. Does anyone know what would be a good language to do that in? I think I recall there being a programming language based on set theory that was all about streams.

What about English? Irregular grammar is only a tiny part of the language modeling problem. Using an artificial language with a regular grammar to simplify the problem is a false path. If people actually used Lojban then it would be used in ways not intended by the developers, and it would develop all the warts of real languages. The real problem is to understand how humans learn language. -- Matt Mahoney, [EMAIL PROTECTED]