[agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Suppose you have sets of programs that produce two strings: one set outputs 00 and the other outputs 11. Now suppose you used these sets of programs to chart the probabilities of the output of the strings. If the two strings were each output by the same number of programs, then you'd have a .5 probability that either string would be output. That's ok. But a more interesting question is: given that the first digits are 000, what are the chances that the next digit will be 1? Dim Induction will report .5, which of course is nonsense and a whole lot less useful than making a rough guess.

But, of course, Solomonoff Induction purports to be able, if it were feasible, to compute the probabilities over all possible programs. Ok, but now try thinking about this a little. If you have ever tried writing random program instructions, what do you usually get? Well, I'll hazard a guess (a lot better than the bogus method of confusing shallow probability with prediction in my example) and say that you will get a lot of programs that crash. Most of my experiments with that have ended up with programs that go into an infinite loop or crash.

Now on a universal Turing machine the results would probably look a little different. Some programs will output nothing and go into an infinite loop. Some programs will output something and then either stop outputting anything or start repeating the same substring forever. Other programs will go on to infinity producing something that looks like a random string. But the idea that all possible programs would produce well-distributed strings is complete hogwash. Since Solomonoff Induction does not define what kind of programs should be used, the assumption that the distribution would produce useful data is absurd. In particular, the use of the method to determine the probability given an initial string (as in: what follows, given that the first digits are 000) is wrong, as in really wrong. The idea that this crude probability can be used as prediction is unsophisticated. Of course you could develop an infinite set of Solomonoff Induction values for each possible given initial sequence of digits. Hey, when you're working with infeasible functions, why not dream anything?

I might be wrong of course. Maybe there is something you guys haven't been able to get across to me. Even if you can think for yourself you can still make mistakes. So if anyone has actually tried writing a program to output all possible programs (up to some feasible point) on a Turing Machine simulator, let me know how it went.

Jim Bromer
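For concreteness, the experiment Jim asks for at the end is easy to approximate. The sketch below enumerates every program up to length 6 in brainfuck (a tiny Turing-complete language, standing in here for a raw Turing machine) and tallies what the programs do under a fixed step budget. The language choice, the binary cells, the wrapping tape, and the 10,000-step budget are all assumptions of the sketch, not anything Solomonoff specifies.

```python
# Enumerate all brainfuck programs up to length 6 (no ',' so programs
# take no input) and tally their behavior under a step budget.
# Cells are binary so '.' emits a 0 or 1; the pointer wraps, so the
# only "crashes" here are unbalanced brackets.
from itertools import product
from collections import Counter

OPS = "+-<>.[]"
STEP_BUDGET = 10_000

def run(prog, tape_len=64):
    """Run prog; return its output string, or None on crash/budget overrun."""
    stack, jump = [], {}
    for i, c in enumerate(prog):          # pre-match brackets
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None               # unbalanced: count as a crash
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        return None
    tape = [0] * tape_len
    pc = ptr = steps = 0
    out = []
    while pc < len(prog):
        steps += 1
        if steps > STEP_BUDGET:
            return None                   # treat as non-halting
        c = prog[pc]
        if c == "+": tape[ptr] = (tape[ptr] + 1) % 2
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 2
        elif c == ">": ptr = (ptr + 1) % tape_len
        elif c == "<": ptr = (ptr - 1) % tape_len
        elif c == ".": out.append(str(tape[ptr]))
        elif c == "[" and tape[ptr] == 0: pc = jump[pc]
        elif c == "]" and tape[ptr] != 0: pc = jump[pc]
        pc += 1
    return "".join(out)

tally = Counter()
for n in range(1, 7):                     # all programs of length 1..6
    for ops in product(OPS, repeat=n):
        result = run("".join(ops))
        if result is None:
            tally["crash-or-loop"] += 1
        elif result == "":
            tally["halt, no output"] += 1
        else:
            tally[result] += 1
print(tally.most_common(10))
```

One would expect most programs to crash or exceed the budget, and the surviving outputs to be dominated by short repetitive strings, which is consistent both with Jim's experience and with Abram's reply below.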
Re: [agi] Hutter - A fundamental misdirection?
> In short, instead of a pot of neurons, we might instead have a pot of dozens of types of neurons that each have their own complex rules regarding what other types of neurons they can connect to, and how they process information...

> ...there is plenty of evidence (from the slowness of evolution, the large number (~200) of neuron types, etc.), that it is many-layered and quite complex...

The disconnect between the low-level neural hardware and the implementation of algorithms that build conceptual spaces via dimensionality reduction--which generally ignore facts such as the existence of different types of neurons, the apparently hierarchical organization of neocortex, etc.--seems significant. Have there been attempts to develop computational models capable of LSA-style feats (e.g., constructing a vector space in which words with similar meanings tend to be relatively close to each other) that take into account basic facts about how neurons actually operate (ideally in a more sophisticated way than the nodes of early connectionist networks which, as we now know, are not particularly neuron-like at all)? If so, I would love to know about them.

On Tue, Jun 29, 2010 at 3:02 PM, Ian Parker ianpark...@gmail.com wrote:

The paper seems very similar in principle to LSA. What you need for a concept vector (or position) is the application of LSA followed by K-Means, which will give you your concept clusters. I would not knock Hutter too much. After all, LSA reduces {primavera, manantial, salsa, resorte} to one word, giving a 2-bit saving on Hutter.

- Ian Parker

On 29 June 2010 07:32, rob levy r.p.l...@gmail.com wrote:

Sorry, the link I included was invalid; this is what I meant: http://www.geog.ucsb.edu/~raubal/Publications/RefConferences/ICSC_2009_AdamsRaubal_Camera-FINAL.pdf

On Tue, Jun 29, 2010 at 2:28 AM, rob levy r.p.l...@gmail.com wrote:

On Mon, Jun 28, 2010 at 5:23 PM, Steve Richfield steve.richfi...@gmail.com wrote:

Rob, I just LOVE opaque postings, because they identify people who see things differently than I do. I'm not sure what you are saying here, so I'll make some random responses to exhibit my ignorance and elicit more explanation.

I think based on what you wrote, you understood (mostly) what I was trying to get across. So I'm glad it was at least quasi-intelligible. :)

> It sounds like this is a finer measure than the dimensionality that I was referencing. However, I don't see how to reduce anything as quantized as dimensionality into finer measures. Can you say some more about this?

I was just referencing Gardenfors' research program of conceptual spaces (I was intentionally vague about committing to this fully, though, because I don't necessarily think this is the whole answer). Page 2 of this article summarizes it pretty succinctly: http://www.geog.ucsb.edu/~raubal/Publications/RefConferences/ICSC_2009_AdamsRaubal_Camera-FINAL.pdf

> However, different people's brains, even the brains of identical twins, have DIFFERENT mappings. This would seem to mandate experience-formed topology.

Yes, definitely. Since these conceptual spaces that structure sensorimotor expectation/prediction (including, I think, in higher-order embodied exploration of concepts) are multidimensional spaces, it seems likely that some kind of neural computation over these spaces must occur,

> I agree.

though I wonder what it actually would be in terms of neurons (and if that matters).

> I don't see any route to the answer except via neurons.
I agree this is true of natural intelligence, though maybe in modeling, the neural level can be shortcut to the topo map level without recourse to neural computation (use some more straightforward computation like matrix algebra instead).

Rob
Re: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Jim,

I am unable to find the actual objection to Solomonoff in what you wrote (save that it's "wrong as in really wrong"). It's true that a lot of programs won't produce any output. That just means they won't alter the prediction. It's also true that a lot of programs will produce random-looking or boring-looking output. This just means that Solomonoff will have some expectation of those things. To use your example: given 000, the chances that the next digit will be 0 will be fairly high, thanks to boring programs which just output lots of zeros. (Not sure why you mention the idea that it might be .5? This sounds like no induction rather than dim induction...)

--Abram

On Wed, Jul 7, 2010 at 10:10 AM, Jim Bromer jimbro...@gmail.com wrote:

> Suppose you have sets of programs that produce two strings. [...] But a more interesting question is: given that the first digits are 000, what are the chances that the next digit will be 1? Dim Induction will report .5, which of course is nonsense and a whole lot less useful than making a rough guess. [...]
--
Abram Demski
http://lo-tho.blogspot.com/
http://groups.google.com/group/one-logic
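Abram's point can be put in symbols. In the standard formulation (up to the usual technicalities about prefix-free program codes), the Solomonoff prior weights each program by its length, and a program contributes only to the strings its output actually extends, so crashing and silently looping programs appear in no term at all:

```latex
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|},
\qquad
P(x_{n+1} = b \mid x_{1:n}) = \frac{M(x_{1:n}\,b)}{M(x_{1:n}\,0) + M(x_{1:n}\,1)}
```

Here U is a universal prefix machine and U(p) = x* means that program p produces output beginning with x. The "boring" programs that print zeros forever are short, so they dominate M(0000) and pull the prediction toward 0.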
Re: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Abram,

I don't think you are right. The reason is that Solomonoff Induction does not produce a true universal probability for any given first digits. To do so it would have to be capable of representing the probability of any (computable) sequence that follows any (computable) string of given first digits. Yes, if a high proportion of programs produce 00, it will be able to register that string as more probable, but the information on what the next digits will be, given some input, will not be represented in anything that resembles compression. For instance, if you had 62 bits and wanted to know the probability of the next two bits, you would have to have done the infinite calculations of a Solomonoff Induction for each of the 2^62 possible combinations of bits that represented the possible input to your problem. I might be wrong, but if I am, I don't see where all this information is being hidden.

On the other hand, if I am right (or even partially right) I don't understand why seemingly smart people are excited about this as a possible AGI method. We in AGI specifically want to know the answer to this kind of question: given some partially defined situation, how could a computer best figure out what is going on? Most computer situations are going to be represented by kilobytes or megabytes these days, not in strings of 32 bits or less. If there were an abstraction that could help us think about these things, it could help even if the ideal were way beyond any feasible technology. And there is an abstraction like this that can help us: applied probability. We can think about these ideas in terms of strings if we want to, but the key is that WE have to work out the details because we see the problems differently. There is nothing that I have seen in Solomonoff Induction that suggests it is an adequate or even useful method to use. On the other hand, I would not be talking about this if it weren't for Solomonoff, so maybe I just don't share your enthusiasm. If I have misunderstood something, then all I can say is that I am still waiting for someone to explain it in a way that I can understand.

Jim

On Wed, Jul 7, 2010 at 1:58 PM, Abram Demski abramdem...@gmail.com wrote:

> I am unable to find the actual objection to Solomonoff in what you wrote [...] To use your example: given 000, the chances that the next digit will be 0 will be fairly high, thanks to boring programs which just output lots of zeros. [...]
Re: [agi] Hutter - A fundamental misdirection?
There is very little. Someone should do research. Here is a paper on language fitness: http://kybele.psych.cornell.edu/~edelman/elcfinal.pdf

LSA is *not* discussed, nor is any fitness concept within the language itself. Similar-sounding (or similarly written) words must be capable of disambiguation using LSA; otherwise the language would be unfit. Let us have a *gedanken* language where *spring*, the example I have taken with my Spanish, cannot be disambiguated. Suppose *spring* meant *step forward*, as well as its other meanings. If I am learning to dance I do not think about *primavera*, *resorte* or *manantial*, but I do think about *salsa*. If I did not know whether I was to jump or put my leg forward it would be extremely confusing.

To my knowledge fitness in this context has not been discussed. In fact perhaps the only work that is relevant is my own, which I posted here some time ago. The reduction in entropy (compression) obtained with LSA was disappointing. The different meanings (different words in Spanish and other languages) are compressed more readily. Both Spanish and English have a degree of fitness which (just possibly) is definable in LSA terms.

- Ian Parker

On 7 July 2010 17:12, Gabriel Recchia grecc...@gmail.com wrote:

> Have there been attempts to develop computational models capable of LSA-style feats (e.g., constructing a vector space in which words with similar meanings tend to be relatively close to each other) that take into account basic facts about how neurons actually operate? If so, I would love to know about them. [...]
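To make the LSA-then-K-Means pipeline Ian describes concrete, here is a toy sketch: build a term-document count matrix, take a truncated SVD to get low-dimensional "concept" vectors, then cluster them. The four-document corpus, the rank k=2, and the two clusters are arbitrary assumptions chosen only to show the mechanics.

```python
# Toy LSA: term-document counts -> truncated SVD -> concept vectors,
# followed by K-Means to get Ian's "concept clusters".
import numpy as np
from sklearn.cluster import KMeans

docs = ["spring summer winter season",
        "spring coil metal wire",
        "season winter snow cold",
        "wire metal steel coil"]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab],
             dtype=float)                      # terms x documents

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                          # keep top-k singular values
term_vecs = U[:, :k] * s[:k]                   # LSA term vectors

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# words used in similar documents end up close in the reduced space
i, j = vocab.index("winter"), vocab.index("season")
print("winter~season:", cosine(term_vecs[i], term_vecs[j]))

# K-Means over the LSA vectors yields the concept clusters
labels = KMeans(n_clusters=2, n_init=10).fit_predict(term_vecs)
for w, c in sorted(zip(vocab, labels), key=lambda t: t[1]):
    print(c, w)
```

With this corpus the season words and the metal words should fall into separate clusters, while an ambiguous word like "spring", which occurs in both contexts, sits between them, which is exactly the disambiguation burden Ian is pointing at.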
Re: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Jim Bromer wrote:

> But, a more interesting question is, given that the first digits are 000, what are the chances that the next digit will be 1? Dim Induction will report .5, which of course is nonsense and a whole lot less useful than making a rough guess.

Wrong. The probability of a 1 is p(0001)/(p(0000)+p(0001)), where the probabilities are computed using Solomonoff induction. A program that outputs 0000 will be shorter in most languages than a program that outputs 0001, so 0 is the most likely next bit.

More generally, probability and prediction are equivalent by the chain rule. Given any 2 strings x followed by y, the prediction is p(y|x) = p(xy)/p(x).

-- Matt Mahoney, matmaho...@yahoo.com

From: Jim Bromer jimbro...@gmail.com
To: agi agi@v2.listbox.com
Sent: Wed, July 7, 2010 10:10:37 AM
Subject: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction

> Suppose you have sets of programs that produce two strings. [...] So if anyone has actually tried writing a program to output all possible programs (up to some feasible point) on a Turing Machine simulator, let me know how it went.
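Matt's formula can be checked numerically with a deliberately crude stand-in for the universal prior. In the sketch below the "programs" are just finite bit-patterns repeated forever, each weighted 2^-length as in the Solomonoff prior. This restricted program class is an assumption of the sketch (a real Solomonoff mixture is uncomputable, and these weights are not a proper semimeasure since non-minimal patterns are double-counted), but it reproduces the qualitative claim that 0 is the likely successor of 000.

```python
# Crude numeric check of p(1|000) = p(0001)/(p(0000)+p(0001)) using
# "programs" that are finite bit-patterns repeated forever, weighted
# 2^-length. Only the ratio matters; the raw sums are unnormalized.
from itertools import product

def prior(x, max_len=12):
    """Sum of 2^-|p| over patterns p (length <= max_len) whose infinite
    repetition begins with the string x."""
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            stream = (p * (len(x) // n + 1))[:len(x)]
            if stream == x:
                total += 2.0 ** (-n)
    return total

x = "000"
p0, p1 = prior(x + "0"), prior(x + "1")
print("p(next=0 | 000) =", p0 / (p0 + p1))   # ~0.72 with these settings
print("p(next=1 | 000) =", p1 / (p0 + p1))   # short all-zero patterns dominate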
Re: [agi] Hutter - A fundamental misdirection?
Gorrell and Webb describe a neural implementation of LSA that seems more biologically plausible than the usual matrix-factoring implementation. http://www.dcs.shef.ac.uk/~genevieve/gorrell_webb.pdf

In the usual implementation, a word-word matrix A is factored as A = USV^T, where S is diagonal (containing the singular values), and then the smaller elements of S are discarded. In the Gorrell model, U and V are the weights of a 3-layer neural network mapping words to words, and the nonzero elements of S represent the semantic space in the middle layer. As the network is trained, neurons are added to S. Thus the network is trained online in a single pass, unlike factoring, which is offline.

-- Matt Mahoney, matmaho...@yahoo.com

From: Gabriel Recchia grecc...@gmail.com
To: agi agi@v2.listbox.com
Sent: Wed, July 7, 2010 12:12:00 PM
Subject: Re: [agi] Hutter - A fundamental misdirection?

> Have there been attempts to develop computational models capable of LSA-style feats (e.g., constructing a vector space in which words with similar meanings tend to be relatively close to each other) that take into account basic facts about how neurons actually operate? If so, I would love to know about them. [...]
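The online-SVD idea Matt summarizes can be sketched with the Generalized Hebbian Algorithm (Sanger's rule), the learning rule behind Gorrell's incremental-LSA work. The dimensions, learning rate, and synthetic low-rank stream below are assumptions, and the sketch learns only the principal subspace of a vector stream rather than the full word-to-word network of the paper.

```python
# Generalized Hebbian Algorithm (Sanger's rule): learn the top-k
# principal directions of a data stream in a single online pass,
# with no matrix ever factored explicitly.
import numpy as np

rng = np.random.default_rng(0)
d, k, eta = 8, 3, 0.005            # input dim, components, learning rate
W = rng.normal(scale=0.1, size=(k, d))

# synthetic stream with a planted rank-3 structure
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal columns
for _ in range(30_000):
    x = basis @ (rng.normal(size=k) * np.array([3.0, 2.0, 1.0]))
    y = W @ x                                      # forward pass
    # Sanger's rule: Hebbian term minus deflation by earlier components
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

# rows of W should roughly align (up to sign) with the planted directions,
# ordered by variance -- the online analogue of keeping the largest
# elements of S
print(np.abs(W @ basis).round(2))
```

The deflation term is what orders the components by singular value, which corresponds to Matt's remark that neurons are "added to S" as training proceeds.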
Re: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Matt,

But you are still saying that Solomonoff Induction has to be recomputed for each possible combination of bit values, aren't you? Although this doesn't matter when you are dealing with infinite computations in the first place, it does matter when you are wondering whether this has anything to do with AGI and compression efficiencies.

Jim Bromer

On Wed, Jul 7, 2010 at 5:44 PM, Matt Mahoney matmaho...@yahoo.com wrote:

> The probability of a 1 is p(0001)/(p(0000)+p(0001)), where the probabilities are computed using Solomonoff induction. [...] More generally, probability and prediction are equivalent by the chain rule. Given any 2 strings x followed by y, the prediction is p(y|x) = p(xy)/p(x). [...]
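A note on the recomputation worry, following Matt's chain-rule point: every conditional is a ratio of two values of the same prior M, so a single (notional) enumeration of programs fixes all predictions at once. For an observed 62-bit prefix you need only the prior of that one prefix and of its one-bit extensions, not a separate induction for each of the 2^62 possible prefixes:

```latex
P(y \mid x) = \frac{M(xy)}{M(x)},
\qquad
P(x_{63} = 1 \mid x_{1:62}) = \frac{M(x_{1:62}\,1)}{M(x_{1:62}\,0) + M(x_{1:62}\,1)}
```

The infeasibility lies in computing M at all, not in any per-prefix recomputation.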