Re: Predictions & duplications
Schmidhuber wrote: Why care for the subset of provable sentences? Aren't we interested in the full set of all describable sentences?

We are interested in the true sentences: the provable ones and the unprovable ones.

We can generate it, without caring for proofs at all.

If you mean generate all describable *sentences*, this is trivial, but then you generate only a language. If you mean generate all computations, it is equivalent to generating the proofs of all \Sigma_1 sentences (= the arithmetical UD). The provability predicate is just a universal \Sigma_1 sentence. (I recall a \Sigma_1 sentence is a sentence equivalent to (Ex)P(x) with P algorithmically decidable.)

Unspeakable means beyond formal definition.

Beyond Turing nameability? Beyond formal definition by whom? You know the universal sound machine has no means to define its own truth predicate (Tarski). Is it in that sense? The knowledge of p by the machine can be defined (in a first approximation) by p & Bew('p), in which case, although thoroughly clear at the metalevel, the knowledge *by* the machine is beyond any formal definition *for* the machine. Bew = Godel provability predicate, 'p = the godel number of p. Bew(x) = (Ey)B(y,x), where B(y,x) means that y is the godel number of a proof of a sentence with godel number x.

In particular, the way we humans communicate and agree on a common formal language is not formally defined.

OK. Most importantly, the very meaning of "finite" or "number" cannot be formally defined. But I show that insofar as we are sound universal machines, our own knowledgeability predicate cannot be formally defined by us.

You write x and I say, ok, it's an x, but that's all based on some sort of informal agreement that involves complex pattern recognition and learning processes. Usually we do not question the validity of outcomes of such cognitive processes, and just go ahead with formal derivations.

Yes, but the situation is worse for the very *meaning* of our utterances.

But there is this major informal step *before* formal reasoning can start, and good textbooks on logic acknowledge this.

There is *this* one, but there is a deeper one with the *meaning* of our formal presentations. We really bet we share a common intuition on the notion of finitary procedure. Note that intuitionists and classicists diverge beyond the natural numbers. We are not on the same wavelength, Juergen. You believe there is a computable universe. I am agnostic about that. But if I survive (or just experience nothing) through a digital functional substitution made at some level L, then my experience is defined by the set of all consistent extensions defined by all consistent histories definable below L. With comp, realities emerge from the way consistent machines' memories organise themselves (through the many, many histories going through their states). Your speed prior would be useful if it shows how it enhances the weight of normal stories in the limit of first-person experiences (due to unawareness of the reconstitution delays). Note that quantum quantization of classical theories provides such an explanation. I show comp needs to extract that quantization from a measure on the consistent extensions. I show the probability one obeys the main axioms of quantum logic. You ask me often why I give so much importance to the notion of provability.
The reason is that in some sense I just interview scientific machines, as they can appear in the GP's work, about what they are able to prove about their own consistent extensions, and how and why they can arrive at probability considerations. And provability is nontrivial enough to provide clues on the minimal necessary logical structures on that set of consistent extensions. Bruno
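To make the "arithmetical UD" talk above concrete, here is a minimal Python sketch of a Universal Dovetailer, assuming nothing from Bruno's or Schmidhuber's actual constructions: it interleaves the execution of all programs so that every computation, halting or not, receives unboundedly many steps. The Collatz-like toy programs are an arbitrary stand-in for a real universal machine's enumeration.

# A toy Universal Dovetailer: in phase d, start program d and advance
# every already-started program by one step, so no computation is ever
# starved -- even the nonhalting ones. Purely illustrative.
from itertools import count

def program(n):
    """Stand-in for the n-th program: yields its successive states."""
    x = n
    while True:              # many of these never halt; the UD doesn't care
        yield x
        x = (3 * x + 1) // 2 if x % 2 else x // 2

def dovetail(max_outputs=15):
    running = {}
    seen = 0
    for phase in count():
        running[phase] = program(phase)     # start a new program each phase
        for n in range(phase + 1):          # one more step for each old one
            state = next(running[n])
            print(f"phase {phase}: program {n} -> state {state}")
            seen += 1
            if seen >= max_outputs:
                return

dovetail()

Generating the proofs of all \Sigma_1 sentences can be dovetailed in exactly the same diagonal fashion, which is the equivalence appealed to above.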
Re: Predictions & duplications
From: Russell Standish [EMAIL PROTECTED] The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. This far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution.

Russell, this is really too much; please read any intro to measure theory, e.g., the one by Halmos. Measures in general are _not_ probability distributions. They are more than that; you can view them as _sets_ of probability distributions on sets that are related to each other in a specific way. Probability density functions define measures and therefore in general aren't probability distributions either. The uniform PDF on [0..1] yields the perfect trivial example; that's exactly why I used it before. In the computability context, compare Li/Vitanyi 1997, chapter 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 (continuous sample space with (semi)measure M(x)).

This is an excellent reference, thank you. Example 4.2.1 gives the construction of the uniform measure over the set of strings, precisely what you claim didn't exist when we started this discussion.

You just want to annoy me, don't you? For the last time: There is a uniform *measure* on the strings, but no uniform probability distribution.

1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs

You mean, given a uniform measure? But why should you need the halting theorem for this trivial statement? In fact, what exactly do you mean by halting theorem? (Schoenfield's book provides a good intro to mathematical logic and computability theory.)

The halting theorem is that there is no algorithm for predicting whether any program will halt. Of course, the result I was referring to is that the set of compressible strings is of measure zero. The set of halting programs actually has finite measure, since \Omega > 0. However, there is still finite probability of an input string being nonhalting, and indeed the probability seems to be greater than 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still presents a problem for computationalism.

This is mixing up everything, in particular, different measures. 1) Infinite strings with finite programs have *nonzero* measure, given the *universal* measure (Omega is a number derived from the universal measure). 2) They have measure zero, given the *uniform* measure (Omega has nothing to do with it). 3) To see 2) we do not need the halting theorem - we just compare the finite programs (countably many) to all strings (uncountably many) and we are done. 4) The halting theorem is no problem whatsoever when it comes to computable universes. Why care whether some computable universe stops or not? If it does, it does, otherwise it does not. In both cases we can compute it in the limit.

The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, that simply exist without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few 100 bits. The simplest prior is the uniform one, ie there is no bias whatsoever in the selection.

Exist without being generated? Precisely zero information or complexity? But with respect to what?

With respect to the observer. With respect to anything, in fact.
The set of all possible descriptions has precisely zero information, by definition, regardless of what observer or context you employ.

This is a vague informal statement, not a definition. Which are the possible descriptions? What's the formal difference between a description and a non-description? What is your "anything" if there is no device for describing it formally? If "anything" does not convey any bit of info, then what exactly is it that conveys one bit? Or two? Any traditional definition of information/simplicity requires the specification of a formal description method, such as a Turing machine. You don't need one? Then how do you define information or complexity? How exactly do you define "simple"?

Actually, all it needs is an equivalencing procedure. See my "On Complexity and Emergence" paper. A Turing machine will do the job for you, but so will a neural network or an expert system following heuristic rules.

So you have to write down formally this equivalencing procedure, right? Why should this be different in principle from writing down the formal rules for a Turing machine? Why is a simplicity measure that depends on your equivalencing procedure superior to a simplicity measure depending on a Turing machine? (The TM procedure includes yours - because it includes all procedures - but not the other way round.)
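Since the exchange keeps circling around what an "equivalencing procedure" would formally be, here is a hedged Python sketch, invented for illustration rather than taken from Standish's paper: an observer maps all depth-n prefixes of descriptions into classes, each class inherits a probability from the uniform cylinder measure, and its complexity is -log2 of that weight. The parity classifier is an arbitrary stand-in for any such procedure.

# Toy "equivalencing procedure": classify length-n prefixes, weight each
# class by the uniform measure 2^-n per cylinder, and read off complexity
# as -log2(weight). The classifier itself is invented for this example.
from itertools import product
from math import log2

def classify(prefix):                 # hypothetical observer: parity of ones
    return sum(prefix) % 2

n = 10                                # cylinder depth; arbitrary choice
weight = {}
for prefix in product((0, 1), repeat=n):
    c = classify(prefix)
    weight[c] = weight.get(c, 0.0) + 2.0 ** -n

for c, w in sorted(weight.items()):
    print(f"class {c}: measure {w:.4f}, complexity {-log2(w):.4f} bits")

Coarser classifiers produce fewer, heavier classes and hence lower complexities, which is the intuition both sides are arguing over; Juergen's point stands that the classifier still has to be written down formally.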
Re: Predictions & duplications
Juergen Schmidhuber wrote: From: Russell Standish [EMAIL PROTECTED] The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. This far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution.

Russell, this is really too much; please read any intro to measure theory, e.g., the one by Halmos. Measures in general are _not_ probability distributions. They are more than that; you can view them as _sets_ of probability distributions on sets that are related to each other in a specific way. Probability density functions define measures and therefore in general aren't probability distributions either. The uniform PDF on [0..1] yields the perfect trivial example; that's exactly why I used it before. In the computability context, compare Li/Vitanyi 1997, chapter 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 (continuous sample space with (semi)measure M(x)).

This is an excellent reference, thank you. Example 4.2.1 gives the construction of the uniform measure over the set of strings, precisely what you claim didn't exist when we started this discussion.

You just want to annoy me, don't you? For the last time: There is a uniform *measure* on the strings, but no uniform probability distribution.

Who is annoying who? I never claimed a uniform probability density (I actually looked up the difference between probability density and probability distribution yesterday in Li and Vitanyi) - only a uniform measure. In fact, my first comment to you was "Why do you insist on it being a PDF?". Remember? The simplest prior is the uniform measure on infinite strings.

1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs

You mean, given a uniform measure? But why should you need the halting theorem for this trivial statement? In fact, what exactly do you mean by halting theorem? (Schoenfield's book provides a good intro to mathematical logic and computability theory.)

The halting theorem is that there is no algorithm for predicting whether any program will halt. Of course, the result I was referring to is that the set of compressible strings is of measure zero. The set of halting programs actually has finite measure, since \Omega > 0. However, there is still finite probability of an input string being nonhalting, and indeed the probability seems to be greater than 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still presents a problem for computationalism.

This is mixing up everything, in particular, different measures. 1) Infinite strings with finite programs have *nonzero* measure, given the *universal* measure (Omega is a number derived from the universal measure). 2) They have measure zero, given the *uniform* measure (Omega has nothing to do with it).

The uniform measure and the universal measure are related. The uniform prior is on the set of input strings to a UTM; the universal measure is the induced measure on the output strings. No. 1) here is the relevant statement, not 2).

3) To see 2) we do not need the halting theorem - we just compare the finite programs (countably many) to all strings (uncountably many) and we are done. 4) The halting theorem is no problem whatsoever when it comes to computable universes. Why care whether some computable universe stops or not? If it does, it does, otherwise it does not. In both cases we can compute it in the limit.
The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, that simply exist without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few 100 bits. The simplest prior is the uniform one, ie there is no bias whatsoever in the selection.

Exist without being generated? Precisely zero information or complexity? But with respect to what?

With respect to the observer. With respect to anything, in fact.

The set of all possible descriptions has precisely zero information, by definition, regardless of what observer or context you employ.

This is a vague informal statement, not a definition. Which are the possible descriptions? What's the formal difference between a description and a non-description? What is your "anything" if there is no device for describing it formally? If "anything" does not convey any bit of info, then what exactly is it that conveys one bit? Or two?

Choose an alphabet (a finite set of symbols). The set of all descriptions is the set of all infinite length strings constructed from that alphabet. I would suggest the notation A*, where A is the alphabet, but I think this usually refers to the set of finite length
Re: Predictions & duplications
Juergen Schmidhuber wrote: From: Russell Standish [EMAIL PROTECTED] The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. This far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution.

Russell, this is really too much; please read any intro to measure theory, e.g., the one by Halmos. Measures in general are _not_ probability distributions. They are more than that; you can view them as _sets_ of probability distributions on sets that are related to each other in a specific way. Probability density functions define measures and therefore in general aren't probability distributions either. The uniform PDF on [0..1] yields the perfect trivial example; that's exactly why I used it before. In the computability context, compare Li/Vitanyi 1997, chapter 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 (continuous sample space with (semi)measure M(x)).

This is an excellent reference, thank you. Example 4.2.1 gives the construction of the uniform measure over the set of strings, precisely what you claim didn't exist when we started this discussion.

1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs

You mean, given a uniform measure? But why should you need the halting theorem for this trivial statement? In fact, what exactly do you mean by halting theorem? (Schoenfield's book provides a good intro to mathematical logic and computability theory.)

The halting theorem is that there is no algorithm for predicting whether any program will halt. Of course, the result I was referring to is that the set of compressible strings is of measure zero. The set of halting programs actually has finite measure, since \Omega > 0. However, there is still finite probability of an input string being nonhalting, and indeed the probability seems to be greater than 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still presents a problem for computationalism.

The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, that simply exist without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few 100 bits. The simplest prior is the uniform one, ie there is no bias whatsoever in the selection.

Exist without being generated? Precisely zero information or complexity? But with respect to what?

With respect to the observer. With respect to anything, in fact.

The set of all possible descriptions has precisely zero information, by definition, regardless of what observer or context you employ.

Any traditional definition of information/simplicity requires the specification of a formal description method, such as a Turing machine. You don't need one? Then how do you define information or complexity? How exactly do you define "simple"?

Actually, all it needs is an equivalencing procedure. See my "On Complexity and Emergence" paper. A Turing machine will do the job for you, but so will a neural network or an expert system following heuristic rules. Perhaps the simplest formal approach is to say that the equivalencing procedure is a function from the set of descriptions onto the integers (labelling the equivalence classes). If that function is partial recursive, then the equivalencing mechanism can be replaced by the appropriate Turing machine.
Computationalism is the creed that this function must be partial recursive, but it is not necessary for the definition of information or complexity. I can see some subtle problems occurring if this equivalencing procedure is stochastic in nature; however, by summing over the appropriate distributions, one should still end up with meaningful results, provided the variance is not too large.

Cheers, Russell Standish
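Russell's proposal in the last post can be stated in one line; the following LaTeX fragment is our notation, not his, and is only a compact restatement of what the post says.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $f \colon \{0,1\}^{\infty} \to \mathbb{N}$ be the observer's
equivalencing map, and let $\mu$ be the uniform (product) measure on
infinite strings. The induced probability of equivalence class $i$, and
its complexity, are
\[
  P(i) = \mu\bigl(f^{-1}(i)\bigr), \qquad C(i) = -\log_2 P(i).
\]
Computationalism, in the sense used above, adds the assumption that $f$
is partial recursive, i.e.\ realizable by a Turing machine.
\end{document}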
Re: Predictions & duplications
From: Juho Pennanen [EMAIL PROTECTED] So there may be no 'uniform probability distribution' on the set of all strings, but there is the natural probability measure, that is in many cases exactly as useful.

Sure, I agree, measures are useful; I'm using them all the time. But in general they are _not_ the same thing as probability distributions.

From: Russell Standish [EMAIL PROTECTED] The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. This far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution.

Russell, this is really too much; please read any intro to measure theory, e.g., the one by Halmos. Measures in general are _not_ probability distributions. They are more than that; you can view them as _sets_ of probability distributions on sets that are related to each other in a specific way. Probability density functions define measures and therefore in general aren't probability distributions either. The uniform PDF on [0..1] yields the perfect trivial example; that's exactly why I used it before. In the computability context, compare Li/Vitanyi 1997, chapter 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 (continuous sample space with (semi)measure M(x)).

1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs

You mean, given a uniform measure? But why should you need the halting theorem for this trivial statement? In fact, what exactly do you mean by halting theorem? (Schoenfield's book provides a good intro to mathematical logic and computability theory.)

The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, that simply exist without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few 100 bits. The simplest prior is the uniform one, ie there is no bias whatsoever in the selection.

Exist without being generated? Precisely zero information or complexity? But with respect to what? Any traditional definition of information/simplicity requires the specification of a formal description method, such as a Turing machine. You don't need one? Then how do you define information or complexity? How exactly do you define "simple"?
Re: Predictions & duplications
Schmidhuber: It's the simplest thing, given this use of mathematical language we have agreed upon. But here the power of the formal approach ends - unspeakable things remain unspoken.

Marchal: I disagree. I would even say that it is here that the serious formal approach begins. Take "unprovable" for "unspeakable": we can meta-reason (informally or formally) and study the logical structure of the set of unprovable sentences by sound universal machines.

Schmidhuber: Unprovable does not mean unspeakable! You can spell out many unprovable things. Why care for the subset of provable sentences? Aren't we interested in the full set of all describable sentences? We can generate it, without caring for proofs at all. Unspeakable means beyond formal definition. In particular, the way we humans communicate and agree on a common formal language is not formally defined. You write x and I say, ok, it's an x, but that's all based on some sort of informal agreement that involves complex pattern recognition and learning processes. Usually we do not question the validity of outcomes of such cognitive processes, and just go ahead with formal derivations. But there is this major informal step *before* formal reasoning can start, and good textbooks on logic acknowledge this.
Re: Predictions & duplications
juergen wrote: Russell, at the risk of beating a dead horse: a uniform measure is _not_ a uniform probability distribution. Why were measures invented in the first place? To deal with infinite sets. You cannot have a uniform probability distribution on infinitely many things.

The last sentence is trivially true when 'probability distributions' are defined as probability measures on discrete sets. But someone could also think this is just irrelevant word-play on definitions. There are uniform probability (density) functions on infinite sets, e.g. the uniform distribution on the interval [0,1], which gives the measure (probability) t for each interval [x,x+t]. Obviously, this distribution gives measure 0 for any singleton containing only one real number. Still it's uniform, though not a 'probability distribution' if one wishes to use the most restricted definition.

Similarly, there is the natural probability measure on the set of all infinite strings of 0's and 1's. It is 'uniform' in the sense that no string or place in a string is in a privileged position. To define it, set n_0 as the set of all strings that have a 0 at the n'th place and n_1 as the set of all strings that have a 1 at the n'th place, for any n. Then set m(n_0)=m(n_1)=1/2, for all n. Using the standard definition of measure that has been cited on this list (Kolmogorov axioms), m has a unique extension which is a measure on the set of all strings. So there may be no 'uniform probability distribution' on the set of all strings, but there is the natural probability measure, that is in many cases exactly as useful. Juho
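Juho's extension construction can be sanity-checked numerically: under the product measure the cylinder set of all strings beginning with a fixed prefix x has measure 2^-len(x), all depth-n cylinders together have measure 1, and the consistency condition mu([x]) = mu([x0]) + mu([x1]) holds. A small brute-force check in Python (the depth n = 4 is an arbitrary choice):

# Verify additivity of the product ("natural") measure on cylinder sets.
from itertools import product

def cylinder_measure(prefix):
    """Measure of all infinite strings starting with this prefix."""
    return 2.0 ** -len(prefix)

n = 4
prefixes = list(product("01", repeat=n))
total = sum(cylinder_measure(p) for p in prefixes)
print(f"sum over all {2**n} depth-{n} cylinders = {total}")   # exactly 1.0

# Consistency: the cylinder [x] splits into [x0] and [x1].
x = ("0", "1", "1")
assert cylinder_measure(x) == cylinder_measure(x + ("0",)) + cylinder_measure(x + ("1",))
print("mu([x]) = mu([x0]) + mu([x1]) holds for x =", "".join(x))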
Re: Predictions & duplications
From: Russell Standish [EMAIL PROTECTED] To: [EMAIL PROTECTED] I think we got into this mess debating whether an infinite set could support a uniform measure. I believe I have demonstrated this. I've yet to see anything that disabuses me of the notion that a probability distribution is simply a measure that has been normalised to 1. Not all measures are even normalisable.

Russell, at the risk of beating a dead horse: a uniform measure is _not_ a uniform probability distribution. Why were measures invented in the first place? To deal with infinite sets. You cannot have a uniform probability distribution on infinitely many things. That's why measures isolate just finitely many things, say, every bitstring of size n, and for each x of size n look at the infinite set of strings starting with x. A uniform measure assigns equal probability to each such set. Of course, then you have a uniform probability distribution on those finitely many things, which are sets. But that's not a uniform probability distribution on infinitely many things, e.g., on the bitstrings themselves! The measure above is _not_ a probability distribution; it is an infinite _set_ of _finite_ probability distributions, one for string size 0, one for string size 1, one for string size 2, ...

I realise that the Halting theorem gives problems for believers of computationalism.

It does not. Why should it?

I never subscribed to computationalism at any time, but at this stage do not reject it. I could conceive of us living in a stupendous virtual reality system, which is in effect what your GP religion Mark II is. However, as pointed out by others, it does suffer from turtle-itis, and should not be considered the null hypothesis. It requires evidence for belief.

By turtle-itis you mean: in which universe do the GP and his computer reside? Or the higher-level GP2 which programmed GP? And so on? But we cannot get rid of this type of circularity - computability and mathematical logic are simply taken as given things, without any further explanation, like a religion. The computable multiverse, or the set of logically possible or mathematically possible or computable universes, represents the simplest explanation we can write down formally. But what exactly does it mean to accept something as a formal statement? What does it mean to identify the messy retina patterns caused by this text with abstract mathematical symbols such as x and y? All formal explanations in our formal papers assume that we agree on how to read them. But reading and understanding papers is a complex physical and cognitive process. So all our formal explanations are relative to this given process which we usually do not even question. Essentially, the GP program is the simplest thing we can write down, relative to the unspoken assumption that it is clear what it means to write something down, and how to process it. It's the simplest thing, given this use of mathematical language we have agreed upon. But here the power of the formal approach ends - unspeakable things remain unspoken.

Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
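Juergen's distinction is easy to exhibit in a few lines: the "uniform measure" amounts to a family of finite uniform distributions, one per prefix length, each summing to 1, while no single such distribution exists over the infinite strings themselves. A minimal Python sketch of the family:

# The uniform measure on bitstrings, seen as one finite uniform
# distribution per prefix length n -- not a distribution over the
# (uncountably many) infinite strings themselves.
from itertools import product

def uniform_distribution(n):
    """Finite uniform distribution over the 2^n prefixes of length n."""
    return {"".join(p): 2.0 ** -n for p in product("01", repeat=n)}

for n in range(4):
    d = uniform_distribution(n)
    print(f"n={n}: {len(d)} atoms of weight {2.0 ** -n}, total {sum(d.values())}")

Each of these finite distributions is consistent with the next (split every atom in two), which is precisely why they assemble into one measure but not into one distribution.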
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: From: Russell Standish [EMAIL PROTECTED] To: [EMAIL PROTECTED] I think we got into this mess debating whether an infinite set could support a uniform measure. I believe I have demonstrated this. I've yet to see anything that disabuses me of the notion that a probability distribution is simply a measure that has been normalised to 1. Not all measures are even normalisable.

Russell, at the risk of beating a dead horse: a uniform measure is _not_ a uniform probability distribution. Why were measures invented in the first place? To deal with infinite sets. You cannot have a uniform probability distribution on infinitely many things. That's why measures isolate just finitely many things, say, every bitstring of size n, and for each x of size n look at the infinite set of strings starting with x. A uniform measure assigns equal probability to each such set. Of course, then you have a uniform probability distribution on those finitely many things which are sets. But that's not a uniform probability distribution on infinitely many things, e.g., on the bitstrings themselves! The measure above is _not_ a probability distribution; it is an infinite _set_ of _finite_ probability distributions, one for string size 0, one for string size 1, one for string size 2, ...

Good grief, you've missed the point entirely! Measures do measure infinite sets - consider the set [0,1] - it is an infinite set, and the uniform probability distribution and uniform measure are the same thing. We can also consider the set [0,\infty), which has a uniform measure on it (the same constant probability distribution as before, but extended to the full set). Clearly it cannot have a uniform probability distribution, as the set has infinite measure. You seem fixated on the peculiar definition of measure on strings that you gave before, rather than the more general notion.

I realise that the Halting theorem gives problems for believers of computationalism.

It does not. Why should it?

Restating what I said before simply:
1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs.
2) Computationalism is the assumption that the universe's observer is a universal Turing machine.
3) Uniform measure over the Plenitude of descriptions implies that the observer will never see simple strings.

I never subscribed to computationalism at any time, but at this stage do not reject it. I could conceive of us living in a stupendous virtual reality system, which is in effect what your GP religion Mark II is. However, as pointed out by others, it does suffer from turtle-itis, and should not be considered the null hypothesis. It requires evidence for belief.

By turtle-itis you mean: in which universe do the GP and his computer reside? Or the higher-level GP2 which programmed GP? And so on? But we cannot get rid of this type of circularity - computability and mathematical logic are simply taken as given things, without any further explanation, like a religion. The computable multiverse, or the set of logically possible or mathematically possible or computable universes, represents the simplest explanation we can write down formally. But what exactly does it mean to accept something as a formal statement? What does it mean to identify the messy retina patterns caused by this text with abstract mathematical symbols such as x and y? All formal explanations in our formal papers assume that we agree on how to read them.
But reading and understanding papers is a complex physical and cognitive process. So all our formal explanations are relative to this given process which we usually do not even question. Essentially, the GP program is the simplest thing we can write down, relative to the unspoken assumption that it is clear what it means to write something down, and how to process it. It's the simplest thing, given this use of mathematical language we have agreed upon. But here the power of the formal approach ends - unspeakable things remain unspoken.

The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, that simply exist without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few 100 bits. The simplest prior is the uniform one, ie there is no bias whatsoever in the selection. The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. This far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution. The set of all descriptions and the uniform measure do not suffer from turtle-itis. However, they are susceptible to criticism of the Church-Turing thesis - namely why should our idea of information in terms of strings of symbols from a discrete alphabet have
Re: Predictions & duplications
From: [EMAIL PROTECTED]: [EMAIL PROTECTED] wrote: From [EMAIL PROTECTED]: [EMAIL PROTECTED] wrote: The measure M: M(empty string)=1, M(x) = M(x0)+M(x1), nonnegative for all finite x.

This sounds more like a probability distribution than a measure. In the set of all descriptions, we only consider infinite length bitstrings. Finite length bitstrings are not members. However, we can identify finite length bitstrings with subsets of descriptions. The empty string corresponds to the full set of all descriptions, so the first line M(empty string)=1 implies that the measure is normalisable (ie a probability distribution).

Please check out the definitions of measure and distribution! Normalisability is not the critical issue. Clearly: Sum_x M(x) is infinite. So M is not a probability distribution. M(x) is just the measure of all strings starting with x: M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ...

For a measure to be normalisable, the sum over a *disjoint* set of subsets must be finite. If the set of subsets is not disjoint (ie the intersections are not empty) then the sum may well be infinite. Bringing this to the case of finite strings: each finite string is actually a subset of the set of infinite strings, each containing the same finite prefix. So the string 101010 is actually a subset of 10101, and so on. The Sum_x M(x), where I assume x ranges over all strings, will of course be infinite. However, since the set of finite strings is not disjoint, this doesn't imply M(x) is unnormalisable. Now when you realise that every finite string x is a subset of the empty string, it becomes clear that M(x) is normalised to precisely 1.

The point is: prob dists and measures are different things. There is a good reason for giving them different names. Prob dists assign numbers to individual objects, not to sets. See the traditional definitions, as well as those for semimeasures, in http://www.idsia.ch/~juergen/toesv2/

Neglecting finite universes means loss of generality though. Hence measures mu(x) in the ATOE paper do not neglect finite x: mu(empty string)=1, mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). And here P is a probability distribution indeed! P(x) > 0 possible only for x with finite description.

The P(x) > 0 case above actually breaks the countably subadditive property, so mu(x) cannot be called a measure... I'm not entirely sure what you're getting at here.

The set of computable universes includes the finite ones which we may not ignore. How do they contribute to the measure of all universes? For convenience introduce a third symbol $ besides 0 and 1. Now what is the measure of the set of all universes starting with 00110? It's the sum of the measure of the set of universes starting with 001101, and the measure of the set of universes starting with 001100, and the measure of the single universe 00110$ that stops right after 00110. Once there is a $ symbol there won't be another 0 or 1. So mu(x) = P(x)+mu(x0)+mu(x1). Our favorite example is the universal Turing machine with random inputs. For finite x: P(x) is the probability that the TM output is x and nothing but x. mu(x) is something different, namely, the so-called universal measure of all strings _starting_ with x. Again mu(x) = P(x)+mu(x0)+mu(x1).

Juergen Schmidhuber http://www.idsia.ch/~juergen/
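A toy model, not Juergen's actual universal measure: suppose that after printing each bit a universe halts (prints $) with probability h, else continues with a fair bit. Then P(x) = h*((1-h)/2)^len(x) and mu(x) = ((1-h)/2)^len(x), and the recursion mu(x) = P(x) + mu(x0) + mu(x1) can be verified directly. The value h = 0.3 is arbitrary.

# Check mu(x) = P(x) + mu(x0) + mu(x1) in a halt-with-probability-h toy model.
h = 0.3

def mu(x):
    """Measure of all universes whose history starts with bitstring x."""
    return ((1 - h) / 2) ** len(x)

def P(x):
    """Probability of the finite universe that prints x and then halts."""
    return mu(x) * h

for x in ["", "0", "00110"]:
    lhs = mu(x)
    rhs = P(x) + mu(x + "0") + mu(x + "1")
    print(f"x={x!r}: mu={lhs:.6f}, P + mu0 + mu1 = {rhs:.6f}")
    assert abs(lhs - rhs) < 1e-12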
Re: Predictions & duplications
Juergen wrote (on 12th Oct): . . . In most possible futures your computer will vanish within the next second. But it does not. This indicates that our future is _not_ sampled from a uniform prior.

I don't wish to comment directly on the computer-vanishing problem as it applies to Juergen's scheme (my own problem with this laudable scheme is that it appears to be vulnerable to the same 'turtle-itis' criticism as for all theistic religions - the (literal or abstract GP) 'turtle' responsible for the world seems to need another turtle to support it, and so on - there is no full explanation), but I would like to say that certain other proposed solutions don't suffer from this computer-vanishing problem (also known as the WR/dragon problem), if one thinks of infinite length bit strings / formal system descriptions via Limit n -> infinity, where n is the relevant string/description length (see appendix below). It seems to me that in thinking in simple infinity terms one can lose essential information (for example integers cannot be determined to be more numerous than the odd numbers - both are of the lowest order of infinity - not a problem for limit analysis). Alastair Malcolm

APPENDIX

One might naively think that as there are at least hundreds of possible states (call them V1, V2...) where some part of our computer clearly vanishes (and only one (N) where normality prevails), then even if one considers bit string or other types of formal description involving other variations in our universe or indeed other universes, one could still 'divide through' to find that we are most likely to be in a universe where our computer vanishes in whole or part. However, we note that in any minimal description of our universe (and the following argument does not depend on there having to *only* be a minimal description), deviations from the actual physical laws causing V1, V2... will involve additional ad-hoc rules/events, so we can say that in complexity terms (whether as a bit string or formal system description) we will have V1 = N + VE1, V2 = N + VE2 etc, where VE1, VE2 etc are the extra segments of description required to cater for the part-vanishings (strictly: Length(V1) = Length(N) + Length(VE1) etc, but hopefully this is clear). Moreover, if we are considering all possible descriptions, we also have to allow for extra descriptions corresponding to entities beyond our observability. For each V = N + VE we will have many DC = N + DCE, where DC is a 'don't care' description - it is a universe (or set of universes) indistinguishable by us from N (our own, non-computer-vanishing one), yet objectively different. Now, the key point is that in any objective descriptive framework (whether by bit string or formal system), one should take the Limit as the description length increases to infinity, not as our (humanly biased) visible universe is progressively and unobservably added to (say by other universes). As we do this, we are far more likely to be in a DC (= N + DCE) universe than a V (= N + VE) universe: computers don't normally vanish, in whole or in part.

More details at: http://www.physica.freeserve.co.uk/p105.htm linking to: http://www.physica.freeserve.co.uk/p111.htm http://www.physica.freeserve.co.uk/p112.htm
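Malcolm's counting argument can be rendered as a toy calculation under one invented assumption: a vanishing variant V requires a specific k-bit extra segment VE, while a 'don't care' variant DC accepts any extra bits. Among all length-(|N|+m) descriptions extending N, the vanishing ones then form a 2^-k fraction, independent of m, so the limit over description length leaves them rare. The value k = 8 is purely illustrative.

# Toy version of the appendix argument: the fraction of N-extensions that
# encode the specific ad-hoc segment VE stays 2^-k as the total description
# length grows, so "vanishing" universes remain rare in the limit.
def fraction_vanishing(k, m):
    """Fraction of the 2^m length-m extensions whose first k bits match VE."""
    assert m >= k
    return 2 ** (m - k) / 2 ** m     # = 2^-k, independent of m

for m in (10, 20, 40):
    print(f"m={m:>2}: vanishing fraction = {fraction_vanishing(8, m)}")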
Re: Predictions & duplications
[EMAIL PROTECTED] wrote: From [EMAIL PROTECTED]: [EMAIL PROTECTED] wrote: The measure M: M(empty string)=1, M(x) = M(x0)+M(x1), nonnegative for all finite x.

This sounds more like a probability distribution than a measure. In the set of all descriptions, we only consider infinite length bitstrings. Finite length bitstrings are not members. However, we can identify finite length bitstrings with subsets of descriptions. The empty string corresponds to the full set of all descriptions, so the first line M(empty string)=1 implies that the measure is normalisable (ie a probability distribution).

Please check out the definitions of measure and distribution! Normalisability is not the critical issue. Clearly: Sum_x M(x) is infinite. So M is not a probability distribution. M(x) is just the measure of all strings starting with x: M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ...

For a measure to be normalisable, the sum over a *disjoint* set of subsets must be finite. If the set of subsets is not disjoint (ie the intersections are not empty) then the sum may well be infinite. Bringing this to the case of finite strings: each finite string is actually a subset of the set of infinite strings, each containing the same finite prefix. So the string 101010 is actually a subset of 10101, and so on. The Sum_x M(x), where I assume x ranges over all strings, will of course be infinite. However, since the set of finite strings is not disjoint, this doesn't imply M(x) is unnormalisable. Now when you realise that every finite string x is a subset of the empty string, it becomes clear that M(x) is normalised to precisely 1.

Neglecting finite universes means loss of generality though. Hence measures mu(x) in the ATOE paper do not neglect finite x: mu(empty string)=1, mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). And here P is a probability distribution indeed! P(x) > 0 possible only for x with finite description.

The P(x) > 0 case above actually breaks the countably subadditive property, so mu(x) cannot be called a measure... I'm not entirely sure what you're getting at here.

Cheers, Russell Standish
Re: Predictions & duplications
From [EMAIL PROTECTED]: [EMAIL PROTECTED] wrote: The measure M: M(empty string)=1, M(x) = M(x0)+M(x1), nonnegative for all finite x.

This sounds more like a probability distribution than a measure. In the set of all descriptions, we only consider infinite length bitstrings. Finite length bitstrings are not members. However, we can identify finite length bitstrings with subsets of descriptions. The empty string corresponds to the full set of all descriptions, so the first line M(empty string)=1 implies that the measure is normalisable (ie a probability distribution).

Please check out the definitions of measure and distribution! Normalisability is not the critical issue. Clearly: Sum_x M(x) is infinite. So M is not a probability distribution. M(x) is just the measure of all strings starting with x: M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ...

Neglecting finite universes means loss of generality though. Hence measures mu(x) in the ATOE paper do not neglect finite x: mu(empty string)=1, mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). And here P is a probability distribution indeed! P(x) > 0 possible only for x with finite description.

Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Predictions & duplications
Hal Finney wrote: Isn't this fixed by saying that the uniform measure is not over all universe histories, as you have it above, but over all programs that generate universes? Now we have the advantage that short programs generate more regular universes than long ones, and the WAP grows teeth.

I agree with Juergen on this: there is no uniform measure over all programs (nor on all integers). But in any case, you know I argue, unlike Juergen, that the measure should be put on all (relative) consistent computational extensions appearing in UD*. This gives indeed a sort of Weak Turing-tropic Principle, in which relative *speed* is a by-product, not a prior. Bruno
Re: Predictions & duplications
Hal - that is not a uniform measure!

[EMAIL PROTECTED] wrote: Juergen Schmidhuber writes: But there is no uniform prior over all programs! Just like there is no uniform prior over the integers. To see this, just try to write one down. I think there is. Given a program of length l, the prior probability is 2^(-l). (That is 2 to the power of negative l.) The length of a program is defined by interpreting it using self-delimiting rules as is customary in the AIT analysis of Greg Chaitin. Hal Finney

Russell Standish
Re: Predictions & duplications
Hal Finney wrote: Juergen Schmidhuber writes: But there is no uniform prior over all programs! Just like there is no uniform prior over the integers. To see this, just try to write one down. I think there is. Given a program of length l, the prior probability is 2^(-l). (That is 2 to the power of negative l.) The length of a program is defined by interpreting it using self-delimiting rules as is customary in the AIT analysis of Greg Chaitin.

This doesn't seem to be very uniform to me. Maybe you mean that with the above prior the probability for a bit, drawn randomly from the set of all programs, to be 1 is 1/2? Saibal
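Hal's 2^(-l) prior, and the objections to calling it uniform, can both be seen in a toy prefix-free machine invented here for illustration (it is not Chaitin's construction): feeding fair coin flips to it makes program p occur with probability 2^-l(p), so shorter programs are exponentially more likely, and because the code is deliberately incomplete a fixed fraction of inputs never halts - a miniature Omega < 1.

# Sample "programs" by fair coin flips through a prefix-free code.
# The code {1, 01, 001, 0001} is prefix-free but incomplete: inputs
# beginning 0000 never match a codeword, i.e. they never halt.
import random

code = ["1", "01", "001", "0001"]
omega = sum(2.0 ** -len(p) for p in code)
print("halting probability Omega =", omega)      # 0.9375 < 1

def run(rng, max_bits=10):
    """Read coin flips until a codeword appears; give up after max_bits."""
    s = ""
    while len(s) < max_bits:
        s += rng.choice("01")
        if s in code:
            return s
    return None                                   # a "nonhalting" input

rng = random.Random(0)
N = 100_000
counts = {}
for _ in range(N):
    p = run(rng)
    counts[p] = counts.get(p, 0) + 1

for p in code + [None]:
    expected = 2.0 ** -len(p) if p is not None else 1 - omega
    label = p if p is not None else "nonhalting"
    print(f"{label}: empirical {counts.get(p, 0) / N:.4f}, expected {expected:.4f}")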
Re: Predictions & duplications
That is almost the correct solution, Hal. If we ask what an observer will make of a random description chosen at random, then you get regular universes with probability exponentially related to the inferred complexity. It is far clearer to see what happens when the observer is a UTM, forcibly terminating programs after a certain number of steps (representing the observer's resource bound) (thus all descriptions are halting programs). Then one obtains a Solomonoff-Levin distribution, or universal prior. However, this argument also works when the observer is not a UTM, but simply a classification device of some kind. The WAP has nothing to do with this issue, except inasmuch as universes can only be observed through the eyes of some observer. Again I reiterate that Juergen's resource-bounded Great Programmer religion need be nothing but a reflection of our conscious selves stamped upon our observations. Cheers

[EMAIL PROTECTED] wrote: Juergen writes: Some seem to think that the weak anthropic principle explains the regularity. The argument goes like this: Let there be a uniform measure on all universe histories, represented as bitstrings. Now take the tiny subset of histories in which you appear. Although the measure of this subset is tiny, its conditional measure, given your very existence, is not: According to the weak anthropic principle, the conditional probability of finding yourself in a regular universe compatible with your existence equals 1. But it is essential to see that the weak anthropic principle does not have any predictive power at all. It does not tell you anything about the future. It cannot explain away futures in which you still exist but irregular things happen. Only a nonuniform prior can explain this.

Isn't this fixed by saying that the uniform measure is not over all universe histories, as you have it above, but over all programs that generate universes? Now we have the advantage that short programs generate more regular universes than long ones, and the WAP grows teeth. Hal Finney
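A hedged Monte Carlo of Russell's picture, with everything invented for illustration: a resource-bounded observer runs random descriptions through a tiny 2-bit-opcode machine, forcibly stopping after a fixed number of steps, and tallies what it sees. The machine is not a UTM and nothing here is from the papers under discussion; the point is only that the induced distribution over outputs is far from uniform, with heavily generated ("simpler") outputs dominating.

# Random tapes through a toy step-bounded machine; tally induced outputs.
# Opcodes: 00 emit 0, 01 emit 1, 10 repeat last symbol, 11 halt.
import random
from collections import Counter

def run(tape, max_steps=6):
    out, last = [], "0"
    it = iter(tape)
    for _ in range(max_steps):               # the observer's resource bound
        op = next(it, "1") + next(it, "1")   # exhausted tape reads as 1s
        if op == "00": last = "0"; out.append(last)
        elif op == "01": last = "1"; out.append(last)
        elif op == "10": out.append(last)    # repeat previous symbol
        else: break                          # "11" halts the program
    return "".join(out)

rng = random.Random(1)
tally = Counter(run("".join(rng.choice("01") for _ in range(12)))
                for _ in range(50_000))
for out, n in tally.most_common(8):
    print(f"{out or '(empty)'}: {n / 50_000:.4f}")

The most frequent outputs are short or repetitive, because many distinct tapes generate them - the equivalence-class effect Russell appeals to, in miniature.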
Re: Predictions & duplications
According to whim or taste implies a conscious entity performing choices according to a free will. This need not be the case. In my mind, random means selected without cause (or without procedure/algorithm). A lot has been written on randomness, and its problematic nature. I don't for one minute suggest I have anything new to say on the matter. Cheers

jamikes wrote: Dear Russell and Hal, this is a naive question, however it goes into the basics of your written explanations. How would YOU define random? I had this problem for a long long time and never got a satisfactory solution. In my (non Indo-European) language, Hungarian, there is no exact word for it; it is used as a term meaning "according to whim or taste" (tetszöleges) - which leaves open that I may not LIKE the 'random' in question. If you say: a sequence defying all rules, then it is not random, it is calculable. You have to consider all rules and cut them out. Is a series of 1... random? Of course it may be. Is the pi-decimal random? No, because it is a result of calculation. If one says: "just pick it", that would follow all circumstantial pressure of the situation, maybe unconsciously, yet defying the 'random'. So what is the content you use behind that word? I would be surprised if both of you (and others, too) would agree G. Till then I wish you luck to use the word - at random. Best wishes John Mikes

- Original Message - From: Russell Standish [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Sunday, October 14, 2001 4:36 AM Subject: Re: Predictions & duplications

SNIP That is almost the correct solution, Hal. If we ask what an observer will make of a random description chosen at random, then you get regular universes with probability exponentially related to the inferred complexity. It is far clearer to see what happens when the observer is a UTM, forcibly terminating programs after a certain number of steps (representing the observer's resource bound) (thus all descriptions are halting programs). Then one obtains a Solomonoff-Levin distribution, or universal prior. However, this argument also works when the observer is not a UTM, but simply a classification device of some kind. The WAP has nothing to do with this issue, except inasmuch as universes can only be observed through the eyes of some observer. Again I reiterate that Juergen's resource-bounded Great Programmer religion need be nothing but a reflection of our conscious selves stamped upon our observations. Cheers

[EMAIL PROTECTED] wrote: Juergen writes: Some seem to think that the weak anthropic principle explains the regularity. The argument goes like this: Let there be a uniform measure on all universe histories, represented as bitstrings. Now take the tiny subset of histories in which you appear. Although the measure of this subset is tiny, its conditional measure, given your very existence, is not: According to the weak anthropic principle, the conditional probability of finding yourself in a regular universe compatible with your existence equals 1. But it is essential to see that the weak anthropic principle does not have any predictive power at all. It does not tell you anything about the future. It cannot explain away futures in which you still exist but irregular things happen. Only a nonuniform prior can explain this.

Isn't this fixed by saying that the uniform measure is not over all universe histories, as you have it above, but over all programs that generate universes?
Now we have the advantage that short programs generate more regular universes than long ones, and the WAP grows teeth. Hal Finney
Re: Predictions & duplications
In reply to Russell Standish and Juho Pennanen, I'd just like to emphasize the main point, which is really trivial: by definition, a uniform measure on the possible futures makes all future beginnings of a given size equally likely. Then regular futures clearly are not any more likely than the irregular ones. I have no idea what makes Russell think this is debatable.

In most possible futures your computer will vanish within the next second. But it does not. This indicates that our future is _not_ sampled from a uniform prior.

Some seem to think that the weak anthropic principle explains the regularity. The argument goes like this: Let there be a uniform measure on all universe histories, represented as bitstrings. Now take the tiny subset of histories in which you appear. Although the measure of this subset is tiny, its conditional measure, given your very existence, is not: According to the weak anthropic principle, the conditional probability of finding yourself in a regular universe compatible with your existence equals 1. But it is essential to see that the weak anthropic principle does not have any predictive power at all. It does not tell you anything about the future. It cannot explain away futures in which you still exist but irregular things happen. Only a nonuniform prior can explain this.

Which nonuniform prior is plausible? My favorite is the resource-optimal Speed Prior. I am hoping and expecting that someone will confirm it soon by finding a rather fast pseudorandom generator responsible for apparently noisy events in our universe. But others may be more conservative and go for the more dominant enumerable Solomonoff-Levin prior mu^M, or maybe even for the nonenumerable prior mu^E (which dominates mu^M while still being computable in the limit), or maybe even for the extreme prior mu^G.

Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/

From [EMAIL PROTECTED] Thu Oct 11 17:36:41 2001 From: Juho Pennanen [EMAIL PROTECTED] I tried to understand the problem that doctors Schmidhuber and Standish are discussing by describing it in the most concrete terms I could, below. (I admit beforehand I couldn't follow all the details and do not know all the papers and theorems referred to, so this could be irrelevant.) So, say you are going to drop a pencil from your hand and trying to predict if it's going to fall down or up this time. Using what I understand with comp TOE, I would take the set of all programs that at some state implement a certain conscious state, namely the state in which you remember starting your experiment of dropping the pencil, and have already recorded the end result (I abbreviate this conscious state with CS. To be exact it is a set of states, but that shouldn't make a difference). The space of all programs would be the set of all programs in some language, coded as infinite numerable sequences of 0's and 1's. (I do not know how much the chosen language + coding has effect on the whole thing). Now for your prediction you need to divide the implementations of CS into two sets: those in which the pencil fell down and those in which it fell up. Then you compare the measures of those sets. (You would need to assume that each program is run just once or something of the sort. Some programs obviously implement CS several times when they run.
So you would maybe just include those programs that implement CS infinitely many times, and weight them with the density of CS occurrences during their running.) One way to derive the measure you need is to assume a measure on the set of all infinite sequences (i.e. on all programs). For this we have the natural measure, i.e. the product measure of the uniform measure on the set containing 0 and 1. And as far as my intuition goes, this measure would lead to the empirically correct prediction on the direction of the pencil falling. And if I understood it right, this is not too far from what Dr. Standish was claiming? And we wouldn't need any speed priors. But maybe the need of speed prior would come to play if I thought more carefully about the detailed assumptions involved? E.g. that each program would be run just once, with the same speed etc? I am not sure. Juho

Juho Pennanen, Department of Forest Ecology, University of Helsinki, http://www.helsinki.fi/people/juho.pennanen

Resent-Date: Thu, 11 Oct 2001 18:57:25 -0700 From: Russell Standish [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Huh? A PDF? You mean a probability density function? On a continuous set? Probability Distribution Function. And PDF's are defined on all
Re: Predictions & duplications
Juergen writes: Some seem to think that the weak anthropic principle explains the regularity. The argument goes like this: Let there be a uniform measure on all universe histories, represented as bitstrings. Now take the tiny subset of histories in which you appear. Although the measure of this subset is tiny, its conditional measure, given your very existence, is not: According to the weak anthropic principle, the conditional probability of finding yourself in a regular universe compatible with your existence equals 1. But it is essential to see that the weak anthropic principle does not have any predictive power at all. It does not tell you anything about the future. It cannot explain away futures in which you still exist but irregular things happen. Only a nonuniform prior can explain this. Isn't this fixed by saying that the uniform measure is not over all universe histories, as you have it above, but over all programs that generate universes? Now we have the advantage that short programs generate more regular universes than long ones, and the WAP grows teeth. Hal Finney
Re: Predictions & duplications
I tried to understand the problem that doctors Schmidhuber and Standish are discussing by describing it in the most concrete terms I could, below. (I admit beforehand I couldn't follow all the details and do not know all the papers and theorems referred to, so this could be irrelevant.)

So, say you are going to drop a pencil from your hand and trying to predict if it's going to fall down or up this time. Using what I understand with comp TOE, I would take the set of all programs that at some state implement a certain conscious state, namely the state in which you remember starting your experiment of dropping the pencil, and have already recorded the end result (I abbreviate this conscious state with CS. To be exact it is a set of states, but that shouldn't make a difference). The space of all programs would be the set of all programs in some language, coded as infinite numerable sequences of 0's and 1's. (I do not know how much the chosen language + coding has effect on the whole thing). Now for your prediction you need to divide the implementations of CS into two sets: those in which the pencil fell down and those in which it fell up. Then you compare the measures of those sets. (You would need to assume that each program is run just once or something of the sort. Some programs obviously implement CS several times when they run. So you would maybe just include those programs that implement CS infinitely many times, and weight them with the density of CS occurrences during their running.)

One way to derive the measure you need is to assume a measure on the set of all infinite sequences (i.e. on all programs). For this we have the natural measure, i.e. the product measure of the uniform measure on the set containing 0 and 1. And as far as my intuition goes, this measure would lead to the empirically correct prediction on the direction of the pencil falling. And if I understood it right, this is not too far from what Dr. Standish was claiming? And we wouldn't need any speed priors. But maybe the need of speed prior would come to play if I thought more carefully about the detailed assumptions involved? E.g. that each program would be run just once, with the same speed etc? I am not sure. Juho

Juho Pennanen, Department of Forest Ecology, University of Helsinki, http://www.helsinki.fi/people/juho.pennanen
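Juho's prediction recipe can be written down compactly; the notation below is ours, and the weighting clause is just his parenthetical remark restated.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
Let $S$ be the set of programs implementing the conscious state $CS$,
partitioned into $S_{\mathrm{down}}$ and $S_{\mathrm{up}}$ by the recorded
outcome, and let $\mu$ be the product (uniform) measure on program tapes.
The predicted probability of seeing the pencil fall down is the relative
measure
\[
  P(\mathrm{down} \mid CS)
    = \frac{\mu(S_{\mathrm{down}})}{\mu(S_{\mathrm{down}}) + \mu(S_{\mathrm{up}})},
\]
where, as suggested in the post, programs implementing $CS$ infinitely
often are weighted by the density of $CS$-occurrences along their runs.
\end{document}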
Re: Predictions duplications
From [EMAIL PROTECTED]:

[EMAIL PROTECTED] wrote: So you NEED something additional to explain the ongoing regularity. You need something like the Speed Prior, which greatly favors regular futures over others.

I take issue with this statement. In Occam's Razor I show how any observer will expect to see regularities even with the uniform prior (this comes about because all observers have resource problems, incidentally). The speed prior is not necessary for Occam's Razor. It is obviously consistent with it, though.

First of all: there is _no_ uniform prior on infinitely many things. Try to build a uniform prior on the integers. (Tegmark also wrote that "all mathematical structures are a priori given equal statistical weight," but of course this does not make much sense, because there is _no_ way of assigning equal nonvanishing probability to all - infinitely many - mathematical structures.) There is at best a uniform measure on _beginnings_ of strings. Then strings of equal size have equal measure. But then regular futures (represented as strings) are just as likely as irregular ones. Therefore I cannot understand the comment "(this comes about because all observers have resource problems, incidentally)."

Of course, alternative priors lead to alternative variants of Occam's razor. That has been known for a long time - formal versions of Occam's razor go at least back to Solomonoff, 1964. The big question really is: which prior is plausible? The most general priors we can discuss are those computable in the limit, in the algorithmic TOE paper. They do not allow for computable optimal prediction, though. But the more restrictive Speed Prior does, and seems plausible from any programmer's point of view.

The interesting thing is of course whether it is possible to experimentally distinguish between the speed prior and the uniform prior, and it is not at all clear to me that it is possible to distinguish between these cases.

I suggest looking at experimental data that seems to have Gaussian randomness in it, such as interference patterns in double-slit experiments. The Speed Prior suggests the data cannot be really random, but that a fast pseudorandom generator PRG is responsible, e.g., by dividing some seed by 7 and taking some of the resulting digits as the new seed, or whatever. So it's verifiable - we just have to discover the PRG method.

Juergen Schmidhuber
http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/
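Juergen's divide-by-7 recipe is easy to make concrete. Here is a toy version (the digit-extraction details are my own guess; his "or whatever" leaves them open), illustrating how a few lines plus a seed could underlie data that passes for noise:

  def prg(seed: int, n: int):
      bits = []
      for _ in range(n):
          digits = str(seed * 10**8 // 7)  # decimal expansion of seed/7, shifted
          seed = int(digits[-8:]) or 1     # some of the resulting digits become the new seed
          bits.append(seed & 1)            # emit one pseudorandom bit
      return bits

  print(prg(12345, 32))

A history produced this way is enormously compressible - its true description is just the generator and the seed - which is exactly what makes the Speed Prior hypothesis verifiable in principle: someone just has to find the generator.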
Re: Predictions duplications
From [EMAIL PROTECTED]:

[EMAIL PROTECTED] wrote: So you NEED something additional to explain the ongoing regularity. You need something like the Speed Prior, which greatly favors regular futures over others.

I take issue with this statement. In Occam's Razor I show how any observer will expect to see regularities even with the uniform prior (this comes about because all observers have resource problems, incidentally). The speed prior is not necessary for Occam's Razor. It is obviously consistent with it, though.

First of all: there is _no_ uniform prior on infinitely many things. Try to build a uniform prior on the integers. (Tegmark also wrote that "all mathematical structures are a priori given equal statistical weight," but of course this does not make much sense, because there is _no_ way of assigning equal nonvanishing probability to all - infinitely many - mathematical structures.)

I don't know why you insist on the prior being a PDF. It is not necessary. With the uniform prior, all finite sets have vanishing probability. However, all finite descriptions correspond to infinite sets, and these infinite sets have non-zero probability.

Huh? A PDF? You mean a probability density function? On a continuous set? No! I am talking about probability distributions on describable objects. On things you can program. Anyway, you write "...observer will expect to see regularities even with the uniform prior," but that clearly cannot be true. There is at best a uniform measure on _beginnings_ of strings. Then strings of equal size have equal measure. But then regular futures (represented as strings) are just as likely as irregular ones. Therefore I cannot understand the comment "(this comes about because all observers have resource problems, incidentally)."

Since you've obviously barked up the wrong tree here, it's a little hard to know where to start. Once you understand that each observer must treat an infinite number of descriptions as equivalent, due to the boundedness of its resources, it becomes fairly obvious that the smaller, simpler descriptions correspond to larger equivalence classes (hence higher probability).

Maybe you should write down formally what you mean? Which resource bounds? On which machine? What exactly do you mean by "simple"? Are you just referring to the traditional Solomonoff-Levin measure and the associated old Occam's razor theorems, or do you mean something else?

Of course, alternative priors lead to alternative variants of Occam's razor. That has been known for a long time - formal versions of Occam's razor go at least back to Solomonoff, 1964. The big question really is: which prior is plausible? The most general priors we can discuss are those computable in the limit, in the algorithmic TOE paper. They do not allow for computable optimal prediction, though. But the more restrictive Speed Prior does, and seems plausible from any programmer's point of view.

The interesting thing is of course whether it is possible to experimentally distinguish between the speed prior and the uniform prior, and it is not at all clear to me that it is possible to distinguish between these cases.

I suggest looking at experimental data that seems to have Gaussian randomness in it, such as interference patterns in double-slit experiments. The Speed Prior suggests the data cannot be really random, but that a fast pseudorandom generator PRG is responsible, e.g., by dividing some seed by 7 and taking some of the resulting digits as the new seed, or whatever. So it's verifiable - we just have to discover the PRG method.
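Russell's equivalence-class argument can be illustrated under a toy assumption of mine: suppose a resource-bounded observer can only resolve the first k bits of an n-bit description, so all descriptions agreeing on those k bits fall into one class. Coarser (smaller-k, "simpler") effective descriptions then collapse exponentially more strings together, and so carry more of the uniform measure:

  from itertools import product

  n = 10  # toy description length
  for k in (2, 6):  # bits the bounded observer can actually resolve
      classes = {}
      for s in product("01", repeat=n):
          classes.setdefault(s[:k], []).append(s)
      size = len(next(iter(classes.values())))
      print(f"k={k}: {len(classes)} classes of {size} strings each, "
            f"class probability {size / 2**n} under the uniform measure")

Whether this kind of coarse-graining really reproduces a Solomonoff-Levin-style bias, rather than merely resembling one, is precisely what Juergen is asking Russell to write down formally.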
I can't remember which incompleteness result it is, but it is impossible to prove the randomness of any sequence. In order to falsify your theory one would need to prove a sequence to be random. However, of course, if all known sequences are provably pseudo-random (i.e. compressible), then this would constitute pretty good evidence. However, this is a tall order, as there is no algorithm for generating the compression behind an arbitrary sequence.

You are talking falsifiability. I am talking verifiability. Sure, you cannot prove randomness. But that's not the point of any inductive science. The point is to find regularities if there are any. Occam's razor encourages us to search for regularity, even when we do not know whether there is any. Maybe some PhD student tomorrow will discover a simple PRG of the kind I mentioned, and get famous.

It is important to see that Popper's popular, frequently cited, and overrated concept of falsifiability does not really help much to explain what an inductive science such as physics is all about. E.g., physicists accept Everett's ideas although most of his postulated parallel universes will remain inaccessible forever, and therefore are _not_ falsifiable. Clearly, what's convincing about the multiverse theory is its simplicity, not its falsifiability.
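The asymmetry Russell and Juergen are circling can be demonstrated with a standard compressor as a crude, computable upper bound on algorithmic complexity (zlib is my choice of stand-in here; any general-purpose compressor would do):

  import os, random, zlib

  random.seed(7)
  regular = bytes(i % 10 for i in range(10_000))                 # an obviously lawful sequence
  pseudo  = bytes(random.randrange(256) for _ in range(10_000))  # seeded PRG output: short true description
  hard    = os.urandom(10_000)                                   # OS entropy

  for name, data in (("regular", regular), ("pseudo", pseudo), ("urandom", hard)):
      print(f"{name:8s}: {len(zlib.compress(data, 9)):6d} bytes compressed from {len(data)}")

The regular sequence shrinks dramatically; the PRG output does not, even though its true description is a one-line generator plus a seed. That failure is Russell's "tall order" in action - no general algorithm finds the compression behind an arbitrary sequence - while a success, as in the regular case, is exactly the kind of verification Juergen says inductive science lives on.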
Re: Predictions duplications
[EMAIL PROTECTED] wrote: So you NEED something additional to explain the ongoing regularity. You need something like the Speed Prior, which greatly favors regular futures over others.

I take issue with this statement. In Occam's Razor I show how any observer will expect to see regularities even with the uniform prior (this comes about because all observers have resource problems, incidentally). The speed prior is not necessary for Occam's Razor. It is obviously consistent with it, though.

The interesting thing is of course whether it is possible to experimentally distinguish between the speed prior and the uniform prior, and it is not at all clear to me that it is possible to distinguish between these cases.

Cheers

Dr. Russell Standish, Director
High Performance Computing Support Unit, UNSW SYDNEY 2052, Australia
Phone 9385 6967, 8308 3119 (mobile), Fax 9385 6965, 0425 253119
[EMAIL PROTECTED]
Room 2075, Red Centre, http://parallel.hpc.unsw.edu.au/rks
International prefix +612, Interstate prefix 02
Re: Predictions duplications
Juergen Schmidhuber wrote: We need a prior probability distribution on possible histories.

OK. I agree with that. But of course we differ on the meaning of "possible histories". And we also tackle the prior probability in quite different ways.

Then, once we have observed a past history, we can use Bayes' rule to compute the probability of various futures, given the observations. Then we predict the most probable future.

It seems to me that Bayes' rule is what we *need* to build a plausible *past*; concerning the future, and even the present, I prefer a theory, or a philosophy (or religion), hoping it can shed light on the unknown.

The algorithmic TOE paper discusses several formally describable priors computable in the limit. But for the moment let me focus on my favorite, the Speed Prior.

Mmh... You know that there exist classes of infinitely speedable programs (and theories), and even that those classes correspond to a superclass of the universal machines. This entails that the UD, the Universal Dovetailer - which I sometimes like to call the "splashed Universal Turing Machine" (and which, by the way, makes UD* a sort of denotational semantics of the UTM) - could generate and execute quicker and quicker versions of itself. This means you need a non-universal ontology. To help others understand the point, I propose we postpone discussing it until I give the solution of the problem in diagonalisation 1: http://www.escribe.com/science/theory/m3079.html

Assume the Great Programmer has a resource problem, just like any other programmer.

You told Wei Dai not to take the Great Programmer too seriously. I am not sure I can imagine the Great Programmer having a resource problem. Unless you are a finitist! Knowing that the UTM emulating the UD will use an unbounded part of its tape... How could the GP have a resource problem when he is the one building universes in which resource problems can (apparently) happen? Are you not presupposing some physical structure in advance somewhere?

According to the Speed Prior, the harder it is to compute something, the less likely it gets. More precisely, the cumulative prior probability of all data whose computation costs at least O(n) time is at most inversely proportional to n.

Very interesting idea. I would say: the harder it is to compute something FROM here and now, the less likely it gets from here and now. And I would add that this is only true if your neighborhood is less complex than yourself. I really believe your idea is interesting for some low-level computational physics. But when agents, including yourself, enter into the play, it's another matter, because it is intrinsically harder to compute things then.

To evaluate the plausibility of this, consider your own computer: most processes just take a few microseconds, some take a few seconds, few take hours, very few take days, etc.

? In the work of the GP, that is, in UD*, most processes take billions of googolplexes of kalpas... A lot of the processes engaged by the UD simply do not stop. Some are interesting, and perhaps Turing universal, like the process generating better and better approximations of the Mandelbrot set.

We do not know the Great Programmer's computer and algorithm for computing universe histories and other things. Still, we know that he cannot do it more efficiently than the asymptotically fastest algorithm.

Are you really sure about that? If the GP is really *the* GP, existing by Church's Thesis, I think it follows from Blum's Speed-up Theorem that it has no asymptotically fastest algorithm.
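Since the Universal Dovetailer does so much work in this discussion, here is a minimal sketch of the dovetailing schedule itself. The "machine" is a deliberately trivial stand-in of my own (one Collatz step per program step) rather than a universal interpreter, but the schedule is the real point: in phase k every program of index up to k receives k more steps, so every computation, however slow or non-halting, is granted unboundedly many steps in the limit.

  def toy_step(state: int) -> int:
      # stand-in for one step of a universal machine: one Collatz step
      return state // 2 if state % 2 == 0 else 3 * state + 1

  def dovetail(phases: int = 5) -> None:
      states = {}  # program index -> current machine state
      for k in range(1, phases + 1):
          for p in range(1, k + 1):
              states.setdefault(p, p)   # program p starts in state p
              for _ in range(k):        # grant program p another k steps
                  states[p] = toy_step(states[p])
          print(f"after phase {k}: {states}")

  dovetail()

With a genuinely universal machine in place of toy_step, the dovetailer would eventually run every program, including accelerated versions of itself, which is one way to read Bruno's speedability remark.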
But perhaps you are talking of our little program, generated and executed by the UD. If this one has an asymptotically fastest algorithm, are you sure it can still emulate a UTM? I bet the universe is universal. Also: how could we know, and test, whether we are in a quick or a slow version of that little program? Is that not (absolutely) undecidable? Should that program really be run? If yes, are you not presupposing again a physical universe?

The Speed Prior reflects properties of precisely this optimal algorithm. It turns out that the best way of predicting future y, given past x, is to minimize the (computable) Kt-complexity of (x,y). As observation size grows, the predictions converge to the optimal predictions.

As observation grows, knowledge grows, and we grow, making everything more complex than before. I guess you mean optimal predictions at some level. With comp, the more knowledge grows, the more ignorance grows.

To address Bruno Marchal's questions: Many of my duplications in parallel universes with identical past x will experience different futures y. None of my copies knows a priori which one it is. But each does know: those futures y with high Kt(x,y) will be highly unlikely; those with low Kt(x,y) won't.

It is indeed hard to write a little program which generates a long Kolmogorov-Chaitin-incompressible 01 string. But, as I told you earlier, it is quite easy to write a little program which generates all the strings, including the incompressible ones.
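Juergen's prediction rule can be sketched concretely. Kt(x,y) is Levin's time-bounded complexity, roughly program length plus the logarithm of running time; a real implementation would search over all programs of a universal machine, so the tiny hand-made hypothesis class below (two models with made-up description lengths and step costs) is purely illustrative:

  import math

  def const_zero(past): return 0               # model: the sequence is all zeros
  def alternating(past): return len(past) % 2  # model: the sequence alternates 0,1,0,1,...

  # (name, assumed description length in bits, assumed step cost per bit, generator)
  MODELS = [("all zeros", 10, 1, const_zero), ("alternating", 20, 1, alternating)]

  def kt(x, model):
      # length-plus-log-time score of a model that must reproduce x exactly
      name, desc_bits, cost, gen = model
      steps = 0
      for i, b in enumerate(x):
          steps += cost
          if gen(x[:i]) != b:
              return math.inf  # model inconsistent with the observed data
      return desc_bits + math.log2(max(steps, 1))

  def predict(x):
      # choose the next bit whose best consistent model has minimal Kt
      scores = {b: min(kt(x + [b], m) for m in MODELS) for b in (0, 1)}
      return min(scores, key=scores.get), scores

  print(predict([0, 1, 0, 1, 0]))  # the alternating model wins, so the predicted bit is 1

The regular continuation wins not because irregular ones are impossible, but because any model forced to reproduce an irregular future needs either a long description or a long computation, and Kt charges for both - which is the content of "futures y with high Kt(x,y) will be highly unlikely."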