The Swiss AI Lab IDSIA: 20th Anniversary / Jobs
The 20-year anniversary celebration of the Swiss AI Lab IDSIA will take place on 24 October, 14:00-18:00, in the Palazzo dei Congressi in the center of Lugano, Switzerland, during Ticino Informatica 2008: http://www.idsia.ch/Files/idsia20/Program.html

Distinguished keynote speakers:
1. Mauro Dell'Ambrogio, Swiss State Secretary for Education and Research
2. Prof. Marco Dorigo, pioneer of Ant Colony Optimization and Swarm Intelligence
3. Prof. Helge Ritter, a pioneer of Self-Organizing Maps; Leibniz Laureate

Panel: The Future of AI

Registration is free! Please register here: http://www.idsia.ch/Files/idsia20/Register.html

Juergen Schmidhuber
http://www.idsia.ch/~juergen/

PS: Current jobs at IDSIA (all hired persons will closely interact):
1 postdoc and 1 PhD student in biologically plausible reinforcement learning: http://www.idsia.ch/~juergen/sinergia2008.html
1 postdoc and 1 developer in handwriting recognition: http://www.idsia.ch/~juergen/kti2008.html
1 related postdoc in medical image analysis: http://www.idsia.ch/~luca/PostDoc_2008_IDSIA_IN3.html
1 PhD fellowship in evolutionary computation: http://www.idsia.ch/~tino/evo2008.html

More jobs on artificial curiosity and learning robots to be announced soon:
http://www.idsia.ch/~juergen/interest.html
http://www.idsia.ch/~juergen/learningrobots.html
Re: Wolfram 2,3 Turing Machine
Impressive result by Alex Smith! Funny though how Wolfram's web sites on this print Wolfram's name in larger font and more frequently than Smith's, even trying to sell this as New Kind Of Science, although it's just a continuation of a decades-old search for small universal Turing machines :-)

BTW, check out Marcus Hutter's older posting to the Kolmogorov Complexity mailing list on whether such machines should really count as UTMs or not: http://mailman.ti-edu.ch/pipermail/kolmogorov/2007/000245.html

JS
http://www.idsia.ch/~juergen/computeruniverse.html
http://www.idsia.ch/~juergen/wolfram.html

On Oct 24, 2007, at 8:32 PM, Tom Caylor wrote:

We're excited to announce that the $25,000 Wolfram 2,3 Turing Machine Research Prize has been won. Alex Smith, a 20-year-old undergraduate in Birmingham, UK, has given a 40-page proof that Wolfram's 2,3 Turing machine is indeed universal. This result ends a half-century quest to find the simplest possible universal Turing machine. It also provides strong further evidence for Wolfram's Principle of Computational Equivalence. The official prize ceremony is planned for November at Bletchley Park, UK, site of Alan Turing's wartime work.

For more information about the prize and the solution, see: http://www.wolframprize.org

Stephen Wolfram has posted his personal reaction to the prize at: http://blog.wolfram.com/2007/10/the_prize_is_won_the_simplest.html
Re: measure problem
Hi Max,

in this particular universe it's going well, thank you! As promised, I had a look at your paper. I think it is well written and fun to read. I've got a few comments though, mostly on the nature of math vs computation, and why Goedel is sexy but not an issue when it comes to identifying possible mathematical structures / universes / formally describable things. I think some of the comments are serious enough to affect the conclusions. Some come with quotes from papers in http://www.idsia.ch/~juergen/computeruniverse.html where several of your main issues are addressed. Some are marked by "Serious". I am making a cc to the everythingers, although it seems they are mostly interested in other things now - probably nobody is really going to read this tedious response, which became much longer than I anticipated.

1. An abstract, baggage-free mathematical structure does not exist any more than a baggage-free computer - the particular axiomatic system you choose is like the set of primitive instructions of the computer you choose. Not very serious, since for general computers and general axiomatic systems there are invariance theorems: changing the baggage often does not change a lot, so to speak. But it should be mentioned.

2. p 11: You say that data sampled from Gaussian random variables is incompressible - NOT true - give short codes to probable events (close to the mean), long codes to rare events (Huffman coding).

3. Same sentence: How to test what inflation predicts? How to test whether the big bang seed was really random, not pseudo-random? The second million bits of pi look random but are not. We should search for short programs compressing the apparent randomness: http://www.idsia.ch/~juergen/randomness.html

4. p 15: "Mathematical structure (MS) just exists." Is that so? Others will look at your symbols and say they are just heaps of chalk on a blackboard, and you need a complex, wet pattern recognition system to interpret them. Here's where beliefs enter...

5. 
p 18: "mathematical structures, formal systems and computations are aspects of one underlying transcendent structure whose nature we don't fully understand." But we do! I'd say there are NO serious open problems with your figure 5 - formal systems vs math vs computation is a well-explored field. More about this below. The 2000 paper (your nr 17) exploits this understanding; it turns out the most convenient way to deal with the measure problem is the computer science way (right hand corner of your figure 5). As I wrote in the 2000 paper: http://arxiv.org/abs/quant-ph/0011122

"The algorithmic approach, however, offers several conceptual advantages: (1) It provides the appropriate framework for issues of information-theoretic complexity traditionally ignored in pure mathematics, and imposes natural complexity-based orderings on the possible universes and subsets thereof. (2) It taps into a rich source of theoretical insights on computable probability distributions relevant for establishing priors on possible universes. Such priors are needed for making probabilistic predictions concerning our own particular universe. Although Tegmark suggests that '... all mathematical structures are a priori given equal statistical weight' [Tegmark 1998] (p. 27), there is no way of assigning equal nonvanishing probability to all (infinitely many) mathematical structures. Hence we really need something like the complexity-based weightings discussed in [Schmidhuber 1997] and especially the paper at hand. (3) The algorithmic approach is the obvious framework for questions of temporal complexity such as those discussed in this paper, e.g., 'what is the most efficient way of simulating all universes?'"

6. Serious: Run the sim, or just describe its program? Are you sure you know what you want to say here? What's the precise difference between program bitstrings and output bitstrings? 
The bitstrings generated by the programs (the descriptions) are just alternative descriptions of the universes, possibly less compact ones. You as an external observer may need yet another program that translates the output bits (typically a less compressed description) into video or something, to obtain the description your eyes want. Note that the 2000 paper and the 2002 journal variant don't really care for time evolution, just for descriptions - within the bitstrings maybe there is an observer who thinks he knows what's time, but to the outsider his concept of time may be irrelevant. (Unlike the 1997 paper, the 2000/2002 papers do not focus on a one-to-one mapping between physical and computational time steps; otherwise we'd miss all the universes where the concept of time is irrelevant.) Here's what I wrote at the end: "After all, algorithmic theories of the describable do encompass everything we will ever be able to talk and write about. Other things are simply beyond description."

7. Serious: p 18 CUH: what's your def of computable? You mean
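Comment 2 above (Gaussian data is compressible: short codes for probable events near the mean, long codes for rare events) can be checked with a small sketch. The bin count, range, and sample size below are arbitrary illustrative choices:

```python
import heapq
import math
import random
from collections import Counter

random.seed(0)

# Quantize 100,000 standard-Gaussian samples into 16 equal-width bins on [-4, 4):
# bins near the mean are frequent, tail bins are rare.
n_bins, lo, hi = 16, -4.0, 4.0

def bin_of(x):
    x = min(max(x, lo), hi - 1e-9)   # clamp outliers into the edge bins
    return int((x - lo) / (hi - lo) * n_bins)

counts = Counter(bin_of(random.gauss(0.0, 1.0)) for _ in range(100_000))

# Huffman code lengths: repeatedly merge the two least frequent groups;
# every symbol inside a merged group gains one bit of code length.
heap = [(c, [s]) for s, c in counts.items()]
heapq.heapify(heap)
length = {s: 0 for s in counts}
while len(heap) > 1:
    c1, g1 = heapq.heappop(heap)
    c2, g2 = heapq.heappop(heap)
    for s in g1 + g2:
        length[s] += 1
    heapq.heappush(heap, (c1 + c2, g1 + g2))

total = sum(counts.values())
avg_bits = sum(counts[s] * length[s] for s in counts) / total
print(f"fixed-length code: {math.log2(n_bins):.2f} bits/sample")
print(f"Huffman code:      {avg_bits:.2f} bits/sample")
```

The Huffman average comes out well below the 4 bits of a fixed-length code, close to the entropy of the quantized Gaussian - exactly the compressibility the comment points out.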
Zuse Symposium: Is the universe a computer? Berlin Nov 6-7
Dear colleagues,

many interesting talks at the Zuse Symposium: Is the universe a computer? Berlin, Nov 6-7, 2006 http://www.dtmb.de/Aktuelles/Aktionen/Informatikjahr-Zuse/

Best regards,
-JS
http://www.idsia.ch/~juergen/computeruniverse.html

You received this message because you are subscribed to the Google Groups Everything List group. To post to this group, send email to everything-list@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/everything-list
probabilities measures computable universes
I browsed through recent postings and hope this delayed but self-contained message can clarify a few things about probabilities and measures and predictability etc.

What is the probability of an integer being, say, a square? This question does not make sense without a prior probability distribution on the integers. This prior cannot be uniform. Try to find one! Under _any_ distribution some integers must be more likely than others.

Which prior is good? Is there a `best' or `universal' prior? Yes, there is. It assigns to each integer n as much probability as any other computable prior, save for a constant factor that does not depend on n. (A computable prior can be encoded as a program that takes n as input and outputs n's probability, e.g., a program that implements Bernoulli's formula, etc.) Given a set of priors, a universal prior is essentially a weighted sum of all priors in the set. For example, Solomonoff's famous weighted sum of all enumerable priors will assign at least as much probability to any square integer as any other computable prior, save for a constant machine-dependent factor that becomes less and less relevant as the integers get larger and larger.

Now let us talk about all computable universe histories. Some are finite, some infinite. Each has at least one program that computes it. Again there is _no_ way of assigning equal probability to all of them! Many are tempted to assume a uniform distribution without thinking much about it, but there is _no_ such thing as a uniform distribution on all computable universes, or on all axiomatic mathematical structures, or on all logically possible worlds, etc! (Side note: There only is a uniform _measure_ on the finitely many possible history _beginnings_ of a given size, each standing for an uncountable _set_ of possible futures. Probabilities refer to single objects, measures to sets.)

It turns out that we can easily build universal priors using Levin's important concept of self-delimiting programs. 
Such programs may occasionally execute the instruction "request new input bit"; the bit is chosen randomly and will remain fixed thereafter. Then the probability of some universe history is the probability of guessing a program for it. This probability is `universal' as it does not depend much on the computer (whose negligible influence can be buried in a constant universe-independent factor). Some programs halt or go into an infinite loop without ever requesting additional input bits. Universes with at least one such short self-delimiting program are more probable than others.

To make predictions about some universe, say, ours, we need a prior as well. For instance, most people would predict that next Tuesday it won't rain on the moon, although there are computable universes where it does. The anthropic principle is an _insufficient_ prior that does not explain the absence of rain on the moon - it does assign cumulative probability 1.0 to the set of all universes where we exist, and 0.0 to all the other universes, but humans could still exist if it did rain on the moon occasionally. Still, many tend to consider the probability of such universes as small, which actually says something about their prior.

We do not know yet the true prior from which our universe is sampled - Schroedinger's wave function may be an approximation thereof. But it turns out that if the true prior is computable at all, then we can in principle already predict near-optimally, using the universal prior instead: http://www.idsia.ch/~juergen/unilearn.html Many really smart physicists do not know this yet.

Technical issues and limits of computable universes are discussed in papers available at: http://www.idsia.ch/~juergen/computeruniverse.html

Even stronger predictions using a prior based on the fastest programs (not the shortest): http://www.idsia.ch/~juergen/speedprior.html

-Juergen Schmidhuber
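The weighted-sum construction of a universal prior described above can be illustrated with a finite toy version. The truncation at N and the two particular component priors are illustrative choices, not from the post:

```python
import math

# A finite toy version of a universal prior: a weighted sum of two
# computable priors on {1, ..., N}.
N = 10_000

def normalize(ws):
    Z = sum(ws)
    return [v / Z for v in ws]

geom = normalize([2.0 ** -n for n in range(1, N + 1)])      # geometric prior
zipf = normalize([1.0 / (n * n) for n in range(1, N + 1)])  # power-law prior

priors = {"geom": geom, "zipf": zipf}
weights = {"geom": 0.5, "zipf": 0.5}    # fixed weights summing to 1

# The mixture M dominates every component q up to the constant factor 1/w_q:
# M(n) >= w_q * q(n) for all n, just as the post states for Solomonoff's prior.
M = [sum(weights[k] * priors[k][i] for k in priors) for i in range(N)]
for k, q in priors.items():
    assert all(M[i] >= weights[k] * q[i] for i in range(N))

# "What is the probability of an integer being a square?" now has an answer:
p_square = sum(M[n - 1] for n in range(1, N + 1) if math.isqrt(n) ** 2 == n)
print(f"P(square) under the mixture: {p_square:.4f}")
```

Note how neither component, and hence not the mixture, is anywhere near uniform: under any such prior some integers must be more likely than others.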
everything talk - new AI
I put the everything talk slides on the web:
http://www.idsia.ch/~juergen/everythingtalk/
http://www.idsia.ch/~juergen/computeruniverse.html

And also a recent overview article:

The New AI: General & Sound & Relevant for Physics
J. Schmidhuber. TR IDSIA-04-03, arxiv:cs.AI/0302012
To appear in B. Goertzel and C. Pennachin, eds.: Artificial General Intelligence
http://www.idsia.ch/~juergen/newai/newai.html

Abstract. Most traditional artificial intelligence (AI) systems of the past 50 years are either very limited, or based on heuristics, or both. The new millennium, however, has brought substantial progress in the field of theoretically optimal and practically feasible algorithms for prediction, search, inductive inference based on Occam's razor, problem solving, decision making, and reinforcement learning in environments of a very general type. Since inductive inference is at the heart of all inductive sciences, some of the results are relevant not only for AI and computer science but also for physics, provoking nontraditional predictions based on Zuse's thesis of the computer-generated universe.

Comments welcome!

Juergen Schmidhuber
http://www.idsia.ch/~juergen/
Re: Zuse's thesis web site
Thanks for your comments, Hal! Your point on the dovetailer is well taken. I modified the text: "If Zuse's thesis is correct, then which is our universe's program? It may come as a surprise to some that we already know at least one algorithm that does compute our universe (and several others)..."

Ed Clark has a nice review page on Wolfram's book: http://www.math.usf.edu/~eclark/ANKOS_reviews.html It includes Scott Aaronson's interesting review, which also addresses the issue of Bell's inequality.

Best, Juergen
http://www.idsia.ch/~juergen/digitalphysics.html

Juergen Schmidhuber writes:

I welcome feedback on a little web page on Zuse's 1967 thesis (which states that the universe is being computed on a cellular automaton): http://www.idsia.ch/~juergen/digitalphysics.html

That's very interesting; I was not aware of Zuse. Unfortunately I don't know German so I can't read his paper.

Regarding the question of the compatibility of CA models with relativity and QM, Wolfram looks into this in some detail. He essentially abandons a simple CA model in favor of a more complex network of interacting nodes, which has some features similar to the Lorentz transformation of relativity. Then to address the EPR-style long-distance correlations of QM, he proposes that while the network is mostly local, it has occasional nodes which get stretched apart and are connected to distant nodes. These are rare but allow for the type of information flow necessary to reproduce long-distance QM correlations. All in all it is a pretty ad hoc and unconvincing model.

I tried to read the 't Hooft paper referenced here but it was over my head. It also struck me, though, as not really addressing the discrepancy between long-distance correlations and local CA models. It seems very much an open and difficult question to me to show how a local CA model can reproduce relativity and QM.

One issue which CA models tend to ignore is the MWI. 
Most CA models are built as hidden variable theories which define a single universe. Some multiverse models have that structure as well. But it seems to me that this is an entirely unnecessary restriction. If a CA can model a universe, it can model a multiverse, and likewise with any other computing model like TMs. The MWI is fully deterministic, which may make it a more attractive target for modelling with a deterministic computational theory than attempting to reproduce the statistical phenomena of QM, essentially via hidden variables. Any hidden variable theory, CA based or not, has two strikes against it from the beginning due to the many well known difficulties of Bell inequalities and EPR correlations.

Regarding entropy, it is pointed out that entropy does not grow in a CA model. Wolfram discusses this as well. While entropy technically does not grow, you can get phenomena that look very much like entropy growth in a CA model. Eventually you will get a Poincare recurrence if the universe is finite. But if you start in a sufficiently simple state, there are many CA models which will mimic entropy growth into a more complex state. And this may be close enough to explain our universe. Alternatively, of course, the MWI as a deterministic theory also does not have entropy growth. As mentioned above, computational models of our universe might well do better to aim towards an MWI world.

As far as the claim that we already know the algorithm that runs our universe, and it is the UD: I think this is amusing but ultimately misleading. It's true that a dovetailer which runs all programs will indeed run our own universe's program (assuming it has one), but I think it is a misuse of terminology to say that the UD is the algorithm that is running our universe. I would reserve that phrase to refer to the specific program that generates our universe and no others. 
It will be a tremendous accomplishment of physics and philosophy when that program is discovered, but it is misleading to give the impression that we already know what it is. I think a better terminology here would be something like: we don't need to know the specific program that describes our universe in order to imagine how to program a computer that would in fact generate our experiences, at least in theory. And then go on and explain about running all programs at once, etc.

Hal Finney
Zuse's thesis web site
I welcome feedback on a little web page on Zuse's 1967 thesis (which states that the universe is being computed on a cellular automaton): http://www.idsia.ch/~juergen/digitalphysics.html Juergen Schmidhuber
Re: wheeler walked away; Zuse ...
[EMAIL PROTECTED] wrote:

as for your point in your post about wheeler attaching his name to the theory, I think its ok for proponents and not originators of a theory to be named along with it. for example lately Ive been referring to the fredkin-wolfram thesis. fredkin is far more the originator; wolfram is far more the proponent. seems to me the everett-wheeler theory can be fairly seen in the same way.

Just to set the record straight: the originator was Zuse. Here is a link to a PDF scan of his 1967 paper (on the universe running on a cellular automaton): http://www.idsia.ch/~juergen/zuse.html
colt
Physics is an inductive science. You try to find a law that matches the data, and hope that it generalizes on unseen data. When asked about the nature of their inductive business, most physicists refer to Popper's ideas from the 1930s (falsifiability etc), and sometimes to Occam's razor: simple explanations are better explanations. Most physicists, however, are not even aware of the fact that there is a formal basis for their inductive science, provided by the field of computational learning theory (COLT), in particular, the theory of universal induction. The contents of the following COLT paper may be old news to some on this list.

The Speed Prior: a new simplicity measure yielding near-optimal computable predictions (Juergen Schmidhuber, IDSIA)
In J. Kivinen and R. H. Sloan, eds, Proc. 15th Annual Conf. on Computational Learning Theory (COLT), 216-228, Springer, 2002; based on section 6 of http://arXiv.org/abs/quant-ph/0011122 (2000)
http://www.idsia.ch/~juergen/speedprior.html
ftp://ftp.idsia.ch/pub/juergen/colt.ps

Solomonoff's optimal but noncomputable method for inductive inference assumes that observation sequences x are drawn from a recursive prior distribution mu(x). Instead of using the unknown mu(x) he predicts using the celebrated universal enumerable prior M(x), which for all x exceeds any recursive mu(x), save for a constant factor independent of x. The simplicity measure M(x) naturally implements Occam's razor and is closely related to the Kolmogorov complexity of x. However, M assigns high probability to certain data x that are extremely hard to compute. This does not match our intuitive notion of simplicity. Here we suggest a more plausible measure derived from the fastest way of computing data. 
In the absence of contrarian evidence, we assume that the physical world is generated by a computational process, and that any possibly infinite sequence of observations is therefore computable in the limit (this assumption is more radical and stronger than Solomonoff's). Then we replace M by the novel Speed Prior S, under which the cumulative a priori probability of all data whose computation through an optimal algorithm requires more than O(n) resources is 1/n. We show that the Speed Prior allows for deriving a computable strategy for optimal prediction of future y, given past x. Then we consider the case that the data actually stem from a nonoptimal, unknown computational process, and use Hutter's recent results to derive excellent expected loss bounds for S-based inductive inference. Assuming our own universe is sampled from S, we predict: it won't get many times older than it is now; large scale quantum computation won't work well; beta decay is not random but due to some fast pseudo-random generator which we should try to discover.

Juergen Schmidhuber
http://www.idsia.ch/~juergen
the short program
Tim May wrote:

One thing that Tegmark got right, I think, is the notion that a lot of branches of mathematics and a lot of mathematical structures probably go into making up the nature of reality. This is at first glance dramatically apposite the ideas of Fredkin, Toffoli, Wheeler (1970), and Wolfram on the generation of reality from simple, local rules. Wolfram has made a claim in interviews, and perhaps somewhere in his new book, that he thinks the Universe may be generated by a 6-line Mathematica program!

I'd like to point out that it was Konrad Zuse himself (inventor of the computer: http://www.idsia.ch/~juergen/zuse.html ) who was the first to propose that the physical universe is running on a grid of simple computers, each communicating with its neighbors: a cellular automaton. He called this Rechnender Raum, which means Computing Space. As always, Zuse was way ahead of his time. Perhaps the best reference is: Zuse, Konrad: Rechnender Raum, Schriften zur Datenverarbeitung, Band 1. Friedrich Vieweg & Sohn, Braunschweig (1969).

And here is the simple method for computing all universes, including ours, if it is computable indeed (adapted from page 1 of the 97 paper http://www.idsia.ch/~juergen/everything/html.html ): Order all programs lexicographically. The first is run for one instruction every second step, the next for one instruction every second of the remaining steps, and so on. As a by-product, this program also outputs descriptions of all formally describable mathematical systems, and all proofs of all their theorems. A bit of thought shows that the method even has the optimal order of complexity. For example, it outputs our universe history as quickly as the history's fastest program, save for a (possibly huge) constant that does not depend on output size.

Juergen Schmidhuber
http://www.idsia.ch/~juergen/
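The scheduling scheme just described (the i-th program gets every second of the steps left over by its predecessors) can be sketched with toy generator "programs" standing in for a real lexicographic program enumeration - the programs and the 64-step horizon are placeholders for illustration:

```python
from collections import Counter
from itertools import count

# Toy stand-ins for lexicographically ordered programs: each one just
# "executes" one instruction per resumption and reports who ran.
def make_program(i):
    def prog():
        for step in count():
            yield (i, step)
    return prog()

def trailing_zeros(t):
    # Which program runs at global step t: program 0 gets every 2nd step
    # (t odd), program 1 every 2nd of the remaining steps (t = 2 mod 4),
    # program i every 2^(i+1)-th step overall.
    z = 0
    while t % 2 == 0:
        t //= 2
        z += 1
    return z

programs = {}
trace = []
for t in range(1, 65):            # 64 global steps of the dovetailer
    i = trailing_zeros(t)
    if i not in programs:         # lazily "discover" the next program
        programs[i] = make_program(i)
    trace.append(next(programs[i]))

shares = Counter(i for i, _ in trace)
print(dict(shares))   # program 0 ran 32 times, program 1 ran 16, program 2 ran 8, ...
```

Program i receives a constant fraction 2^-(i+1) of all steps, which is why the method outputs any particular universe history only a constant factor (2^(i+1) for the history's fastest program at position i) slower than that program itself.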
Re: Optimal Prediction
Wei Dai wrote:

BTW, isn't the justification for universal prediction taken in this paper kind of opposite to the one you took? The abstract says "The problem, however, is that in many cases one does not even have a reasonable guess of the true distribution. In order to overcome this problem ..." Your papers on the other hand assume that the true distribution can be known and proposed that it must be the Speed Prior. (Later you said you believe it is the Speed Prior with probability 1.) Is this still your position?

Well, I prefer to recall my more cautious fallback position - let us just write down the assumptions, and derive the consequences. One does not really have to know the true prior; one just needs an upper bound on its power. Choose some prior P, and assume the universe is sampled from a prior less dominant than P. Then predictions according to P will be rather accurate. For example, suppose the process computing the universe is not optimally efficient for some reason. As long as the resource postulate holds, the true prior cannot dominate the Speed Prior, and S-based predictions will be fine.

Juergen
Re: Optimal Prediction
Bill Jefferys wrote:

At 10:59 AM +0200 4/3/02, Juergen Schmidhuber wrote:
"The theory of inductive inference is Bayesian, of course. But Bayes' rule by itself does not yield Occam's razor."

By itself? No one said it did. Of course assumptions must be made. At minimum one always has to choose priors in Bayesian inference. Our paper shows that there is a Bayesian interpretation that yields something very suggestive of Ockham's razor. It is appealing in that if one has a simple versus a complex hypothesis - simple meaning that the prior probability is concentrated and complex meaning that it is vague and spread out, simple meaning that you don't have many knobs to tweak, complex meaning the opposite - then the simple hypothesis will be favored over the complex one unless the data lie well away from where the simple hypothesis has placed its bets. Bayesians distinguish this from Ockham's formulation by calling it the "Bayesian Ockham's razor", recognizing that it is not what William of Ockham wrote, "Entia non sunt multiplicanda sine necessitate" (or one of his other genuine formulations). Please don't read more into our article than is there.

"By itself." First you said that the AP by itself has no predictive power. I missed the "by itself" so misunderstood you, but when I understood what you were saying I agreed. Now you say that Bayes' rule by itself does not yield Ockham's razor. Jim and I never said that it did. I am hard pressed to see how anything nontrivial relating to the real world can be gotten from any principle by itself, so I don't regard these comments as very profound, or very useful.

[Remainder of article snipped]

Bill

One has to choose priors. Exactly. To repeat the nontrivial point: The only choice you need to obtain Occam's razor is to restrict yourself to priors computable in the limit (this is not much of a restriction in the sense that other priors are beyond constructive description anyway). 
Then you'll automatically prefer few knobs to tweak, or, more generally, short descriptions of the observations: http://www.idsia.ch/~juergen/toesv2/ Juergen (will be out of town until April 16)
Re: Optimal Prediction
Bill Jefferys wrote:

At 9:19 AM +0100 3/27/02, Juergen Schmidhuber wrote:
"You are claiming the AP necessarily implies a specific fact about nuclear energy levels? I greatly doubt that - can you give a proof?"

Yes, I can. http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1954ApJS1..121Hdb_key=ASThigh=06070 This is the classic paper where Hoyle shows this. It has been cited repeatedly in the literature on the AP as an example of a genuine prediction of the AP. For example, Barrow and Tipler cite it in their book.

Being sceptical about proof by citation, I failed to find such a mathematical proof in this paper. Its predictions are based on approximations and assumptions (the authors do acknowledge this) besides the AP. Where exactly do you think is the proof? Predictions that later turn out to be correct demonstrate predictive power of the sum of all assumptions, not of the AP by itself.

"I don't care about flying rabbits. Stick to the issue at hand."

This was the issue in the original message. At the risk of beating a dead horse: plain AP does not exclude continuations of your world with different physical laws, as long as your self survives. Who knows - maybe you could live on in a crude simulation that completely ignores the concept of energy levels. To make nontrivial predictions you clearly need more than the AP. The theory of inductive inference provides what is missing: http://www.idsia.ch/~juergen/unilearn.html

Juergen
Re: Optimal Prediction
Bill Jefferys wrote:

It's pointless wasting my time on this. As both Russell and I pointed out, this is a standard example that is cited by people who are knowledgeable about the AP. Either you have a different definition of predictive power than the rest of us do, or you don't understand Hoyle's very clearly written paper. In either case, it would be foolish of me to continue the thread. Goodbye. *plonk*

Bill

"Don't try to teach a pig to sing. It wastes your time and annoys the pig."

Predictive power is measurable by standard concepts of probability theory and complexity theory. You may choose to ignore this, but don't include all those who don't among "the rest of us". Write down all assumptions, derive the consequences, and observe that the AP _by itself_ cannot predict anything nontrivial. Fortunately Hoyle was more careful than some who cite him - he just wrote: "the results ... were obtained subject to certain assumptions ... have not been demonstrated in a manner free from all doubt. Nevertheless, the number of assumptions made was much less than the number of results obtained." Thus he informally invoked Occam's razor: find short descriptions that explain a lot. Occam's razor is not the AP. It is formally treated by the theory of inductive inference. Although this theory is at the heart of what physicists are doing, some of them are not yet fully aware of it.

Juergen (will be out of town for a while)
Optimal Prediction
Many on this list have discussed the anthropic principle. It essentially says that the conditional probability of finding yourself in a universe compatible with your existence equals 1. But unfortunately the anthropic principle does not have any predictive power. It does NOT predict there won't be any flying rabbits tomorrow.

Clearly we need more than the anthropic principle to explain the predictability of our universe. In particular, we need an optimal way of predicting the future, given past observations. And there is one!

Normally we do not know the true conditional probability distribution p(next event | past). But assume we do know that p is in some set P of distributions. Choose a fixed weight w_q for each q in P such that the w_q add up to 1 (for simplicity, let P be countable). Then construct the Bayesmix M(x) = Sum_q w_q q(x), and predict using M instead of the optimal but unknown p.

How wrong is it to do that? The recent exciting work of my postdoc Marcus Hutter (IDSIA) provides general and sharp (!) loss bounds: Let LM(n) and Lp(n) be the total expected losses of the M-predictor and the p-predictor, respectively, for the first n events. Then LM(n)-Lp(n) is at most of the order of sqrt[Lp(n)]. That is, M is not much worse than p. And in general, no other predictor can do better than that! In particular, if p is deterministic, then the M-predictor soon won't make any errors any more.

If P contains ALL recursively computable distributions, then M becomes the celebrated enumerable universal prior. That is, after decades of somewhat stagnating research we now have sharp loss bounds for Solomonoff's universal (but incomputable) induction scheme. 
And if we also include the distributions computable in the limit, we get sharp loss bounds for the even more general priors introduced in Algorithmic Theories of Everything: http://www.idsia.ch/~juergen/toesv2/

Similarly, if we replace M by the Speed Prior S - where S(x) is small if x is hard to compute by any method - we obtain appropriate loss bounds for computable S-based induction: http://www.idsia.ch/~juergen/toesv2/node32.html

Alternatively, reduce M to what you get if you just add up weighted estimated future finance data probabilities generated by 1000 commercial stock-market prediction software packages. If only one of them happens to work fine (but you do not know which) you still should get rich.

To learn more, please read: Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet: ftp://ftp.idsia.ch/pub/techrep/IDSIA-02-02.ps.gz and also check out Hutter's other recent papers at ICML, ECML, NIPS, Int. J. of Foundations of CS: www.idsia.ch/~marcus

Juergen Schmidhuber
http://www.idsia.ch/~juergen/
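The Bayesmix M(x) = Sum_q w_q q(x) from the post above can be sketched for a toy class P of Bernoulli coins. The class, the uniform weights, the choice of true distribution, and the use of log-loss are illustrative assumptions; for log-loss the regret bound is even a constant, -ln w_p, rather than order sqrt[Lp(n)]:

```python
import math
import random

random.seed(1)

# Toy class P: nine Bernoulli coins. The true distribution is in P,
# but the mixture predictor does not know which one it is.
thetas = [i / 10 for i in range(1, 10)]        # 0.1, 0.2, ..., 0.9
w = {th: 1.0 / len(thetas) for th in thetas}   # weights w_q summing to 1
true_theta = 0.9                               # illustrative choice

post = dict(w)          # posterior over P, initialized to the prior weights
L_M = L_p = 0.0         # cumulative log-losses: Bayesmix vs. informed predictor
for _ in range(2000):
    x = 1 if random.random() < true_theta else 0
    m1 = sum(post[th] * th for th in thetas)   # M(next bit = 1 | past)
    L_M += -math.log(m1 if x == 1 else 1.0 - m1)
    L_p += -math.log(true_theta if x == 1 else 1.0 - true_theta)
    for th in thetas:                          # Bayes update of the weights
        post[th] *= th if x == 1 else 1.0 - th
    Z = sum(post.values())
    for th in thetas:
        post[th] /= Z

# The mixture's total extra loss stays below -ln w_p = ln 9 ~ 2.2 nats,
# no matter how many events are predicted:
print(f"L_M - L_p = {L_M - L_p:.3f}, bound -ln w_p = {math.log(len(thetas)):.3f}")
```

The bound L_M <= -ln w_p + L_p holds pathwise because the sequential mixture predictions multiply out to the mixture probability of the whole sequence, which is at least w_p times the true probability.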
Re: Does provability matter?
Wei Dai wrote: I don't understand how you can believe that the probability of more dominant priors is zero. That implies if I offered you a bet of $1 versus your entire net worth that large scale quantum computation will in fact work, you'd take that bet. Would you really? Your dollar against my 2 cents worth? I don't want to rip you off! Ok, let me adopt a more careful Bayesian's fallback position: I just assume certain things, then derive consequences, given the assumptions, without precisely quantifying the plausibility of the assumptions themselves, letting the reader decide whether they are plausible or not. Juergen
Re: Does provability matter?
Wei Dai wrote: I'm not sure I understand this. Can you give an example of how our universe might make use of an entire continuum of real numbers? How might someone show this if it were true? I have no idea. In fact, I guess it is impossible. But if there is a formally describable prior that dominates the speed prior, and you agree that the more dominant prior doesn't have a prior probability of zero, then isn't the speed prior redundant? Wouldn't you get equal posterior probabilities (up to a constant multiple) by dropping the speed prior from your prior on priors, no matter what it assigns to priors that are not formally describable? In the Bayesian framework we derive consequences of assumptions represented as priors. The stronger the assumptions, the more specific the predictions. The Speed Prior assumption is stronger than the assumption of a formally describable prior. It is not redundant because it yields stronger predictions such as: The computer computing our universe won't compute much more of it; large scale quantum computation won't work; etc. In fact, I do believe the Speed Prior dominates the true prior from which our universe is sampled (which is all I need to make good computable predictions), and that the probability of even more dominant priors is zero indeed. But as a Bayesian I sometimes ignore my beliefs and also derive consequences of more dominant priors. I do find them quite interesting, and others who do not share my belief in the Speed Prior might do so too. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Does provability matter?
Wei Dai wrote: On Thu, Nov 15, 2001 at 10:35:58AM +0100, Juergen Schmidhuber wrote: Why do you prefer the Speed Prior? Under the Speed Prior, oracle universes are not just very unlikely, they have probability 0, right? Suppose one day we actually find an oracle for the halting problem, or even just find out that there is more computing power in our universe than is needed to explain our intelligence. Would you then (1) give up the Speed Prior and adopt a more dominant prior, or (2) would you say that you've encountered an extremely unlikely event (i.e. more likely you're hallucinating)? If you answer (1) then why not adopt the more dominant prior now? Why not adopt a more dominant prior now? I just go for the simplest explanation consistent with available data, where my simplicity measure reflects what computer scientists find simple: things that are easy to compute. You didn't explicitly answer my question about what you would do if you saw something that is very unlikely under the Speed Prior but likely under more dominant priors, but it seems that your implicit answer is (1). In that case you must have some kind of prior (in the Bayesian sense) for the Speed Prior vs. more dominant priors. So where does that prior come from? Or let me put it this way. What do you think is the probability that the Great Programmer has no computational resource constraints? If you don't say the probability is zero, then what you should adopt is a weighted average of the Speed Prior and a more dominant prior, but that average itself is a more dominant prior. Do you agree? If someone can use large scale quantum computing to solve a problem whose solution requires much more computation time than necessary to compute the history of our own particular universe, then one could take this as evidence against the Speed Prior. You are right: to quantify this evidence in a Bayesian framework we need a prior on priors. Which one? Hm. 
Let me extend your question and ask: what's the probability that the Great Programmer is more than a mere programmer in the sense that he is not bound by the limits of computability? For instance, if someone were able to show that our universe somehow makes use of an entire continuum of real numbers we'd be forced to accept some even more dominant prior that is not even computable in the limit. We could not even formally specify it. So what's my prior on all priors? Since the attempt to answer such a question might lead outside what's formally describable, I'll remain silent for now. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Does provability matter?
Wei Dai wrote: Thanks for clarifying the provability issue. I think I understand and agree with you. On Tue, Nov 13, 2001 at 12:05:22PM +0100, Juergen Schmidhuber wrote: What about exploitation? Once you suspect you found the PRG you can use it to predict the future. Unfortunately the prediction will take enormous time to stabilize, and you never can be sure it's finished. So it's not very practical. By exploiting the fact that we're in an oracle universe I didn't mean using TMs to predict the oracle outputs. That is certainly impractical. There are a couple of things you could do though. One is to use some oracle outputs to predict other oracle outputs when the relationship between them is computable. The other, much more important, is to quickly solve arbitrarily hard computational problems using the oracles. I prefer the additional resource assumptions reflected by the Speed Prior. They make the oracle universes very unlikely, and yield computable predictions. Why do you prefer the Speed Prior? Under the Speed Prior, oracle universes are not just very unlikely, they have probability 0, right? Suppose one day we actually find an oracle for the halting problem, or even just find out that there is more computing power in our universe than is needed to explain our intelligence. Would you then (1) give up the Speed Prior and adopt a more dominant prior, or (2) would you say that you've encountered an extremely unlikely event (i.e. more likely you're hallucinating)? If you answer (1) then why not adopt the more dominant prior now? You are right in the sense that under the Speed Prior any complete _infinite_ oracle universe has probability 0. On the other hand, any _finite_ beginning of an oracle universe has nonvanishing probability. Why? The fastest algorithm for computing all universes computes _all_ finite beginnings of all universes. Now oracle universes occasionally require effectively random bits. 
History beginnings that require n such bits come out very slowly: the computation of the n-th oracle bit requires more than O(2^n) steps, even by the fastest algorithm. Therefore, under the Speed Prior the probabilities of oracle-based prefixes quickly converge towards zero with growing prefix size. But in the finite case there always will remain some tiny epsilon. Why not adopt a more dominant prior now? I just go for the simplest explanation consistent with available data, where my simplicity measure reflects what computer scientists find simple: things that are easy to compute. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
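The "fastest algorithm for computing all universes" mentioned above can be sketched as a phase-based dovetailer. A heavily simplified toy (my assumption: "programs" are just bitstrings that output their own bits repeated forever; a real version would run binary programs on a universal monotone Turing machine):

```python
# Toy phase-based dovetailer in the spirit of the fastest-algorithm
# remark above. Assumption of this sketch: "programs" are bitstrings
# whose toy semantics is to output their own bits forever; a real
# version would run them on a universal monotone Turing machine.
from itertools import product

def all_programs(max_len):
    """Enumerate toy binary programs of 1..max_len bits."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def run(program, steps):
    """Toy semantics: one output bit per step, cycling through p."""
    return "".join(program[i % len(program)] for i in range(steps))

def fast(phases, max_len=4):
    """In phase i every program p of length l(p) <= i gets a budget
    of 2**(i - l(p)) steps, so short programs receive exponentially
    more time: the finite history prefixes that appear earliest are
    exactly those favored by the Speed Prior."""
    outputs = {}
    for i in range(1, phases + 1):
        for p in all_programs(min(i, max_len)):
            outputs[p] = run(p, 2 ** (i - len(p)))  # re-run with larger budget
    return outputs

outs = fast(phases=6)
# After phase 6 the 1-bit programs have produced 2**5 = 32 output
# bits each, while each 4-bit program has produced only 2**2 = 4.
assert len(outs["0"]) == 32 and len(outs["0000"]) == 4
```

This also illustrates why oracle prefixes get vanishing Speed Prior probability: a bit that costs more than O(2^n) steps appears only in very late phases, no matter how short its program is.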
Re: Does provability matter?
Wei Dai wrote: On Wed, Oct 31, 2001 at 10:49:41AM +0100, Juergen Schmidhuber wrote: Which are the logically possible universes? Tegmark mentioned a somewhat vaguely defined set of ``self-consistent mathematical structures,'' implying provability of some sort. The postings of Marchal also focus on what's provable and what's not. Do you understand Marchal's postings? I really have trouble making sense of them. If you do, would you mind rephrasing his main ideas for me? No, unfortunately I do not understand him. But provability does seem important to him. Maybe someone else could post a brief and concise summary? Is provability really relevant? Philosophers and physicists find it sexy for its Goedelian limits. But what does this have to do with the set of possible universes? What does it mean for a universe to be provable? One interpretation is that there exists an algorithm to compute any finite time-space region of the universe that always halts. Is that what you mean? Well, any finite object x has a halting algorithm that computes it. So which are the unprovable things? They are formal statements such as "algorithm x will eventually compute object y." If it does then you can prove it, otherwise in general you can never be sure. But why identify universes with true or false statements? When I'm asking "Is provability really relevant?" I am not really referring to provable universes - I don't even know what that is supposed to be. I just mean: why should it be important to show that a particular algorithm computes a particular universe? Or that a particular universe has a particular property (there are many generally unprovable properties)? Let's compute them all - why worry about provability? The provability discussion seems to distract from the real issue. 
If we limit ourselves to universes corresponding to traditionally provable theorems then we will miss out on many formally and constructively describable universes that are computable in the limit yet in a certain sense soaked with unprovability. Example: a never ending universe history h is computed by a finite nonhalting program p. To simulate randomness and noise etc, p invokes a short pseudorandom generator subroutine q which also never halts. The n-th pseudorandom event of history h is based on q's n-th output bit q(n) which is initialized by 0 and set to 1 as soon as the n-th statement in an ordered list of all possible statements of a formal axiomatic system is proven by a theorem prover that systematically computes all provable theorems. Whenever q modifies some q(n) that was already used in the previous computation of h, p appropriately recomputes h since the n-th pseudorandom event. Living in this universe is like living in a universe with a halting-problem oracle, right? If you want to know whether program x halts, you just measure the n-th pseudorandom event, where n is such that the n-th theorem is "x halts". Yes? Yes. Once the history has stabilized this will give you the correct answer. The problem with universes with halting-problem oracles (which I'll just call oracle universes) is that either the oracle is not exploitable (for example if no one in the universe figures out that the pseudorandom events actually correspond to theorems) in which case the universe just looks like an ordinary provable universe, or the oracle is exploitable, in which case the outcome becomes completely unimaginable for us. (Or is that not true? Can we imagine what would happen if an intelligent species came across an exploitable halting-problem oracle?) But why should this lack of provability matter? Ignoring this universe just implies loss of generality. Provability is not the issue. If oracle universes could exist, how significant are they? 
What's the probability that we live in such a universe, and how could we try to determine and/or exploit this fact? If the computer on which the history of our universe is calculated is a general Turing machine (as opposed to a monotone machine which cannot edit previous outputs), and if we make no resource assumptions such as those embodied by the Speed Prior, then the oracle universes indeed represent a large fraction of the probability mass of all universes. Why? Because there are many oracle universes with very short algorithms. How to determine whether we are in an oracle universe? In principle it is possible, at least to the extent inductive inference is possible at all! Just search through all oracle pseudo-random generators (PRGs) and try to match their outputs with the apparently noisy observations. If the observations are not really random, eventually you'll stumble across the right PRG. It's just that you cannot prove that it is the right one, or that its outputs have already stabilized. What about exploitation? Once you suspect you found the PRG you can use it to predict the future. Unfortunately the prediction will take enormous time to stabilize, and you never can be sure it's finished. So it's not very practical.
Does provability matter?
Cannibalizing previous thread Provable vs Computable: Which are the logically possible universes? Tegmark mentioned a somewhat vaguely defined set of ``self-consistent mathematical structures,'' implying provability of some sort. The postings of Marchal also focus on what's provable and what's not. Is provability really relevant? Philosophers and physicists find it sexy for its Goedelian limits. But what does this have to do with the set of possible universes? The provability discussion seems to distract from the real issue. If we limit ourselves to universes corresponding to traditionally provable theorems then we will miss out on many formally and constructively describable universes that are computable in the limit yet in a certain sense soaked with unprovability. Example: a never ending universe history h is computed by a finite nonhalting program p. To simulate randomness and noise etc, p invokes a short pseudorandom generator subroutine q which also never halts. The n-th pseudorandom event of history h is based on q's n-th output bit q(n) which is initialized by 0 and set to 1 as soon as the n-th statement in an ordered list of all possible statements of a formal axiomatic system is proven by a theorem prover that systematically computes all provable theorems. Whenever q modifies some q(n) that was already used in the previous computation of h, p appropriately recomputes h since the n-th pseudorandom event. Such a virtual reality or universe is perfectly well-defined. We can program it. At some point each history prefix will remain stable forever. Even if we know p and q, however, in general we will never know for sure whether some q(n) that is still zero won't flip to 1 at some point, because of Goedel etc. So this universe features lots of unprovable aspects. But why should this lack of provability matter? Ignoring this universe just implies loss of generality. Provability is not the issue. 
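The universe generator p with its theorem-prover-driven PRG q can be sketched in a few lines. This is a heavily simplified toy (my assumption, not part of the text): the formal axiomatic system is faked by a decidable stand-in in which statement n is "toy program n halts", and toy program n halts after n steps iff n is even. The point is only the mechanics: bits flip from 0 to 1 at most once, and h is recomputed from the n-th event on.

```python
# Toy sketch of the nonhalting program p with theorem-prover PRG q.
# Assumption of this sketch: the formal system is replaced by a
# decidable stand-in, so q(n) flips exactly for even n. A real q
# would dovetail over all proofs of a formal axiomatic system.

def halts_within(n, steps):
    """Decidable stand-in for 'the n-th statement is provable'."""
    return n % 2 == 0 and steps >= n

def simulate(num_bits, total_steps):
    q = [0] * num_bits          # q(n): 0 until statement n is "proven"
    history = []                # the universe history h computed so far
    for step in range(total_steps):
        # Dovetailing prover: every statement has had `step` steps so far.
        for n in range(num_bits):
            if q[n] == 0 and halts_within(n, step):
                q[n] = 1                  # flip the n-th PRG bit...
                history = history[:n]     # ...and recompute h from event n on
        if len(history) < num_bits:
            history.append(q[len(history)])  # next pseudorandom event
    return q, history

q, h = simulate(num_bits=8, total_steps=100)
# Once the prover has caught up, the history prefix is stable forever:
assert h == [1, 0, 1, 0, 1, 0, 1, 0]
```

In the real construction the stabilization point of a given prefix is not provable in general, which is exactly the "soaked with unprovability" property: the program is perfectly well-defined, yet you can never be sure a zero bit won't still flip.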
Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Predictions duplications
From: Russell Standish [EMAIL PROTECTED] The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. This far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution. Russell, this is really too much; please read any intro to measure theory, e.g., the one by Halmos. Measures in general are _not_ probability distributions. They are more than that; you can view them as _sets_ of probability distributions on sets that are related to each other in a specific way. Probability density functions define measures and therefore in general aren't probability distributions either. The uniform PDF on [0..1] yields the perfect trivial example; that's exactly why I used it before. In the computability context, compare Li/Vitanyi 1997, chapters 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 (continuous sample space with (semi)measure M(x)). This is an excellent reference, thank you. Example 4.2.1 gives the construction of the uniform measure over the set of strings, precisely what you claim didn't exist when we started this discussion. You just want to annoy me, don't you? For the last time: There is a uniform *measure* on the strings, but no uniform probability distribution. 1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs You mean, given a uniform measure? But why should you need the halting theorem for this trivial statement? In fact, what exactly do you mean by halting theorem? (Shoenfield's book provides a good intro to mathematical logic and computability theory.) The halting theorem is that there is no algorithm for predicting whether any program will halt. Of course, the result I was referring to is that the set of compressible strings is of measure zero. The set of halting programs actually has finite measure, since \Omega > 0. 
However, there is still finite probability of an input string being nonhalting, and indeed the probability seems to be greater than 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still presents a problem for computationalism. This is mixing up everything, in particular, different measures. 1) Infinite strings with finite program have *nonzero* measure, given the *universal* measure (Omega is a number derived from the universal measure). 2) They have measure zero, given the *uniform* measure (Omega has nothing to do with it). 3) To see 2) we do not need the halting theorem - we just compare the finite programs (countably many) to all strings (uncountably many) and we are done. 4) The halting theorem is no problem whatsoever when it comes to computable universes. Why care whether some computable universe stops or not? If it does, it does, otherwise it does not. In both cases we can compute it in the limit. The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, that simply exist without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few 100 bits. The simplest prior is the uniform one, ie there is no bias whatsoever in the selection. Exist without being generated? Precisely zero information or complexity? But with respect to what? With respect to the observer. With respect to anything in fact. The set of all possible descriptions has precisely zero information, by definition, regardless of what observer or context you employ. This is a vague informal statement, not a definition. Which are the possible descriptions? What's the formal difference between a description and a non-description? What is your "anything" if there is no device for describing it formally? If "anything" does not convey any bit of info then what exactly is it that conveys one bit? Or two? 
Any traditional definition of information/simplicity requires the specification of a formal description method, such as a Turing machine. You don't need one? Then how do you define information or complexity? How exactly do you define "simple"? Actually, all it needs is an equivalencing procedure. See my "On Complexity and Emergence" paper. A Turing machine will do the job for you, but so will a neural network or an expert system following heuristic rules. So you have to write down formally this equivalencing procedure, right? Why should this be different in principle from writing down the formal rules for a Turing machine? Why is a simplicity measure that depends on your equivalencing procedure superior to a simplicity measure depending on a Turing machine? (The TM procedure includes yours - because it includes all procedures - but not the other way round.)
Re: Predictions duplications
From: Juho Pennanen [EMAIL PROTECTED] So there may be no 'uniform probability distribution' on the set of all strings, but there is the natural probability measure, that is in many cases exactly as useful. Sure, I agree, measures are useful; I'm using them all the time. But in general they are _not_ the same thing as probability distributions. From: Russell Standish [EMAIL PROTECTED] The only reason for not accepting the simplest thing is if it can be shown to be logically inconsistent. This far, you have shown no such thing, but rather demonstrated an enormous confusion between measure and probability distribution. Russell, this is really too much; please read any intro to measure theory, e.g., the one by Halmos. Measures in general are _not_ probability distributions. They are more than that; you can view them as _sets_ of probability distributions on sets that are related to each other in a specific way. Probability density functions define measures and therefore in general aren't probability distributions either. The uniform PDF on [0..1] yields the perfect trivial example; that's exactly why I used it before. In the computability context, compare Li/Vitanyi 1997, chapters 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5 (continuous sample space with (semi)measure M(x)). 1) Halting theorem implies the set of halting programs is of measure zero within the set of all programs You mean, given a uniform measure? But why should you need the halting theorem for this trivial statement? In fact, what exactly do you mean by halting theorem? (Shoenfield's book provides a good intro to mathematical logic and computability theory.) The GP program (aka Universal Dovetailer) is not the simplest thing. The simplest describable thing is the set of all descriptions, that simply exist without being generated. That has precisely zero information or complexity, whereas the UD has some complexity of the order of a few 100 bits. 
The simplest prior is the uniform one, ie there is no bias whatsoever in the selection. Exist without being generated? Precisely zero information or complexity? But with respect to what? Any traditional definition of information/simplicity requires the specification of a formal description method, such as a Turing machine. You don't need one? Then how do you define information or complexity? How exactly do you define "simple"?
Re: Predictions duplications
Schmidhuber: It's the simplest thing, given this use of mathematical language we have agreed upon. But here the power of the formal approach ends - unspeakable things remain unspoken. Marchal: I disagree. I would even say that it is here that the serious formal approach begins. Take unprovable for unspeakable, we can meta-reason (informally or formally) and study the logical structure of the set of unprovable sentences by sound universal machines. Schmidhuber: Unprovable does not mean unspeakable! You can spell out many unprovable things. Why care for the subset of provable sentences? Aren't we interested in the full set of all describable sentences? We can generate it, without caring for proofs at all. Unspeakable means beyond formal definition. In particular, the way we humans communicate and agree on a common formal language is not formally defined. You write x and I say, ok, it's an x, but that's all based on some sort of informal agreement that involves complex pattern recognition and learning processes. Usually we do not question the validity of outcomes of such cognitive processes, and just go ahead with formal derivations. But there is this major informal step *before* formal reasoning can start, and good textbooks on logic acknowledge this.
Re: Predictions duplications
From: Russell Standish [EMAIL PROTECTED] To: [EMAIL PROTECTED] I think we got into this mess debating whether an infinite set could support a uniform measure. I believe I have demonstrated this. I've yet to see anything that disabuses me of the notion that a probability distribution is simply a measure that has been normalised to 1. Not all measures are even normalisable. Russell, at the risk of beating a dead horse: a uniform measure is _not_ a uniform probability distribution. Why were measures invented in the first place? To deal with infinite sets. You cannot have a uniform probability distribution on infinitely many things. That's why measures isolate just finitely many things, say, every bitstring of size n, and for each x of size n look at the infinite set of strings starting with x. A uniform measure assigns equal probability to each such set. Of course, then you have a uniform probability distribution on those finitely many things which are sets. But that's not a uniform probability distribution on infinitely many things, e.g., on the bitstrings themselves! The measure above is _not_ a probability distribution; it is an infinite _set_ of _finite_ probability distributions, one for string size 0, one for string size 1, one for string size 2,... I realise that the Halting theorem gives problems for believers of computationalism. It does not. Why should it? I never subscribed to computationalism at any time, but at this stage do not reject it. I could conceive of us living in a stupendous virtual reality system, which is in effect what your GP religion Mark II is. However, as pointed out by others, it does suffer from turtle-itis, and should not be considered the null hypothesis. It requires evidence for belief. By turtle-itis you mean: in which universe do the GP and his computer reside? Or the higher-level GP2 which programmed GP? And so on? 
But we cannot get rid of this type of circularity - computability and mathematical logic are simply taken as given things, without any further explanation, like a religion. The computable multiverse, or the set of logically possible or mathematically possible or computable universes, represents the simplest explanation we can write down formally. But what exactly does it mean to accept something as a formal statement? What does it mean to identify the messy retina patterns caused by this text with abstract mathematical symbols such as x and y? All formal explanations in our formal papers assume that we agree on how to read them. But reading and understanding papers is a complex physical and cognitive process. So all our formal explanations are relative to this given process which we usually do not even question. Essentially, the GP program is the simplest thing we can write down, relative to the unspoken assumption that it is clear what it means to write something down, and how to process it. It's the simplest thing, given this use of mathematical language we have agreed upon. But here the power of the formal approach ends - unspeakable things remain unspoken. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
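The uniform-measure point above - one finite probability distribution per string size, but no distribution over individual infinite strings - can be checked numerically. A toy sketch (cylinder sets are represented by their finite prefixes):

```python
# Numerical sketch: the uniform measure assigns 2**-len(x) to the
# *set* (cylinder) of infinite strings starting with the finite
# prefix x. For each fixed size n this yields one finite probability
# distribution over the 2**n cylinders - but no distribution over
# the individual infinite strings themselves.
from itertools import product

def M(x):
    """Uniform measure of the cylinder of strings starting with x."""
    return 2.0 ** (-len(x))

for n in range(1, 6):
    total = sum(M("".join(b)) for b in product("01", repeat=n))
    assert abs(total - 1.0) < 1e-12   # one distribution per string size

# A single infinite string is the intersection of ever-longer
# cylinders, so its measure is the limit of 2**-n, i.e. zero:
assert M("0" * 50) < 1e-15
```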
Re: Predictions duplications
From: [EMAIL PROTECTED]: [EMAIL PROTECTED] wrote: From [EMAIL PROTECTED]: [EMAIL PROTECTED] wrote: M measure: M(empty string)=1 M(x) = M(x0)+M(x1) nonnegative for all finite x. This sounds more like a probability distribution than a measure. In the set of all descriptions, we only consider infinite length bitstrings. Finite length bitstrings are not members. However, we can identify finite length bitstrings with subsets of descriptions. The empty string corresponds to the full set of all descriptions, so the first line M(empty string)=1 implies that the measure is normalisable (ie a probability distribution). Please check out definitions of measure and distribution! Normalisability is not the critical issue. Clearly: Sum_x M(x) is infinite. So M is not a probability distribution. M(x) is just the measure of all strings starting with x: M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ... For a measure to be normalisable, the sum over a *disjoint* set of subsets must be finite. If the set of subsets is not disjoint (ie the intersections are not empty) then the sum may well be infinite. Bringing this to the case of finite strings. Each finite string is actually a subset of the set of infinite strings, each containing the same finite prefix. So the string 101010 is actually a subset of 10101 and so on. The Sum_x M(x), where I assume x ranges over all strings, will of course be infinite. However, since the set of finite strings is not disjoint, this doesn't imply M(x) is unnormalisable. Now when you realise that every finite string x is a subset of the empty string, it becomes clear that M(x) is normalised to precisely 1. The point is: prob dists and measures are different things. There is a good reason for giving them different names. Prob dists assign numbers to individual objects, not to sets. See traditional definitions as well as those for semimeasures in http://www.idsia.ch/~juergen/toesv2/ Neglecting finite universes means loss of generality though. 
Hence measures mu(x) in the ATOE paper do not neglect finite x: mu(empty string)=1 mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). And here P is a probability distribution indeed! P(x) > 0 possible only for x with finite description. The P(x) > 0 case above actually breaks the countably subadditive property, so mu(x) cannot be called a measure... I'm not entirely sure what you're getting at here. The set of computable universes includes the finite ones which we may not ignore. How do they contribute to the measure of all universes? For convenience introduce a third symbol $ besides 0 and 1. Now what is the measure of the set of all universes starting with 00110? It's the sum of the measure of the set of universes starting with 001101 and the measure of the set of universes starting with 001100 and the measure of the single universe 00110$ that stops right after 00110. Once there is a $ symbol there won't be another 0 or 1. So mu(x) = P(x)+mu(x0)+mu(x1). Our favorite example is the universal Turing machine with random inputs. For finite x: P(x) is the probability that the TM output is x and nothing but x. mu(x) is something different, namely, the so-called universal measure of all strings _starting_ with x. Again mu(x) = P(x)+mu(x0)+mu(x1). Juergen Schmidhuber http://www.idsia.ch/~juergen/
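The identity mu(x) = P(x) + mu(x0) + mu(x1) is easy to verify numerically with a toy generator in place of the universal TM (an assumption of this sketch: at each step the machine emits the $ of the text, i.e. halts, with a fixed probability, else an unbiased bit):

```python
# Numerical check of mu(x) = P(x) + mu(x0) + mu(x1). Assumption of
# this sketch: instead of a universal TM with random inputs, a toy
# generator that at each step halts (emits $) with probability HALT,
# else outputs 0 or 1 each with probability (1 - HALT) / 2.
HALT = 0.25

def P(x):
    """Probability of outputting exactly x and then halting."""
    return ((1 - HALT) / 2) ** len(x) * HALT

def mu(x):
    """Measure of all universes starting with x, finite or infinite."""
    return ((1 - HALT) / 2) ** len(x)

for x in ["", "0", "00110", "10101010"]:
    assert abs(mu(x) - (P(x) + mu(x + "0") + mu(x + "1"))) < 1e-12

# mu(empty string) = 1, and P is a genuine probability distribution:
# summing P over all 2**n strings of each length n gives
# HALT * Sum_n (1 - HALT)**n = 1.
assert mu("") == 1.0
```

At each prefix x the mass splits three ways: continue with 0, continue with 1, or stop (the $ branch, whose mass is exactly P(x)), which is all the identity says.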
Re: Predictions duplications
From [EMAIL PROTECTED]: [EMAIL PROTECTED] wrote: M measure: M(empty string)=1 M(x) = M(x0)+M(x1) nonnegative for all finite x. This sounds more like a probability distribution than a measure. In the set of all descriptions, we only consider infinite length bitstrings. Finite length bitstrings are not members. However, we can identify finite length bitstrings with subsets of descriptions. The empty string corresponds to the full set of all descriptions, so the first line M(empty string)=1 implies that the measure is normalisable (ie a probability distribution). Please check out definitions of measure and distribution! Normalisability is not the critical issue. Clearly: Sum_x M(x) is infinite. So M is not a probability distribution. M(x) is just the measure of all strings starting with x: M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ... Neglecting finite universes means loss of generality though. Hence measures mu(x) in the ATOE paper do not neglect finite x: mu(empty string)=1 mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative). And here P is a probability distribution indeed! P(x) > 0 possible only for x with finite description. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Predictions duplications
In reply to Russell Standish and Juho Pennanen I'd just like to emphasize the main point, which is really trivial: by definition, a uniform measure on the possible futures makes all future beginnings of a given size equally likely. Then regular futures clearly are not any more likely than the irregular ones. I have no idea what makes Russell think this is debatable. In most possible futures your computer will vanish within the next second. But it does not. This indicates that our future is _not_ sampled from a uniform prior. Some seem to think that the weak anthropic principle explains the regularity. The argument goes like this: Let there be a uniform measure on all universe histories, represented as bitstrings. Now take the tiny subset of histories in which you appear. Although the measure of this subset is tiny, its conditional measure, given your very existence, is not: According to the weak anthropic principle, the conditional probability of finding yourself in a regular universe compatible with your existence equals 1. But it is essential to see that the weak anthropic principle does not have any predictive power at all. It does not tell you anything about the future. It cannot explain away futures in which you still exist but irregular things happen. Only a nonuniform prior can explain this. Which nonuniform prior is plausible? My favorite is the resource-optimal Speed Prior. I am hoping and expecting that someone will confirm it soon by finding a rather fast pseudorandom generator responsible for apparently noisy events in our universe. But others may be more conservative and go for the more dominant enumerable Solomonoff-Levin prior mu^M or maybe even for the nonenumerable prior mu^E (which dominates mu^M while still being computable in the limit) or maybe even for the extreme prior mu^G. 
Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/ From [EMAIL PROTECTED] Thu Oct 11 17:36:41 2001 From: Juho Pennanen [EMAIL PROTECTED] I tried to understand the problem that doctors Schmidhuber and Standish are discussing by describing it in the most concrete terms I could, below. (I admit beforehand I couldn't follow all the details and do not know all the papers and theorems referred to, so this could be irrelevant.) So, say you are going to drop a pencil from your hand and are trying to predict if it's going to fall down or up this time. Using what I understand of the comp TOE, I would take the set of all programs that at some state implement a certain conscious state, namely the state in which you remember starting your experiment of dropping the pencil and have already recorded the end result. (I abbreviate this conscious state as CS. To be exact it is a set of states, but that shouldn't make a difference.) The space of all programs would be the set of all programs in some language, coded as infinite numerable sequences of 0's and 1's. (I do not know how much the chosen language + coding has an effect on the whole thing.) Now for your prediction you need to divide the implementations of CS into two sets: those in which the pencil fell down and those in which it fell up. Then you compare the measures of those sets. (You would need to assume that each program is run just once or something of the sort. Some programs obviously implement CS several times when they run. So you would maybe just include those programs that implement CS infinitely many times, and weight them with the density of CS occurrences during their running.) One way to derive the measure you need is to assume a measure on the set of all infinite sequences (i.e. on all programs). For this we have the natural measure, i.e. the product measure of the uniform measure on the set containing 0 and 1.
And as far as my intuition goes, this measure would lead to the empirically correct prediction on the direction of the pencil falling. And if I understood it right, this is not too far from what Dr. Standish was claiming? And we wouldn't need any speed priors. But maybe the need for the speed prior would come into play if I thought more carefully about the detailed assumptions involved? E.g. that each program would be run just once, with the same speed etc? I am not sure. Juho / Juho Pennanen Department of Forest Ecology, P.O.Box 24 FIN-00014 University of Helsinki tel. (09)191 58144 (+358-9-191 58144) GSM 040 5455 845 (+358-40-5455 845) http://www.helsinki.fi/people/juho.pennanen */ Resent-Date: Thu, 11 Oct 2001 18:57:25 -0700 From: Russell Standish [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Huh? A PDF? You mean a probability density function? On a continuous set? Probability Distribution Function. And PDF's are defined on all
Re: Predictions duplications
From [EMAIL PROTECTED] : [EMAIL PROTECTED] wrote: So you NEED something additional to explain the ongoing regularity. You need something like the Speed Prior, which greatly favors regular futures over others. I take issue with this statement. In Occam's Razor I show how any observer will expect to see regularities even with the uniform prior (comes about because all observers have resource problems, incidentally). The speed prior is not necessary for Occam's Razor. It is obviously consistent with it though. First of all: there is _no_ uniform prior on infinitely many things. Try to build a uniform prior on the integers. (Tegmark also wrote that ... all mathematical structures are a priori given equal statistical weight, but of course this does not make much sense because there is _no_ way of assigning equal nonvanishing probability to all - infinitely many - mathematical structures.) There is at best a uniform measure on _beginnings_ of strings. Then strings of equal size have equal measure. But then regular futures (represented as strings) are just as likely as irregular ones. Therefore I cannot understand the comment: (comes about because all observers have resource problems, incidentally). Of course, alternative priors lead to alternative variants of Occam's razor. That has been known for a long time - formal versions of Occam's razor go back at least to Solomonoff, 1964. The big question really is: which prior is plausible? The most general priors we can discuss are those computable in the limit, in the algorithmic TOE paper. They do not allow for computable optimal prediction though. But the more restrictive Speed Prior does, and seems plausible from any programmer's point of view. The interesting thing is of course whether it is possible to experimentally distinguish between the speed prior and the uniform prior, and it is not at all clear to me that it is possible to distinguish between these cases.
I suggest looking at experimental data that seems to have Gaussian randomness in it, such as interference patterns in slit experiments. The Speed Prior suggests the data cannot be really random, but that a fast pseudorandom generator PRG is responsible, e.g., by dividing some seed by 7 and taking some of the resulting digits as the new seed, or whatever. So it's verifiable - we just have to discover the PRG method. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Predictions duplications
falsifiability, in line with Solomonoff's theory of inductive inference and Occam's razor, which is not just a wishy-washy philosophical framework like Popper's. Similarly, today's string physicists accept theories for their simplicity, not their falsifiability. Just like nobody is able to test whether gravity is the same on Sirius, but believing it makes things simpler. Again: the essential question is: which prior is plausible? Which represents the correct notion of simplicity? Solomonoff's traditional prior, which does not care for temporal complexity at all? Even more general priors computable in the limit, such as those discussed in the algorithmic TOE paper? Or the Speed Prior, which embodies a more restricted concept of simplicity that differs from Kolmogorov complexity because it takes runtime into account, in an optimal fashion? Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Bell's Inequality Determinism vs Nondeterminism
Einstein, Podolsky and Rosen (1935) suggested hidden variables unexplained by standard quantum mechanics, and postulated local interaction only. In 1964 Bell introduced an inequality whose violation was thought to provide evidence against _local_ hidden variable theories. Violation tests are inconclusive as yet - see chain of papers starting maybe with Aspect et al, PRL 49, p. 91 and 1804, 1982; and Franson's counter argument: PR D, p 2529, 1985, and numerous more recent papers. Many physicists now believe in nonlocality, but the issue is not settled yet. Is Bell's inequality relevant to algorithmic TOEs? It is about locality assumptions - but ATOEs do not even insist on locality. There are computable universes whose light speed exceeds ours, or where there is no light at all, where there is global interaction among distant particles, or where there are no particles as we know them, etc. The point of ATOEs is: as long as the computability assumptions hold, any complex universe is unlikely, no matter whether Bell's inequality makes sense in it or not. From this point of view Bell's inequality and locality are not the real issues. The issue is more general. It is determinism vs nondeterminism. There is NO experimental data proving the universe is truly nondeterministic. If it were, where would the unexplained randomness come from? As pointed out by Saibal Mitra, non-crackpots such as 't Hooft (physics Nobel 1999) endorse hidden variable theory and determinism, e.g.: http://xxx.lanl.gov/abs/hep-th/0105105 http://xxx.lanl.gov/abs/hep-th/0104219 Sure, there is a lot of data that at first glance suggests probabilistic aspects of particle behavior, but many pseudorandom generators produce data that match Gaussian or other well-known distributions. I think nobody has ever bothered to systematically check existing data for pseudorandomness. (Note that decorrelation does not imply independence.) 
Suppose the history of some universe is indeed generated by a deterministic computer program. In principle this would make everything predictable for anybody who knows the program. This does not at all imply, however, that an observer evolving within the universe could predict exactly what is happening light years away, for several reasons: 1. He may never get full access to the current hidden variables, i.e., the current state of the program, because of: 1a. some sort of prewired uncertainty principle that holds for local observers (but not for outsiders such as the Great Programmer). 1b. a maximal universe-specific speed that prevents the observer from collecting precise information about distant parts of his universe. 2. Even if the observer had full access to the state, to make predictions he would need a predictor built within his universe. In general, this machine will run much slower than the universe itself, unable to predict it in time. (But perhaps able to postdict parts of it.) To illustrate 2: there are pseudorandom generators that run extremely slowly but produce extremely convincing results such as the enumerable but effectively random number Omega. The speed prior, however, suggests our universe's pseudorandom generator is much faster than the Omega generator. Juergen http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: Immortality
But there _is_ a modern religion based on universal principles. More precisely, on universal computers. According to the Great Programmer Religion, the Great Programmer wrote a very short program that computes all computable universes. One of them is ours. Some disciples of this religion find it plausible because the short program is the simplest explanation of all observations. Others find it increasingly plausible because our own, currently quite primitive virtual realities are getting more sophisticated all the time. We observe a speed-up factor of 10^6 every 20 years, possibly 10^30 in the ongoing century. More and more people, especially kids, are in regular contact with virtual realities, and to them the new religion may seem just like a natural extrapolation. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/toesv2/ http://www.idsia.ch/~juergen/everything/html.html From: K. S. Ryan [EMAIL PROTECTED] Date: Fri, 28 Sep 2001 15:15:45 Religion is a system of beliefs describing our place in the cosmos. The basic premise of all religions is that we are best when we act in accordance with universal principles. Prophets describe our place in the cosmos by explaining universal principles. Fundamentalists of all sorts are in conflict with the modern world because universal principles are increasingly scientific, secular, and physically practical. Old school religions are losing market share to the modern world. But the modern world does not offer a unified system of belief describing our place in the universe. Contemporary truths are not packaged as a whole, as a spiritual intellectual-emotional raison d'etre. Thus, while educated modern worlders may not be convinced by old school religious beliefs, there is not a modern unified system to replace them. Because we are human, we think a lot, and need to know what the individual means to the whole. What is our place in the cosmos? That is the question. And there is turbulence.
Re: Countable vs Continuous
From: Joel Dobrzelewski [EMAIL PROTECTED] Subject: Re: Countable vs Continuous Date: Thu, 21 Jun 2001 08:41:13 -0400 Juergen: There is the rather harmless kind: the countable one. And some say there is another kind, a strange one, the one associated with the uncountable continuum, the one whose very existence many deny. Do not lump them together. Yes, I can see how this distinction might be useful in some esoteric discussions. But it seems (to me) to have little relevance to achieving a successful Theory of Everything. And it might even distract us. Both kinds of infinities are inaccessible to humans. So they can play no part in our theories. In this sense, I think it's fair to lump them into the pile of things that are not helpful. I think we may not ignore infinities for quite pragmatic, non-esoteric reasons. Many believe the history of our own universe will be infinite - certainly there is no evidence against this possibility. Also, any finite never-halting program for a virtual reality corresponds to an infinite history. TOEs ignoring this seem unnecessarily restrictive. We cannot build the universe out of pieces that we ourselves cannot construct. (e.g. Pi) What you cannot construct in finite time is just a particular representation of Pi, namely, the one consisting of infinitely many digits. But this is not a problem of Pi, it is a problem of this particular representation. There are better, finite, unambiguous representations of Pi: its programs. You can manipulate them, copy them, insert them into other finite programs and theorem provers, etc. Proofs of Pi's properties are essentially proofs of properties of Pi's programs. Really bad are those things that do not have ANY finite representation. Juergen, what do you think about the minimal cellular automaton? Is it a good candidate ATOE (algorithmic theory of everything)? It depends - minimal for what purpose?
Juergen http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Countable vs Continuous
There has been some confusion regarding the various kinds of infinity. There is the rather harmless kind: the countable one. And some say there is another kind, a strange one, the one associated with the uncountable continuum, the one whose very existence many deny. Do not lump them together. The former is accessible by nonhalting computer programs. The latter is not. A program that runs forever cannot consume more than countable time steps and storage cells, adding a new cell whenever it needs one. For example, countably many steps and cells are sufficient to print all digits of the real number Pi. Therefore Pi is computable in the limit. But countable time and space resources are NOT sufficient to print all digits of all random reals in the continuum. In fact, countable resources are not even sufficient to print all (countably many) digits of a single real without finite individual description. Unlike Pi, such truly random reals (and almost all reals are truly random) are NOT computable in the limit, although all their finite beginnings are. Pi has a finite description. Are all infinite objects with finite descriptions computable in the limit? No. Counter example: Infinite Omega (the halting probability of a universal Turing machine with random input) has a finite description, but countable resources are NOT sufficient to print it, although each finite beginning of Omega is computable in the limit. Algorithmic TOEs are limited to the comparatively few, countably many, possibly infinite universe histories with finite descriptions. ATOEs deny the existence of other histories, of histories we cannot even describe in principle, of histories we cannot reasonably talk about. Likewise, ATOEs are restricted to probabilities computable in the limit. We cannot formally describe other probabilities, and therefore cannot write reasonable papers about them. 
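Pi's computability in the limit can be made concrete: a short never-halting program emits digit after digit, consuming only countably many steps and storage cells. A sketch using Gibbons' unbounded spigot algorithm (my choice of algorithm; the post does not specify one):

```python
from itertools import islice

def pi_digits():
    """Gibbons' streaming spigot: yields the decimal digits of Pi forever,
    using exact integer arithmetic only - a finite program whose infinite
    output is computable in the limit."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n                                   # digit is now certain
            q, r, n = (10 * q, 10 * (r - n * t),
                       (10 * (3 * q + r)) // t - 10 * n)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

first_ten = list(islice(pi_digits(), 10))   # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

Every finite beginning of Pi appears after finitely many steps; a truly random real admits no such program, which is the asymmetry the post is about.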
This apparently harmless restriction is the very reason that any complex future (among all the possible futures compatible with the anthropic principle) necessarily is unlikely. Juergen Schmidhuber http://www.idsia.ch/~juergen/ http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: why not high complexity?
From [EMAIL PROTECTED] Thu May 31 18:14:55 2001 From: Karl Stiefvater [EMAIL PROTECTED]

Maybe you'd like to write down formally what you mean.

sure. i suspect we're talking past each other. let M be the set of universes. let A_i be a sequence of finite subsets of M, such that A_i is a strict subset of A_(i+1). define e(i) to be the expected complexity of a uniformly chosen member of A_i. then lim_{i->inf} e(i) = inf.

This will not give you a uniform distribution on infinitely many things. For simplicity, consider integers instead of universes. Assign something like probability P(n) = 6/(pi^2 n^2) to integer n. This yields nonvanishing probability for infinitely many integers. But it is not uniform. Uniformness and nonzero limits are incompatible.

Practical unpredictability due to deterministic chaos and Heisenberg etc. is very different from true unpredictability. For instance, despite chaos and the uncertainty principle, my computer probably will not disappear within the next hour. But in most possible futures it won't even see the next instant - most are maximally random and unpredictable.

yes - i think i understand what you're saying here. a universe with high complexity is a very messy place indeed - computers disappear, etc. however, i think you'll agree that our universe (unless it *is* using a pseudo-random number generator) is quite messy.

Not at all. It seems extremely regular. Whatever appears messy may be due to lack of knowledge, not to lack of regularity.

i'm wondering if perhaps a different force is keeping the complexity low. an anthropic force - if complexity is too high, then life doesn't evolve - and we don't see it.
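A quick numerical check of the integer example (assuming the intended normalization 6/(pi^2 n^2), since sum_n 1/n^2 = pi^2/6):

```python
import math

def P(n):
    """Nonvanishing probability for every positive integer n - but not uniform."""
    return 6.0 / (math.pi ** 2 * n ** 2)

partial = sum(P(n) for n in range(1, 100_001))
assert abs(partial - 1.0) < 1e-4            # partial sums converge to 1
assert abs(P(1) / P(11) - 121) < 1e-9       # heavily favors small integers
```

The check illustrates why uniformness must fail: equal nonzero probability on infinitely many integers would sum to infinity, so any proper distribution has to decay.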
According to the weak anthropic principle, the conditional probability of finding ourselves in a universe compatible with our existence equals 1. But most futures compatible with our existence are complex. So why is ours so regular? Algorithmic TOEs explain this, and add predictive power to the weak anthropic principle. http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Re: why not high complexity?
??? - There is no way of assigning equal nonvanishing probability to infinitely many mathematical structures, each being represented by a finite set of axioms.

okay - strictly speaking, you are correct. but a common trick is to compute equal probabilities on finite subsets of the infinite set, and then you can take the limit as those subsets grow to the size of the infinite set. the growing here is important - very often the order in which you add members to the set changes how the series converges. but for the case of expected complexity, it does not.

but in the limit uniform probabilities vanish. Maybe you'd like to write down formally what you mean.
Re: why not high complexity?
From: Karl Stiefvater [EMAIL PROTECTED] Date: Mon, 28 May 2001 00:11:33 -0500

Max Tegmark suggests that .. all mathematical structures are a priori given equal statistical weight and Jurgen Schmidhuber counters that there is no way of assigning nonvanishing probability to all (infinitely many) mathematical structures

Not quite - the text in Algorithmic Theories of Everything says ... of assigning nonvanishing _equal_ probability ...

and he then goes on (i think) to assign a weighting based upon time-complexity.

Most of the weightings discussed in Algorithmic Theories of Everything completely ignore time, except for one: the speed prior S, derived from the assumption that the universe-generating process is not only computable but also optimally efficient. Concerning time-independent weightings: different computing devices (traditional Turing Machines, Enumerable Output Machines, General Turing Machines) reflect different degrees of computability (traditional monotone computability, enumerability, computability in the limit). This causes various weightings. All favor short descriptions, given the device.

i have to say i find Tegmark's argument more persuasive - i can't see why the great programmer should be worried about runtime.

Even if He completely ignores runtime, He still cannot assign high probability to irregular universes with long minimal descriptions.

furthermore, i feel intuitively that the universes ought to have equal weight.

Some intuitively feel the sun revolves around the earth.

such a sort of probability can be defined, of course, by taking the limit as finite subsets approach the full infinite set. as long as we get the same answer regardless of the order in which we grow the subset, the limit can be said to be defined.

???
- There is no way of assigning equal nonvanishing probability to infinitely many mathematical structures, each being represented by a finite set of axioms. Maybe the intention is to assign measure 2^-n to all histories of size n. That would imply our environment will dissolve into randomness right now, because almost all continuations of its rather regular history so far are random. But instead the universe keeps following the nonrandom traditional laws of physics, thus offering evidence against this measure.

the problem is - such a view predicts that we live in a universe of high Kolmogorov complexity - not low complexity. but i don't see why this is such a surprise - living in such a universe, we ought to see events occur which we cannot effectively predict. but that is exactly what we do see.

Practical unpredictability due to deterministic chaos and Heisenberg etc. is very different from true unpredictability. For instance, despite chaos and the uncertainty principle, my computer probably will not disappear within the next hour. But in most possible futures it won't even see the next instant - most are maximally random and unpredictable. Any irregular future, however, must have small measure, given the rather harmless assumption of formal describability or computability in the limit. http://www.idsia.ch/~juergen/everything/html.html http://www.idsia.ch/~juergen/toesv2/
Provable vs Computable
Which are the logically possible universes? Max Tegmark mentioned a somewhat vaguely defined set of ``self-consistent mathematical structures,'' implying provability of some sort. The postings of Bruno Marchal and George Levy and Hal Ruhl also focus on what's provable and what's not. Is provability really relevant? Philosophers and physicists find it sexy for its Goedelian limits. But what does this have to do with the set of possible universes? I believe the provability discussion distracts a bit from the real issue. If we limit ourselves to universes corresponding to traditionally provable theorems then we will miss out on many formally and constructively describable universes that are computable in the limit yet in a certain sense soaked with unprovable aspects. Example: a never ending universe history h is computed by a finite nonhalting program p. To simulate randomness and noise etc, p invokes a short pseudorandom generator subroutine q which also never halts. The n-th pseudorandom event of history h is based on q's n-th output bit q(n) which is initialized by 0 and set to 1 as soon as the n-th element of an ordered list of all possible program prefixes halts. Whenever q modifies some q(n) that was already used in the previous computation of h, p appropriately recomputes h since the n-th pseudorandom event. Such a virtual reality or universe is perfectly well-defined. At some point each history prefix will remain stable forever. Even if we know p and q, however, in general we will never know for sure whether some q(n) that is still zero won't flip to 1 at some point, because of Goedel etc. So this universe features lots of unprovable aspects. But why should this lack of provability matter? It does not do any harm. Note also that observers evolving within the universe may write books about all kinds of unprovable things; they may also write down inconsistent axioms; etc. All of this is computable though, since the entire universe history is. 
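The bit-flipping scheme for q can be imitated with a toy dovetailer. The sketch below is only an illustration under invented assumptions: real program prefixes and their halting behavior are replaced by hand-picked step counts, since actual halting times are of course not computable.

```python
def dovetail(halting_steps, rounds):
    """q[n] flips from 0 to 1 as soon as toy 'program prefix' n halts.
    halting_steps[n]: steps after which prefix n halts (None = never halts).
    Each round grants every prefix one more simulation step (dovetailing)."""
    q = [0] * len(halting_steps)
    budget = [0] * len(halting_steps)
    for _ in range(rounds):
        for n, h in enumerate(halting_steps):
            budget[n] += 1
            if h is not None and budget[n] >= h:
                q[n] = 1          # prefix n halted: pseudorandom bit flips

    return q

# Prefixes 0 and 2 halt after 3 and 5 steps; prefix 1 never halts.
print(dovetail([3, None, 5], 10))   # [1, 0, 1]
```

Each q(n) stabilizes eventually, but an observer who sees q(n)=0 can never be sure it will not flip later - the unprovable-yet-well-defined aspect the post describes.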
So again, why should provability matter? Juergen Schmidhuber http://www.idsia.ch/~juergen/toesv2/
Re: Computing Randomness
Hal, you wrote: I believe that attempting an extensive detailed formal description of the Everything is the wrong approach. IMO - at least in this case - the more information used to describe, the smaller the thing described. I was not able to follow this, but informal and vague descriptions of everything have been around for millennia. Now it is time to go beyond this. One cannot scientifically discuss nonformal statements or theories. You did refer to formal describability when you repeatedly wrote that if a universe is a theorem cascade in an N-bit formal axiomatic system then Chaitin's incompleteness dictates that the cascade must stop, and there are no more than 2^[N + c] theorems. For some reason this statement even made it into your FAQ list. Counter example: number theory is describable by finitely many bits of axioms yielding an infinite cascade of provable theorems. JS
Computing Randomness
Where does all the randomness come from? Many physicists would be content with a statistical theory of everything (TOE) based on simple probabilistic physical laws allowing for stochastic predictions such as "We do not know where this particular electron will be in the next nanosecond, but with probability 0.04 we will find it in a certain volume V." Any source of randomness, however, leaves us with an unsatisfactory TOE, since randomness does not have a compact or simple explanation, by definition. Where does the enormous information conveyed by a particular history of random events come from? A TOE that cannot explain this is incomplete. The in hindsight obvious solution is an ensemble TOE which covers all possible universe histories. The ensemble conveys less information than most particular histories - one main motivation of this mailing list. Which are the possible histories? Let us focus on well-defined ensembles only, and ignore those that cannot be sufficiently specified to permit reconstruction through a formally describable computer. In particular, we may ignore uncountable ensembles such as continua, or other ensembles including histories without finite descriptions. Is there an optimally efficient way of computing all the randomness in all the describable (possibly infinite) universe histories? Yes, there is. There exists a machine-independent ensemble-generating algorithm called FAST that computes any history essentially as quickly as this history's fastest algorithm. Somewhat surprisingly, FAST is not slowed down much by the simultaneous computation of all the other histories. It turns out, however, that for any given history there is a speed limit which greatly depends on the history's degree of randomness. Highly random histories are extremely hard to compute, even by the optimal algorithm FAST. Each new bit of a truly random history requires at least twice the time required for computing the entire previous history.
As history size grows, the speed of highly random histories (and most histories are random indeed) vanishes very quickly, no matter which computer we use (side note: infinite random histories would even require uncountable time, which does not make any sense). On the other hand, FAST keeps generating numerous nonrandom histories very quickly; the fastest ones come out at a rate of a constant number of bits per fixed time interval. Now consider an observer evolving in some universe history. He does not know in which, but as history size increases it becomes less and less likely that he is located in one of the slowly computable, highly random universes: after sufficient time most long histories involving him will be fast ones. Some consequences are discussed in http://www.idsia.ch/~juergen/toesv2/node39.html Juergen Schmidhuber
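FAST's phase scheme can be caricatured in a few lines. In the real algorithm, phase i runs each binary program p of length l(p) <= i on a universal machine for at most 2^(i - l(p)) instructions; the toy below substitutes Python generators with made-up "lengths" for binary programs, so it only illustrates the step-allocation idea, not the algorithm itself.

```python
def fast_phases(programs, num_phases):
    """programs: list of (name, length_in_bits, generator_factory).
    Phase i grants each program of length l <= i exactly 2**(i - l) steps,
    so short (simple) programs emit output much faster than long ones."""
    gens = {name: factory() for name, _, factory in programs}
    out = {name: [] for name, _, _ in programs}
    for i in range(1, num_phases + 1):
        for name, l, _ in programs:
            for _ in range(2 ** (i - l) if l <= i else 0):
                out[name].append(next(gens[name]))
    return out

def zeros():                  # a very "short" never-halting program
    while True:
        yield 0

def alternating():            # a "longer" never-halting program
    b = 0
    while True:
        yield b
        b ^= 1

out = fast_phases([("zeros", 1, zeros), ("alt", 3, alternating)], 4)
# After 4 phases: "zeros" received 1+2+4+8 = 15 steps, "alt" only 1+2 = 3.
```

The exponential penalty on length is what makes the fastest nonrandom histories come out at a constant rate of bits per time interval, while highly random ones crawl.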
Re: on formally describable ...
Bruno Marchal explained to Jesse Mazer: Schmidhuber's solution is based on a belongness relation between observer and universes which is impossible to keep once we take comp seriously. But even if we could make sense of such a relation, it would only eliminate third person white rabbits and not the first person white rabbits: remember that the great programmer emulates all (semi)computable universes but also all possible dreams. In fact Schmidhuber assumes a solution of the mind-body problem which is just incompatible with comp. Technically that makes his work incomplete (at least). Such statements keep failing to make sense to me and others I know. Anybody out there who does understand what is meant?
Re: on formally describable universes and measures
From [EMAIL PROTECTED] Sat Mar 3 18:05:53 2001 From: Saibal Mitra [EMAIL PROTECTED] Jürgen wrote: - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, February 22, 2001 5:32 PM Subject: Re: on formally describable universes and measures Saibal Mitra wrote: I think the source of the problem is equation 1 of Juergen's paper. This equation supposedly gives the probability that I am in a particular universe, but it ignores that multiple copies of me might exist in one universe. Let's consider a simple example. The prior probability of universe i (i > 0) is denoted as P(i), and i copies of me exist in universe i. In this case, Juergen computes the probability that if you pick a universe at random, sampled with the prior P, you pick universe i. This probability is, of course, P(i). Therefore Juergen never has to identify how many times I exist in a particular universe, and can ignore what consciousness actually is. Surely an open universe where an infinite number of copies of me exist is infinitely more likely than a closed universe where I don't have any copies, assuming that the priors are of the same order? To respond, let me repeat the context of eq. 1 [In which universe am I?] Let h(y) represent a property of any possibly infinite bitstring y, say, h(y)=1 if y represents the history of a universe inhabited by yourself and h(y)=0 otherwise. According to the weak anthropic principle, the conditional probability of finding yourself in a universe compatible with your existence equals 1. But there may be many y's satisfying h(y)=1. What is the probability that y=x, where x is a particular universe satisfying h(x)=1? According to Bayes, P(x=y | h(y)=1) = (P(h(y)=1 | x=y) P(x=y)) / (sum_{z:h(z)=1} P(z)) propto P(x), where P(A | B) denotes the probability of A, given knowledge of B, and the denominator is just a normalizing constant.
So the probability of finding yourself in universe x is essentially determined by P(x), the prior probability of x. Universes without a single copy of yourself are ruled out by the weak anthropic principle. But the others indeed suggest the question: what can we say about the distribution on the copies within a given universe U (maybe including those living in virtual realities running on various computers in U)? I believe this is the issue you raise - please correct me if I am wrong! (Did you really mean to write i copies in universe i?) I did mean to write i copies in universe i; maybe it would have been better to write n(i) copies in universe i. Anyway, according to equation 1 the probability of universe x given that n(x) > 0 is proportional to P(x), which is also intuitively logical. My point is that from the perspective of the observer, of which there are n(x) copies in universe x, things look different. Intuitively, it seems that the measure of the observer should be n(x)*P(x). E.g. suppose there exist x1 and x2 such that P(x1) = P(x2) and n(x1) > n(x2) > 0. It seems to me that the observer is more likely to find himself in universe x1 compared to universe x2. From an algorithmic TOE perspective the only important thing is that the measure is computable in the limit - a bit more below. Intuitively, some copies might be more likely than others. But what exactly does that mean? If the copies were identical in the sense that no outsider could distinguish them, then the concept of multiple copies wouldn't make sense - there simply would not be any multiple copies. So there must be detectable differences between copies, such as those embodied by their different environments.
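The two candidate measures can be contrasted with toy numbers (invented for illustration): equation 1 weights the anthropically compatible universes by P(x) alone, while the alternative raised here weights them by n(x)*P(x).

```python
from fractions import Fraction

# Hypothetical priors and copy counts for three universes.
P = {"x1": Fraction(1, 2), "x2": Fraction(1, 4), "x3": Fraction(1, 4)}
n = {"x1": 2, "x2": 1, "x3": 0}          # x3 holds no copy of the observer

compatible = [x for x in P if n[x] > 0]  # weak anthropic principle rules out x3

Z1 = sum(P[x] for x in compatible)
eq1 = {x: P[x] / Z1 for x in compatible}              # proportional to P(x)

Z2 = sum(n[x] * P[x] for x in compatible)
copyw = {x: n[x] * P[x] / Z2 for x in compatible}     # proportional to n(x)*P(x)

assert eq1["x1"] == Fraction(2, 3)       # equation 1's answer
assert copyw["x1"] == Fraction(4, 5)     # copy-weighted answer differs
```

Both weightings are computable in the limit whenever P and n are, which is the only constraint the algorithmic TOE framework itself imposes.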
So my answer would be: as soon as you have a method for identifying and separating various observer copies within a universe U, each distinguishable copy_i is different in the sense that it lives in a different universe U_i, just like you and me can be viewed as living in different universes, because your inputs from the environment are not identical to mine. In general, the pair (U_i, copy_i) conveys more information than U by itself (information is needed to separate them). The appropriate domain of universes x (to use the paper's notation) would be the set of all possible pairs of the form (separate universe, separate observer). Equation 1 above is perfectly applicable to this domain. Okay, but since I don't know which of the copies I am, the probability that I am one of the copies inside universe U is given as: Sum_{i=1}^{n(U)} P(U_i) Is this proportional to P(U) or is it proportional to n(U) P(U)? Saibal To say something about some observer's future, given the past, we need a probability distribution on the possible futures. Does it really make sense to speak of several different copies of some observer within a given universe? Not really, because the observers in general will have
Re: on formally describable universes and measures
The best you can achieve is an algorithm that outputs at least the computable infinite reals in the sense that it outputs their finite descriptions or programs. I am not sure I understand you here. Are you aware that the set of descriptions of computable reals is not closed under the diagonalisation procedure? That is: you cannot generate all (and only) descriptions of computable reals. The algorithm you are mentioning does not exist. You can only generate a superset of the set of all computable reals. That set (of descriptions of all computable reals) is even constructively not *recursively enumerable* in the sense that, if you give me an algorithm generating the (descriptions of) computable reals, I can transform it to build a computable real not generated by your algorithm. I guess you know that. That is why most formal constructivists consider their set of constructive reals as a subset of the Turing computable reals. For example you can choose the set of reals which are provably computable in some formal system (like the system F by Girard, in which you can formalize ..., well, Hilbert space and probably the whole of the *constructive* part of Tegmark's mathematical ontology!). That is very nice and big, but not big enough for my purpose, which has some necessarily nonconstructive features. About natural numbers and machines I am a classical platonist. About real numbers I have no definite opinion. The describable reals are those computable in the limit by finite GTM programs. There is a program that eventually outputs all finite programs for a given GTM, by listing all possible program prefixes.
Sure, in general one cannot decide whether a given prefix in the current list indeed is a complete program, or whether a given prefix will still grow longer by requesting additional input bits, or whether it will even grow forever, or whether it will really compute a real in the limit, or whether it won't because some of its output bits will flip back and forth forever: http://rapa.idsia.ch/~juergen/toesv2/node9.html But what do such undecidability results really mean? Are they relevant in any way? They do not imply that I cannot write down finite descriptions of the describable reals - they just mean that in general I cannot know at a given time which of the current list elements are indeed complete descriptions of some real, and which are not. Still, after finite time my list of symbol sequences will contain a complete description of any given computable real. Thus undecidable properties do not necessarily make things nonconstructive. Can you imagine yourself as a Platonist for a while, if only for the sake of the reasoning? I am not even sure what exactly a Platonist is. Do I need any additional preliminaries to realize why I genuinely fail to understand your invariance lemma? Sure. The delays question for example. Let us follow Jesse Mazer's idea of torture. Suppose I duplicate you and reconstitute you, not in Washington and Moscow but in some Paradise and some Hell. Would you feel more comfortable if I tell you I will reconstitute you in paradise tomorrow and in hell only in 3001 after C.? Is that what you would choose? Choose? Do I have a choice? Which is my choice? Computationalism is more a human right than a doctrinal truth. I skipped this statement and related ones... Juergen
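Bruno's diagonalisation can be sketched on a finite list of total machines. The three sample "reals" below are illustrative stand-ins; the crux of the real argument is that a computable enumeration of exactly the computable reals cannot exist, so the diagonal real is computable yet missing from any such list:

```python
# Diagonalisation sketch: given machines that each compute the binary
# digits of a real, the diagonal real flips the n-th digit of the n-th
# machine, so it differs from every listed real. Sample reals are
# invented for illustration.
reals = [
    lambda n: 0,        # 0.000...
    lambda n: 1,        # 0.111...
    lambda n: n % 2,    # 0.0101...
]

def diagonal(listed, n):
    """n-th binary digit of the diagonal real for the enumeration `listed`."""
    return 1 - listed[n](n)

digits = [diagonal(reals, n) for n in range(3)]
print(digits)  # [1, 0, 1] -- differs from real n at digit n
```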
Re: on formally describable universes and measures
[EMAIL PROTECTED] to [EMAIL PROTECTED] : Certainly things that we can imagine even slightly, like real-valued observers, already have a kind of existence, in that they cause us to argue about them. [EMAIL PROTECTED] to [EMAIL PROTECTED] : That's a bit like saying there is some truth to 1+1=3 just because we can argue about it [EMAIL PROTECTED] to GLevy: Many things are doubtful. 2+2=4 isn't. [EMAIL PROTECTED] to [EMAIL PROTECTED] : There you go again. But being sure isn't the same as being right. Despite the intuitively compelling nature of arithmetic as we know it, it is really quite arbitrary. It is compelling only because we evolved in a world that provided some survival advantage to brains that interpreted sense experience that way, by way of major approximations and conflations. But its formalizations, like the Peano axioms and the inference mechanism that produces theorems like 1+1=2, really are just arbitrary systems of rewriting rules. It's perfectly easy to construct equally pretty systems where 1+1 = 3 or 1+1 = 1, starting with different initial strings or using different rewrite rules. When I say 1+1=2 isn't doubtful, without redefining 1,+,=,2, I am assuming the particular traditional rewrite rules used by everybody, not alternative systems where symbol 2 is replaced by 6, or + by addition modulo group size. And you can build universes in such systems, where the arithmetic you find so correct never rears its misshapen head. Algorithmic TOEs are indeed about all possible rewrite systems, including your nontraditional ones. Perhaps you would like to argue that our traditional rewrite system is doubtful as it cannot prove its own consistency? But algorithmic TOEs include even rewrite systems that are inconsistent, given a particular interpretation imposed on innocent symbol strings. They are limited to all possible ways of manipulating symbols.
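The claim that systems with 1+1=3 are easy to construct can be made concrete with a toy string-rewriting sketch; the rule below is of course invented for illustration:

```python
# A toy rewrite system in which "1+1=3" is a theorem: numerals are
# strings of "1"s, and the (nonstandard, invented) rule for "+"
# concatenates the two numerals and appends one extra "1".
def rewrite_add(expr):
    left, right = expr.split("+")
    return left + right + "1"   # nonstandard rule: concatenate, add one

print(rewrite_add("1+1"))    # '111'   -> "1+1=3" in this system
print(rewrite_add("11+11"))  # '11111' -> "2+2=5"
```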
From any description-oriented and communication-oriented viewpoint, however, this does not seem much of a limitation, as we cannot even describe in principle things outside the range of algorithmic TOEs. What's more, there are situations in our own neighborhood where alternate arithmetics are more natural than everyday arithmetic. For instance, in a lasing medium, if you introduce one photon in a particular quantum state, and then add another photon in the same state, it is likely that you will find three photons in that state (then more and more - Boson statistics: the probability that a new recruit joins a state occupied by n particles is proportional to n+1). Photons in the same state are in principle indistinguishable from one another, so occupancy of a quantum state is a purer model of counting than the everyday one: when you count pebbles, they remain quite distinguishable from one another, and it takes an arbitrarily high-handed act of abstraction to say that THIS pebble, with its unique shape, color and scratch pattern is somehow the same as this other, completely different pebble. The quantum world in general, with its superpositions, entanglements and ephemeral virtual particles is probably poorly served by bimodal Aristotelian logic, never mind mathematical frameworks idealized from grouping pebbles. But because you are so exclusively wedded to these parochial ways of thinking, you feel you can just reject out of hand the existence (among many other things) of beings able to store, compute and communicate reals, even though many of their properties can be puzzled out. PAH! But you cannot even unambiguously define in a formal way (using any describable set of axioms or rewriting rules) what your intuitive notion of the continuum of reals really means. All provable properties of the reals are expressible as finite symbol strings, and also have an interpretation in a noncontinuous, countable model.
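The Bose-enhancement claim can be illustrated with a toy Monte Carlo; the ten modes, the single seed photon, the trial count, and the textbook weight n+1 per mode are all invented simplifications, not a simulation of a real lasing medium:

```python
import random

# Toy Monte Carlo of Bose enhancement: a new photon enters mode i with
# weight n_i + 1 (stimulated + spontaneous emission). All parameters
# below are invented for the sketch.
random.seed(0)

def add_photon(modes):
    weights = [n + 1 for n in modes]
    i = random.choices(range(len(modes)), weights=weights)[0]
    modes[i] += 1
    return i

trials = 10_000
hits = 0
for _ in range(trials):
    modes = [1] + [0] * 9          # one photon already in mode 0
    if add_photon(modes) == 0:     # the next photon prefers the occupied mode
        hits += 1

print(hits / trials)  # close to 2/11: weight 2 for mode 0 vs total weight 11
```

With many photons already in a mode the preference snowballs, which is the avalanche behind "1+1=3" in the lasing picture.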
Maybe there _are_ important things beyond formal description, things we cannot sensibly talk about. It seems wise though not to talk about them.
Re: on formally describable universes and measures
From [EMAIL PROTECTED] Sun Feb 18 01:16:16 2001 The exchange between Bruno and Juergen is, I believe, instructive and constructive as it forces them to refine their positions. Where did I have to refine mine? JS That's right I guess. You didn't have to refine yours... I guess Dubito ergo cogito does not apply to you. George Many things are doubtful. 2+2=4 isn't. Seriously - where did the present discussion force a refinement of algorithmic TOEs? I do believe they leave lots of space for refinements, but the current debate has not explored this space; it has focused on other things instead. Juergen
Re: on formally describable universes and measures
JS: Then there is your invariance lemma: the way you quantify 1-indeterminacy is independent of (3-)time, (3-)place, and (3-)real/virtual nature of the reconstitution. This does not make sense, because if the (3-) probability distribution on the possible futures and reconstitutions does depend on time or place or other things (why not?) then 1-indeterminacy does so, too. Under most distributions, some futures are less likely than others. Hence there are nontrivial, distribution-dependent, probabilistic 1-predictions as well as quantifications of 1-indeterminacy that depend on time/space/other things. BM: I see you genuinely fail to understand the invariance lemma. No problem. We will come back to this until you get the TILT (if you agree). Bruno, I am usually skipping those of your paragraphs that contain sentences such as physics is a branch of machine's psychology because I have no idea what that is supposed to mean. Still, I feel you do have something nontrivial to say, I just have not been able to figure out what exactly it is. Maybe if I knew why I genuinely fail to understand the invariance lemma - please show me! JS: Not uncomputable. Any past is finite. BM: I was talking about the immediate next future. But any finite future is computable by a long program as well. The problems arise with infinite futures. JS: I am prejudiced against claims of rigorous proof when even the assumptions are unclear; and against statements that are plain wrong, such as the UD generates all real numbers. BM: I am not claiming I am rigorous, except when you say I am vague and when you ask me for precision which is not relevant. The sentence the UD generates all real numbers is ambiguous. Either you interpret it as The UD generates (enumerates) the set of all real numbers - this does indeed contradict Cantor's theorem. Or you interpret it as All real numbers are (individually) generated by the UD.
In which case, with the usual definition of generating a real (generating all its prefixes), it is just correct. Isn't it? No, it isn't, since generating an individual real is not equivalent to generating all prefixes of all reals. Generating an individual real means generating all prefixes of that individual real, AND NOTHING ELSE. Generating a real means you somehow have to be able to identify and completely describe that particular real. If you cannot do this without describing lots of other things then the individual real does not exist from any constructive perspective. The trivial algorithm ALPHABET ( http://rapa.idsia.ch/~juergen/toesv2/node27.html ) whose outputs are 0,1,00,01,10,11,000,... is not generating a description of any individual infinite real because it never creates a complete representation thereof. Ambiguity arises because each of the outputs is just a prefix of many infinite reals. The best you can achieve is an algorithm that outputs at least the computable infinite reals in the sense that it outputs their finite descriptions or programs. BM: So I ask you again: do you agree that comp entails the existence of first person indeterminacy, as it is shown by the self-duplication thought experiment? If it just means you don't know in advance in which possible future you'll end up, provided there is a nontrivial distribution on the possible futures, then this is ok (and trivial). Do I need any additional preliminaries to realize why I genuinely fail to understand your invariance lemma? Juergen
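The trivial algorithm ALPHABET referenced above can be sketched as a generator of all finite bitstrings in length-lexicographic order; each output is a prefix of uncountably many reals, so no individual infinite real is ever singled out:

```python
from itertools import count, product

# Sketch of ALPHABET: emit every finite bitstring 0, 1, 00, 01, 10, 11,
# 000, ... in length-lexicographic order.
def alphabet():
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

gen = alphabet()
first_seven = [next(gen) for _ in range(7)]
print(first_seven)  # ['0', '1', '00', '01', '10', '11', '000']
```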
Re: on formally describable universes and measures
This time I'll repeat only a fraction of the 500 lines in your reply: From [EMAIL PROTECTED]: Suppose you survive only through a simulation of the big bang at the level of the quantum superstring, membrane, etc. then the correct level of substitution is the level of the quantum superstring, membrane, etc. Remember the definition of COMP, it says that *there exists* such a level. It does not say that this or that *is* the correct level. Ok. It is a sort of admission of ignorance. This ignorance is fundamental. Indeed it has been shown (independently by a lot of people---refs in my papers) that comp entails we cannot know the correct level. We can bet on it, though, and we can make reasonings relative to correct bets. Normally a constructive philosopher should abandon comp right here, because it follows from that theorem that we cannot be machines in any constructive sense. Which theorem? Send a pointer to its proof. Not to its informal description, but to its proof. (One reason why I doubt this: isn't the lowest possible level - embodied by what's computable in the limit - sufficient? Why not run all universes GTM-computable in the limit? If one of them is ours, then the set-up is constructive.) You miss the point. Even the one who has PI on his T-shirt is wrong if he believes PI helps him to predict the issue of the next self-duplication. Note that if the program remains as lengthy as the sequence, as it happens for most Schmidhubers---in the iterated self-duplication, these sequences are called uncomputable by Solovay, Chaitin, etc. Not uncomputable. Any past is finite. So it is at worst random or Martin-Loef-random or incompressible. None of those you cite is careless when it comes to the difference between countable and uncountable sets. None claims one can compute a continuum by dovetailing. Dovetailing will be forever limited to the countable realm. Is there a probability distribution on this set (if not, you cannot predict anything)? Which one?
You talk really as if probability was the only manner you know for quantifying uncertainty. Besides probability there exist other ways to handle the uncertain. The one I know best is Dempster-Shafer theory. (I have worked for some years with experts in that field.) Not only do I not restrict myself to the uniform distribution, but I don't share your assumption that the only way of quantifying uncertainty is probability. Why not Dempster-Shafer theory of evidence? The various Dempster-Shafer (DS) approaches are no alternatives to probability theory. They are extensions designed to facilitate handling of uncertainty in contexts where lack of belief does not imply disbelief. But DS is essentially about hulls confining sets of traditional distributions, and thus compatible with the traditional framework of Bayesian statistics. (Variants of DS that are not yield paradoxical results.) In the first part of my thesis: I am not pretending that I have solved the mind-body problem, nor the problem of the origin of the physical laws, nor the QCU. But I have rigorously proved that with comp these problems are equivalent. It is this recurring type of claim I find so irritating: rigorous proof without formal framework. A weakness is that I am led toward hard mathematics. What a strange remark. The weakness of your texts is that they are so informal. Which unique formalisation? Please write it down! How can you possibly isolate it by informal reasoning? I was talking *there* about the modal logics G, G*, S4Grz, Z1, Z1*, etc. These formal logics are intensional (modal) variations of the provability logics of the sound self-referentially correct machine. I have provided semantics, and theorem provers. See explanations and technical details in my thesis and in my papers. Your thesis is in French. Your papers are informal. They always include sentences such as: Actually such proof and clarification is one of the main results in my thesis ...
without going into details I will briefly try to convey the main line of the argument (p 4 of paper dated sept 24, 2000). Then follow informal examples, references to philosophers, and unsubstantiated claims such as the UD generates all real numbers, when it only generates all their finite prefixes, which is a fundamental difference. Then there is your invariance lemma: the way you quantify 1-indeterminacy is independent of (3-)time, (3-)place, and (3-)real/virtual nature of the reconstitution. This does not make sense, because if the (3-) probability distribution on the possible futures and reconstitutions does depend on time or place or other things (why not?) then 1-indeterminacy does so, too. Under most distributions, some futures are less likely than others. Hence there are nontrivial, distribution-dependent, probabilistic 1-predictions as well as quantifications of 1-indeterminacy that depend on time/space/other things. You are also telling us
Re: on formally describable universes and measures
This time I'll annotate your entire message to demonstrate how many things I tend to find unclear in your texts. From: Marchal [EMAIL PROTECTED] Juergen wrote (among things): But how to answer an ill-posed question? You promise that time and space will disappear at the end of the reasoning, but your question is about delays, and how can we speak about delays without defining time? Simulation time? Real time? Both? How? There is no way to continue without formal framework. We were doing a thought experiment. I haven't said that the delays were virtual. This is done much later in the reasoning. Of course, as George Levy says, the permutation real/virtual makes no changes in the first person point of view, and does not change the distribution either. To derive consequences we need to know the assumptions. Of course, this holds for thought experiments as well. Without defining delays you cannot derive something from delays. IF we accept COMP we survive with an artificial brain (well: in case we were betting on a correct level of substitution). What is a correct level of substitution? Where does the betting come in? On which alternatives can we bet? Which is the distribution on the alternatives? That means the doctor scans (at the right level) your brain, destroys What is the right level? it, and then from the recollected information he builds a new one. The state of the artificial brain mirrors the state of your brain. You survive (with comp!). Now let us suppose the doctor keeps the information hidden in a drawer during one year. Real time delay, in the everyday type of life. After that delay he makes the reconstitution. I am just saying that, with comp, from the point of view of the one who survives, that delay cannot be perceived. It has not influenced the kept information of your brain. From the first person point of view the delay introduced by the doctor has not been and cannot be directly perceived.
That seems obvious, but what exactly do you mean by perceive, as opposed to directly perceive? You open your eyes - things have changed - another time, another place. Which are the limits of perception here? (That's why I insist sometimes that the reconstitution booth has no windows!) Why sometimes, why sometimes not? Anyway, in general things will have changed - you may need some technical equipment to detect the changes, still, in principle you could find out things are different, at least in the real world. If you cannot, then why not - which are the assumptions here? Maybe you are talking about a virtual reality that you can fully control? Then which is the precise set of virtual realities you are considering? Is there a probability distribution on this set (if not, you cannot predict anything)? Which one? Are you seeing my point? It does also not change first person perception in case of self-multiplication. Your point is the revival of an old science fiction theme. But as soon as you want to derive something you need to state formal assumptions, otherwise you'll end up with empty philosophic blabla. There is no way to continue without formal framework. I isolate a unique formalisation by an informal reasoning. Which unique formalisation? Please write it down! How can you possibly isolate it by informal reasoning? To formalise at this stage would automatically put the mind-body problem under the rug. Didn't you just say there is a unique formalisation? Why does formalisation suddenly put the mind-body problem under the rug? What's the problem with the mind-body problem? Why is it incompatible with formalisation? A TOE which doesn't address (at least) the mind-body problem is a TOS (a theory of *some* thing). Without formal assumptions you have no theory of everything, no theory of something, no theory at all. But as I show below, those self-multiplications are easily formalised (at least the third person descriptions of those experiments are easily formalised).
You can easily write a program which multiplies yourself (still betting on a correct level of course) relative to virtual environments. Correct level? Betting? On what - which are the alternatives? Which is the distribution on the alternatives? The program that multiplies observers _seems_ to go into the direction of a formal ansatz, although it remains vague. How does the program identify an observer, or myself? It is much easier to write a program that copies entire computable universes together with the embedded observers, because such a simple program does not need to identify observers and separate them from their environment. Please state precisely what you really mean. Don't give another informal example, be precise. Are you among those who argue that talk on consciousness is a hoax? Not necessarily. How do you manage consciousness in your TOE-approach? Algorithmic TOEs are about computable probability distributions on universe histories
Re: Algorithmic TOEs vs Nonalgorithmic TOEs
From [EMAIL PROTECTED] Wed Feb 14 23:34:18 2001 [EMAIL PROTECTED]: Yes. My point is: as long as we are not forced by evidence, why assume the existence of something we cannot describe or analyze in principle? In the spirit of this list, shouldn't we assume the existence (insofar as anything exists: i.e. it exists if it thinks it exists) of anything unless we are forced by the evidence to rule it out? Certainly things that we can imagine even slightly, like real-valued observers, already have a kind of existence, in that they cause us to argue about them. That's a bit like saying there is some truth to 1+1=3 just because we can argue about it.
Re: Algorithmic TOEs vs Nonalgorithmic TOEs
[EMAIL PROTECTED]: But stuff indescribable by us ought to be constructible by observers with real-valued input, output and memory living in real-valued worlds. They might communicate identities of arbitrary real numbers as naturally as we refer to points on a line, and execute analytic algorithms as easily as we fiddle with finite alphabets. Those observers may not be in reach of our analysis, but they are within the scope of their own. Yes. My point is: as long as we are not forced by evidence, why assume the existence of something we cannot describe or analyze in principle?
Re: Algorithmic TOEs vs Nonalgorithmic TOEs
From [EMAIL PROTECTED] Tue Feb 13 10:49:14 2001 Subject: Re: Algorithmic TOEs vs Nonalgorithmic TOEs [EMAIL PROTECTED]: This constructive notion of formal describability is less restrictive than the traditional notion of computability, mainly because we do not insist on the existence of a halting program that computes an upper bound of the convergence time of p's n-th output bit. Formal describability thus pushes constructivism to the extreme, barely avoiding the nonconstructivism embodied by even less restrictive concepts of describability. It certainly tests the limits, since there's in principle no way of knowing in general at any time whether a particular bit of the answer on the tape is yet the final answer. Sure. Still, each bit eventually stabilizes forever. Hence the entire bit sequence is constructively describable, and thus a potential universe history that we may not ignore - it might be ours. Example 1.1 [Pseudorandom universe] Let x be an infinite sequence of finite bitstrings x^1,x^2,... representing the history of some discrete universe, where x^k represents the state of the universe at discrete time step k, and x^1 the Big Bang (compare http://www.idsia.ch/~juergen/everything/node1.html ). Suppose there is a finite algorithm A that computes x^{k+1} (k \geq 1) from x^{k} and additional information noise^k (this may require numerous computational steps of A, that is, local time of the universe may run comparatively slowly). Assume that noise^k is not truly random but calculated by invoking a finite pseudorandom generator subroutine. Then x is describable because it has a finite constructive description. Example 2.1 [Pseudorandom universe based on halting problem] Let x be a universe history in the style above. Suppose its pseudorandom generator's n-th output bit PRG(n) is 1 if the n-th program of an ordered list of all possible programs halts, and 0 otherwise. Since PRG(n) is describable, x is too. 
But there is no halting algorithm computing PRG(n) for all n, otherwise the halting problem would be solvable, which it is not. Hence in general there is no computer that outputs x and only x without ever editing some previously computed history: http://rapa.idsia.ch/~juergen/toesv2/node7.html I see how you could compute Omega: both create and run all Turing machines in a diagonalized way, and each time one halts, increase the approximation of the halting probability by the appropriate amount. But you'd never be sure of even bits near the front, because some running small machines might halt anytime, altering those early bits. Sure. But each bit of Omega will eventually stabilize forever. Omega is actually not so bad as it is enumerable - you won't even need a GTM for it, you can compute it on an EOM. GTMs are required, however, for constructively describable numbers even worse than Omega. See Theorem 3.3: http://rapa.idsia.ch/~juergen/toesv2/node14.html ... Machines with real-valued registers ... This does not lead outside the GTM realm as long as the indefinitely growing register contents are finite for each finite time step. Well, I was thinking they would have operations like reciprocal (or maybe 1/(x+1) to limit the range) which could instantly generate non-terminating expansions. More interestingly they might include transcendental operations to generate non-repeating expansions. Hey, they could even have the Omega operation, which loads a register with Omega, all bits finalized. That certainly beats GTMs. Check out analytic TMs (Hotz, Vierke and Schieffer, 1995) and R-Machines (Blum, Shub, Smale, 1989): http://rapa.idsia.ch/~juergen/toesv2/node47.html The alphabet of such TMs is indeed real-valued instead of binary. This is beyond constructivism though. GTMs are at the extreme end of the constructive spectrum. They are beaten by nonconstructive devices only. But stuff describable by nonconstructive devices only does not even exist. Does it?
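The dovetailed approximation of Omega that both sides describe can be sketched with a toy machine model; the halting times and the prefix weights 2^-(i+1) below are invented stand-ins for real Turing machines and their program lengths:

```python
# Sketch of approximating Omega by dovetailing over a toy machine model:
# machine i halts after halts[i] steps, or never if halts[i] is None.
# Halting times and weights are invented for illustration.
halts = [3, None, 1, None, 5]

def omega_approx(steps):
    """Lower bound on the halting probability after `steps` dovetailed steps."""
    return sum(2 ** -(i + 1)
               for i, t in enumerate(halts)
               if t is not None and t <= steps)

approximations = [omega_approx(s) for s in (0, 1, 3, 5)]
print(approximations)  # [0, 0.125, 0.625, 0.65625]
```

The approximation climbs monotonically from below; each bit eventually stabilizes, as Juergen notes, but no algorithm can tell you when a given bit has become final.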
Re: on formally describable universes and measures
Resent-Date: Fri, 9 Feb 2001 06:15:47 -0800 Subject: Re: on formally describable universes and measures From: Marchal [EMAIL PROTECTED] No, I do not. I suggest you first define a formal framework for measuring delays etc. Then we can continue. You should have told me this at the preceding step, which was also informal (although precise). I am proposing a thought experiment which is a kind of reductio ad absurdum here (remember that time and space will disappear at the end of the reasoning). My feeling is that, for some unknown reason, you have decided to evade the reasoning. That seems clear with your answer to Russell Standish: you are saying 2+2=4 and I am saying 2+2=5! You are saying that I am fully wrong, but you don't tell me where. How am I supposed to take your disagreement here? You don't really answer the question. But how to answer an ill-posed question? You promise that time and space will disappear at the end of the reasoning, but your question is about delays, and how can we speak about delays without defining time? Simulation time? Real time? Both? How? There is no way to continue without formal framework. What does your theory predict with respect to the following experiment: You are scanned, read and annihilated at Amsterdam. I reconstitute you in Washington tomorrow, and at Moscow in one billion years. Are your expectations different from the situation where the two reconstitutions are simultaneous? Expectations with respect to what? Moscow one billion years from now might be different from Washington tomorrow, so there seem to be two different possible futures. The essential question is: what is the distribution on the possible futures? Is the distribution computable? How does the distribution depend on your delays and other computable (?) things? Are there just 2 possible futures? Or 10? Or infinitely many? If you want to be formal, let us accept classical Newtonian mechanics for the sake of the argument.
You know that with comp such experiments are possible *in principle*, and that is all we need for the reasoning. Should we or should we not take these delays into account when evaluating the first-person indeterminacy? What does your theory say? What do you say? Again I fail to understand the question. Please define delays! How many possible delays are there? Are they computable? What exactly is this indeterminacy? Is it something different from an ordinary distribution? If so, what is it? If not, why don't you call it a distribution? The theory first asks: what is the distribution on all possible futures? Maybe you say you do not know. Since it is an algorithmic theory, it answers: ok, distribution unknown, but if it is describable (GTM-computable in the limit), then I still can say something, namely, I can exclude almost all infinite futures (those without finite descriptions) with probability one. And among those I cannot exclude, the weird ones with very long minimal descriptions are very unlikely. Maybe you now say you don't buy the describability assumption. Then the theory can't say anything nontrivial any more. Neither can you though. Juergen
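The weighting Juergen appeals to, under which futures with long minimal descriptions are unlikely, is the usual 2^-l(p) prefix weighting over program lengths l(p); a toy sketch with invented lengths:

```python
# Toy sketch of 2^-l weighting: each describable future gets weight
# 2**-(length of its minimal program). The names and program lengths
# below are invented for illustration.
futures = {"mundane": 5, "weird": 20, "very_weird": 40}  # lengths in bits

weights = {f: 2.0 ** -l for f, l in futures.items()}
z = sum(weights.values())
prob = {f: w / z for f, w in weights.items()}

print(prob["mundane"])     # ~0.99997: the short description dominates
print(prob["very_weird"])  # ~3e-11: a long description gets vanishing weight
```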
Re: on formally describable universes and measures
From Russell Standish Thu Feb 8 23:52:51 2001 Guys, I'm getting great enjoyment out of the titanic battle between Juergen and Bruno over the meaning of the UD. I'm learning a lot from Battle? The case is clear. You cannot battle over whether 2+2 equals 4 or 5. the exchange, however, I must admit I do see Bruno's point of view. His UD does seem to generate the reals (or equivalently the set of all infinite binary strings) in countable time. However, I know Even Bruno admits this is not true. Thus his NONalgorithmic arithmetic realism. that infinity (like probability) is a nasty concept that can easily trip you up. It's easy: just don't confuse the countable set of all finite beginnings of the reals with the uncountable set of all reals, which does not exist from an algorithmic or constructive point of view. There are other ways of approaching this - for instance a finite set of axioms, when enumerated into theorems, will tell us all that can be known about the real numbers. I cut and paste from the thread Algorithmic TOEs vs Nonalgorithmic TOEs: Loewenheim-Skolem implies that any first order theory with an uncountable model such as the real numbers also has a countable model. None of the countably many theorems concerning the real numbers depends on the continuum, whatever that may be. Our vague ideas of the continuum are just that: vague ideas without formal anchor. I have sympathy for one point of Juergen's though - in the space of descriptions (which we should agree by extension of logical positivism is all that can be discussed), computable descriptions must have higher measure than noncomputable ones. However, it seems to me that a random oracle is an essential component of consciousness and free will - why this is so I can only guess - and so the anthropic principle constrains the interesting universes to having these.
It could be that this random oracle is simply a consequence of 1st person indeterminism that arises through the duplicability assumption, as Bruno points out, but then why should duplicability be necessary? There is no evidence whatsoever that we need a random oracle.
Re: on formally describable universes and measures
Your vague answers to questions I did not ask keep evading the issue of continuum vs computability in the limit. I give up. JS
Re: on formally describable universes and measures
On Thu Jan 18 Bruno Marchal replied: Pi is enumerable. Most reals are not.

Most of the dummy data is much less likely than extraordinary data (such as Pi), if the dummy data probability is approximable by a computer. Compare Algorithmic Theories of Everything: http://www.idsia.ch/~juergen/toesv2/node23.html

A program which generates all the reals is shorter than a program which generates Pi, which is itself shorter than a program which generates a particular real (for most particular reals). Perhaps you confuse programs generating reals and programs generating *sets* of reals.

I certainly do not. There is no program generating the uncountable set of all reals. There is only a program generating countably many prefixes of reals. How would you distinguish those from the countably many prefixes of the countable rational numbers?

Instead of giving examples, could you just provide a short proof of your claim that there is no computable universe to which we belong?

Tell me what you don't understand in my UDA post (which is the beginning of the shortest proof I know). UDA is at http://www.escribe.com/science/theory/m1726.html

I did look at it and found lots of words but no formal proof, although it does say QED at some point. You are repeatedly talking about universes generated by a dovetailer. All these universes obviously must be computable, otherwise the dovetailer could not compute them. So how can you claim that there is no computable universe to which we belong, when the very tool you are using generates lots of universes to which we belong? It does not make sense to me - my best guess is that you mean something quite different from what you say. 
Maybe you just want to say we do not know in which of the many possible computable futures we will end up, but this is obvious and precisely the reason why we need to look at the possible probability distributions on possible future histories, to make nontrivial predictions, e.g.: http://www.idsia.ch/~juergen/everything/node4.html (1997) http://www.idsia.ch/~juergen/toesv2/node15.html (2000)

Let 3-you be your current computational state and 1-you your actual awareness. What happens is that 3-you belongs to an infinity of computational histories (generated by the UD), and the UDA shows that your expected future 1-you is undetermined and that the domain of indeterminacy is given by that set of computational histories. So we belong to an infinity (a continuum) of infinite computational histories.

No continuum! The infinite computational histories are countable. The continuum is not. The concepts of dovetailing and continuum are incompatible. The dovetailer will compute many histories featuring a Bruno or two, but only countably many. No continuum!

PS I am rather busy, so I am sorry if I am too short or if I take time answering. Don't hesitate to make any remarks, though.

You are not too short as long as your legs reach the ground :-) Juergen
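The countable-vs-continuum point rests on Cantor's diagonal argument; a one-line sketch (standard mathematics, not specific to this thread):

```latex
% Given any countable list r_1, r_2, r_3, \dots of infinite binary
% strings, define d by flipping the diagonal:
d_n \;=\; 1 - (r_n)_n \quad \text{for } n = 1, 2, 3, \dots
% Then d differs from every r_n at position n, so no countable
% enumeration -- in particular no dovetailer output -- contains
% all infinite binary strings.
```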
Re: everything priors
Hi Max, 2) If so, should we really limit ourselves to this particular kind of mathematical structures? My concern is that we may be a bit too narrow-minded if we do.

But this sort of narrow-mindedness seems necessary to remain within the formally describable realm. I'd go beyond computable structures only if forced by evidence, e.g., if someone shows our universe somehow won't run without all the real numbers. But for now there isn't any evidence in this vein.

Wait a sec: there's also no evidence that our particular universe has seven spatial dimensions or a proton/electron mass ratio different from 1836. But we're considering the whole ensemble here.

Right. Our motivation is that the ensemble is compatible with existing data yet simpler than the special case. But Occam's razor also motivates us to consider just the computable universes as opposed to those based on, say, non-computable real numbers. The ensemble of computable universes is both compatible with known data AND simple. The ensemble of non-computable universes is not! Cheers, Juergen
Re: Turing vs math
Step n owns 2^(n-1) initial segments.

Bruno, why are we discussing this? Sure, in finite time you can compute all initial segments of size n. In countable time you can compute one real, or a countable number of reals. But each of your steps needs more than twice the time required by the previous step. Therefore you need more than countable time to compute all reals.

Now, could you give me a bitstring which is not generated by this countably infinite process?

Sure. The output of the program "While TRUE print 1" won't be computed in countable time by your procedure. Juergen
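The "While TRUE print 1" example can be made concrete with a toy breadth-first enumeration of all binary strings (my own illustration, not Bruno's exact procedure): every finite prefix 1^n does appear, but at a position that roughly doubles with n, so the completed infinite string 111... never appears at any finite position.

```python
def position_of_all_ones(n):
    """0-based index of the string '1'*n when all binary strings are
    enumerated breadth-first: '', '0', '1', '00', '01', '10', '11', ...

    Strings shorter than n occupy 2**n - 1 slots (including ''), and
    '1'*n is the last of the 2**n strings of length n, giving index
    (2**n - 1) + 2**n - 1 = 2**(n+1) - 2."""
    return 2 ** (n + 1) - 2

# Every finite prefix of 111... appears, but ever later:
indices = [position_of_all_ones(n) for n in range(1, 6)]
# indices == [2, 6, 14, 30, 62] -- roughly doubling at each step
```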
Re: Turing vs math
Why assume non-computable stuff without compelling reason? Shaved by Occam's razor.

Jacques: On the contrary. Why assume the lack of *any* given type of mathematical structure? A true everything-hypothesis surely would not. Occam's razor says: don't add extra distinctions such as a restriction like that. Note also that, as I said, computability isn't the real issue. A Turing machine can not be a continuous (but computable) structure. Of course the non-computable structures should exist too in an everything-hypothesis.

The non-computable structures are just an ill-defined figment of our imagination. They do not exist in the sense that we cannot formally describe them with a finite number of bits. Textbooks and theorems about real numbers are computable (finite symbol strings, finite proofs); most real numbers are not. Occam's razor really says: do not add any bits beyond those necessary to explain the data. Observed data does not require more than a finite number of bits, and never will. Non-computability is not a restriction. It is an unnecessary extension that greatly complicates things, so much that we cannot even talk about it in a formal way. Juergen Schmidhuber www.idsia.ch
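The claim that only countably many reals can be formally described follows from a simple counting argument (standard, added here for completeness):

```latex
% Descriptions are finite strings over a finite alphabet \Sigma, so
% the set of all descriptions is a countable union of finite sets:
|\Sigma^{*}| \;=\; \Big|\bigcup_{n \ge 0} \Sigma^{n}\Big| \;=\; \aleph_0 ,
% while |\mathbb{R}| = 2^{\aleph_0} > \aleph_0 by Cantor's theorem.
% Hence all but countably many reals have no finite description.
```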
Re: Turing vs math
Jacques Mallah wrote: I agree with Alistair Malcolm that Turing machines are not enough. A continuous structure is a perfectly good mathematical structure, but no Turing-based scheme can include it.

Why assume non-computable stuff without compelling reason? Shaved by Occam's razor. Juergen Schmidhuber www.idsia.ch
Re: Turing vs math
Bruno: Honestly it is still not clear. How could S(U)=TRUE ever be computable? As a computer scientist, I guess you know that even the apparently simple question whether a given piece of code computes the factorial function is not computable.

Sure, it's not even decidable in general whether a given program will produce any output at all. Once the output is there, though, you can easily decide whether it consists of, say, prime numbers. In the absence of evidence to the contrary, we assume that the presence of your consciousness (whatever that may be) is detectable by a computable process, namely, the one employed by you, the observer, who decides that he exists, via some sort of computation taking place as part of the evolution of his universe.

George: Juergen seems to be talking as if the measure of information is absolute. It is not. Information is always conditional information (Shannon), where the condition is the observer's a-priori database of information.

Sure, but in the Kolmogorov framework the role of prior knowledge vanishes in the limit. Each prior corresponds to a particular Turing machine TM with particular inbuilt prior knowledge. But TM2 can simulate TM1 using a compiler whose size is independent of the information content of the universes they compute. For all U, K1(U) = K2(U) + O(1), where K1(U) and K2(U) are U's information contents relative to TM1 and TM2. Similarly P1(U) = O(P2(U)), where P1 and P2 are the respective probabilities given Solomonoff-Levin distributions for TM1 and TM2. There is a good reason why Solomonoff-Levin distributions are called universal distributions!

Hal: We then invoke the principle that large-program universes are inherently less likely than small-program universes, and presto! we have it more likely that we live in a universe without flying rabbits, without magic, etc. That's the general argument we are striving to achieve.

I agree. That's precisely one of the major points of the 1997 paper. 
I do think that this argument has some problems, but it is appealing, and if the holes can be filled it seems to offer an answer to the question. What do you think?

Where exactly are the holes? Juergen
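Juergen's distinction earlier in the message - halting is undecidable, but a property of a finished, finite output is decidable - is easy to illustrate (a toy check of my own, not from the thread): once a program has printed a finite sequence, deciding whether it consists of primes is a trivial computation.

```python
def is_prime(n):
    """Trial division -- decidable for any finite n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def output_is_all_primes(output):
    """Whether a program ever produces output is undecidable in
    general, but once its finite output is in hand, this property
    of it is decidable."""
    return all(is_prime(n) for n in output)

# output_is_all_primes([2, 3, 5, 7])  -> True
# output_is_all_primes([2, 3, 4])     -> False
```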
Re: Turing vs math
Bruno wrote: I don't take the notion of observer for granted.

Neither do I, of course. The observer O is something computable that evolves in some universe U.

The problem is that to be in a universe has no clear meaning.

But it does. There is a computable predicate S such that S(U)=TRUE if O inhabits U. Fortunately, there is no need for us to further specify what it means to ``inhabit,'' to be ``conscious'' or ``self-aware,'' or whether there is some other observer who applies S to U and uses S(U) to identify O, etc. Now we have a formally defined, conditional probability distribution on all universes satisfying S. I thought this was clear, but maybe I should have written it down explicitly. Juergen
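As a toy illustration (my own, with made-up names; the thread never specifies S), such a computable predicate could simply scan a universe's computed history for an encoding of the observer:

```python
def S(universe_history, observer_pattern):
    """Toy computable predicate: TRUE if the observer's bit-pattern
    occurs somewhere in the (finite prefix of the) universe's
    computed history. A stand-in for 'O inhabits U' -- any such
    substring test is trivially computable."""
    return observer_pattern in universe_history

# A conditional distribution would then range over exactly those
# universe histories U with S(U, O) == True.
# S("001011010", "101")  -> True
```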