The Swiss AI Lab IDSIA: 20th Anniversary / Jobs

2008-10-02 Thread Schmidhuber Juergen

The 20 year anniversary celebration of the Swiss AI Lab IDSIA
will take place on 24 October 14:00-18:00 in Palazzo dei Congressi
in the center of Lugano, Switzerland, during Ticino Informatica 2008:
http://www.idsia.ch/Files/idsia20/Program.html

Distinguished keynote speakers:

1. Mauro Dell'Ambrogio,
Swiss State Secretary for Education and Research

2. Prof. Marco Dorigo,
Pioneer of Ant Colony Optimization and Swarm Intelligence

3. Prof. Helge Ritter,
A pioneer of Self-Organizing Maps; Leibniz Laureate

Panel: The Future of AI

Registration is free! Please register here:
http://www.idsia.ch/Files/idsia20/Register.html?

Juergen Schmidhuber
http://www.idsia.ch/~juergen/


PS: Current jobs at IDSIA
(all hired persons will closely interact):
1 Postdoc and 1 PhD student in
biologically plausible reinforcement learning:
http://www.idsia.ch/~juergen/sinergia2008.html
1 Postdoc and 1 Developer in handwriting recognition:
http://www.idsia.ch/~juergen/kti2008.html
1 related Postdoc in medical image analysis:
http://www.idsia.ch/~luca/PostDoc_2008_IDSIA_IN3.html
1 PhD fellowship in evolutionary computation:
http://www.idsia.ch/~tino/evo2008.html
More jobs on artificial curiosity and
learning robots to be announced soon:
http://www.idsia.ch/~juergen/interest.html
http://www.idsia.ch/~juergen/learningrobots.html



Re: Wolfram 2,3 Turing Machine

2007-10-26 Thread Juergen Schmidhuber

Impressive result by Alex Smith!
Funny though how Wolfram's web sites on this
print Wolfram's name in larger font and more
frequently than Smith's, even trying to sell this
as New Kind Of Science although it's just a
continuation of a decades-old search for
small universal Turing machines :-)

BTW, check out Marcus Hutter's older posting to
the Kolmogorov Complexity mailing list on whether
such machines should really count as UTMs or not:
http://mailman.ti-edu.ch/pipermail/kolmogorov/2007/000245.html

JS
http://www.idsia.ch/~juergen/computeruniverse.html
http://www.idsia.ch/~juergen/wolfram.html

On Oct 24, 2007, at 8:32 PM, Tom Caylor wrote:

We're excited to announce that the $25,000 Wolfram 2,3 Turing Machine
Research Prize has been won.
Alex Smith, a 20-year-old undergraduate in Birmingham, UK, has given a
40-page proof that Wolfram's 2,3 Turing machine is indeed universal.
This result ends a half-century quest to find the simplest possible
universal Turing machine. It also provides strong further evidence for
Wolfram's Principle of Computational Equivalence. The official prize
ceremony is planned for November at Bletchley Park, UK, site of Alan
Turing's wartime work.
For more information about the prize and the solution, see:
http://www.wolframprize.org
Stephen Wolfram has posted his personal reaction to the prize at:
http://blog.wolfram.com/2007/10/the_prize_is_won_the_simplest.html






Re: measure problem

2007-04-26 Thread Juergen Schmidhuber
Hi Max,

in this particular universe it's going well, thank you!

As promised, I had a look at your paper. I think
it is well written and fun to read. I've got a few comments
though, mostly on the nature of math vs computation,
and why Goedel is sexy but not an issue
when it comes to identifying possible mathematical
structures / universes / formally describable things.
I think some of the comments are serious enough to affect
the conclusions. Some come with quotes from papers in
http://www.idsia.ch/~juergen/computeruniverse.html
where several of your main issues are addressed.
Some are marked by "Serious".

I am making a cc to the everythingers, although it seems
they are mostly interested in other things now - probably
nobody is really going to read this tedious response which
became much longer than I anticipated.

1. An abstract baggage-free mathematical structure
does not exist any more than a baggage-free
computer - the particular axiomatic system you choose
is like the set of primitive instructions of the computer
you choose. Not very serious, since for general computers
and general axiomatic systems there are invariance theorems:
changing the baggage often does not change a lot, so to speak.
But it should be mentioned.

2. p 11: you say that data sampled from Gaussian random variables
is incompressible - NOT true - give short codes to probable events
(close to the mean), long codes to rare events (Huffman
coding).
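
A minimal Python sketch of the point (the 8-bit quantization grid, the
sample size, and the use of NumPy are arbitrary illustrative choices, not
anything from the paper): Huffman-coding quantized Gaussian samples lands
near the entropy, well below the fixed-width 8 bits per symbol.

# Sketch: quantized Gaussian samples are compressible via Huffman coding,
# because values near the mean are frequent and get short codewords.
# Assumes Python 3 with NumPy; grid and sample size are arbitrary choices.
import heapq, math
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)
symbols = np.clip(np.round(samples * 16), -128, 127).astype(int)  # 8-bit grid

freq = Counter(symbols.tolist())

# Build a Huffman tree: repeatedly merge the two least frequent nodes.
heap = [(f, i, {s: ''}) for i, (s, f) in enumerate(freq.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    f1, _, c1 = heapq.heappop(heap)
    f2, _, c2 = heapq.heappop(heap)
    merged = {s: '0' + code for s, code in c1.items()}
    merged.update({s: '1' + code for s, code in c2.items()})
    heapq.heappush(heap, (f1 + f2, counter, merged))
    counter += 1
codebook = heap[0][2]

avg_len = sum(freq[s] * len(codebook[s]) for s in freq) / len(symbols)
entropy = -sum((f / len(symbols)) * math.log2(f / len(symbols)) for f in freq.values())
print(f"fixed-width: 8.00 bits/symbol, Huffman: {avg_len:.2f}, entropy: {entropy:.2f}")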

3. same sentence: how to test what inflation predicts?
How to test whether the big bang seed was really random,
not pseudo-random? The second million bits of pi look
random but are not. We should search for short programs
compressing the apparent randomness:
http://www.idsia.ch/~juergen/randomness.html
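
For illustration (a toy aside, not something from the paper under
discussion): pi's digit stream passes simple randomness tests, yet
Gibbons's unbounded spigot algorithm generates it with a few lines of
Python, so the stream is extremely compressible.

# Gibbons's unbounded spigot algorithm: a few lines that emit the decimal
# digits of pi one by one - "random-looking" data with a very short program.
def pi_digits():
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, n, k, l = (q * k, (2 * q + r) * l, t * l,
                                (q * (7 * k + 2) + r * l) // (t * l), k + 1, l + 2)

gen = pi_digits()
print(''.join(str(next(gen)) for _ in range(50)))  # 3141592653...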

4. p 15: Mathematical structure (MS)
"just exists". Is that so? Others will look at
your symbols and say they are just heaps of chalk
on a blackboard, and you need a complex,
wet pattern recognition system to interpret them.
Here's where beliefs enter...

5. p 18: "mathematical structures, formal systems and
computations are aspects of one underlying
transcendent structure whose nature we don't fully understand"
But we do! I'd say there are NO serious open problems with
your figure 5 - formal systems vs math vs computation
is a well-explored field. More about this below.
The 2000 paper (your nr 17) exploits
this understanding; it turns out the most convenient way to deal
with the measure problem is the computer science way (right
hand corner of your figure 5).
As I wrote in the 2000 paper:
http://arxiv.org/abs/quant-ph/0011122
The algorithmic approach, however, offers several conceptual 
advantages: (1) It provides the appropriate framework for issues of 
information-theoretic complexity traditionally ignored in pure 
mathematics, and imposes natural complexity-based orderings on the 
possible universes and subsets thereof. (2) It taps into a rich source 
of theoretical insights on computable probability distributions 
relevant for establishing priors on possible universes. Such priors are 
needed for making probabilistic predictions concerning our own 
particular universe. Although Tegmark suggests that ``... all 
mathematical structures are a priori given equal statistical weight'' 
[Tegmark 1998] (p. 27), there is no way of assigning equal 
nonvanishing probability to all (infinitely many) mathematical 
structures. Hence we really need something like the complexity-based 
weightings discussed in [Schmidhuber 1997] and especially the 
paper at hand. (3) The algorithmic approach is the obvious framework 
for questions of temporal complexity such as those discussed in this 
paper, e.g., ``what is the most efficient way of simulating all 
universes?''

6. Serious: run the sim, or just describe its program?
Are you sure you know what you want to say here?
What's the precise difference between program bitstrings
and output bitstrings? The bitstrings generated by the programs (the
descriptions) are just alternative descriptions of
the universes, possibly less compact ones. You as
an external observer may need yet another program
that translates the output bits (typically a less compressed
description) into video or something, to obtain the
description your eyes want.
Note that the 2000 paper and the 2002
journal variant don't really care for time
evolution, just for descriptions - within
the bitstrings maybe there is an observer
who thinks he knows what's time, but
to the outsider his concept of
time may be irrelevant. (Unlike the
1997 paper, the 2000/2002 papers do not focus on a
one to one mapping between physical
and computational time steps, otherwise we'd
miss all the universes where the concept
of time is irrelevant.) Here's what I wrote at  the end:
After all, algorithmic theories of the describable do
encompass everything we will ever be able to talk
and write about. Other things are simply beyond description.

7. Serious:  p 18 CUH: what's your def
of computable? You mean

Zuse Symposium: Is the universe a computer? Berlin Nov 6-7

2006-11-02 Thread Juergen Schmidhuber

Dear colleagues,
many interesting talks
at the Zuse Symposium:
Is the universe a computer?
Berlin Nov 6-7, 2006
http://www.dtmb.de/Aktuelles/Aktionen/Informatikjahr-Zuse/

Best regards,
-JS
http://www.idsia.ch/~juergen/computeruniverse.html





probabilities measures computable universes

2004-01-23 Thread Juergen Schmidhuber
I browsed through recent postings and hope
this delayed but self-contained message can clarify
a few things about probabilities and measures
and predictability etc.
What is the probability of an integer being, say,
a square? This question does not make sense without
a prior probability distribution on the integers.
This prior cannot be uniform. Try to find one!
Under _any_ distribution some integers must be
more likely than others.
Which prior is good?  Is there a `best' or
`universal' prior? Yes, there is. It assigns to
each integer n as much probability as any other
computable prior, save for a constant factor that
does not depend on n.  (A computable prior can be
encoded as a program that takes n as input and
outputs n's probability, e.g., a program that
implements Bernoulli's formula, etc.)
Given a set of priors, a universal prior is
essentially a weighted sum of all priors in the
set. For example, Solomonoff's famous weighted sum
of all enumerable priors will assign at least as
much probability to any square integer as any
other computable prior, save for a constant
machine-dependent factor that becomes less and
less relevant as the integers get larger and
larger.
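A toy Python sketch of this dominance idea (two hand-picked computable
priors instead of Solomonoff's mixture of all enumerable ones - an
illustrative simplification, not his construction):

# Toy illustration: a "universal" prior for a two-element set of computable
# priors is just their weighted mixture M(n) = 0.5*p1(n) + 0.5*p2(n);
# M assigns every n at least half of what either component assigns.
import math

def p1(n):                      # geometric prior, strongly favors small n
    return 0.5 ** n

def p2(n):                      # heavier-tailed prior ~ 1/n^2 (sums to 1)
    return 6.0 / (math.pi ** 2 * n ** 2)

def M(n):
    return 0.5 * p1(n) + 0.5 * p2(n)

# The probability that an integer is a square depends entirely on the prior:
for prior, name in [(p1, "p1"), (p2, "p2"), (M, "M ")]:
    prob_square = sum(prior(k * k) for k in range(1, 10_000))
    print(name, "P(square) ~", round(prob_square, 4))

# Dominance check: M(n) >= 0.5 * p_i(n) for all n (here spot-checked).
assert all(M(n) >= 0.5 * p1(n) and M(n) >= 0.5 * p2(n) for n in range(1, 1000))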
Now let us talk about all computable universe
histories. Some are finite, some infinite. Each
has at least one program that computes it. Again
there is _no_ way of assigning equal probability
to all of them! Many are tempted to assume a
uniform distribution without thinking much about
it, but there is _no_ such thing as a uniform
distribution on all computable universes, or on
all axiomatic mathematical structures, or on
all logically possible worlds, etc!
(Side note: There only is a uniform _measure_ on
the finitely many possible history _beginnings_
of a given size, each standing for an uncountable
_set_ of possible futures. Probabilities
refer to single objects, measures to sets.)
It turns out that we can easily build universal
priors using Levin's important concept of self-
delimiting programs. Such programs may
occasionally execute the instruction "request new
input bit"; the bit is chosen randomly and will
remain fixed thereafter. Then the probability of
some universe history is the probability of guessing
a program for it. This probability is `universal'
as it does not depend much on the computer (whose
negligible influence can be buried in a constant
universe-independent factor). Some programs halt or
go in an infinite loop without ever requesting
additional input bits. Universes with at least one
such short self-delimiting program are more probable
than others.
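A toy Monte Carlo sketch in Python of "the probability of a history is
the probability of guessing a program for it" (the prefix-free machine
below is a made-up illustration, not a universal computer): feed fair coin
flips to a self-delimiting interpreter and tally the outputs; outputs with
short programs dominate.

# The toy machine reads a unary number k (k ones followed by a zero), then
# one more bit b, and outputs b repeated k times. It is prefix-free: it
# always reads exactly the bits it requests. Shorter programs => more
# probable outputs.
import random
from collections import Counter

def run_toy_machine(coin):
    k = 0
    while coin():                # request new input bit (unary counter)
        k += 1
    b = '1' if coin() else '0'   # request one more input bit
    return b * k                 # the "history" this program computes

random.seed(0)
counts = Counter(run_toy_machine(lambda: random.random() < 0.5)
                 for _ in range(200_000))

for out, c in counts.most_common(6):
    print(repr(out), c / 200_000)   # short outputs come from short programs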
To make predictions about some universe, say,
ours, we need a prior as well. For instance,
most people would predict that next Tuesday it
won't rain on the moon, although there are
computable universes where it does. The anthropic
principle is an _insufficient_ prior that does not
explain the absence of rain on the moon - it does
assign cumulative probability 1.0 to the set of all
universes where we exist, and 0.0 to all the other
universes, but humans could still exist if it did
rain on the moon occasionally. Still, many tend to
consider the probability of such universes as small,
which actually says something about their prior.
We do not know yet the true prior from which
our universe is sampled - Schroedinger's wave
function may be an approximation thereof. But it
turns out that if the true prior is computable
at all, then we can in principle already predict
near-optimally, using the universal prior instead:
http://www.idsia.ch/~juergen/unilearn.html
Many really smart physicists do not know this
yet. Technical issues and limits of computable
universes are discussed in papers available at:
http://www.idsia.ch/~juergen/computeruniverse.html
Even stronger predictions using a prior based
on the fastest programs (not the shortest):
http://www.idsia.ch/~juergen/speedprior.html
-Juergen Schmidhuber



everything talk - new AI

2003-02-27 Thread Juergen Schmidhuber
I put everything talk slides on the web:
http://www.idsia.ch/~juergen/everythingtalk/
http://www.idsia.ch/~juergen/computeruniverse.html
And also a recent overview article:

The New AI: General & Sound & Relevant for Physics
J. Schmidhuber. TR IDSIA-04-03, arxiv:cs.AI/0302012
To appear in B. Goertzel and C. Pennachin, eds.:
Artificial General Intelligence
http://www.idsia.ch/~juergen/newai/newai.html
Abstract. Most traditional artificial intelligence (AI)
systems of the past 50 years are either very limited, or
based on heuristics, or both. The new millennium, however,
has brought substantial progress in the field of theoretically
optimal and practically feasible algorithms for prediction,
search, inductive inference based on Occam's razor, problem
solving, decision making, and reinforcement learning in
environments of a very general type. Since inductive inference
is at the heart of all inductive sciences, some of the results
are relevant not only for AI and computer science but also for
physics, provoking nontraditional predictions based on Zuse's
thesis of the computer-generated universe.
Comments welcome!

Juergen Schmidhuber   http://www.idsia.ch/~juergen/



Re: Zuse's thesis web site

2002-11-06 Thread Juergen Schmidhuber
Thanks for your comments, Hal!

Your point on the dovetailer is well taken. I modified the text:
If Zuse's thesis is correct, then which is our universe's program? It 
may come as a surprise to some that we already know at least one 
algorithm that does compute our universe (and several others)...

Ed Clark has a nice review page on Wolfram's book:
http://www.math.usf.edu/~eclark/ANKOS_reviews.html
It includes Scott Aaronson's interesting review which also
addresses the issue of Bell's inequality.

Best,
Juergen   http://www.idsia.ch/~juergen/digitalphysics.html


Juergen Schmidhuber writes:


I welcome feedback on a little web page on Zuse's 1967 thesis
(which states that the universe is being computed on a cellular automaton):

http://www.idsia.ch/~juergen/digitalphysics.html



That's very interesting; I was not aware of Zuse.  Unfortunately I
don't know German so I can't read his paper.

Regarding the question of the compatibility of CA models with relativity
and QM, Wolfram looks into this in some detail.  He essentially abandons
a simple CA model in favor of a more complex network of interacting
nodes, which has some features similar to the Lorentz transformation of
relativity.  Then to address the EPR style long-distance correlations of
QM, he proposes that while the network is mostly local, it has occasional
nodes which get stretched apart and are connected to distant nodes.
These are rare but allow for the type of information flow necessary to
reproduce long-distance QM correlations.  All in all it is a pretty ad
hoc and unconvincing model.

I tried to read the 't Hooft paper referenced here but it was over my
head.  It also struck me though as not really addressing the discrepancy
between long-distance correlations and local CA models.  It seems very
much an open and difficult question to me to show how a local CA model
can reproduce relativity and QM.

One issue which CA models tend to ignore is the MWI.  Most CA models
are built as hidden variable theories which define a single universe.
Some multiverse models have that structure as well.  But it seems to me
that this is an entirely unnecessary restriction.  If a CA can model
a universe, it can model a multiverse, and likewise with any other
computing model like TMs.

The MWI is fully deterministic, which may make it a more attractive
target for modelling with a deterministic computational theory than
attempting to reproduce the statistical phenomena of QM, essentially
via hidden variables.  Any hidden variable theory, CA based or not,
has two strikes against it from the beginning due to the many well
known difficulties of Bell inequalities and EPR correlations.

Regarding entropy, it is pointed out that entropy does not grow in a
CA model.  Wolfram discusses this as well.  While entropy technically
does not grow, you can get phenomena that look very much like entropy
growth in a CA model.  Eventually you will get a Poincare recurrence
if the universe is finite.  But if you start in a sufficiently simple
state, there are many CA models which will mimic entropy growth into a
more complex state.  And this may be close enough to explain our universe.

Alternatively, of course the MWI as a deterministic theory also does
not have entropy growth.  As mentioned above, computational models of
our universe might well do better to aim towards an MWI world.

As far as the claim that we already know the algorithm that runs our
universe, and it is the UD: I think this is amusing but ultimately
misleading.  It's true that a dovetailer which runs all programs will
indeed run our own universe's program (assuming it has one), but I think
it is a misuse of terminology to say that the UD is the algorithm that
is running our universe.  I would reserve that phrase to refer to the
specific program that generates our universe and no others.  It will be a
tremendous accomplishment of physics and philosophy when that program is
discovered, but it is misleading to give the impression that we already
know what it is.

I think a better terminology here would be something like, we don't
need to know the specific program that describes our universe in order
to imagine how to program a computer that would in fact generate our
experiences, at least in theory.  And then go on and explain about
running all programs at once, etc.

Hal Finney







Zuse's thesis web site

2002-11-05 Thread Juergen Schmidhuber
I welcome feedback on a little web page on Zuse's 1967 thesis
(which states that the universe is being computed on a cellular automaton):

http://www.idsia.ch/~juergen/digitalphysics.html

Juergen Schmidhuber




Re: wheeler walked away; Zuse ...

2002-09-27 Thread Juergen Schmidhuber

[EMAIL PROTECTED] wrote:
 as for your point in your post about wheeler attaching
 his name to the theory, I think its ok for proponents
 and not originators of a theory to be named along with it.
 for example lately I've been referring to the
 fredkin-wolfram thesis. fredkin is far more the 
 originator; wolfram is far more the proponent. seems
 to me the everett-wheeler theory can be fairly seen in the
 same way.

just to set the record straight: the originator was Zuse.
Here is a link to a PDF scan of his 1967 paper (on the
universe running on a cellular automaton):
http://www.idsia.ch/~juergen/zuse.html




colt

2002-08-02 Thread Juergen Schmidhuber

Physics is an inductive science. You try to find a law that matches
the data, and hope that it generalizes on unseen data.
 
When asked about the nature of their inductive business, most
physicists refer to Popper's ideas from the 1930s (falsifiability
etc), and sometimes to Occam's razor: simple explanations are better
explanations.
 
Most physicists, however, are not even aware of the fact that there
is a formal basis for their inductive science, provided by the field
of computational learning theory (COLT), in particular, the theory of
universal induction.
 
The contents of the following COLT paper may be old news to some
on this list.
 
--
 
 
The Speed Prior: a new simplicity measure yielding near-optimal
computable predictions (Juergen Schmidhuber, IDSIA)

In J. Kivinen and R. H. Sloan, eds, Proc. 15th Annual Conf. on
Computational Learning Theory (COLT), 216-228, Springer, 2002;
based on section 6 of http://arXiv.org/abs/quant-ph/0011122 (2000)
http://www.idsia.ch/~juergen/speedprior.html
ftp://ftp.idsia.ch/pub/juergen/colt.ps

 
Solomonoff's optimal but noncomputable method for inductive
inference assumes that observation sequences x are drawn from a
recursive prior distribution mu(x). Instead of using the unknown
mu(x) he predicts using the celebrated universal enumerable prior
M(x) which for all x exceeds any recursive mu(x), save for a
constant factor independent of x. The simplicity measure M(x)
naturally implements Occam's razor and is closely related to the
Kolmogorov complexity of x. However, M assigns high probability to
certain data x that are extremely hard to compute. This does not match
our intuitive notion of simplicity. Here we suggest a more plausible
measure derived from the fastest way of computing data.  In the absence
of contrarian evidence, we assume that the physical world is generated
by a computational process, and that any possibly infinite sequence of
observations is therefore computable in the limit (this assumption is
more radical and stronger than Solomonoff's). Then we replace M by the
novel Speed Prior S, under which the cumulative a priori probability
of all data whose computation through an optimal algorithm requires
more than O(n) resources is 1/n.  We show that the Speed Prior allows
for deriving a computable strategy for optimal prediction of future
y, given past x.  Then we consider the case that the data actually
stem from a nonoptimal, unknown computational process, and use
Hutter's recent results to derive excellent expected loss bounds for
S-based inductive inference.  Assuming our own universe is sampled
from S, we predict: it won't get many times older than it is now;
large scale quantum computation won't work well; beta decay is not
random but due to some fast pseudo-random generator which we should
try to discover.
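 
For readers who want the flavor of the underlying construction, here is a
rough Python sketch of the FAST-style phase scheduling behind S. This is
not the paper's formal definition, and run_for_steps() is a hypothetical
stand-in for a real interpreter.

# Rough sketch: in phase i, every program p with len(p) <= i is run for at
# most 2**(i - len(p)) instructions. Data that some short program computes
# quickly shows up in early phases and hence gets high S-probability.
from itertools import product

def all_programs(max_len):
    for n in range(1, max_len + 1):
        for bits in product('01', repeat=n):
            yield ''.join(bits)

def run_for_steps(program, steps):
    """Hypothetical interpreter stand-in: run `program` for at most `steps`
    instructions and return its output so far (here: a dummy that just
    echoes a prefix of the program text)."""
    return program[:steps]

def fast_phases(max_phase):
    for i in range(1, max_phase + 1):
        for p in all_programs(i):
            budget = 2 ** (i - len(p))
            yield i, p, run_for_steps(p, budget)

for phase, prog, out in fast_phases(3):
    print(phase, prog, '->', out)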
 
Juergen Schmidhuber   http://www.idsia.ch/~juergen




the short program

2002-07-10 Thread Juergen Schmidhuber

Tim May wrote:
 One thing that Tegmark got right, I think, is the notion that a lot of
 branches of mathematics and a lot of mathematical structures probably go
 into making up the nature of reality.

 This is at first glance dramatically apposite the ideas of Fredkin,
 Toffoli, Wheeler1970, and Wolfram on the generation of reality from
 simple, local rules. Wolfram has made a claim in interviews, and perhaps
 somewhere in his new book, that he thinks the Universe may be generated
 by a 6-line Mathematica program!
 
I'd like to point out that it was Konrad Zuse himself (inventor of the
computer: http://www.idsia.ch/~juergen/zuse.html ) who was the first to
propose that the physical universe is running on a grid of simple
computers, each communicating with its neighbors: a cellular automaton.
He called this Rechnender Raum, which means Computing Space. As
always, Zuse was way ahead of his time. Perhaps the best reference is:
 
  Zuse, Konrad: Rechnender Raum, Schriften zur Datenverarbeitung,
   Band 1.  Friedrich Vieweg & Sohn, Braunschweig (1969).
 
And here is the simple method for computing all universes, including
ours, if it is computable indeed (adapted from page 1 of the 97 paper
http://www.idsia.ch/~juergen/everything/html.html ):
 
  Order all programs lexicographically. The first
  is run for one instruction every second step, the next for one
  instruction every second of the remaining steps, and so on.
 
As a by-product, this program also outputs descriptions of all formally
describable mathematical systems, and all proofs of all their theorems.
 
A bit of thought shows that the method even has the optimal order of
complexity. For example, it outputs our universe history as quickly as
the history's fastest program, save for a (possibly huge) constant factor that
does not depend on output size.
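 
A minimal Python sketch of this schedule (step_program is a hypothetical
no-op stand-in for single-stepping the k-th program): at global step t, the
program whose index is the number of trailing zero bits of t, plus one,
executes one instruction.

# Program 1 runs on every second step, program 2 on every second of the
# remaining steps, and so on: program k executes on steps t with exactly
# k-1 trailing zero bits, i.e. it gets a 2**-k share of all steps.
def trailing_zeros(t):
    z = 0
    while t % 2 == 0:
        t //= 2
        z += 1
    return z

def step_program(k):
    """Hypothetical stand-in: execute one instruction of program number k
    (in lexicographic order). Here it is a no-op placeholder."""
    pass

schedule = [trailing_zeros(t) + 1 for t in range(1, 33)]
print(schedule)   # 1 2 1 3 1 2 1 4 1 2 1 3 ...

for t in range(1, 17):
    step_program(trailing_zeros(t) + 1)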
 
Juergen Schmidhuber  http://www.idsia.ch/~juergen/




Re: Optimal Prediction

2002-04-15 Thread Juergen Schmidhuber

Wei Dai wrote:
 BTW, isn't the justification for universal prediction taken in this paper
 kind of opposite to the one you took? The abstract says The problem,
 however, is that in many cases one does not even have a reasonable guess
 of the true distribution. In order to overcome this problem ... Your
 papers on the other hand assume that the true distribution can be known
 and proposed that it must be the Speed Prior. (Later you said you believe
 it is the Speed Prior with probability 1.) Is this still your position?

Well, I prefer to recall my more cautious fallback position - let us just
write down the assumptions, and derive the consequences.

One does not really have to know the true prior; one just needs an upper
bound on its power.  Choose some prior P, and assume the universe is
sampled from a prior less dominant than P. Then predictions according to
P will be rather accurate.

For example, suppose the process computing the universe is not optimally
efficient for some reason.  As long as the resource postulate holds the
true prior cannot dominate the Speed Prior, and S-based predictions
will be fine.
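
A tiny numeric illustration of the dominance argument (toy distributions,
not the Speed Prior itself): if P(x) >= c * mu(x) for all x, then predicting
with P instead of the unknown true mu costs at most log2(1/c) extra bits of
log loss per symbol.

# Toy check: -log2 P(x) <= -log2 mu(x) + log2(1/c) whenever P dominates mu
# with constant c, so the regret of using the dominant P is bounded.
import math, random

outcomes = ['a', 'b', 'c', 'd']
mu = {'a': 0.70, 'b': 0.20, 'c': 0.07, 'd': 0.03}   # unknown "true" prior
nu = {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}   # some other prior
c = 0.5
P = {x: c * mu[x] + (1 - c) * nu[x] for x in outcomes}  # P >= c * mu

random.seed(1)
data = random.choices(outcomes, weights=[mu[x] for x in outcomes], k=10_000)
loss_mu = sum(-math.log2(mu[x]) for x in data)
loss_P = sum(-math.log2(P[x]) for x in data)
print("extra bits per symbol:", (loss_P - loss_mu) / len(data),
      "<= bound:", math.log2(1 / c))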

Juergen




Re: Optimal Prediction

2002-04-04 Thread Juergen Schmidhuber

Bill Jefferys wrote:

 At 10:59 AM +0200 4/3/02, Juergen Schmidhuber wrote:
 The theory of inductive inference is Bayesian, of course.
 But Bayes' rule by itself does not yield Occam's razor.

 By itself? No one said it did. Of course assumptions must be made.
 At minimum one always has to choose priors in Bayesian inference.

 Our paper shows that there is a Bayesian interpretation that yields
 something very suggestive of Ockham's razor. It is appealing in that
 if one has a simple versus a complex hypothesis, simple meaning
 that the prior probability is concentrated and complex meaning that
 it is vague and spread out, simple meaning that you don't have many
 knobs to tweak, complex meaning the opposite, then the simple
 hypothesis will be favored over the complex one unless the data lie
 well away from where the simple hypothesis has placed its bets.
 Bayesians distinguish this from Ockham's formulation by calling it
 the Bayesian Ockham's razor, recognizing that it is not what
 William of Ockham wrote, Entia non sunt multiplicanda sine
 necessitate (or one of his other genuine formulations).

 Please don't read more into our article than is there.

 By itself. First you said that the AP by itself has no predictive
 power. I missed the by itself so misunderstood you, but when I
 understood what you were saying I agreed. Now you say that Bayes'
 rule by itself does not yield Ockham's razor. Jim and I never said
 that it did. I am hard pressed to see how anything nontrivial
 relating to the real world can be gotten from any principle by
 itself, so I don't regard these comments as very profound, or very
 useful.

 [Remainder of article snipped]

 Bill
 
 
One has to choose priors. Exactly. To repeat the nontrivial point:
The only choice you need to obtain Occam's razor is to restrict
yourself to priors computable in the limit (this is not much of a
restriction in the sense that other priors are beyond constructive
description anyway). Then you'll automatically prefer few knobs to
tweak, or, more generally, short descriptions of the
observations:  http://www.idsia.ch/~juergen/toesv2/
 
Juergen  (will be out of town until April 16)




Re: Optimal Prediction

2002-03-28 Thread Juergen Schmidhuber

Bill Jefferys wrote:

 At 9:19 AM +0100 3/27/02, Juergen Schmidhuber wrote:
 You are claiming the AP necessarily implies a specific fact about
 nuclear energy levels? I greatly doubt that - can you give a proof?
 
 Yes, I can.
 
 
http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1954ApJS1..121Hdb_key=ASThigh=06070
 
 This is the classic paper where Hoyle shows this. It has been cited
 repeatedly in the literature on the AP as an example of a genuine
 prediction of the AP. For example, Barrow and Tipler cite it in their
 book.

Being sceptical about proof by citation, I failed to find such a  
mathematical proof in this paper. Its predictions are based on 
approximations and assumptions (the authors do acknowledge this) 
besides the AP.  Where exactly do you think is the proof?

Predictions that later turn out to be correct demonstrate predictive 
power of the sum of all assumptions, not of AP by itself.

 I don't care about flying rabbits. Stick to the issue at hand. 

This was the issue in the original message.

At the risk of beating a dead horse: plain AP does not exclude 
continuations of your world with different physical laws, as long as 
your self survives. Who knows - maybe you could live on in a crude 
simulation that completely ignores the concept of energy levels.

To make nontrivial predictions you clearly need more than the AP.
The theory of inductive inference provides what is missing:  
http://www.idsia.ch/~juergen/unilearn.html

Juergen




Re: Optimal Prediction

2002-03-28 Thread Juergen Schmidhuber


Bill Jefferys wrote:

 It's pointless wasting my time on this. As both Russell and I pointed
 out, this is a standard example that is cited by people who are
 knowledgeable about the AP. Either you have a different definition of
 predictive power than the rest of us do, or you don't understand
 Hoyle's very clearly written paper. In either case, it would be
 foolish of me to continue the thread.

 Goodbye. plonk

 Bill

Don't try to teach a pig to sing. It wastes your time and annoys the pig.


Predictive power is measurable by standard concepts of probability theory
and complexity theory.  You may choose to ignore this, but don't include
all those who don't among the rest of us.

Write down all assumptions, derive the consequences, and observe that the
AP _by itself_ cannot predict anything nontrivial.

Fortunately Hoyle was more careful than some who cite him - he just 
wrote: "the results...were obtained subject to certain assumptions
...have not been demonstrated in a manner free from all doubt.
Nevertheless, the number of assumptions made was much less 
than the number of results obtained."

Thus he informally invoked Occam's razor: find short descriptions that
explain a lot. Occam's razor is not the AP.  It is formally treated by  
the theory of inductive inference. Although this theory is at the heart  
of what physicists are doing, some of them are not yet fully aware of
it.

Juergen(will be out of town for a while)




Optimal Prediction

2002-03-26 Thread Juergen Schmidhuber

Many on this list have discussed the anthropic principle. It
essentially says that the conditional probability of finding
yourself in a universe compatible with your existence equals 1.

But unfortunately the anthropic principle does not have any
predictive power. It does NOT predict there won't be any flying
rabbits tomorrow.

Clearly we need more than the anthropic principle to explain
the predictability of our universe. In particular, we need an 
optimal way of predicting the future, given past observations. 

And there is one!

Normally we do not know the true conditional probability 
distribution p(next event | past). But assume we do know that 
p is in some set P of distributions. Choose a fixed weight w_q 
for each q in P such that the w_q add up to 1 (for simplicity, 
let P be countable). Then construct the Bayesmix 
M(x) = Sum_q w_q q(x), and predict using M instead of the
optimal but unknown p.
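
A minimal Python sketch of this construction for binary sequences, with P
a finite set of Bernoulli distributions (a simplification of the countable
case; the candidate set and weights below are arbitrary illustrative
choices):

# Bayesmix predictor M(x) = Sum_q w_q q(x). Predicting the next bit with M
# amounts to posterior-weighting the candidate distributions.
import random

thetas = [i / 10 for i in range(1, 10)]           # the candidates q in P
weights = {th: 1 / len(thetas) for th in thetas}  # prior weights w_q, sum 1

def predict_next(weights):
    """M's probability that the next bit is 1, given current posterior weights."""
    return sum(w * th for th, w in weights.items())

def update(weights, bit):
    """Bayes update of the weights after observing one bit."""
    new = {th: w * (th if bit == 1 else 1 - th) for th, w in weights.items()}
    z = sum(new.values())
    return {th: w / z for th, w in new.items()}

random.seed(2)
true_theta = 0.7                                  # the unknown p
for t in range(2000):
    bit = 1 if random.random() < true_theta else 0
    weights = update(weights, bit)

print("M's next-bit prediction:", round(predict_next(weights), 3),
      "~ true p:", true_theta)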

How wrong is it to do that? The recent exciting work of my postdoc
Marcus Hutter (IDSIA) provides general and sharp (!) loss bounds:

Let LM(n) and Lp(n) be the total expected losses of the M-predictor 
and the p-predictor, respectively, for the first n events. Then 
LM(n)-Lp(n) is at most of the order of sqrt[Lp(n)]. That is, M is 
not much worse than p. And in general, no other predictor can do 
better than that!

In particular, if p is deterministic, then the M-predictor soon 
won't make any errors any more.

If P contains ALL recursively computable distributions, then M 
becomes the celebrated enumerable universal prior. That is, after 
decades of somewhat stagnating research we now have sharp loss 
bounds for Solomonoff's universal (but incomputable) induction 
scheme. And if we also include the distributions computable in 
the limit, we get sharp loss bounds for the even more general 
priors introduced in Algorithmic Theories of Everything: 
http://www.idsia.ch/~juergen/toesv2/

Similarly, if we replace M by the Speed Prior S - where S(x) is 
small if x is hard to compute by any method - we obtain appropriate 
loss bounds for computable S-based induction:
http://www.idsia.ch/~juergen/toesv2/node32.html

Alternatively, reduce M to what you get if you just add up 
weighted estimated future finance data probabilities generated by 
1000 commercial stock-market prediction software packages. If only 
one of them happens to work fine (but you do not know which) you 
still should get rich.

To learn more, please read

Optimality of Universal Bayesian Sequence Prediction for General Loss  
and Alphabet: ftp://ftp.idsia.ch/pub/techrep/IDSIA-02-02.ps.gz

and also check out Hutter's other recent papers at ICML, ECML, NIPS,
Int. J. of Foundations of CS: www.idsia.ch/~marcus



Juergen Schmidhuber
http://www.idsia.ch/~juergen/




Re: Does provability matter?

2001-12-19 Thread Juergen Schmidhuber


Wei Dai wrote:
 I don't understand how you can believe that the probability of more
 dominant priors is zero. That implies if I offered you a bet of $1
 versus your entire net worth that large scale quantum computation will
 in fact work, you'd take that bet. Would you really?

Your dollar against my 2 cents worth? I don't want to rip you off!

Ok, let me adopt a more careful Bayesian's fallback position: I just
assume certain things, then derive consequences, given the assumptions,
without precisely quantifying the plausibility of the assumptions 
themselves, letting the reader decide whether they are plausible or not.

Juergen




Re: Does provability matter?

2001-12-11 Thread Juergen Schmidhuber

Wei Dai wrote:
 I'm not sure I understand this. Can you give an example of how our
 universe might make use of an entire continuum of real numbers? How might
 someone show this if it were true?

I have no idea. In fact, I guess it is impossible.

 But if there is a formally describable prior that dominates the speed
 prior, and you agree that the more dominant prior doesn't have a prior
 probability of zero, then isn't the speed prior redundant? Wouldn't you
 get equal posterior probabilities (up to a constant multiple) by
 dropping the speed prior from your prior on priors, no matter what it
 assigns to priors that are not formally describable?

In the Bayesian framework we derive consequences of assumptions
represented as priors. The stronger the assumptions, the more specific the
predictions. The Speed Prior assumption is stronger than the assumption
of a formally describable prior. It is not redundant because it yields stronger
predictions such as: the computer computing our universe won't compute
much more of it; large scale quantum computation won't work; etc.

In fact, I do believe the Speed Prior dominates the true prior from
which our universe is sampled (which is all I need to make good
computable predictions), and that the probability of even more
dominant priors is zero indeed. But as a Bayesian I sometimes ignore
my beliefs and also derive consequences of more dominant priors.
I do find them quite interesting, and others who do not share my
belief in the Speed Prior might do so too.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Does provability matter?

2001-11-28 Thread Juergen Schmidhuber

Wei Dai wrote:

 On Thu, Nov 15, 2001 at 10:35:58AM +0100, Juergen Schmidhuber wrote:
   Why do you prefer the Speed Prior? Under the Speed Prior, oracle universes
   are not just very unlikely, they have probability 0, right? Suppose one
   day we actually find an oracle for the halting problem, or even just find
   out that there is more computing power in our universe than is needed to
   explain our intelligence. Would you then (1) give up the Speed Prior and
   adopt a more dominant prior, or (2) would you say that you've encountered
   an extremely unlikely event (i.e. more likely you're hallucinating)?
  
   If you answer (1) then why not adopt the more dominant prior now?
 
  Why not adopt a more dominant prior now? I just go for the simplest
  explanation consistent with available data, where my simplicity measure
  reflects what computer scientists find simple: things that are easy
  to compute.

 You didn't explicitly answer my question about what you would do if you
 saw something that is very unlikely under the Speed Prior but likely under
 more dominant priors, but it seems that your implicit answer is (1). In
 that case you must have some kind of prior (in the Baysian sense) for the
 Speed Prior vs. more dominant priors. So where does that prior come from?

 Or let me put it this way. What do you think is the probability that the
 Great Programmer has no computational resource constraints? If you don't
 say the probability is zero, then what you should adopt is a weighted
 average of the Speed Prior and a more dominant prior, but that average
 itself is a more dominant prior. Do you agree?

If someone can use large scale quantum computing to solve a problem whose
solution requires much more computation time than necessary to compute
the history of our own particular universe, then one could take this
as evidence against the Speed Prior.  You are right: to quantify this
evidence in a Bayesian framework we need a prior on priors.

Which one? Hm. Let me extend your question and ask: what's the probability
that the Great Programmer is more than a mere programmer in the sense
that he is not bound by the limits of computability?  For instance,
if someone were able to show that our universe somehow makes use of an
entire continuum of real numbers we'd be forced to accept some even more
dominant prior that is not even computable in the limit.  We could not
even formally specify it.

So what's my prior on all priors? Since the attempt to answer such a
question might lead outside what's formally describable, I'll remain 
silent for now.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Does provability matter?

2001-11-15 Thread Juergen Schmidhuber

Wei Dai wrote:

 Thanks for clarifying the provability issue. I think I understand and
 agree with you.

 On Tue, Nov 13, 2001 at 12:05:22PM +0100, Juergen Schmidhuber wrote:
  What about exploitation? Once you suspect you found the PRG you can use
  it
  to predict the future. Unfortunately the prediction will take enormous
  time to stabilize, and you never can be sure it's finished.
  So it's not very practical.

 By exploiting the fact that we're in an oracle universe I didn't mean
 using TMs to predict the oracle outputs. That is certainly impractical.

 There are a couple of things you could do though. One is to use some
 oracle outputs to predict other oracle outputs when the relationship
 between them is computable. The other, much more important, is to quickly
 solve arbitrarily hard computational problem using the oracles.

  I prefer the additional resource assumptions reflected
  by the Speed Prior.  They make the oracle universes very unlikely, and
  yield computable predictions.

 Why do you prefer the Speed Prior? Under the Speed Prior, oracle universes
 are not just very unlikely, they have probability 0, right? Suppose one
 day we actually find an oracle for the halting problem, or even just find
 out that there is more computing power in our universe than is needed to
 explain our intelligence. Would you then (1) give up the Speed Prior and
 adopt a more dominant prior, or (2) would you say that you've encountered
 an extremely unlikely event (i.e. more likely you're hallucinating)?

 If you answer (1) then why not adopt the more dominant prior now?

You are right in the sense that under the Speed Prior any complete
_infinite_ oracle universe has probability 0. On the other hand,
any _finite_ beginning of an oracle universe has nonvanishing
probability. Why? The fastest algorithm for computing all universes
computes _all_ finite beginnings of all universes. Now oracle universes
occasionally require effectively random bits.  History beginnings that
require n such bits come out very slowly: the computation of the n-th
oracle bit requires more than O(2^n) steps, even by the fastest algorithm.
Therefore, under the Speed Prior the probabilities of oracle-based
prefixes quickly converge towards zero with growing prefix size. But in
the finite case there always will remain some tiny epsilon.

Why not adopt a more dominant prior now? I just go for the simplest
explanation consistent with available data, where my simplicity measure
reflects what computer scientists find simple: things that are easy
to compute.


Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Does provability matter?

2001-11-13 Thread Juergen Schmidhuber


Wei Dai wrote:

 On Wed, Oct 31, 2001 at 10:49:41AM +0100, Juergen Schmidhuber wrote:
  Which are the logically possible universes?  Tegmark mentioned a
  somewhat
  vaguely defined set of ``self-consistent mathematical structures,''
  implying provability of some sort. The postings of Marchal also focus
  on what's provable and what's not.

 Do you understand Marchal's postings? I really have trouble making sense
 of them. If you do, would you mind rephrasing his main ideas for me?

No, unfortunately I do not understand him. But provability does seem
important to him.  Maybe someone else could post a brief and concise
summary?

  Is provability really relevant?  Philosophers and physicists find it
  sexy for its Goedelian limits. But what does this have to do with the
  set of possible universes?

 What does it mean for a universe to be provable? One interpretation is
 that there exists an algorithm to compute any finite time-space region of
 the universe that always halts. Is that what you mean?

Well, any finite object x has a halting algorithm that computes it.
So which are the unprovable things? They are formal statements such as
"algorithm x will eventually compute object y". If it does then you can
prove it, otherwise in general you can never be sure. But why identify
universes with true or false statements? When I'm asking "Is provability
really relevant?" I am not really referring to "provable universes" -
I don't even know what that is supposed to be. I just mean: why should
it be important to show that a particular algorithm computes a particular
universe? Or that a particular universe has a particular property (there
are many generally unprovable properties)?  Let's compute them all -
why worry about provability?

  The provability discussion seems to distract from the real issue. If we
  limit ourselves to universes corresponding to traditionally provable
  theorems then we will miss out on many formally and constructively
  describable universes that are computable in the limit yet in a certain
  sense soaked with unprovability.
 
  Example: a never ending universe history h is computed by a finite
  nonhalting program p. To simulate randomness and noise etc, p invokes
  a short pseudorandom generator subroutine q which also never halts. The
  n-th pseudorandom event of history h is based on q's  n-th output bit
  q(n)
  which is initialized by 0 and set to 1 as soon as the n-th statement in
  an ordered list of all possible statements of a formal axiomatic system
  is proven by a theorem prover that systematically computes all provable
  theorems.  Whenever q modifies some q(n) that was already used in the
  previous computation of h, p appropriately recomputes h since the n-th
  pseudorandom event.

 Living in this universe is like living in a universe with a
 halting-problem oracle, right? If you want to know whether program x
 halts, you just measure the n-th pseudorandom event, where n is such that
 the n-th theorem is x halts. Yes?

Yes. Once the history has stabilized this will give you the correct answer.

 The problem with universes with halting-problem oracles (which I'll just
 call oracle universes) is that either the oracle is not exploitable
 (for example if no one in the universe figures out that the pseudorandom
 events actually correspond to theorems) in which case the universe just
 looks like an ordinary provable universe, or the oracle is
 exploitable, in which case the outcome becomes completely unimaginable for
 us. (Or is that not true? Can we imagine what would happen if an
 intelligent species came across an exploitable halting-problem oracle?)

  But why should this lack of provability matter?  Ignoring this universe
  just implies loss of generality.  Provability is not the issue.

 If oracle universes could exist, how significant are they? What's
 the probability that we live in such a universe, and how could we try to
 determine and/or exploit this fact?

If the computer on which the history of our universe is calculated is a
general Turing machine (as opposed to a monotone machine which cannot edit
previous outputs), and if we make no resource assumptions such as those
embodied by the Speed Prior, then the oracle universes indeed represent
a large fraction of the probability mass of all universes. Why? Because
there are many oracle universes with very short algorithms.

How to determine whether we are in an oracle universe? In principle it
is possible, at least to the extent inductive inference is possible at
all! Just search through all oracle pseudo-random generators (PRGs) and
try to match their outputs with the apparently noisy observations. If
the observations are not really random, eventually you'll stumble across
the right PRG. It's just that you cannot prove that it is the right one,
or that its outputs have already stabilized.

What about exploitation? Once you suspect you found the PRG you can use it
to predict the future. Unfortunately the prediction will take enormous
time to stabilize, and you never can be sure it's finished.
So it's not very practical.

Does provability matter?

2001-10-31 Thread Juergen Schmidhuber

Cannibalizing previous thread Provable vs Computable:

Which are the logically possible universes?  Tegmark mentioned a somewhat
vaguely defined set of ``self-consistent mathematical structures,''
implying provability of some sort. The postings of Marchal also focus
on what's provable and what's not.

Is provability really relevant?  Philosophers and physicists find it
sexy for its Goedelian limits. But what does this have to do with the
set of possible universes?

The provability discussion seems to distract from the real issue. If we
limit ourselves to universes corresponding to traditionally provable
theorems then we will miss out on many formally and constructively
describable universes that are computable in the limit yet in a certain
sense soaked with unprovability.

Example: a never ending universe history h is computed by a finite
nonhalting program p. To simulate randomness and noise etc, p invokes
a short pseudorandom generator subroutine q which also never halts. The
n-th pseudorandom event of history h is based on q's n-th output bit q(n),
which is initialized by 0 and set to 1 as soon as the n-th statement in
an ordered list of all possible statements of a formal axiomatic system
is proven by a theorem prover that systematically computes all provable
theorems.  Whenever q modifies some q(n) that was already used in the
previous computation of h, p appropriately recomputes h since the n-th
pseudorandom event.

Such a virtual reality or universe is perfectly well-defined.  We can
program it. At some point each history prefix will remain stable forever.
Even if we know p and q, however, in general we will never know for sure
whether some q(n) that is still zero won't flip to 1 at some point,
because of Goedel etc.  So this universe features lots of unprovable
aspects. 
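
A minimal Python sketch of the construction (the "theorem prover" below is
a hypothetical stand-in that just reveals some statement indices over time;
a real one would systematically enumerate all provable theorems of a formal
axiomatic system, and some q(n) would stay 0 forever):

# p computes history h from pseudorandom bits q(n); whenever a q(n) already
# used by h flips from 0 to 1 (statement n got proven), h is recomputed.
import random

def fake_theorem_prover(seed=3):
    """Hypothetical stand-in: yields indices n of statements as they get 'proven'."""
    rng = random.Random(seed)
    candidates = list(range(50))
    rng.shuffle(candidates)
    for n in candidates[:20]:      # the rest are never proven (undecidable, say)
        yield n

def compute_history(q, length):
    """p's deterministic rule: the n-th event of history h depends on q(n)."""
    return ['event(%d,%d)' % (n, q[n]) for n in range(length)]

LENGTH = 10
q = [0] * 50                       # pseudorandom bits, initialized to 0
h = compute_history(q, LENGTH)

for n in fake_theorem_prover():
    q[n] = 1                       # statement n proven: flip q(n) to 1
    if n < LENGTH:                 # a bit that h already used has changed:
        h = compute_history(q, LENGTH)   # recompute h (here simply from scratch)

print(h)   # each prefix of h eventually stabilizes, but we can't prove when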

But why should this lack of provability matter?  Ignoring this universe
just implies loss of generality.  Provability is not the issue.

Juergen Schmidhuber   

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Predictions duplications

2001-10-29 Thread Juergen Schmidhuber

   From: Russell Standish [EMAIL PROTECTED]
   The only reason for not accepting the simplest thing is if it can be
   shown to be logically inconsistent. This far, you have shown no such
   thing, but rather demonstrated an enormous confusion between measure
   and probability distribution.
 
  Russell, this is really too much; please read any intro to measure
  theory,
  e.g., the one by Halmos.  Measures in general are _not_ probability
  distributions.  They are more than that; you can view them as _sets_ of
  probability distributions on sets that are related to each other in a
  specific way. Probability density functions define measures and
  therefore
  in general aren't probability distributions either. The uniform PDF on
  [0..1] yields the perfect trivial example; that's exactly why I used it
  before.  In the computability context, compare LI/Vitanyi 1997, chapters
  4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
  (continuous sample space with (semi)measure M(x)).

 This is an excellent reference, thank you. Example 4.2.1 gives the
 construction of the uniform measure over the set of strings, precisely
 what you claim didn't exist when we started this discussion.

You just want to annoy me, don't you?  For the last time: There is a
uniform *measure* on the strings, but no uniform probability distribution.

   1) Halting theorem implies the set of halting programs is of measure
   zero within the set of all programs
 
  You mean, given a uniform measure? But why should you need the halting
  theorem for this trivial statement?  In fact, what exactly do you mean
  by
  halting theorem?  (Schoenfield's book provides a good intro to
  mathematical
  logic and computability theory.)
 

 The halting theorem is that there is no algorithm for predicting
 whether any program will halt. Of course, the result I was referring
 to is that the set of compressible strings is of measure zero. The set
 of halting programs actually has finite measure, since \Omega >
 0. However, there is still finite probability of an input string
 being nonhalting, and indeed the probability seems to be greater than
 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
 presents a problem for computationalism.

This is mixing up everything, in particular, different measures.
1) Infinite strings with finite program have *nonzero* measure, given
the *universal* measure (Omega is a number derived from the universal
measure).  2) They have measure zero, given the *uniform* measure (Omega
has nothing to do with it).  3) To see 2) we do not need the halting
theorem - we just compare the finite programs (countably many) to all
strings (uncountably many) and we are done. 4) The halting theorem is
no problem whatsoever when it comes to computable universes.  Why care
whether some computable universe stops or not?  If it does, it does,
otherwise it does not. In both cases we can compute it in the limit.

   The GP program (aka Universal Dovetailer) is not the simplest
   thing. The simplest describable thing is the set of all descriptions,
   that simply exist without being generated. That has precisely zero
   information or complexity, whereas the UD has some complexity of the
   order of a few 100 bits. The simplest prior is the uniform one, ie
   there is no bias whatsoever in the selection.
 
  Exist without being generated?  Precisely zero information or
  complexity? But with respect to what?

 With respect to the observer. With the respect to anything in
 fact. The set of all possible descriptions has precisely zero
 information, by definition, regardless of what observer or context you
 employ.

This is a vague informal statement, not a definition. Which are the
possible descriptions? What's the formal difference between a description
and a non-description?  What is your "anything" if there is no device
for describing it formally? If "anything" does not convey any bit of
info then what exactly is it that conveys one bit? Or two?

  Any traditional definition of
  information/simplicity requires the specification of a formal
  description
  method, such as a Turing machine. You don't need one? Then how do you
  define information or complexity? How exactly do you define simple ?

 Actually, all it needs is an equivalencing procedure. See my On
 Complexity and Emergence paper. A Turing machine will do the job for
 you, but so will a neural network or an expert system following
 heuristic rules.

So you have to write down formally this equivalencing procedure, right?
Why should this be different in principle from writing down the formal
rules for a Turing machine? Why is a simplicity measure that depends on
your equivalencing procedure superior to a simplicity measure depending
on a Turing machine?  (The TM procedure includes yours - because it
includes all procedures - but not the other way round.)




Re: Predictions duplications

2001-10-26 Thread Juergen Schmidhuber

 From: Juho Pennanen [EMAIL PROTECTED]
 So there may be no 'uniform probability distribution' on the set of all
 strings, but there is the natural probability measure, that is in many
 cases exactly as useful.

Sure, I agree, measures are useful; I'm using them all the time. But in
general they are _not_ the same thing as probability distributions.

 From: Russell Standish [EMAIL PROTECTED]
 The only reason for not accepting the simplest thing is if it can be
 shown to be logically inconsistent. This far, you have shown no such
 thing, but rather demonstrated an enormous confusion between measure
 and probability distribution.

Russell, this is really too much; please read any intro to measure theory,
e.g., the one by Halmos.  Measures in general are _not_ probability
distributions.  They are more than that; you can view them as _sets_ of
probability distributions on sets that are related to each other in a
specific way. Probability density functions define measures and therefore
in general aren't probability distributions either. The uniform PDF on
[0..1] yields the perfect trivial example; that's exactly why I used it
before.  In the computability context, compare LI/Vitanyi 1997, chapters
4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
(continuous sample space with (semi)measure M(x)).

 1) Halting theorem implies the set of halting programs is of measure
 zero within the set of all programs

You mean, given a uniform measure? But why should you need the halting
theorem for this trivial statement?  In fact, what exactly do you mean by
"halting theorem"?  (Schoenfield's book provides a good intro to mathematical
logic and computability theory.)

 The GP program (aka Universal Dovetailer) is not the simplest
 thing. The simplest describable thing is the set of all descriptions,
 that simply exist without being generated. That has precisely zero
 information or complexity, whereas the UD has some complexity of the
 order of a few 100 bits. The simplest prior is the uniform one, ie
 there is no bias whatsoever in the selection.

Exist without being generated?  Precisely zero information or
complexity? But with respect to what?  Any traditional definition of
information/simplicity requires the specification of a formal description
method, such as a Turing machine. You don't need one? Then how do you
define information or complexity? How exactly do you define "simple"?




Re: Predictions duplications

2001-10-26 Thread Juergen Schmidhuber

Schmidhuber:

It's the simplest thing, given this use of mathematical
language we have agreed upon. But here the power of the
formal approach ends - unspeakable things remain unspoken.

Marchal:

I disagree. I would even say that it is here that the serious formal
approach begins. Take unprovable for unspeakable, we can
meta-reason (informally or formally) and study the logical structure
of the set of unprovable sentences by sound universal machines.

Schmidhuber:

Unprovable does not mean unspeakable! 
You can spell out many unprovable things.

Why care for the subset of provable sentences? 
Aren't we interested in the full set of all describable sentences? 
We can generate it, without caring for proofs at all.

Unspeakable means beyond formal definition. In particular, the
way we humans communicate and agree on a common formal language is not
formally defined. You write x and I say, ok, it's an x, but that's
all based on some sort of informal agreement that involves complex
pattern recognition and learning processes. Usually we do not question
the validity of outcomes of such cognitive processes, and just go ahead
with formal derivations.  But there is this major informal step *before*
formal reasoning can start, and good textbooks on logic acknowledge
this.




Re: Predictions duplications

2001-10-25 Thread juergen

 From: Russell Standish [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 
 I think we got into this mess debating whether an infinite set could
 support a uniform measure. I believe I have demonstrated this. 
 I've yet to see anything that disabuses me of the notion that a
 probability distribtuion is simply a measure that has been normalised
 to 1. Not all measures are even normalisable.

Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
uniform probability distribution. Why were measures invented in the first
place? To deal with infinite sets.  You cannot have a uniform probability
distribution on infinitely many things. That's why measures isolate just
finitely many things, say, every bitstring of size n, and for each x of
size n look at the infinite set of strings starting with x. A uniform
measure assigns equal probability to each such set. Of course, then you
have a uniform probability distribution on those finitely many things
which are sets. But that's not a uniform probability distribution on
infinitely many things, e.g., on the bitstrings themselves!  The measure
above is _not_ a probability distribution; it is an infinite _set_ of
_finite_ probability distributions, one for string size 0, one for string
size 1, one for string size 2,...
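
To make the difference concrete, here is a minimal Python sketch (an
illustration of mine, not part of the earlier argument) for the uniform
measure M(x) = 2^-l(x) of the set of strings starting with x:

  from itertools import product

  def M(x):
      # uniform measure of the set of all infinite strings starting with x
      return 2.0 ** -len(x)

  x = "0110"
  print(M(x), M(x + "0") + M(x + "1"))   # equal: M(x) = M(x0) + M(x1)

  # For each fixed length n, the values M(x) with l(x)=n form a finite
  # probability distribution (they sum to 1); summed over all lengths
  # they diverge, so M is not a probability distribution on strings.
  for n in range(1, 6):
      print(n, sum(M("".join(b)) for b in product("01", repeat=n)))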

 I realise that the Halting theorem gives problems for believers of
 computationalism. 

It does not. Why should it?

 I never subscribed to computationalism at any time,
 but at this stage do not reject it. I could conceive of us living in
 a stupendous virtual reality system, which is in effect what your GP
 religion Mark II is. However, as pointed out by others, it does suffer
 from turtle-itis, and should not be considered the null
 hypothesis. It requires evidence for belief.

By turtle-itis you mean: in which universe do the GP and his computer
reside?  Or the higher-level GP2 which programmed GP? And so on? But
we cannot get rid of this type of circularity - computability and
mathematical logic are simply taken as given things, without any
further explanation, like a religion. The computable multiverse, or
the set of logically possible or mathematically possible or computable
universes, represents the simplest explanation we can write down formally.
But what exactly does it mean to accept something as a formal statement?
What does it mean to identify the messy retina patterns caused by this
text with abstract mathematical symbols such as x and y?  All formal
explanations in our formal papers assume that we agree on how to read
them. But reading and understanding papers is a complex physical and
cognitive process. So all our formal explanations are relative to this
given process which we usually do not even question. Essentially, the
GP program is the simplest thing we can write down, relative to the
unspoken assumption that it is clear what it means to write something
down, and how to process it. It's the simplest thing, given this use
of mathematical language we have agreed upon. But here the power of the
formal approach ends - unspeakable things remain unspoken.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Predictions duplications

2001-10-23 Thread juergen

 From: [EMAIL PROTECTED]:
 [EMAIL PROTECTED] wrote:
   From [EMAIL PROTECTED]:
   [EMAIL PROTECTED] wrote:
M measure:
M(empty string)=1
M(x) = M(x0)+M(x1) nonnegative for all finite x.
   
   This sounds more like a probability distribution than a measure. In
   the set of all descriptions, we only consider infinite length
   bitstrings. Finite length bitstrings are not members. However, we can
   identify finite length bitstrings with subsets of descriptions. The
   empty string corresponds to the full set of all descriptions, so the
   first line M(empty string)=1 implies that the measure is normalisable
   (ie a probability distribution).
  
  Please check out definitions of measure and distribution! 
  Normalisability is not the critical issue.
  
  Clearly: Sum_x M(x) is infinite. So M is not a probability
  distribution. M(x) is just measure of all strings starting with x:
  M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ...
 
 For a measure to be normalisable, the sum over a *disjoint* set of
 subsets must be finite. If the set of subsets is not disjoint (ie the
 intersection are not empty) then the sum may well be infinite.
 Bringing this to the case of finite strings. Each finite string is
 actually a subset of the set of infinite strings, each containing the
 same finite prefix. So the string 101010 is actually a subset of 10101
 and so on. The Sum_x M(x), where I assume x ranges over all strings
 will of course be infinite. However, since the set of finite strings
 is not disjoint, this doesn't imply M(x) is unnormalisable.
 Now when you realise that every finite string x is a subset of the empty
 string, it becomes clear that M(x) is normalised to precisely 1.

The point is: prob dists and measures are different things. There is a
good reason for giving them different names. Prob dists assign numbers
to individual objects, not to sets.  See the traditional definitions as
well as those for semimeasures in http://www.idsia.ch/~juergen/toesv2/

  Neglecting finite universes means loss of generality though.
  Hence measures mu(x) in the ATOE paper do not neglect finite x: 
  
  mu(empty string)=1
  mu(x) = P(x)+mu(x0)+mu(x1)  (all nonnegative).
  
  And here P is a probability distribution indeed! 
  P(x)>0 possible only for x with finite description. 
  
 
 The P(x)>0 case above actually breaks the countably subadditive
 property, so mu(x) cannot be called a measure... I'm not entirely sure
 what you're getting at here.

The set of computable universes includes the finite ones which we may 
not ignore. How do they contribute to the measure of all universes?
For convenience introduce a third symbol $ besides 0 and 1. Now what is 
the measure of the set of all universes starting with 00110? It's the sum of
the measure of the set of universes starting with 001101 and
the measure of the set of universes starting with 001100 and
the measure of the single universe 00110$ that stops right after 00110. 
Once there is a $ symbol there won't be another 0 or 1.  
So mu(x) = P(x)+mu(x0)+mu(x1).  

Our favorite example is the universal Turing machine with random inputs.
For finite x: P(x) is the probability that the TM output is x and nothing
but x. mu(x) is something different, namely, the so-called universal measure 
of all strings _starting_ with x.  Again mu(x) = P(x)+mu(x0)+mu(x1).  
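
A toy Monte Carlo sketch of the identity mu(x) = P(x) + mu(x0) + mu(x1):
instead of a universal TM with random inputs it uses a trivial generator
emitting 0, 1, $ with probability 1/3 each (my simplification, so that
the exact values are known: mu(x) = 3^-l(x), P(x) = 3^-(l(x)+1)):

  import random

  def sample(max_len=30):
      s = ""
      while len(s) < max_len:
          c = random.choice("01$")
          if c == "$":
              return s, True      # finite universe: exactly s, then stop
          s += c
      return s, False             # still running; only its prefix is used

  def estimate(x, trials=300000):
      p, mu = 0, 0
      for _ in range(trials):
          s, halted = sample()
          if s.startswith(x):
              mu += 1                          # counts toward mu(x)
              if halted and s == x:
                  p += 1                       # counts toward P(x)
      return p / trials, mu / trials

  x = "00110"
  P_x, mu_x = estimate(x)
  _, mu_x0 = estimate(x + "0")
  _, mu_x1 = estimate(x + "1")
  print(mu_x, "approx", P_x + mu_x0 + mu_x1)   # mu(x) = P(x)+mu(x0)+mu(x1)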

Juergen Schmidhuberhttp://www.idsia.ch/~juergen/




Re: Predictions duplications

2001-10-22 Thread juergen



 From [EMAIL PROTECTED]:
 [EMAIL PROTECTED] wrote:
  M measure:
  M(empty string)=1
  M(x) = M(x0)+M(x1) nonnegative for all finite x.
 
 This sounds more like a probability distribution than a measure. In
 the set of all descriptions, we only consider infinite length
 bitstrings. Finite length bitstrings are not members. However, we can
 identify finite length bitstrings with subsets of descriptions. The
 empty string corresponds to the full set of all descriptions, so the
 first line M(empty string)=1 implies that the measure is normalisable
 (ie a probability distribution).


Please check out definitions of measure and distribution! 
Normalisability is not the critical issue.

Clearly: Sum_x M(x) is infinite. So M is not a probability
distribution. M(x) is just measure of all strings starting with x:
M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ...

Neglecting finite universes means loss of generality though.
Hence measures mu(x) in the ATOE paper do not neglect finite x: 

mu(empty string)=1
mu(x) = P(x)+mu(x0)+mu(x1)  (all nonnegative).

And here P is a probability distribution indeed! 
P(x)>0 possible only for x with finite description. 

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/






Re: Predictions duplications

2001-10-12 Thread juergen


In reply to Russell Standish and Juho Pennanen I'd just like to
emphasize the main point, which is really trivial: by definition, a
uniform measure on the possible futures makes all future beginnings of
a given size equally likely. Then regular futures clearly are not any
more likely than the irregular ones. I have no idea what makes Russell
think this is debatable. In most possible futures your computer will
vanish within the next second. But it does not. This indicates that our
future is _not_ sampled from a uniform prior.

Some seem to think that the weak anthropic principle explains the
regularity. The argument goes like this: Let there be a uniform measure
on all universe histories, represented as bitstrings.  Now take the tiny
subset of histories in which you appear.  Although the measure of this
subset is tiny, its conditional measure, given your very existence,
is not: According to the weak anthropic principle, the conditional
probability of finding yourself in a regular universe compatible with 
your existence equals 1.

But it is essential to see that the weak anthropic principle does not
have any predictive power at all. It does not tell you anything about
the future.  It cannot explain away futures in which you still exist
but irregular things happen. Only a nonuniform prior can explain this.

Which nonuniform prior is plausible? My favorite is the resource-optimal
Speed Prior. I am hoping and expecting that someone will confirm it soon
by finding a rather fast pseudorandom generator responsible for apparently
noisy events in our universe. But others may be more conservative and
go for the more dominant enumerable Solomonoff-Levin prior mu^M or maybe
even for the nonenumerable prior mu^E (which dominates mu^M while still
being computable in the limit) or maybe even for the extreme prior mu^G.


Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




 From [EMAIL PROTECTED]  Thu Oct 11 17:36:41 2001
 From: Juho Pennanen [EMAIL PROTECTED]
 
 
 
 I tried to understand the problem that doctors Schmidhuber 
 and Standish are discussing by describing it in the most 
 concrete terms I could, below. (I admit beforehand I couldn't 
 follow all  the details and do not know all the papers and 
 theorems referred to, so this could be irrelevant.)
 
 So, say you are going to drop a pencil from your hand and 
 trying to predict if it's going to fall down or up this 
 time. Using what I understand with comp TOE, I would take 
 the set of all programs that at some state implement a 
 certain conscious state, namely the state in which you 
 remember starting your experiment of dropping the pencil, 
 and have already recorded the end result (I abreviate this 
 conscious state with CS. To be exact it is a set of states,
 but that shouldn't make a difference). 
 
 The space of all programs would be the set of all programs 
 in some language, coded as infinite numerable sequences of 
 0's and 1's. (I do not know how much the chosen language + 
 coding has effect on the whole thing). 
 
 Now for your prediction you need to divide the 
 implementations of CS into two sets: those in which the 
 pencil fell down and those in which it fell up. Then you 
 compare the measures of those sets. (You would need to 
 assume that each program is run just once or something of 
 the sort. Some programs obviously implement CS several 
 times when they run. So you would maybe just include those 
 programs that implement CS infinitely many times, and 
 weight them with the density of CS occurrences during 
 their running.) 
 
 One way to derive the measure you need is to assume a 
 measure on the set of all infinite sequences (i.e. on
 all programs). For this we have the natural measure, 
 i.e. the product measure of the uniform measure on 
 the set containing 0 and 1. And as far as my intuition 
 goes, this measure would lead to the empirically correct 
 prediction on the direction of the pencil falling. And 
 if I understood it right, this is not too far from what 
 Dr. Standish was claiming? And we wouldn't need any 
 speed priors. 
 
 But maybe the need of speed prior would come to play if I 
 thought more carefully about the detailed assumptions
 involved? E.g. that each program would be run just once, 
 with the same speed etc? I am not sure.
 
 Juho 
 
 /
 Juho Pennanen
 Department of Forest Ecology, P.O.Box 24
 FIN-00014 University of Helsinki
 tel. (09)191 58144 (+358-9-191 58144)
 GSM 040 5455 845 (+358-40-5455 845)
 http://www.helsinki.fi/people/juho.pennanen
 */
 

 Resent-Date: Thu, 11 Oct 2001 18:57:25 -0700
 From: Russell Standish [EMAIL PROTECTED]
 
 [EMAIL PROTECTED] wrote:
  
  
  
  Huh? A PDF? You mean a probability density function? On a continuous set? 
 
 Probability Distribution Function. And PDF's are defined on all

Re: Predictions duplications

2001-10-11 Thread juergen



 From [EMAIL PROTECTED] :
 [EMAIL PROTECTED] wrote:
  
  So you NEED something additional to explain the ongoing regularity.
  You need something like the Speed Prior, which greatly favors regular 
  futures over others.
 
 I take issue with this statement. In Occam's Razor I show how any
 observer will expect to see regularities even with the uniform prior
 (comes about because all observers have resource problems,
 incidently). The speed prior is not necessary for Occam's Razor. It is
 obviously consistent with it though.

First of all: there is _no_ uniform prior on infinitely many things.
Try to build a uniform prior on the integers. (Tegmark also wrote that
... all mathematical structures are a priori given equal statistical
weight, but of course this does not make much sense because there is
_no_ way of assigning equal nonvanishing probability to all - infinitely
many - mathematical structures.)

There is at best a uniform measure on _beginnings_ of strings. Then
strings of equal size have equal measure.

But then regular futures (represented as strings) are just as likely
as irregular ones. Therefore I cannot understand the comment: (comes
about because all observers have resource problems, incidently).

Of course, alternative priors lead to alternative variants of Occam's
razor.  That has been known for a long time - formal versions of Occam's
razor go at least back to Solomonoff, 1964.  The big question really
is: which prior is plausible? The most general priors we can discuss are
those computable in the limit, in the algorithmic TOE paper. They do not
allow for computable optimal prediction though. But the more restrictive
Speed Prior does, and seems plausible from any programmer's point of view.

 The interesting thing is of course whether it is possible to
 experimentally distinguish between the speed prior and the uniform
 prior, and it is not at all clear to me that it is possible to
 distinguish between these cases.

I suggest to look at experimental data that seems to have Gaussian
randomness in it, such as interference patterns in split experiments.
The Speed Prior suggests the data cannot be really random, but that a
fast pseudorandom generator PRG is responsible, e.g., by dividing some
seed by 7 and taking some of the resulting digits as the new seed, or
whatever. So it's verifiable - we just have to discover the PRG method.
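
To illustrate the flavor of the claim - this is of course not the
particular generator hypothesized for our universe, which nobody knows -
a few lines of deterministic code already produce Gaussian-looking sums:

  def lcg(seed, a=1103515245, c=12345, m=2**31):
      # a textbook linear congruential generator: fast, short, fully deterministic
      while True:
          seed = (a * seed + c) % m
          yield seed

  g = lcg(42)
  # sums of 1000 outputs scaled to [0,1) cluster around 500 and look
  # roughly Gaussian (central limit theorem), though nothing here is random
  print([round(sum(next(g) / 2**31 for _ in range(1000))) for _ in range(10)])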

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/






Re: Predictions duplications

2001-10-11 Thread juergen
 falsifiability, in line with Solomonoff's theory of inductive
inference and Occam's razor, which is not just a wishy-washy philosophical
framework like Popper's.

Similarly, today's string physicists accept theories for their simplicity,
not their falsifiability.  Just like nobody is able to test whether
gravity is the same on Sirius, but believing it makes things simpler.

Again: the essential question is: which prior is plausible? Which
represents the correct notion of simplicity? Solomonoff's traditional
prior, which does not care for temporal complexity at all? Even more
general priors computable in the limit, such as those discussed in
the algorithmic TOE paper?  Or the Speed Prior, which embodies a more
restricted concept of simplicity that differs from Kolmogorov complexity
because it takes runtime into account, in an optimal fashion?

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/





Bell's Inequality Determinism vs Nondeterminism

2001-10-03 Thread juergen

Einstein, Podolsky and Rosen (1935) suggested hidden variables unexplained
by standard quantum mechanics, and postulated local interaction only. In
1964 Bell introduced an inequality whose violation was thought to provide
evidence against _local_ hidden variable theories. Violation tests are
inconclusive as yet - see chain of papers starting maybe with Aspect et
al, PRL 49, p. 91 and 1804, 1982; and Franson's counter argument: PR D, p
2529, 1985, and numerous more recent papers. Many physicists now believe 
in nonlocality, but the issue is not settled yet.

Is Bell's inequality relevant to algorithmic TOEs?  It is about locality
assumptions - but ATOEs do not even insist on locality. There are
computable universes whose light speed exceeds ours, or where there is no
light at all, where there is global interaction among distant particles,
or where there are no particles as we know them, etc.  The point of ATOEs
is: as long as the computability assumptions hold, any complex universe
is unlikely, no matter whether Bell's inequality makes sense in it or not.

From this point of view Bell's inequality and locality are not
the real issues. The issue is more general.  It is determinism vs
nondeterminism.  There is NO experimental data proving the universe
is truly nondeterministic.  If it were, where would the unexplained
randomness come from? As pointed out by Saibal Mitra, non-crackpots
such as 't Hooft (physics Nobel 1999) endorse hidden variable
theory and determinism, e.g.: http://xxx.lanl.gov/abs/hep-th/0105105
http://xxx.lanl.gov/abs/hep-th/0104219

Sure, there is a lot of data that at first glance suggests probabilistic
aspects of particle behavior, but many pseudorandom generators produce
data that match Gaussian or other well-known distributions. I think
nobody has ever bothered to systematically check existing data for
pseudorandomness. (Note that decorrelation does not imply independence.)

Suppose the history of some universe is indeed generated by a
deterministic computer program. In principle this would make everything
predictable for anybody who knows the program. This does not at all
imply, however, that an observer evolving within the universe could
predict exactly what is happening light years away, for several reasons:

1. He may never get full access to the current hidden variables, i.e.,
the current state of the program, because of:

1a. some sort of prewired uncertainty principle that holds for local 
observers (but not for outsiders such as the Great Programmer).

1b. a maximal universe-specific speed that prevents the observer from
collecting precise information about distant parts of his universe.

2. Even if the observer had full access to the state, to make predictions
he would need a predictor built within his universe. In general, this
machine will run much slower than the universe itself, unable to predict
it in time.  (But perhaps able to postdict parts of it.)

To illustrate 2: there are pseudorandom generators that run extremely
slowly but produce extremely convincing results such as the enumerable but
effectively random number Omega.  The speed prior, however, suggests our
universe's pseudorandom generator is much faster than the Omega generator.

Juergen

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Immortality

2001-09-28 Thread juergen




But there _is_ a modern religion based on universal principles.
More precisely, on universal computers.

According to the Great Programmer Religion, the Great Programmer wrote a
very short program that computes all computable universes.  One of them
is ours.

Some disciples of this religion find it plausible because the short
program is the simplest explanation of all observations. 

Others find it increasingly plausible because our own, currently
quite primitive virtual realities are getting more sophisticated all
the time. We observe a speed-up factor of 10^6 every 20 years, possibly
10^30 in the ongoing century.  More and more people, especially kids, are
in regular contact with virtual realities, and to them the new religion
may seem just like a natural extrapolation.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/toesv2/
http://www.idsia.ch/~juergen/everything/html.html



 From: K. S. Ryan [EMAIL PROTECTED]
 Date: Fri, 28 Sep 2001 15:15:45

 Religion is a system of beliefs describing our place in the cosmos.
 
 Tha basic premise of all religions is that we are best when we act in 
 accordance with universal principles.
 
 Prophets describe our place in the cosmos by explaining universal 
 principles.
 
 Fundamentalists of all sorts are in conflict with the modern world because 
 universal principles are increasingly scientific, secular, and physically 
 practical.
 
 Old school religions are losing market share to the modern world.
 But the modern world does not offer a unified system of beleif describing 
 our place in the universe. Contemporary truths are not packaged as a whole, 
 as a spiritual intellectual-emotional raison d'etre. Thus, while educated 
 modern worlders may not be convinced by old school religious beliefs, there 
 is not a modern unified system to replace them. Because we are human, we 
 think a lot, and need to know what the individual means to the whole. What 
 is our place in the cosmos? That is the question. And there is turbulance.






Re: Countable vs Continuous

2001-06-21 Thread juergen

 From: Joel Dobrzelewski [EMAIL PROTECTED]
 Subject: Re: Countable vs Continuous
 Date: Thu, 21 Jun 2001 08:41:13 -0400

 Juergen:
  There is the rather harmless kind: the countable one. And some say
  there is another kind, a strange one, the one associated with the
  uncountable continuum, the one whose very existence many deny.
 
  Do not lump them together.

 Yes, I can see how this distinction might be useful in some esoteric
 discussions.  But it seems (to me) to have little relevance to achieving a
 successful Theory of Everything.  And it might even distract us..

 Both kinds of infinities are inaccessible to humans.  So they can play no
 part in our theories.  In this sense, I think it's fair to lump them into
 the pile of things that are not helpful.

I think we may not ignore infinities for quite pragmatic, non-esoteric
reasons. Many believe the history of our own universe will be
infinite - certainly there is no evidence against this possibility.  Also,
any finite never-halting program for a virtual reality corresponds to
an infinite history. TOEs ignoring this seem unnecessarily restrictive.

 We cannot build the universe out of pieces themselves we cannot construct.
 (e.g. Pi)

What you cannot construct in finite time is just a particular
representation of Pi, namely, the one consisting of infinitely
many digits. But this is not a problem of Pi, it is a problem of
this particular representation. There are better, finite, unambiguous
representations of Pi: its programs.  You can manipulate them, copy them,
insert into other finite programs and theorem provers, etc.  Proofs of
Pi's properties are essentially proofs of properties of Pi's programs.
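
For instance, one such finite representation of Pi is a short nonhalting
program; here a compact unbounded spigot in the style of Gibbons (my
transcription, chosen for brevity - any other Pi program would serve the
same point):

  def pi_digits():
      # streams the decimal digits 3, 1, 4, 1, 5, 9, ... forever; this finite
      # program text is an unambiguous representation of the infinite object Pi
      q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
      while True:
          if 4*q + r - t < n*t:
              yield n
              q, r, n = 10*q, 10*(r - n*t), (10*(3*q + r))//t - 10*n
          else:
              q, r, t, k, n, l = (q*k, (2*q + r)*l, t*l, k + 1,
                                  (q*(7*k + 2) + r*l)//(t*l), l + 2)

  digits = pi_digits()
  print([next(digits) for _ in range(10)])   # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]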

Really bad are those things that do not have ANY finite representation.

 Juergen, what do you think about the minimal cellular automaton?  Is it a
 good candidate ATOE (algorithmic theory of everything)?

it depends - minimal for what purpose?

Juergen

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Countable vs Continuous

2001-06-21 Thread juergen

There has been some confusion regarding the various kinds of infinity.

There is the rather harmless kind: the countable one. And some say there
is another kind, a strange one, the one associated with the uncountable
continuum, the one whose very existence many deny.

Do not lump them together.

The former is accessible by nonhalting computer programs. The latter is not.

A program that runs forever cannot consume more than countable time steps
and storage cells, adding a new cell whenever it needs one.  For example,
countably many steps and cells are sufficient to print all digits of
the real number Pi. Therefore Pi is computable in the limit.

But countable time and space resources are NOT sufficient to print all
digits of all random reals in the continuum.  In fact, countable resources
are not even sufficient to print all (countably many) digits of a single
real without finite individual description. Unlike Pi, such truly random
reals (and almost all reals are truly random) are NOT computable in the
limit, although all their finite beginnings are.

Pi has a finite description. Are all infinite objects with finite
descriptions computable in the limit? No. Counter example: Infinite Omega
(the halting probability of a universal Turing machine with random input)
has a finite description, but countable resources are NOT sufficient
to print it, although each finite beginning of Omega is computable in
the limit.

Algorithmic TOEs are limited to the comparatively few, countably many,
possibly infinite universe histories with finite descriptions. ATOEs deny
the existence of other histories, of histories we cannot even describe
in principle, of histories we cannot reasonably talk about.

Likewise, ATOEs are restricted to probabilities computable in the limit.
We cannot formally describe other probabilities, and therefore cannot
write reasonable papers about them. This apparently harmless restriction
is the very reason that any complex future (among all the possible
futures compatible with the anthropic principle) necessarily is unlikely.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: why not high complexity?

2001-06-06 Thread juergen



 From [EMAIL PROTECTED]  Thu May 31 18:14:55 2001
 From: Karl Stiefvater [EMAIL PROTECTED]
   O 
  O  Maybe you'd like to write down formally
 OO  what you mean.
  O  O   
  O  O   sure. i suspect we're talking past each other.
O O  
  OO  O  let M be the set of universes. let A_i be a
 O O  O  sequence of finite subsets of M, such that A_i
   O  O  is a strict subset of A_(i+1). define e(i) be
  O  O O the expected complexity of a uniformly chosen
 OOO member of A_i.
 O O  O  
   O O  O  then lim i->inf e(i) = inf.

This will not give you a uniform distribution on infinitely many
things.  For simplicity, consider integers instead of universes. Assign 
something like probability P(n)=6/(pi^2 n^2) to integer n. This yields 
nonvanishing probability for infinitely many integers. But it's not 
uniform. Uniformness and nonzero limits are incompatible.
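
A two-line check of the normalization (mine, just for illustration):

  import math

  P = lambda n: 6.0 / (math.pi**2 * n**2)     # sums to 1 over n = 1, 2, 3, ...
  print(sum(P(n) for n in range(1, 10**6)))   # close to 1
  print(P(1), P(2), P(1000))                  # nonvanishing for every n, far from uniform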


 O  OO   O   Practical unpredictability due to
OO  OO   deterministic chaos and Heisenberg
  O OO O  O  etc is very different from true
  O  O O  O  unpredictability. For instance, despite of
   O O O OO  chaos and uncertainty principle my computer
   OOO O  O  probably will not disappear within the
 O  OO  O O  next hour. But in most possible futures it
 OO  O  O O  won't even see the next instant - most are
  OO O  O O  maximally random and unpredictable.
 OO  O OO O  
 OOO  O O O  yes - i think i understand what you're saying
  OO O  O O  here. a universe with high complexity is a very
 OOO O   OO  messy place indeed - computers disappear, etc.
 O OOO O OO  however, i think you'll agree, that our universe
 O OO    (unless it *is* using a pseudo-random number
 OO O    generator) is quite messy.

Not at all. It seems extremely regular. Whatever appears messy 
may be due to lack of knowledge, not to lack of regularity.

    OOO  i'm wondering if perhaps a different force is
  OO OO  keeping the complexity low. an anthropic force
  OO OO  - if complexity is too high, then life doesn't
 OOO OOO OO  evolve - and we don't see it.

According to the weak anthropic principle, the conditional probability
of finding ourselves in a universe compatible with our existence equals
1. But most futures compatible with our existence are complex.  So why 
is ours so regular? Algorithmic TOEs explain this, and add predictive 
power to the weak anthropic principle.

http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/






Re: why not high complexity?

2001-05-30 Thread juergen


 O O ??? - There is no way of assigning equal
  OO O O O   nonvanishing probability to infinitely
   O O O O   many mathematical structures, each being
 O O O   represented by a finite set of axioms.
  OO O O O   
 O   okay - strictly speaking, you are correct. but a
  OOOcommon trick is to compute equal-probabilities
 O   on finite subsets of the infinite set. and then
 O O O   you can take the limit as those subsets grow
 O O O   to the size of the infinite set.
OOO  O   
  OOOthe growing here is important - very often the
  O OO   O   order in which you add members to the set change
  O      how the series converges. but for the case of
  O OOO  expected complexity, it does not.

but in the limit uniform probabilities vanish. Maybe you'd like to
write down formally what you mean. 




Re: why not high complexity?

2001-05-30 Thread juergen



 From: Karl Stiefvater [EMAIL PROTECTED]
 Date: Mon, 28 May 2001 00:11:33 -0500
 
  O   OO OO  Max Tegmark suggests that .. all mathematical
  O O    structures are a priori given equal statistical
 OOOO O  weight and Jurgen Schmidhuber counters that
  O  O  OOO  there is no way of assigning nonvanishing
OOO Oprobability to all (infinitely many)
 OO  O   mathematical structures 

Not quite - the text in Algorithmic Theories of Everything says
... of assigning nonvanishing _equal_ probability ...

 and he then goes on
  O  (i think) to assign a weighting based upon
   OO   OOO  time-complexity.

Most of the weightings discussed in Algorithmic Theories of Everything
completely ignore time, except for one: the speed prior S derived from
the assumption that the universe-generating process is not only computable
but also optimally efficient.

Concerning time-independent weightings: Different computing devices
(traditional Turing  Machines, Enumerable Output Machines, General Turing
Machines) reflect different degrees of computability (traditional monotone
computability, enumerability, computability in the limit). This causes
various weightings. All favor short descriptions, given the device.

 O O  OOOi have to say i find Tegmark's argument more
 OO   O  persuasive - i can't see why the great
   O  O  programmer should be worried about runtime.

Even if He completely ignores runtime, He still cannot assign high
probability to irregular universes with long minimal descriptions.

 O O OO  furthermore, i feel intuitively that the
   O O   universes ought to have equal weight.

Some intuitively feel the sun revolves around the earth.

  OO   OOsuch a sort of probability can be defined, of
 O  O O  course, by taking the limit as finite subsets
  O  approach the full infinite set. as long as we
 OO OOO  get the same answer regardless of the order in
 O O O   which we grow the subset, the limit can be said
 OOO O O O   to be defined.

??? - There is no way of assigning equal nonvanishing probability to
infinitely many mathematical structures, each being represented by a
finite set of axioms.

Maybe the intention is to assign measure 2^-n to all histories of size n.
That would imply our environment will dissolve into randomness right now,
because almost all continuations of its rather regular history so far
are random. But instead the universe keeps following the nonrandom
traditional laws of physics, thus offering evidence against this measure.

 O O O   the problem is - such a view predicts that we
 O  O OO live in a universe of high Kolmogorov complexity
 OO   OO O   - not low complexity.
 O  O
 O OO O  but i don't see why this is such a surprise
  O O- living in such a universe, we ought to see
OO OO O  events occur which we cannot effectively
 O OOO  O O  predict. but that is exactly what we do see.

Practical unpredictability due to deterministic chaos and Heisenberg etc
is very different from true unpredictability. For instance, despite
chaos and the uncertainty principle my computer probably will not disappear
within the next hour. But in most possible futures it won't even see the
next instant - most are maximally random and unpredictable.  Any irregular
future, however, must have small measure, given the rather harmless
assumption of formal describability or computability in the limit.

http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Provable vs Computable

2001-05-04 Thread juergen


Which are the logically possible universes?  Max Tegmark mentioned
a somewhat vaguely defined set of ``self-consistent mathematical
structures,'' implying provability of some sort. The postings of Bruno
Marchal and George Levy and Hal Ruhl also focus on what's provable and
what's not.

Is provability really relevant?  Philosophers and physicists find
it sexy for its Goedelian limits. But what does this have to do with
the set of possible universes?

I believe the provability discussion distracts a bit from the
real issue. If we limit ourselves to universes corresponding to
traditionally provable theorems then we will miss out on many formally
and constructively describable universes that are computable in the
limit yet in a certain sense soaked with unprovable aspects.

Example: a never ending universe history h is computed by a finite
nonhalting program p. To simulate randomness and noise etc, p invokes a
short pseudorandom generator subroutine q which also never halts. The
n-th pseudorandom event of history h is based on q's  n-th output bit
q(n) which is initialized by 0 and set to 1 as soon as the n-th element
of an ordered list of all possible program prefixes halts.  Whenever q
modifies some q(n) that was already used in the previous computation of
h, p appropriately recomputes h since the n-th pseudorandom event.
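
A toy rendering of the q(n) mechanism, with made-up halting times standing
in for the ordered list of program prefixes (a real dovetailer over all
prefixes obviously does not fit into a posting):

  # Toy stand-in: instead of real program prefixes, use jobs with known
  # halting times; None means "never halts". Purely illustrative.
  halting_time = {0: 3, 1: None, 2: 7, 3: None, 4: 1}

  q = {n: 0 for n in halting_time}
  for step in range(1, 9):                  # dovetailing: one more step for everyone
      for n, t in halting_time.items():
          if t is not None and t <= step:
              q[n] = 1                      # the n-th bit flips to 1 and stays 1
      print(step, [q[n] for n in sorted(q)])
  # Each q(n) eventually stabilizes, but at any finite step an observer
  # cannot tell whether a bit still equal to 0 will flip later on.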

Such a virtual reality or universe is perfectly well-defined.  At some
point each history prefix will remain stable forever.  Even if we know p
and q, however, in general we will never know for sure whether some q(n)
that is still zero won't flip to 1 at some point, because of Goedel etc.
So this universe features lots of unprovable aspects.

But why should this lack of provability matter? It does not do any harm.

Note also that observers evolving within the universe may write
books about all kinds of unprovable things; they may also write down
inconsistent axioms; etc. All of this is computable though, since the
entire universe history is.  So again, why should provability matter?

Juergen Schmidhuber   http://www.idsia.ch/~juergen/toesv2/




Re: Computing Randomness

2001-04-10 Thread juergen

Hal, you wrote:

 I believe that attempting an extensive detailed formal description of
 the Everything is the wrong approach. IMO - at least in this case - the
 more information used to describe, the smaller the thing described.

I was not able to follow this, but informal and vague descriptions of
everything have been around for millennia.  Now it is time to go beyond
this. One cannot scientifically discuss nonformal statements or theories.

You did refer to formal describability when you repeatedly wrote that
if a universe is a theorem cascade in an N-bit formal axiomatic system
then Chaitin's incompleteness dictates that the cascade must stop,
and there are no more than 2^[N + c] theorems.  For some reason this
statement even made it into your FAQ list.

Counter example: number theory is describable by finitely many bits of
axioms yielding an infinite cascade of provable theorems.

JS




Computing Randomness

2001-03-22 Thread juergen

Where does all the randomness come from?

Many physicists would be content with a statistical theory of everything
(TOE) based on simple probabilistic physical laws allowing for stochastic
predictions such as We do not know where this particular electron will
be in the next nanosecond, but with probability 0.04 we will find it in a
certain volume V.

Any source of randomness, however, leaves us with an unsatisfactory
TOE, since randomness does not have a compact or simple explanation, by
definition. Where does the enormous information conveyed by a particular
history of random events come from? A TOE that cannot explain this
is incomplete.

The in hindsight obvious solution is an ensemble TOE which covers all
possible universe histories.  The ensemble conveys less information than
most particular histories - one main motivation of this mailing list.

Which are the possible histories?  Let us focus on well-defined ensembles
only, and ignore those that cannot be sufficiently specified to permit
reconstruction through a formally describable computer. In particular,
we may ignore uncountable ensembles such as continua, or other ensembles
including histories without finite descriptions.

Is there an optimally efficient way of computing all the randomness in
all the describable (possibly infinite) universe histories?  Yes, there
is. There exists a machine-independent ensemble-generating algorithm
called FAST that computes any history essentially as quickly as this
history's fastest algorithm. Somewhat surprisingly, FAST is not slowed
down much by the simultaneous computation of all the other histories.

It turns out, however, that for any given history there is a speed
limit which greatly depends on the history's degree of randomness.
Highly random histories are extremely hard to compute, even by the optimal
algorithm FAST. Each new bit of a truly random history requires at least
twice the time required for computing the entire previous history.
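
For concreteness, a toy sketch of the phase structure - one natural
reading of FAST is: in phase i, every program prefix p with l(p) <= i
receives 2^(i - l(p)) steps; here the "programs" are stand-in Python
generators, not real prefixes:

  from itertools import islice, count, repeat

  def fast(programs, phases=6):
      # programs: list of (length, generator_factory) pairs - toy stand-ins
      for i in range(1, phases + 1):
          for length, make in programs:
              if length <= i:
                  budget = 2 ** (i - length)          # steps granted in phase i
                  out = list(islice(make(), budget))  # restart and run
                  print(f"phase {i}: l(p)={length}, {budget} steps, output {out[:6]}")

  fast([(1, lambda: repeat(0)),                       # short program, regular output
        (3, lambda: (n % 2 for n in count()))])       # longer program, alternating bits

Short prefixes receive exponentially more steps per phase, which is why
the fastest, most regular histories appear at a constant rate of bits per
time interval while highly random ones fall behind.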

As history size grows, the speed of highly random histories (and most
histories are random indeed) vanishes very quickly, no matter which
computer we use (side note: infinite random histories would even require
uncountable time, which does not make any sense). On the other hand,
FAST keeps generating numerous nonrandom histories very quickly; the
fastest ones come out at a rate of a constant number of bits per fixed
time interval.

Now consider an observer evolving in some universe history. He does not
know in which, but as history size increases it becomes less and less
likely that he is located in one of the slowly computable, highly random
universes: after sufficient time most long histories involving him will
be fast ones.

Some consequences are discussed in
http://www.idsia.ch/~juergen/toesv2/node39.html

Juergen Schmidhuber




Re: on formally describable ...

2001-03-21 Thread juergen

Bruno Marchal explained to Jesse Mazer:
 
 Schmidhuber's solution is based on a belongness relation between
 observer and universes which is impossible to keep once we take
 comp seriously. But even if we make sense to such a relation, it
 would only eliminates third person white rabbits and not the
 first person white rabbits: remember that the great programmer
 emulates all (semi)computable universe but also all possible 
 dreams.
 
 In fact Schmidhuber assume a solution of the mind body problem
 which is just incompatible with comp. Technically that makes 
 his work incomplete (at least).

Such statements keep failing to make sense to me and others I know.  
Anybody out there who does understand what is meant?





Re: on formally describable universes and measures

2001-03-09 Thread juergen



 From [EMAIL PROTECTED]  Sat Mar  3 18:05:53 2001
 From: Saibal Mitra [EMAIL PROTECTED]
 Jürgen wrote:
 - Original Message -
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Thursday, February 22, 2001 5:32 PM
 Subject: Re: on formally describable universes and measures
 
  Saibal Mitra wrote:
 
   I think the source of the problem is equation 1 of Juergen's paper. This
   equation supposedly gives the probability that I am in a particular
   universe, but it ignores that multiple copies of me might exist in
   one universe. Let's consider a simple example. The prior probability
   of universe i (i0) is denoted as P(i), and i copies of me exist in
   universe i. In this case, Juergen computes the propability that if you
   pick a universe at random, sampled with the prior P, you pick universe
   i. This probability is, of course, P(i). Therefore Juergen never has
   to identify how many times I exist in a particular universe, and can
   ignore what consciousness actually is.  Surely an open universe where an
   infinite number of copies of me exist is infinitely more likely than a
   closed universe where I don't have any copies, assuming that the priors
   are of the same order?
 
  To respond, let me repeat the context of eq. 1 [In which universe am I?]
  Let h(y) represent a property of any possibly infinite bitstring y, say,
  h(y)=1 if y represents the history of a universe inhabited by yourself
  and h(y)=0 otherwise.  According to the weak anthropic principle, the
  conditional probability of finding yourself in a universe compatible with
  your existence equals 1.  But there may be many y's satisfying h(y)=1.
  What is the probability that y=x, where x is a particular universe
  satisfying h(x)=1? According to Bayes,
  P(x=y | h(y)=1) =
  (P(h(y)=1 | x=y) P(x = y)) / (sum_{z:h(z)=1}  P(z))
  propto P(x),
  where P(A | B) denotes the probability of A, given knowledge of B, and
  the denominator is just a normalizing constant.  So the probability of
  finding yourself in universe x is essentially determined by P(x), the
  prior probability of x.
 
  Universes without a single copy of yourself are ruled out by the weak
  anthropic principle.  But the others indeed suggest the question: what can
  we say about the distribution on the copies within a given universe U
 (maybe
  including those living in virtual realities running on various computers
 in U)?
  I believe this is the issue you raise - please correct me if I am wrong!
  (Did you really mean to write i copies in universe i?)
 
 
 I did mean to write i copies in universe i, maybe it would have been better
 to write
 n(i) copies in universe i. Anyway, according to equation 1 the probability
 of universe x
 given that n(x) > 0 is proportional to P(x), which is also intuitively
 logical. My point is
 that from the perspective of the observer, of which there are n(x) copies in
 universe x, things
 look different. Intuitively, it seems that the measure of the observer
 should be n(x)* P(x).
 E.g. suppose there exist x1 and x2 such that P(x1) = P(x2) and n(x1) >
 n(x2) > 0.
 It seems to me that the observer is more likely to find himself in universe
 x1 compared to
 universe  x2.

From an algorithmic TOE perspective the only important thing is 
that the measure is computable in the limit - a bit more below.

  Intuitively, some copies might be more likely than others. But what
  exactly does that mean? If the copies were identical in the sense no
  outsider could distinguish them, then the concept of multiple copies
  wouldn't make sense - there simply would not be any multiple copies. So
  there must be detectable differences between copies, such as those
  embodied by their different environments.
 
  So my answer would be: as soon as you have a method for identifying and
  separating various observer copies within a universe U, each
 distinguishable
  copy_i is different in the sense that it lives in a different universe
  U_i, just like you and me can be viewed as living in different universes
  because your inputs from the environment are not identical to mine.
 
  In general, the pair (U_i, copy_i) conveys more information than U by
  itself (information is needed to separate them).  The appropriate domain
  of universes x (to use the paper's notation) would be the set of all
 possible
  pairs of the form (separate universe, separate observer).
 
  Equation 1 above is perfectly applicable to this domain.
 
 
 Okay, but since I don't know which of the copies I am, the probability
 that I am one of the copies inside universe i is given as:
 Sum_{i = 1}^{n(U)} P(U_i)
 
 Is this  proportional to P(U)  or is it
 proportional to n(U) P(U) ?
 
 Saibal


To say something about some observer's future, given the past, we need a
probability distribution on the possible futures.  Does it really make
sense to speak of several different copies of some observer within a
given universe?  Not really, because the observers in general will have

Re: on formally describable universes and measures

2001-02-28 Thread juergen



 The best you can achieve is an algorithm that outputs at least the 
 computable infinite reals in the sense that it outputs their 
 finite descriptions or programs.
 
 I am not sure I understand you here.
 Are you aware that the set of descriptions of computable reals
 is not closed for the diagonalisation procedure.
 That is: you cannot generate all (and only)  descriptions of
 computable reals. The algorithm you are mentionning does not exist.
 You can only generate a superset of the set of all computable reals.
 That set (of description of all computable reals) is even
 constructively not *recursively enumerable* in the sense that,
 if you give me an algorithm generating the (description of) 
 computable real, I can transform it for building a computable 
 real not being generated by your algorithm. I guess you know that.
 
 That is why most formal constructivists consider their set of
 constructive reals as subset
 of the Turing computable reals. For exemple you can choose the 
 set of reals which are provably
 computable in some formal system (like the system F by Girard,
 in which you can formalize ..., well Hilbert space and probably 
 the whole of the *constructive* part of Tegmark mathematical ontology!
 That is very nice and big but not enough big for my purpose which
 has some necessarily non constructive feature. 
 About natural numbers and machines I am a classical
 platonist. About real numbers I have no definite opinion.

The describable reals are those computable in the limit by finite GTM
programs. There is a program that eventually outputs all finite programs
for a given GTM, by listing all possible program prefixes. Sure, in
general one cannot decide whether a given prefix in the current list
indeed is a complete program, or whether a given prefix will still grow longer
by requesting additional input bits, or whether it will even grow forever,
or whether it will really compute a real in the limit, or whether it won't
because some of its output bits will flip back and forth forever:
http://rapa.idsia.ch/~juergen/toesv2/node9.html

But what do such undecidability results really mean? Are they relevant in
any way? They do not imply that I cannot write down finite descriptions
of the describable reals - they just mean that in general I cannot know
at a given time which of the current list elements are indeed complete
descriptions of some real, and which are not. Still, after finite time
my list of symbol sequences will contain a complete description of any
given computable real.

Thus undecidable properties do not necessarily make things nonconstructive.

 Can you imagine yourself as a Platonist for a while, if only
 for the sake of the reasoning?

I am not even sure what exactly a Platonist is.

  Do I need any additional
 preliminaries to realize why I genuinely fail to understand your
 invariance lemma?
 
 Sure. The delays question for exemple. Let us follow Jesse Mazer
 idea of torture. Suppose I duplicate you and reconstitute you, not
 in Washington and Moscow but in some Paradise and some Hell.
 Would you feel more comfortable if I tell you
 I will reconstitute you in paradise tomorow and in hell only in
 3001 after C. ? Is that what you would choose?

Choose? Do I have a choice? Which is my choice?

 Computationalism is more a human right than a doctrinal truth. 

I skipped this statement and related ones...

Juergen





Re: on formally describable universes and measures

2001-02-23 Thread juergen


[EMAIL PROTECTED] to [EMAIL PROTECTED] :

 Certainly things that we can imagine even slightly, like real-valued
 observers, already have a kind of existence, in that they cause us
 to argue about them.

[EMAIL PROTECTED] to [EMAIL PROTECTED] :

 That's a bit like saying there is some truth to 1+1=3 just because we 
 can argue about it

[EMAIL PROTECTED] to GLevy:

 Many things are doubtful. 2+2=4 isn't.

[EMAIL PROTECTED] to [EMAIL PROTECTED] :

 There you go again.  But being sure isn't the same as being right.
 
 Despite the intuitively compelling nature of arithmetic as we know it,
 it is really quite arbitrary.  It is compelling only because we
 evolved in a world that provided some survival advantage to brains
 that interpreted sense experience that way, by way of major
 approximations and conflations.  But its formalizations, like the
 Peano axioms and the inference mechanism that produces theorems like
 1+1=2 really are just arbitrary system of rewriting rules.

 Its perfectly easy to construct equally pretty systems where 1+1 = 3
 or 1+1 = 1, starting with different initial strings or using different
 rewrite rules.  

When I say 1+1=2 isn't doubtful, without redefining 1,+,=,2, I
am assuming the particular traditional rewrite rules used by everybody,
not alternative systems where symbol 2 is replaced by 6, or + by
addition modulo group size.

 And you can build universes in such systems, where the
 arithmetic you find so correct never rears it misshapen head.

Algorithmic TOEs are indeed about all possible rewrite systems,
including your nontraditional ones.

Perhaps you would like to argue that our traditional rewrite system is
doubtful as it cannot prove its own consistence? But algorithmic TOEs
include even rewrite systems that are inconsistent, given a particular
interpretation imposed on innocent symbol strings. They are limited to 
all possible ways of manipulating symbols.

From any description-oriented and communication-oriented viewpoint,
however, this does not seem much of a limitation as we cannot even
describe in principle things outside the range of algorithmic TOEs.

 What's more, there are situations in our own neighborhood where
 alternate arithmetics are more natural than everyday arithmetic.  For
 instance, in a lasing medium, if you introduce one photon in a
 particular quantum state, and then add another photon in the same
 state, it is likely that you will find three photons in that state
 (then more and more - Boson statistics: the probability of a new
 recruit to a state occupied by n particles is proportional to
 n/(n+1)).  Photons in the same state are in principle
 indistinguishable from one another, so occupancy of a quantum state is
 a purer model of counting than the everyday one: when you count
 pebbles, thay remain quite distinguishable from one another, and it
 takes an arbitrary high-handed act of abstraction to say that THIS
 pebble, with its unique shape, color and scratch pattern is somehow
 the same as this other, completely different pebble.
 
 The quantum world in general, with its superpositions, entanglements
 and ephemeral virtual particles is probably poorly served by bimodal
 Aristotelian logic, never mind mathematical frameworks idealized from
 grouping pebbles.
 
 But because you are so exclusively wedded to these parochial ways of
 thinking, you feel you can just reject out of hand the existence
 (among many other things) of beings able to store, compute and
 communicate reals, even though many of their properties can be puzzled
 out.  PAH!

But you cannot even unambiguously define in a formal way (using any
describable set of axioms or rewriting rules) what your intuitive notion
of the continuum of reals really means. All provable properties of
the reals are expressible as finite symbol strings, and also have an
interpretation in a noncontinuous, countable model.

Maybe there _are_ important things beyond formal description, things we
cannot sensibly talk about. It seems wise though not to talk about them.







Re: on formally describable universes and measures

2001-02-22 Thread juergen


  From [EMAIL PROTECTED]  Sun Feb 18 01:16:16 2001
  The exchange between Bruno and Juergens is, I believe, instructive and
  constructive as it forces them to refine their positions.
  Where did I have to refine mine?
  JS
 That' right I guess. You didn't have to refine yours...I guess Dubito ergo
 cogito does not apply to you.
 George

Many things are doubtful. 2+2=4 isn't. Seriously - where did the present
discussion force a refinement of algorithmic TOEs? I do believe they leave
lots of space for refinements, but the current debate has not explored
this space; it has focused on other things instead.

Juergen




Re: on formally describable universes and measures

2001-02-22 Thread juergen




JS:
Then there is your invariance lemma: the way you quantify 1-indeterminacy
is independent of (3-)time, (3-)place, and (3-)real/virtual nature of the
reconstitution. This does not make sense, because if the (3-) probability
distribution on the possible futures and reconstitutions does depend on
time or place or other things (why not?) then 1-indeterminacy does so,
too. Under most distributions, some futures are less likely than others.
Hence there are nontrivial, distribution-dependent, probabilistic
1-predictions as well as quantifications of 1-indeterminacy that depend on
time/space/other things.

BM:
I see you genuily fail to understand the invariance lemma.  No problem. We
will come back to this until you get the TILT, (if you agree).

Bruno, I am usually skipping those of your paragraphs that contain
sentences such as "physics is a branch of machine's psychology" because
I have no idea what that is supposed to mean.  Still, I feel you do have
something nontrivial to say, I just have not been able to figure out
what exactly it is. Maybe if I knew why I genuinely fail to understand
the invariance lemma - please show me! 

JS:
Not uncomputable. Any past is finite. 
BM:
I was talking about the immediate next futur.

But any finite future is computable by a long program as well.
The problems arise with infinite futures.

JS:
I am prejudiced against claims of rigorous proof when even the 
assumptions are unclear; and against statements that are plain 
wrong, such as the UD generates all real numbers.

BM:
I am not claiming I am rigorous, except when you say I am vague
and when you ask me precisions which are not relevant.
The sentence the UD generates all real numbers is ambiguous.
Either you interpret it as
 The UD generates (enumerates) the set of all real numbers 
This does indeed contradict Cantor's theorem. 
Or you interpret it as 
 All real number are (individually) generated by the UD.
In which case, with the usual definition of generating a
real (generating all its prefixes) it is just correct. Isn't it?

No, it isn't, since generating an individual real is not equivalent to
generating all prefixes of all reals. Generating an individual real
means generating all prefixes of that individual real, AND NOTHING
ELSE. Generating a real means you somehow have to be able to identify
and completely describe that particular real. If you cannot do this
without describing lots of other things then the individual real does
not exist from any constructive perspective.

The trivial algorithm ALPHABET 
( http://rapa.idsia.ch/~juergen/toesv2/node27.html )
whose outputs are 0,1,00,01,10,11,000,... does not generate a description
of any individual infinite real, because it never creates a complete
representation thereof. Ambiguity arises because each of its outputs is
just a prefix of many infinite reals.
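For concreteness, here is a minimal Python sketch of such an enumerator
(the function name is mine; the output order follows the listing above).
It prints every finite bitstring exactly once, yet at no point has it
singled out or completed any individual infinite real.

from itertools import count, product

def alphabet():
    """Enumerate all finite bitstrings in length-increasing order:
    0, 1, 00, 01, 10, 11, 000, ...  Every finite prefix of every
    infinite binary sequence appears eventually, but no individual
    infinite sequence is ever completed."""
    for n in count(1):                        # lengths 1, 2, 3, ...
        for bits in product("01", repeat=n):  # all 2^n strings of length n
            yield "".join(bits)

# Usage: print the first few outputs.
gen = alphabet()
for _ in range(14):
    print(next(gen))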

The best you can achieve is an algorithm that outputs at least the 
computable infinite reals in the sense that it outputs their 
finite descriptions or programs.

BM:
So I ask you again: do you agree that comp entails the existence of
first person indeterminacy, as it is shown by the self-duplication
thought experience?

If it just means you don't know in advance in which possible future you'll
end up, provided there is a nontrivial distribution on the possible
futures, then this is ok (and trivial). Do I need any additional
preliminaries to realize why I genuinely fail to understand your
invariance lemma?

Juergen






Re: on formally describable universes and measures

2001-02-20 Thread juergen




This time I'll repeat only a fraction of the 500 lines in your reply:

From [EMAIL PROTECTED]:
 Suppose you survive only through a simulation of
 the big bang at the level of the quantum superstring, membrane, etc. 
 then the correct level of substitution is the level of the quantum
 superstring, membrane, etc. 
 
 Remember the definition of COMP, it says that *there exists* such a
 level. It does not say that this or that *is* the correct level.

Ok.

 It is a sort of admission of ignorance. This ignorance is 
 fundamental. Indeed it  has been shown (independently by a 
 lot of people---ref in my papers) that comp entails we cannot
 know the correct levels. 
 We can bet on it, though, and we can make reasoning
 relatively to correct bets.
 
 Normally a constructive philosopher should abandon comp right here,
 because it follows from that theorem that we cannot be machine in
 any constructive sense.

Which theorem? Send pointer to its proof. Not to its informal
description, but to its proof. 

(One reason why I doubt this: isn't the lowest possible level - embodied
by what's computable in the limit - sufficient? Why not run all universes
GTM-computable in the limit?  If one of them is ours, then the set-up
is constructive.)

 You miss the point. Even the one who has PI on his T-shirt is
 wrong if he believes PI helps him to predict the issue of the
 next self-duplication.
 Note that if the program remains as lengthy as the sequence, as it
 happens for most Schmidhubers---in the iterated self-duplication,
 these sequences are called uncomputable by Solovay, Chaitin, etc.

Not uncomputable. Any past is finite. So it is at worst random or
Martin-Loef-random or incompressible.  None of those you cite is
careless when it comes to the difference between countable and uncountable
sets. None claims one can compute a continuum by dovetailing. Dovetailing
will be forever limited to the countable realm.

 Is there a probability distribution on this set
 (if not, you cannot predict anything)? Which one?
 
 You talk really as if probability was the only manner you
 know for quantifying uncertainty.
 
 Beside probability there exist other ways to handle the 
 uncertain. The one I know
 the best is Dempster-Shafer theory. 
 (I have worked for some years with experts in that field).

 Not only I do not restrict myself to the uniform distribution, but 
 I don't share your assumption that the only way for quantifying 
 uncertainty is probability. Why not Dempster-Shafer theory of evidence ?

The various Dempster-Shafer (DS) approaches are not alternatives
to probability theory. They are extensions designed to facilitate
handling of uncertainty in contexts where lack of belief does not imply
disbelief. But DS is essentially about hulls confining sets of
traditional distributions, and thus compatible with the traditional
framework of Bayesian statistics. (DS variants that are not compatible
with it yield paradoxical results.)
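To illustrate the hull view with a toy example (the frame {a, b, c} and
the mass assignment below are invented for this sketch): belief and
plausibility computed from a DS mass assignment bracket every ordinary
probability distribution compatible with that evidence, rather than
replacing probability.

from fractions import Fraction as F

# Toy Dempster-Shafer example on the frame {a, b, c}.  Masses sit on
# subsets (focal elements), not necessarily on single outcomes.
mass = {
    frozenset("a"): F(3, 10),
    frozenset("ab"): F(4, 10),    # evidence that cannot separate a from b
    frozenset("abc"): F(3, 10),   # complete-ignorance component
}

def belief(event):
    """Mass of focal elements lying entirely inside the event."""
    e = frozenset(event)
    return sum(m for s, m in mass.items() if s <= e)

def plausibility(event):
    """Mass of focal elements overlapping the event."""
    e = frozenset(event)
    return sum(m for s, m in mass.items() if s & e)

# Every probability distribution compatible with this evidence satisfies
# belief(E) <= P(E) <= plausibility(E); the interval [Bel, Pl] is a hull
# around ordinary distributions, not a rival to them.
print(belief("a"), plausibility("a"))    # 3/10 1
print(belief("bc"), plausibility("bc"))  # 0 7/10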
 
 In the first part of my thesis:
 I am not pretending that I have solved the mind-body problem, nor
 the problem of the origin of the physical laws, nor the QCU.
 But I have rigorously proved that with comp these problems are 
 equivalent.

It is this recurring type of claim I find so irritating: rigorous 
proof without formal framework. 

 A weakness is that I am lead toward hard mathematics.

What a strange remark. The weakness of your texts is that they are
so informal.

 Which unique formalisation? Please write it down!
 How can you possibly isolate it by informal reasoning? 
 
 I was talking *there* about the modal logic G, G*, 
 S4Grz, Z1, Z1*, etc.
 These formal logics are intensional (modal) variation of the
 provability logics  of the sound  self-referentially correct 
 machine. I have provide semantics, and theorem provers.
 See explanation and technical details in my thesis and in my 
 papers.

Your thesis is in French. Your papers are informal. They always include
sentences such as: Actually such proof and clarification is one of the
main result in my thesis ... without going into details I will briefly
try to convey the main line of the argument (p 4 of paper dated sept
24, 2000).  Then follow informal examples, references to philosophers,
and unsubstantiated claims such as the UD generates all real numbers,
when it only generates all their finite prefixes, which is a fundamental
difference.

Then there is your invariance lemma: the way you quantify 1-indeterminacy
is independent of (3-)time, (3-)place, and (3-)real/virtual nature of the
reconstitution. This does not make sense, because if the (3-) probability
distribution on the possible futures and reconstitutions does depend on
time or place or other things (why not?) then 1-indeterminacy does so,
too. Under most distributions, some futures are less likely than others.
Hence there are nontrivial, distribution-dependent, probabilistic
1-predictions as well as quantifications of 1-indeterminacy that depend on
time/space/other things.

 You are also telling us 

Re: on formally describable universes and measures

2001-02-16 Thread juergen



This time I'll annotate your entire message to demonstrate how many 
things I tend to find unclear in your texts. 

 From: Marchal [EMAIL PROTECTED]
 Juergen wrote (among things):
 
 But how to answer an ill-posed question?  You promise that time and
 space will disappear at the end of the reasoning, but your question
 is about delays, and how can we speak about delays without defining
 time? Simulation time? Real time? Both? How? There is no way to continue
 without formal framework.
 
 We were doing a thought experiment. I haven't say that the delays were
 virtual. This is done much later in the reasoning. Of course, as George
 Levy says the permutation real/virtual makes no changes in the first
 person point of view, and does not change the distribution either.

To derive consequences we need to know the assumptions. Of course, this
holds for thought experiments as well. Without defining delays you cannot
derive something from delays.

 IF we accept COMP we survive with
 an artificial brain (Well: in case we were betting on a correct level
 of substitution).

What is a correct level of substitution?
Where does the betting come in? On which alternatives can we bet?
Which is the distribution on the alternatives?

 That means the doctor scan (at the right level) your brain, destroy

What is the right level?

 it, and then from the recollected information he builds a new one.
 The state of the artificial brain mirrors the state of your brain.
 You survive (with comp!).
  
 Now let us suppose the doctor keeps the information hidden in a drawer 
 during one year. 
 Real time delay, in the every day-type of life.
 After that delay he makes the reconstitution.
 I am just saying that, with comp, from the point of view of the 
 one who survives, that delay cannot be perceived. It has not influenced
 the kept information of your brain.
 From the first person point of view the delay introduced by the doctor
 has not been and cannot been directly perceived.

That seems obvious, but what exactly do you mean by perceive,
as opposed to directly perceive?  You open your eyes - things have
changed - another time, another place. What are the limits of perception here?

 (that's why I insist sometimes that reconstition booth has no windows!).

Why sometimes, why sometimes not?  Anyway, in general things will
have changed - you may need some technical equipment to detect the
changes; still, in principle you could find out that things are different,
at least in the real world. If you cannot, then why not - what are the
assumptions here?  Maybe you are talking about a virtual reality that you
can fully control? Then what is the precise set of virtual realities
you are considering? Is there a probability distribution on this set
(if not, you cannot predict anything)? Which one?

 Are you seeing my point ? It does also not change first person perception
 in case of self-multiplication.

Your point is the revival of an old science fiction theme.
But as soon as you want to derive something you need to state formal
assumptions, otherwise you'll end up with empty philosophic blabla.

 There is no way to continue without formal framework.
 
 I isolate a unique formalisation by an informal reasoning. 

Which unique formalisation? Please write it down!
How can you possibly isolate it by informal reasoning? 

 To formalise
 at this stage would automatically  put the mind-body problem 
 under the rug. 

Didn't you just say there is a unique formalisation? 
Why does formalisation suddenly put the mind-body problem under the rug?
What's the problem with the mind-body problem? Why is it incompatible
with formalisation?

 A TOE which doesn't address (at least) the mind-body
 problem is a TOS (a theory of *some* thing).

Without formal assumptions you have 
no theory of everything, no theory of something, no theory at all.

 But as I show below, those self-multiplication are easily 
 formalised (at least the third person description of those experiment
 are easily formalised). You can easily write a program which multiplied
 yourself (still betting on a correct level of course) relatively to 
 virtual environments. 

Correct level? Betting? On what - which are the alternatives?
Which is the distribution on the alternatives?

The program that multiplies observers _seems_ to go in the direction
of a formal ansatz, although it remains vague. How does the program
identify an observer, or myself? It is much easier to write a program that
copies entire computable universes together with the embedded observers,
because such a simple program does not need to identify observers
and separate them from their environment.  Please state precisely what
you really mean. Don't give another informal example, be precise.

 Are you among those who argues that talk on consciousness is a hoax ?

Not necessarily.

 How do you manage consciousness in your TOE-approach?

Algorithmic TOEs are about computable probability distributions on
universe histories

Re: Algorithmic TOEs vs Nonalgorithmic TOEs

2001-02-15 Thread juergen

 From [EMAIL PROTECTED]  Wed Feb 14 23:34:18 2001
 [EMAIL PROTECTED]:
  Yes. My point is: as long as we are not forced by evidence, why
  assume the existence of something we cannot describe or analyze in
  principle?

 In the spirit of this list, shouldn't we assume the existence (insofar
 as anything exists: i.e. it exists if it thinks it exists) of anything
 unless we are forced by the evidence to rule it out?
 Certainly things that we can imagine even slightly, like real-valued
 observers, already have a kind of existence, in that they cause us
 to argue about them.

That's a bit like saying there is some truth to 1+1=3 just because we 
can argue about it.




Re: Algorithmic TOEs vs Nonalgorithmic TOEs

2001-02-14 Thread juergen

 [EMAIL PROTECTED]:
 
 But stuff indescribable by us ought to be constructible by observers
 with real-valued input, output and memory living in real-valued
 worlds.  They might communicate identities of arbitrary real numbers
 as naturally as we refer to a points on a line, and execute analytic
 algorithms as easily as we fiddle with finite alphabets.

 Those observers may not be in reach of our analysis, but they are
 within the scope of their own.

Yes. My point is: as long as we are not forced by evidence, why assume
the existence of something we cannot describe or analyze in principle?




Re: Algorithmic TOEs vs Nonalgorithmic TOEs

2001-02-13 Thread juergen



 From [EMAIL PROTECTED]  Tue Feb 13 10:49:14 2001
 Subject: Re: Algorithmic TOEs vs Nonalgorithmic TOEs
 
 [EMAIL PROTECTED]:
  This constructive notion of formal describability is less restrictive
  than the traditional notion of computability, mainly because we do not
  insist on the existence of a halting program that computes an upper
  bound of the convergence time of p's n-th output bit. Formal
  describability thus pushes constructivism to the extreme, barely
  avoiding the nonconstructivism embodied by even less restrictive
  concepts of describability.
 
 It certainly tests the limits, since there's in principle no way of
 knowing in general at any time whether a particular bit of the answer
 on the tape is yet the final answer.

Sure. Still, each bit eventually stabilizes forever. Hence the entire
bit sequence is constructively describable, and thus a potential universe
history that we may not ignore - it might be ours.

Example 1.1 [Pseudorandom universe] Let x be an infinite sequence
of finite bitstrings x^1,x^2,... representing the history of
some discrete universe, where x^k represents the state of the
universe at discrete time step k, and x^1 the Big Bang (compare
http://www.idsia.ch/~juergen/everything/node1.html ).  Suppose there is
a finite algorithm A that computes x^{k+1} (k \geq 1)  from x^{k} and
additional information noise^k (this may require numerous computational
steps of A, that is, local time of the universe may run comparatively
slowly).  Assume that noise^k is not truly random but calculated by
invoking a finite pseudorandom generator subroutine. Then x is describable
because it has a finite constructive description.
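A minimal Python sketch of Example 1.1 (the specific update rule A and
the linear congruential noise generator are arbitrary choices made here
for concreteness): the whole history, however long, is determined by a
few lines of code plus a seed, so it has a short constructive description.

def prg(seed):
    """Toy pseudorandom noise source: a linear congruential generator."""
    state = seed
    while True:
        state = (1103515245 * state + 12345) % (2 ** 31)
        yield state & 1  # one pseudorandom bit per step

def universe(steps, width=16, seed=42):
    """Compute the history x^1, x^2, ... of a toy discrete universe.
    x^{k+1} is computed by a fixed finite rule A from x^k and noise^k."""
    noise = prg(seed)
    x = [0] * width            # x^1: the 'Big Bang' state
    history = [tuple(x)]
    for _ in range(steps - 1):
        n = next(noise)
        # Rule A: each cell becomes the XOR of itself, its right
        # neighbour, and the current noise bit (any fixed computable
        # rule would serve the same purpose).
        x = [(x[i] ^ x[(i + 1) % width] ^ n) for i in range(width)]
        history.append(tuple(x))
    return history

for state in universe(5):
    print("".join(map(str, state)))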

Example 2.1 [Pseudorandom universe based on halting problem] Let x
be a universe history in the style above.  Suppose its pseudorandom
generator's  n-th output bit PRG(n) is 1 if the n-th program of an ordered
list of all possible programs halts, and 0 otherwise. Since PRG(n) is
describable, x is too.  But there is no halting algorithm computing
PRG(n) for all n, otherwise the halting problem would be solvable,
which it is not.  Hence in general there is no computer that outputs
x and only x without ever editing some previously computed history:
http://rapa.idsia.ch/~juergen/toesv2/node7.html
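A toy sketch of the kind of limit computation this requires (the three
'programs' below are stand-ins; in the real Example 2.1 one would
enumerate all programs, and halting would be undecidable). The machine
writes a guess of 0 for every bit and later edits a bit to 1 when the
corresponding program is observed to halt: each output bit converges in
the limit, but previously written output gets revised along the way.

import itertools

# Toy stand-ins for 'the n-th program': generators that either halt
# (their iterator ends) or run forever.
def p0():            # halts after 3 steps
    yield from range(3)

def p1():            # never halts
    yield from itertools.count()

def p2():            # halts after 7 steps
    yield from range(7)

programs = [p0(), p1(), p2()]
bits = [0] * len(programs)        # current guess for PRG(0), PRG(1), PRG(2)
halted = [False] * len(programs)

# Dovetailing with revision: give every program one more step per round.
# Whenever a program is observed to halt, the corresponding output bit
# is *edited* from 0 to 1.  Each bit eventually stabilizes, but no
# halting algorithm could tell us when it has.
for round_ in range(10):
    for n, prog in enumerate(programs):
        if not halted[n]:
            try:
                next(prog)
            except StopIteration:
                halted[n] = True
                bits[n] = 1       # revise previously output bit
    print("after round", round_, "current guess for PRG(0..2):", bits)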


 I see how you could compute Omega: both create and run all Turing
 machines in a diagonalized way, and each time one halts, increase the
 approximation of the halting probability by the appropriate amount.
 But you'd never be sure of even bits near the front, because some
 running small machines might halt anytime, altering those early bits.

Sure. But each bit of Omega will eventually stabilize forever.  Omega is
actually not so bad as it is enumerable - you won't even need a GTM
for it, you can compute it on an EOM. GTMs are required, however, for
constructively describable numbers even worse than Omega. See Theorem
3.3: http://rapa.idsia.ch/~juergen/toesv2/node14.html
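A corresponding toy dovetailer for the enumerable case (again with
stand-in programs, and with toy weight 2^-(n+1) for program n in place
of 2^-l(p) for the halting programs p of a prefix machine): the lower
bound on the halting probability only ever increases, but an early bit
of the approximation may still flip when a long-running program finally
halts.

import itertools
from fractions import Fraction

def make_programs():
    def p0(): yield from range(2)           # halts
    def p1(): yield from itertools.count()  # never halts
    def p2(): yield from range(5)           # halts
    def p3(): yield from range(9)           # halts
    return [p0(), p1(), p2(), p3()]

running = dict(enumerate(make_programs()))
omega = Fraction(0)

for t in range(12):                         # dovetail: one step each per round
    for n in list(running):
        try:
            next(running[n])
        except StopIteration:               # program n halted: add its weight
            omega += Fraction(1, 2 ** (n + 1))
            del running[n]
    print(f"t={t:2d}  lower bound = {float(omega):.4f}")
# The approximation is monotonically increasing, hence enumerable; but a
# bit near the front may still flip later when some slow program halts.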

  ... Machines with real-valued registers ...
  This does not lead outside the GTM realm as long as the indefinitely
  growing register contents are finite for each finite time step.
 
 Well, I was thinking they would have operations like reciprocal (or
 maybe 1/(x+1) to limit the range) which could instantly generate
 non-terminating expansions.  More interestingly they might include
 transcendental operations to generate non-repeating expansions.
 Hey, they could even have the Omega operation, which loads
 a register with Omega, all bits finalized.
 
 That certainly beats GTMs.

Check out analytic TMs (Hotz, Vierke and Schieffer,
1995) and R-Machines (Blum, Shub, Smale, 1989):
http://rapa.idsia.ch/~juergen/toesv2/node47.html The alphabet of such TMs
is indeed real-valued instead of binary. This is beyond constructivism
though.  GTMs are at the extreme end of the constructive spectrum.
They are beaten by nonconstructive devices only.

But stuff describable by nonconstructive devices only does not even exist.
Does it?





Re: on formally describable universes and measures

2001-02-12 Thread juergen


 Resent-Date: Fri, 9 Feb 2001 06:15:47 -0800
 Subject: Re: on formally describable universes and measures
 From: Marchal [EMAIL PROTECTED]
 
 No, I do not. I suggest you first define a formal framework for
 measuring delays etc. Then we can continue.
 
 You should have told me this at the preceeding step which was
 also informal (although precise). 
 I am proposing a thought experiment which is
 a kind of reductio ad absurdo here (remember that time and
 space will disappear at the end of the reasoning).
 
 My feeling is that, for some unknow reason
 you have decided to elude the reasoning.
 
 That seems clear with your answer to Russell Standish: you 
 are saying 2+2=4 and I am saying 2+2=5! You are saying that
 I am fully wrong, but you don't tell me where.
 
 How am I suppose to take your disagrement here. You don't really
 answer the question.

But how to answer an ill-posed question?  You promise that time and
space will disappear at the end of the reasoning, but your question
is about delays, and how can we speak about delays without defining
time? Simulation time? Real time? Both? How? There is no way to continue
without formal framework.

  What does your theory predict with respect to 
 the following experience: You are scanned read and annihilate
 at Amsterdam. I reconstitute you in Washington tomorrow, and at
 Moscow in one billion years. Are your expectations different
 from the situation where the two reconstitutions are simultaneous.

Expectations with respect to what?  Moscow one billion years from
now might be different from Washington tomorrow, so there seem to be
two different possible futures. The essential question is: what is the
distribution on the possible futures? Is the distribution computable?
How does the distribution depend on your delays and other computable (?)
things? Are there just 2 possible futures? Or 10? Or infinitely many?

 If you want to be formal, let us accept classical Newtonian
 mechanics for the sake of the argument. You know that with comp
 such experience are possible *in principle*, and that is all what
 we need for the reasoning.
 
 Should we or should we not take these delays into account when
 evaluating the first-person indeterminacy? What does your
 theory say? What do you say? 

Again I fail to understand the question.  Please define delays! How
many possible delays are there? Are they computable? What exactly is this
indeterminacy? Is it something different from an ordinary distribution? 
If so, what is it?  If not, why don't you call it a distribution?

The theory first asks: what is the distribution on all possible
futures?  Maybe you say you do not know. Since it is an algorithmic
theory, it answers: ok, distribution unknown, but if it is describable
(GTM-computable in the limit), then I still can say something, namely,
I can exclude almost all infinite futures (those without finite
descriptions) with probability one. And among those I cannot exclude,
the weird ones with very long minimal descriptions are very unlikely.
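For the quantitative version of the last two sentences, here is the
standard statement for the enumerable case over finite strings x, given
only as an illustration (the GTM-based priors of the Algorithmic TOEs
paper play the analogous role in the limit-computable setting). Let m
denote the universal enumerable semimeasure, mu the unknown describable
distribution, K(x) the prefix Kolmogorov complexity:

   m(x) \geq c_mu mu(x)   for some constant c_mu > 0 and all x
   (the universal prior dominates every enumerable mu);

   m(x) = 2^{-K(x)+O(1)}
   (coding theorem: a long minimal description of x means tiny probability).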

Maybe you now say you don't buy the describability assumption.  Then the
theory cannot say anything nontrivial any more. Neither can you, though.

Juergen




Re: on formally describable universes and measures

2001-02-09 Thread juergen

 From Russell Standish Thu Feb  8 23:52:51 2001
 Guys,
   I'm getting great enjoyment out of the titanic battle between
 Juergen and Bruno over the meaning of the UD. I'm learning a lot from

Battle? The case is clear.
You cannot battle over whether 2+2 equals 4 or 5.

 the exchange, however, I must admit I do see Bruno's point of
 view. His UD does seem to generate the reals (or equivalently the set
 of all infinite binary strings) in countable time. However, I know

Even Bruno admits this is not true. Thus his NONalgorithmic arithmetic
realism.

 that infinity (like probability) is a nasty concept, that can easily
 trip you up.

It's easy: just don't confuse the countable set of all finite beginnings
of the reals with the uncountable set of all reals, which does not exist
from an algorithmic or constructive point of view.

 There other ways of approaching this - for instance a finite set of
 axioms, when enumerated into theorems will tell us all that can be
 known about the real numbers.

I cut and paste from the thread Algorithmic TOEs vs Nonalgorithmic TOEs:
Loewenheim-Skolem implies that any first order theory with an uncountable
model such as the real numbers also has a countable model.  None of
the countably many theorems concerning the real numbers depends on the
continuum, whatever that may be.  Our vague ideas of the continuum
are just that: vague ideas without formal anchor.

 I have sympathy for one point of Juergen's though - in the space of
 descriptions (which we should agree by extension of logical positivism
 is all that can be discussed), computable descriptions must have
 higher measure than noncomputable ones. However, it seems to me that a
 random oracle is an essential component of consciousness and free will
 - why this is so I can only guess - and so the anthropic principle
 constrains the interesting universe to having these. It could be that
 this random oracle is simply a consequence of 1st person indeterminism
 that arises through the duplicability assumption, as Bruno points out,
 but then why should duplicability be necessary?

There is no evidence whatsoever that we need a random oracle.




Re: on formally describable universes and measures

2001-01-30 Thread juergen

Your vague answers to questions I did not ask keep evading the issue of
continuum vs computability in the limit. I give up.  JS




Re: on formally describable universes and measures

2001-01-19 Thread juergen

On Thu Jan 18 Bruno Marchal replied:

 Pi is enumerable. Most reals are not. Most of the dummy data is much
 less likely than extraordinary data (such as Pi),
 if the dummy data probability is approximable by a computer. 
 Compare Algorithmic Theories of
 Everything: http://www.idsia.ch/~juergen/toesv2/node23.html
 
 A program which generates all the reals is shorter than a program which
 generates Pi, which is itself shorter than a program which generates
 a particular real (for most particular  reals).
 
 Perhaps you confuse program generating reals and programs 
 generating *set* of reals.

I certainly do not.

There is no program generating the uncountable set of all reals.

There is only a program generating countably many prefixes of reals.

How to distinguish those from the countably many prefixes of the 
countable rational numbers?

 Instead of giving examples, could you just provide a short proof of your
 claim that there is no computable universe to which we belong?
 
 Tell me what you don't understand in my UDA post (which is the beginning
 of the shortest proof I know). 
 UDA is at   http://www.escribe.com/science/theory/m1726.html

I did look at it and found lots of words but no formal proof,
although it does say QED at some point.

You are repeatedly talking about universes generated by a dovetailer. All
these universes obviously must be computable, otherwise the dovetailer
could not compute them. So how can you claim that there is no computable
universe to which we belong, when the very tool you are using generates
lots of universes to which we belong? It does not make sense to me - my
best guess is that you mean something quite different from what you say.

Maybe you just want to say we do not know in which of the many possible
computable futures we will end, but this is obvious and precisely the
reason why we need to look at the possible probability distributions on
possible future histories, to make nontrivial predictions, e.g.:
http://www.idsia.ch/~juergen/everything/node4.html  (1997)
http://www.idsia.ch/~juergen/toesv2/node15.html (2000)

 Let  3-you be your current computational state and 1-you your actual
 awareness.
 What happens is that 3-you belongs to an infinity of computational
 histories (generated by the UD) and the UDA shows that your expected
 futur 1-you is undetermined and that the domain of indeterminacy is 
 given by that set of computational histories.
 
 So we belongs to an infinity (a continuum) of
 infinite computational histories.

No continuum!

The infinite computational histories are countable. The continuum is not.

The concepts of dovetailing and continuum are incompatible.

The dovetailer will compute many histories featuring a Bruno or two,
but only countably many.

No continuum!

 PS I am rather buzy, so I am sorry if I am to short or if I take time
 for answering. Don't hesitate to make any remarks, though.

You are not too short as long as your legs reach the ground :-)

Juergen




Re: everything priors

1999-11-11 Thread Juergen Schmidhuber

Hi Max,

2) If so, should we really limit ourself to this particular kind
of mathematical structures? My concern is that we may be a bit too
narrow-minded if we do.

 But this sort of narrow-mindedness seems necessary to remain within the
 formally describable realm.  I'd go beyond computable structures only
 if forced by evidence, e.g., if someone shows our universe somehow won't
 run without all the real numbers. But for now there isn't any evidence
 in this vein.
Wait a sec: there's also no evidence that our particular
universe has seven spatial dimensions or a proton/electron
mass ratio different from 1836. But we're considering the whole
ensemble here.

Right. Our motivation is that the ensemble is compatible with existing
data yet simpler than the special case. But Occam's razor also motivates
us to consider just the computable universes as opposed to those based on,
say, non-computable real numbers. The ensemble of computable universes is
both compatible with known data AND simple. The ensemble of non-computable
universes is not!

Cheers,
Juergen




Re: Turing vs math

1999-11-04 Thread Juergen Schmidhuber

 Step n owns 2^(n-1) initial segments.

Bruno, why are we discussing this? Sure, in finite time you can compute
all initial segments of size n.  In countable time you can compute one
real, or a countable number of reals.  But each of your steps needs more
than twice the time required by the previous step.  Therefore you need 
more than countable time to compute all reals.

 Now, could you give me a bitstring which is not generated
 by this countably infinite process ?

Sure. The output of the program While TRUE print 1 won't
be computed in countable time by your procedure.
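More generally, the standard diagonal construction exhibits, for any
purported countable enumeration of infinite bitstrings, a string missing
from it. A minimal Python sketch (infinite bitstrings are represented as
functions from position to bit; the particular enumeration below is just
a placeholder):

def diagonal(enumeration):
    """Given enumeration(i) = the i-th infinite bitstring (as a function
    from position to bit), return a bitstring that differs from every
    enumerated one: it disagrees with the i-th string at position i."""
    return lambda n: 1 - enumeration(n)(n)

# Placeholder enumeration: string i has a 1 exactly at position i.
enum = lambda i: (lambda n: 1 if n == i else 0)

d = diagonal(enum)
print([d(n) for n in range(8)])         # the diagonal string's first bits
print([enum(3)(n) for n in range(8)])   # differs from string 3 at position 3

This is the cardinality fact behind the claim above: whatever a countable
process produces, it cannot exhaust the reals.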

Juergen




Re: Turing vs math

1999-10-27 Thread Juergen Schmidhuber

 Why assume non-computable stuff without compelling reason?
 Shaved by Occam's razor.

Jacques:

On the contrary.  Why assume the lack of *any* given type of
mathematical structure?  A true everything-hypothesis surely would not.
Occam's razor says: don't add extra distinctions such as a restriction
like that.

Note also that, as I said, computability isn't the real issue.  A
Turing machine can not be a continuous (but computable) structure.  Of
course the non-computable structures should exist too in an everything -
hypothesis.

The non-computable structures are just an ill-defined figment of our
imagination. They do not exist in the sense that we cannot formally
describe them with a finite number of bits.  Textbooks and theorems about
real numbers are computable (finite symbol strings, finite proofs), most
real numbers are not.

Occam's razor really says: do not add any bits beyond those necessary
to explain the data. Observed data does not require more than a finite
number of bits, and never will. 

Non-computability is not a restriction. It is an unnecessary extension
that greatly complicates things, so much that we cannot even talk about 
it in a formal way.

Juergen Schmidhuber  www.idsia.ch




Re: Turing vs math

1999-10-26 Thread Juergen Schmidhuber

Jacques Mallah wrote:

  I agree with
 Alistair Malcolm that Turing machines are not enough
  A continuous structure is a perfectly good
 mathematical structure, but no Turing based scheme can include it.

Why assume non-computable stuff without compelling reason?

Shaved by Occam's razor.

Juergen



Juergen Schmidhuber www.idsia.ch




Re: Turing vs math

1999-10-22 Thread Juergen Schmidhuber

Bruno:

 Honestly it is still not clear. How could S(U)=TRUE ever be computable ?
 As a computer scientist I guess you know that even the apparently simple
 question does that piece of code compute the factorial function is not
 computable.

Sure, it's not even decidable in general whether a given program will
produce any output at all. Once the output is there, though, you can
easily decide whether it consists of, say, prime numbers. In the absence of
evidence to the contrary we assume that the presence of your consciousness
(whatever that may be) is detectable by a computable process, namely, the
one employed by you, the observer, who decides that he exists, via some
sort of computation taking place as part of the evolution of his universe.
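A small Python contrast to make this concrete (the primality property
and the sample outputs are invented for illustration): checking a
property of output that is already there is straightforwardly
computable, whereas deciding whether an arbitrary program ever produces
output is not.

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def all_outputs_prime(outputs):
    """Once a finite output sequence exists, properties like 'consists
    only of primes' are decidable by direct inspection."""
    return all(is_prime(n) for n in outputs)

print(all_outputs_prime([2, 3, 5, 7, 11]))   # True
print(all_outputs_prime([2, 3, 4]))          # False
# By contrast, whether an arbitrary program produces any output at all
# is undecidable (halting problem) -- no analogous checker can exist.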

George:

 Juergen seems to be talking as if the measure of information is absolute. It
 is not. Information is always conditional information (Shannon), where the
 condition is the observer's a-priori database of information.

Sure, but in the Kolmogorov framework the role of prior knowledge vanishes
in the limit. Each prior corresponds to a particular Turing machine
TM with particular inbuilt prior knowledge. But TM2 can simulate TM1
using a compiler whose size is independent of the information content
of the universes they compute. For all U, K1(U) = K2(U) + O(1), where K1(U)
and K2(U) are U's information contents relative to TM1 and TM2. Similarly
P1(U) = O(P2(U)), where P1 and P2 are the respective probabilities under the
Solomonoff-Levin distributions of TM1 and TM2. There is a good reason why
Solomonoff-Levin distributions are called universal distributions!
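A toy Python sketch of the compiler argument (the 'machines', the
compiler prefix, and the placeholder descriptions are all invented for
illustration): if TM2 can emulate TM1 after reading a fixed compiler
prefix, then description lengths relative to the two machines differ by
at most the length of that prefix, no matter how large the described
universe is.

# Descriptions are strings; 'machine 1' accepts raw descriptions,
# 'machine 2' accepts descriptions prefixed by a fixed compiler that
# tells it how to emulate machine 1.
COMPILER_FOR_TM1 = "EMULATE-TM1:"   # fixed, finite, universe-independent

def tm1_description(universe):
    # Placeholder: pretend this returns some TM1-program for 'universe'.
    return "PRINT:" + universe

def tm2_description_via_compiler(universe):
    return COMPILER_FOR_TM1 + tm1_description(universe)

for u in ["0101", "0" * 40, "110010111010001"]:
    k1 = len(tm1_description(u))
    k2 = len(tm2_description_via_compiler(u))
    print(u[:12], "K1 =", k1, " K2 <=", k2, " difference =", k2 - k1)
# The difference is always len(COMPILER_FOR_TM1): an additive constant
# that does not grow with the information content of the universe.
# Hence K2(U) <= K1(U) + O(1), and symmetrically with the roles swapped.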

Hal:

 We then invoke the principle that large-program universes are inherently
 less likely than small-program universes, and presto! we have it more
 likely that we live in a universe without flying rabbits, without
 magic, etc.  That's the general argument we are striving to achieve.

I agree. That's precisely one of the major points of the 1997 paper.

 I do think that this argument has some problems, but it is appealing and
 if the holes can be filled it seems to offer an answer to the question.
 What do you think?

Where exactly are the holes?

Juergen




Re: Turing vs math

1999-10-21 Thread Juergen Schmidhuber

Bruno wrote:

 I don't take the notion of observer for granted.

Neither do I, of course. The observer O is something computable that
evolves in some universe U.

 The problem is that to be in a universe has no clear meaning

But it does.  There is a computable predicate S such that S(U)=TRUE  if
O inhabits U.  Fortunately, there is no need for us to further specify
what it means to ``inhabit,'' to be ``conscious'' or ``self-aware,'' or 
whether there is some other observer who applies S to U and uses S(U)
to identify O, etc.

Now we have a formally defined, conditional probability distribution
on all universes satisfying S.  I thought this to be clear, but maybe
I should have written it down explicitly.
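A minimal sketch of such a conditional distribution (my toy encoding:
universes are finite bitstrings with prior weight 2^-length, and the
predicate S merely checks for a fixed pattern standing in for "O
inhabits U"):

from fractions import Fraction
from itertools import product

def S(universe):
    """Toy computable predicate standing in for 'observer O inhabits U':
    here it simply checks for the pattern 101."""
    return "101" in universe

# Toy ensemble: all bitstrings of length 3..8, with (unnormalized) prior
# weight 2^-length -- a crude stand-in for a simplicity prior.
prior = {}
for n in range(3, 9):
    for bits in product("01", repeat=n):
        u = "".join(bits)
        prior[u] = Fraction(1, 2 ** n)

# Condition on S: keep only universes the observer inhabits, renormalize.
inhabited = {u: w for u, w in prior.items() if S(u)}
total = sum(inhabited.values())
conditional = {u: w / total for u, w in inhabited.items()}

for u in sorted(conditional, key=len)[:4]:   # a few of the simplest ones
    print(u, float(conditional[u]))

Conditioning on S and renormalizing yields a well-defined distribution
over exactly those universes satisfying S, which is all the argument
needs.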

Juergen




UTM vs math

1999-10-21 Thread Juergen Schmidhuber