Re: Predictions duplications

2001-10-29 Thread Marchal

Schmidhuber wrote:

Why care for the subset of provable sentences? 
Aren't we interested in the full set of all describable sentences?


We are interested in the true sentences: the provable ones and the
unprovable ones.


We can generate it, without caring for proofs at all.


If you mean generate all describable *sentences*, this is trivial,
but then you generate only a language.

If you mean generate all computations, it is equivalent to generating
the proofs of all \Sigma_1 sentences (= the arithmetical UD).
The provability predicate is just a universal \Sigma_1 sentence.

(I recall that a \Sigma_1 sentence is a sentence equivalent to (Ex)P(x),
with P algorithmically decidable.)


Unspeakable means beyond formal definition.


Beyond Turing nameability?
Beyond formal definition by whom? You know the universal sound
machine has no means to define its own truth predicate (Tarski). Is it
in that sense? The knowledge of p by the machine can be defined (in a first
approximation) by p & Bew('p), in which case, although thoroughly
clear at the metalevel, the knowledge *by* the machine is beyond
any formal definition *for* the machine.

Bew = Godel's provability predicate, 'p = the Godel number of p.
Bew(x) = (Ey)B(y,x), where B(y,x) means that y is the Godel number of a
proof of the sentence with Godel number x.
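
In display form (this just restates the lines above; the corner quotes
denote Godel numbering, and K is merely a label here for the
knowledge-in-first-approximation predicate):

  \mathrm{Bew}(x) \;:=\; \exists y\, B(y,x)
     \quad (B(y,x):\ y \text{ codes a proof of the sentence coded by } x)

  \Sigma_1 \text{ sentence: one equivalent to } \exists x\, P(x),\ P \text{ decidable}

  K(p) \;:=\; p \wedge \mathrm{Bew}(\ulcorner p \urcorner)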


In particular, the
way we humans communicate and agree on a common formal language is not
formally defined.


OK. Most importantly, the very meaning of finite or number cannot
be formally defined. But I show that, as far as we are sound universal
machines, our own knowledgeability predicate cannot be formally defined
by us.


You write x and I say, ok, it's an x, but that's
all based on some sort of informal agreement that involves complex
pattern recognition and learning processes. Usually we do not question
the validity of outcomes of such cognitive processes, and just go ahead
with formal derivations.


Yes, but the situation is worse for the very *meaning* of our
utterances.


But there is this major informal step *before*
formal reasoning can start, and good textbooks on logic acknowledge
this.


There is *this* one, but there is a deeper one with the *meaning*
of our formal presentations. We really bet that we share a common intuition
on the notion of finitary procedure.
Note that intuitionists and classicists diverge beyond the natural numbers.

We are not on the same wavelength, Juergen. You believe there
is a computable universe. I am agnostic about that. But if I survive
(or just experience nothing) through a digital functional substitution
made at some level L, then my experience is defined by the set of all
consistent extensions defined by all consistent histories definable
below L. With comp, realities emerge from the way consistent machines'
memories organise themselves (through the many, many histories going
through their states).

Your speed prior would be useful if it shows how it enhances
the weight of normal stories in the limit of first-person experiences
(due to unawareness of the reconstitution delays).
Note that quantum quantization of classical theories provides such
an explanation. I show that comp needs to extract that quantization from
a measure on the consistent extensions. I show that the probability one
obeys the main axioms of quantum logic.

You ask me often why I give so much importance to the notion
of provability. The reason is that in some sense I just interview
scientific machines, as they can appear in the GP's work, about what
they are able to prove about their own consistent extensions, and how
and why they can arrive at probability considerations. And provability
is nontrivial enough to provide clues on the minimal necessary
logical structures on that set of consistent extensions.


Bruno




Re: Predictions duplications

2001-10-29 Thread Juergen Schmidhuber

   From: Russell Standish [EMAIL PROTECTED]
   The only reason for not accepting the simplest thing is if it can be
   shown to be logically inconsistent. This far, you have shown no such
   thing, but rather demonstrated an enormous confusion between measure
   and probability distribution.
 
  Russell, this is really too much; please read any intro to measure
  theory,
  e.g., the one by Halmos.  Measures in general are _not_ probability
  distributions.  They are more than that; you can view them as _sets_ of
  probability distributions on sets that are related to each other in a
  specific way. Probability density functions define measures and
  therefore
  in general aren't probability distributions either. The uniform PDF on
  [0..1] yields the perfect trivial example; that's exactly why I used it
  before.  In the computability context, compare Li/Vitanyi 1997, chapters
  4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
  (continuous sample space with (semi)measure M(x)).

 This is an excellent reference, thank you. Example 4.2.1 gives the
 construction of the uniform measure over the set of strings, precisely
 what you claim didn't exist when we started this discussion.

You just want to annoy me, don't you?  For the last time: There is a
uniform *measure* on the strings, but no uniform probability
distribution.

   1) Halting theorem implies the set of halting programs is of measure
   zero within the set of all programs
 
  You mean, given a uniform measure? But why should you need the halting
  theorem for this trivial statement?  In fact, what exactly do you mean
  by
  halting theorem?  (Schoenfield's book provides a good intro to
  mathematical
  logic and computability theory.)
 

 The halting theorem is that there is no algorithm for predicting
 whether any program will halt. Of course, the result I was referring
 to is that the set of compressible strings is of measure zero. The set
 of halting programs actually has finite measure, since \Omega > 0.
 However, there is still finite probability of an input string
 being nonhalting, and indeed the probability seems to be greater than
 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
 presents a problem for computationalism.

This is mixing up everything, in particular, different measures.
1) Infinite strings with finite programs have *nonzero* measure, given
the *universal* measure (Omega is a number derived from the universal
measure).  2) They have measure zero, given the *uniform* measure (Omega
has nothing to do with it).  3) To see 2) we do not need the halting
theorem - we just compare the finite programs (countably many) to all
strings (uncountably many) and we are done. 4) The halting theorem is
no problem whatsoever when it comes to computable universes.  Why care
whether some computable universe stops or not?  If it does, it does,
otherwise it does not. In both cases we can compute it in the limit.
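
For what it is worth, here is a toy sketch in Python of what "compute it
in the limit" amounts to. The two "programs" are made-up examples of mine
(this is only an illustration of the phrase, not a universal machine):
dovetail, giving every program more and more steps and emitting whatever
output has appeared so far, without ever deciding halting.

# A minimal dovetailer sketch. One toy program halts, one runs forever;
# the dovetailer treats them exactly alike.

def prog_halting():              # emits three bits, then halts
    yield 1; yield 0; yield 1

def prog_nonhalting():           # never halts: emits 0 forever
    while True:
        yield 0

programs = {"halting": prog_halting(), "nonhalting": prog_nonhalting()}
outputs = {name: [] for name in programs}
finished = set()

for phase in range(1, 4):                    # phase k: give every program k more steps
    for name, gen in programs.items():
        if name in finished:
            continue
        for _ in range(phase):
            try:
                outputs[name].append(next(gen))
            except StopIteration:            # the program halted; that is fine too
                finished.add(name)
                break
    print("after phase", phase, outputs)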

   The GP program (aka Universal Dovetailer) is not the simplest
   thing. The simplest describable thing is the set of all descriptions,
   that simply exist without being generated. That has precisely zero
   information or complexity, whereas the UD has some complexity of the
   order of a few 100 bits. The simplest prior is the uniform one, ie
   there is no bias whatsoever in the selection.
 
  Exist without being generated?  Precisely zero information or
  complexity? But with respect to what?

 With respect to the observer. With the respect to anything in
 fact. The set of all possible descriptions has precisely zero
 information, by definition, regardless of what observer or context you
 employ.

This is a vague informal statement, not a definition. Which are the
possible descriptions? What's the formal difference between a
description
and a non-description?  What is your "anything" if there is no device
for describing it formally? If "anything" does not convey any bit of
info, then what exactly is it that conveys one bit? Or two?

  Any traditional definition of
  information/simplicity requires the specification of a formal
  description
  method, such as a Turing machine. You don't need one? Then how do you
  define information or complexity? How exactly do you define "simple"?

 Actually, all it needs is an equivalencing procedure. See my "On
 Complexity and Emergence" paper. A Turing machine will do the job for
 you, but so will a neural network or an expert system following
 heuristic rules.

So you have to write down formally this equivalencing procedure, right?
Why should this be different in principle from writing down the formal
rules for a Turing machine? Why is a simplicity measure that depends on
your equivalencing procedure superior to a simplicity measure depending
on a Turing machine?  (The TM procedure includes yours - because it
includes all procedures - but not the other way round.)




Re: Predictions duplications

2001-10-29 Thread Russell Standish

Juergen Schmidhuber wrote:
 
From: Russell Standish [EMAIL PROTECTED]
The only reason for not accepting the simplest thing is if it can be
shown to be logically inconsistent. This far, you have shown no such
thing, but rather demonstrated an enormous confusion between measure
and probability distribution.
  
   Russell, this is really too much; please read any intro to measure
   theory,
   e.g., the one by Halmos.  Measures in general are _not_ probability
   distributions.  They are more than that; you can view them as _sets_ of
   probability distributions on sets that are related to each other in a
   specific way. Probability density functions define measures and
   therefore
   in general aren't probability distributions either. The uniform PDF on
   [0..1] yields the perfect trivial example; that's exactly why I used it
   before.  In the computability context, compare Li/Vitanyi 1997, chapters
   4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
   (continuous sample space with (semi)measure M(x)).
 
  This is an excellent reference, thank you. Example 4.2.1 gives the
  construction of the uniform measure over the set of strings, precisely
  what you claim didn't exist when we started this discussion.
 
 You just want to annoy me, don't you?  For the last time: There is a
 uniform *measure* on the strings, but no uniform probability
 distribution.

Who is annoying whom? I never claimed a uniform probability density (I
actually looked up the difference between probability density and
probability distribution yesterday in Li and Vitanyi) - only a uniform
measure. In fact, my first comment to you was "Why do you insist on it
being a PDF?" Remember?

The simplest prior is the uniform measure on infinite strings. 

 
1) Halting theorem implies the set of halting programs is of measure
zero within the set of all programs
  
   You mean, given a uniform measure? But why should you need the halting
   theorem for this trivial statement?  In fact, what exactly do you mean
   by
   halting theorem?  (Schoenfield's book provides a good intro to
   mathematical
   logic and computability theory.)
  
 
  The halting theorem is that there is no algorithm for predicting
  whether any program will halt. Of course, the result I was referring
  to is that the set of compressible strings is of measure zero. The set
  of halting programs actually has finite measure, since \Omega > 0.
  However, there is still finite probability of an input string
  being nonhalting, and indeed the probability seems to be greater than
  0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
  presents a problem for computationalism.
 
 This is mixing up everything, in particular, different measures.
 1) Infinite strings with finite programs have *nonzero* measure, given
 the *universal* measure (Omega is a number derived from the universal
 measure).  2) They have measure zero, given the *uniform* measure (Omega
 has nothing to do with it). 

The uniform measure and the universal measure are related. The uniform
prior is on the set of input strings to a UTM; the universal measure
is the induced measure on the output strings.

No. 1) here is the relevant statement, not 2).

 3) To see 2) we do not need the halting
 theorem - we just compare the finite programs (countably many) to all
 strings (uncountably many) and we are done. 4) The halting theorem is
 no problem whatsoever when it comes to computable universes.  Why care
 whether some computable universe stops or not?  If it does, it does,
 otherwise it does not. In both cases we can compute it in the limit.
 
The GP program (aka Universal Dovetailer) is not the simplest
thing. The simplest describable thing is the set of all descriptions,
that simply exist without being generated. That has precisely zero
information or complexity, whereas the UD has some complexity of the
order of a few 100 bits. The simplest prior is the uniform one, ie
there is no bias whatsoever in the selection.
  
   Exist without being generated?  Precisely zero information or
   complexity? But with respect to what?
 
  With respect to the observer. With the respect to anything in
  fact. The set of all possible descriptions has precisely zero
  information, by definition, regardless of what observer or context you
  employ.
 
 This is a vague informal statement, not a definition. Which are the
 possible descriptions? What's the formal difference between a
 description
 and a non-description?  What is your "anything" if there is no device
 for describing it formally? If "anything" does not convey any bit of
 info, then what exactly is it that conveys one bit? Or two?
 

Choose an alphabet (a finite set of symbols). The set of all
descriptions is the set of all infinite length strings constructed
from that alphabet. I would suggest the notation A*, where A is the
alphabet, but I think this usually refers to the set of finite length

Re: Predictions duplications

2001-10-28 Thread Russell Standish

Juergen Schmidhuber wrote:
  From: Russell Standish [EMAIL PROTECTED]
  The only reason for not accepting the simplest thing is if it can be
  shown to be logically inconsistent. This far, you have shown no such
  thing, but rather demonstrated an enormous confusion between measure
  and probability distribution.
 
 Russell, this is really too much; please read any intro to measure
 theory,
 e.g., the one by Halmos.  Measures in general are _not_ probability
 distributions.  They are more than that; you can view them as _sets_ of
 probability distributions on sets that are related to each other in a
 specific way. Probability density functions define measures and
 therefore
 in general aren't probability distributions either. The uniform PDF on
 [0..1] yields the perfect trivial example; that's exactly why I used it
  before.  In the computability context, compare Li/Vitanyi 1997, chapters
 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
 (continuous sample space with (semi)measure M(x)).

This is an excellent reference, thank you. Example 4.2.1 gives the
construction of the uniform measure over the set of strings, precisely
what you claim didn't exist when we started this discussion.


 
  1) Halting theorem implies the set of halting programs is of measure
  zero within the set of all programs
 
 You mean, given a uniform measure? But why should you need the halting
 theorem for this trivial statement?  In fact, what exactly do you mean
 by
 halting theorem?  (Schoenfield's book provides a good intro to
 mathematical
 logic and computability theory.)
 

The halting theorem is that there is no algorithm for predicting
whether any program will halt. Of course, the result I was referring
to is that the set of compressible strings is of measure zero. The set
of halting programs actually has finite measure, since \Omega > 0.
However, there is still finite probability of an input string
being nonhalting, and indeed the probability seems to be greater than
0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
presents a problem for computationalism.

  The GP program (aka Universal Dovetailer) is not the simplest
  thing. The simplest describable thing is the set of all descriptions,
  that simply exist without being generated. That has precisely zero
  information or complexity, whereas the UD has some complexity of the
  order of a few 100 bits. The simplest prior is the uniform one, ie
  there is no bias whatsoever in the selection.
 
 Exist without being generated?  Precisely zero information or
 complexity? But with respect to what? 

With respect to the observer. With the respect to anything in
fact. The set of all possible descriptions has precisely zero
information, by definition, regardless of what observer or context you
employ.

 Any traditional definition of
 information/simplicity requires the specification of a formal
 description
 method, such as a Turing machine. You don't need one? Then how do you
 define information or complexity? How exactly do you define "simple"?
 

Actually, all it needs is an equivalencing procedure. See my "On
Complexity and Emergence" paper. A Turing machine will do the job for
you, but so will a neural network or an expert system following
heuristic rules. 

Perhaps the simplest formal approach is to say that the equivalencing procedure
is a function from the set of descriptions onto the integers
(labelling the equivalence classes). If that function is partial
recursive, then the equivalencing mechanism can be replaced by the
appropriate Turing machine. Computationalism is the creed that this
function must be partial recursive, but it is not necessary for the
definition of information or complexity.
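
To make that concrete, here is a small Python sketch of one possible
(deliberately crude) equivalencing procedure that I just made up for
illustration: it maps every length-n prefix of a description to an integer
label (its number of 1s), and reads off a "complexity" for each class as
the negative log of the class's share of the 2^n prefixes. It is not the
definition used in the "On Complexity and Emergence" paper, only an
instance of a computable function from (prefixes of) descriptions onto
integer labels.

from itertools import product
from math import log2
from collections import Counter

def equiv_label(prefix):
    # Toy equivalencing function (my own choice): two descriptions count as
    # "equivalent" when their first n bits contain the same number of 1s.
    return sum(prefix)

n = 8
counts = Counter(equiv_label(p) for p in product((0, 1), repeat=n))

for label, size in sorted(counts.items()):
    share = size / 2 ** n      # measure of the class under the uniform prior on prefixes
    print("class", label, "share %.4f" % share, "-> %.2f bits" % (-log2(share)))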

I can see some subtle problems occurring if this equivalencing
procedure is stochastic in nature; however, by summing over the
appropriate distributions, one should still end up with meaningful
results provided the variance is not too large.


Cheers


Dr. Russell Standish Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052 Fax   9385 6965, 0425 253119 ()
Australia                            [EMAIL PROTECTED]
Room 2075, Red Centre                http://parallel.hpc.unsw.edu.au/rks
International prefix  +612, Interstate prefix 02





Re: Predictions duplications

2001-10-26 Thread Juergen Schmidhuber

 From: Juho Pennanen [EMAIL PROTECTED]
 So there may be no 'uniform probability distribution' on the set of all
 strings, but there is the natural probability measure, that is in many
 cases exactly as useful.

Sure, I agree, measures are useful; I'm using them all the time. But in
general they are _not_ the same thing as probability distributions.

 From: Russell Standish [EMAIL PROTECTED]
 The only reason for not accepting the simplest thing is if it can be
 shown to be logically inconsistent. This far, you have shown no such
 thing, but rather demonstrated an enormous confusion between measure
 and probability distribution.

Russell, this is really too much; please read any intro to measure
theory,
e.g., the one by Halmos.  Measures in general are _not_ probability
distributions.  They are more than that; you can view them as _sets_ of
probability distributions on sets that are related to each other in a
specific way. Probability density functions define measures and
therefore
in general aren't probability distributions either. The uniform PDF on
[0..1] yields the perfect trivial example; that's exactly why I used it
before.  In the computability context, compare Li/Vitanyi 1997, chapters
4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
(continuous sample space with (semi)measure M(x)).

 1) Halting theorem implies the set of halting programs is of measure
 zero within the set of all programs

You mean, given a uniform measure? But why should you need the halting
theorem for this trivial statement?  In fact, what exactly do you mean
by
halting theorem?  (Schoenfield's book provides a good intro to
mathematical
logic and computability theory.)

 The GP program (aka Universal Dovetailer) is not the simplest
 thing. The simplest describable thing is the set of all descriptions,
 that simply exist without being generated. That has precisely zero
 information or complexity, whereas the UD has some complexity of the
 order of a few 100 bits. The simplest prior is the uniform one, ie
 there is no bias whatsoever in the selection.

Exist without being generated?  Precisely zero information or
complexity? But with respect to what?  Any traditional definition of
information/simplicity requires the specification of a formal
description
method, such as a Turing machine. You don't need one? Then how do you
define information or complexity? How exactly do you define "simple"?




Re: Predictions duplications

2001-10-26 Thread Juergen Schmidhuber

Schmidhuber:

It's the simplest thing, given this use of mathematical
language we have agreed upon. But here the power of the
formal approach ends - unspeakable things remain unspoken.

Marchal:

I disagree. I would even say that it is here that the serious formal
approach begins. Take unprovable for unspeakable: we can
meta-reason (informally or formally) and study the logical structure
of the set of sentences unprovable by sound universal machines.

Schmidhuber:

Unprovable does not mean unspeakable! 
You can spell out many unprovable things.

Why care for the subset of provable sentences? 
Aren't we interested in the full set of all describable sentences? 
We can generate it, without caring for proofs at all.

Unspeakable means beyond formal definition. In particular, the
way we humans communicate and agree on a common formal language is not
formally defined. You write x and I say, ok, it's an x, but that's
all based on some sort of informal agreement that involves complex
pattern recognition and learning processes. Usually we do not question
the validity of outcomes of such cognitive processes, and just go ahead
with formal derivations.  But there is this major informal step *before*
formal reasoning can start, and good textbooks on logic acknowledge
this.




Re: Predictions duplications

2001-10-25 Thread Juho Pennanen



juergen wrote:

 Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
 uniform probability distribution. Why were measures invented in the first
 place? To deal with infinite sets.  You cannot have a uniform probability
 distribution on infinitely many things. 


The last sentence is trivially true when 'probability distributions' are 
defined as probability measures on discrete sets. But someone could also 
think this is just irrelevant word-play on definitions.

There are uniform probability (density) functions on infinite sets, e.g. 
the uniform distribution on the interval [0,1], which gives the measure 
(probability) t for each interval [x,x+t]. Obviously, this distribution 
gives measure 0 for any singleton containing only one real number. Still 
it's uniform, though not a 'probability distribution' if one wishes to 
use the most restricted definition.

Similarly, there is the natural probability measure on the set of all 
infinite strings of 0's and 1's. It is 'uniform' in the sense that no 
string or place in a string is in a privileged position. To define it,
set n_0 as the set of all strings that have a 0 at the n'th place and 
n_1 as the set of all strings that have a 1 at the n'th place, for any 
n. Then set m(n_0)=m(n_1)=1/2, for all n. Using the standard definition 
of measure that has been cited on this list (Kolmogorov axioms), m has a 
unique extension which is a measure on the set of all strings.
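
A two-line check, in Python, of the additivity this measure has on cylinder
sets (the sets of all strings sharing a given finite prefix); the name
cylinder_measure is mine. Each prefix of length n gets measure 2^-n,
m(x) = m(x0) + m(x1), and the 2^n disjoint length-n cylinders sum to 1.

from itertools import product

def cylinder_measure(prefix):
    # Measure of the set of all infinite 0/1 strings that start with `prefix`,
    # under the coin-flipping measure described above.
    return 2.0 ** -len(prefix)

x = (1, 0, 1)
assert cylinder_measure(x) == cylinder_measure(x + (0,)) + cylinder_measure(x + (1,))

n = 10
print(sum(cylinder_measure(p) for p in product((0, 1), repeat=n)))   # 1.0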

So there may be no 'uniform probability distribution' on the set of all 
strings, but there is the natural probability measure, that is in many 
cases exactly as useful.

Juho




Re: Predictions duplications

2001-10-25 Thread juergen

 From: Russell Standish [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 
 I think we got into this mess debating whether an infinite set could
 support a uniform measure. I believe I have demonstrated this. 
 I've yet to see anything that disabuses me of the notion that a
 probability distribution is simply a measure that has been normalised
 to 1. Not all measures are even normalisable.

Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
uniform probability distribution. Why were measures invented in the first
place? To deal with infinite sets.  You cannot have a uniform probability
distribution on infinitely many things. That's why measures isolate just
finitely many things, say, every bitstring of size n, and for each x of
size n look at the infinite set of strings starting with x. A uniform
measure assigns equal probability to each such set. Of course, then you
have a uniform probability distribution on those finitely many things
which are sets. But that's not a uniform probability distribution on
infinitely many things, e.g., on the bitstrings themselves!  The measure
above is _not_ a probability distribution; it is an infinite _set_ of
_finite_ probability distributions, one for string size 0, one for string
size 1, one for string size 2,...
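
For completeness, the textbook one-liner behind "no uniform probability
distribution on infinitely many things", say on the countably many finite
strings or programs: if each of them got the same probability p, then

  \sum_{i=1}^{\infty} p \;=\; \begin{cases} 0 & \text{if } p = 0,\\ \infty & \text{if } p > 0, \end{cases}

so no choice of p makes the total equal to 1. (On an uncountable set such
as the infinite strings the conclusion is the same, since at most countably
many points can carry positive probability.)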

 I realise that the Halting theorem gives problems for believers of
 computationalism. 

It does not. Why should it?

 I never subscribed to computationalism at any time,
 but at this stage do not reject it. I could conceive of us living in
 a stupendous virtual reality system, which is in effect what your GP
 religion Mark II is. However, as pointed out by others, it does suffer
 from turtle-itis, and should not be considered the null
 hypothesis. It requires evidence for belief.

By turtle-itis you mean: in which universe do the GP and his computer
reside?  Or the higher-level GP2 which programmed GP? And so on? But
we cannot get rid of this type of circularity - computability and
mathematical logic are simply taken as given things, without any
further explanation, like a religion. The computable multiverse, or
the set of logically possible or mathematically possible or computable
universes, represents the simplest explanation we can write down formally.
But what exactly does it mean to accept something as a formal statement?
What does it mean to identify the messy retina patterns caused by this
text with abstract mathematical symbols such as x and y?  All formal
explanations in our formal papers assume that we agree on how to read
them. But reading and understanding papers is a complex physical and
cognitive process. So all our formal explanations are relative to this
given process which we usually do not even question. Essentially, the
GP program is the simplest thing we can write down, relative to the
unspoken assumption that it is clear what it means to write something
down, and how to process it. It's the simplest thing, given this use
of mathematical language we have agreed upon. But here the power of the
formal approach ends - unspeakable things remain unspoken.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Predictions duplications

2001-10-25 Thread Russell Standish

[EMAIL PROTECTED] wrote:
 
  From: Russell Standish [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  
  I think we got into this mess debating whether an infinite set could
  support a uniform measure. I believe I have demonstrated this. 
  I've yet to see anything that disabuses me of the notion that a
  probability distribution is simply a measure that has been normalised
  to 1. Not all measures are even normalisable.
 
 Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
 uniform probability distribution. Why were measures invented in the first
 place? To deal with infinite sets.  You cannot have a uniform probability
 distribution on infinitely many things. That's why measures isolate just
 finitely many things, say, every bitstring of size n, and for each x of
 size n look at the infinite set of strings starting with x. A uniform
 measure assigns equal probability to each such set. Of course, then you
 have a uniform probability distribution on those finitely many things
 which are sets. But that's not a uniform probability distribution on
 infinitely many things, e.g., on the bitstrings themselves!  The measure
 above is _not_ a probability distribution; it is an infinite _set_ of
 _finite_ probability distributions, one for string size 0, one for string
 size 1, one for string size 2,...
 

Good grief, you've missed the point entirely! Measures do measure
infinite sets - consider the set [0,1] - it is an infinite set, and
the uniform probability distribution and uniform measure are the same
thing. We can also consider the set [0,\infty) which has a uniform
measure on it (the same constant probability distribution as before,
but extended to the full set). Clearly it cannot have a uniform
probability distribution, as the set has infinite measure. You seem
fixated on the peculiar definition of measure on strings that you gave
before, rather than the more general notion.


  I realise that the Halting theorem gives problems for believers of
  computationalism. 
 
 It does not. Why should it?

Restating what I said before simply:

1) Halting theorem implies the set of halting programs is of measure
zero within the set of all programs

2) Computationalism is the assumption that the universe's observer is
a universal Turing machine.

3) Uniform measure over the Plenitude of descriptions implies that
observer will never see simple strings.

 
  I never subscribed to computationalism at any time,
  but at this stage do not reject it. I could conceive of us living in
  a stupendous virtual reality system, which is in effect what your GP
  religion Mark II is. However, as pointed out by others, it does suffer
  from turtle-itis, and should not be considered the null
  hypothesis. It requires evidence for belief.
 
 By turtle-itis you mean: in which universe do the GP and his computer
 reside?  Or the higher-level GP2 which programmed GP? And so on? But
 we cannot get rid of this type of circularity - computability and
 mathematical logic are simply taken as given things, without any
 further explanation, like a religion. The computable multiverse, or
 the set of logically possible or mathematically possible or computable
 universes, represents the simplest explanation we can write down formally.
 But what exactly does it mean to accept something as a formal statement?
 What does it mean to identify the messy retina patterns caused by this
 text with abstract mathematical symbols such as x and y?  All formal
 explanations in our formal papers assume that we agree on how to read
 them. But reading and understanding papers is a complex physical and
 cognitive process. So all our formal explanations are relative to this
 given process which we usually do not even question. Essentially, the
 GP program is the simplest thing we can write down, relative to the
 unspoken assumption that it is clear what it means to write something
 down, and how to process it. It's the simplest thing, given this use
 of mathematical language we have agreed upon. But here the power of the
 formal approach ends - unspeakable things remain unspoken.
 

The GP program (aka Universal Dovetailer) is not the simplest
thing. The simplest describable thing is the set of all descriptions,
that simply exist without being generated. That has precisely zero
information or complexity, whereas the UD has some complexity of the
order of a few 100 bits. The simplest prior is the uniform one, ie
there is no bias whatsoever in the selection.

The only reason for not accepting the simplest thing is if it can be
shown to be logically inconsistent. This far, you have shown no such
thing, but rather demonstrated an enormous confusion between measure
and probability distribution.

The set of all descriptions and the uniform measure do not suffer from
turtle-itis. However, they are susceptible to criticism of the
Church-Turing thesis - namely why should our idea of information in
terms of strings of symbols from a discrete alphabet have 

Re: Predictions duplications

2001-10-23 Thread juergen

 From: [EMAIL PROTECTED]:
 [EMAIL PROTECTED] wrote:
   From [EMAIL PROTECTED]:
   [EMAIL PROTECTED] wrote:
M measure:
M(empty string)=1
M(x) = M(x0)+M(x1) nonnegative for all finite x.
   
   This sounds more like a probability distribution than a measure. In
   the set of all descriptions, we only consider infinite length
   bitstrings. Finite length bitstrings are not members. However, we can
   identify finite length bitstrings with subsets of descriptions. The
   empty string corresponds to the full set of all descriptions, so the
   first line M(empty string)=1 implies that the measure is normalisable
   (ie a probability distribution).
  
  Please check out definitions of measure and distribution! 
  Normalisability is not the critical issue.
  
  Clearly: Sum_x M(x) is infinite. So M is not a probability
  distribution. M(x) is just measure of all strings starting with x:
  M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = 
 
 For a measure to be normalisable, the sum over a *disjoint* set of
 subsets must be finite. If the set of subsets is not disjoint (ie the
 intersections are not empty) then the sum may well be infinite.
 Bringing this to the case of finite strings. Each finite string is
 actually a subset of the set of infinite strings, each containing the
 same finite prefix. So the string 101010 is actually a subset of 10101
 and so on. The Sum_x M(x), where I assume x ranges over all strings
 will of course be infinite. However, since the set of finite strings
 is not disjoint, this doesn't imply M(x) is unnormalisable.
 Now when you realise that every finite string x is a subset of the empty
 string, it becomes clear that M(x) is normalised to precisely 1.

The point is: prob dists and measures are different things. There is a
good reason for giving them different names. Prob dists assign numbers
to individual objects, not to sets.  Compare the traditional definitions,
as well as those for semimeasures, in http://www.idsia.ch/~juergen/toesv2/
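
A quick numerical illustration of the two sums being discussed, taking the
uniform case M(x) = 2^-|x| (my choice for the sketch): summed over *all*
prefixes up to length n the values grow without bound, while summed over
the *disjoint* prefixes of one fixed length they always give 1.

from itertools import product

def M(x):
    # Uniform measure of the set of infinite strings starting with prefix x.
    return 2.0 ** -len(x)

for n in range(1, 6):
    every_prefix = [p for k in range(n + 1) for p in product("01", repeat=k)]
    one_length = list(product("01", repeat=n))
    print(n,
          sum(M(p) for p in every_prefix),   # n+1: not a distribution over prefixes
          sum(M(p) for p in one_length))     # 1.0: a distribution over length-n cylinders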

  Neglecting finite universes means loss of generality though.
  Hence measures mu(x) in the ATOE paper do not neglect finite x: 
  
  mu(empty string)=1
  mu(x) = P(x)+mu(x0)+mu(x1)  (all nonnegative).
  
  And here P is a probability distribution indeed! 
  P(x)>0 possible only for x with finite description.
  
 
 The P(x)>0 case above actually breaks the countably subadditive
 property, so mu(x) cannot be called a measure... I'm not entirely sure
 what you're getting at here.

The set of computable universes includes the finite ones which we may 
not ignore. How do they contribute to the measure of all universes?
For convenience introduce a third symbol $ besides 0 and 1. Now what is 
the measure of the set of all universes starting with 00110? It's the sum of
the measure of the set of universes starting with 001101 and
the measure of the set of universes starting with 001100 and
the measure of the single universe 00110$ that stops right after 00110. 
Once there is a $ symbol there won't be another 0 or 1.  
So mu(x) = P(x)+mu(x0)+mu(x1).  

Our favorite example is the universal Turing machine with random inputs.
For finite x: P(x) is the probability that the TM output is x and nothing
but x. mu(x) is something different, namely, the so-called universal measure 
of all strings _starting_ with x.  Again mu(x) = P(x)+mu(x0)+mu(x1).  
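
A toy numerical check of that recursion (Python; the generator is an
artificial stand-in of my own, not a universal TM): suppose at every step
the machine emits 0, 1, or $ with probability 1/3 each. Then
mu(x) = (1/3)^|x| is the probability that the output starts with x,
P(x) = (1/3)^(|x|+1) is the probability that the output is exactly x
followed by $, and mu(x) = P(x) + mu(x0) + mu(x1) holds.

def mu(x):
    # Probability that the toy generator's output starts with the string x.
    return (1.0 / 3.0) ** len(x)

def P(x):
    # Probability that the output is exactly x, followed by the stop symbol $.
    return (1.0 / 3.0) ** (len(x) + 1)

x = "00110"
lhs = mu(x)
rhs = P(x) + mu(x + "0") + mu(x + "1")
print(lhs, rhs, abs(lhs - rhs) < 1e-15)      # the recursion checks out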

Juergen Schmidhuber          http://www.idsia.ch/~juergen/




Re: Predictions duplications

2001-10-22 Thread Alastair Malcolm

Juergen wrote (on 12th Oct):

 . . . In most possible futures your computer will
 vanish within the next second. But it does not. This indicates that our
 future is _not_ sampled from a uniform prior.

I don't wish to comment directly on the computer-vanishing problem as it
applies to Juergen's scheme (my own problem with this laudable scheme is
that it appears to be vulnerable to the same 'turtle-itis' criticism as for
all theistic religions - the (literal or abstract GP) 'turtle' responsible
for the world seems to need another turtle to support it and so on - there
is no full explanation), but I would like to say that certain other proposed
solutions don't suffer from this computer-vanishing problem (also known as
the WR/dragon problem), if one thinks of infinite length bit strings /
formal system descriptions via Limit n -> infinity, where n is the relevant
string/description length (see appendix below). It seems to me that in
thinking in simple infinity terms one can lose essential information (for
example integers cannot be determined to be more numerous than the odd
numbers - both are of the lowest order of infinity - not a problem for limit
analysis).

Alastair Malcolm

APPENDIX

One might naively think that as there are at least hundreds of possible
states (call them V1, V2... ) where some part of our computer clearly
vanishes (and only one (N), where normality prevails), then even if one
considers bit string or other types of formal description involving other
variations in our universe or indeed other universes, one could still
'divide through' to find that we are most likely to be in a universe where
our computer vanishes in whole or part.

However, we note that in any minimal description of our universe (and the
following argument does not depend on there having to *only* be a minimal
description), deviations from the actual physical laws causing V1, V2...
will involve additional ad-hoc rules/events, so we can say that in
complexity terms (whether as a bit string or formal system description) we
will have V1 = N + VE1, V2 = N + VE2 etc, where VE1, VE2 etc are the extra
segments of description required to cater for the part-vanishings (strictly:
Length(V1) = Length(N) + Length(VE1) etc, but hopefully this is clear).

Moreover, if we are considering all possible descriptions, we also have to
allow for extra descriptions corresponding to entities beyond our
observability. For each V = N + VE we will have many DC = N + DCE, where DC
is a 'don't care' description - it is a universe (or set of universes)
indistinguishable by us from N (our own, non-computer-vanishing one), yet
objectively different.

Now, the key point is that in any objective descriptive framework (whether
by bit string or formal system), one should take the Limit as the
description length increases to infinity, not as our (humanly biased)
visible universe is progressively and unobservably added to (say by other
universes). As we do this, we are far more likely to be in a DC (= N + DCE)
universe than a V (= N + VE) universe: computers don't normally vanish, in
whole or in part.

More details at:
http://www.physica.freeserve.co.uk/p105.htm
linking to:
http://www.physica.freeserve.co.uk/p111.htm
http://www.physica.freeserve.co.uk/p112.htm




Re: Predictions duplications

2001-10-22 Thread Russell Standish

[EMAIL PROTECTED] wrote:
 
 
 
  From [EMAIL PROTECTED]:
  [EMAIL PROTECTED] wrote:
   M measure:
   M(empty string)=1
   M(x) = M(x0)+M(x1) nonnegative for all finite x.
  
  This sounds more like a probability distribution than a measure. In
  the set of all descriptions, we only consider infinite length
  bitstrings. Finite length bitstrings are not members. However, we can
  identify finite length bitstrings with subsets of descriptions. The
  empty string corresponds to the full set of all descriptions, so the
  first line M(empty string)=1 implies that the measure is normalisable
  (ie a probability distribution).
 
 
 Please check out definitions of measure and distribution! 
 Normalisability is not the critical issue.
 
 Clearly: Sum_x M(x) is infinite. So M is not a probability
 distribution. M(x) is just measure of all strings starting with x:
 M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = 

For a measure to be normalisable, the sum over a *disjoint* set of
subsets must be finite. If the set of subsets is not disjoint (ie the
intersections are not empty) then the sum may well be infinite.

Bringing this to the case of finite strings. Each finite string is
actually a subset of the set of infinite strings, each containing the
same finite prefix. So the string 101010 is actually a subset of 10101
and so on. The Sum_x M(x), where I assume x ranges over all strings
will of course be infinite. However, since the set of finite strings
is not disjoint, this doesn't imply M(x) is unnormalisable.

Now when you realise that every finite string x is a subset of the empty
string, it becomes clear that M(x) is normalised to precisely 1.

 
 Neglecting finite universes means loss of generality though.
 Hence measures mu(x) in the ATOE paper do not neglect finite x: 
 
 mu(empty string)=1
 mu(x) = P(x)+mu(x0)+mu(x1)  (all nonnegative).
 
 And here P is a probability distribution indeed! 
  P(x)>0 possible only for x with finite description.
 

The P(x)>0 case above actually breaks the countably subadditive
property, so mu(x) cannot be called a measure... I'm not entirely sure
what you're getting at here.

Cheers

 Juergen Schmidhuber
 
 http://www.idsia.ch/~juergen/
 http://www.idsia.ch/~juergen/everything/html.html
 http://www.idsia.ch/~juergen/toesv2/
 
 
 




Dr. Russell Standish Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052 Fax   9385 6965, 0425 253119 ()
Australia                            [EMAIL PROTECTED]
Room 2075, Red Centre                http://parallel.hpc.unsw.edu.au/rks
International prefix  +612, Interstate prefix 02





Re: Predictions duplications

2001-10-22 Thread juergen



 From [EMAIL PROTECTED]:
 [EMAIL PROTECTED] wrote:
  M measure:
  M(empty string)=1
  M(x) = M(x0)+M(x1) nonnegative for all finite x.
 
 This sounds more like a probability distribution than a measure. In
 the set of all descriptions, we only consider infinite length
 bitstrings. Finite length bitstrings are not members. However, we can
 identify finite length bitstrings with subsets of descriptions. The
 empty string corresponds to the full set of all descriptions, so the
 first line M(empty string)=1 implies that the measure is normalisable
 (ie a probability distribution).


Please check out definitions of measure and distribution! 
Normalisability is not the critical issue.

Clearly: Sum_x M(x) is infinite. So M is not a probability
distribution. M(x) is just measure of all strings starting with x:
M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = 

Neglecting finite universes means loss of generality though.
Hence measures mu(x) in the ATOE paper do not neglect finite x: 

mu(empty string)=1
mu(x) = P(x)+mu(x0)+mu(x1)  (all nonnegative).

And here P is a probability distribution indeed! 
 P(x)>0 possible only for x with finite description.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/






Re: Predictions duplications

2001-10-15 Thread Marchal

Hal Finney wrote:

Isn't this fixed by saying that the uniform measure is not over all
universe histories, as you have it above, but over all programs that
generate universes?  Now we have the advantage that short programs
generate more regular universes than long ones, and the WAP grows teeth.

I agree with Juergen on this: there is no uniform measure over all
programs (nor on all integers).

But in any case you know I argue, unlike Juergen, that the measure should
be put on all (relative) consistent computational extensions
appearing in UD*. This gives indeed a sort of Weak Turing-tropic 
Principle.

In which relative *speed* is a by-product, not a prior.

Bruno




Re: Predictions duplications

2001-10-15 Thread Russell Standish

Hal - that is not a uniform measure!

[EMAIL PROTECTED] wrote:
 
 Juergen Schmidhuber writes:
  But there is no uniform prior over all programs!
  Just like there is no uniform prior over the integers.
  To see this, just try to write one down.
 
 I think there is.  Given a program of length l, the prior probability
 is 2^(-l).  (That is 2 to the power of negative l.)  The length of a
 program is defined by interpreting it using self-delimiting rules as
 is customary in the AIT analysis of Greg Chaitin.
 
 Hal Finney
 




Dr. Russell Standish Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052 Fax   9385 6965, 0425 253119 ()
Australia                            [EMAIL PROTECTED]
Room 2075, Red Centre                http://parallel.hpc.unsw.edu.au/rks
International prefix  +612, Interstate prefix 02





Re: Predictions duplications

2001-10-15 Thread Saibal Mitra

Hal Finney wrote:

 Juergen Schmidhuber writes:
  But there is no uniform prior over all programs!
  Just like there is no uniform prior over the integers.
  To see this, just try to write one down.

 I think there is.  Given a program of length l, the prior probability
 is 2^(-l).  (That is 2 to the power of negative l.)  The length of a
 program is defined by interpreting it using self-delimiting rules as
 is customary in the AIT analysis of Greg Chaitin.

This doesn't seem to be very uniform to me. Maybe you mean that, with the
above prior, the probability for a bit drawn randomly from the set of all
programs to be 1 is 1/2?
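
To see concretely how non-uniform the 2^(-l) prior is, here is a tiny
Python sketch on a toy prefix-free (self-delimiting) set of programs; the
particular code {1, 01, 001, ...} is my own choice, picked only because it
is obviously prefix-free. The weights respect Kraft's inequality (they sum
to at most 1), but short programs get exponentially more weight than long
ones.

# Toy prefix-free set: k zeros followed by a one, for k = 0..9.
programs = ["0" * k + "1" for k in range(10)]
weights = {p: 2.0 ** -len(p) for p in programs}

print(sum(weights.values()))     # <= 1 (Kraft's inequality); tends to 1 as k grows
for p in programs[:4]:
    print(p, weights[p])         # 0.5, 0.25, 0.125, 0.0625: anything but uniform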

Saibal






Re: Predictions duplications

2001-10-14 Thread Russell Standish

That is almost the correct solution, Hal. If we ask what an observer
will make of a random description chosen at random, then you get
regular universes with probability exponentially related to the
inferred complexity. It is far clearer to see what happens when the
observer is a UTM, forcibly terminating programs after a
certain number of steps (representing the observer's resource bound)
(thus all descriptions are halting programs). Then one obtains a
Solomonoff-Levin distribution or universal prior. However, this argument
also works when the observer is not a UTM, but simply a classification
device of some kind.
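
The following is only a cartoon of that idea, with a made-up interpreter of
my own rather than a UTM: it reads random input bits, takes the leading run
of 1s as a repetition count, stops at a fixed step bound, and we tally the
outputs. Simple outputs come out exponentially more often than complicated
ones, which is the qualitative point; it is not meant to reproduce the
universal prior exactly.

import random
from collections import Counter

random.seed(0)

def toy_machine(bits, step_bound=32):
    # Made-up interpreter: count the leading run of 1s (k), skip the 0 that
    # ends the run, then output the next bit repeated 2**k times (capped).
    k = 0
    while k < len(bits) and bits[k] == 1:
        k += 1
    if k + 1 >= len(bits):
        return ""                 # ran off the input: empty output
    return str(bits[k + 1]) * min(2 ** k, step_bound)

tally = Counter()
trials = 50000
for _ in range(trials):
    tally[toy_machine([random.randint(0, 1) for _ in range(40)])] += 1

for out, n in tally.most_common(6):
    print(repr(out), n / trials)  # longer (less simple) outputs appear exponentially less often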

The WAP has nothing to do with this issue, except inasmuch as
universes can only be observed through the eyes of some observer.

Again I reiterate that Juergen's resource-bounded Great Programmer
religion need be nothing but a reflection of our conscious selves stamped
upon our observations. 

Cheers

[EMAIL PROTECTED] wrote:
 
 Juergen writes:
  Some seem to think that the weak anthropic principle explains the
  regularity. The argument goes like this: Let there be a uniform measure
  on all universe histories, represented as bitstrings.  Now take the tiny
  subset of histories in which you appear.  Although the measure of this
  subset is tiny, its conditional measure, given your very existence,
  is not: According to the weak anthropic principle, the conditional
  probability of finding yourself in a regular universe compatible with 
  your existence equals 1.
 
  But it is essential to see that the weak anthropic principle does not
  have any predictive power at all. It does not tell you anything about
  the future.  It cannot explain away futures in which you still exist
  but irregular things happen. Only a nonuniform prior can explain this.
 
 Isn't this fixed by saying that the uniform measure is not over all
 universe histories, as you have it above, but over all programs that
 generate universes?  Now we have the advantage that short programs
 generate more regular universes than long ones, and the WAP grows teeth.
 
 Hal Finney
 




Dr. Russell Standish Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052 Fax   9385 6965, 0425 253119 ()
Australia                            [EMAIL PROTECTED]
Room 2075, Red Centre                http://parallel.hpc.unsw.edu.au/rks
International prefix  +612, Interstate prefix 02





Re: Predictions duplications

2001-10-14 Thread Russell Standish

"According to whim or taste" implies a conscious entity performing
choices according to free will. This need not be the case. In my
mind, random means selected without cause (or without
procedure/algorithm).

A lot has been written on randomness, and its problematic nature. I
don't for one minute suggest I have anything new to say on the
matter. 

Cheers

jamikes wrote:
 
 Dear Russell and Hal,
 this is a naive question, however it goes into the basics of your written
 explanations. How would YOU define random? I had this problem for a long
 long time and never got a satisfactory solution. In my (non Indo-European)
 language, Hungarian, there is no exact word for it, it is used as a term
 meaning according to whim or taste (tetszöleges) - which leaves open that
 I may not LIKE the 'random' in question. If you say: a sequence defying all
 rules, then it is not random, it is calculable. You have to consider all
 rules and cut them out. Is a series of 1... random? Of course it may be.
 Is the pi-decimal random? No, because it is the result of a calculation. If one
 says: just pick it, that would follow all circumstantial pressure of the
 situation, maybe unconsciously, yet defying the 'random'.
 
 So what is the content you use behind that word?
 
 I would be surprised if both of you (and others, too) would agree G.
 Till then I wish you luck to use the word - at random.
 
 Best wishes
 John Mikes
 - Original Message -
 From: Russell Standish [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Sent: Sunday, October 14, 2001 4:36 AM
 Subject: Re: Predictions  duplications
 SNIP
 
 
  That is almost the correct solution, Hal. If we ask what an observer
  will make of a random description chosen at random, then you get
  regular universes with probability exponentially related to the
   inferred complexity. It is far clearer to see what happens when the
  observer is a UTM, forcibly terminating programs after a
  certain number of steps (representing the observer's resource bound)
  (thus all descriptions are halting programs). Then one obtains a
   Solomonoff-Levin distribution or universal prior. However, this argument
  also works when the observer is not a UTM, but simply a classification
  device of some kind.
 
  The WAP has nothing to do with this issue, except inasmuch as
  universes can only be observed through the eyes of some observer.
 
  Again I reiterate that Juergen's resource-bounded Great Programmer
  religion need be nothing but a reflection of our conscious selves stamped
  upon our observations.
 
  Cheers
 
  [EMAIL PROTECTED] wrote:
  
   Juergen writes:
Some seem to think that the weak anthropic principle explains the
regularity. The argument goes like this: Let there be a uniform
 measure
on all universe histories, represented as bitstrings.  Now take the
 tiny
subset of histories in which you appear.  Although the measure of this
subset is tiny, its conditional measure, given your very existence,
is not: According to the weak anthropic principle, the conditional
probability of finding yourself in a regular universe compatible with
your existence equals 1.
   
But it is essential to see that the weak anthropic principle does not
have any predictive power at all. It does not tell you anything about
the future.  It cannot explain away futures in which you still exist
but irregular things happen. Only a nonuniform prior can explain this.
  
   Isn't this fixed by saying that the uniform measure is not over all
   universe histories, as you have it above, but over all programs that
   generate universes?  Now we have the advantage that short programs
   generate more regular universes than long ones, and the WAP grows teeth.
  
   Hal Finney
  
 
 
 
  --
 --
  Dr. Russell Standish                    Director
  High Performance Computing Support Unit, Phone 9385 6967, 8308 3119
 (mobile)
  UNSW SYDNEY 2052 Fax   9385 6965, 0425 253119 ()
  Australia                               [EMAIL PROTECTED]
  Room 2075, Red Centre
 http://parallel.hpc.unsw.edu.au/rks
  International prefix  +612, Interstate prefix 02
  --
 --
 
 




Dr. Russell Standish Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052 Fax   9385 6965, 0425 253119 ()
Australia                            [EMAIL PROTECTED]
Room 2075, Red Centre                http://parallel.hpc.unsw.edu.au/rks
International prefix  +612, Interstate prefix 02





Re: Predictions duplications

2001-10-12 Thread juergen


In reply to Russell Standish and Juho Pennanen I'd just like to
emphasize the main point, which is really trivial: by definition, a
uniform measure on the possible futures makes all future beginnings of
a given size equally likely. Then regular futures clearly are not any
more likely than the irregular ones. I have no idea what makes Russell
think this is debatable. In most possible futures your computer will
vanish within the next second. But it does not. This indicates that our
future is _not_ sampled from a uniform prior.
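
The arithmetic behind "in most possible futures your computer will vanish",
with the drastic simplification (mine, for the sake of the scaling only)
that the regular future corresponds to a single pattern for the next n bits:

  P(\text{regular next } n \text{ bits}) = 2^{-n}, \qquad
  P(\text{irregular}) = 1 - 2^{-n} > 0.999 \ \text{ for } n \ge 10 .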

Some seem to think that the weak anthropic principle explains the
regularity. The argument goes like this: Let there be a uniform measure
on all universe histories, represented as bitstrings.  Now take the tiny
subset of histories in which you appear.  Although the measure of this
subset is tiny, its conditional measure, given your very existence,
is not: According to the weak anthropic principle, the conditional
probability of finding yourself in a regular universe compatible with 
your existence equals 1.

But it is essential to see that the weak anthropic principle does not
have any predictive power at all. It does not tell you anything about
the future.  It cannot explain away futures in which you still exist
but irregular things happen. Only a nonuniform prior can explain this.

Which nonuniform prior is plausible? My favorite is the resource-optimal
Speed Prior. I am hoping and expecting that someone will confirm it soon
by finding a rather fast pseudorandom generator responsible for apparently
noisy events in our universe. But others may be more conservative and
go for the more dominant enumerable Solomonoff-Levin prior mu^M or maybe
even for the nonenumerable prior mu^E (which dominates mu^M while still
being computable in the limit) or maybe even for the extreme prior mu^G.


Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




 From [EMAIL PROTECTED]  Thu Oct 11 17:36:41 2001
 From: Juho Pennanen [EMAIL PROTECTED]
 
 
 
 I tried to understand the problem that doctors Schmidhuber 
 and Standish are discussing by describing it in the most 
 concrete terms I could, below. (I admit beforehand I couldn't 
 follow all  the details and do not know all the papers and 
 theorems referred to, so this could be irrelevant.)
 
 So, say you are going to drop a pencil from your hand and 
 trying to predict if it's going to fall down or up this 
 time. Using what I understand with comp TOE, I would take 
 the set of all programs that at some state implement a 
 certain conscious state, namely the state in which you 
 remember starting your experiment of dropping the pencil, 
  and have already recorded the end result (I abbreviate this
 conscious state with CS. To be exact it is a set of states,
 but that shouldn't make a difference). 
 
 The space of all programs would be the set of all programs 
 in some language, coded as infinite numerable sequences of 
 0's and 1's. (I do not know how much the chosen language + 
 coding has effect on the whole thing). 
 
 Now for your prediction you need to divide the 
 implementations of CS into two sets: those in which the 
 pencil fell down and those in which it fell up. Then you 
 compare the measures of those sets. (You would need to 
 assume that each program is run just once or something of 
 the sort. Some programs obviously implement CS several 
 times when they run. So you would maybe just include those 
 programs that implement CS infinitely many times, and 
 weight them with the density of CS occurrences during 
 their running.) 
 
 One way to derive the measure you need is to assume a 
 measure on the set of all infinite sequences (i.e. on
 all programs). For this we have the natural measure, 
 i.e. the product measure of the uniform measure on 
 the set containing 0 and 1. And as far as my intuition 
 goes, this measure would lead to the empirically correct 
 prediction on the direction of the pencil falling. And 
 if I understood it right, this is not too far from what 
 Dr. Standish was claiming? And we wouldn't need any 
 speed priors. 
 
 But maybe the need of speed prior would come to play if I 
 thought more carefully about the detailed assumptions
 involved? E.g. that each program would be run just once, 
 with the same speed etc? I am not sure.
 
 Juho 
 
 /
 Juho Pennanen
 Department of Forest Ecology, P.O.Box 24
 FIN-00014 University of Helsinki
 tel. (09)191 58144 (+358-9-191 58144)
 GSM 040 5455 845 (+358-40-5455 845)
 http://www.helsinki.fi/people/juho.pennanen
 */
 

 Resent-Date: Thu, 11 Oct 2001 18:57:25 -0700
 From: Russell Standish [EMAIL PROTECTED]
 
 [EMAIL PROTECTED] wrote:
  
  
  
  Huh? A PDF? You mean a probability density function? On a continuous set? 
 
 Probability Distribution Function. And PDF's are defined on all
 

Re: Predictions duplications

2001-10-12 Thread hal

Juergen writes:
 Some seem to think that the weak anthropic principle explains the
 regularity. The argument goes like this: Let there be a uniform measure
 on all universe histories, represented as bitstrings.  Now take the tiny
 subset of histories in which you appear.  Although the measure of this
 subset is tiny, its conditional measure, given your very existence,
 is not: According to the weak anthropic principle, the conditional
 probability of finding yourself in a regular universe compatible with 
 your existence equals 1.

 But it is essential to see that the weak anthropic principle does not
 have any predictive power at all. It does not tell you anything about
 the future.  It cannot explain away futures in which you still exist
 but irregular things happen. Only a nonuniform prior can explain this.

Isn't this fixed by saying that the uniform measure is not over all
universe histories, as you have it above, but over all programs that
generate universes?  Now we have the advantage that short programs
generate more regular universes than long ones, and the WAP grows teeth.
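
A minimal way to make this concrete (a rough sketch, not a precise
construction): give each program p weight 2^{-|p|} under the fair-coin
measure on program tapes. Then, up to constants,

    P(h) \;=\; \sum_{p \,:\, U(p) = h} 2^{-|p|} \;\ge\; 2^{-K(h)},

where K(h) is the length of the shortest program generating history h.
Regular histories have small K and hence non-negligible weight, while a
lawless history of length n needs a program of length close to n and so
gets weight about 2^{-n}; conditioning on your existence then mostly
selects the regular histories.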

Hal Finney




Re: Predictions duplications

2001-10-11 Thread Juho Pennanen



I tried to understand the problem that doctors Schmidhuber 
and Standish are discussing by describing it in the most 
concrete terms I could, below. (I admit beforehand I couldn't 
follow all  the details and do not know all the papers and 
theorems referred to, so this could be irrelevant.)

So, say you are going to drop a pencil from your hand and 
trying to predict if it's going to fall down or up this 
time. Using what I understand of the comp TOE, I would take 
the set of all programs that at some stage implement a 
certain conscious state, namely the state in which you 
remember starting your experiment of dropping the pencil, 
and have already recorded the end result (I abbreviate this 
conscious state as CS. To be exact it is a set of states,
but that shouldn't make a difference). 

The space of all programs would be the set of all programs 
in some language, coded as infinite sequences of 
0's and 1's. (I do not know how much effect the chosen 
language + coding has on the whole thing). 

Now for your prediction you need to divide the 
implementations of CS into two sets: those in which the 
pencil fell down and those in which it fell up. Then you 
compare the measures of those sets. (You would need to 
assume that each program is run just once or something of 
the sort. Some programs obviously implement CS several 
times when they run. So you would maybe just include those 
programs that implement CS infinitely many times, and 
weight them with the density of CS occurrences during 
their running.) 

One way to derive the measure you need is to assume a 
measure on the set of all infinite sequences (i.e. on
all programs). For this we have the natural measure, 
i.e. the product measure of the uniform measure on 
the set containing 0 and 1. And as far as my intuition 
goes, this measure would lead to the empirically correct 
prediction on the direction of the pencil falling. And 
if I understood it right, this is not too far from what 
Dr. Standish was claiming? And we wouldn't need any 
speed priors. 
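
As a toy illustration of this product (fair-coin) measure - just a sketch,
with a made-up predicate standing in for "implements CS and the pencil
fell down" - one can estimate the measure of an event by sampling finite
prefixes:

    import random

    def sample_prefix(n):
        # Draw the first n bits of an infinite 0/1 sequence under the
        # fair-coin product measure (each bit is 0 or 1 with probability 1/2).
        return [random.randint(0, 1) for _ in range(n)]

    def in_event(prefix):
        # Hypothetical stand-in for "the program implements CS and the pencil
        # falls down": here, the event that the first three bits are 0,1,0,
        # whose exact measure is 1/8.
        return prefix[:3] == [0, 1, 0]

    samples = 100000
    hits = sum(in_event(sample_prefix(8)) for _ in range(samples))
    print(hits / samples)   # should come out near 0.125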

But maybe the need for a speed prior would come into play if I 
thought more carefully about the detailed assumptions
involved? E.g. that each program would be run just once, 
at the same speed, etc.? I am not sure.

Juho 




/
Juho Pennanen
Department of Forest Ecology, P.O.Box 24
FIN-00014 University of Helsinki
tel. (09)191 58144 (+358-9-191 58144)
GSM 040 5455 845 (+358-40-5455 845)
http://www.helsinki.fi/people/juho.pennanen
*/




Re: Predictions duplications

2001-10-11 Thread juergen



 From [EMAIL PROTECTED] :
 [EMAIL PROTECTED] wrote:
  
  So you NEED something additional to explain the ongoing regularity.
  You need something like the Speed Prior, which greatly favors regular 
  futures over others.
 
 I take issue with this statement. In Occam's Razor I show how any
 observer will expect to see regularities even with the uniform prior
 (comes about because all observers have resource problems,
 incidentally). The speed prior is not necessary for Occam's Razor. It is
 obviously consistent with it though.

First of all: there is _no_ uniform prior on infinitely many things.
Try to build a uniform prior on the integers. (Tegmark also wrote that
... all mathematical structures are a priori given equal statistical
weight, but of course this does not make much sense because there is
_no_ way of assigning equal nonvanishing probability to all - infinitely
many - mathematical structures.)

There is at best a uniform measure on _beginnings_ of strings. Then
strings of equal size have equal measure.

But then regular futures (represented as strings) are just as likely
as irregular ones. Therefore I cannot understand the comment: (comes
about because all observers have resource problems, incidentally).
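
In symbols, this uniform measure is just the cylinder-set measure:

    \mu(\Gamma_x) \;=\; 2^{-|x|}, \qquad \Gamma_x \;=\; \{\, x\omega : \omega \in \{0,1\}^\infty \,\},

so for a fixed past x, any two futures y and y' of equal length get
\mu(\Gamma_{xy}) = \mu(\Gamma_{xy'}) = 2^{-|x|-|y|}, whether y is regular
or not.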

Of course, alternative priors lead to alternative variants of Occam's
razor.  That has been known for a long time - formal versions of Occam's
razor go at least back to Solomonoff, 1964.  The big question really
is: which prior is plausible? The most general priors we can discuss are
those computable in the limit, in the algorithmic TOE paper. They do not
allow for computable optimal prediction though. But the more restrictive
Speed Prior does, and seems plausible from any programmer's point of view.

 The interesting thing is of course whether it is possible to
 experimentally distinguish between the speed prior and the uniform
 prior, and it is not at all clear to me that it is possible to
 distinguish between these cases.

I suggest looking at experimental data that seems to have Gaussian
randomness in it, such as interference patterns in split experiments.
The Speed Prior suggests the data cannot be really random, but that a
fast pseudorandom generator (PRG) is responsible, e.g., by dividing some
seed by 7 and taking some of the resulting digits as the new seed, or
whatever. So it's verifiable - we just have to discover the PRG method.
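
A minimal sketch of the kind of toy PRG meant here (the "divide by 7" rule
above is only schematic, so the details below are merely one possible guess):

    def toy_prg(seed, n):
        # Toy "divide by 7" generator: repeatedly divide the current seed by 7
        # (carried to 20 extra decimal digits), take some of the resulting
        # digits as the next seed, and emit the low bit as one "noisy" bit.
        bits = []
        for _ in range(n):
            digits = str(seed * 10**20 // 7)   # decimal digits of seed/7, scaled
            seed = int(digits[5:15]) or 1      # some middle digits become the new seed
            bits.append(seed & 1)
        return bits

    print(toy_prg(123456789, 20))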

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/






Re: Predictions duplications

2001-10-11 Thread juergen



   From [EMAIL PROTECTED] :
   [EMAIL PROTECTED] wrote:

So you NEED something additional to explain the ongoing regularity.
You need something like the Speed Prior, which greatly favors regular 
futures over others.
   
   I take issue with this statement. In Occam's Razor I show how any
   observer will expect to see regularities even with the uniform prior
   (comes about because all observers have resource problems,
   incidentally). The speed prior is not necessary for Occam's Razor. It is
   obviously consistent with it though.
  
  First of all: there is _no_ uniform prior on infinitely many things.
  Try to build a uniform prior on the integers. (Tegmark also wrote that
  ... all mathematical structures are a priori given equal statistical
  weight, but of course this does not make much sense because there is
  _no_ way of assigning equal nonvanishing probability to all - infinitely
  many - mathematical structures.)
 
 I don't know why you insist on the prior being a PDF. It is not
 necessary. With the uniform prior, all finite sets have vanishing
 probability. However, all finite descriptions correspond to infinite
 sets, and these infinite sets have non-zero probability.

Huh? A PDF? You mean a probability density function? On a continuous set? 
No! I am talking about probability distributions on describable objects. 
On things you can program. 

Anyway, you write "... observer will expect to see regularities even with
the uniform prior", but that clearly cannot be true.

  There is at best a uniform measure on _beginnings_ of strings. Then
  strings of equal size have equal measure.
  
  But then regular futures (represented as strings) are just as likely
  as irregular ones. Therefore I cannot understand the comment: (comes
  about because all observers have resource problems, incidentally).
 
 Since you've obviously barked up the wrong tree here, it's a little
 hard to know where to start. Once you understand that each observer
 must treat an infinite number of descriptions as equivalent, due to the
 boundedness of its resources, it becomes fairly obvious that the
 smaller, simpler descriptions correspond to larger equivalence classes
 (hence higher probability).

Maybe you should write down formally what you mean? Which resource bounds?
On which machine? What exactly do you mean by simple? Are you just
referring to the traditional Solomonoff-Levin measure and the associated
old Occam's razor theorems, or do you mean something else?

  Of course, alternative priors lead to alternative variants of Occam's
  razor.  That has been known for a long time - formal versions of Occam's
  razor go at least back to Solomonoff, 1964.  The big question really
  is: which prior is plausible? The most general priors we can discuss are
  those computable in the limit, in the algorithmic TOE paper. They do not
  allow for computable optimal prediction though. But the more restrictive
  Speed Prior does, and seems plausible from any programmer's point of view.
  
   The interesting thing is of course whether it is possible to
   experimentally distinguish between the speed prior and the uniform
   prior, and it is not at all clear to me that it is possible to
   distinguish between these cases.
  
  I suggest looking at experimental data that seems to have Gaussian
  randomness in it, such as interference patterns in split experiments.
  The Speed Prior suggests the data cannot be really random, but that a
  fast pseudorandom generator (PRG) is responsible, e.g., by dividing some
  seed by 7 and taking some of the resulting digits as the new seed, or
  whatever. So it's verifiable - we just have to discover the PRG method.
  
 
 I can't remember which incompleteness result it is, but it is
 impossible to prove the randomness of any sequence. In order to
 falsify your theory one would need to prove a sequence to be
 random. However, of course if all known sequences are provably
 pseudo-random (i.e. compressible), then this would constitute pretty
 good evidence. However, this is a tall order, as there is no algorithm
 for generating the compression behind an arbitrary sequence.

You are talking falsifiability. I am talking verifiability.  Sure, you
cannot prove randomness. But that's not the point of any inductive
science. The point is to find regularities if there are any. Occam's
razor encourages us to search for regularity, even when we do not know
whether there is any. Maybe some PhD student tomorrow will discover a
simple PRG of the kind I mentioned, and get famous.

It is important to see that Popper's popular and frequently cited and
overrated concept of falsifiability does not really help much to explain
what inductive science such as physics is all about. E.g., physicists
accept Everett's ideas although most of his postulated parallel universes
will remain inaccessible forever, and therefore are _not_ falsifiable.
Clearly, what's convincing about the multiverse theory is its simplicity,
not its falsifiability.

Re: Predictions duplications

2001-10-10 Thread Russell Standish

[EMAIL PROTECTED] wrote:
 
 So you NEED something additional to explain the ongoing regularity.
 You need something like the Speed Prior, which greatly favors regular 
 futures over others.
 

I take issue with this statement. In Occam's Razor I show how any
observer will expect to see regularities even with the uniform prior
(comes about because all observers have resource problems,
incidentally). The speed prior is not necessary for Occam's Razor. It is
obviously consistent with it though.

The interesting thing is of course whether it is possible to
experimentally distinguish between the speed prior and the uniform
prior, and it is not at all clear to me that it is possible to
distinguish between these cases.

Cheers


Dr. Russell Standish Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052 Fax   9385 6965, 0425 253119 ()
Australia[EMAIL PROTECTED] 
Room 2075, Red Centrehttp://parallel.hpc.unsw.edu.au/rks
International prefix  +612, Interstate prefix 02





Re: Predictions duplications

2001-10-09 Thread Marchal

Juergen Schmidhuber wrote:


We need a prior probability distribution on possible histories.

OK. I agree with that. But of course we differ on the meaning of
possible histories. And we also tackle the prior probability
in quite different ways. 


Then, once we have observed a past history, we can use Bayes' rule to
compute the probability of various futures, given the observations. Then
we predict the most probable future.


It seems to me that Bayes' rule is what we *need* to build a plausible
*past* beforehand; concerning the future, and even the present, I prefer
a theory, or a philosophy (or religion), hoping it can shed light on
the unknown.


The algorithmic TOE paper discusses several formally describable priors
computable in the limit. But for the moment let me focus on my favorite,
the Speed Prior.


Mmh... You know that there exists a class of infinitely speedable
programs (and theories), and even that this class corresponds to a
superclass of the universal machines. This entails that the UD,
the Universal Dovetailer, which I sometimes like to call
the 'splashed Universal Turing Machine'
(which btw makes UD* a sort of denotational semantics of the UTM), could
generate and execute quicker and quicker versions of itself.
This means you need a non-universal ontology.
To help others understand the point I
propose to postpone discussing it until I have given the solution of the
problem in diagonalisation 1:
http://www.escribe.com/science/theory/m3079.html


Assume the Great Programmer has a resource problem, just like any 
other programmer.


You told Wei Dai not to take the Great Programmer too seriously.
I am not sure I can imagine the Great Programmer having a resource
problem. Unless you are a finitist! Knowing that the UTM emulating
the UD will use an unboundable part of its tape ...
How could the GP have a resource problem when he is the one
building universes in which resource problems can happen (apparently)?
Are you not presupposing some physical structure in advance somewhere?


According to the Speed Prior, the harder it is to compute
something, the less likely it gets. More precisely, the cumulative prior
probability of all data whose computation costs at least O(n) time is
at most inversely proportional to n.
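
(In symbols, schematically - this is only a paraphrase of the quoted claim:

    \sum_{x \,:\, \mathrm{time}(x) \,\ge\, n} P_S(x) \;\le\; \frac{c}{n} \quad\text{for some constant } c,

where time(x) is the time the fastest method needs to compute x and P_S
denotes the Speed Prior.)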


Very interesting idea. I would say: the harder it is to compute
something FROM here and now, the less likely it gets from here and now.
And I would
add that this is only true if your neighborhood is less complex than
yourself. I really believe your idea is interesting for some low-level
computational physics. But when agents, including yourself, enter the
play, it's another matter, because it is intrinsically harder to compute
things then.


To evaluate the plausibility of this, consider your own computer: most
processes just take a few microseconds, some take a few seconds, few
take hours, very few take days, etc...

?
In the work of the GP, that is in UD*, most processes take billions
of googolplexes of kalpas ...
A lot of processes engaged by the UD simply do not stop.
Some are interesting, and perhaps Turing universal, like the process
producing better and better approximations of the Mandelbrot
set.


We do not know the Great Programmer's computer and algorithm for computing
universe histories and other things. Still, we know that he cannot do
it more efficiently than the asymptotically fastest algorithm.


Are you really sure about that? If the GP is really *the* GP,
existing by Church's Thesis, I think it follows from Blum's Speed-up
Theorem that it has no asymptotically fastest algorithm.
But perhaps you are talking of our little program generated and executed
by the UD.
If this one has an asymptotically fastest algorithm, are you sure it
can still emulate a UTM? I bet the universe is universal.
Also: how could we know and test whether we are in a quick or slow version of
that little program? Is that not (absolutely) undecidable?
Should that program really be run? If yes, are you not presupposing
again a physical universe?
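
For the record, the flavour of Blum's speed-up theorem I have in mind,
stated informally from memory: for every computable speed-up factor r there
is a total computable function f such that

    \forall i\,[\varphi_i = f]\ \;\exists j\,[\varphi_j = f \ \text{ and } \ r(\Phi_j(x)) \le \Phi_i(x) \text{ for almost every } x],

where \varphi_i is the i-th program and \Phi_i(x) its running time on input
x; so f has no fastest program, even asymptotically.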
 

The Speed Prior reflects properties of precisely this optimal algorithm.
It turns out that the best way of predicting future y, given past x,
is to minimize the (computable) Kt-complexity of (x,y).  As observation
size grows, the predictions converge to the optimal predictions.
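
(For readers unfamiliar with the notation: this Kt is presumably close to
Levin's time-bounded complexity, roughly

    Kt(x) \;=\; \min_{p} \{\, |p| + \log t(p,x) \;:\; U(p) = x \,\},

with t(p,x) the time p needs to print x; the variant used in the quoted
paper may differ in details.)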


As observations grow, knowledge grows, and we grow, making everything
more complex than before. I guess you mean optimal predictions at 
some level. With comp, the more knowledge grows, the more ignorance
grows.


To address Bruno Marchal's questions: Many of my duplications in parallel
universes with identical past x will experience different futures y.
None of my copies knows a priori which one it is. But each does know:
Those futures y with high Kt(x,y) will be highly unlikely; those with
low Kt(x,y) won't.


It is indeed hard to write a little program which generates a long
Kolmogorov-Chaitin incompressible 01 string.
But, as I told you earlier, it is quite easy to write a little
program which