[agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Suppose you have sets of programs that produce two strings: one set outputs 00 and the other outputs 11. Now suppose you used these sets of programs to chart the probabilities of the output of the strings. If the two strings were each output by the same number of programs, then you'd have a .5 probability that either string would be output. That's ok. But a more interesting question is: given that the first digits are 000, what are the chances that the next digit will be 1? Dim Induction will report .5, which of course is nonsense and a whole lot less useful than making a rough guess.

But, of course, Solomonoff Induction purports to be able, if it were feasible, to compute the probabilities over all possible programs. Ok, but now try thinking about this a little. If you have ever tried writing random program instructions, what do you usually get? Well, I'll hazard a guess (a lot better than the bogus method of confusing shallow probability with prediction in my example) and say that you will get a lot of programs that crash. Most of my experiments with that have ended up with programs that go into an infinite loop or crash.

Now on a universal Turing machine the results would probably look a little different. Some programs will output nothing and go into an infinite loop. Some programs will output something and then either stop outputting anything or start repeating the same substring forever. Other programs will go on to infinity producing something that looks like a random string. But the idea that all possible programs would produce well-distributed strings is complete hogwash. Since Solomonoff Induction does not define what kind of programs should be used, the assumption that the distribution would produce useful data is absurd. In particular, the use of the method to determine the probability given an initial string (as in: what follows, given that the first digits are 000) is wrong, as in really wrong. The idea that this crude probability can be used as prediction is unsophisticated. Of course you could develop an infinite set of Solomonoff Induction values for each possible given initial sequence of digits. Hey, when you're working with infeasible functions, why not dream anything?

I might be wrong of course. Maybe there is something you guys haven't been able to get across to me. Even if you can think for yourself you can still make mistakes. So if anyone has actually tried writing a program to output all possible programs (up to some feasible point) on a Turing Machine simulator, let me know how it went.

Jim Bromer
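For concreteness, the experiment Jim asks for at the end is easy to approximate. The sketch below enumerates every program up to length 6 in brainfuck (a tiny Turing-complete language, standing in here for a raw Turing machine) and tallies what the programs do under a fixed step budget. The language choice, the binary cells, the wrapping tape, and the 10,000-step budget are all assumptions of the sketch, not anything Solomonoff specifies.

```python
# Enumerate all brainfuck programs up to length 6 (no ',' so programs
# take no input) and tally their behavior under a step budget.
# Cells are binary so '.' emits a 0 or 1; the pointer wraps, so the
# only "crashes" here are unbalanced brackets.
from itertools import product
from collections import Counter

OPS = "+-<>.[]"
STEP_BUDGET = 10_000

def run(prog, tape_len=64):
    """Run prog; return its output string, or None on crash/budget overrun."""
    stack, jump = [], {}
    for i, c in enumerate(prog):          # pre-match brackets
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None               # unbalanced: count as a crash
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        return None
    tape = [0] * tape_len
    pc = ptr = steps = 0
    out = []
    while pc < len(prog):
        steps += 1
        if steps > STEP_BUDGET:
            return None                   # treat as non-halting
        c = prog[pc]
        if c == "+": tape[ptr] = (tape[ptr] + 1) % 2
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 2
        elif c == ">": ptr = (ptr + 1) % tape_len
        elif c == "<": ptr = (ptr - 1) % tape_len
        elif c == ".": out.append(str(tape[ptr]))
        elif c == "[" and tape[ptr] == 0: pc = jump[pc]
        elif c == "]" and tape[ptr] != 0: pc = jump[pc]
        pc += 1
    return "".join(out)

tally = Counter()
for n in range(1, 7):                     # all programs of length 1..6
    for ops in product(OPS, repeat=n):
        result = run("".join(ops))
        if result is None:
            tally["crash-or-loop"] += 1
        elif result == "":
            tally["halt, no output"] += 1
        else:
            tally[result] += 1
print(tally.most_common(10))
```

One would expect most programs to crash or exceed the budget, and the surviving outputs to be dominated by short repetitive strings, which is consistent both with Jim's experience and with Abram's reply below.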
Re: [agi] Hutter - A fundamental misdirection?
> In short, instead of a pot of neurons, we might instead have a pot of dozens of types of neurons that each have their own complex rules regarding what other types of neurons they can connect to, and how they process information...

> ...there is plenty of evidence (from the slowness of evolution, the large number (~200) of neuron types, etc.), that it is many-layered and quite complex...

The disconnect between the low-level neural hardware and the implementation of algorithms that build conceptual spaces via dimensionality reduction--which generally ignore facts such as the existence of different types of neurons, the apparently hierarchical organization of neocortex, etc.--seems significant. Have there been attempts to develop computational models capable of LSA-style feats (e.g., constructing a vector space in which words with similar meanings tend to be relatively close to each other) that take into account basic facts about how neurons actually operate (ideally in a more sophisticated way than the nodes of early connectionist networks which, as we now know, are not particularly neuron-like at all)? If so, I would love to know about them.

On Tue, Jun 29, 2010 at 3:02 PM, Ian Parker ianpark...@gmail.com wrote:

The paper seems very similar in principle to LSA. What you need for a concept vector (or position) is the application of LSA followed by K-Means, which will give you your concept clusters. I would not knock Hutter too much. After all, LSA reduces {primavera, manantial, salsa, resorte} to one word, giving a 2-bit saving on Hutter.

- Ian Parker

On 29 June 2010 07:32, rob levy r.p.l...@gmail.com wrote:

Sorry, the link I included was invalid; this is what I meant: http://www.geog.ucsb.edu/~raubal/Publications/RefConferences/ICSC_2009_AdamsRaubal_Camera-FINAL.pdf

On Tue, Jun 29, 2010 at 2:28 AM, rob levy r.p.l...@gmail.com wrote:

On Mon, Jun 28, 2010 at 5:23 PM, Steve Richfield steve.richfi...@gmail.com wrote:

Rob, I just LOVE opaque postings, because they identify people who see things differently than I do. I'm not sure what you are saying here, so I'll make some random responses to exhibit my ignorance and elicit more explanation.

I think based on what you wrote, you understood (mostly) what I was trying to get across. So I'm glad it was at least quasi-intelligible. :)

> It sounds like this is a finer measure than the dimensionality that I was referencing. However, I don't see how to reduce anything as quantized as dimensionality into finer measures. Can you say some more about this?

I was just referencing Gardenfors' research program of conceptual spaces (I was intentionally vague about committing to this fully, though, because I don't necessarily think this is the whole answer). Page 2 of this article summarizes it pretty succinctly: http://www.geog.ucsb.edu/~raubal/Publications/RefConferences/ICSC_2009_AdamsRaubal_Camera-FINAL.pdf

> However, different people's brains, even the brains of identical twins, have DIFFERENT mappings. This would seem to mandate experience-formed topology.

Yes, definitely. Since these conceptual spaces that structure sensorimotor expectation/prediction (including, I think, in higher-order embodied exploration of concepts) are multidimensional spaces, it seems likely that some kind of neural computation over these spaces must occur,

> I agree.

though I wonder what it actually would be in terms of neurons (and if that matters).

> I don't see any route to the answer except via neurons.
I agree this is true of natural intelligence, though maybe in modeling, the neural level can be shortcut to the topo map level without recourse to neural computation (use some more straightforward computation like matrix algebra instead).

Rob
Re: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Jim,

I am unable to find the actual objection to Solomonoff in what you wrote (save that it's "wrong as in really wrong"). It's true that a lot of programs won't produce any output. That just means they won't alter the prediction. It's also true that a lot of programs will produce random-looking or boring-looking output. This just means that Solomonoff will have some expectation of those things. To use your example: given 000, the chances that the next digit will be 0 will be fairly high, thanks to boring programs which just output lots of zeros. (Not sure why you mention the idea that it might be .5? This sounds like no induction rather than dim induction...)

--Abram

On Wed, Jul 7, 2010 at 10:10 AM, Jim Bromer jimbro...@gmail.com wrote:

> Suppose you have sets of programs that produce two strings. [...] But a more interesting question is: given that the first digits are 000, what are the chances that the next digit will be 1? Dim Induction will report .5, which of course is nonsense and a whole lot less useful than making a rough guess. [...]
--
Abram Demski
http://lo-tho.blogspot.com/
http://groups.google.com/group/one-logic
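Abram's point can be put in symbols. In the standard formulation (up to the usual technicalities about prefix-free program codes), the Solomonoff prior weights each program by its length, and a program contributes only to the strings its output actually extends, so crashing and silently looping programs appear in no term at all:

```latex
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|},
\qquad
P(x_{n+1} = b \mid x_{1:n}) = \frac{M(x_{1:n}\,b)}{M(x_{1:n}\,0) + M(x_{1:n}\,1)}
```

Here U is a universal prefix machine and U(p) = x* means that program p produces output beginning with x. The "boring" programs that print zeros forever are short, so they dominate M(0000) and pull the prediction toward 0.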
Re: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Abram,

I don't think you are right. The reason is that Solomonoff Induction does not produce a true universal probability for any given first digits. To do so it would have to be capable of representing the probability of any (computable) sequence that follows any (computable) string of given first digits. Yes, if a high proportion of programs produce 00, it will be able to register that string as more probable, but the information on what the next digits will be, given some input, will not be represented in anything that resembles compression. For instance, if you had 62 bits and wanted to know the probability of the next two bits, you would have to have done the infinite calculations of a Solomonoff Induction for each of the 2^62 possible combinations of bits that represented the possible input to your problem. I might be wrong, but if I am, I don't see where all this information is being hidden.

On the other hand, if I am right (or even partially right) I don't understand why seemingly smart people are excited about this as a possible AGI method. We in AGI specifically want to know the answer to this kind of question: given some partially defined situation, how could a computer best figure out what is going on? Most computer situations are going to be represented by kilobytes or megabytes these days, not in strings of 32 bits or less. If there were an abstraction that could help us think about these things, it could help even if the ideal were way beyond any feasible technology. And there is an abstraction like this that can help us: applied probability. We can think about these ideas in terms of strings if we want to, but the key is that WE have to work out the details because we see the problems differently. There is nothing that I have seen in Solomonoff Induction that suggests it is an adequate or even useful method to use. On the other hand, I would not be talking about this if it weren't for Solomonoff, so maybe I just don't share your enthusiasm. If I have misunderstood something, then all I can say is that I am still waiting for someone to explain it in a way that I can understand.

Jim

On Wed, Jul 7, 2010 at 1:58 PM, Abram Demski abramdem...@gmail.com wrote:

> I am unable to find the actual objection to Solomonoff in what you wrote [...] To use your example: given 000, the chances that the next digit will be 0 will be fairly high, thanks to boring programs which just output lots of zeros. [...]
Re: [agi] Hutter - A fundamental misdirection?
There is very little. Someone should do research. Here is a paper on language fitness: http://kybele.psych.cornell.edu/~edelman/elcfinal.pdf

LSA is *not* discussed, nor is any fitness concept within the language itself. Similar-sounding (or similarly written) words must be capable of disambiguation using LSA; otherwise the language would be unfit. Let us have a *gedanken* language where *spring*, the example I have taken with my Spanish, cannot be disambiguated. Suppose *spring* meant *step forward*, as well as its other meanings. If I am learning to dance I do not think about *primavera*, *resorte* or *manantial*, but I do think about *salsa*. If I did not know whether I was to jump or put my leg forward it would be extremely confusing.

To my knowledge fitness in this context has not been discussed. In fact perhaps the only work that is relevant is my own, which I posted here some time ago. The reduction in entropy (compression) obtained with LSA was disappointing. The different meanings (different words in Spanish and other languages) are compressed more readily. Both Spanish and English have a degree of fitness which (just possibly) is definable in LSA terms.

- Ian Parker

On 7 July 2010 17:12, Gabriel Recchia grecc...@gmail.com wrote:

> Have there been attempts to develop computational models capable of LSA-style feats (e.g., constructing a vector space in which words with similar meanings tend to be relatively close to each other) that take into account basic facts about how neurons actually operate? If so, I would love to know about them. [...]
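To make the LSA-then-K-Means pipeline Ian describes concrete, here is a toy sketch: build a term-document count matrix, take a truncated SVD to get low-dimensional "concept" vectors, then cluster them. The four-document corpus, the rank k=2, and the two clusters are arbitrary assumptions chosen only to show the mechanics.

```python
# Toy LSA: term-document counts -> truncated SVD -> concept vectors,
# followed by K-Means to get Ian's "concept clusters".
import numpy as np
from sklearn.cluster import KMeans

docs = ["spring summer winter season",
        "spring coil metal wire",
        "season winter snow cold",
        "wire metal steel coil"]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab],
             dtype=float)                      # terms x documents

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                          # keep top-k singular values
term_vecs = U[:, :k] * s[:k]                   # LSA term vectors

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# words used in similar documents end up close in the reduced space
i, j = vocab.index("winter"), vocab.index("season")
print("winter~season:", cosine(term_vecs[i], term_vecs[j]))

# K-Means over the LSA vectors yields the concept clusters
labels = KMeans(n_clusters=2, n_init=10).fit_predict(term_vecs)
for w, c in sorted(zip(vocab, labels), key=lambda t: t[1]):
    print(c, w)
```

With this corpus the season words and the metal words should fall into separate clusters, while an ambiguous word like "spring", which occurs in both contexts, sits between them, which is exactly the disambiguation burden Ian is pointing at.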
Re: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Jim Bromer wrote:

> But, a more interesting question is, given that the first digits are 000, what are the chances that the next digit will be 1? Dim Induction will report .5, which of course is nonsense and a whole lot less useful than making a rough guess.

Wrong. The probability of a 1 is p(0001)/(p(0000)+p(0001)), where the probabilities are computed using Solomonoff induction. A program that outputs 0000 will be shorter in most languages than a program that outputs 0001, so 0 is the most likely next bit.

More generally, probability and prediction are equivalent by the chain rule. Given any 2 strings x followed by y, the prediction is p(y|x) = p(xy)/p(x).

-- Matt Mahoney, matmaho...@yahoo.com

From: Jim Bromer jimbro...@gmail.com
To: agi agi@v2.listbox.com
Sent: Wed, July 7, 2010 10:10:37 AM
Subject: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction

> Suppose you have sets of programs that produce two strings. [...] So if anyone has actually tried writing a program to output all possible programs (up to some feasible point) on a Turing Machine simulator, let me know how it went.
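Matt's formula can be checked numerically with a deliberately crude stand-in for the universal prior. In the sketch below the "programs" are just finite bit-patterns repeated forever, each weighted 2^-length as in the Solomonoff prior. This restricted program class is an assumption of the sketch (a real Solomonoff mixture is uncomputable, and these weights are not a proper semimeasure since non-minimal patterns are double-counted), but it reproduces the qualitative claim that 0 is the likely successor of 000.

```python
# Crude numeric check of p(1|000) = p(0001)/(p(0000)+p(0001)) using
# "programs" that are finite bit-patterns repeated forever, weighted
# 2^-length. Only the ratio matters; the raw sums are unnormalized.
from itertools import product

def prior(x, max_len=12):
    """Sum of 2^-|p| over patterns p (length <= max_len) whose infinite
    repetition begins with the string x."""
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            stream = (p * (len(x) // n + 1))[:len(x)]
            if stream == x:
                total += 2.0 ** (-n)
    return total

x = "000"
p0, p1 = prior(x + "0"), prior(x + "1")
print("p(next=0 | 000) =", p0 / (p0 + p1))   # ~0.72 with these settings
print("p(next=1 | 000) =", p1 / (p0 + p1))   # short all-zero patterns dominate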
Re: [agi] Hutter - A fundamental misdirection?
Gorrell and Webb describe a neural implementation of LSA that seems more biologically plausible than the usual matrix-factoring implementation. http://www.dcs.shef.ac.uk/~genevieve/gorrell_webb.pdf

In the usual implementation, a word-word matrix A is factored as A = USV^T, where S is diagonal (containing the singular values), and then the smaller elements of S are discarded. In the Gorrell model, U and V are the weights of a 3-layer neural network mapping words to words, and the nonzero elements of S represent the semantic space in the middle layer. As the network is trained, neurons are added to S. Thus the network is trained online in a single pass, unlike factoring, which is offline.

-- Matt Mahoney, matmaho...@yahoo.com

From: Gabriel Recchia grecc...@gmail.com
To: agi agi@v2.listbox.com
Sent: Wed, July 7, 2010 12:12:00 PM
Subject: Re: [agi] Hutter - A fundamental misdirection?

> Have there been attempts to develop computational models capable of LSA-style feats (e.g., constructing a vector space in which words with similar meanings tend to be relatively close to each other) that take into account basic facts about how neurons actually operate? If so, I would love to know about them. [...]
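The online-SVD idea Matt summarizes can be sketched with the Generalized Hebbian Algorithm (Sanger's rule), the learning rule behind Gorrell's incremental-LSA work. The dimensions, learning rate, and synthetic low-rank stream below are assumptions, and the sketch learns only the principal subspace of a vector stream rather than the full word-to-word network of the paper.

```python
# Generalized Hebbian Algorithm (Sanger's rule): learn the top-k
# principal directions of a data stream in a single online pass,
# with no matrix ever factored explicitly.
import numpy as np

rng = np.random.default_rng(0)
d, k, eta = 8, 3, 0.005            # input dim, components, learning rate
W = rng.normal(scale=0.1, size=(k, d))

# synthetic stream with a planted rank-3 structure
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal columns
for _ in range(30_000):
    x = basis @ (rng.normal(size=k) * np.array([3.0, 2.0, 1.0]))
    y = W @ x                                      # forward pass
    # Sanger's rule: Hebbian term minus deflation by earlier components
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

# rows of W should roughly align (up to sign) with the planted directions,
# ordered by variance -- the online analogue of keeping the largest
# elements of S
print(np.abs(W @ basis).round(2))
```

The deflation term is what orders the components by singular value, which corresponds to Matt's remark that neurons are "added to S" as training proceeds.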
Re: [agi] Solomonoff Induction is Not Universal and Probability is not Prediction
Matt,

But you are still saying that Solomonoff Induction has to be recomputed for each possible combination of bit values, aren't you? Although this doesn't matter when you are dealing with infinite computations in the first place, it does matter when you are wondering whether this has anything to do with AGI and compression efficiencies.

Jim Bromer

On Wed, Jul 7, 2010 at 5:44 PM, Matt Mahoney matmaho...@yahoo.com wrote:

> The probability of a 1 is p(0001)/(p(0000)+p(0001)), where the probabilities are computed using Solomonoff induction. [...] More generally, probability and prediction are equivalent by the chain rule. Given any 2 strings x followed by y, the prediction is p(y|x) = p(xy)/p(x). [...]
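A note on the recomputation worry, following Matt's chain-rule point: every conditional is a ratio of two values of the same prior M, so a single (notional) enumeration of programs fixes all predictions at once. For an observed 62-bit prefix you need only the prior of that one prefix and of its one-bit extensions, not a separate induction for each of the 2^62 possible prefixes:

```latex
P(y \mid x) = \frac{M(xy)}{M(x)},
\qquad
P(x_{63} = 1 \mid x_{1:62}) = \frac{M(x_{1:62}\,1)}{M(x_{1:62}\,0) + M(x_{1:62}\,1)}
```

The infeasibility lies in computing M at all, not in any per-prefix recomputation.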