Re: [agi] Complexity of environment of agi agent

2003-09-24 Thread arnoud
 A more natural way seems to me: predict what state the PRS is in the next
 time a prediction failure (of the base level) happens. The PRS can be seen
 as containing all the information about the future (also future PRSes)
 until a prediction failure. Apparently the environment then switches to
 some new unforeseen behaviour. Until then the environment did not contain
 any new information; it can be compressed to the PRS (and the Abstraction
 and Prediction neural networks which are fixed during run time).

A correction:
The Prediction neural network computes the predicted next input on the basis
of the most recent input also (Pr: C, I -> Ipred). Therefore, even if the
prediction succeeds, it can be that the PRS does not contain all the
information needed to construct the future input.
The sequence until a prediction failure is compressible to the PRS plus the
present input. Because of this the PRS can be viewed as (and operate as) an
abstract classification without detail information (which is stored in the
present input I), although the PRS can also hold the detail information in
very simple environments. So it's not an unimportant point.

Second:
The PRS that is predicted is the PRS (= context C) after the prediction 
failure.

By the way:
I'm going to use BPTT (backpropagation through time) to train the recurrent
neural network. According to Schmidhuber at IDSIA there is a better recurrent
neural network system, Long Short-Term Memory (LSTM). This network + training
technique is claimed to be able to look further back (1000 steps instead of
just 10 for BPTT). Is anyone familiar with this? Can it be put in the same
layering architecture that I am aiming at, i.e. does it have an abstraction
state?

Bye,
Arnoud



Re: [agi] Complexity of environment of agi agent

2003-09-22 Thread arnoud
For example
 you could have some sequence where the computation of the n-th
 bit takes n^1000 computation cycles.  There is only one pattern and
 it's highly compressible as it has a pretty short algorithm, however
 there is no way you'll ever learn what the pattern is.

To say a bit more/correct myself/repeat what you said/repeat myself:
Every long sequence where the number of computation cycles needed to compute
the nth element is not bounded by some constant C is not practically
compressible. When it is bounded, the amount of information that the nth
element depends on is also limited by some constant.
If computation time grows like 2^n, n^1000, n or log(n), computation will
become infeasible in the long run. For log(n) that may take a very long time,
but it still eventually becomes impossible.
This seems to me to be a more useful notion of compressibility than
Kolmogorov's: feasible compressibility.
 
If the nth element just depends on the near past (say the last 5 elements or
so) and the number of computations is bounded by some not very high constant,
finding a good prediction function is easy (with neural networks at least).
But that is not the case in general. There may have been cardinal events in
the past that need to be remembered in order to predict well. In that case
computation can still be feasible. Of course if the determining events are too
far in the past it's impossible to find the causal connection.

The trick is to identify those cardinal events, and then remember them,
completely or just in part.
For this I use a recurrent neural network system (Abstraction/Prediction). It
is trained on prediction success. There is always a fixed number of
calculations based on a fixed amount of information.
Abstraction(input, context) -> new context
Prediction(input, context) -> predicted new input
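
For concreteness, a rough sketch in Python/numpy of what one such module could
look like. The layer sizes, activations and the simplistic one-step weight
update are my own illustrative assumptions, not a description of the actual
system (which would train with BPTT or LSTM, as discussed elsewhere in this
thread):

    # A rough sketch of one Abstraction/Prediction module (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_CTX = 8, 16                      # input bits I, context bits C

    Wa = rng.normal(0, 0.1, (N_CTX, N_IN + N_CTX))   # Abstraction weights
    Wp = rng.normal(0, 0.1, (N_IN, N_IN + N_CTX))    # Prediction weights

    def abstraction(i, c):                   # Abstraction(input, context) -> new context
        return np.tanh(Wa @ np.concatenate([i, c]))

    def prediction(i, c):                    # Prediction(input, context) -> predicted next input
        return 1.0 / (1.0 + np.exp(-Wp @ np.concatenate([i, c])))

    def run(sequence, lr=0.1):
        """Fixed amount of computation per step: predict, observe, update context."""
        global Wp
        c = np.zeros(N_CTX)
        for i_now, i_next in zip(sequence, sequence[1:]):
            p = prediction(i_now, c)
            err = i_next - p                 # prediction error drives the training
            # one-step gradient on Prediction only; the real system would use
            # BPTT (or LSTM) so that Abstraction is trained through time as well
            Wp = Wp + lr * np.outer(err * p * (1 - p), np.concatenate([i_now, c]))
            c = abstraction(i_now, c)        # the context c plays the role of the PRS

    run([rng.integers(0, 2, N_IN).astype(float) for _ in range(200)])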

Alas, a known problem with recurrent networks is that with backpropagation of
errors the contributions of the far past to the error gradient go to zero too
fast. (If I remember correctly; I may have stated this problem wrong.)
A module in my system on its own can therefore only predict well on the basis
of the near past. That near past and the near future must be feasibly
compressible. Whether the rest is does not matter for this method.

A method to help solve (or live with) this problem:
The state of the recurrent neural network (context) can be seen as an abstract
description of the state the (perceived) sequence is in. The recent past is
compressed into it (it can be said to simulate the Markov property); all
relevant information is kept because of the training on prediction, and
irrelevant information is filtered out. It can be seen as a classification
of the near past sequence (and the expected near future), which I mentioned
in earlier mails.
Sutton calls such state descriptions that help to predict 'predictive
representations of state', PRS. These states can themselves be predicted, by a
higher level module. To get a grip on a farther past than the base level can,
this module must run 'slower', i.e. not try to predict every PRS every time.
It must predict the state of the PRS every 2, 10 or so cycles of the base
level. Otherwise it will have the same time horizon as the base level.

A more natural way seems to me: predict what state the PRS is in the next time
a prediction failure (of the base level) happens. The PRS can be seen as
containing all the information about the future (also future PRSes) until a
prediction failure. Apparently the environment then switches to some new
unforeseen behaviour. Until then the environment did not contain any new
information; it can be compressed to the PRS (and the Abstraction and
Prediction neural networks, which are fixed during run time).

The higher level module predicts the new PRS of the base level. It thus
predicts in which abstract class the sequence will fall when it 'switches' to
a new behaviour. If it is successful it can be said to have a farther
predictive horizon than the base module. The drawback is that the predictions
are stated less precisely, in terms of a class of sequences, the PRS (if the
environment is so complex that information had to be filtered out).
Will it work? Only if the environment (and therefore the sequence) is simple
enough. The perceived sequence must be feasibly compressible in the abstract
representations of the PRS (of the base level), in the near past and the near
future. The horizons are farther away than for the base module.
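
To make the control flow of this layering concrete, a toy sketch (my own
illustration; the 'modules' here are trivial last-symbol predictors standing in
for the real Abstraction/Prediction networks). The point is only that the
higher module sees, and would itself predict, the base level's PRS at base-level
prediction failures rather than at every time step:

    # Toy sketch of the layering: the higher module only observes the base
    # module's PRS when the base level's prediction fails.
    from collections import defaultdict

    class ToyModule:
        """Predicts the next symbol from the current one; its 'PRS' is simply
        the last symbol seen."""
        def __init__(self):
            self.table = defaultdict(lambda: None)   # learned transitions
            self.prs = None
        def step(self, symbol):
            predicted = self.table[self.prs]
            failed = predicted is not None and predicted != symbol
            self.table[self.prs] = symbol            # learn/overwrite the transition
            self.prs = symbol                        # new context / PRS
            return failed

    def run_hierarchy(sequence):
        base, higher = ToyModule(), ToyModule()
        for s in sequence:
            if base.step(s):                 # base-level prediction failure:
                higher.step(base.prs)        # hand the new PRS to the higher level

    run_hierarchy(list("abcabcabdabcabc"))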

After that a third module can be built on the second to predict the PRS of the
second module, and so on.
If the environment allows it, the future will be predicted further and
further, but also more and more vaguely, or maybe better said, in more general
terms (in complex but still feasible environments). The input of the top
level will be the cause of many different sequences.

This method seems to me to make long term planning possible, in general terms,
in feasibly compressible environments.

Re: [agi] Complexity of environment of agi agent

2003-09-20 Thread arnoud
Hi Shane,

On Friday 19 September 2003 02:58, Shane Legg wrote:
 arnoud wrote:
  How large can those constants be? How complex may the environment be
  maximally for an ideal, but still realistic, agi agent (thus not a
  Solomonoff or AIXI agent) to be still successful? Does somebody know how to
  calculate (and formalise) this?

 I'm not sure if this makes much sense.  An ideal agent is not going
 to be a realistic agent.  The bigger your computer and the better
 your software, the more complexity your agent will be able to deal with.

With an ideal realistic agent I meant the best software we can make on the 
best hardware we can make.


 The only way I could see that it would make sense would be if you
 could come up with an algorithm and prove that it made the best
 possible usage of time and space in terms of achieving its goals.
 Then the constants you are talking about would be set by this
 algorithm and the size of the biggest computer you could get.

Yes, but then the agent is made already. I think some estimates of the 
constants would help me to make design decisions. But if the constants can 
only be determined afterwards, they are of no use to me. 


  Not even an educated guess?
 
  But I think some things can be said:
  Suppose perception of the environment is just a bit at a time:
  ...010100010010010111010101010...
 
  In the random case: for any sequence of length l the number of possible
  patterns is 2^l. Completely hopeless, unless the required prediction precision
  also decreases exponentially with l. But that is not realistic. You then
  know nothing, but you want nothing also.

 Yes, this defines the limiting case for Solomonoff Induction...

  in the logarithmic case: the number of possible patterns of length l
  increases logarithmically with l: #p < constant * log(l). If the constant
  is not too high this environment can be learned easily. There is no need
  for vagueness

 Not true.  Just because the sequence is very compressible in a
 Kolmogorov sense doesn't imply that it's easy to learn.  For example
 you could have some sequence where the computation of the n-th
 bit takes n^1000 computation cycles. There is only one pattern and
 it's highly compressible as it has a pretty short algorithm, however
 there is no way you'll ever learn what the pattern is.

Do I have to see it as the value of the nth bit being a (complex) function of
all the former bits? Then it makes sense to me. After some length l of the
pattern, computation becomes unfeasible.
But this is not the way I intend my system to handle patterns. It learns the
pattern after a lot of repeated occurrences of it (in perception). And then it
just stores the whole pattern ;-) No compression there. But since the
environment is made out of smaller patterns, the pattern can be formulated in
those smaller patterns, and thus save memory space.
In the logarithmic case: say there are 2 patterns of length 100, then there
are 3 patterns of length 1000. Let's say the 2 patterns of l = 100 are
primitive and are stored bit by bit. The 3 patterns however can be stored
using 10 bits each. The 4 patterns of length 10^4 can be stored using 16
bits each, etc.
It isn't really different in the linear case, except that the number of
patterns that can be found in the environment grows linearly with l, and
there's need for abstraction (i.e. storing as classes of sequences, lossy data
compression).
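
A quick check of that arithmetic (my own snippet, just reproducing the counting
argument): a pattern of length 10^(k+1) is 10 blocks, each a reference to one
of the patterns of length 10^k.

    # Storage cost of one pattern expressed in terms of the smaller patterns.
    import math

    def bits_per_pattern(num_smaller_patterns, blocks=10):
        return blocks * math.log2(num_smaller_patterns)

    print(bits_per_pattern(2))   # length-1000 patterns over 2 primitives -> 10.0 bits
    print(bits_per_pattern(3))   # length-10^4 patterns over 3 patterns   -> ~15.8 (about 16) bits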


  I suppose the point I'm trying to make is that complexity of the
  environment is not all. It is also important to know how much of the
  complexity can be ignored.

 Yes.  The real measure of how difficult an environment is is not
 the complexity of the environment, but rather the complexity of
 the simplest solution to the problem that you need to solve in
 that environment.

Yes, but in general you don't know the complexity of the simplest solution of
the problem in advance. It's more likely that you get to know first what the
complexity of the environment is.
The strategy I'm proposing is: ignore everything that is too complex. Just
forget about it and hope you can; otherwise it's just bad luck. Of course you
want to do the very best to solve the problem, and that entails that some
complex phenomenon that can be handled must not be ignored a priori; it must
only be ignored if there is evidence that understanding that phenomenon does
not help solve your problem.
In order for this strategy to work you need to know the maximum complexity an
agent can handle, as a function of the resources of the agent: Cmax(R). And it
would be very helpful for making design decisions to know Cmax(R) in advance.
You can then build in that everything above Cmax(R) should be ignored; 'vette
pech' ('tough luck') as we say in Dutch if you then are not able to solve the
problem.


 Shane

 P.S. one of these days I'm going to get around to replying to your
 other emails to me!!  sorry about the delay!

Ach, you're mailing now. I don't 

Re: [agi] Complexity of environment of agi agent

2003-09-20 Thread Shane Legg
Arnoud,

I'm not sure if this makes much sense.  An ideal agent is not going
to be a realistic agent.  The bigger your computer and the better
your software, the more complexity your agent will be able to deal with.
With an ideal realistic agent I meant the best software we can make on the 
best hardware we can make.
In which case I think the question is pretty much impossible to
answer.  Who knows what the best hardware we can make is?  Who
knows what the best software we can make is?

Do I have to see it as the value of the nth bit being a (complex) function of
all the former bits? Then it makes sense to me. After some length l of the
pattern, computation becomes unfeasible.
But this is not the way I intend my system to handle patterns. It learns the
pattern after a lot of repeated occurrences of it (in perception). And then it
just stores the whole pattern ;-) No compression there. But since the
environment is made out of smaller patterns, the pattern can be formulated in
those smaller patterns, and thus save memory space.
This is ok, but it does limit the sorts of things that your system
is able to do.  I actually suspect that humans do a lot of very
simple pattern matching like you suggest and in some sense fake
being able to work out complex looking patterns.  It's just that
we have seen so many patterns in the past and that we are very
good at doing fast and sometimes slightly abstract pattern matching
on a huge database of experience.   Nevertheless you need to be a
little careful because some very simple patterns that don't
repeat in a very explicit way could totally confuse your system:
123.9123

Your system, if I understand correctly, would not see the pattern
until it had seen the whole cycle several times.  Something like
5*100,000*2 = 1,000,000 characters into the sequence and even then
it would need to remember 100,000 characters of information.  A
human would see the pattern after just a few characters with
perhaps some uncertainty as to what will happen after the 9.
The total storage required for the pattern with a human would be
far less than the 100,000 characters your system would need, too.

Yes, but in general you don't know the complexity of the simplest solution of 
the problem in advance. It's more likely that you get to know first what the 
complexity of the environment is.
In general an agent doesn't know the complexity of its
environment either.

The strategy I'm proposing is: ignore everything that is too complex. Just
forget about it and hope you can; otherwise it's just bad luck. Of course you
want to do the very best to solve the problem, and that entails that some
complex phenomenon that can be handled must not be ignored a priori; it must
only be ignored if there is evidence that understanding that phenomenon does
not help solve your problem.
In order for this strategy to work you need to know the maximum complexity an
agent can handle, as a function of the resources of the agent: Cmax(R). And it
would be very helpful for making design decisions to know Cmax(R) in advance.
You can then build in that everything above Cmax(R) should be ignored; 'vette
pech' ('tough luck') as we say in Dutch if you then are not able to solve the
problem.
Why not just do this dynamically?  Try to look at how much of
the agent's resources are being used for something and how much
benefit the agent is getting from this.  If something else comes
along that seems to have a better ratio of benefit to resource
usage then throw away some of the older stuff to free up resources
for this new thing.
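
A minimal sketch of what that dynamic policy could look like (my own
illustration; the benefit and cost numbers would have to come from something
like measured prediction-error reduction and memory/compute use):

    # Keep only the learned items with the best benefit/resource ratio; evict
    # the worst ones when a more promising candidate arrives. Illustrative only.
    def admit(items, candidate, budget):
        """items/candidate: dicts with 'benefit' and 'cost'; returns new item list."""
        items = sorted(items, key=lambda it: it['benefit'] / it['cost'])
        used = sum(it['cost'] for it in items)
        while items and used + candidate['cost'] > budget:
            worst = items[0]
            if worst['benefit'] / worst['cost'] >= candidate['benefit'] / candidate['cost']:
                return items                         # candidate not worth an eviction
            used -= items.pop(0)['cost']
        if used + candidate['cost'] <= budget:
            items.append(candidate)
        return items

    memory = admit([{'benefit': 1.0, 'cost': 5}, {'benefit': 0.2, 'cost': 4}],
                   {'benefit': 0.9, 'cost': 3}, budget=10)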
Shane



Re: [agi] Complexity of environment of agi agent

2003-09-20 Thread Shane Legg
Ciao Arnoud,

Perhaps my pattern wasn't clear enough

00001
00002
00003
00004
.
.
.
00099
00100
00101
.
.
.
99990
99991
.
.
.
99998
99999
then repeat from the start again.  However each character is
part of the sequence.  So the agent sees 000010000200003...
So the whole pattern in some sense is 100,000 numbers
each of 5 characters giving a 500,000 character pattern
of digits from 0 to 9.  A human can learn this reasonably
easily but your AI won't.  It would take something more
like a megabyte to store the pattern.  Actually with
the overhead of all the rules it would be much bigger.
Shane



Re: [agi] Complexity of environment of agi agent

2003-09-20 Thread arnoud
Hi Shane, how's the barn holding,

 Perhaps my pattern wasn't clear enough

 00001
 00002
 00003
 00004
 .
 .
 .
 00099
 00100
 00101
 .
 .
 .
 99990
 99991
 .
 .
 .
 99998
 99999

 then repeat from the start again.  However each character is
 part of the sequence.  So the agent sees 000010000200003...


OK, now I see what you mean.

 So the whole pattern in some sense is 100,000 numbers
 each of 5 characters giving a 500,000 character pattern
 of digits from 0 to 9.  A human can learn this reasonably
 easily but your AI won't.  It would take something more
 like a megabyte to store the pattern.  Actually with
 the overhead of all the rules it would be much bigger.

My agent has these patterns for breakfast! I certainly hope so, at least.

Well, there is a very simple rule here, namely just add 1 arithmetically to the
last 5 inputs, and then you successfully predict the next five inputs.
Can my system represent that rule? I think it can. If I simplify my system so
that it does not act, just perceives and predicts, there are 2 neural networks
in one module (and I only need one module):
Abstraction(C, I) -> new C
Prediction(C, I) -> predicted I
with C being the context vector (of bits), and I the input vector (of bits).
Abstraction must make sure that C has all the relevant information stored in
it. C may contain the last 10 inputs. Or the five of the last block, and a
counter for where it is in the new block. Prediction must then perform the
operation of adding 1 and give as output the character at position counter
plus one. E.g. 00012 was the last block, the counter is 4. This is stored in C.
Prediction calculates 00013 out of C and takes the fifth (fourth plus one)
character, which is '3'.
Abstraction({00012,3},{1}) -> {00012,4}
Prediction({00012,4},{1}) -> {3}
That is about how it will work, I suppose. If you see an error (or you have
other patterns to think about) please say so. Of course Prediction and
Abstraction will only work this way after a lot of training (prediction error
minimisation). (A nice deus ex machina I can always fall back on ;-)
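
To check the worked example, here is the rule written out by hand (my own
sketch; in the system itself Abstraction and Prediction would of course have to
learn this mapping through prediction-error minimisation rather than have it
coded in). The context C is (last complete 5-digit block, number of characters
of the current block seen so far); note that in this simple environment
Prediction does not even need the current input, C already carries everything:

    # Hand-coded version of the mapping the trained networks would represent.
    def abstraction(context, digit):
        """Fold the newly observed digit into the context; when a 5-digit block
        is complete it becomes the new 'last block' (wrap-around at 99999 ignored)."""
        block, pos = context
        pos += 1
        if pos == 5:
            return (f"{int(block) + 1:05d}", 0)
        return (block, pos)

    def prediction(context, digit):
        """Add 1 to the last block and read off the character at the current position."""
        block, pos = context
        current = f"{int(block) + 1:05d}"
        return current[pos]

    # Reproduce Arnoud's example and continue the sequence from there:
    assert abstraction(("00012", 3), "1") == ("00012", 4)   # Abstraction({00012,3},{1}) -> {00012,4}
    c, digit = ("00012", 4), "1"
    assert prediction(c, digit) == "3"                      # Prediction({00012,4},{1}) -> {3}
    out = []
    for _ in range(8):
        p = prediction(c, digit)
        c = abstraction(c, p)                # feed the correct prediction back in
        out.append(p)
        digit = p
    print("".join(out))                      # -> "30001400" (tail of 00013, then 00014, 00015 begins)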

Maybe I've misled you with my recent mails' talk about storing patterns. In a
way it does that, but not explicitly. It stores a mechanism for how to extend
a perceived sequence, i.e. predict the next step.

Hoi,
Arnoud




Re: [agi] Complexity of environment of agi agent

2003-09-19 Thread Shane Legg
arnoud wrote:

How large can those constants be? How complex may the environment be maximally 
for an ideal, but still realistic, agi agent (thus not a Solomonoff or AIXI
agent) to be still successful? Does somebody know how to calculate (and
formalise) this?
I'm not sure if this makes much sense.  An ideal agent is not going
to be a realistic agent.  The bigger your computer and the better
your software, the more complexity your agent will be able to deal with.
The only way I could see that it would make sense would be if you
could come up with an algorithm and prove that it made the best
possible usage of time and space in terms of achieving its goals.
Then the constants you are talking about would be set by this
algorithm and the size of the biggest computer you could get.

Not even an educated guess?

But I think some things can be said:
Suppose perception of the environment is just a bit at a time: 
...010100010010010111010101010...

In the random case: for any sequence of length l the number of possible 
patterns is 2^l. Completely hopeless, unless the required prediction precision
also decreases exponentially with l. But that is not realistic. You then know
nothing, but you want nothing also.
Yes, this defines the limiting case for Solomonoff Induction...


in the logarithmic case: the number of possible patterns of length l increases 
logarithmically with l: #p < constant * log(l). If the constant is not too
high this environment can be learned easily. There is no need for vagueness 
Not true.  Just because the sequence is very compressible in a
Kolmogorov sense doesn't imply that it's easy to learn.  For example
you could have some sequence where the computation of the n-th
bit takes n^1000 computation cycles.  There is only one pattern and
it's highly compressible as it has a pretty short algorithm, however
there is no way you'll ever learn what the pattern is.

I suppose the point I'm trying to make is that complexity of the environment 
is not all. It is also important to know how much of the complexity can be
ignored.
Yes.  The real measure of how difficult an environment is is not
the complexity of the environment, but rather the complexity of
the simplest solution to the problem that you need to solve in
that environment.
Shane

P.S. one of these days I'm going to get around to replying to your
other emails to me!!  sorry about the delay!


RE: [agi] Complexity of environment of agi agent

2003-09-18 Thread Ben Goertzel


 How complex may the environment
 be maximally
 for an ideal, but still realistic, agi agent (thus not a
 Solomonoff or AIXI
 agent) to be still successful? Does somebody know how to calculate (and
 formalise) this?

 Bye,
 Arnoud

There are two different questions here, and I'm not sure which one you mean

Given a set of computational resources R, we can ask either

1)
what is a complexity level C so that, for any environment E of complexity
level C, there is *some* AI system running on R that can predict events in E
reasonably well (i.e. assuming that R is specialized for E)?

or

2)
what is a complexity level C so that, for some AI system X running on R, and
for *any* environment E of complexity C, X can predict events in E
reasonably well (i.e. assuming that R is not specialized for E)?
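
The difference between the two questions is just the order of the quantifiers.
In rough shorthand (my notation, not Ben's; complexity(E) stands for whichever
complexity measure is meant):

    % Question 1: E is fixed first, so the system may be specialized to E
    \exists C\ \forall E\,[\mathrm{complexity}(E) \le C]\ \exists X \text{ on } R:\ X \text{ predicts } E \text{ reasonably well}

    % Question 2: one fixed system X must handle every such E
    \exists C\ \exists X \text{ on } R\ \forall E\,[\mathrm{complexity}(E) \le C]:\ X \text{ predicts } E \text{ reasonably well}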


The human brain for instance is highly specialized to certain environments,
and cannot predict equally intelligently in other environments of comparable
a priori complexity to the environments for which it's specialized.

Note that my computational resources R include both space and time
resources.  Unless the time resources are unrealistically ample, this rules
out answering 2) by using an AI system X that does an AIXItl style search
through all programs that could run on resources (R - epsilon) and finding
the optimal one for the given environment E.

Of course, contemporary mathematics does not give us answers to either of
these questions.  Nobody knows


-- Ben G



Re: [agi] Complexity of environment of agi agent

2003-09-18 Thread Bill Hibbard
Hi Arnoud,

Two points:

1. Most real environments are open, non-linear systems,
which means they generally have some characteristic time
horizon beyond which prediction is no better than a guess
based on historical statistics, no matter how much
computation is used. For the weather, for example,
predictions beyond two or three weeks will never be
much better than just using climatological averages.

2. How well an agent has to do is often relative to other
competing agents, or, when there is no competition, all
an agent can expect is to do as well as possible. For example,
in predicting the stock market an agent only has to be
better than competing agents to make money. And in
predicting the weather, there is a real limit on how well
an agent can do.

Cheers,
Bill
--
Bill Hibbard, SSEC, 1225 W. Dayton St., Madison, WI  53706
[EMAIL PROTECTED]  608-263-4427  fax: 608-263-6738
http://www.ssec.wisc.edu/~billh/vis.html

On Thu, 18 Sep 2003, arnoud wrote:

 Dear AGI,

 What is the maximal complexity of an environment in which prediction of future
 events in the environment is still computationally feasible?

 In most 'realistic' environments prediction of the near future needs to be
 very precise and prediction of the far future can be vaguer, i.e. a large
 class of event types satisfies the prediction, in order for an agi agent to
 achieve its goals. (If prediction on the very long term also needs to be
 detailed and precise, the environment is impossible for any agent, is it not?).

 In a formula:
 (maximal vagueness of prediction that is allowed by the agi agent, in order to
 plan and act successfully to achieve goals) / (time scale of horizon of
 prediction) = some constant.

 Vagueness of prediction is the number of perception event types that satisfy
 the prediction. (Count the perception event types at bit level.)

 A somewhat equivalent formula:
 (number of patterns of about time length l that occur in the environment) /
 (about l) = some constant

 A pattern here is a class of noisy variations on a pattern (10% noise, 20%
 noise? The smaller the constant gets, the higher the noise ratio can be.).

 How large can those constants be? How complex may the environment be maximally
 for an ideal, but still realistic, agi agent (thus not a Solomonoff or AIXI
 agent) to be still successful? Does somebody know how to calculate (and
 formalise) this?

 Bye,
 Arnoud




