Re: [agi] Complexity of environment of agi agent
Hi Shane,

On Friday 19 September 2003 02:58, Shane Legg wrote:
> arnoud wrote:
> > How large can those constants be? How complex may the environment
> > maximally be for an ideal, but still realistic, AGI agent (thus not a
> > Solomonoff or AIXI agent) to still be successful? Does somebody know
> > how to calculate (and formalise) this?
>
> I'm not sure if this makes much sense. An ideal agent is not going to be
> a realistic agent. The bigger your computer and the better your software,
> the more complexity your agent will be able to deal with.

By an ideal realistic agent I meant the best software we can make on the
best hardware we can make.

> The only way I could see that it would make sense would be if you could
> come up with an algorithm and prove that it made the best possible usage
> of time and space in terms of achieving its goals. Then the constants you
> are talking about would be set by this algorithm and the size of the
> biggest computer you could get.

Yes, but then the agent is made already. I think some estimates of the
constants would help me to make design decisions. But if the constants can
only be determined afterwards, they are of no use to me. Not even an
educated guess?

But I think some things can be said. Suppose perception of the environment
is just a bit at a time:

...010100010010010111010101010...

In the random case, for any sequence of length l the number of possible
patterns is 2^l. Completely hopeless, unless the required prediction
precision also decreases exponentially with l. But that is not realistic:
you then know nothing, but you also want nothing.

> Yes, this defines the limiting case for Solomonoff Induction...

> > In the logarithmic case the number of possible patterns of length l
> > increases logarithmically with l: #p = constant * log(l). If the
> > constant is not too high this environment can be learned easily. There
> > is no need for vagueness.
>
> Not true. Just because the sequence is very compressible in a Kolmogorov
> sense doesn't imply that it's easy to learn.
> For example you could have some sequence where the computation of the
> n-th bit takes n^1000 computation cycles. There is only one pattern, and
> it's highly compressible as it has a pretty short algorithm, but there is
> no way you'll ever learn what the pattern is.

Do I have to see it as something like: the value of the n-th bit is a
(complex) function of all the former bits? Then it makes sense to me.
After some length l of the pattern, computation becomes unfeasible. But
this is not the way I intend my system to handle patterns. It learns the
pattern after a lot of repeated occurrences of it (in perception). And
then it just stores the whole pattern ;-) No compression there. But since
the environment is made out of smaller patterns, the pattern can be
formulated in terms of those smaller patterns, and thus save memory space.

In the logarithmic case: say there are 2 patterns of length 100, then
there are 3 patterns of length 1000. Let's say the 2 patterns of l = 100
are primitive and are stored bit by bit. The 3 patterns of length 1000,
however, can each be stored using 10 bits. The 4 patterns of length 10^4
can be stored using 16 bits each, etc. It isn't really different in the
linear case, except that the number of patterns that can be found in the
environment grows linearly with l, and there is need for abstraction
(i.e. storing as classes of sequences, lossy data compression).

> > I suppose the point I'm trying to make is that the complexity of the
> > environment is not all. It is also important to know how much of the
> > complexity can be ignored.
>
> Yes. The real measure of how difficult an environment is, is not the
> complexity of the environment, but rather the complexity of the simplest
> solution to the problem that you need to solve in that environment.

Yes, but in general you don't know the complexity of the simplest solution
of the problem in advance. It's more likely that you first get to know
what the complexity of the environment is.
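The storage arithmetic above (2 primitive patterns of length 100; patterns of length 1000 stored in 10 bits each) can be sketched directly. The concrete bit strings and pattern names below are invented for illustration; only the counting scheme comes from the text:

```python
import math

# A minimal sketch of the hierarchical storage idea: primitive patterns
# are stored bit by bit; a composite pattern is stored as a sequence of
# indices into the set of patterns one level down.

primitives = {                      # level 0: stored literally
    "p0": "01" * 50,                # length 100
    "p1": "10" * 50,                # length 100
}

composites = {                      # level 1: ten length-100 blocks each
    "q0": ["p0", "p1"] * 5,
    "q1": ["p0"] * 10,
    "q2": ["p1", "p0"] * 5,
}

def primitive_cost_bits(name):
    return len(primitives[name])    # stored bit by bit

def composite_cost_bits(name):
    # each block is an index into the primitive set: ceil(log2(#choices))
    index_bits = max(1, math.ceil(math.log2(len(primitives))))
    return len(composites[name]) * index_bits

print(primitive_cost_bits("p0"))    # 100 bits, stored literally
print(composite_cost_bits("q0"))    # 10 blocks * 1 bit = 10 bits
```

The next level would need about 10 * log2(3) ≈ 16 bits per length-10^4 pattern, matching the figure in the text, provided the ten indices are coded jointly rather than rounded up to whole bits one at a time.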
The strategy I'm proposing is: ignore everything that is too complex. Just
forget about it and hope you can; otherwise it's just bad luck. Of course
you want to do your very best to solve the problem, and that entails that
a complex phenomenon that can be handled must not be ignored a priori; it
must only be ignored if there is evidence that understanding that
phenomenon does not help solve the problem. In order for this strategy to
work you need to know the maximum complexity an agent can handle, as a
function of the resources of the agent: Cmax(R). And it would be very
helpful for making design decisions to know Cmax(R) in advance. You can
then build in that everything above Cmax(R) should be ignored; 'vette
pech' ('tough luck') as we say in Dutch if you then are not able to solve
the problem.

> Shane
>
> P.S. one of these days I'm going to get around to replying to your other
> emails to me!! sorry about the delay!

Ach, you're mailing now. I don't
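One way to operationalise "ignore everything above Cmax(R)" is to use the compressed size of an observation as a cheap stand-in for its complexity. This is only a sketch of the idea: the zlib proxy, the threshold value, and the function names are my assumptions, not part of the proposal in the thread:

```python
import os
import zlib

# Sketch: approximate the complexity of an observed byte string by its
# zlib-compressed length, and ignore anything above a resource-dependent
# threshold Cmax(R). The threshold here is an arbitrary illustrative value.

def complexity_estimate(observation: bytes) -> int:
    return len(zlib.compress(observation, level=9))

def worth_modelling(observation: bytes, cmax: int) -> bool:
    return complexity_estimate(observation) <= cmax

regular = b"01" * 500        # highly compressible, low estimated complexity
noisy = os.urandom(1000)     # incompressible, high estimated complexity

cmax = 100                   # hypothetical Cmax(R) for a small agent
print(worth_modelling(regular, cmax))   # True: keep modelling this
print(worth_modelling(noisy, cmax))     # False: 'vette pech', ignore it
```

Compressed length only upper-bounds Kolmogorov complexity, and, as Shane's n^1000 example shows, it says nothing about how hard the pattern is to learn; it is a filter, not a learnability test.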
Re: [agi] Complexity of environment of agi agent
Arnoud,

> > I'm not sure if this makes much sense. An ideal agent is not going to
> > be a realistic agent. The bigger your computer and the better your
> > software, the more complexity your agent will be able to deal with.
>
> With an ideal realistic agent I meant the best software we can make on
> the best hardware we can make.

In which case I think the question is pretty much impossible to answer.
Who knows what the best hardware we can make is? Who knows what the best
software we can make is?

> Do I have to see it as something like: the value of the n-th bit is a
> (complex) function of all the former bits? Then it makes sense to me.
> After some length l of the pattern, computation becomes unfeasible. But
> this is not the way I intend my system to handle patterns. It learns the
> pattern after a lot of repeated occurrences of it (in perception). And
> then it just stores the whole pattern ;-) No compression there. But
> since the environment is made out of smaller patterns, the pattern can
> be formulated in terms of those smaller patterns, and thus save memory
> space.

This is ok, but it does limit the sorts of things that your system is able
to do. I actually suspect that humans do a lot of very simple pattern
matching like you suggest and in some sense fake being able to work out
complex-looking patterns. It's just that we have seen so many patterns in
the past and are very good at doing fast, and sometimes slightly abstract,
pattern matching on a huge database of experience. Nevertheless you need
to be a little careful, because some very simple patterns that don't
repeat in a very explicit way could totally confuse your system:

1 2 3 . . . 9 1 2 3 . . .

Your system, if I understand correctly, would not see the pattern until it
had seen the whole cycle several times. Something like 5*100,000*2 =
1,000,000 characters into the sequence, and even then it would need to
remember 100,000 characters of information.
A human would see the pattern after just a few characters, with perhaps
some uncertainty as to what will happen after the 9. The total storage
required for the pattern by a human would also be far less than the
100,000 characters your system would need.

> Yes, but in general you don't know the complexity of the simplest
> solution of the problem in advance. It's more likely that you first get
> to know what the complexity of the environment is.

In general an agent doesn't know the complexity of its environment either.

> The strategy I'm proposing is: ignore everything that is too complex.
> Just forget about it and hope you can; otherwise it's just bad luck. Of
> course you want to do your very best to solve the problem, and that
> entails that a complex phenomenon that can be handled must not be
> ignored a priori; it must only be ignored if there is evidence that
> understanding that phenomenon does not help solve the problem. In order
> for this strategy to work you need to know the maximum complexity an
> agent can handle, as a function of the resources of the agent: Cmax(R).
> And it would be very helpful for making design decisions to know Cmax(R)
> in advance. You can then build in that everything above Cmax(R) should
> be ignored; 'vette pech' as we say in Dutch if you then are not able to
> solve the problem.

Why not just do this dynamically? Try to look at how much of the agent's
resources are being used for something and how much benefit the agent is
getting from it. If something else comes along that seems to have a better
ratio of benefit to resource usage, then throw away some of the older
stuff to free up resources for this new thing.

Shane

---
To unsubscribe, change your address, or temporarily deactivate your
subscription, please go to http://v2.listbox.com/member/[EMAIL PROTECTED]
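Shane's "do it dynamically" suggestion amounts to a utility-per-resource eviction policy. A minimal sketch, where the class, the benefit scores, and the memory costs are all invented for illustration:

```python
# Sketch of dynamic resource allocation: keep the learned items with the
# best benefit-to-resource ratio, evicting the worst when a new candidate
# arrives and the budget is full. Benefits and costs are hypothetical.

class ResourcePool:
    def __init__(self, budget):
        self.budget = budget
        self.items = {}           # name -> (benefit, cost)

    def used(self):
        return sum(cost for _, cost in self.items.values())

    def offer(self, name, benefit, cost):
        # Evict lowest-ratio items while the newcomer doesn't fit,
        # but only if the newcomer beats the current worst ratio.
        while self.used() + cost > self.budget and self.items:
            worst = min(self.items,
                        key=lambda k: self.items[k][0] / self.items[k][1])
            wb, wc = self.items[worst]
            if wb / wc >= benefit / cost:
                return False      # newcomer isn't worth an eviction
            del self.items[worst]
        if self.used() + cost <= self.budget:
            self.items[name] = (benefit, cost)
            return True
        return False

pool = ResourcePool(budget=10)
pool.offer("pattern-a", benefit=5, cost=6)
pool.offer("pattern-b", benefit=2, cost=4)
pool.offer("pattern-c", benefit=9, cost=6)   # evicts the weaker ratios
print(sorted(pool.items))
```

With these numbers both old patterns end up evicted, since neither a+c nor a+b+c fits in the budget and pattern-c's ratio dominates; the point is only that the decision is made at run time, not from a precomputed Cmax(R).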
Re: [agi] Complexity of environment of agi agent
Ciao Arnoud,

Perhaps my pattern wasn't clear enough:

1
2
3
4
.
.
.
00099
00100
00101
.
.
.
0
1
.
.
.
8
9

then repeat from the start again. However, each character is part of the
sequence. So the agent sees 10002300...

So the whole pattern in some sense is 100,000 numbers of 5 characters
each, giving a 500,000-character pattern of digits from 0 to 9. A human
can learn this reasonably easily, but your AI won't. It would take
something more like a megabyte to store the pattern. Actually, with the
overhead of all the rules, it would be much bigger.

Shane
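Shane's sequence can be generated and predicted in a few lines, which is exactly his point: the rule is tiny even though the literal cycle is 500,000 characters. The zero-padding and the 00000 starting value below are one reading of his listing, not something the email pins down:

```python
# Sketch of the counting pattern: 100,000 numbers, 5 characters each,
# concatenated into a 500,000-character cycle. Assumes zero-padded
# counters starting at 00000.

def generate(n_chars):
    out, total, i = [], 0, 0
    while total < n_chars:
        s = "%05d" % (i % 100000)
        out.append(s)
        total += len(s)
        i += 1
    return "".join(out)[:n_chars]

cycle = generate(500000)          # the literal half-megabyte pattern

# The 'human' predictor needs only a position counter, not the raw cycle:
def predict(position):
    number, offset = divmod(position % 500000, 5)
    return ("%05d" % number)[offset]

# Check the constant-size rule against the literal sequence at a sample
# of positions.
assert all(predict(k) == cycle[k] for k in range(0, 500000, 997))
print("constant-size rule reproduces the 500,000-character cycle")
```

Storing `cycle` verbatim costs half a megabyte of characters; `predict` carries the same information in a one-line rule plus a position, which is the gap Shane is pointing at.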
Re: [agi] Complexity of environment of agi agent
Hi Shane, how's the barn holding?

> Perhaps my pattern wasn't clear enough:
>
> 1
> 2
> 3
> 4
> .
> .
> .
> 00099
> 00100
> 00101
> .
> .
> .
> 0
> 1
> .
> .
> .
> 8
> 9
>
> then repeat from the start again. However, each character is part of the
> sequence. So the agent sees 10002300...

OK, now I see what you mean.

> So the whole pattern in some sense is 100,000 numbers of 5 characters
> each, giving a 500,000-character pattern of digits from 0 to 9. A human
> can learn this reasonably easily, but your AI won't. It would take
> something more like a megabyte to store the pattern. Actually, with the
> overhead of all the rules, it would be much bigger.

My agent has these patterns for breakfast! I certainly hope so, at least.
Well, there is a very simple rule here, namely: just add 1 arithmetically
to the last 5 inputs, and then you successfully predict the next five
inputs. Can my system represent that rule? I think it can. If I simplify
my system so that it does not act, just perceives and predicts, there are
2 neural networks in one module (and I only need one):

Abstraction(C, I) -> new C
Prediction(C, I) -> predicted I

with C being the context vector (of bits), and I the input vector (of
bits). Abstraction must make sure that C has all the relevant information
stored in it. C may contain the last 10 inputs. Or the five of the last
block, and a counter for where it is in the new block. Prediction must
then perform the operation of adding 1 and give as output the value at
the counter place plus one. E.g. 00012 was the last block, and the
counter is 4. This is stored in C. Prediction calculates 00013 out of C
and takes the character at position four plus one, being '3'.

Abstraction({00012, 3}, {1}) -> {00012, 4}
Prediction({00012, 4}, {1}) -> {3}

That is about how it will work, I suppose. If you see an error (or you
have other patterns to think about) please say so. Of course, Prediction
and Abstraction will only work this way after a lot of training
(prediction error minimisation).
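The Abstraction/Prediction pair described above can be written out as plain functions for this particular pattern. These are hand-coded rules standing in for the trained networks; the context representation (last complete block, counter) follows the {00012, 4} example in the text, and everything else is my guess at the intended semantics:

```python
# Hand-coded stand-ins for trained Abstraction and Prediction networks,
# specialised to the add-one counter pattern. Context C = (last complete
# 5-character block, position counter within the current block).

def abstraction(context, _inp):
    block, counter = context
    if counter == 4:                      # current block is now complete
        return ("%05d" % (int(block) + 1), 0)
    return (block, counter + 1)

def prediction(context, _inp):
    block, counter = context
    next_block = "%05d" % (int(block) + 1)
    return next_block[counter]            # next character of the next block

# Following the example in the text: last block 00012, counter 4.
print(prediction(("00012", 4), None))     # '3', the last character of 00013
print(abstraction(("00012", 3), None))    # ('00012', 4)
```

Whether two small networks can actually learn these two mappings from prediction-error feedback is of course the open question; the sketch only shows that the rule is representable in the (C, I) interface.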
(A nice deus ex machina I can always fall back on ;-)

Maybe I've misled you with my recent mails' talk about storing patterns.
In a way it does that, but not explicitly: it stores a mechanism for how
to extend a perceived sequence, i.e. predict the next step.

Hoi,
Arnoud