Chris S wrote:
Can a Conx NN currently be trained via reinforcement learning
techniques, such as TD(lambda)?
Chris,
Yes, there is the beginning of some support in Conx for similar
techniques. You can at least use this as an example. Take a look at the
SigmaNetwork in pyrobot/brain/conx.py. It is an example of a type of
CRBP (complementary reinforcement backprop). It works like this:
1. Propagate the activations through the network.
2. Flip a biased coin for each output, using the output as the
probability of getting 1. For example, if the output is .75, then the
flip will be 1.0 with probability .75, otherwise it will be 0.0.
3. Sum up those 1 and 0 values. Use those as "votes", where the majority
rules; i.e., if a majority have a value of 1, then the network says 1.
4. Compare that vote with the target (just one value in this example).
If it matches, then train the net with the vector from step 2 as a
target vector. If it doesn't match, then use the complement of the
vector from step 2 as the target vector.
5. Set the error of the output layer to the difference between the
target vector from step 4 and the output activations.
6. Do backprop as usual.
7. Repeat.
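To make steps 2 through 5 concrete, here is a minimal sketch in plain Python. It is not the SigmaNetwork code from conx.py; the function name crbp_targets and its arguments are made up for illustration, and it assumes the error is simply target minus activation and that a tied vote counts as 0.

```python
import random

def crbp_targets(outputs, target, rng=random):
    """Hypothetical helper: one CRBP step (steps 2-5) for a single 0/1 target.

    outputs: list of output activations, each in [0, 1]
    target:  the desired answer, 0 or 1
    Returns (target_vector, errors).
    """
    # Step 2: biased coin flip per output; the activation is P(flip == 1).
    flips = [1.0 if rng.random() < o else 0.0 for o in outputs]
    # Step 3: majority vote over the flips (assumption: a tie counts as 0).
    vote = 1 if sum(flips) > len(flips) / 2.0 else 0
    # Step 4: if the vote is right, reinforce the flips;
    # otherwise train toward their complement.
    if vote == target:
        target_vector = flips
    else:
        target_vector = [1.0 - f for f in flips]
    # Step 5: output-layer error (assumed here to be target minus activation).
    errors = [t - o for t, o in zip(target_vector, outputs)]
    return target_vector, errors
```

The errors list would then be handed to the usual backprop pass (step 6). With an odd number of outputs, such as the 11 used in the XOR example, ties cannot occur.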
Here is a CRBP network for learning XOR with the SigmaNetwork:
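from pyrobot.brain.conx import SigmaNetwork
net = SigmaNetwork()
# 2 inputs, 5 hidden units, 11 output units (the outputs are summed as votes)
net.addLayers(2, 5, 11)
net.setInputs( [[0, 0], [0, 1], [1, 0], [1, 1]] )
net.setTargets( [[0], [1], [1], [0]] )
net.train()
# After training, turn off learning and step through each pattern
net.interactive = 1
net.learning = 0
net.sweep()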
Notice that the output layer has 11 nodes. You can have as many as you
want, because it is the sum that is important. (The hidden layer can be
any size too.) The network is free to create a self-organized output
representation. See:
Ackley and Littman
http://www.cs.duke.edu/~mlittman/docs/nips-crbp.ps
for more information on CRBP.
The SigmaNetwork shows how you can separate the backprop steps in order
to get in there and set the target and error independently of the normal
BP process.
Hope that helps,
-Doug
Regards,
Chris
_______________________________________________
Pyro-users mailing list
[email protected]
http://emergent.brynmawr.edu/mailman/listinfo/pyro-users