[Pyro-users] Re: Pyro-users Digest, Vol 36, Issue 1

matthew studley Wed, 10 Jan 2007 01:46:21 -0800

> My question is what is the best way to train these networks? My
> current strategy is to do nothing until the game is over. I'll use a
> static algorithm to reliably score the end game state. Then I'll
> create a training corpus by taking the score and iterating through
> each move in the game, creating training sets in the form of
> [boardstate, finalscore].


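I think you'll run into problems with this training strategy: is every
move in the game really worth the final score? An early move would get
the same target as the move that actually decided the game. You might
want to look at the TD(lambda), Q-learning or Sarsa algorithms, which
deal with this credit-assignment problem. See the book "Reinforcement
Learning" by Sutton and Barto.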

It's online at:
http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html

Some work by IBM using TD(lambda) to train an ANN to play backgammon:

http://www.research.ibm.com/massive/tdl.html
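
To make the difference from the [boardstate, finalscore] scheme concrete,
here is a rough sketch of a TD(lambda) update over one finished game. It
is only an illustration under assumptions: the value function is a plain
linear model rather than a network, each board state is taken to be
already encoded as a list of numbers, and the function name and the
parameters alpha, gamma and lam are made up for this example (none of it
is Pyro API). Each position is trained toward the estimate of the next
position, and only the last one toward the final score.

```python
def td_lambda_episode(weights, states, final_score,
                      alpha=0.1, gamma=1.0, lam=0.7):
    """Return updated linear value weights after one finished game.

    weights:     current weights of the (assumed) linear value function
    states:      feature vectors, one per board position, in move order
    final_score: result from the static end-of-game scorer
    """
    w = list(weights)
    trace = [0.0] * len(w)  # eligibility trace, one entry per weight

    def value(x):
        # Linear estimate V(x) = w . x, using the current weights.
        return sum(wi * xi for wi, xi in zip(w, x))

    for t, x in enumerate(states):
        is_last = (t == len(states) - 1)
        # The only reward is the final score, received at the end.
        reward = final_score if is_last else 0.0
        next_value = 0.0 if is_last else value(states[t + 1])
        # TD error: how far this estimate is from the one that follows.
        delta = reward + gamma * next_value - value(x)
        # Decay old traces, then add the gradient of V(x), which is x
        # for a linear model.
        trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
        # Every weight moves in proportion to its trace.
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w
```

With lam=1 the earlier positions get full credit for the final score,
which is close to the original proposal; with lam=0 only the most recent
position is updated at each step. With a network in place of the linear
model, the gradient of the network output with respect to each weight
would take the place of x in the trace update.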

regards

Matt 

-- 
Dr Matthew Studley
Artificial Intelligence Group

Faculty of Computer Science, 
  Engineering and Mathematics
University of the West of England
Coldharbour Lane
Frenchay
Bristol
UK
BS16 1QY
=================================
tel: +44 (0) 11732 83177
mob: +44 (0) 7712 659022



