Hi Andy,

An explanation of the Gittins index:

Imagine you are in a casino that has N slot machines. Each machine is
played by inserting a one-dollar coin and pulling its lever. Every time
you pull the lever, the machine might spit out some amount of money,
governed by an underlying (unknown) stochastic process.

Suppose you have K dollars in your pocket available for playing. In what
order should you play the machines to maximize your expected reward? The
more you explore the various machines, the more you learn about the
stochastic processes that govern them. However, after a while, you want to
exploit your knowledge and simply play the machine with the highest
expected reward.
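
To make the setup concrete, here is a small sketch of the casino in code.
The Bernoulli-style payouts and the numbers (N = 5 machines, K = 100 coins)
are just my own made-up assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)

N = 5                                        # number of slot machines
K = 100                                      # coins in your pocket (total pulls)
true_payout = rng.uniform(0.0, 1.0, size=N)  # hidden from the player

def pull(machine):
    """Insert a coin into `machine`; pays 1 dollar with its hidden probability."""
    return float(rng.random() < true_payout[machine])

# The question is: given the history of pulls and rewards so far,
# which machine should receive the next coin?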

The Gittins index provides an answer to this question by computing an
index for each machine from its past history of plays and wins. Once the
indices are computed, you simply play the machine with the highest index.
The Gittins index is a special case of the more general "dynamic allocation
indices", which provide index-based policies for situations where you need
to make such exploration-versus-exploitation trade-offs.
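
To show the shape of such an index policy in code, here is a rough sketch
built on the pull() function above. Note that the index computed here is a
simple optimistic (UCB1-style) score, not the exact Gittins index, which
takes more machinery to compute; the point is only to illustrate the
"compute an index from each machine's history, then play the argmax"
structure.

import numpy as np

def play_with_index_policy(pull_fn, n_machines, budget):
    """Play `budget` coins, always feeding the machine with the highest index."""
    pulls = np.zeros(n_machines)   # how often each machine has been played
    wins = np.zeros(n_machines)    # total reward collected from each machine
    total_reward = 0.0

    for t in range(budget):
        if t < n_machines:
            arm = t                # play every machine once to initialise
        else:
            means = wins / pulls
            bonus = np.sqrt(2.0 * np.log(t) / pulls)   # optimism bonus (UCB1)
            arm = int(np.argmax(means + bonus))        # highest index gets the coin
        reward = pull_fn(arm)
        pulls[arm] += 1
        wins[arm] += reward
        total_reward += reward
    return total_reward

# e.g., with the casino sketch above:
# play_with_index_policy(pull, N, K)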

Here is a practical example of real-world use: I am using a
Gittins-index-based strategy to tune an underwater communication link. In
my lab, we have a powerful modem with various signal processing algorithms
implemented in it, each of which can be used with various settings. The
settings you choose determine the data rate of the link. The total number
of settings is on the order of a few million, and since I obviously cannot
test them all, I use a Gittins-based strategy to decide which setting to
test next.

Regarding value/policy iteration: you are right that these are dynamic
programming solutions that assume a transition model is given. In many
reinforcement learning situations, the transition model is unknown and has
to be learned. One way to cope with this is to approximate the value
function with Monte Carlo runs; Q-learning is another popular algorithm
that learns action values without an explicit model.
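
As a point of reference for the model-based case, here is a minimal
value-iteration sketch over a toy MDP given as explicit transition and
reward arrays; the array layout and the toy numbers are my own assumptions,
only meant to show what such a routine might look like:

import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s, a, s'] = transition probabilities, R[s, a] = expected reward."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        Q = R + gamma * P.dot(V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=1)      # greedy policy w.r.t. the converged values
    return V, policy

# Toy 2-state, 2-action MDP (numbers made up):
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, policy = value_iteration(P, R)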

Once we have the Gittins index and policy/value iteration in place, I could
possibly look into implementing the Monte Carlo methods as part of the
project too, depending on whether we have sufficient time left...

regards
shankar.



On Sun, Mar 11, 2012 at 8:59 PM, Andreas Mueller
<[email protected]> wrote:

>  Hi Shankar.
>
> Can you explain to what extent these two ideas are related to online
> learning? I am not familiar with the Gittins index, but value iteration
> and policy iteration are dynamic programming algorithms that assume a
> model is given. I would implement those by either giving the full
> transition model to the algorithm, or by giving functions that evaluate
> the model to the algorithm.
>
> For Monte Carlo algorithms I could see the connection a bit better, though
> it probably depends a lot on your application. This could also be
> implemented with a call-back.
>
> Are there any reinforcement learning experts in the project that can
> comment on that?
>
> In general having reinforcement learning in the scikit seems interesting,
> though I am not
> sure whether that is in the current scope. On the other hand, if it helps
> refine the API
> that would be great :)
>
> Cheers,
> Andy
>
>
> On 03/11/2012 08:47 AM, Shankar Satish wrote:
>
> Hello everyone,
>
>  I am a prospective GSOC-2012 student. I have some project ideas that I
> would like to bounce off the community:
>
>  I would like to add online-learning functionality. To do so, we can
> implement some reinforcement-learning algorithms. The problem is described
> in terms of an "agent" that needs to choose between various possible
> actions and gets feedback based on the results. Reinforcement learning
> techniques are concerned with finding optimal policies for the agent under
> various conditions.
>
>  Specifically, I have two algorithms in mind to start off with:
>
>  1. Computation of the Gittins index, which is an optimal solution to
> the multi-armed bandit problem:
> http://en.wikipedia.org/wiki/Multi-armed_bandit
> 2. Value iteration/policy iteration:
> http://en.wikipedia.org/wiki/Markov_decision_process#Value_iteration
>
>  So, what do you think about these ideas in general? Also, are they
> suitable candidates for a GSOC project?
>
>  regards
> shankar.