Hello everyone,

I have perused the past email archives for discussion on "partial_fit" and
the online-learning APIs. As per my understanding, having a "partial_fit"
kind of API essentially modifies a supervised learning classifier so that
it can accept a continuous stream of data.

In that case, my proposal to explicitly implement reinforcement learning
methods is distinct in the sense that there are no input/output pairs
presented to the learning algorithms. Rather, the algorithms are given
reward "scores", and the focus is on the continuous online performance, and
the aim is to maximize the total score over the course of operation.
Another way to look at it is that the algorithm adaptively samples the
search space, with an aim to balance the exploration of the space to gain
knowledge, versus exploiting the current knowledge...

regards
shankar.


On Mon, Mar 12, 2012 at 11:30 AM, Shankar Satish <[email protected]>wrote:

>
> Hi Andy,
>
> An explanation of Gittin's index:
>
> Imagine you are in a casino which has N slot machines. Each machine can be
> played by inserting a 1-dollar coin in it and pulling a lever. Every time
> you pull a lever, the machine might spit out some amount of money, governed
> by an underlying (unknown) stochastic process.
>
> Suppose you have K-dollars in your pocket available for playing. In what
> order must you play to maximize your expected reward? The more you explore
> various machines, the more you learn about the stochastic processes that
> govern each machine. However after a while, you want to exploit your
> knowledge and simply play the machine with the highest expected reward.
>
> The Gittin's index provides an answer to this question by computing an
> index based on the past history of plays and wins for each machine. Once
> you have computed it, you simply play the machine with the highest index.
> The Gittin's index is a special case of more general "dynamic allocation
> indices", which offer index-based policies for situations where  you need
> to make such exploration-versus-exploitation trade-offs.
>
> Here is a practical example of real-world use: I am using a Gittins-index
> based strategy to tune an underwater communication link. In my lab, we have
> a powerful modem that has various signal processing algorithms implemented
> in it, which can be used with various settings. The settings that you use
> will determine the data rate of the link. The total number of settings are
> of the order of a few million, and since i obviously cannot test them all,
> i use a Gittins-based strategy to decide which setting to test next.
>
> Regarding value/policy iteration: You are right that these are dynamic
> programming solutions that assume that a transition model is given. In a
> many reinforcement learning situations, the transition model will be
> unknown, and will need to be learnt. One of the ways to do so is by doing
> monte-carlo runs to approximate the value function. Q-learning is another
> popular algorithm that tries to do that.
>
> Once we have the Gittins index+policy/value iteration in place, i can
> possibly look into implementing the monte-carlo methods as part of the
> project too, depending on whether we have sufficient time left...
>
> regards
> shankar.
>
>
>
> On Sun, Mar 11, 2012 at 8:59 PM, Andreas Mueller <[email protected]
> > wrote:
>
>>  Hi Shankar.
>>
>> Can you explain in how far these two ideas are related to online learning?
>> I am not familiar with Gittin's index but value iteration and policy
>> iteration
>> are dynamic programming algorithms that assume a model is given.
>> I would implement those by either giving the full transition model to the
>> algorithm
>> or by giving functions that evaluate the model to the algorithm.
>>
>> For Monte Carlo algorithms I could the connection a bit better though it
>> probably
>> depends a lot on your application. This could also be implemented with a
>> call-back.
>>
>> Are there any reinforcement learning experts in the project that can
>> comment on that?
>>
>> In general having reinforcement learning in the scikit seems interesting,
>> though I am not
>> sure whether that is in the current scope. On the other hand, if it helps
>> refine the API
>> that would be great :)
>>
>> Cheers,
>> Andy
>>
>>
>> On 03/11/2012 08:47 AM, Shankar Satish wrote:
>>
>> Hello everyone,
>>
>>  I am a prospective GSOC-2012 student. I have some project ideas that i
>> would like to bounce-off the community:
>>
>>  I would like add online-learning functionality. To do so, we can
>> implement some reinforcement-learning algorithms. The problem is described
>> in terms of an "agent" that needs to choose between various possible
>> actions, and it gets feedback based on the results. Reinforcement learning
>> techniques are concerned with finding optimal policies for the agent under
>> various conditions.
>>
>>  Specifically, i have 2 algorithms in mind to start off with:
>>
>>  1. Computation of the Gittin's index, which is an optimal solution to
>> the multi-armed bandit problem:
>> http://en.wikipedia.org/wiki/Multi-armed_bandit
>> 2. Value iteration/policy iteration:
>> http://en.wikipedia.org/wiki/Markov_decision_process#Value_iteration
>>
>>  So, what do you think about these ideas in general? Also, are they
>> suitable candidates for a GSOC project?
>>
>>  regards
>> shankar.
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Virtualization & Cloud Management Using Capacity Planning
>> Cloud computing makes use of virtualization - but cloud computing
>> also focuses on allowing computing to be delivered as a 
>> service.http://www.accelacomm.com/jaw/sfnl/114/51521223/
>>
>>
>>
>> _______________________________________________
>> Scikit-learn-general mailing 
>> [email protected]https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Virtualization & Cloud Management Using Capacity Planning
>> Cloud computing makes use of virtualization - but cloud computing
>> also focuses on allowing computing to be delivered as a service.
>> http://www.accelacomm.com/jaw/sfnl/114/51521223/
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to