Applications are invited for a Ph.D. with Alessandro Lazaric at SequeL, 
INRIA-Lille. 



Ph.D. Title: Transfer in multi-armed bandit and reinforcement learning 

Keywords: reinforcement learning, multi-armed bandit, transfer learning, 
exploration-exploitation, representation learning, hierarchical learning. 



Research Topic 

The main objective of this Ph.D. research project is to advance the 
state-of-the-art in the field of multi-armed bandit (MAB) and reinforcement 
learning (RL) through the development of novel transfer learning algorithms. 
Multi-armed bandit and reinforcement learning formalize the problem of learning 
an optimal behavior policy from the experience directly collected from an 
unknown environment. Such a general model already provides powerful tools that 
can be used to learn from data in a very diverse range of applications (e.g., 
see successful applications to computer games, recommendation systems, energy 
management, logistics, and autonomous robotics). Nonetheless, practical 
limitations of current algorithms have encouraged research into efficient 
ways to integrate expert prior knowledge into the learning process. Although 
this improves the performance of RL algorithms, it dramatically reduces their 
autonomy, since it requires constant supervision by a domain expert. A 
solution to this problem is provided by transfer learning, which is directly 
motivated by the observation that one of the key features that allow humans to 
accomplish complicated tasks is their ability to build general knowledge 
from past experience and transfer it when learning new tasks. Thus, we believe 
that bringing the capability of transfer learning to existing machine 
learning algorithms will enable them to solve series of tasks in complex and 
unknown environments. The objective is to develop algorithms that not only 
learn from experience but also extract knowledge and transfer it across 
different tasks, thus obtaining a dramatic speed-up in the learning process and 
a significant improvement in overall performance. The general objective of 
this Ph.D. project is therefore to design RL algorithms able to 
incrementally discover, construct, and transfer “prior” knowledge in a fully 
automatic way. 



Research Program 

While the idea of transfer learning has been applied in a series of machine 
learning problems, its integration in MAB and RL is much more complicated. In 
fact, the number of possible scenarios and the different types of knowledge 
that can be constructed and transferred are much larger than in simpler 
problems, such as supervised learning. During the Ph.D. we will thus 
investigate a variety of approaches to transfer in MAB and RL, ranging from 
transfer of samples to transfer of representations. We will address some of the 
following questions: 



(i) Exploration. What kind of knowledge transfer can provably improve the 
exploration-exploitation performance of MAB and RL algorithms in terms of 
sample complexity and regret? 

(ii) Representation. Which representation techniques are best suited to 
transfer in RL? 

(iii) Hierarchical structures. Is it possible to prove the advantage of 
hierarchical structures over flat structures in MAB (e.g., hierarchical 
clustering) and in RL (e.g., options)? Under which assumptions? How can we 
create such hierarchies automatically? 



The previous questions will require theoretical, algorithmic and empirical 
study. The Ph.D. will cover different learning scenarios (e.g., multi-armed 
bandit, linear bandit, contextual bandit, full reinforcement learning) and 
different validation environments (e.g., fully synthetic, off-line evaluation 
from logged data, online simulation in real-world applications). As such, we 
expect the Ph.D. to produce a variety of results: 

    * Theoretical study of the conditions and the type of improvement brought 
by transfer methods w.r.t. no-transfer standard RL algorithms. 
    * Empirical validation of the proposed algorithms and comparison with 
existing transfer and no-transfer methods. 
    * Investigation of the application of transfer in RL to real-world problems 
such as recommendation systems, trading, and computer games. 



Profile 

The applicant must have a Master of Science in Computer Science, Statistics, or 
a related field, preferably with a background in reinforcement learning, 
bandits, or optimization. Candidates with a very strong background in either 
mathematics or computer science will be considered. The working language in the 
lab is English; good written and oral communication skills are mandatory. 



Application 

The application should include a brief description of research interests and 
past experience, a CV, degrees and grades, a copy of the Master's thesis (or a 
draft thereof), a motivation letter (short but pertinent to this call), 
relevant publications, and other relevant documents. Candidates are encouraged 
to provide letter(s) of recommendation and contact information for their 
references. Please send your application as a single PDF to 
alessandro.lazaric-at-inria.fr . 


    * Application closing date: May 15, 2015 
    * Interviews: May/June 2015 
    * Final decision: June/July 2015 
    * Duration: 3 years (full-time position) 
    * Starting date: October 15, 2015 (flexible) 
    * Supervisor: Alessandro Lazaric 
    * Place: SequeL, INRIA Lille - Nord Europe 



Working environment 

The Ph.D. candidate will work at the SequeL lab (https://sequel.lille.inria.fr/) 
at Inria Lille - Nord Europe, located in Lille. Inria (http://www.inria.fr/) is 
France's leading institution in Computer Science, with over 2800 scientists 
employed, of whom around 250 are in Lille. Lille is the capital of the north of 
France, a metropolis with 1 million inhabitants, with excellent train 
connections to Brussels (30 min), Paris (1h) and London (1h30). The research 
team SequeL (Sequential Learning) is composed of about 20 members working in 
machine learning, notably in reinforcement learning, multi-armed bandit, 
statistical learning, and sequence prediction. The Ph.D. program will be 
co-funded by the ANR ExTra-Learn project, which is entirely focused on the 
problem of transfer in RL. 



Benefits 

    * Salary: 1957,54 € for the first two years and 2058,84 € for the third year 
    * Salary after taxes: around 1597,11 € for the first two years and 
1679,76 € for the third year (benefits included) 
    * Possibility of French courses 
    * Help with housing 
    * Contribution to public transport costs 
    * Scientific Resident card and help with a visa for the candidate's spouse 



References 

[1] D. Calandriello, A. Lazaric, and M. Restelli. “Sparse Multi-task 
Reinforcement Learning”. In Proceedings of the Twenty-Eighth Annual Conference 
on Neural Information Processing Systems (NIPS'14), 2014. 

[2] M. Gheshlaghi-Azar, A. Lazaric, and E. Brunskill. “Resource-efficient 
Stochastic Optimization of a Locally Smooth Function under Correlated Bandit 
Feedback”. In Proceedings of the Thirty-First International Conference on 
Machine Learning (ICML'14), 2014. 

[3] M. Azar, A. Lazaric, and E. Brunskill. “Sequential Transfer in Multi-armed 
Bandit with Finite Set of Models”. In Proceedings of the Twenty-Seventh Annual 
Conference on Neural Information Processing Systems (NIPS'13), pp. 2220-2228, 
2013. 

[4] A. Lazaric and M. Restelli. “Transfer from Multiple MDPs”. In Proceedings 
of the Twenty-Fifth Annual Conference on Neural Information Processing Systems 
(NIPS'11), 2011. 

[5] A. Lazaric. “Transfer in Reinforcement Learning: a Framework and a Survey”. 
In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State of the 
Art, Springer, 2011. 

[6] M. E. Taylor and P. Stone. “Transfer Learning for Reinforcement Learning 
Domains: A Survey”. Journal of Machine Learning Research, 10(1), pp. 
1633–1685, 2009. 

[7] R. S. Sutton and A. Barto. Reinforcement Learning: an Introduction. MIT 
Press, Cambridge, MA, 1998. 
