Applications are invited for a Ph.D. position with Alessandro Lazaric at SequeL,
Inria Lille - Nord Europe.
Ph.D. Title: Transfer in multi-armed bandit and reinforcement learning
Keywords: reinforcement learning, multi-armed bandit, transfer learning,
exploration-exploitation, representation learning, hierarchical learning.
Research Topic
The main objective of this Ph.D. research project is to advance the
state-of-the-art in the field of multi-armed bandit (MAB) and reinforcement
learning (RL) through the development of novel transfer learning algorithms.
Multi-armed bandit and reinforcement learning formalize the problem of learning
an optimal behavior policy from experience collected directly from an unknown
environment. Such a general model already provides powerful tools that can be
used to learn from data in a very diverse range of applications (see, e.g.,
successful applications to computer games, recommendation systems, energy
management, logistics, and autonomous robotics). Nonetheless, practical
limitations of current algorithms have encouraged research into efficient ways
of integrating expert prior knowledge into the learning process. Although this
improves the performance of RL algorithms, it dramatically reduces their
autonomy, since it requires constant supervision by a domain expert. A solution
to this problem is provided by transfer learning, which is directly motivated
by the observation that one of the key features allowing humans to accomplish
complicated tasks is their ability to build general knowledge from past
experience and transfer it when learning new tasks. We believe that bringing
this capability of transfer to existing machine learning algorithms will enable
them to solve series of tasks in complex and unknown environments. The
objective is to develop algorithms that not only learn from experience but also
extract knowledge and transfer it across different tasks, thus obtaining a
dramatic speed-up in the learning process and a significant improvement in its
overall performance. The general objective of this Ph.D. project is thus to
design RL algorithms able to incrementally discover, construct, and transfer
“prior” knowledge in a fully automatic way.
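
To make the underlying exploration-exploitation problem concrete, the sketch
below runs the classical UCB1 strategy on a small synthetic Bernoulli bandit
and reports its cumulative pseudo-regret. It is a minimal illustration only:
the arm probabilities and horizon are made-up values, not part of the project
description.

    import math
    import random

    def ucb1(means, horizon):
        """Run UCB1 on a Bernoulli bandit with (unknown) arm means.

        Returns the cumulative pseudo-regret over the horizon.
        """
        n_arms = len(means)
        counts = [0] * n_arms        # number of pulls per arm
        est = [0.0] * n_arms         # empirical mean reward per arm
        best = max(means)
        regret = 0.0
        for t in range(1, horizon + 1):
            if t <= n_arms:
                arm = t - 1          # initialization: pull each arm once
            else:
                # optimism in the face of uncertainty:
                # empirical mean plus an exploration bonus
                arm = max(range(n_arms),
                          key=lambda a: est[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
            reward = 1.0 if random.random() < means[arm] else 0.0
            counts[arm] += 1
            est[arm] += (reward - est[arm]) / counts[arm]
            regret += best - means[arm]
        return regret

    random.seed(0)
    print(ucb1([0.2, 0.5, 0.7], horizon=10000))

UCB1 keeps pulling the arm with the largest optimistic estimate, so uncertain
arms are explored until they can be ruled out, and its cumulative regret grows
only logarithmically with the horizon. Baselines of this kind, and their sample
complexity and regret, are what the transfer algorithms developed in the
project should improve upon.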
Research Program
While the idea of transfer learning has been applied to a range of machine
learning problems, its integration into MAB and RL is considerably more
complicated. In fact, the number of scenarios that can be considered and the
different types of knowledge that can be constructed and transferred are much
larger than in simpler problems, such as supervised learning. During the Ph.D.
we will thus investigate a variety of approaches to transfer in MAB and RL,
ranging from transfer of samples to transfer of representations. We will
address some of the following questions:
(i) Exploration. Which kinds of knowledge transfer can provably improve the
exploration-exploitation performance of MAB and RL algorithms in terms of
sample complexity and regret? (A toy sketch is given after this list.)
(ii) Representation. Which representation-learning techniques are best suited
to transfer in RL?
(iii) Hierarchical structures. Is it possible to prove the advantage of
hierarchical structures over flat structures in MAB (e.g., hierarchical
clustering) and in RL (e.g., options)? Under which assumptions? How can we
create such hierarchies automatically?
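
As a purely illustrative take on question (i), the following variant of the
UCB1 sketch above warm-starts a new task by treating mean estimates transferred
from a related source task as a number of pseudo-pulls of each arm. The target
means, the transferred estimates, and the pseudo-pull weight are all
hypothetical, and no regret guarantee is claimed for this particular scheme; it
simply shows how transferred knowledge can cut down early exploration when the
tasks are similar, and hints at why negative transfer is a risk when they are
not.

    import math
    import random

    def ucb1_regret(means, horizon, prior_means=None, prior_weight=0):
        """UCB1 where transferred estimates, if any, count as
        'prior_weight' pseudo-pulls per arm."""
        n = len(means)
        counts = [prior_weight if prior_means else 0] * n
        est = list(prior_means) if prior_means else [0.0] * n
        best, regret = max(means), 0.0
        for t in range(1, horizon + 1):
            untried = [a for a in range(n) if counts[a] == 0]
            if untried:
                arm = untried[0]     # pull each arm once when starting from scratch
            else:
                arm = max(range(n),
                          key=lambda a: est[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
            r = 1.0 if random.random() < means[arm] else 0.0
            counts[arm] += 1
            est[arm] += (r - est[arm]) / counts[arm]
            regret += best - means[arm]
        return regret

    random.seed(0)
    target = [0.3, 0.5, 0.8]          # hypothetical target task
    transferred = [0.25, 0.45, 0.75]  # hypothetical estimates from a related source task
    print("no transfer:  ", ucb1_regret(target, 5000))
    print("with transfer:", ucb1_regret(target, 5000, transferred, prior_weight=20))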
The previous questions will require theoretical, algorithmic and empirical
study. The Ph.D. will cover different learning scenarios (e.g., multi-armed
bandit, linear bandit, contextual bandit, full reinforcement learning) and
different validation environments (e.g., fully synthetic, off-line evaluation
from logged data, online simulation in real-world applications). As such, we
expect the Ph.D. to produce a variety of results:
* Theoretical study of the conditions and the type of improvement brought
by transfer methods w.r.t. standard no-transfer RL algorithms.
* Empirical validation of the proposed algorithms and comparison with
existing transfer and no-transfer methods.
* Investigation of the application of transfer in RL to real-world problems
such as recommendation systems, trading, and computer games.
Profile
The applicant must have a Master of Science in Computer Science, Statistics, or
a related field, preferably with a background in reinforcement learning,
bandits, or optimization. Candidates with a very strong background in either
mathematics or computer science will be considered. The working language in the
lab is English; good written and oral communication skills are mandatory.
Application
The application should include a brief description of research interests and
past experience, a CV, degrees and grades, a copy of the Master's thesis (or a
draft thereof), a motivation letter (short but pertinent to this call),
relevant publications, and any other relevant documents. Candidates are
encouraged to provide letter(s) of recommendation and contact information for
references. Please send your application as a single PDF to
alessandro.lazaric-at-inria.fr .
* Application closing date: May 15, 2015
* Interviews: May/June 2015
* Final decision: June/July 2015
* Duration: 3 years (full-time position)
* Starting date: October 15, 2015 (flexible)
* Supervisor: Alessandro Lazaric
* Place: SequeL, Inria Lille - Nord Europe
Working environment
The Ph.D. candidate will work in the SequeL lab ( https://sequel.lille.inria.fr/ )
at Inria Lille - Nord Europe, located in Lille. Inria ( http://www.inria.fr/ ) is
France's leading institution in Computer Science, with over 2800 scientists
employed, around 250 of whom are in Lille. Lille is the capital of the north of
France, a metropolis with 1 million inhabitants and excellent train connections
to Brussels (30 min), Paris (1h), and London (1h30). The research team SequeL
(Sequential Learning) is composed of about 20 members working in machine
learning, notably in reinforcement learning, multi-armed bandits, statistical
learning, and sequence prediction. The Ph.D. program will be co-funded by the
ANR ExTra-Learn project, which is entirely focused on the problem of transfer
in RL.
Benefits
* Salary: €1957.54 during the first two years and €2058.84 during the third year
* Salary after taxes: around €1597.11 during the first two years and €1679.76
during the third year (benefits included)
* French language courses available
* Help with housing
* Contribution to public transport costs
* Scientific Resident card and help with the visa for a spouse
References
[1] D. Calandriello, A. Lazaric, and M. Restelli. “Sparse Multi-task
Reinforcement Learning”. In Proceedings of the Twenty-Eighth Annual Conference
on Neural Information Processing Systems (NIPS'14), 2014.
[2] M. Gheshlaghi-Azar, A. Lazaric, and E. Brunskill. “Resource-efficient
Stochastic Optimization of a Locally Smooth Function under Correlated Bandit
Feedback”. In Proceedings of the Thirty-First International Conference on
Machine Learning (ICML'14), 2014.
[3] M. Gheshlaghi-Azar, A. Lazaric, and E. Brunskill. “Sequential Transfer in
Multi-armed Bandit with Finite Set of Models”. In Proceedings of the
Twenty-Seventh Annual Conference on Neural Information Processing Systems
(NIPS'13), pp. 2220–2228, 2013.
[4] A. Lazaric and M. Restelli. “Transfer from Multiple MDPs”. In Proceedings
of the Twenty-Fifth Annual Conference on Neural Information Processing Systems
(NIPS'11), 2011.
[5] A. Lazaric. “Transfer in Reinforcement Learning: a Framework and a Survey”.
In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State of the
Art, Springer, 2011.
[6] M. E. Taylor and P. Stone. “Transfer Learning for Reinforcement Learning
Domains: A Survey”. Journal of Machine Learning Research, 10(1): pp. 1633–1685,
2009.
[7] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT
Press, Cambridge, MA, 1998.