Efficient Sequential Learning in Structured and Constrained Environments
SequeL team, INRIA, Lille, France (PhD position)

*Keywords:* *multi-armed bandit, stochastic optimization, reinforcement
learning, apprenticeship learning, learning on graphs, transfer learning*

This Ph.D. program focuses on the problem of sequential learning in
structured and constrained environments. The key aspect of this problem is
that relatively little knowledge of the environment is available
beforehand, and the learner (a virtual or physical agent) has to
sequentially interact with the environment to learn its *structure* and
then act optimally. This problem encompasses a wide range of applications
depending on the definition of the structure of the environment, the
*sequential nature of the interaction* between learner and environment, and
the type of *constraints*. In particular, this Ph.D. program will be driven
by two application domains of great scientific and commercial importance:
*recommender systems* and *crowdsourcing*.

In recommender systems, the sequential interaction is particularly complex,
since the user base keeps changing over time (users subscribe and
unsubscribe to the service) and the set of items that can be recommended
continuously evolves (products may be added, removed, or change
characteristics). On the other hand, the environment is highly structured,
since users may have similar preferences and items share similar features
that could be exploited to improve the quality of the recommendations. In
crowdsourcing, users are paid to solve subtasks assigned by an automatic
system. The main challenge in this scenario is that users may have very
heterogeneous skills, while the crowdsourcing campaign has strict, and
often conflicting, constraints on resources, such as the money invested in
the campaign, the available time, the desired accuracy in the solution of
the global task, etc. In this case, it is crucial that the learner
identifies the best allocation of subtasks to users in order to meet most
of the desired constraints. Finally, both applications usually involve
large amounts of data (e.g., a large user base), which calls for
*efficient* learning algorithms with minimal time, space, and sample
complexity. In general, a successful sequential learning strategy should
efficiently allocate the constrained resources to *exploitation* (making
the best decision based on our current, but possibly imperfect, knowledge)
and to *exploration* (decisions that may appear sub-optimal but which may
reduce the uncertainty and, as a result, could improve the relevance of
future decisions).

Background: The main mathematical models of reference for the problem of
sequential learning are *multi-armed bandits* and *reinforcement learning*.
The multi-armed bandit problem captures the essence of the
exploration-exploitation trade-off in an unknown environment, where only
noisy observations of the performance of a set of arms (e.g., the
preference of a user for a product) are available and the objective is to
identify the arm with the best performance. Reinforcement learning, on the
other hand, formalizes the more general problem of decision-making under
uncertainty, where the state of the environment evolves in response to the
actions taken by the learner.
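
For concreteness, below is a minimal sketch of the classical UCB1 bandit
strategy (in Python; illustrative only, not part of this call). It balances
exploitation (the empirical mean reward of each arm) against exploration (a
confidence width that shrinks as an arm is pulled more often); the three
Bernoulli arms and their success probabilities are hypothetical.

```python
import math
import random


def ucb1(pull_arm, n_arms, horizon):
    """Minimal UCB1: play each arm once, then repeatedly play the arm
    with the highest upper confidence bound on its mean reward."""
    counts = [0] * n_arms   # number of times each arm was pulled
    means = [0.0] * n_arms  # empirical mean reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: pull every arm once
        else:
            # exploitation term (means[a]) + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: means[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = pull_arm(arm)  # noisy observation of the arm's performance
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return max(range(n_arms), key=lambda a: means[a])


# Hypothetical example: three Bernoulli arms, e.g., click-through
# probabilities of three recommended products.
random.seed(0)
probs = [0.2, 0.5, 0.4]
best = ucb1(lambda a: float(random.random() < probs[a]),
            n_arms=3, horizon=5000)
print("arm with the best estimated performance:", best)
```
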
Objectives: The objective of this research program is to answer a number
of fundamental questions in sequential learning in structured and
constrained environments:

   - *Sequential transfer learning from multiple and concurrent sources:* It
   is often the case that similar problems belonging to the same domain must
   be solved over time (e.g., in crowdsourcing the user base may be almost the
   same across different tasks). The objective of sequential transfer learning
   is to construct *general* knowledge from past experience and reuse it to
   improve the learning performance on subsequent tasks. The aim is to
   develop transfer algorithms for exploration-exploitation problems and to
   understand how effective transfer can be and under which assumptions
   negative transfer can be avoided. Furthermore, we also intend to
   investigate the scenario of transfer from parallel sources, as in
   distributed learning problems. In this case, the main constraints are time
   and communication: each source acts as a node in a network, so that at each
   point in time the information available at different sources is limited and
   each source may be constrained in communicating the knowledge learned so
   far to other nodes.
   - *Learning on graphs:* Graphs offer a natural representation for
   structured problems. The users of a social network, the similarity between
   content items, the spatial proximity of sensors: all of these can be
   represented by a (weighted) graph. A decision-making strategy then often
   queries either nodes or edges, e.g., readings from a sensor network,
   product offers to a particular user in a social network, etc. The graph
   representation, however, comes with several challenges. A large number of
   nodes poses significant storage and computational difficulties, and
   efficient approximations often need to be designed. Furthermore, real-world
   graphs are typically dynamic, with a changing number of nodes and
   connections. Finally, there are cases where even the graph structure itself
   is unknown and its discovery needs to be part of an efficient strategy.
   Summing up, learning on realistic graphs needs to account for *structure
   discovery* (a minimal sketch of how known graph structure can be exploited
   is given after this list).
   - *Sequential apprenticeship learning:* In reinforcement learning, it is
   not always easy to define the desired behavior of an agent through a reward
   function. For instance, in recommender systems the objective is to maximize
   the long-term revenue, but other criteria may be of interest, such as
   avoiding suggesting the same item too often, favoring novel products, and
   introducing promotions. These behaviors are desirable but difficult to
   encode as a reward. Nonetheless, an expert may sequentially provide
   feedback on the strategy currently implemented by the learner and even
   propose corrections. The objective is to integrate imitation learning and
   reinforcement learning into a single learning system that can optimize the
   reward (e.g., long-term revenue) while exploiting the corrections and
   examples provided by the expert. A further issue is that the learner may
   have a finite budget of feedback requests to the supervisor and should use
   them when they are most needed.
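
As mentioned in the learning-on-graphs objective above, here is a minimal
sketch of one standard way in which known graph structure can be exploited:
Laplacian-regularized estimation of node payoffs from a few queried nodes,
under the assumption that the payoff function is smooth over the graph. The
code is in Python with NumPy; the graph, the observed values, and the
regularization strength are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 5-node weighted graph (e.g., a small sensor network),
# given by its adjacency matrix W.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W  # combinatorial graph Laplacian

# Payoffs observed on a few queried nodes (e.g., two sensor readings).
observed = {0: 1.0, 4: 0.0}
lam = 0.1  # strength of the smoothness regularization

# Solve  min_f  sum_{i observed} (f_i - y_i)^2  +  lam * f^T L f,
# i.e., fit the observations while keeping f smooth over the graph.
n = W.shape[0]
S = np.zeros((n, n))  # diagonal selector of the observed nodes
y = np.zeros(n)
for i, v in observed.items():
    S[i, i] = 1.0
    y[i] = v
f = np.linalg.solve(S + lam * L, y)  # first-order optimality condition
print("smooth estimate of the payoff at every node:", np.round(f, 3))
```
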

Job Description: The PhD candidate will focus on one or more issues related
to sequential learning in structured and constrained problems. The PhD
candidate will first acquire expertise in different topics of machine
learning such as online learning, multi-armed bandits, statistical learning
theory, reinforcement learning, approximate dynamic programming, and
algorithmic game theory. Then, the PhD candidate is expected to contribute
to the advancement of the literature on this problem along many different
lines: methodological (e.g., definition of general abstract models for a
wide range of decision-making problems), theoretical (e.g., near-optimal
performance guarantees), and algorithmic (e.g., development of novel
algorithms for specific decision-making problems). The research activity of
the PhD candidate will be closely related to the EU CompLACS project
(http://www.csml.ucl.ac.uk/projects/complacs/). This will allow the PhD
candidate to develop collaborations with other researchers participating in
this project and may also allow him/her to spend part of his/her research
activity at partner laboratories such as University College London (UK),
Montanuniversitat Leoben (Austria), TU Darmstadt, etc. There is also the
possibility of internships in industry research labs such as Technicolor
Research or Intel Research in California. The starting date of the PhD
program is flexible after October 15, 2014.

Profile: The applicant must have a Master of Science in Computer Science,
Statistics, or a related field, possibly with a background in reinforcement
learning, bandits, or optimization. The working language in the lab is
English; good written and oral communication skills are required.

Application: The application should include a brief description of research
interests and past experience, a CV, degrees and grades, a copy of the
Master's thesis (or a draft thereof), a motivation letter (short but
pertinent to this call), relevant publications, and other relevant
documents. Candidates are encouraged to provide letter(s) of recommendation
and contact information for references. Please send your application as one
single PDF to michal.valko-at-inria.fr or alessandro.lazaric-at-inria.fr.
The *deadline* for applications is April 15, 2014, but we encourage
applicants to contact us as soon as possible. The final decision will be
communicated in the *beginning of June*.


   - *Application closing date:* April 15, 2014
   - *Duration:* 3 years (a full-time position)
   - *Starting date:* October 15, 2014
   - *Supervisors:* Michal Valko <http://researchers.lille.inria.fr/~valko/>,
   Alessandro Lazaric <http://researchers.lille.inria.fr/~lazaric/>, and
   Remi Munos <http://researchers.lille.inria.fr/~munos/>
   - *Place:* SequeL, INRIA Lille - Nord Europe

Working Environment: The PhD candidate will work in the SequeL lab
(https://sequel.lille.inria.fr/) at INRIA Lille - Nord Europe, located in
Lille. INRIA (http://www.inria.fr/) is France's leading institution in
Computer Science, with over 2800 scientists employed, of whom around 250
work in Lille. Lille is the capital of the north of France, a metropolis
with 1 million inhabitants and excellent train connections to Brussels
(30 min), Paris (1h), and London (1h30). SequeL is a dynamic lab at INRIA
with over 25 researchers (including PhD students) that covers several
aspects of machine learning, from theory to applications, including
statistical learning, reinforcement learning, and sequential learning.

Benefits:

   - *Duration:* 36 months, starting October 15, 2014
   - *Salary:* €1,957.54 for the first two years and €2,058.84 for the
   third year
   - *Salary after taxes:* around €1,597.11 for the first two years and
   €1,679.76 for the third year (benefits included)
   - Possibility of French courses
   - Help with housing
   - Contribution towards public transport
   - Scientific Resident card and help with visas for spouses


This call is posted at
http://researchers.lille.inria.fr/~valko/hp/call-phd-2014.php with the most
up-to-date information.