Proposal for a PhD position beginning in September 2014: “Bayesian model of the 
joint development of perception, action and phonology”
 

Context

The Speech Unit(e)s project is focused on the speech unification process 
associating the auditory, visual and motor streams in the human brain, in an 
interdisciplinary approach combining cognitive psychology, neurosciences, 
phonetics (both descriptive and developmental) and computational models. The 
framework is provided by the “Perception-for-Action-Control Theory (PACT)” 
developed by the PI (Schwartz et al., 2012).

PACT is a perceptuo-motor theory of speech communication, which connects in a 
principled way perceptual shaping and motor procedural knowledge in speech 
multisensory processing. The communication unit in PACT is neither a sound nor 
a gesture but a perceptually shaped gesture, that is a perceptuo-motor unit. It 
is characterised by both articulatory coherence – provided by its gestural 
nature – and perceptual value – necessary for being functional. PACT considers 
two roles for the perceptuo-motor link in speech perception: online unification 
of the sensory and motor streams through audio-visuo-motor binding, and offline 
joint emergence of the perceptual and motor repertoires in speech development.

 

Objectives of the PhD position

In the debates between auditory and motor theories of speech perception, and in 
their modern revival concerning the role of the dorsal route (Hickok & Poeppel, 
2004, 2007), there has been little real reflection on what the functional role 
of perceptuo-motor coupling in speech perception could be. The “dorsal route” 
is thought to be useful in “adverse conditions”, e.g. in noise or with a 
foreign language (Callan et al., 2004; Zekveld et al., 2006). But no 
theoretical explanation has actually been proposed for this potential 
efficiency of motor processes in adverse conditions.
We have recently developed a computational framework that makes it possible to 
compare the predictions of auditory, motor and perceptuo-motor theories in 
various kinds of situations (Moulin-Frier et al., 2012). Casting these theories 
into a single, 
unified mathematical framework is an efficient way to compare the theories and 
their properties in a systematic manner. Bayesian modeling is a mathematical 
framework that precisely allows such comparisons. The trick is that the same 
tool, namely probabilities, can be used both for defining the models and for 
comparing them (see e.g. Myung & Pitt, 2009).
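
As a minimal sketch of this idea (not taken from the cited papers), the snippet 
below compares two toy models of the same data using Bayes' rule over models. 
The data (8 correct identifications out of 10) and the two candidate success 
rates are hypothetical values chosen only for illustration.

```python
from math import comb


def binomial_likelihood(k, n, p):
    """P(k successes in n trials | success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def model_posterior(likelihoods, priors):
    """Posterior over models: P(model | data) is proportional to
    P(data | model) * P(model), normalized over all candidate models."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical data: 8 correct identifications out of 10 trials.
k, n = 8, 10

# Two hypothetical models, each predicting a fixed success rate.
lik_chance = binomial_likelihood(k, n, 0.5)  # model A: chance-level listener
lik_skilled = binomial_likelihood(k, n, 0.8)  # model B: skilled listener

# Equal prior belief in both models.
posterior = model_posterior([lik_chance, lik_skilled], [0.5, 0.5])
print(posterior)
```

The same probability calculus defines each model (the likelihood) and ranks the 
models against each other (the posterior over models).
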
The generic model we developed is called COSMO, which stands for "Communicating 
about Objects using Sensory-Motor Operations". The COSMO acronym also 
represents the five variables around which the basic structure of the model is 
built. In COSMO, communication (C) is a success when an object OS in the 
speaker’s mind is transferred, via sensory and motor means S and M, to the 
listener’s mind where it is correctly recovered as OL. COSMO assumes that a 
communicating agent, which is both a speaker and a listener, internalizes the 
communication situation inside an internal model.
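
To make the structure concrete, here is a toy discrete sketch of how the three 
families of theories could be expressed as different inferences inside one 
internal model. It is an illustration under strong assumptions, not the 
formulation of Moulin-Frier et al. (2012): all probability tables, object 
labels ("a", "i"), gesture labels and signal labels are hypothetical, the prior 
over objects is assumed uniform, and the perceptuo-motor variant is assumed to 
be a simple product of the two other posteriors.

```python
def normalize(scores):
    """Rescale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


# Hypothetical toy tables (every number below is an assumption).
OBJECTS = ["a", "i"]
# Auditory knowledge: P(S | O), learned directly from heard signals.
P_S_GIVEN_O = {"a": {"low": 0.7, "high": 0.3},
               "i": {"low": 0.2, "high": 0.8}}
# Motor repertoire: P(M | O), which gesture is used for each object.
P_M_GIVEN_O = {"a": {"open": 0.9, "closed": 0.1},
               "i": {"open": 0.1, "closed": 0.9}}
# Internal forward model: P(S | M), the sound each gesture produces.
P_S_GIVEN_M = {"open": {"low": 0.8, "high": 0.2},
               "closed": {"low": 0.1, "high": 0.9}}


def auditory(s):
    """P(O | S) from auditory knowledge only (uniform prior on O)."""
    return normalize({o: P_S_GIVEN_O[o][s] for o in OBJECTS})


def motor(s):
    """P(O | S) obtained by summing over the gestures that could
    have produced the signal s."""
    return normalize({
        o: sum(P_M_GIVEN_O[o][m] * P_S_GIVEN_M[m][s] for m in P_S_GIVEN_M)
        for o in OBJECTS
    })


def perceptuo_motor(s):
    """Assumed fusion: product of the auditory and motor posteriors."""
    aud, mot = auditory(s), motor(s)
    return normalize({o: aud[o] * mot[o] for o in OBJECTS})


for theory in (auditory, motor, perceptuo_motor):
    print(theory.__name__, theory("low"))
```

With these toy tables the auditory and motor routes agree, so the fused 
posterior is sharper than either one alone; with conflicting tables the fusion 
would instead be pulled between the two.
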
The PhD project aims at developing COSMO in two major directions.
(1)  Joint acquisition of perceptual and motor repertoires in a syllabic 
framework. Experiments in COSMO have mainly concerned simple stimuli, e.g. in 
abstract one-dimensional sensory-motor spaces, or with restricted vowel 
samples. We will explore strategies for automatically learning to produce and 
perceive complex sequences such as plosive-vowel CV sequences, which display 
systematic coarticulation phenomena. Various kinds of exploration and learning 
mechanisms are available from cognitive and developmental robotics 
(Moulin-Frier & Oudeyer, 2012). Validation tests will be inspired by real 
data, e.g. on locus equations for plosive acoustics (Sussman et al., 1998), 
robustness to perturbations in production (Lindblom et al., 1979; Savariaux et 
al., 1995), or coupling of perceptual and motor idiosyncrasies.
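
For readers unfamiliar with locus equations: a locus equation is a linear 
regression of the second-formant (F2) frequency at consonant release on the F2 
frequency at the vowel midpoint, fitted across vowel contexts for a given 
plosive. The sketch below fits such a line by ordinary least squares. The 
formant values are synthetic, generated from a hypothetical line (slope 0.5, 
intercept 900 Hz), and are not measurements from any corpus.

```python
def fit_locus_equation(f2_vowel, f2_onset):
    """Least-squares fit of f2_onset = slope * f2_vowel + intercept."""
    n = len(f2_vowel)
    mean_x = sum(f2_vowel) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_vowel, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic F2 values in Hz (hypothetical, for illustration only).
f2_vowel = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]
f2_onset = [0.5 * v + 900.0 for v in f2_vowel]

slope, intercept = fit_locus_equation(f2_vowel, f2_onset)
print(slope, intercept)
```

A validation test of this kind would compare the slope and intercept obtained 
from the model's simulated CV productions with those reported for human 
speakers.
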

(2)  Comparison of auditory, motor and perceptuo-motor theories for speech 
processing in various conditions. Once these perception and production 
components are in place in COSMO, we will compare auditory, motor and 
perceptuo-motor speech perception theories in challenging conditions, such as 
noise, speaker normalization, or foreign accent. We will test the ability to 
develop a perceptuo-motor phonology from auditory and motor experience, e.g. to 
acquire a category such as “plosive place of articulation” through the 
discovery of perceptuo-motor links in learning. We will also test COSMO on 
natural CV stimuli, exploiting natural multi-speaker corpora of CV sequences 
for learning and perceptual tests. 

The work will be carried out within a multidisciplinary team bringing together 
expertise in speech communication, cognitive theories and Bayesian modeling 
(Jean-Luc Schwartz at GIPSA-Lab Grenoble, Julien Diard at LPNC Grenoble, Pierre 
Bessière at ISIR Paris), in collaboration with Pierre-Yves Oudeyer at INRIA 
Bordeaux.

 

Practical information

The PhD position is open from September 2014, or slightly later if necessary.

Candidates should have a master's degree, some knowledge of speech and 
cognitive modeling, and the ability to program and to develop computational 
models.

They must send a CV, together with a letter explaining why they are interested 
in the project. They should also provide the names and email addresses of two 
people who can write recommendations in support of their application.

Applications should be sent before June 15th to Jean-Luc Schwartz 
([email protected]). Interviews will be held with 
preselected candidates, and a decision will be made in the following weeks.



