Jelinek has a recent book called "Statistical Methods for Speech
Recognition" that goes into all the tricks of how HMMs are used for
speech.
A basic, widely cited paper is:
@article{Rabiner89,
author = "L. R. Rabiner",
title = "A Tutorial on {H}idden {M}arkov {M}odels and Selected
Applications in Speech Recognition",
journal = "Proc. of the IEEE",
year = 1989,
volume = 77,
number = 2,
pages = "257--286"
}
But there are all sorts of variations (factorial HMMs, input-output
HMMs, coupled HMMs, autoregressive HMMs, etc.), which are all just
different kinds of dynamic Bayes nets. What exactly do you need to know?
Kevin