My impression (and Svetlomir should correct me) is that the intent was to use two HMM's on separate inputs and then use the decoded state sequences from those as inputs to a third HMM.
If that is the question, then I think that Mahout's HMM's are sufficiently
object-ish that this should work. Obviously, it will take multiple training
passes to train each separate model.

On Sun, Jul 24, 2011 at 11:25 AM, Dhruv <[email protected]> wrote:

> Svetlomir and Ted -- I was not trying to be rude, sorry if I came across
> that way because of my exuberance. I apologize.
>
> I was eager to help and may have acted too fast and misunderstood the
> question, so I turn to both of you for a little clarification.
>
> I'm confused whether the X's refer to the hidden states, or training
> instances. Since the hidden sequence is always a Markov Chain in HMMs, I
> assumed that Svetlomir meant that X1 and X2 were two separate hidden state
> sequences because Markov Chain was explicitly mentioned in his original
> question. To quote:
>
> -----------
> X1----X1----X1----...X1 (Markov Chain for input parameter 1 => monitoring
> X1's changes over time)
>
> X2----X2----X2----...X2 (Markov Chain for input parameter 2 => monitoring
> X2's changes over time)
> -----------
>
> Further, since X1 and X2 were not slated to have any relationship with each
> other and since they were the observations of two different parameters, I
> construed that X1 and X2 represented two separate hidden state sequences. I
> gathered that the hidden state sequences X1 and X2 are drawn from two
> disjoint hidden vocabulary sets. The user wants to discover the model on
> some training set and then, to the trained model, feed Y for decoding to
> arrive at the most likely sequence of states, X1 and X2 which emitted Y.
>
> In my answer, I continued with this line saying that in one training, you
> can't arrive at two separate models for X1 and X2 which contain the
> requisite distributions which can be used for decoding, say sequences of X1
> to have produced Y or sequence of X2 to have produced Y. Hence, I suggested
> having only one set for the hidden states, combining X1s and X2s and then
> train the model on it. Given the domain of application, this may or may not
> make sense, hence I was doubtful of formulating the problem as HMM and
> suggested alternatives.
>
> However:
>
> If X's are two separate input sequences for training, then yes, the current
> implementation is capable of training the HMM. If Y is the output, then one
> can decode, after training, the sequence of hidden states which most likely
> produced Y.
>
> For the output probability question, my answer was to use the trained
> model's HmmModel.getEmissionMatrix().get(hiddenState, emittedState) method
> to compute the output probability for a particular hidden state. I believe
> this is not what the user wanted?
>
>
> Dhruv
>
> On Sun, Jul 24, 2011 at 12:56 PM, Ted Dunning <[email protected]> wrote:
>
> > On Sun, Jul 24, 2011 at 7:52 AM, Dhruv <[email protected]> wrote:
> >
> > > ... If you look into the *definition* of HMM, the hidden sequence is
> > > drawn from only one set. The hidden sequence's transitions can be
> > > expressed as a joint probability p(s0, s1). Similarly the observed
> > > sequence has a joint distribution with the hidden sequence such as
> > > p(y0, s1) and so on.
> >
> > I think gentler language might be a good idea here. The question was not
> > at all unreasonable.
> >
> > > The hidden state transitions follow the Markov memorylessness property
> > > and hence form a Markov Chain.
> > >
> > > In your case, you are trying to model your problem assuming that there
> > > are two underlying state sequences affecting the observed output. This
> > > doesn't fit into the HMM's definition and you probably want something
> > > else.
> >
> > Actually, what the original poster wanted is quite sensible. While the
> > output sequence is due to a single input sequence, that input sequence is
> > not observable. As such, we have a noisy channel problem where we want to
> > estimate something about that original sequence. The point of the Markov
> > model is that it defines a distribution of output sequence given an input
> > sequence (and model). This distribution can be inverted so that given a
> > particular output sequence, we can estimate the probability distribution
> > of input sequences conditional on the output.
> >
> > The typical decoding algorithm for HMM's estimates only the maximum
> > likelihood input sequence but this does not negate the fact that we have
> > a distribution. There are alternative decoding algorithms that allow a
> > set of high probability sequences to be estimated or allow a partial
> > probability lattice to be output that allows alternative sequences to be
> > probed.
> >
> > > If you do want to fit your problem into the HMM framework, you need to
> > > condense the X1 and X2 sequences into a single set and then condition
> > > the Ys on it.
> >
> > Not at all.
> >
> > > > 3. Can we get output probabilities from the HMM for a concrete state?
> > >
> > > Yes, after training, you can retrieve any of the trained model's
> > > distributions as a Mahout Matrix type and use get(row, col).
> >
> > This is not quite what the question was.
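
For concreteness, the arrangement described at the top of this message (two HMMs decoded on separate inputs, with their decoded state sequences used as input to a third HMM) might look roughly like the sketch below. It is plain Python rather than Mahout code, and several things in it are assumptions made only for illustration: the function names `viterbi` and `stacked_decode`, the representation of a model as an `(init, trans, emit)` tuple of already-trained probabilities, and the choice to encode the pair of decoded states as a single integer symbol for the third model. Training is not shown; each of the three models would have to be trained separately beforehand.

```python
import math


def _log(p):
    # Log that tolerates zero probabilities.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Most likely hidden state sequence for a non-empty observation list.

    init[i]     : P(first state = i)
    trans[i][j] : P(next state = j | current state = i)
    emit[i][k]  : P(observed symbol = k | state = i)
    """
    n = len(init)
    score = [_log(init[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + _log(trans[i][j]))
            ptr.append(best)
            score.append(prev[best] + _log(trans[best][j]) + _log(emit[j][o]))
        back.append(ptr)
    # Trace the best final state back to the start.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def stacked_decode(x1_obs, x2_obs, hmm1, hmm2, hmm3):
    """Decode two inputs separately, then decode their state pairs with hmm3.

    Assumption: hmm3's observation alphabet has one symbol per (s1, s2) pair,
    numbered s1 * (number of hmm2 states) + s2.
    """
    s1 = viterbi(x1_obs, *hmm1)
    s2 = viterbi(x2_obs, *hmm2)
    n2 = len(hmm2[0])
    combined = [a * n2 + b for a, b in zip(s1, s2)]
    return viterbi(combined, *hmm3)


if __name__ == "__main__":
    # Toy, hand-picked parameters (not trained on anything).
    sticky = [[0.8, 0.2], [0.2, 0.8]]
    low = ([0.5, 0.5], sticky, [[0.9, 0.1], [0.1, 0.9]])
    top = ([0.5, 0.5], sticky,
           [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]])
    print(stacked_decode([0, 0, 1, 1], [1, 1, 1, 1], low, low, top))
```

One consequence of this design is that the third model only ever sees the single best state sequence from each lower model, so any uncertainty in the lower-level decoding is discarded before the top level runs.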

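
The point made in the thread, that maximum likelihood decoding returns one sequence even though the model defines a whole distribution over hidden sequences, can be illustrated with per-step state posteriors. The sketch below computes P(state at time t = i | all observations) with a scaled forward-backward pass. It is plain Python, not Mahout's API; the function name `state_posteriors` and the list-of-lists parameter layout are hypothetical, and whether this is what the original "output probabilities for a concrete state" question was after is itself an assumption.

```python
def state_posteriors(obs, init, trans, emit):
    """Return gamma, where gamma[t][i] = P(state at step t = i | obs).

    Assumes obs is non-empty and has non-zero probability under the model.
    init[i], trans[i][j], emit[i][k] are plain probability tables.
    """
    n, steps = len(init), len(obs)

    # Forward pass; each step is rescaled to sum to 1 to avoid underflow.
    alpha = []
    for t in range(steps):
        if t == 0:
            raw = [init[i] * emit[i][obs[0]] for i in range(n)]
        else:
            raw = [
                sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                * emit[j][obs[t]]
                for j in range(n)
            ]
        z = sum(raw)
        alpha.append([r / z for r in raw])

    # Backward pass, rescaled the same way.
    beta = [[1.0] * n for _ in range(steps)]
    for t in range(steps - 2, -1, -1):
        raw = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n))
            for i in range(n)
        ]
        z = sum(raw)
        beta[t] = [r / z for r in raw]

    # Combine and normalise; the arbitrary scale factors cancel here.
    gamma = []
    for t in range(steps):
        raw = [alpha[t][i] * beta[t][i] for i in range(n)]
        z = sum(raw)
        gamma.append([r / z for r in raw])
    return gamma


if __name__ == "__main__":
    # Toy, hand-picked parameters (not trained on anything).
    init = [0.5, 0.5]
    trans = [[0.8, 0.2], [0.2, 0.8]]
    emit = [[0.9, 0.1], [0.1, 0.9]]
    for row in state_posteriors([0, 0, 1, 1], init, trans, emit):
        print([round(p, 3) for p in row])
```

Each row of the result is a distribution over hidden states at one time step, so it shows how confident the model is at each position rather than only which single path scores best.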