thanks for the reply ... I've discretized the continuous time series
observations and assigned them to symbols. The number of hidden states is
2: "out of control" and "not out of control" -- 0 and 1. With the scenario
defined this way, I'm able to get good predictions from the HMM. What I
don't know how to do is get a measure of the model's "confidence" in the
prediction. How do I get that out of the HMM API?
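One way to get that confidence is the posterior probability of the final
hidden state given the whole observation sequence. At the last time step
the backward term of forward-backward is 1, so normalizing the forward
(alpha) values is enough. Below is a minimal self-contained sketch of that
math in plain Java -- this is not the Mahout API, and all the model
numbers (initial, transition, and emission probabilities) are made up for
illustration:

```java
// Toy HMM: 2 hidden states (0 = in control, 1 = out of control) and
// 3 observation symbols. All probabilities here are invented.
public class HmmPosterior {
    static final double[] PI = {0.8, 0.2};                 // initial state probs
    static final double[][] A = {{0.9, 0.1}, {0.3, 0.7}};  // transition probs
    static final double[][] B = {{0.6, 0.3, 0.1},          // emission probs
                                 {0.1, 0.3, 0.6}};

    // Posterior P(state at last step = s | observations) for each state s.
    // Runs the forward recursion, then normalizes the last row of alpha
    // (the backward values at the final step are all 1).
    static double[] lastStatePosterior(int[] obs) {
        int n = obs.length, k = PI.length;
        double[][] alpha = new double[n][k];
        for (int s = 0; s < k; s++) alpha[0][s] = PI[s] * B[s][obs[0]];
        for (int t = 1; t < n; t++)
            for (int s = 0; s < k; s++) {
                double sum = 0;
                for (int p = 0; p < k; p++) sum += alpha[t - 1][p] * A[p][s];
                alpha[t][s] = sum * B[s][obs[t]];
            }
        double z = 0;
        for (int s = 0; s < k; s++) z += alpha[n - 1][s];
        double[] post = new double[k];
        for (int s = 0; s < k; s++) post[s] = alpha[n - 1][s] / z;
        return post;
    }

    public static void main(String[] args) {
        double[] p = lastStatePosterior(new int[]{0, 1, 2, 2});
        System.out.printf("P(in control)=%.3f  P(out of control)=%.3f%n",
                          p[0], p[1]);
    }
}
```

If I'm reading the Mahout API right, HmmAlgorithms exposes the forward
pass as a matrix of these alpha values; normalizing its last row the same
way should give a confidence figure like "OOC with probability 0.37".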
As for your interesting reply, I'm not sure I understand it. So I would use
k-means clustering, but what would I be clustering? The nearness of the
points? Some aggregate of the points? The distance between the points of
one sub-sequence and another (that's probably it)? The purpose of such
clustering would be to reduce each sub-sequence of n time series
observations to a single symbol (i.e., the cluster ID)?
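For what it's worth, here is one way to read the suggestion: each short
window of consecutive observations is treated as a single point in
w-dimensional space, k-means is run on those points, and the ID of the
nearest cluster becomes the observed symbol for that window. A small
self-contained sketch (plain Java; the centroid values are made up, and in
practice they would come from k-means run on training windows):

```java
import java.util.Arrays;

// Quantizing a continuous series into HMM symbols: slide a short window
// over the series, treat each window as one point, and emit the index of
// its nearest centroid as the symbol.
public class WindowQuantizer {
    // k = 3 clusters over windows of length 2; values are invented.
    static final double[][] CENTROIDS = {
        {0.0, 0.0}, {1.0, 1.0}, {0.0, 1.0}
    };

    static double dist2(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    // Each window of w consecutive observations becomes one symbol.
    static int[] quantize(double[] series, int w) {
        int[] symbols = new int[series.length - w + 1];
        for (int t = 0; t + w <= series.length; t++) {
            double[] window = Arrays.copyOfRange(series, t, t + w);
            int best = 0;
            for (int c = 1; c < CENTROIDS.length; c++)
                if (dist2(window, CENTROIDS[c]) < dist2(window, CENTROIDS[best]))
                    best = c;
            symbols[t] = best;
        }
        return symbols;
    }

    public static void main(String[] args) {
        double[] series = {0.1, -0.1, 0.9, 1.1, 0.05};
        System.out.println(Arrays.toString(quantize(series, 2)));
        // prints [0, 2, 1, 1]
    }
}
```

The resulting symbol sequence is what would be fed to the HMM as the
observed values.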
> You can now quantize your data using this clustering
So you are suggesting I use membership in a particular cluster as the
symbolic representation of each sub-sequence, and then plug that into the
HMM? I'm not sure, but I assume these would be the observed values, since
the hidden state I'm after is "in control" / "out of control".
Sorry if I'm completely missing it.
On Wed, May 22, 2013 at 11:41 AM, Ted Dunning <[email protected]> wrote:
> HMMs could be useful, but you have to define things a bit differently.
>
> First of all, HMMs want symbolic inputs and want to give you symbolic
> outputs. You don't get to see the internal state.
>
> My first approach would be to use k-means clustering on short sequences of
> your observed continuous variables. You should use as large a k as gives
> you about the same squared error on held out data as on the training data.
>
> You can now quantize your data using this clustering. That is the first
> step for your HMM.
>
> The next step is to train the HMM. You need to give it many sequences of
> quantized state variables and the desired outputs at each time step. You
> have to guess at the number of hidden states.
>
> The next step would be to run the HMM on new data and evaluate.
>
>
>
> On Wed, May 22, 2013 at 8:20 AM, yikes aroni <[email protected]> wrote:
>
> > I'm not knowledgeable about statistics or data analysis, so please be
> > gentle! I am using Mahout to predict a time series' out-of-control
> > state. I've had a fair amount of success classifying with SGD and
> > Adaptive regression approaches, but want to see if Hidden Markov Models
> > can do a better job for my purposes. I have two questions.
> >
> > Question 1
> > I train the model using HmmTrainer.trainSupervisedSequence(). The
> > hidden state is the status: Out-of-Control (OOC) or Not-OOC for the
> > next point in time. Thus when I use HmmEvaluator.decode(model,
> > observedSequence, false), I look at the "state" associated with the
> > last point in the observedSequence and take *that* as my prediction of
> > the state at t+1. First of all -- is this sensible? Or is there a
> > better way to use the API to get a prediction of the state at t+1 given
> > observations 0 through t after training?
> >
> > Question 2
> > Once I get my prediction -- i.e., the state the model predicts will be
> > associated with the last observation in my observation sequence -- how
> > do I use the API to get the probability of that predicted state being
> > correct? I've looked at various output from HmmUtils and HmmEvaluator,
> > but not being strong in my knowledge of HMMs, I'm not sure which (if
> > any) are what I need. Ultimately, I want to be able to say something
> > like "The predicted next state of this time series is OOC with a
> > confidence of 0.37".
> >
> > thank you
> >
>