Hi Ted, 

Thanks very much for the response; it's very helpful to hear these thoughts.

What I'll do is look into the data set issue and report back on what I find.
I'll prod around the code and see if I can get a clue as to how it produces
infinities and so on.

I think that one of the Mahout algorithms (DF, the decision forest classifier)
does use NaN for "undecidable" (ref):
http://mail-archives.apache.org/mod_mbox/mahout-dev/201206.mbox/%3C824188178.43658.1340361882497.JavaMail.jiratomcat@issues-vm%3E

So perhaps there is a long-term need to think through the output semantics of
the library?

I ran an open source project (Zeus Agents: still on SourceForge, but antique!)
for many years before it faded, so I know that random suggestions with no
technical input are fairly unhelpful, but give me some time and I'll try to
come back with something more useful!

Best,

Simon

----
Dr. Simon Thompson
Chief Researcher, Customer Experience.
BT Research.
BT plc. PP11J. MLBG BT Adastral Park, Martlesham Heath.
IP5 3RE

Note :

This email contains BT information, which may be privileged or confidential. 
It's meant only for the individual(s) or entity named above. If you're not the 
intended recipient, note that disclosing, copying, distributing or using this 
information is prohibited. If you've received this email in error, please let 
me know immediately on the email address above. Thank you.
We monitor our email system, and may record your emails.
British Telecommunications plc
Registered office: 81 Newgate Street London EC1A 7AJ
Registered in England no: 1800000
________________________________________
From: Ted Dunning [[email protected]]
Sent: 06 January 2013 20:16
To: [email protected]
Subject: Re: HMM - baum welch and hmmpredict

It sounds like you are getting some numerical stability issues with the
training program.  With HMMs, the most common problem that leads to this
is numerical underflow.  I haven't looked at this in detail, however, so I
can't comment very knowledgeably.  It is possible that the current
implementation has no regularization, which might lead to problems for
synthetic data sets such as your counting example: there are no
observations for some transitions, and the trainer may try to represent
this as -Inf in log space.
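
To make the failure modes described above concrete, here is a minimal Python sketch. It is not Mahout code (Mahout is Java), and as a simplification it estimates transitions by direct counting, as if the states were observed, rather than from the expected counts Baum-Welch uses. The helper names and the pseudocount of 0.01 are illustrative assumptions. It shows three things: unobserved transitions become exact zeros and hence -inf in log space; a state that is never visited turns into 0/0, which Java doubles report as NaN (one plausible route to all-NaN models when there are more states than the data supports, not a confirmed diagnosis of the Mahout trainer); and a long product of probabilities underflows to 0.0 while the sum of their logs does not.

```python
import math

NAN = float("nan")
NEG_INF = float("-inf")


def log_or_neg_inf(p):
    # math.log(0.0) raises ValueError in Python; map it to -inf,
    # which is what Java's Math.log(0.0) returns.
    return math.log(p) if p > 0 else NEG_INF


def estimate_transitions(seq, n_states, pseudocount=0.0):
    """Count-based transition estimate from a fully observed state sequence.

    pseudocount=0 is the plain maximum-likelihood estimate; a positive
    pseudocount is additive (Laplace) smoothing, one simple regularizer.
    """
    counts = [[pseudocount] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state with no outgoing observations is 0/0 here. Python would
        # raise ZeroDivisionError; Java doubles silently give NaN, so mirror that.
        matrix.append([c / total if total > 0 else NAN for c in row])
    return matrix


# The "counting" data 1 2 3 4 5 1 2 ..., mapped to states 0..4.
data = [i % 5 for i in range(50)]

# Six states for five symbols: state 5 is never visited.
raw = estimate_transitions(data, 6)
smoothed = estimate_transitions(data, 6, pseudocount=0.01)

print(raw[0])                                # only 0 -> 1 observed; the rest are exact zeros
print([log_or_neg_inf(p) for p in raw[0]])   # those zeros become -inf in log space
print(raw[5])                                # the unvisited state's row is all NaN
print(all(math.isfinite(log_or_neg_inf(p)) for row in smoothed for p in row))

# Underflow: a long product of probabilities hits 0.0; the sum of logs does not.
probs = [0.5] * 2000
product = 1.0
for p in probs:
    product *= p
print(product)                               # 0.0
print(sum(math.log(p) for p in probs))       # about -1386.29
```

The pseudocount trades a little bias for finite log-probabilities everywhere; working with sums of logs (or rescaling at each step) is the usual answer to the underflow part.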

I can say that the Mahout HMM implementations are a student project and
have not seen much run-time or critical review.  That means that the
probability of serious bugs in the implementation is much higher than in
code that is heavily used, such as the recommender or the math library.
The student who did the work is good, but that doesn't take the place of
wide usage.
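
For the third question in the original message (quoted below), one standard way to get a next-state prediction from a trained model is to run the forward algorithm over the given sequence, normalize to get the posterior over the current hidden state, and push that through the transition matrix once. The sketch below is standalone Python, not Mahout's hmmpredict or its Java classes; the function predict_next and the parameter names pi (initial probabilities), A (transitions) and B (emissions) are hypothetical. It keeps the forward pass in log space, which is one common way around the underflow problem mentioned above.

```python
import math

NEG_INF = float("-inf")


def log_or_neg_inf(p):
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(xs):
    # Stable log(sum(exp(x))): factor out the max so exp() cannot underflow to 0.
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def predict_next(pi, A, B, observations):
    """Filter a non-empty observation sequence, then predict one step ahead.

    Returns (next_state_probs, next_symbol_probs).
    """
    n = len(pi)
    logA = [[log_or_neg_inf(p) for p in row] for row in A]
    logB = [[log_or_neg_inf(p) for p in row] for row in B]
    # Forward recursion: alpha[j] = log P(o_1..o_t, s_t = j).
    alpha = [log_or_neg_inf(pi[j]) + logB[j][observations[0]] for j in range(n)]
    for o in observations[1:]:
        alpha = [logsumexp([alpha[i] + logA[i][j] for i in range(n)]) + logB[j][o]
                 for j in range(n)]
    log_norm = logsumexp(alpha)
    if log_norm == NEG_INF:
        # Normalizing here would compute -inf - (-inf) = NaN.
        raise ValueError("sequence has zero probability under this model")
    # Filtered posterior P(s_t = i | o_1..o_t).
    filtered = [math.exp(a - log_norm) for a in alpha]
    # One step ahead: P(s_{t+1} = j | o_1..o_t) = sum_i filtered[i] * A[i][j].
    next_state = [sum(filtered[i] * A[i][j] for i in range(n)) for j in range(n)]
    n_symbols = len(B[0])
    next_symbol = [sum(next_state[j] * B[j][k] for j in range(n))
                   for k in range(n_symbols)]
    return next_state, next_symbol


# Usage: a deterministic 3-state cycle (0 -> 1 -> 2 -> 0) whose states
# emit their own index, so the hidden state is fully determined.
pi = [1 / 3] * 3
A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
next_state, next_symbol = predict_next(pi, A, B, [0, 1, 2, 0])
print(next_state)   # [0.0, 1.0, 0.0]
print(next_symbol)  # [0.0, 1.0, 0.0]
```

The result is a distribution rather than a single state; taking the argmax gives a point prediction. Raising an error for an impossible sequence, instead of returning NaNs, is a design choice in this sketch that makes "the model cannot explain this input" explicit.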

On Sat, Jan 5, 2013 at 11:44 AM, <[email protected]> wrote:

> Hi there,
>
> I've got a couple of questions about the HMM elements of Mahout.
>
> - When I get models that are made up of NaNs, I guess this is telling me
> that the algorithm can't make a prediction?
> - I can train models with 1 hidden state or 2 hidden states, and once or
> twice with 3 hidden states... but when I try to train anything more
> complex it always seems to come back with NaNs, even with data sets like
> 1 2 3 4 5 1 2 3 4 5 1 2..., which in my simple-minded view should work
> well for 4 or 5 hidden states. What am I doing wrong?
> - I have used hmmpredict to produce some... predictions! But how can I
> give it a sequence and then ask for the next state? Or should I simply
> use the code to create a custom predictor of my own?
>
> All the best,
>
> Simon
