Hi Simon,

Are you using the standalone HMM trainer, or are you running the MapReduce 
variant from the patch available at 
https://issues.apache.org/jira/browse/MAHOUT-627?

As Ted mentioned, these trainers can experience arithmetic underflow when the 
set of states is large. Did you try the log-scaled APIs for the Baum-Welch 
trainer? The log-scaled versions are much less prone to underflow.
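
To illustrate why log scaling helps, here is a minimal, self-contained sketch 
of the forward pass computed in log space next to the plain-probability 
version. This is not Mahout's code: the class LogForwardSketch and its method 
names are hypothetical, and the uniform two-state model is made up purely for 
the demonstration.

```java
final class LogForwardSketch {

    /** Numerically stable log(exp(a) + exp(b)); -Infinity stands for log(0). */
    public static double logSumExp(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double max = Math.max(a, b);
        return max + Math.log1p(Math.exp(Math.min(a, b) - max));
    }

    /** Forward algorithm in log space: returns log P(observations | model). */
    public static double logLikelihood(double[] initial, double[][] transition,
                                       double[][] emission, int[] obs) {
        int n = initial.length;
        double[] logAlpha = new double[n];
        for (int i = 0; i < n; i++) {
            logAlpha[i] = Math.log(initial[i]) + Math.log(emission[i][obs[0]]);
        }
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double acc = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    acc = logSumExp(acc, logAlpha[i] + Math.log(transition[i][j]));
                }
                next[j] = acc + Math.log(emission[j][obs[t]]);
            }
            logAlpha = next;
        }
        double total = Double.NEGATIVE_INFINITY;
        for (double v : logAlpha) {
            total = logSumExp(total, v);
        }
        return total;
    }

    /** Same forward pass with raw probabilities; underflows on long sequences. */
    public static double plainLikelihood(double[] initial, double[][] transition,
                                         double[][] emission, int[] obs) {
        int n = initial.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) {
            alpha[i] = initial[i] * emission[i][obs[0]];
        }
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += alpha[i] * transition[i][j];
                }
                next[j] = sum * emission[j][obs[t]];
            }
            alpha = next;
        }
        double total = 0.0;
        for (double v : alpha) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up uniform model: 2 hidden states, 2 output symbols.
        double[] initial = {0.5, 0.5};
        double[][] transition = {{0.5, 0.5}, {0.5, 0.5}};
        double[][] emission = {{0.5, 0.5}, {0.5, 0.5}};
        int[] obs = new int[2000]; // 2000 observations of symbol 0

        System.out.println("plain likelihood: "
                + plainLikelihood(initial, transition, emission, obs));
        System.out.println("log likelihood:   "
                + logLikelihood(initial, transition, emission, obs));
    }
}
```

For this model the true likelihood is 0.5^2000, which is far below the 
smallest positive double, so the plain version collapses to zero while the 
log-space version stays finite. Once a likelihood of zero is used as a 
normalizer during re-estimation, you get 0/0, which is one way NaNs can end 
up in a trained model.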

-Dhruv

On Jan 6, 2013, at 12:34 PM, [email protected] wrote:

> Hi Ted, 
> 
> thanks very much for the response, very helpful to hear these thoughts. 
> 
> What I will do is look at the data set issue and report back as to what I 
> find out. I'll prod round the code and see if I can get a clue as to how it 
> produces infinities and so on.
> 
> I think that one of the Mahout algorithms (DF) does use NaN for "undecidable" 
> 
> (ref) 
> http://mail-archives.apache.org/mod_mbox/mahout-dev/201206.mbox/%3C824188178.43658.1340361882497.JavaMail.jiratomcat@issues-vm%3E
> 
> So perhaps there is a long term need to think through the output semantics of 
> the library? 
> 
> I ran an open source project (Zeus Agents - still on SourceForge! but 
> antique) for many years before it faded, so I know that random suggestions 
> with no technical input are fairly unhelpful, but give me some time and I'll 
> try and come back with something more useful! 
> 
> Best,
> 
> Simon
> 
> ----
> Dr. Simon Thompson
> Chief Researcher, Customer Experience.
> BT Research.
> BT plc. PP11J. MLBG BT Adastral Park, Martlesham Heath.
> IP5 3RE
> 
> Note :
> 
> This email contains BT information, which may be privileged or confidential. 
> It's meant only for the individual(s) or entity named above. If you're not 
> the intended recipient, note that disclosing, copying, distributing or using 
> this information is prohibited. If you've received this email in error, 
> please let me know immediately on the email address above. Thank you.
> We monitor our email system, and may record your emails.
> British Telecommunications plc
> Registered office: 81 Newgate Street London EC1A 7AJ
> Registered in England no: 1800000
> ________________________________________
> From: Ted Dunning [[email protected]]
> Sent: 06 January 2013 20:16
> To: [email protected]
> Subject: Re: HMM - baum welch and hmmpredict
> 
> It sounds like you are getting some numerical stability issues with the
> training program.  With HMMs, the most common problem that leads to this
> is numerical underflow.  I haven't looked at this in detail, however, so I
> can't comment very knowledgeably.  It is possible that the current
> implementation has no regularization, which might lead to problems for
> synthetic datasets such as your counting example because there are no
> observations for some transitions and the trainer may try to represent this
> as -Inf in log space.
> 
> I can say that the Mahout HMM implementations are a student project and
> have not seen much run-time or critical review.  That means that the
> probability of serious bugs in the implementation is much higher than in
> code that is heavily used, such as the recommender or the math library.  The
> student who did the work is good, but that doesn't take the place of wide
> usage.
> 
> On Sat, Jan 5, 2013 at 11:44 AM, <[email protected]> wrote:
> 
>> Hi there,
>> 
>> I've got a couple of questions about the HMM elements of Mahout.
>> 
>> - when I get models that are made of NaN I guess this is telling me that
>> the algorithm can't make a prediction?
>> - I can train models with 1 hidden state, or 2 hidden states, and once or
>> twice with 3 hidden states, but when I try to train anything more complex
>> it always seems to come back with NaNs - even with data sets like 1 2 3 4
>> 5 1 2 3 4 5 1 2... which in my simple-minded view should work well for 4
>> or 5 hidden states: what am I doing wrong?
>> - I have used hmmpredict to produce some... predictions! but how can I
>> give it a sequence and then ask for the next state? Or should I simply use
>> the code to create a custom predictor of my own?
>> 
>> All the best,
>> 
>> Simon
