MAHOUT-627 provides a patch for training HMMs with the Baum-Welch algorithm on MapReduce. If you would like to give it a spin, I will be glad to help with any questions you may have about its use.
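To give a feel for what such a patch parallelizes, here is an illustrative skeleton of one Baum-Welch iteration as a plain Hadoop MapReduce pass. This is only a sketch of the general E-step/M-step decomposition, not the API of the MAHOUT-627 patch; the class names and the expectedCounts() helper are hypothetical placeholders.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class BaumWelchIterationSketch {

  /**
   * E-step: run forward-backward over one observation sequence with the
   * current model parameters (shipped to every mapper, e.g. via the
   * distributed cache) and emit expected transition/emission counts,
   * keyed by a parameter id such as "A_i_j" or "B_j_k".
   */
  public static class EStepMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      int[] observations = parse(line.toString());
      // expectedCounts() stands in for a forward-backward pass; it is a
      // placeholder, not part of Mahout or the MAHOUT-627 patch.
      for (Map.Entry<String, Double> e : expectedCounts(observations).entrySet()) {
        ctx.write(new Text(e.getKey()), new DoubleWritable(e.getValue()));
      }
    }
  }

  /**
   * M-step: sum the expected counts for each parameter over all sequences.
   * A follow-up normalization turns the sums into the re-estimated
   * probabilities used as the model for the next iteration.
   */
  public static class MStepReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    @Override
    protected void reduce(Text param, Iterable<DoubleWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      double sum = 0.0;
      for (DoubleWritable c : counts) {
        sum += c.get();
      }
      ctx.write(param, new DoubleWritable(sum));
    }
  }

  /** Parses a whitespace-separated line of symbol indices. */
  private static int[] parse(String line) {
    String[] tokens = line.trim().split("\\s+");
    int[] obs = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      obs[i] = Integer.parseInt(tokens[i]);
    }
    return obs;
  }

  /** Placeholder for the per-sequence forward-backward computation. */
  private static Map<String, Double> expectedCounts(int[] observations) {
    throw new UnsupportedOperationException("forward-backward not shown here");
  }
}

The driver would iterate this job until the likelihood converges, feeding each iteration's re-estimated model back to the mappers.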
Can you provide more details about your project? In particular, how many observed and hidden states does your HMM have, and how large is the total training set? That will help me understand your scalability requirements.

On Wed, Jan 25, 2012 at 1:08 PM, Manuel Blechschmidt <[email protected]> wrote:
> Hi Keary,
>
> On 25.01.2012, at 21:47, Keary Cavin wrote:
>
> > Hello,
> >
> > We are investigating Mahout as a scalable solution for a hidden Markov
> > model genetic imputation problem we'd like to run on our Hadoop cluster.
> >
> > Does the infrastructure exist to run the Mahout HMM code through Hadoop?
>
> Currently there is no parallel implementation of any aspect of the HMM
> approach in the core Mahout SVN. However, Dhruv Kumar implemented a
> parallelized Baum-Welch algorithm during a Google Summer of Code project.
>
> There was a discussion two days ago about HMM:
>
> http://mail-archives.apache.org/mod_mbox/mahout-user/201201.mbox/%3c5d1fc0e56861d84086fe034afd1b223d3cd10...@mbx1.hosted.exchange-login.net%3E
>
> Here is the patch:
> https://issues.apache.org/jira/browse/MAHOUT-627
>
> Keep in mind that executing machine learning in parallel is ongoing
> research.
>
> > Thanks very much,
> >
> > Keary
>
> /Manuel
> --
> Manuel Blechschmidt
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
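P.S. As Manuel notes, Mahout core already ships a sequential HMM implementation, which is handy for sanity-checking your model on a sample of the data before trying the patch. A minimal sketch, assuming the trainer in org.apache.mahout.classifier.sequencelearning.hmm (please verify the exact signatures against the Mahout version you check out):

import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

public class SequentialHmmBaseline {
  public static void main(String[] args) {
    // Toy setup: 2 hidden states, 3 observable symbols; a randomly
    // initialized model serves as the Baum-Welch starting point.
    HmmModel initial = new HmmModel(2, 3);

    // One observation sequence encoded as symbol indices 0..2.
    int[] observations = {0, 1, 2, 2, 1, 0, 0, 1, 2, 1};

    // Unsupervised training: convergence epsilon 0.0001, at most 100
    // iterations, 'true' requests scaled computation for numerical stability.
    HmmModel trained =
        HmmTrainer.trainBaumWelch(initial, observations, 0.0001, 100, true);

    System.out.println(trained.getTransitionMatrix());
  }
}

Running something like this on a representative subset should also give you a rough idea of per-sequence training cost before you size the cluster job.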
