Hi Dhruv and Manuel,

Thank you for your responses. I apologize for my late reply.

One of the immediate goals of our project is to perform imputation over several hundred genomes. In our next round of imputations, we anticipate incoming data for several thousand genomes.

I don't have a ready answer to the question about the number of observed and hidden states in the HMM. We do know that the best imputation window size our current code supports is between 1 and 5 million base pairs. We have a meeting scheduled with the authors of the imputation code we are using, and we want to get the parameters and implementation details of its Hidden Markov Model. When I have more information about the algorithm, I will send it to you.

Dhruv, I downloaded the MAHOUT-627 patch and applied the files to the current Mahout release. I'll let you know when I have questions.

Thank you very much,
Keary

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dhruv Kumar
Sent: Wednesday, January 25, 2012 5:15 PM
To: [email protected]
Subject: Re: status of hadoop hidden markov model in mahout

MAHOUT-627 provides a patch for training the HMM using the Baum-Welch algorithm on MapReduce. If you would like to give it a spin, I will be glad to help you with any questions you may have about its use.

Can you provide more details about your project? I am particularly interested in the number of observed and hidden states in your HMM and the total training set size, so that I can understand your scalability requirements.

On Wed, Jan 25, 2012 at 1:08 PM, Manuel Blechschmidt <[email protected]> wrote:

> Hi Keary,
>
> On 25.01.2012, at 21:47, Keary Cavin wrote:
>
> > Hello,
> >
> > We are investigating Mahout as a scalable solution for a hidden Markov
> > model genetic imputation problem we'd like to run on our Hadoop cluster.
> >
> > Does the infrastructure exist to run the Mahout HMM code through Hadoop?
>
> Actually, in the core Mahout SVN there is no parallel implementation
> of any aspect of the HMM approach yet. Nevertheless, Dhruv Kumar
> implemented a parallelized Baum-Welch algorithm during a Google Summer
> of Code.
>
> There was a discussion 2 days ago about HMM:
>
> http://mail-archives.apache.org/mod_mbox/mahout-user/201201.mbox/%3C5D1fc0e56861d84086fe034afd1b223d3cd10...@mbx1.hosted.exchange-login.net%3E
>
> Here is the patch:
> https://issues.apache.org/jira/browse/MAHOUT-627
>
> Keep in mind that executing machine learning in parallel is ongoing
> research.
>
> > Thanks very much,
> >
> > Keary
>
> /Manuel
>
> --
> Manuel Blechschmidt
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
