RE: status of hadoop hidden markov model in mahout

Keary Cavin Tue, 31 Jan 2012 11:15:09 -0800

Hi Dhruv and Manuel,

Thank you for your responses.  I apologize for my late reply.

One of the immediate goals of our project is to perform imputation over several 
hundred genomes.  In our next round of imputations, we anticipate incoming data 
for several thousand genomes.

I don't have a ready answer for the question about the number of observed and 
hidden states in the HMM.  We do know the best imputation window size our 
current code supports is between 1 and 5 million base pairs.

We have a meeting scheduled with the authors of the imputation code we are 
using, where we hope to learn the parameters and implementation details of 
its Hidden Markov Model.

When I have more information about the algorithm, I will send it to you.

Dhruv, I downloaded the MAHOUT-627 patch and applied it to the current 
Mahout release.  I'll let you know when I have questions.

Thank you very much,

Keary

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dhruv Kumar
Sent: Wednesday, January 25, 2012 5:15 PM
To: [email protected]
Subject: Re: status of hadoop hidden markov model in mahout

MAHOUT-627 provides a patch for training the HMM using the Baum-Welch algorithm 
on MapReduce. If you would like to give it a spin, I will be glad to help you 
with any questions you may have about its use.

Can you provide more details about your project? I am particularly interested 
in the number of observed and hidden states in your HMM and the total 
training set size, so that I can understand your scalability requirements.
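
To illustrate why those numbers matter: each Baum-Welch iteration runs a 
forward (and a backward) pass over every training sequence, and the forward 
pass costs roughly N^2 * T operations for N hidden states and T observations. 
Below is a minimal, single-machine sketch of the scaled forward pass in plain 
Python. It is not the MAHOUT-627 code and does not use any Mahout API; the 
function name, parameter names, and toy probabilities are all hypothetical 
and chosen only for illustration.

```python
import math


def forward_log_likelihood(initial, transition, emission, observations):
    """Scaled forward pass: returns log P(observations | model).

    initial[i]       -- probability of starting in hidden state i
    transition[i][j] -- probability of moving from hidden state i to j
    emission[i][k]   -- probability of emitting symbol k from hidden state i
    observations     -- non-empty list of observed symbol indices
    (All names are illustrative assumptions, not Mahout identifiers.)
    """
    n_states = len(initial)
    # Joint probability of the first observation and each hidden state.
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            obs = observations[t]
            # The N^2 step: every state j sums over every predecessor i.
            alpha = [
                emission[j][obs]
                * sum(alpha[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Rescale so long sequences do not underflow; the logs of the
        # scale factors add up to the log-likelihood.
        scale = sum(alpha)
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_likelihood


# Toy model: 2 hidden states, 3 observable symbols (made-up numbers).
toy_initial = [0.6, 0.4]
toy_transition = [[0.7, 0.3], [0.4, 0.6]]
toy_emission = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
toy_observations = [0, 1, 2]

toy_ll = forward_log_likelihood(
    toy_initial, toy_transition, toy_emission, toy_observations
)
print("log-likelihood:", toy_ll)
```

With sequences millions of observations long, T dominates the cost, so the 
sequence length and the number of hidden states together largely determine 
whether a single machine is enough.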

On Wed, Jan 25, 2012 at 1:08 PM, Manuel Blechschmidt < 
[email protected]> wrote:

> Hi Keary,
>
> On 25.01.2012, at 21:47, Keary Cavin wrote:
>
> > Hello,
> >
> > We are investigating Mahout as a scalable solution for a hidden Markov
> > model genetic imputation problem we'd like to run on our Hadoop cluster.
> >
> > Does the infrastructure exist to run the Mahout HMM code through Hadoop?
>
> Actually, in the core Mahout SVN there is no parallel implementation 
> of any aspect of the HMM approach yet. Nevertheless, Dhruv Kumar 
> implemented a parallelized Baum-Welch algorithm during a Google Summer 
> of Code.
>
> There was a discussion 2 days ago about HMM:
>
> http://mail-archives.apache.org/mod_mbox/mahout-user/201201.mbox/%3C5D1fc0e56861d84086fe034afd1b223d3cd10...@mbx1.hosted.exchange-login.net%3E
>
> Here is the patch:
> https://issues.apache.org/jira/browse/MAHOUT-627
>
> Keep in mind that executing machine learning in parallel is still an 
> area of ongoing research.
>
> >
> > Thanks very much,
> >
> > Keary
>
> /Manuel
>
> --
> Manuel Blechschmidt
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
>
>
