On Thursday, May 8, 2014, Kyle Kastner <kastnerk...@gmail.com> wrote:

> It looks like you are getting a 0 because the
> _generate_sample_from_state function will never return any value other
> than 0 from (cdf > rand).argmax(); see line 991 here:
> https://github.com/hmmlearn/hmmlearn/blob/master/hmmlearn/hmm.py . This
> is expected behavior, IMO, when emission probs are manually assigned with
> strange input - it has to predict something!
>
>
Okay, I see. Thank you.
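For reference, here is a minimal sketch of why that happens, assuming the cumsum/argmax sampling logic Kyle points to (the variable names here are illustrative, not the library's own):

```python
import numpy as np

# With all-zero emission probabilities the CDF never exceeds the
# random draw, so the boolean array is all False and argmax()
# falls back to index 0 -- the first symbol is always "emitted".
emissionprob = np.array([0.0, 0.0, 0.0])
cdf = np.cumsum(emissionprob)      # [0., 0., 0.]
rand = np.random.rand()            # uniform draw in [0, 1)
symbol = (cdf > rand).argmax()     # argmax of an all-False array is 0
print(symbol)                      # always 0
```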



> So it is not forcing the values to be a probability distribution (no
> explicit check that they sum to one), as far as I can see. How that
> affects your actual problem I am not sure - I would think you would need
> deletion information or some outside model to insert the silent state in
> order to use those observations.
>
> You might also try adding 4 more states, where _A, _T, _C, and _G
> represent a deletion followed by a symbol (or maybe symbol followed by
> deletion is better; I don't know your problem that well). That might
> allow you to account for a deletion in both cases, since the state
> encodes both a symbol and an optional deletion.
>
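If I follow, the expanded state space would look something like this sketch (the state names and layout are my own guess at what is meant):

```python
import numpy as np

# Hypothetical expanded state space: A, T, C, G plus _A, _T, _C, _G,
# where _X means "a deletion occurred, then X was emitted".  Every
# state emits exactly one visible symbol, so no silent state is needed.
symbols = ["A", "T", "C", "G"]
states = symbols + ["_" + s for s in symbols]

emissionprob = np.zeros((len(states), len(symbols)))
for i, state in enumerate(states):
    # State "X" or "_X" deterministically emits symbol X.
    emissionprob[i, symbols.index(state.lstrip("_"))] = 1.0

# Each row is a valid distribution, and a decoded path through a
# _X state implies a deletion at that position.
print(emissionprob)
```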

Perhaps I didn't fully understand your suggestion, but it seems to me that
this would only solve the sample-generation problem; it wouldn't help
in the case where the state sequence has to be predicted from the
observations. The observation sequences we have don't contain a deletion
symbol, but we still want to allow the possibility that a deletion took
place (that the silent deletion state was visited). We would then have to
take each observation sequence and populate it with underscores in every
possible way. Of course, this would lead to a combinatorial blow-up in the
number of observation sequences that need to be fed to the model.
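To make the blow-up concrete: the number of ways to insert k indistinguishable deletion symbols into the gaps of a length-n sequence is C(n + k, k) by a stars-and-bars argument, so even a short sequence admits many candidate augmented sequences:

```python
from math import comb

# Ways to insert k '_' deletions into a length-n observation
# sequence: C(n + k, k), growing quickly with k.
n = 8  # e.g. the GGTTAAAA example below
for k in range(4):
    print(k, comb(n + k, k))  # 1, 9, 45, 165
```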

Anas

>
>
>
> On Thu, May 8, 2014 at 3:26 AM, anas elghafari
> <anas.elghaf...@gmail.com> wrote:
>
> That would only work for the generation of samples; it wouldn't work when
> the task is predicting the underlying states. Say I choose _ as a
> stand-in for nothing, and only the silent state emits it. If I have the
> observations GGTTAAAA, the model will predict that the silent state
> couldn't have been visited in the generation of these observations.
>
> Anas
>
>
> 2014-05-08 8:57 GMT+02:00 Andy <t3k...@gmail.com>:
>
>  Why don't you just add another output state that stands for "nothing"?
>
>
> On 05/07/2014 11:52 PM, anas elghafari wrote:
>
>  Thank you, Kyle, for your answer. I think there is some massaging of the
> numbers going on. For example, I tried to specify a silent state (a state
> where all emission probabilities are 0). The result was that all
> emissions at that state were of the first signal, i.e. the probability of
> the first emission was rounded up to 1 (code below).
>
>  My question: is there a way to specify a silent state? Some applications
> require one (e.g. DNA sequences, where one of the things that can happen
> is the deletion of a symbol; the deletion state would show up in the
> sequence of states but would emit no signal).
>
> --------------
> Code sample: trying to build an HMM with the second state (state 1) as a
> silent state:
>
> import numpy as np
> from sklearn import hmm
>
> def testHMM():
>     startprob = np.array([0.5, 0.5])
>     transmat = np.array([[0.5, 0.5], [0.5, 0.5]])
>     emissions = np.array([[0, 0.8, 0.2], [0, 0, 0]])
>     model = hmm.MultinomialHMM(2, startprob, transmat)
>     model.emissionprob_ = emissions
>     return model
>
> >>> m = HMM_test.testHMM()
> >>> m.sample(20)
> (array([2, 1, 1, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0, 2, 0, 0, 1, 1, 1, 1],
> dtype=int64),
>  array([0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0]))
>
>  In this example, signal 0 should never have shown up (since its
> probability is 0 in both states). However, it is the only signal emitted
> by state 1.
>
>  Anas
>
>
>
>
> 2014-05-07 21:34 GMT+02:00 Kyle Kastner <kastnerk...@gmail.com>:
>
> If you are manually specifying the emission probabilities, I don't think
> there are any hooks/asserts to guarantee that the variable is normalized,
> i.e. if you assign to emissionprob_ instead of using the fit() function, I
> think it is on you to make sure the emission probabilities you are
> assigning *are* actually probabilities. From my perspective that is
> desired behavior, but maybe someone with more experience with this HMM
> implementation can comment.
>
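A quick sketch of that manual normalization, assuming row-wise distributions (one row per state). Note that an all-zero row, i.e. the attempted silent state, has no valid normalization, which is exactly the case under discussion:

```python
import numpy as np

# If emissionprob_ is assigned by hand, normalize each row so that
# every state's emissions form a proper probability distribution.
emissions = np.array([[0.0, 0.8, 0.2],
                      [0.1, 0.1, 0.3]])
emissions = emissions / emissions.sum(axis=1, keepdims=True)
print(emissions.sum(axis=1))   # each row now sums to 1
```

(An all-zero row would divide by zero here, so a true silent state needs to be handled outside the emission matrix.)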
> For future reference, Hidden Markov Models were recently split into their
> own project, seen here:
> https://github.com/hmmlearn/hmmlearn
>
>  Kyle
>
>
>  On Wed, May 7, 2014 at 10:41 AM, anas elghafari
>
>
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
