It looks like you are getting a 0 because, with an all-zero emission row,
the (cdf > rand).argmax() step in the _generate_sample_from_state function
can never return anything other than 0 - see line 991 here:
https://github.com/hmmlearn/hmmlearn/blob/master/hmmlearn/hmm.py . This is
expected behavior IMO when emission probs are manually assigned with
strange input - it has to emit something!
So, as far as I can see, nothing forces the rows to be actual probability
distributions (there is no explicit check that they sum to one). How that
affects your actual problem I am not sure - I would think you would need
deletion information, or some outside model, to insert the silent state in
order to use those observations.
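
For illustration, here is a minimal numpy sketch of the argmax effect
mentioned above (my own reconstruction, not the library's exact code):
with an all-zero emission row the cumulative sum is all zeros, so the
boolean comparison is all False and argmax() falls back to index 0:

import numpy as np

emissionprob = np.array([0.0, 0.0, 0.0])  # the "silent" state's row
cdf = np.cumsum(emissionprob)             # array([0., 0., 0.])
rand = np.random.rand()                   # any value in [0, 1)
print((cdf > rand).argmax())              # always 0: no True entries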
You might also try adding 4 more states, where _A, _T, _C, and _G represent
a deletion followed by a symbol (or maybe symbol followed by deletion is
better - I don't know your problem that well). That might let you
accommodate a deletion in both cases, since each such state encodes both a
symbol and an optional deletion.
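
A rough sketch of what that expanded model could look like (purely
illustrative - the state names and deterministic emissions are made up):

import numpy as np

# 4 plain symbol states plus 4 deletion+symbol states (_A, _T, _C, _G).
# The emission alphabet is still A, T, C, G (the columns); the deletion
# is encoded in the state identity rather than in the output.
states = ["A", "T", "C", "G", "_A", "_T", "_C", "_G"]
emissionprob = np.vstack([np.eye(4), np.eye(4)])  # each state emits its symbol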
On Thu, May 8, 2014 at 3:26 AM, anas elghafari <anas.elghaf...@gmail.com>wrote:
> That would only work for the generation of samples, but wouldn't work when
> the task is predicting the underlying states. Say I choose _ as a stand-in
> for nothing, and only the silent state emits it. If I have the observations
> GGTTAAAA, the model will predict that the silent state couldn't have been
> visited in the generation of these observations.
>
> Anas
>
>
> 2014-05-08 8:57 GMT+02:00 Andy <t3k...@gmail.com>:
>
>> Why don't you just add another output symbol that stands for "nothing"?
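>>
>> For example (an illustrative sketch, not tested): with a third output
>> symbol "_" standing for "nothing", the emission matrix could look like
>>
>> emissions = np.array([[0.8, 0.2, 0.0],   # ordinary state never emits "_"
>>                       [0.0, 0.0, 1.0]])  # silent state always emits "_"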
>>
>>
>> On 05/07/2014 11:52 PM, anas elghafari wrote:
>>
>> Thank you, Kyle, for your answer. I think there is some massaging of
>> the numbers going on. For example, I tried to specify a silent state (a
>> state where all emission probabilities are 0). The result was that all
>> emissions at that state were of the first signal, i.e. the probability of
>> the first emission was rounded up to 1 (code below).
>>
>> My question: is there a way to specify a silent state? There are
>> applications that require such a thing, e.g. DNA sequences, where one of
>> the things that can happen is the deletion of a symbol; the deletion state
>> would show up in the sequence of states but would emit no signal.
>>
>> --------------
>> Code sample: trying to build an HMM with the second state (state 1) as a
>> silent state:
>>
>> import numpy as np
>> from sklearn import hmm
>>
>> def testHMM():
>>     startprob = np.array([0.5, 0.5])
>>     transmat = np.array([[0.5, 0.5], [0.5, 0.5]])
>>     emissions = np.array([[0, 0.8, 0.2], [0, 0, 0]])  # state 1: all zeros
>>     model = hmm.MultinomialHMM(2, startprob, transmat)
>>     model.emissionprob_ = emissions
>>     return model
>>
>> >>> m = HMM_test.testHMM()
>> >>> m.sample(20)
>> (array([2, 1, 1, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0, 2, 0, 0, 1, 1, 1, 1],
>> dtype=int64),
>> array([0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0]))
>>
>> In this example, signal 0 should never have shown up (since its
>> probability is 0 in both states). However, it is the only signal emitted by
>> state 1.
>>
>> Anas
>>
>>
>>
>>
>> 2014-05-07 21:34 GMT+02:00 Kyle Kastner <kastnerk...@gmail.com>:
>>
>>> If you are manually specifying the emission probabilities, I don't think
>>> there are any hooks/asserts to guarantee that the variable is normalized.
>>> That is, if you assign to emissionprob_ instead of using the fit()
>>> function, I think it is on you to make sure the emission probabilities you
>>> are assigning *are* actually probabilities. From my perspective that is
>>> desired behavior, but maybe someone with more experience with this HMM
>>> implementation can comment.
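>>>
>>> One way to guard against this yourself (an illustrative sketch; the
>>> numbers and the model variable are just placeholders) is to normalize
>>> each row before assigning:
>>>
>>> probs = np.array([[0.1, 0.1, 0.2], [0.6, 0.35, 0.05]])
>>> probs = probs / probs.sum(axis=1, keepdims=True)  # rows now sum to one
>>> model.emissionprob_ = probs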
>>>
>>> For future reference: Hidden Markov Models were recently split out into
>>> their own project, hmmlearn: https://github.com/hmmlearn/hmmlearn
>>>
>>> Kyle
>>>
>>>
>>> On Wed, May 7, 2014 at 10:41 AM, anas elghafari <
>>> anas.elghaf...@gmail.com> wrote:
>>>
>>>> Hi scikit community,
>>>>
>>>> I am playing around with the Hidden Markov Model module
>>>> (MultinomialHMM), and one thing I don't understand is why scikit-learn
>>>> accepts emission probabilities that add up to more or less than 1.
>>>>
>>>> Here is an example with two states and three emission signals:
>>>>
>>>> >>> import numpy
>>>> >>> from sklearn import hmm
>>>> >>> startprob = numpy.array([0.5, 0.5])
>>>> >>> transition_matrix = numpy.array([[0.5, 0.5], [0.5, 0.5]])
>>>> >>> model = hmm.MultinomialHMM(2, startprob, transition_matrix)
>>>> >>> model.emissionprob_ = numpy.array([[0, 0, 0.2], [0.6, 0.35, 0.05]])
>>>>
>>>> As you can see, I am specifying the emission probabilities for state 0
>>>> as [0, 0, 0.2]. Scikit-learn accepts this and generates predictions with
>>>> no complaints. Is this desired behavior? Do these probabilities get
>>>> normalized?
>>>>
>>>> Thanks,
>>>>
>>>> Anas