Mathieu,
Thanks for your detailed answer and your research paper.
My data set contains 20,000 time series (maximum length 32).
Problem 1: At the moment, my problem is a risk problem.
* I want to model the behaviour of my 20,000 time series and
generate Monte Carlo simulations using the sample() method.
* Intuitively, I have to choose a number of states between 1 and 3
(4 at most).
* Then, I want to capture the risk for each time series and the
risk at the aggregate level (I will generate 100 Monte Carlo
simulations).
* Question 1: I don't see how I can train and cross-validate my
HMM in scikit-learn (this is the first time I have used scikit-learn
for this purpose).
* Question 2: Each time series has 32 periods. Is that enough for
cross-validation and validation?
Problem 2: Classification
* In the near future, I will try to classify time series, but I
have no idea how to approach the problem. Should I use an SVM? Can you
recommend any papers?
Generic questions: I was wondering whether your algorithm is implemented
in Python. Do you think it is relevant to my business problem?
Thanks in advance.
Didier
From: Mathieu Blondel [mailto:math...@mblondel.org]
Sent: 17 October 2012 17:07
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] HMM: Determination of the state
numbers
On Thu, Oct 18, 2012 at 12:18 AM, Didier Vila <dv...@capquestco.com>
wrote:
Should I maximize the "score" function over several candidate numbers
of states that make sense from a business point of view?
The best criterion to optimize is not "score" but the evaluation metric
you really care about for your application. For example, if what you
want to do is time-series classification, you should optimize the number
of states for classification accuracy. Also, be careful, you cannot use
your test data to choose the number of states. You need to use
validation data or cross-validation.
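As a rough sketch of what choosing the number of states by
cross-validation could look like: the helper below splits whole
sequences (not individual time steps) into folds and keeps the
candidate with the best mean held-out score. It is plain Python and is
not tied to any HMM library; `fit` and `score` are hypothetical
callables that you would write around your HMM implementation and the
metric you care about.

```python
import random


def kfold_indices(n_sequences, n_folds, seed=0):
    """Shuffle sequence indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_sequences))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def select_n_states(sequences, candidates, fit, score, n_folds=5, seed=0):
    """Return (best candidate, {candidate: mean held-out score}).

    fit(train_sequences, n_states) -> model          (hypothetical callable)
    score(model, held_out_sequences) -> float        (higher is better)
    """
    folds = kfold_indices(len(sequences), n_folds, seed)
    mean_scores = {}
    for n_states in candidates:
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            # Train on every sequence outside the current fold.
            train = [s for i, s in enumerate(sequences) if i not in held]
            valid = [sequences[i] for i in held_out]
            model = fit(train, n_states)
            fold_scores.append(score(model, valid))
        mean_scores[n_states] = sum(fold_scores) / len(fold_scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores
```

Here `score` can return whatever you actually care about on the
held-out sequences, for example held-out log-likelihood or
classification accuracy. The final test set stays out of this loop
entirely.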
For some types of data, states have an intuitive meaning and the number
of states can be chosen easily. For example, if what you want to do is
mouse gesture recognition, you can use one state for your HMM modeling a
straight line and two states for your HMM modeling a gesture with a
turn. You can also try to visualize your data and see if it has an
easy-to-identify number of "modes".
In an old paper of mine [*], I proposed a simple heuristic to choose the
number of states but it's only useful if you want to model several
object classes, each with one HMM. In that case, choosing the optimal
number of states for all HMMs is a combinatorial problem, and my
heuristic reduces the search to a single parameter.
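To illustrate the one-HMM-per-class setup with the gesture example
above: a sequence can be assigned to the class whose HMM gives it the
highest likelihood. The sketch below computes log-likelihoods with the
scaled forward algorithm for discrete observations. The two models,
their probabilities, and the symbol coding (0 = "right", 1 = "up") are
made up for illustration and are not taken from the paper.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete-emission HMM,
    computed with the scaled forward algorithm."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        total = sum(alpha)
        if total == 0.0:
            return float("-inf")
        loglik += math.log(total)
        alpha = [a / total for a in alpha]  # rescale to avoid underflow
    return loglik


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Illustrative, hand-set parameters: (start, transition, emission).
MODELS = {
    # One state: mostly emits "right".
    "line": ([1.0], [[1.0]], [[0.9, 0.1]]),
    # Two states, left to right: "right" first, then "up".
    "turn": (
        [1.0, 0.0],
        [[0.8, 0.2], [0.0, 1.0]],
        [[0.9, 0.1], [0.1, 0.9]],
    ),
}

print(classify([0, 0, 0, 0, 0, 0], MODELS))  # line
print(classify([0, 0, 0, 1, 1, 1], MODELS))  # turn
```

In practice each class model would be trained on that class's
sequences rather than set by hand, and the number of states per class
would be chosen on validation data.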
You may also want to try GMMHMM, but in that case you need to tune the
number of mixture components as well (more components give your HMM more
expressive power, so it may need fewer states).
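A rough way to see this trade-off is to count free parameters. The
sketch below assumes diagonal-covariance Gaussian mixtures as
emissions; the function name and the choice of one-dimensional
observations are assumptions for illustration, not part of any library.

```python
def gmm_hmm_param_count(n_states, n_mix, n_features):
    """Free parameters of an HMM with diagonal-covariance GMM emissions."""
    start = n_states - 1                     # initial state probabilities
    transitions = n_states * (n_states - 1)  # each row sums to one
    weights = n_states * (n_mix - 1)         # mixture weights per state
    means = n_states * n_mix * n_features
    variances = n_states * n_mix * n_features
    return start + transitions + weights + means + variances


# Assumed setting: univariate time series (n_features=1).
for n_states in (1, 2, 3, 4):
    for n_mix in (1, 2, 3):
        count = gmm_hmm_param_count(n_states, n_mix, n_features=1)
        print(f"states={n_states} mixtures={n_mix} parameters={count}")
```

Under these assumptions, 2 states with 2 components each has about as
many parameters as 3 states with a single Gaussian each, so the two
settings are worth comparing on validation data rather than tuning
them independently.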
HTH,
Mathieu
[*] http://mblondel.org/publications/mblondel-icpr2010.pdf
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general