Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Lars Buitinck
2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
 2) The gensim implementation predates the patenting

Does that matter?

--
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?

2015-07-01 Thread Jacob Schreiber
I don't think that having that feature is a good idea. The great power of
decision trees (and ensembles of trees) is the ability to learn complicated
non-linearities which involve splitting on a variable multiple times if
necessary. If you're looking for an interpretable feature selection method,
there are better alternatives.

On Wed, Jul 1, 2015 at 8:16 AM, Sebastian Raschka se.rasc...@gmail.com
wrote:

 Maybe a crazy idea, but what I think could be useful is to have something
 like a repeat_features parameter that can be set to `False` to not reuse
 features down the tree.

 E.g., let's say we have 1000 different drug molecules with certain
 chemical groups and have some sort of experimental data of whether they
 work or not. Using decision tree classification/regression without feature
 repetition could help to interpret which of the functional groups may be
 important -- here the focus is maybe not so much predictive performance but
 rather interpretability, something like supervised clustering.



 On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:

  Not really, as that kind of defeats the purpose of learning the tree.
 You could build a series of stumps that first only get feature a, then
 feature b and then feature c.

 On 06/30/2015 11:37 PM, Rex wrote:

 Given three columns, [A, B, C], can we specify the order of
 splitting, so that it first splits on the categories of A, then B, and
 then the others?

 Based on the documentation page on DecisionTreeClassifier, there is no such
 option. Is there any way to work it out?


 http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html







Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Andreas Mueller


On 07/01/2015 02:42 PM, Lars Buitinck wrote:
 2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
 2) The gensim implementation predates the patenting
 Does that matter?

no



Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Fred Mailhot
Not wrt the patents/licensing question, but the link to it was in
response to my "are we even allowed to use it"... the point I was making
(poorly) was that it was implemented in gensim before "are we allowed" was
a question that was even relevant, ergo it wasn't actually an answer to my
question.

FM.

On 1 July 2015 at 11:42, Lars Buitinck larsm...@gmail.com wrote:

 2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
  2) The gensim implementation predates the patenting

 Does that matter?




Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Gael Varoquaux
On Wed, Jul 01, 2015 at 11:04:30AM -0400, Andreas Mueller wrote:
 Theano uses __setstate__ and __getstate__ and they seem to be happy with 
 that.

As long as we don't change the data model that works easily, but then so
does pickle. If we change the data model, which we have done a few times,
we need to add migration code. We used to do that in Mayavi, but it
turned out to be a very big maintenance burden. I'd like to push that
burden onto users: it's not hard to do if you understand the models. I
think that we already do a lot and that we shouldn't strive to do more,
as it is going to weaken us.

 We could add a library of previously pickled models to the tests to 
 ensure it works

That would be necessary. But you need to add impedance matching code each
time you change the data model.

Gaël
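
To make the migration-code point concrete, here is a minimal sketch. It
assumes a hypothetical TinyModel class whose fitted attribute was renamed
from "weights" (state version 1) to "coef_" (state version 2). Neither the
class nor the "_version" key is scikit-learn API; they only illustrate the
kind of code that has to be written each time the data model changes.

```python
import pickle


class TinyModel:
    """Hypothetical estimator used only to illustrate state migration."""

    _state_version = 2  # bump whenever the attribute layout changes

    def __init__(self, coef):
        self.coef_ = list(coef)  # version 1 stored this under "weights"

    def __getstate__(self):
        # Called by pickle when saving: tag the state with its version.
        state = self.__dict__.copy()
        state["_version"] = self._state_version
        return state

    def __setstate__(self, state):
        # Called by pickle when loading: migrate old layouts first.
        state = dict(state)
        version = state.pop("_version", 1)  # untagged state is assumed v1
        if version < 2:
            state["coef_"] = state.pop("weights")
        self.__dict__.update(state)


def roundtrip(model):
    """Save and reload a model through pickle (which uses the hooks above)."""
    return pickle.loads(pickle.dumps(model))
```

Every further change to the layout adds another branch to __setstate__,
which is the maintenance burden described above.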



Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?

2015-07-01 Thread Jacob Schreiber
If you are working with entirely binary data, then features will not be
repeated in the tree naturally. I think you are discussing the more general
field of 'feature selection', though. There are a plethora of algorithms
which do that--try to identify which inputs are important to a correct
prediction. You can read more here:
http://scikit-learn.org/stable/modules/feature_selection.html
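
As one illustration of the feature selection route, here is a minimal
sketch using univariate scoring on synthetic data shaped like the problem
described below (about 30 binary features, a continuous response). The
data, the choice of f_regression and k=2 are assumptions made for the
example. Univariate scores rank features one at a time, so they will not by
themselves reveal which combinations of features matter.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the experimental data: 200 samples, 30 binary
# features, and a response driven only by features 0 and 3 (an assumption).
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 30))
y = 0.1 + 0.5 * X[:, 0] + 0.3 * X[:, 3] + rng.normal(scale=0.05, size=200)

# Score each feature against the response and keep the two best.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = selector.get_support(indices=True)
print("selected feature indices:", selected)
```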

On Wed, Jul 1, 2015 at 9:45 AM, Sebastian Raschka se.rasc...@gmail.com
wrote:

 Yes, and thanks for the answers, it was just a random idea.

 But in all seriousness, which algorithm would you use for such a task --
 here, the goal is not predictive performance but rather inference:

 I am collaborating with experimentalists who obtained measurements on a
 continuous scale 0.0 - 1.0, and each sample has ~30 binary features. They
 basically want to learn from this data, for example, which combination of
 features was important to yield a response = 0.5 (although this
 threshold is not fixed)
 For example, using a decision tree, you could come up with something like

 If feature A=1 -- response  0.5
 If feature B=0 -- response  0.6
  If feature C=1  --- response  0.7
 etc.

 Basically, an association rule mining but with continuous outputs.

 On Jul 1, 2015, at 12:34 PM, Dale Smith dsm...@nexidia.com wrote:

 It is a crazy idea. It defeats the purpose of random forest, which is
 introducing randomness in specific ways in order to achieve certain goals.
 Your idea, while appropriate in your use case, does not fit with the
 algorithm you want to use. Why not investigate alternatives that better fit
 your use case?


 *Dale Smith, Ph.D.*
 Data Scientist
 *d.* 404.495.7220 x 4008   *f.* 404.795.7221
 Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta,
 GA 30305

 *From:* Sebastian Raschka [mailto:se.rasc...@gmail.com
 se.rasc...@gmail.com]
 *Sent:* Wednesday, July 01, 2015 12:17 PM
 *To:* scikit-learn-general@lists.sourceforge.net
 *Subject:* Re: [Scikit-learn-general] Is it possible to specify the order
 of spliting in decision tree with scikit-learn?

 Maybe a crazy idea, but what I think could be useful is to have something
 like a repeat_features parameter that can be set to `False` to not reuse
 features down the tree.

 E.g., let's say we have 1000 different drug molecules with certain
 chemical groups and have some sort of experimental data of whether they
 work or not. Using decision tree classification/regression without feature
 repetition could help to interpret which of the functional groups may be
 important -- here the focus is maybe not so much predictive performance but
 rather interpretability, something like supervised clustering.



 On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:


 Not really, as that kind of defeats the purpose of learning the tree.
 You could build a series of stumps that first only get feature a, then
 feature b and then feature c.
 On 06/30/2015 11:37 PM, Rex wrote:

 Given three columns, [A, B, C], can we specify the order of
 splitting, so that it first splits on the categories of A, then B, and
 then the others?

 Based on the documentation page on DecisionTreeClassifier, there is no such
 option. Is there any way to work it out?


 http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html






Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Mathieu Blondel
On Wed, Jul 1, 2015 at 8:43 PM, Dale Smith dsm...@nexidia.com wrote:

  Apparently so; here is a python/cython implementation.



 http://rare-technologies.com/deep-learning-with-word2vec-and-gensim/


word2vec is *not* deep learning. The skip-gram model has been shown
recently to reduce to a certain matrix factorization [*]. So it's a shallow
network with only one hidden layer and without non-linearities.

Mathieu

[*] Neural Word Embedding as Implicit Matrix Factorization by O. Levy and
Y. Goldberg.
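
The result in the cited paper is that skip-gram with k negative samples
implicitly factorizes a word-context matrix whose cells are PMI(w, c) -
log(k). A minimal sketch of the explicit alternative, a truncated SVD of
the shifted positive PMI matrix, follows. The tiny count matrix and the
function names are made up for illustration.

```python
import numpy as np


def shifted_ppmi(counts, k=5):
    """Shifted positive PMI: max(PMI(w, c) - log(k), 0), 0 for unseen pairs."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    word = counts.sum(axis=1, keepdims=True)    # word marginals
    context = counts.sum(axis=0, keepdims=True)  # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (word * context))
        shifted = np.maximum(pmi - np.log(k), 0.0)
    return np.where(counts > 0, shifted, 0.0)


def svd_embeddings(matrix, dim):
    """Word and context vectors from a rank-`dim` SVD, splitting S evenly."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    scale = np.sqrt(s[:dim])
    return u[:, :dim] * scale, vt[:dim].T * scale


# Hypothetical word-by-context co-occurrence counts.
counts = [[10, 2, 0], [3, 8, 1], [0, 1, 6]]
sppmi = shifted_ppmi(counts, k=1)
W, C = svd_embeddings(sppmi, dim=3)
```

With dim equal to the full rank, W @ C.T reproduces the matrix exactly; a
smaller dim gives the low-dimensional embeddings.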


Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Fred Mailhot
1) The upshot seems to be that it's a defensive patent, and in any case the
code was released under Apache 2.0, so it's fine to use.
https://code.google.com/p/word2vec/
https://groups.google.com/forum/#!topic/word2vec-toolkit/1hID9F74_Ho

2) The gensim implementation predates the patenting

(thanks for that reference, Mathieu...interesting!)

FM.

On 1 July 2015 at 06:52, Mathieu Blondel math...@mblondel.org wrote:



 On Wed, Jul 1, 2015 at 8:43 PM, Dale Smith dsm...@nexidia.com wrote:

  Apparently so; here is a python/cython implementation.



 http://rare-technologies.com/deep-learning-with-word2vec-and-gensim/


 word2vec is *not* deep learning. The skip-gram model has been shown
 recently to reduce to a certain matrix factorization [*]. So it's a shallow
 network with only one hidden layer and without non-linearities.

 Mathieu

 [*] Neural Word Embedding as Implicit Matrix Factorization by O. Levy and
 Y. Goldberg.





Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Dale Smith
Apparently so; here is a python/cython implementation.

http://rare-technologies.com/deep-learning-with-word2vec-and-gensim/


Dale Smith, Ph.D.
Data Scientist
d. 404.495.7220 x 4008   f. 404.795.7221
Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 30305

From: Fred Mailhot [mailto:fred.mail...@gmail.com]
Sent: Tuesday, June 30, 2015 11:10 PM
To: Mathieu Blondel; scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] Library of pre-trained models

Tangent: Are we even allowed to use word2vec anymore, now that Goog has 
patented it? (in any case, I'll be looking a bit more closely at GloVe)

F.

On 30 June 2015 at 19:26, Mathieu Blondel math...@mblondel.org wrote:
For unsupervised models that take a long time to train, such as deep learning 
or word2vec based feature extractors, this can be pretty useful.
Regardless, a major issue is that we still haven't figured out how to robustly 
solve model persistence.
Mathieu

On Wed, Jul 1, 2015 at 4:53 AM, Andreas Mueller t3k...@gmail.com wrote:
For most applications, this will not work, as the training data needs to come 
from the same distribution as your test data.
Language identification is pretty simple, and training a linear classifier on 
n-grams should get you quite a bit.
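
A minimal sketch of that suggestion, assuming character n-grams and a toy
two-language corpus invented for the example (a real model would need far
more text per language):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, made up for illustration only.
texts = [
    "the cat sits on the mat",
    "where is the nearest library",
    "this is a short english sentence",
    "le chat est sur le tapis",
    "où est la bibliothèque la plus proche",
    "ceci est une courte phrase en français",
]
labels = ["en", "en", "en", "fr", "fr", "fr"]

# Character 1- to 3-grams within word boundaries, fed to a linear model.
lang_id = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(),
)
lang_id.fit(texts, labels)

prediction = lang_id.predict(["where is the train station"])
print(prediction)
```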

On 06/28/2015 09:58 AM, Erez Segal wrote:
I was searching for a library of pre-trained models using scikit-learn (e.g. a 
classifier for language identification),
stored with joblib or something similar.
Is there such an existing library?
Thanks,
Erez




Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Joel Nothman
oh, I missed that one from Omer Levy's debunking word2vec series. Nice!

On 1 July 2015 at 23:52, Mathieu Blondel math...@mblondel.org wrote:



 On Wed, Jul 1, 2015 at 8:43 PM, Dale Smith dsm...@nexidia.com wrote:

  Apparently so; here is a python/cython implementation.



 http://rare-technologies.com/deep-learning-with-word2vec-and-gensim/


 word2vec is *not* deep learning. The skip-gram model has been shown
 recently to reduce to a certain matrix factorization [*]. So it's a shallow
 network with only one hidden layer and without non-linearities.

 Mathieu

 [*] Neural Word Embedding as Implicit Matrix Factorization by O. Levy and
 Y. Goldberg.





Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Andreas Mueller


On 07/01/2015 10:27 AM, Fred Mailhot wrote:
 1) The upshot seems to be that it's a defensive patent, and in any 
 case the code was released under Apache 2.0, so it's fine to use.
 https://code.google.com/p/word2vec/
 https://groups.google.com/forum/#!topic/word2vec-toolkit/1hID9F74_Ho

 2) The gensim implementation predates the patenting

Patents and code licenses are completely independent, and open source 
implementations don't invalidate patents.
The Apache license includes a patent grant, IIRC, so that is a bit different 
(I'm not certain about the details).
If it were BSD-licensed, they could definitely still sue you for using it.



[Scikit-learn-general] Off Topic: Advices for a conference.

2015-07-01 Thread Luca Puggini
Hi,
I have written a paper where I apply a machine learning methodology to
rank the risk factors for asthma.

The proposed methodology is new, but what is interesting is the application
to the medical problem.

I am wondering if you are aware of any relevant conference with a
submission deadline before the end of August.
I am looking for something like: Quantitative methods for life science

Let me know.
Thanks,
Luca


Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?

2015-07-01 Thread Andreas Mueller

Not really, as that kind of defeats the purpose of learning the tree.
You could build a series of stumps that first only get feature a, then 
feature b and then feature c.
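
A minimal sketch of the stump idea, assuming it is read as nesting
depth-1 trees so that each level only sees the next feature in a fixed
order. The helper names fit_ordered_stumps and predict_one are
hypothetical, not scikit-learn API, and the dict-based tree is just the
simplest structure that works for an example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_ordered_stumps(X, y, feature_order):
    """Nest depth-1 trees so that level i may only split on feature_order[i]."""
    if not feature_order or len(np.unique(y)) < 2:
        # Leaf: out of features or node is pure, so predict the majority class.
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": values[np.argmax(counts)]}
    feature = feature_order[0]
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X[:, [feature]], y)  # the stump sees a single column
    if stump.tree_.feature[0] < 0:
        # No split was possible on this column; move on to the next feature.
        return fit_ordered_stumps(X, y, feature_order[1:])
    threshold = stump.tree_.threshold[0]
    left = X[:, feature] <= threshold
    return {
        "feature": feature,
        "threshold": threshold,
        "left": fit_ordered_stumps(X[left], y[left], feature_order[1:]),
        "right": fit_ordered_stumps(X[~left], y[~left], feature_order[1:]),
    }


def predict_one(node, x):
    """Walk the nested stumps for a single sample."""
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


# Toy data: three binary columns A, B, C; the label is A AND B.
X = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
y = X[:, 0] & X[:, 1]
tree = fit_ordered_stumps(X, y, feature_order=[0, 1, 2])
```

The thresholds are still learned; only the order of the features is
imposed, which is the part that gives up the usual greedy choice.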


On 06/30/2015 11:37 PM, Rex wrote:
Given three columns, [A, B, C], can we specify the order of 
splitting, so that it first splits on the categories of A, then B, 
and then the others?


Based on the documentation page on DecisionTreeClassifier, there is no 
such option. Is there any way to work it out?


http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html







Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Andreas Mueller


On 06/30/2015 10:26 PM, Mathieu Blondel wrote:
 Regardless, a major issue is that we still haven't figured out how to 
 robustly solve model persistence.

Theano uses __setstate__ and __getstate__ and they seem to be happy with 
that.

We could add a library of previously pickled models to the tests to 
ensure it works



Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Mathieu Blondel
http://arxiv.org/abs/1301.3781
Submitted on 16 Jan 2013, last revised 7 Sep 2013

https://www.google.com/patents/US9037464
Filed on 15 March 2013

On Thu, Jul 2, 2015 at 4:03 AM, Matthieu Brucher matthieu.bruc...@gmail.com
 wrote:

 2015-07-01 19:43 GMT+01:00 Andreas Mueller t3k...@gmail.com:
 
 
  On 07/01/2015 02:42 PM, Lars Buitinck wrote:
  2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
  2) The gensim implementation predates the patenting
  Does that matter?
 
  no

 If the algorithm was published before it was filed, then it should be
 in the public domain... At least that's why lawyers told me to file
 patents before we publish anything...


 --
 Information System Engineer, Ph.D.
 Blog: http://matt.eifelle.com
 LinkedIn: http://www.linkedin.com/in/matthieubrucher
 Music band: http://liliejay.com/




Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Matthieu Brucher
The main interesting point is the date of filing the patent and the
date of the publication of the paper... These are as interesting to
have as the implementation date.

2015-07-01 21:12 GMT+01:00 Fred Mailhot fred.mail...@gmail.com:
 Ah...of course it was filed before...Radim did the initial implementation
 Sept 2013. My bad, I'll stop talking out of my behind now.

 On 1 July 2015 at 12:56, Mathieu Blondel math...@mblondel.org wrote:

 http://arxiv.org/abs/1301.3781
 Submitted on 16 Jan 2013, last revised 7 Sep 2013

 https://www.google.com/patents/US9037464
 Filed on 15 March 2013

 On Thu, Jul 2, 2015 at 4:03 AM, Matthieu Brucher
 matthieu.bruc...@gmail.com wrote:

 2015-07-01 19:43 GMT+01:00 Andreas Mueller t3k...@gmail.com:
 
 
  On 07/01/2015 02:42 PM, Lars Buitinck wrote:
  2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
  2) The gensim implementation predates the patenting
  Does that matter?
 
  no

 If the algorithm was published before it was filed, then it should be
 in the public domain... At least that's why lawyers told me to file
 patents before we publish anything...


 --
 Information System Engineer, Ph.D.
 Blog: http://matt.eifelle.com
 LinkedIn: http://www.linkedin.com/in/matthieubrucher
 Music band: http://liliejay.com/






-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/



Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Matthieu Brucher
If that's the case, then the patent could be invalidated as far as I
know (that's what I was told in my company and the one I'm
currently working for),
unless the publication didn't present how it is actually being done.

2015-07-01 20:56 GMT+01:00 Mathieu Blondel math...@mblondel.org:
 http://arxiv.org/abs/1301.3781
 Submitted on 16 Jan 2013, last revised 7 Sep 2013

 https://www.google.com/patents/US9037464
 Filed on 15 March 2013

 On Thu, Jul 2, 2015 at 4:03 AM, Matthieu Brucher
 matthieu.bruc...@gmail.com wrote:

 2015-07-01 19:43 GMT+01:00 Andreas Mueller t3k...@gmail.com:
 
 
  On 07/01/2015 02:42 PM, Lars Buitinck wrote:
  2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
  2) The gensim implementation predates the patenting
  Does that matter?
 
  no

 If the algorithm was published before it was filed, then it should be
 in the public domain... At least that's why lawyers told me to file
 patents before we publish anything...






-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/



Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Fred Mailhot
Ah... of course it was filed before... Radim did the initial implementation
in Sept 2013. My bad, I'll stop talking out of my behind now.

On 1 July 2015 at 12:56, Mathieu Blondel math...@mblondel.org wrote:

 http://arxiv.org/abs/1301.3781
 Submitted on 16 Jan 2013, last revised 7 Sep 2013

 https://www.google.com/patents/US9037464
 Filed on 15 March 2013

 On Thu, Jul 2, 2015 at 4:03 AM, Matthieu Brucher 
 matthieu.bruc...@gmail.com wrote:

 2015-07-01 19:43 GMT+01:00 Andreas Mueller t3k...@gmail.com:
 
 
  On 07/01/2015 02:42 PM, Lars Buitinck wrote:
  2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
  2) The gensim implementation predates the patenting
  Does that matter?
 
  no

 If the algorithm was published before it was filed, then it should be
 in the public domain... At least that's why lawyers told me to file
 patents before we publish anything...






Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Andreas Mueller


On 07/01/2015 02:49 PM, Gael Varoquaux wrote:
 On Wed, Jul 01, 2015 at 11:04:30AM -0400, Andreas Mueller wrote:
 Theano uses __setstate__ and __getstate__ and they seem to be happy with
 that.
 As long as we don't change the data model, that works easily, but then so
 does pickle. If we change the data model, which we have done a few times,
 we need to add migration code. We used to do that in Mayavi, but it
 turned out to be a very big maintenance burden. I'd like to push that
 burden onto users: it's not hard to do if you understand the models. I
 think that we already do a lot and that we shouldn't strive to do more,
 as it is going to weaken us.
I agree that we probably don't want to do this at this point.

 We could add a library of previously pickled models to the tests to
 ensure it works
 That would be necessary. But you need to add impedance-matching code each
 time you change the data model.
At least it could guard us against changing the data model accidentally
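
To make the trade-off concrete, here is a minimal sketch of the `__getstate__`/`__setstate__` approach with a version tag and one migration step. `TinyModel`, `_state_version`, and the `weights_` to `coef_` rename are all hypothetical names for illustration, not scikit-learn code; the point is only that every data-model change needs a branch like the one in `__setstate__`.

```python
import pickle


class TinyModel:
    """Hypothetical estimator-like class (illustration only, not scikit-learn)."""

    _state_version = 2  # assumed: bumped whenever the attribute layout changes

    def __init__(self, coef=None):
        self.coef_ = coef

    def __getstate__(self):
        # Store the attributes plus a version tag alongside them.
        state = self.__dict__.copy()
        state["_state_version"] = self._state_version
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_state_version", 1)
        if version < 2 and "weights_" in state:
            # Migration code: version 1 (hypothetically) used `weights_`.
            state["coef_"] = state.pop("weights_")
        self.__dict__.update(state)


# Round trip with the current data model.
roundtrip = pickle.loads(pickle.dumps(TinyModel(coef=[0.5])))

# Simulate restoring state that was saved under the old data model.
legacy = TinyModel.__new__(TinyModel)
legacy.__setstate__({"weights_": [1.0, 2.0]})

print(roundtrip.coef_, legacy.coef_)
```

A test suite could keep a few such old states (or old pickles) around and assert that they still load, which is the kind of guard against accidental data-model changes mentioned above.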



Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Matthieu Brucher
2015-07-01 19:43 GMT+01:00 Andreas Mueller t3k...@gmail.com:


 On 07/01/2015 02:42 PM, Lars Buitinck wrote:
 2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
 2) The gensim implementation predates the patenting
 Does that matter?

 no

If the algorithm was published before it was filed, then it should be
in the public domain... At least that's why lawyers told me to file
patents before we publish anything...


-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/



Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?

2015-07-01 Thread Sebastian Raschka
Yes, I could do sequential backward selection in combination with a linear
regression model; however, that would be essentially the same as the decision
tree approach using MSE as the objective function to be minimized at each split.
Thanks for the input though, I have to brainstorm about it a little bit more.
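
For reference, sequential backward selection around a linear model could be sketched as below. This assumes a scikit-learn release that ships `SequentialFeatureSelector` (added in 0.24, so later than this thread); the synthetic data, the choice of two features to keep, and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real data: 6 binary features, continuous response.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 6)).astype(float)
# Assumed ground truth: only features 0 and 3 influence the response.
y = 0.2 + 0.5 * X[:, 0] + 0.3 * X[:, 3] + 0.01 * rng.randn(200)

# Start from all features and greedily drop the one whose removal
# hurts the cross-validated R^2 score the least.
selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=2,
    direction="backward",
    scoring="r2",
    cv=5,
)
selector.fit(X, y)

# Indices of the features that survived the backward elimination.
selected = np.flatnonzero(selector.get_support())
print(selected)
```

Unlike a tree, this ranks features one at a time under an additive model, so it would not by itself surface feature combinations.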

 On Jul 1, 2015, at 3:00 PM, Jacob Schreiber jmschreibe...@gmail.com wrote:
 
 If you are working with entirely binary data, then features will not be 
 repeated in the tree naturally. I think you are discussing the more general 
 field of 'feature selection', though. There are a plethora of algorithms 
 which do that--try to identify which inputs are important to a correct 
 prediction. You can read more here:
 http://scikit-learn.org/stable/modules/feature_selection.html
 
 On Wed, Jul 1, 2015 at 9:45 AM, Sebastian Raschka se.rasc...@gmail.com wrote:
 Yes, and thanks for the answers, it was just a random idea.
 
 But in all seriousness, which algorithm would you use for such a task -- 
 here, the goal is not predictive performance but rather inference:
 
 I am collaborating with experimentalists who obtained measurements on a 
 continuous scale 0.0 - 1.0, and each sample has ~30 binary features. They 
 basically want to learn from this data, for example, which combination of 
 features was important to yield a response >= 0.5 (although this threshold
 is not fixed)
 For example, using a decision tree, you could come up with something like
 
 If feature A=1 --> response > 0.5
 If feature B=0 --> response > 0.6
  If feature C=1 --> response > 0.7
 etc.
 
 Basically, an association rule mining but with continuous outputs.
 
 On Jul 1, 2015, at 12:34 PM, Dale Smith dsm...@nexidia.com wrote:
 
 It is a crazy idea. It defeats the purpose of random forest, which is 
 introducing randomness in specific ways in order to achieve certain goals. 
 Your idea, while appropriate in your use case, does not fit with the 
 algorithm you want to use. Why not investigate alternatives that better fit 
 your use case?
  
 
 From: Sebastian Raschka [mailto:se.rasc...@gmail.com]
 Sent: Wednesday, July 01, 2015 12:17 PM
 To: scikit-learn-general@lists.sourceforge.net
 Subject: Re: [Scikit-learn-general] Is it possible to specify the order of
 spliting in decision tree with scikit-learn?
  
 Maybe a crazy idea, but what I think could be useful is to have something 
 like a repeat_features parameter that can be set to `False` to not reuse 
 features down the tree.
  
 E.g., let's say we have 1000 different drug molecules with certain chemical 
 groups and have some sort of experimental data of whether they work or not. 
 Using decision tree classification/regression without feature repetition 
 could help to interpret which of the functional groups may be important -- 
 here the focus is maybe not so much predictive performance but rather 
 interpretability, something like supervised clustering. 
  
  
 On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:
 
 Not really, as that kind of defeats the purpose of learning the tree.
 You could build a series of stumps that first only get feature a, then
 feature b and then feature c.
 
 On 06/30/2015 11:37 PM, Rex wrote:
 Given three columns, [A, B, C], can we specify the order of splitting,
 so that it first splits on categories of A, then B, and then by others?
 
 Based on the documentation page on DecisionTreeClassifier, there is no such
 option. Is there any way to work it out?
 
 http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?

2015-07-01 Thread Sebastian Raschka
Yes, and thanks for the answers, it was just a random idea.

But in all seriousness, which algorithm would you use for such a task -- here, 
the goal is not predictive performance but rather inference:

I am collaborating with experimentalists who obtained measurements on a 
continuous scale 0.0 - 1.0, and each sample has ~30 binary features. They 
basically want to learn from this data, for example, which combination of 
features was important to yield a response >= 0.5 (although this threshold is
not fixed)
For example, using a decision tree, you could come up with something like

If feature A=1 --> response > 0.5
If feature B=0 --> response > 0.6
 If feature C=1 --> response > 0.7
etc.

Basically, an association rule mining but with continuous outputs.
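
One way to get rules of that shape out of scikit-learn is to fit a shallow regression tree and print it. The sketch below uses made-up data with three binary features named A, B and C and an invented response; it also assumes a scikit-learn version that provides `sklearn.tree.export_text` (0.21 or later), which did not exist when this thread was written.

```python
import itertools

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Made-up data: every combination of three binary features, repeated 10 times.
X = np.array(list(itertools.product([0, 1], repeat=3)) * 10, dtype=float)
# Invented response on a 0.0 - 1.0 scale, dominated by A, then B, then C.
y = 0.2 + 0.4 * X[:, 0] + 0.2 * X[:, 1] + 0.05 * X[:, 2]

# A shallow tree keeps the rule set small enough to read.
reg = DecisionTreeRegressor(max_depth=2, random_state=0)
reg.fit(X, y)

# Each root-to-leaf path reads as "if A ... and B ... then mean response".
rules = export_text(reg, feature_names=["A", "B", "C"])
print(rules)
```

The leaf values are mean responses for the samples that reach each leaf, so the printed paths can be read as rules with continuous outputs; the tree still chooses the split order itself.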

 On Jul 1, 2015, at 12:34 PM, Dale Smith dsm...@nexidia.com wrote:
 
 It is a crazy idea. It defeats the purpose of random forest, which is 
 introducing randomness in specific ways in order to achieve certain goals. 
 Your idea, while appropriate in your use case, does not fit with the 
 algorithm you want to use. Why not investigate alternatives that better fit 
 your use case?
  
 
  
 From: Sebastian Raschka [mailto:se.rasc...@gmail.com] 
 Sent: Wednesday, July 01, 2015 12:17 PM
 To: scikit-learn-general@lists.sourceforge.net
 Subject: Re: [Scikit-learn-general] Is it possible to specify the order of 
 spliting in decision tree with scikit-learn?
  
 Maybe a crazy idea, but what I think could be useful is to have something 
 like a repeat_features parameter that can be set to `False` to not reuse 
 features down the tree.
  
 E.g., let's say we have 1000 different drug molecules with certain chemical 
 groups and have some sort of experimental data of whether they work or not. 
 Using decision tree classification/regression without feature repetition 
 could help to interpret which of the functional groups may be important -- 
 here the focus is maybe not so much predictive performance but rather 
 interpretability, something like supervised clustering. 
  
  
 On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:
 
 Not really, as that kind of defeats the purpose of learning the tree.
 You could build a series of stumps that first only get feature a, then
 feature b and then feature c.
 
 On 06/30/2015 11:37 PM, Rex wrote:
 Given three columns, [A, B, C], can we specify the order of splitting,
 so that it first splits on categories of A, then B, and then by others?
 
 Based on the documentation page on DecisionTreeClassifier, there is no such
 option. Is there any way to work it out?
 
 http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?

2015-07-01 Thread Dale Smith
It is a crazy idea. It defeats the purpose of random forest, which is 
introducing randomness in specific ways in order to achieve certain goals. Your 
idea, while appropriate in your use case, does not fit with the algorithm you 
want to use. Why not investigate alternatives that better fit your use case?


Dale Smith, Ph.D.
Data Scientist
Nexidia (http://nexidia.com/)

d. 404.495.7220 x 4008   f. 404.795.7221
Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA
30305

From: Sebastian Raschka [mailto:se.rasc...@gmail.com]
Sent: Wednesday, July 01, 2015 12:17 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] Is it possible to specify the order of 
spliting in decision tree with scikit-learn?

Maybe a crazy idea, but what I think could be useful is to have something like 
a repeat_features parameter that can be set to `False` to not reuse features 
down the tree.

E.g., let's say we have 1000 different drug molecules with certain chemical 
groups and have some sort of experimental data of whether they work or not. 
Using decision tree classification/regression without feature repetition could 
help to interpret which of the functional groups may be important -- here the 
focus is maybe not so much predictive performance but rather interpretability, 
something like supervised clustering.


On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:

Not really, as that kind of defeats the purpose of learning the tree.
You could build a series of stumps that first only get feature a, then feature
b and then feature c.

On 06/30/2015 11:37 PM, Rex wrote:
Given three columns, [A, B, C], can we specify the order of splitting, so
that it first splits on categories of A, then B, and then by others?

Based on the documentation page on DecisionTreeClassifier, there is no such
option. Is there any way to work it out?

http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html


Re: [Scikit-learn-general] GSoC midterms NOW!

2015-07-01 Thread Olivier Grisel
Hi all,

Sorry I am late on my emails, I am at a conference.

I have not invested enough time to mentor Wei Xue on the GMM but he is
responsive and still making progress on a regular basis albeit behind
schedule.

So I plan to make him pass.


-- 
Olivier
