Re: [Scikit-learn-general] Library of pre-trained models
2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
> 2) The gensim implementation predates the patenting

Does that matter?

--
Don't Limit Your Business. Reach for the Cloud. GigeNET's Cloud Solutions provide you with the tools and support that you need to offload your IT needs and focus on growing your business. Configured For All Businesses. Start Your Cloud Today. https://www.gigenetcloud.com/
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
I don't think that having that feature is a good idea. The great power of decision trees (and ensembles of trees) is the ability to learn complicated non-linearities, which can involve splitting on a variable multiple times if necessary. If you're looking for an interpretable feature selection method, there are better alternatives.

On Wed, Jul 1, 2015 at 8:16 AM, Sebastian Raschka se.rasc...@gmail.com wrote:
> Maybe a crazy idea, but what I think could be useful is to have something like a repeat_features parameter that can be set to `False` to not reuse features down the tree. E.g., let's say we have 1000 different drug molecules with certain chemical groups and have some sort of experimental data of whether they work or not. Using decision tree classification/regression without feature repetition could help to interpret which of the functional groups may be important. Here the focus is maybe not so much predictive performance but rather interpretability, something like supervised clustering.
>
> On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:
>> Not really, as that kind of defeats the purpose of learning the tree. You could build a series of stumps that first only get feature a, then feature b and then feature c.
>>
>> On 06/30/2015 11:37 PM, Rex wrote:
>>> Given three columns, [A, B, C], can we specify the order of splitting, so that it first splits on the categories of A, then B, and then the others? Based on the documentation page for DecisionTreeClassifier, there is no such option. Is there any way to work it out?
>>> http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
Re: [Scikit-learn-general] Library of pre-trained models
On 07/01/2015 02:42 PM, Lars Buitinck wrote:
> 2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
>> 2) The gensim implementation predates the patenting
>
> Does that matter?

no
Re: [Scikit-learn-general] Library of pre-trained models
Not wrt the patents/licensing question, but the link to it was in response to my "are we even allowed to use it?"... The point I was making (poorly) was that it was implemented in gensim before "are we allowed?" was even a relevant question, ergo it wasn't actually an answer to my question.

FM.

On 1 July 2015 at 11:42, Lars Buitinck larsm...@gmail.com wrote:
> 2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
>> 2) The gensim implementation predates the patenting
>
> Does that matter?
Re: [Scikit-learn-general] Library of pre-trained models
On Wed, Jul 01, 2015 at 11:04:30AM -0400, Andreas Mueller wrote:
> Theano uses __setstate__ and __getstate__ and they seem to be happy with that.

As long as we don't change the data model, that works easily, but then so does pickle. If we change the data model, which we have done a few times, we need to add migration code. We used to do that in Mayavi, but it turned out to be a very big maintenance burden. I'd like to push that burden onto users: it's not hard to do if you understand the models. I think that we already do a lot and that we shouldn't strive to do more, as it is going to weaken us.

> We could add a library of previously pickled models to the tests to ensure it works

That would be necessary. But you need to add impedance-matching code each time you change the data model.

Gaël
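To make the "migration code" point concrete, here is a minimal sketch of what such impedance matching might look like with __getstate__ and __setstate__. Everything in it is an assumption made for illustration: TinyModel, the "_state_version" key and the old "weights_" layout are hypothetical and are not scikit-learn code.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    STATE_VERSION = 2  # bump whenever the attribute layout changes

    def __init__(self, coef, intercept=0.0):
        self.coef_ = list(coef)
        self.intercept_ = float(intercept)

    def __getstate__(self):
        # Store an explicit version number next to the attributes.
        state = dict(self.__dict__)
        state["_state_version"] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_state_version", 1)
        if version == 1:
            # Migration code. Assumption: version 1 kept everything in a
            # single list "weights_" with the intercept as its last entry.
            weights = state.pop("weights_")
            state["coef_"] = list(weights[:-1])
            state["intercept_"] = float(weights[-1])
        elif version != self.STATE_VERSION:
            raise ValueError("unknown state version: %r" % (version,))
        self.__dict__.update(state)


# Round trip with the current data model.
restored = pickle.loads(pickle.dumps(TinyModel([1.0, 2.0], 0.5)))

# Simulate restoring a state that was written by the old data model.
legacy = TinyModel.__new__(TinyModel)
legacy.__setstate__({"weights_": [1.0, 2.0, 0.5]})
```

Each change to the data model adds one more branch to __setstate__, which is where the maintenance burden described above would come from.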
Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
If you are working with entirely binary data, then features will naturally not be repeated along any root-to-leaf path of the tree. I think you are discussing the more general field of 'feature selection', though. There is a plethora of algorithms that do that, i.e. try to identify which inputs are important to a correct prediction. You can read more here: http://scikit-learn.org/stable/modules/feature_selection.html

On Wed, Jul 1, 2015 at 9:45 AM, Sebastian Raschka se.rasc...@gmail.com wrote:
> Yes, and thanks for the answers, it was just a random idea. But in all seriousness, which algorithm would you use for such a task? Here, the goal is not predictive performance but rather inference: I am collaborating with experimentalists who obtained measurements on a continuous scale 0.0 - 1.0, and each sample has ~30 binary features. They basically want to learn from this data, for example, which combination of features was important to yield a response = 0.5 (although this threshold is not fixed).
>
> For example, using a decision tree, you could come up with something like
>
> If feature A=1 -- response 0.5
> If feature B=0 -- response 0.6
> If feature C=1 -- response 0.7
> etc.
>
> Basically, association rule mining but with continuous outputs.
>
> On Jul 1, 2015, at 12:34 PM, Dale Smith dsm...@nexidia.com wrote:
>> It is a crazy idea. It defeats the purpose of random forest, which is introducing randomness in specific ways in order to achieve certain goals. Your idea, while appropriate in your use case, does not fit with the algorithm you want to use. Why not investigate alternatives that better fit your use case?
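The remark at the top of this message, that binary features are not reused further down a branch, can be checked with a few lines of plain Python. This is only a sketch under stated assumptions: the data is made up, split_gain is a hypothetical helper, and variance reduction stands in for whatever criterion the tree actually uses.

```python
def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_gain(rows, y, j):
    """Variance reduction obtained by splitting on binary feature j."""
    left = [t for r, t in zip(rows, y) if r[j] == 0]
    right = [t for r, t in zip(rows, y) if r[j] == 1]
    n = len(y)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(y) - weighted


# Made-up data: 3 binary features, response on a 0.0 - 1.0 scale.
X = [(1, 0, 1), (1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1), (0, 1, 0)]
y = [0.9, 0.7, 0.8, 0.2, 0.3, 0.1]

gain_at_root = split_gain(X, y, 0)

# Keep only the samples that went down the "feature 0 == 1" branch.
child = [(r, t) for r, t in zip(X, y) if r[0] == 1]
child_X = [r for r, _ in child]
child_y = [t for _, t in child]
gain_in_child = split_gain(child_X, child_y, 0)
```

Inside the child node feature 0 is constant, so one side of the candidate split is empty and the gain is zero; a greedy tree therefore has no reason to pick that feature again on the same branch. The feature can still show up in a different branch.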
Re: [Scikit-learn-general] Library of pre-trained models
On Wed, Jul 1, 2015 at 8:43 PM, Dale Smith dsm...@nexidia.com wrote:
> Apparently so; here is a python/cython implementation.
> http://rare-technologies.com/deep-learning-with-word2vec-and-gensim/

word2vec is *not* deep learning. The skip-gram model has been shown recently to reduce to a certain matrix factorization [*]. So it's a shallow network with only one hidden layer and without non-linearities.

Mathieu

[*] Neural Word Embedding as Implicit Matrix Factorization by O. Levy and Y. Goldberg.
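For readers who have not seen [*], the factorization in question can be stated roughly as follows. This is a summary from memory of the paper's main result for skip-gram with negative sampling, assuming embeddings of sufficiently high dimension; the notation (k negative samples, co-occurrence counts #(w, c), and |D| observed word-context pairs) is chosen here for illustration.

```latex
\vec{w} \cdot \vec{c} \;=\; \mathrm{PMI}(w, c) - \log k,
\qquad
\mathrm{PMI}(w, c) = \log \frac{\#(w, c)\,\lvert D \rvert}{\#(w)\,\#(c)}
```

In other words, the word and context vectors jointly factorize a word-context matrix of pointwise mutual information values shifted by log k, which is a linear operation and involves no deep architecture.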
Re: [Scikit-learn-general] Library of pre-trained models
1) The upshot seems to be that it's a defensive patent, and in any case the code was released under Apache 2.0, so it's fine to use.
https://code.google.com/p/word2vec/
https://groups.google.com/forum/#!topic/word2vec-toolkit/1hID9F74_Ho

2) The gensim implementation predates the patenting

(thanks for that reference, Mathieu...interesting!)

FM.

On 1 July 2015 at 06:52, Mathieu Blondel math...@mblondel.org wrote:
> On Wed, Jul 1, 2015 at 8:43 PM, Dale Smith dsm...@nexidia.com wrote:
>> Apparently so; here is a python/cython implementation.
>> http://rare-technologies.com/deep-learning-with-word2vec-and-gensim/
>
> word2vec is *not* deep learning. The skip-gram model has been shown recently to reduce to a certain matrix factorization [*]. So it's a shallow network with only one hidden layer and without non-linearities.
>
> Mathieu
>
> [*] Neural Word Embedding as Implicit Matrix Factorization by O. Levy and Y. Goldberg.
Re: [Scikit-learn-general] Library of pre-trained models
Apparently so; here is a python/cython implementation.
http://rare-technologies.com/deep-learning-with-word2vec-and-gensim/

Dale Smith, Ph.D.
Data Scientist

From: Fred Mailhot [mailto:fred.mail...@gmail.com]
Sent: Tuesday, June 30, 2015 11:10 PM
To: Mathieu Blondel; scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] Library of pre-trained models

Tangent: Are we even allowed to use word2vec anymore, now that Goog has patented it? (in any case, I'll be looking a bit more closely at GloVe)

F.

On 30 June 2015 at 19:26, Mathieu Blondel math...@mblondel.org wrote:
> For unsupervised models that take a long time to train, such as deep learning or word2vec based feature extractors, this can be pretty useful. Regardless, a major issue is that we still haven't figured out how to robustly solve model persistence.
>
> Mathieu
>
> On Wed, Jul 1, 2015 at 4:53 AM, Andreas Mueller t3k...@gmail.com wrote:
>> For most applications, this will not work, as the training data needs to come from the same distribution as your test data. Language identification is pretty simple, and training a linear classifier on n-grams should get you quite far.
>>
>> On 06/28/2015 09:58 AM, Erez Segal wrote:
>>> I was searching for a library of pre-trained scikit-learn models (e.g. a classifier for language identification), stored with joblib or something similar. Does such a library exist?
>>>
>>> Thanks,
>>> Erez
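As an illustration of the "linear classifier on n-grams" suggestion quoted above, here is a dependency-free sketch. It is an assumption-laden toy, not a recommended setup: the four training sentences are made up, the helper names are hypothetical, and a plain perceptron stands in for whichever linear model one would actually choose. With scikit-learn one would presumably combine a character n-gram vectorizer with one of its linear models instead.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams of a padded, lower-cased string."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def featurize(text):
    """Sparse n-gram count vector plus a constant bias feature."""
    feats = Counter(char_ngrams(text))
    feats["<bias>"] = 1
    return feats


def train_perceptron(texts, labels, positive, max_epochs=200):
    """Binary perceptron: a linear classifier over n-gram counts."""
    weights = Counter()
    for _ in range(max_epochs):
        mistakes = 0
        for text, label in zip(texts, labels):
            feats = featurize(text)
            score = sum(weights[g] * c for g, c in feats.items())
            target = 1 if label == positive else -1
            if score * target <= 0:
                # Misclassified (or undecided): move the weights.
                for g, c in feats.items():
                    weights[g] += target * c
                mistakes += 1
        if mistakes == 0:
            break
    return weights


def predict(weights, text, positive, negative):
    score = sum(weights[g] * c for g, c in featurize(text).items())
    return positive if score > 0 else negative


# Made-up toy corpus; a real model needs far more data per language.
texts = [
    "the cat sat on the mat",
    "where is the train station",
    "le chat est sur le tapis",
    "où est la gare",
]
labels = ["en", "en", "fr", "fr"]
weights = train_perceptron(texts, labels, positive="en")
```

The trained weights are just a dictionary from n-gram to number, which also hints at why such a model is easy to store and share compared with the persistence problems discussed in this thread.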
Re: [Scikit-learn-general] Library of pre-trained models
Oh, I missed that one from Omer Levy's "debunking word2vec" series. Nice!

On 1 July 2015 at 23:52, Mathieu Blondel math...@mblondel.org wrote:
> On Wed, Jul 1, 2015 at 8:43 PM, Dale Smith dsm...@nexidia.com wrote:
>> Apparently so; here is a python/cython implementation.
>> http://rare-technologies.com/deep-learning-with-word2vec-and-gensim/
>
> word2vec is *not* deep learning. The skip-gram model has been shown recently to reduce to a certain matrix factorization [*]. So it's a shallow network with only one hidden layer and without non-linearities.
>
> Mathieu
>
> [*] Neural Word Embedding as Implicit Matrix Factorization by O. Levy and Y. Goldberg.
Re: [Scikit-learn-general] Library of pre-trained models
On 07/01/2015 10:27 AM, Fred Mailhot wrote:
> 1) The upshot seems to be that it's a defensive patent, and in any case the code was released under Apache 2.0, so it's fine to use.
> https://code.google.com/p/word2vec/
> https://groups.google.com/forum/#!topic/word2vec-toolkit/1hID9F74_Ho
>
> 2) The gensim implementation predates the patenting

Patents and code licenses are completely independent, and open source implementations don't invalidate patents. The Apache license includes a patent grant iirc, so that is a bit different (I'm not certain about the details). If it was BSD licensed, they could definitely still sue you for using it.
[Scikit-learn-general] Off Topic: Advices for a conference.
Hi,

I have written a paper where I apply a machine learning methodology to rank the risk factors for asthma. The proposed methodology is new, but what is interesting is the application to the medical problem. I am wondering if you are aware of any relevant conference with a submission deadline before the end of August. I am looking for something like: Quantitative methods for life science.

Let me know.

Thanks,
Luca
Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
Not really, as that kind of defeats the purpose of learning the tree. You could build a series of stumps that first only get feature a, then feature b and then feature c.

On 06/30/2015 11:37 PM, Rex wrote:
> Given three columns, [A, B, C], can we specify the order of splitting, so that it first splits on the categories of A, then B, and then the others? Based on the documentation page for DecisionTreeClassifier, there is no such option. Is there any way to work it out?
> http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
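A dependency-free sketch of the idea of forcing the split order is given below. It is written in the spirit of the stump suggestion but is not the same thing: instead of fitting a learned depth-1 tree per column, it simply groups the samples by the categories of each column in the requested order. The function names and the toy data are hypothetical, and none of this is a scikit-learn API.

```python
from collections import Counter, defaultdict


def fit_ordered_tree(rows, labels, order):
    """Split on the columns in `order`, one column per tree level.

    Each level groups the samples by the category of the next column;
    a leaf stores the majority label of the samples that reach it.
    """
    if not order or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]  # leaf
    column, rest = order[0], order[1:]
    groups = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        groups[row[column]][0].append(row)
        groups[row[column]][1].append(label)
    children = {
        value: fit_ordered_tree(sub_rows, sub_labels, rest)
        for value, (sub_rows, sub_labels) in groups.items()
    }
    # Majority label at this node, used for categories never seen here.
    fallback = Counter(labels).most_common(1)[0][0]
    return {"column": column, "children": children, "fallback": fallback}


def predict_one(tree, row):
    """Walk the nested dicts until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["column"]], tree["fallback"])
    return tree


# Made-up toy data with three categorical columns A, B and C.
rows = [
    {"A": "x", "B": "p", "C": 0},
    {"A": "x", "B": "q", "C": 0},
    {"A": "y", "B": "p", "C": 1},
    {"A": "y", "B": "q", "C": 1},
]
labels = ["yes", "no", "no", "no"]
tree = fit_ordered_tree(rows, labels, ["A", "B", "C"])
```

Note that the order is imposed rather than learned, so the result is closer to a nested group-by than to a decision tree, which is the trade-off pointed out in the reply above.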
Re: [Scikit-learn-general] Library of pre-trained models
On 06/30/2015 10:26 PM, Mathieu Blondel wrote:
> Regardless, a major issue is that we still haven't figured out how to robustly solve model persistence.

Theano uses __setstate__ and __getstate__ and they seem to be happy with that. We could add a library of previously pickled models to the tests to ensure it works.
Re: [Scikit-learn-general] Library of pre-trained models
http://arxiv.org/abs/1301.3781
Submitted on 16 Jan 2013, last revised 7 Sep 2013

https://www.google.com/patents/US9037464
Filed on 15 March 2013

On Thu, Jul 2, 2015 at 4:03 AM, Matthieu Brucher matthieu.bruc...@gmail.com wrote:
> 2015-07-01 19:43 GMT+01:00 Andreas Mueller t3k...@gmail.com:
>> On 07/01/2015 02:42 PM, Lars Buitinck wrote:
>>> 2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
>>>> 2) The gensim implementation predates the patenting
>>>
>>> Does that matter?
>>
>> no
>
> If the algorithm was published before it was filed, then it should be in the public domain... At least that's why lawyers told me to file patents before we publish anything...
Re: [Scikit-learn-general] Library of pre-trained models
The main interesting point is the date of filing the patent and the date of the publication of the paper... These are as interesting to have as the implementation date.

2015-07-01 21:12 GMT+01:00 Fred Mailhot fred.mail...@gmail.com:
> Ah...of course it was filed before...Radim did the initial implementation Sept 2013. My bad, I'll stop talking out of my behind now.
>
> On 1 July 2015 at 12:56, Mathieu Blondel math...@mblondel.org wrote:
>> http://arxiv.org/abs/1301.3781
>> Submitted on 16 Jan 2013, last revised 7 Sep 2013
>>
>> https://www.google.com/patents/US9037464
>> Filed on 15 March 2013

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/
Re: [Scikit-learn-general] Library of pre-trained models
If that's the case, then the patent could be invalidated as far as I know (that's what I was told, both in my company and in the one I'm currently working for). Except if the publication didn't present how it is actually being done.

2015-07-01 20:56 GMT+01:00 Mathieu Blondel math...@mblondel.org:
> http://arxiv.org/abs/1301.3781
> Submitted on 16 Jan 2013, last revised 7 Sep 2013
>
> https://www.google.com/patents/US9037464
> Filed on 15 March 2013
Re: [Scikit-learn-general] Library of pre-trained models
Ah...of course it was filed before...Radim did the initial implementation Sept 2013. My bad, I'll stop talking out of my behind now.
On 1 July 2015 at 12:56, Mathieu Blondel math...@mblondel.org wrote:
http://arxiv.org/abs/1301.3781 Submitted on 16 Jan 2013, last revised 7 Sep 2013
https://www.google.com/patents/US9037464 Filed on 15 March 2013
On Thu, Jul 2, 2015 at 4:03 AM, Matthieu Brucher matthieu.bruc...@gmail.com wrote:
2015-07-01 19:43 GMT+01:00 Andreas Mueller t3k...@gmail.com:
On 07/01/2015 02:42 PM, Lars Buitinck wrote:
2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
2) The gensim implementation predates the patenting
Does that matter?
no
If the algorithm was published before it was filed, then it should be in the public domain... At least that's why lawyers told me to file patents before we publish anything...
Re: [Scikit-learn-general] Library of pre-trained models
On 07/01/2015 02:49 PM, Gael Varoquaux wrote:
On Wed, Jul 01, 2015 at 11:04:30AM -0400, Andreas Mueller wrote:
Theano uses __setstate__ and __getstate__ and they seem to be happy with that.
As long as we don't change the data model that works easily, but then so does pickle. If we change the data model, which we have done a few times, we need to add migration code. We used to do that in Mayavi, but it turned out to be a very big maintenance burden. I'd like to push that burden onto users: it's not hard to do if you understand the models. I think that we already do a lot and that we shouldn't strive to do more, as it is going to weaken us.
I agree that we probably don't want to do this at this point.
We could add a library of previously pickled models to the tests to ensure it works.
That would be necessary. But you need to add impedance matching code each time you change the data model.
At least it could guard us against changing the data model accidentally.
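To make the trade-off in this message concrete, here is a minimal sketch of versioned state with migration code, plus a stored legacy state used as a regression check. Everything in it is an assumption for illustration: `TinyModel`, `STATE_VERSION`, the `weights` to `coef_` rename and the `restore` helper are hypothetical and are not scikit-learn or Theano code. Only the state dictionaries are pickled here; unpickling a whole object would call `__setstate__` automatically.

```python
import pickle


class TinyModel:
    """Hypothetical estimator used only for illustration."""

    STATE_VERSION = 2  # bump whenever the attribute layout changes

    def __init__(self, coef=None):
        self.coef_ = coef

    def __getstate__(self):
        # Record the layout version next to the attributes.
        state = self.__dict__.copy()
        state["_state_version"] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_state_version", 1)
        if version == 1:
            # Migration code: layout 1 stored the coefficients as "weights".
            state["coef_"] = state.pop("weights")
        self.__dict__.update(state)


def restore(blob):
    """Rebuild a TinyModel from a pickled state dictionary."""
    model = TinyModel.__new__(TinyModel)
    model.__setstate__(pickle.loads(blob))
    return model


# State written by the current code.
current_blob = pickle.dumps(TinyModel([0.5, -1.0]).__getstate__())
# Stand-in for a fixture saved by an older release and kept in the tests.
legacy_blob = pickle.dumps({"weights": [1.0, 2.0]})

restored = restore(current_blob)
legacy = restore(legacy_blob)

# Regression checks: both layouts must load into the current data model.
assert restored.coef_ == [0.5, -1.0]
assert legacy.coef_ == [1.0, 2.0]
```

Every change to the attribute layout adds another branch to `__setstate__`, which is the maintenance burden described above; the stored legacy state at least fails loudly when the layout changes by accident.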
Re: [Scikit-learn-general] Library of pre-trained models
2015-07-01 19:43 GMT+01:00 Andreas Mueller t3k...@gmail.com:
On 07/01/2015 02:42 PM, Lars Buitinck wrote:
2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com:
2) The gensim implementation predates the patenting
Does that matter?
no
If the algorithm was published before it was filed, then it should be in the public domain... At least that's why lawyers told me to file patents before we publish anything...
-- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/
Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
Yes, I could do sequential backward selection in combination with a linear regression model; however, that would be essentially the same as the decision tree approach using MSE as the objective function to be minimized at each split. Thanks for the input though, I have to brainstorm about it a little bit more.
On Jul 1, 2015, at 3:00 PM, Jacob Schreiber jmschreibe...@gmail.com wrote:
If you are working with entirely binary data, then features will not be repeated in the tree naturally. I think you are discussing the more general field of 'feature selection', though. There are a plethora of algorithms which do that--try to identify which inputs are important to a correct prediction. You can read more here: http://scikit-learn.org/stable/modules/feature_selection.html
On Wed, Jul 1, 2015 at 9:45 AM, Sebastian Raschka se.rasc...@gmail.com wrote:
Yes, and thanks for the answers, it was just a random idea. But in all seriousness, which algorithm would you use for such a task -- here, the goal is not predictive performance but rather inference: I am collaborating with experimentalists who obtained measurements on a continuous scale 0.0 - 1.0, and each sample has ~30 binary features. They basically want to learn from this data, for example, which combination of features was important to yield a response = 0.5 (although this threshold is not fixed). For example, using a decision tree, you could come up with something like
If feature A=1 -- response 0.5
If feature B=0 -- response 0.6
If feature C=1 -- response 0.7
etc. Basically, association rule mining but with continuous outputs.
On Jul 1, 2015, at 12:34 PM, Dale Smith dsm...@nexidia.com wrote:
It is a crazy idea. It defeats the purpose of random forest, which is introducing randomness in specific ways in order to achieve certain goals. Your idea, while appropriate in your use case, does not fit with the algorithm you want to use. Why not investigate alternatives that better fit your use case?
Dale Smith, Ph.D. Data Scientist
From: Sebastian Raschka [mailto:se.rasc...@gmail.com] Sent: Wednesday, July 01, 2015 12:17 PM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
Maybe a crazy idea, but what I think could be useful is to have something like a repeat_features parameter that can be set to `False` to not reuse features down the tree. E.g., let's say we have 1000 different drug molecules with certain chemical groups and have some sort of experimental data of whether they work or not. Using decision tree classification/regression without feature repetition could help to interpret which of the functional groups may be important -- here the focus is maybe not so much predictive performance but rather interpretability, something like supervised clustering.
On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:
Not really, as that kind of defeats the purpose of learning the tree. You could build a series of stumps that first only get feature a, then feature b and then feature c.
On 06/30/2015 11:37 PM, Rex wrote:
Given three columns, [A, B, C], can we specify the order of splitting, so that it first splits on the categories of A, then B, and then the others? Based on the documentation page on DecisionTreeClassifier, there is no such option. Is there any way to work it out? http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
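The two technical points in this message, splitting so as to minimize squared error and the remark that binary features are not repeated, can be sketched in plain Python. This is an illustrative toy and not scikit-learn's tree code; `sse`, `grow`, `leaves` and the eight-sample data set are hypothetical. Once a 0/1 feature has been used, it is constant in both child nodes and cannot split again, so no feature repeats along a root-to-leaf path.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def grow(rows, targets, n_features, path=()):
    """Greedy regression tree on 0/1 features; returns a nested dict."""
    best = None
    for j in range(n_features):
        left = [t for r, t in zip(rows, targets) if r[j] == 0]
        right = [t for r, t in zip(rows, targets) if r[j] == 1]
        if not left or not right:
            continue  # feature j is constant in this node, so it cannot split
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, j)
    # Stop when no feature can split or the best split does not reduce the error.
    if best is None or best[0] >= sse(targets):
        return {"mean": sum(targets) / len(targets), "path": path}
    j = best[1]
    children = {}
    for value in (0, 1):
        keep = [i for i, r in enumerate(rows) if r[j] == value]
        children[value] = grow([rows[i] for i in keep],
                               [targets[i] for i in keep],
                               n_features, path + ((j, value),))
    return {"feature": j, "children": children}


def leaves(node):
    """Collect the leaf nodes, left to right."""
    if "mean" in node:
        return [node]
    return leaves(node["children"][0]) + leaves(node["children"][1])


# Toy data: the response depends on features 0 and 1 but not on feature 2.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
targets = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9]

tree = grow(rows, targets, n_features=3)
for leaf in leaves(tree):
    print(leaf["path"], leaf["mean"])
```

Each printed path is a list of (feature index, value) pairs, which reads directly as a rule of the form "if feature 0 = 1 and feature 1 = 0, then the mean response is ...".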
Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
Yes, and thanks for the answers, it was just a random idea. But in all seriousness, which algorithm would you use for such a task -- here, the goal is not predictive performance but rather inference: I am collaborating with experimentalists who obtained measurements on a continuous scale 0.0 - 1.0, and each sample has ~30 binary features. They basically want to learn from this data, for example, which combination of features was important to yield a response = 0.5 (although this threshold is not fixed). For example, using a decision tree, you could come up with something like
If feature A=1 -- response 0.5
If feature B=0 -- response 0.6
If feature C=1 -- response 0.7
etc. Basically, association rule mining but with continuous outputs.
On Jul 1, 2015, at 12:34 PM, Dale Smith dsm...@nexidia.com wrote:
It is a crazy idea. It defeats the purpose of random forest, which is introducing randomness in specific ways in order to achieve certain goals. Your idea, while appropriate in your use case, does not fit with the algorithm you want to use. Why not investigate alternatives that better fit your use case?
Dale Smith, Ph.D. Data Scientist
From: Sebastian Raschka [mailto:se.rasc...@gmail.com] Sent: Wednesday, July 01, 2015 12:17 PM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
Maybe a crazy idea, but what I think could be useful is to have something like a repeat_features parameter that can be set to `False` to not reuse features down the tree. E.g., let's say we have 1000 different drug molecules with certain chemical groups and have some sort of experimental data of whether they work or not. Using decision tree classification/regression without feature repetition could help to interpret which of the functional groups may be important -- here the focus is maybe not so much predictive performance but rather interpretability, something like supervised clustering.
On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:
Not really, as that kind of defeats the purpose of learning the tree. You could build a series of stumps that first only get feature a, then feature b and then feature c.
On 06/30/2015 11:37 PM, Rex wrote:
Given three columns, [A, B, C], can we specify the order of splitting, so that it first splits on the categories of A, then B, and then the others? Based on the documentation page on DecisionTreeClassifier, there is no such option. Is there any way to work it out? http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
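As a rough first pass at the "If feature A=1 -- response ..." readout described in this message, one can tabulate the mean response for each value of each binary feature. This is only a sketch under stated assumptions: `conditional_means`, the feature names and the four-sample data set are hypothetical, and single-feature averages ignore the feature combinations that the experimentalists are ultimately interested in.

```python
from statistics import mean


def conditional_means(rows, targets, names):
    """Map (feature name, value) to the mean response of the matching samples."""
    table = {}
    for j, name in enumerate(names):
        for value in (0, 1):
            matched = [t for r, t in zip(rows, targets) if r[j] == value]
            if matched:  # skip values that never occur in the data
                table[(name, value)] = mean(matched)
    return table


# Toy data: two binary features and a response on the 0.0 - 1.0 scale.
names = ["A", "B"]
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
targets = [0.8, 0.6, 0.2, 0.4]

table = conditional_means(rows, targets, names)
# List the single-feature rules from highest to lowest mean response.
for (name, value), avg in sorted(table.items(), key=lambda kv: -kv[1]):
    print(f"If feature {name}={value} -- mean response {avg:.2f}")
```

A feature whose two conditional means are close together, like B in this toy data, carries little information on its own, although it may still matter in combination with other features.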
Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
It is a crazy idea. It defeats the purpose of random forest, which is introducing randomness in specific ways in order to achieve certain goals. Your idea, while appropriate in your use case, does not fit with the algorithm you want to use. Why not investigate alternatives that better fit your use case?
Dale Smith, Ph.D. Data Scientist http://nexidia.com/ d. 404.495.7220 x 4008 f. 404.795.7221 Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 30305
From: Sebastian Raschka [mailto:se.rasc...@gmail.com] Sent: Wednesday, July 01, 2015 12:17 PM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] Is it possible to specify the order of spliting in decision tree with scikit-learn?
Maybe a crazy idea, but what I think could be useful is to have something like a repeat_features parameter that can be set to `False` to not reuse features down the tree. E.g., let's say we have 1000 different drug molecules with certain chemical groups and have some sort of experimental data of whether they work or not. Using decision tree classification/regression without feature repetition could help to interpret which of the functional groups may be important -- here the focus is maybe not so much predictive performance but rather interpretability, something like supervised clustering.
On Jul 1, 2015, at 11:08 AM, Andreas Mueller t3k...@gmail.com wrote:
Not really, as that kind of defeats the purpose of learning the tree. You could build a series of stumps that first only get feature a, then feature b and then feature c.
On 06/30/2015 11:37 PM, Rex wrote:
Given three columns, [A, B, C], can we specify the order of splitting, so that it first splits on the categories of A, then B, and then the others? Based on the documentation page on DecisionTreeClassifier, there is no such option. Is there any way to work it out? http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
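For the original question quoted above (split on A first, then B, then the rest), a fixed split order amounts to grouping the samples by the columns in that order and voting within each group. The sketch below does this in plain Python; it is not a DecisionTreeClassifier option and not the series-of-stumps construction itself, and `fit_ordered`, `predict_ordered` and the toy data are hypothetical names introduced for illustration.

```python
from collections import Counter, defaultdict


def fit_ordered(rows, labels, order):
    """Count the labels seen under every prefix of the fixed split order."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for depth in range(len(order) + 1):
            prefix = tuple(row[j] for j in order[:depth])
            counts[prefix][label] += 1
    return counts


def predict_ordered(counts, row, order):
    """Follow the fixed order as deep as the training data allows, then vote."""
    prefix = ()
    for j in order:
        longer = prefix + (row[j],)
        if longer not in counts:
            break  # unseen category combination: stop at the last known node
        prefix = longer
    return counts[prefix].most_common(1)[0][0]


# Toy data with columns A (index 0), B (index 1) and C (index 2).
rows = [("red", "small", 0), ("red", "large", 1),
        ("blue", "small", 0), ("blue", "large", 0)]
labels = ["yes", "no", "no", "no"]
order = (0, 1, 2)  # split on A first, then B, then C

counts = fit_ordered(rows, labels, order)
print(predict_ordered(counts, ("red", "small", 1), order))
print(predict_ordered(counts, ("green", "small", 0), order))
```

Because the order is imposed rather than learned, the resulting tree can be much larger and less accurate than one grown by a split criterion, which is the objection raised in the replies.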
Re: [Scikit-learn-general] GSoC midterms NOW!
Hi all, Sorry I am late on my emails, I am at a conference. I have not invested enough time to mentor Wei Xue on the GMM but he is responsive and still making progress on a regular basis albeit behind schedule. So I plan to make him pass. -- Olivier