Re: [scikit-learn] A necessary feature for Decision trees
Dear Yang Li,

> Neither the classification tree nor the regression tree supports categorical features. That means the decision tree models can only accept continuous features.

Consider either manually encoding your categories as bit strings (e.g., "Facebook" = 001, "Twitter" = 010, "Google" = 100), or using OneHotEncoder to do the same thing for you automatically.

Cheers,
J.B.

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
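A minimal sketch of the OneHotEncoder route suggested above, assuming a scikit-learn version (0.20+) whose OneHotEncoder accepts string categories directly:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# One column of categorical app names; OneHotEncoder expects a 2-D array.
X = np.array([["facebook"], ["twitter"], ["google"], ["facebook"]])
y = [0, 1, 0, 0]

enc = OneHotEncoder()
X_encoded = enc.fit_transform(X).toarray()  # one binary column per category
print(X_encoded.shape)  # (4, 3): facebook, google, twitter columns

# The encoded matrix is plain continuous input for the tree.
clf = DecisionTreeClassifier().fit(X_encoded, y)
```

Each category becomes its own 0/1 column, so the tree can split on "is this app Facebook?" even though it only understands numeric thresholds.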
[scikit-learn] A necessary feature for Decision trees
Hi,

I'm a graduate student using scikit-learn for some data work. While handling data with the decision tree module, I found an inconvenience: neither the classification tree nor the regression tree supports categorical features. That means the decision tree models can only accept continuous features. For example, a categorical feature such as an app name (Google, Facebook, ...) can't be fed into the model, because it can't be transformed into a continuous value properly, and there is no corresponding algorithm for splitting on discrete features in the decision tree module. However, the CART algorithm itself does consider the use of categorical features. So I made some modifications to the decision tree module based on CART and applied the new model in my own work. It turns out that support for categorical features indeed improves performance, which I think is very necessary for decision trees.

I'm very willing to contribute this to the scikit-learn community, but I'm new to the community and not so familiar with the procedure. Could you give some suggestions or comments on this new feature? Or has anyone already worked on it? Thank you so much.

Best wishes!

--
Yang Li
+86 188 1821 2371
Shanghai Jiao Tong University
School of Electronic, Information and Electrical Engineering
F1203026, 800 Dongchuan Road, Minhang District, Shanghai 200240
Re: [scikit-learn] Any plans on generalizing Pipeline and transformers?
I've read about Dask, and it is a tool I want to have in my belt, especially for using the SGE connection in order to run GridSearchCV on the supercomputer center I have access to. Should it work as promised, it will be one of my favourites.

As far as my toy example goes, I keep more limited goals with this graph: I am not currently interested in parallelizing each step, as I guess that parallelizing each graph fit through GridSearchCV will be closer to what I need. I keep working on a proof of concept. You can have a look at: https://github.com/mcasl/PAELLA/blob/master/pipeGraph.py along with a few unit tests: https://github.com/mcasl/PAELLA/blob/master/tests/test_pipeGraph.py

As of today, I have an iterable graph of steps that can be fitted/run depending on their role (some can be disabled during run while active during fit, or vice versa). I still have to play a bit with injecting different parameters to make it compatible with GridSearchCV, and learn a bit about the memory options in order to cache results.

Any comments highly appreciated, truly!

Manolo

2017-12-30 15:34 GMT+01:00 Frédéric Bastien:
> This starts to look like the dask project. Do you know it?
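For the GridSearchCV compatibility mentioned above: making a composite estimator work with grid search mostly comes down to honoring `get_params`/`set_params` with the nested `"<step>__<param>"` naming convention. A sketch of that convention with a standard Pipeline (not pipeGraph itself, which is the work in progress here):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# GridSearchCV reaches into nested steps via "<step>__<param>", which is the
# convention a pipeGraph-like class would need to reproduce in set_params.
pipe = Pipeline([("scale", StandardScaler()),
                 ("tree", DecisionTreeClassifier(random_state=0))])
grid = GridSearchCV(pipe, {"tree__max_depth": [2, 3, 4]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

The same double-underscore routing is what lets GridSearchCV inject parameters into any node of a step graph, as long as the container exposes its children through `get_params(deep=True)`.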
Re: [scikit-learn] clustering on big dataset
Yes, it is an efficient method; still, we need to specify the number of clusters or the threshold. Is there another way to run hierarchical clustering on a big dataset? The main problem is the distance matrix. Thanks.

On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel wrote:
> Have you had a look at BIRCH?
>
> http://scikit-learn.org/stable/modules/clustering.html#birch
>
> --
> Olivier
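On the distance-matrix problem: BIRCH builds a compact CF-tree in a single scan, so it never materializes the O(n²) pairwise matrix; with `n_clusters=None` it returns subcluster centroids that can then be fed to an ordinary hierarchical method as a much smaller second-stage dataset. A sketch with random stand-in data:

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.RandomState(0)
X = rng.rand(10000, 8)  # stand-in for a dataset too big for a distance matrix

# Birch scans the data once, building a CF-tree; memory depends on the tree
# size (threshold/branching_factor), not on n_samples ** 2.
birch = Birch(threshold=0.3, n_clusters=None)
labels = birch.fit_predict(X)

# The subcluster centroids form a reduced dataset small enough for an
# ordinary hierarchical clustering (e.g. AgglomerativeClustering) afterwards.
print(birch.subcluster_centers_.shape)
```

The threshold still has to be chosen, but it controls the granularity of the summary rather than the final number of clusters.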
[scikit-learn] pomegranate v0.9.0 released: probabilistic modeling for Python
Howdy all! I'm pleased to announce the release of pomegranate v0.9.0. The focus of this release is missing-value support across all model fitting / structure learning / inference methods and models. This enables you to do everything from fitting a multivariate Gaussian distribution to an incomplete data set (using a GPU if desired!), to learning the structure of a Bayesian network on an incomplete data set, to running Viterbi decoding with a hidden Markov model on a sequence with some missing values. Read more about it here: http://bit.ly/2CyrXtX

Thanks!
Jacob
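The core idea behind fitting a distribution to incomplete data can be illustrated in plain numpy (this is just the underlying principle of ignoring missing entries per feature, not pomegranate's actual API):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(loc=[2.0, -1.0], scale=1.0, size=(1000, 2))
X[rng.rand(*X.shape) < 0.2] = np.nan  # knock out ~20% of the entries

# Per-feature estimates computed over the observed entries only: the
# simplest form of "fit to an incomplete data set".
mean = np.nanmean(X, axis=0)
var = np.nanvar(X, axis=0)
print(mean, var)
```

The estimates stay close to the true parameters because each feature still has plenty of observed values; pomegranate generalizes this idea to its full model zoo.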
Re: [scikit-learn] MLPClassifier as a feature selector
I agree with Gael on this one and am happy to help with the PR if you need any assistance.

Best,
Maciek

Pozdrawiam, | Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-12-29 18:14 GMT+01:00 Gael Varoquaux:
> I think that a transform method would be good. We would have to add a parameter to the constructor to specify which layer is used for the transform. It should default to "-1", in my opinion.
>
> Cheers,
> Gaël
>
> Sent from my phone. Please forgive typos and briefness.
>
> On Dec 29, 2017, at 17:48, "Javier López" wrote:
>
>> Hi Thomas,
>>
>> it is possible to obtain the activation values of any hidden layer, but the procedure is not completely straightforward. If you look at the code of the `_predict` method of MLPs you can see the following:
>>
>> ```python
>> def _predict(self, X):
>>     """Predict using the trained model
>>
>>     Parameters
>>     ----------
>>     X : {array-like, sparse matrix}, shape (n_samples, n_features)
>>         The input data.
>>
>>     Returns
>>     -------
>>     y_pred : array-like, shape (n_samples,) or (n_samples, n_outputs)
>>         The decision function of the samples for each class in the model.
>>     """
>>     X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
>>
>>     # Make sure self.hidden_layer_sizes is a list
>>     hidden_layer_sizes = self.hidden_layer_sizes
>>     if not hasattr(hidden_layer_sizes, "__iter__"):
>>         hidden_layer_sizes = [hidden_layer_sizes]
>>     hidden_layer_sizes = list(hidden_layer_sizes)
>>
>>     layer_units = [X.shape[1]] + hidden_layer_sizes + [self.n_outputs_]
>>
>>     # Initialize layers
>>     activations = [X]
>>
>>     for i in range(self.n_layers_ - 1):
>>         activations.append(np.empty((X.shape[0], layer_units[i + 1])))
>>     # forward propagate
>>     self._forward_pass(activations)
>>     y_pred = activations[-1]
>>
>>     return y_pred
>> ```
>>
>> the line `y_pred = activations[-1]` is responsible for extracting the values for the last layer, but the `activations` variable contains the values for all the neurons.
>> You can make this function into your own external method (changing the `self` attribute to a proper parameter) and add an extra argument which specifies the layer(s) that you want. I have done this myself in order to make an autoencoder network out of the MLP implementation.
>>
>> This makes me wonder, would it be worth adding this to sklearn? A very simple way would be to refactor the `_predict` method, with the additional layer argument, into a new method `_predict_layer`; then we can have the `_predict` method simply call `_predict_layer(..., layer=-1)` and add a new method (perhaps a `transform`?) that allows one to get (raveled) values for an arbitrary subset of the layers.
>>
>> I'd be happy to submit a PR if you guys think it would be interesting for the project.
>>
>> Javier
>>
>> On Thu, Dec 7, 2017 at 12:51 AM Thomas Evangelidis wrote:
>>
>>> Greetings,
>>>
>>> I want to train an MLPClassifier with one hidden layer and use it as a feature selector for an MLPRegressor. Is it possible to get the values of the neurons from the last hidden layer of the MLPClassifier to pass them as input to the MLPRegressor?
>>>
>>> If it is not possible with scikit-learn, is anyone aware of any scikit-compatible NN library that offers this functionality? For example this one:
>>>
>>> http://scikit-neuralnetwork.readthedocs.io/en/latest/index.html
>>>
>>> I wouldn't like to do this in TensorFlow because the MLP there is much slower than scikit-learn's implementation.
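Until such a `transform` lands, the hidden activations can also be recovered without touching private methods, by replaying the forward pass with the public `coefs_`/`intercepts_` attributes. A hedged sketch (it assumes the default `'relu'` activation; adapt the nonlinearity for other settings):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500,
                    random_state=0).fit(X, y)

def hidden_activations(mlp, X, layer=0):
    """Replay the forward pass up to (and including) hidden layer `layer`.

    Assumes the default 'relu' activation for every hidden layer.
    """
    out = np.asarray(X, dtype=float)
    for coef, bias in zip(mlp.coefs_[:layer + 1],
                          mlp.intercepts_[:layer + 1]):
        out = np.maximum(out @ coef + bias, 0.0)  # relu
    return out

H = hidden_activations(clf, X)  # (150, 10): features for an MLPRegressor
```

`H` can then be passed as the input matrix to a downstream MLPRegressor, which is exactly the feature-selector use case Thomas describes.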