Re: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?

2019-09-14 Thread Javier López
If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look at catboost [1], which has an sklearn-compatible API.
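For reference, a minimal sketch of catboost's sklearn-style API (the toy data and column choice here are illustrative, not from the message):

```python
import pandas as pd
from catboost import CatBoostClassifier

X = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],  # categorical feature
    "size": [1.0, 2.5, 3.1, 0.7],              # continuous feature
})
y = [0, 1, 1, 0]

# Categorical columns are declared by name or index; no one-hot
# encoding is needed beforehand.
model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y, cat_features=["color"])
print(model.predict(X))
```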

Re: [scikit-learn] How does the random state influence the decision tree splits?

2018-10-27 Thread Javier López
Hi Sebastian, I think the random state is used to select the features that go into each split (look at the `max_features` parameter). Cheers, Javier

On Sun, Oct 28, 2018 at 12:07 AM Sebastian Raschka <m...@sebastianraschka.com> wrote:
> Hi all,
> when I was implementing a bagging classifier b…
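For illustration (not part of the thread), a minimal sketch of that interaction: with `max_features` below the number of features, the tree draws a random subset of candidate features at each split, so different seeds can yield different trees.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# With max_features=2 out of 4, candidate features are subsampled at
# every split, so the fitted trees can differ between seeds.
for seed in (0, 1):
    tree = DecisionTreeClassifier(max_features=2, random_state=seed).fit(X, y)
    print(seed, tree.tree_.feature[0])  # index of the feature used at the root
```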

Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-03 Thread Javier López
On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux wrote:
> The reason that pickles are brittle and that sharing pickles is a bad practice is that pickle uses an implicitly defined data model, which is defined via the internals of objects.
> Plus the fact that loading a pickle can execute arbitrar…
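The arbitrary-code-execution point is easy to demonstrate; a minimal sketch (the class name is illustrative):

```python
import pickle

class Evil:
    def __reduce__(self):
        # __reduce__ tells pickle to call an arbitrary callable on load.
        return (print, ("this code runs at load time",))

payload = pickle.dumps(Evil())
pickle.loads(payload)  # prints the message: code executed just by loading
```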

Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Javier López
On Fri, Sep 28, 2018 at 8:46 PM Andreas Mueller wrote:
> Basically what you're saying is that you're fine with versioning the models and having the model break loudly if anything changes.
> That's not actually what most people want. They want to be able to make predictions with a given model…

Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Javier López
On Fri, Sep 28, 2018 at 6:41 PM Andreas Mueller wrote:
> Javier:
> The problem is not so much storing the "model" but storing how to make predictions. Different versions could act differently on the same data structure - and the data structure could change. Both happen in scikit-learn.
> So…

Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Javier López
On Fri, Sep 28, 2018 at 1:03 AM Sebastian Raschka wrote:
> Chris Emmery, Chris Wagner and I toyed around with JSON a while back (https://cmry.github.io/notes/serialize), and it could be feasible

I came across your notes a while back, they were really useful! I hacked a variation of it that d…
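A hedged sketch of that general approach (simplified, and only safe for estimators whose fitted state is plain numpy arrays and scalars): dump the hyper-parameters plus the fitted attributes to JSON, then rebuild on load.

```python
import json
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Fitted attributes follow the trailing-underscore convention.
state = {
    "params": clf.get_params(),
    "attrs": {k: v.tolist() if isinstance(v, np.ndarray) else v
              for k, v in vars(clf).items() if k.endswith("_")},
}
text = json.dumps(state)  # human-readable and diff-able

# Rebuild the estimator from the JSON text.
loaded = json.loads(text)
new = LogisticRegression(**loaded["params"])
for k, v in loaded["attrs"].items():
    setattr(new, k, np.array(v) if isinstance(v, list) else v)
print((new.predict(X) == clf.predict(X)).all())
```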

Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-27 Thread Javier López
First of all, congratulations on the release, great work, everyone! I think model serialization should be a priority. Particularly, I think that (whenever practical) there should be a way of serializing estimators (either unfitted or fitted) in a text-readable format, preferably JSON or PMML/PFA…

Re: [scikit-learn] Delegating "get_params" and "set_params" to a wrapped estimator when parameter is not defined.

2018-04-16 Thread Javier López
Hi Manolo! Your code looks nice, but my use case is a bit different. I have a mixed set of parameters: some come from my wrapper, and some from the wrapped estimator. The logic I am going for is something like "If you know about this parameter, then deal with it, if not, then pass it along to the…
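A hedged sketch of that logic (names are illustrative; note that merging the wrapped estimator's parameters in unprefixed, rather than under the usual `estimator__` prefix, departs from the sklearn convention and can confuse `clone`):

```python
from sklearn.base import BaseEstimator, MetaEstimatorMixin

class FancyEstimator(BaseEstimator, MetaEstimatorMixin):
    def __init__(self, estimator, my_param=1):
        self.estimator = estimator
        self.my_param = my_param

    def get_params(self, deep=True):
        # Own parameters first, then the wrapped estimator's, unprefixed.
        params = {"estimator": self.estimator, "my_param": self.my_param}
        params.update(self.estimator.get_params(deep=deep))
        return params

    def set_params(self, **params):
        for name, value in params.items():
            if name in ("estimator", "my_param"):
                setattr(self, name, value)                  # "I know this one"
            else:
                self.estimator.set_params(**{name: value})  # pass it along
        return self
```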

Re: [scikit-learn] Delegating "get_params" and "set_params" to a wrapped estimator when parameter is not defined.

2018-04-16 Thread Javier López
How could I make mixins work in this case? If I define the class `FancyEstimatorMixin`, in order to get a drop-in replacement for a sklearn object wouldn't I need to monkey-patch the scikit-learn `BaseEstimator` class to inherit from my mixin? Or am I misunderstanding something? (BTW monkey-patch…

[scikit-learn] Delegating "get_params" and "set_params" to a wrapped estimator when parameter is not defined.

2018-04-13 Thread Javier López
I have a class `FancyEstimator(BaseEstimator, MetaEstimatorMixin): ...` that wraps around an arbitrary sklearn estimator to add some functionality I am interested in. This class contains an attribute `self.estimator` that contains the wrapped estimator. Delegation of the main methods, such as `f…
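A hedged sketch of the delegation part (illustrative; the `__getattr__` fallthrough is one common way to forward the remaining methods to the wrapped estimator):

```python
from sklearn.base import BaseEstimator, MetaEstimatorMixin

class FancyEstimator(BaseEstimator, MetaEstimatorMixin):
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, **fit_params):
        # ... the added functionality would go here ...
        self.estimator.fit(X, y, **fit_params)
        return self

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself,
        # so predict/transform/score etc. fall through to the estimator.
        return getattr(self.estimator, name)
```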

Re: [scikit-learn] MLPClassifier as a feature selector

2017-12-29 Thread Javier López
Hi Thomas, it is possible to obtain the activation values of any hidden layer, but the procedure is not completely straightforward. If you look at the code of the `_predict` method of MLPs you can see the following:

```python
def _predict(self, X):
    """Predict using the trained model ...
```
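Mimicking that method from the outside exposes the hidden activations. A hedged sketch, which relies on the private `_forward_pass` method and so may break between sklearn versions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000,
                    random_state=0).fit(X, y)

# Rebuild the per-layer activation buffers the way _predict does.
hidden_layer_sizes = list(np.atleast_1d(clf.hidden_layer_sizes))
layer_units = [X.shape[1]] + hidden_layer_sizes + [clf.n_outputs_]
activations = [X] + [np.empty((X.shape[0], units))
                     for units in layer_units[1:]]

clf._forward_pass(activations)  # private API: may change between versions
hidden = activations[1]         # first hidden layer, shape (150, 10)
print(hidden.shape)
```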

Re: [scikit-learn] Validating L2 - Least Squares - sum of squares, During a Normalization Function

2017-10-08 Thread Javier López
Why would the square of a real number ever be negative? I believe the "quirk" in Python is just operator precedence, as the power gets evaluated before applying the unary "-".

On Sun, Oct 8, 2017 at 11:34 AM Joel Nothman wrote:
> (normalize(X) * normalize(X)).sum(axis=1) works fine here.
> But…
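For illustration, the precedence point in two lines:

```python
# Unary minus binds more loosely than **: -2 ** 2 parses as -(2 ** 2).
print(-2 ** 2)    # -4
print((-2) ** 2)  # 4
```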

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Javier López Peña
On 13 Mar 2017, at 21:18, Andreas Mueller wrote:
> No, if all the samples are normalized and your aggregation function is sane (like the mean), the output will also be normalised.

You are completely right, I hadn’t checked this for random forests. Still, my purpose is to reduce model com…
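The point is easy to verify numerically (illustrative values): if every tree's predicted distribution sums to 1, their mean does too.

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.9, 0.05, 0.05]])  # one row per tree; each sums to 1
print(probs.mean(axis=0).sum())        # 1.0
```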

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Javier López Peña
> You could use a regression model with a logistic sigmoid in the output layer.

By training a regression network with logistic activation the outputs do not add to 1. I just checked on a minimal example on the iris dataset.
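A sketch of the kind of minimal check described (the exact script is not in the message; here one-hot targets are regressed with `MLPRegressor`, whose outputs are unconstrained, so rows need not sum to 1):

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
Y = label_binarize(y, classes=[0, 1, 2])  # one column per class

reg = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                   max_iter=2000, random_state=0).fit(X, Y)
rows = reg.predict(X).sum(axis=1)
print(rows.min(), rows.max())  # row sums drift away from 1
```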

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Javier López Peña
Hi Giles, thanks for the suggestion! Training a regression tree would require sticking some kind of probability normaliser at the end to ensure proper probabilities, and this might somehow hurt sharpness or calibration. Unfortunately, one of the things I am trying to do with this is moving away fr…

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-12 Thread Javier López Peña
On 12 Mar 2017, at 18:38, Gael Varoquaux wrote:
> You can use sample weights to go a bit in this direction. But in general, the mathematical meaning of your intuitions will depend on the classifier, so they will not be general ways of implementing them without a lot of tinkering.

I…
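The sample-weight trick Gael mentions can be sketched as follows (hedged; the soft-target matrix is illustrative): replicate each sample once per class and weight each copy by its soft probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0]])
soft = np.array([[0.8, 0.2, 0.0],
                 [0.1, 0.7, 0.2],
                 [0.0, 0.3, 0.7]])  # soft targets; rows sum to 1
n_samples, n_classes = soft.shape

# One copy of each sample per class, weighted by the soft probability.
X_rep = np.repeat(X, n_classes, axis=0)
y_rep = np.tile(np.arange(n_classes), n_samples)
w_rep = soft.ravel()

mask = w_rep > 0  # drop zero-weight copies
clf = LogisticRegression().fit(X_rep[mask], y_rep[mask],
                               sample_weight=w_rep[mask])
print(clf.predict_proba(X))
```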

[scikit-learn] Label encoding for classifiers and soft targets

2017-03-11 Thread Javier López Peña
Hi there! I have been recently experimenting with model regularization through the use of soft targets, and I’d like to be able to play with that from sklearn. The main idea is as follows: imagine I want to fit a (probabilistic) classifier with three possible targets, 0, 1, 2. If I pass my tr…