If you have datasets with many categorical features, and perhaps many
categories, the tools in sklearn are quite limited,
but there are alternative implementations of boosted trees that are
designed with categorical features in mind. Take a look
at catboost [1], which has an sklearn-compatible API.
Hi Sebastian,
I think the random state is used to select the features that go into each
split (look at the `max_features` parameter)
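A quick way to see the effect, as a minimal sketch on the iris data:

```python
# With max_features < n_features, the candidate features at each split are
# drawn using random_state, so the same data can yield different trees.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

a = DecisionTreeClassifier(max_features=1, random_state=0).fit(X, y)
b = DecisionTreeClassifier(max_features=1, random_state=0).fit(X, y)
c = DecisionTreeClassifier(max_features=1, random_state=1).fit(X, y)

# Same seed -> identical structure; a different seed usually changes it.
print(a.tree_.node_count == b.tree_.node_count)  # True
print(a.tree_.node_count, c.tree_.node_count)
```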
Cheers,
Javier
On Sun, Oct 28, 2018 at 12:07 AM Sebastian Raschka <
m...@sebastianraschka.com> wrote:
> Hi all,
>
> when I was implementing a bagging classifier b
On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux
wrote:
> The reason that pickles are brittle and that sharing pickles is a bad
> practice is that pickle uses an implicitly defined data model, which is
> defined via the internals of objects.
>
Plus the fact that loading a pickle can execute arbitrary code.
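The arbitrary-execution point can be demonstrated in a few lines:

```python
# Why unpickling untrusted data is dangerous: __reduce__ lets a pickle
# payload request that an arbitrary callable be run at load time.
import pickle

class Payload:
    def __reduce__(self):
        # Any callable could go here, e.g. os.system(...)
        return (print, ("code ran during unpickling!",))

data = pickle.dumps(Payload())
pickle.loads(data)  # prints the message instead of returning a Payload
```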
On Fri, Sep 28, 2018 at 8:46 PM Andreas Mueller wrote:
> Basically what you're saying is that you're fine with versioning the
> models and having the model break loudly if anything changes.
> That's not actually what most people want. They want to be able to make
> predictions with a given model
On Fri, Sep 28, 2018 at 6:41 PM Andreas Mueller wrote:
> Javier:
> The problem is not so much storing the "model" but storing how to make
> predictions. Different versions could act differently
> on the same data structure - and the data structure could change. Both
> happen in scikit-learn.
> So
On Fri, Sep 28, 2018 at 1:03 AM Sebastian Raschka
wrote:
> Chris Emmery, Chris Wagner and I toyed around with JSON a while back (
> https://cmry.github.io/notes/serialize), and it could be feasible
I came across your notes a while back, they were really useful!
I hacked a variation of it that d
First of all, congratulations on the release, great work, everyone!
I think model serialization should be a priority. Particularly,
I think that (whenever practical) there should be a way of
serializing estimators (either unfitted or fitted) in a text-readable
format, preferably JSON or PMML/PFA.
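As a rough illustration of the unfitted case, constructor parameters already round-trip through JSON today; fitted attributes (`coef_`, `tree_`, ...) would need the extra machinery discussed in the linked notes:

```python
# Minimal sketch: round-trip an *unfitted* estimator through JSON via its
# constructor parameters. Fitted attributes are deliberately out of scope.
import json
from sklearn.linear_model import LogisticRegression

est = LogisticRegression(C=0.5, max_iter=200)

blob = json.dumps(est.get_params())           # text-readable representation
restored = LogisticRegression(**json.loads(blob))

print(restored.C, restored.max_iter)  # 0.5 200
```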
Hi Manolo!
Your code looks nice, but my use case is a bit different. I have a mixed
set of parameters, some come from my wrapper,
and some from the wrapped estimator. The logic I am going for is something
like
"If you know about this parameter, then deal with it; if not, then pass it
along to the wrapped estimator."
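That routing logic could be sketched like this (the class and the `verbose` parameter are hypothetical, not from my actual code):

```python
# Hypothetical sketch of "handle what you know, forward the rest" routing
# in a wrapper's set_params. Names here are made up for illustration.
from sklearn.base import BaseEstimator
from sklearn.tree import DecisionTreeClassifier

class FancyWrapper(BaseEstimator):
    _own_params = {"verbose"}

    def __init__(self, estimator, verbose=False):
        self.estimator = estimator
        self.verbose = verbose

    def set_params(self, **params):
        for name, value in params.items():
            if name in self._own_params:
                setattr(self, name, value)                   # known: handle here
            else:
                self.estimator.set_params(**{name: value})   # unknown: forward
        return self

w = FancyWrapper(DecisionTreeClassifier())
w.set_params(verbose=True, max_depth=3)
print(w.verbose, w.estimator.max_depth)  # True 3
```

Note this flattens sklearn's usual `estimator__max_depth` nesting convention, which is exactly the design question at issue.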
How could I make mixins work in this case?
If I define the class `FancyEstimatorMixin`, in order to get a drop-in
replacement for a sklearn
object wouldn't I need to monkey-patch the scikit-learn `BaseEstimator`
class to inherit from my mixin?
Or am I misunderstanding something?
(BTW monkey-patch
I have a class `FancyEstimator(BaseEstimator, MetaEstimatorMixin): ...`
that wraps
around an arbitrary sklearn estimator to add some functionality I am
interested in.
This class contains an attribute `self.estimator` that contains the wrapped
estimator.
Delegation of the main methods, such as `f
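One common way to delegate methods to a wrapped estimator, as an illustrative sketch rather than the actual class:

```python
# Hedged sketch of method delegation: explicit for fit, __getattr__ as a
# fallback for everything else. Illustrative only, not the original code.
from sklearn.base import BaseEstimator, MetaEstimatorMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

class FancyEstimator(BaseEstimator, MetaEstimatorMixin):
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, **fit_params):
        # ... added functionality would go here ...
        self.estimator.fit(X, y, **fit_params)
        return self

    def __getattr__(self, name):
        # Fall back to the wrapped estimator for anything not defined here.
        return getattr(self.estimator, name)

X, y = load_iris(return_X_y=True)
wrapped = FancyEstimator(LogisticRegression(max_iter=500)).fit(X, y)
print(wrapped.predict(X[:2]))  # delegated to LogisticRegression.predict
```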
Hi Thomas,
it is possible to obtain the activation values of any hidden layer, but the
procedure is not completely straightforward. If you look at the code of
the `_predict` method of MLPs you can see the following:
```python
def _predict(self, X):
    """Predict using the trained model"""
    ...
```
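Alternatively, the hidden activations can be recomputed by hand from the fitted weights; a sketch assuming the default `'relu'` activation:

```python
# Replay the forward pass manually with the fitted weights to obtain the
# hidden-layer activations (assumes activation='relu', the default).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
mlp = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000,
                    random_state=0).fit(X, y)

# First (and only) hidden layer: relu(X @ W + b)
hidden = np.maximum(0, X @ mlp.coefs_[0] + mlp.intercepts_[0])
print(hidden.shape)  # (150, 5)
```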
Why would the square of a real number ever be negative?
I believe the "quirk" in python is just operator precedence,
as the power gets evaluated before applying the unary "-"
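A quick check:

```python
# Precedence: ** binds tighter than unary minus, so -x**2 is -(x**2).
print(-3 ** 2)    # -9, parsed as -(3 ** 2)
print((-3) ** 2)  # 9
```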
On Sun, Oct 8, 2017 at 11:34 AM Joel Nothman wrote:
> (normalize(X) * normalize(X)).sum(axis=1) works fine here.
>
> But
> On 13 Mar 2017, at 21:18, Andreas Mueller wrote:
>
> No, if all the samples are normalized and your aggregation function is sane
> (like the mean), the output will also be normalised.
You are completely right, I hadn’t checked this for random forests.
Still, my purpose is to reduce model com
> You could use a regression model with a logistic sigmoid in the output layer.
When training a regression network with a logistic activation, the outputs do
not add up to 1.
I just checked on a minimal example on the iris dataset.
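A minimal version of that check, for reproducibility (note: `MLPRegressor` always uses an identity output activation, so this only illustrates the row-sum point, not the exact network described):

```python
# Fit a regression network on one-hot targets and inspect whether the
# predicted rows sum to 1 (nothing constrains them to).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPRegressor

X, y = load_iris(return_X_y=True)
Y = np.eye(3)[y]                       # one-hot "soft" targets

reg = MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000,
                   random_state=0).fit(X, Y)
row_sums = reg.predict(X).sum(axis=1)
print(row_sums.min(), row_sums.max())  # generally not exactly 1
```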
Hi Giles,
thanks for the suggestion!
Training a regression tree would require sticking some kind of
probability normaliser at the end to ensure proper probabilities,
which might hurt sharpness or calibration.
Unfortunately, one of the things I am trying to do
with this is moving away fr
> On 12 Mar 2017, at 18:38, Gael Varoquaux
> wrote:
>
> You can use sample weights to go a bit in this direction. But in general,
> the mathematical meaning of your intuitions will depend on the
> classifier, so they will not be general ways of implementing them without
> a lot of tinkering.
I
Hi there!
I have been recently experimenting with model regularization through the use of
soft targets,
and I’d like to be able to play with that from sklearn.
The main idea is as follows: imagine I want to fit a (probabilistic)
classifier with three possible
targets, 0, 1, 2
If I pass my tr
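One workaround I can sketch (an assumption about the intent here, not an existing sklearn feature) is to expand each sample into one row per class, weighted by its soft probability:

```python
# Approximate soft-target training with a standard classifier: repeat each
# sample once per class and use the soft probabilities as sample weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0]])
soft = np.array([[0.8, 0.2, 0.0],      # soft targets over classes 0, 1, 2
                 [0.1, 0.8, 0.1],
                 [0.0, 0.3, 0.7]])

n, k = soft.shape
X_rep = np.repeat(X, k, axis=0)        # each sample repeated k times
y_rep = np.tile(np.arange(k), n)       # class labels 0..k-1 per sample
w = soft.ravel()                       # soft probabilities as weights

clf = LogisticRegression().fit(X_rep, y_rep, sample_weight=w)
print(clf.predict_proba(X).shape)  # (3, 3)
```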