Hello Scikit-learn team,
I've come across this:
https://twitter.com/tristanharris/status/1277136696568508418?s=12
Basically, it is an initiative to include in software licenses a prohibition
on use by fossil fuel extraction companies.
I would like to know your views on this. Is this something
Hi Olivier, Gabriel, and the rest of the team,
Thank you so much for your views.
I understand enforcement is an issue. And I don't yet have an answer on
whether and how the license could be enforced.
I also think that this is a second step. First would be making the use of the
software illegal. This would
Did you have a look at the package Feature-engine? It has its own imputers and
encoders that let you select the columns to transform and return a dataframe.
It also has a sklearn wrapper that wraps scikit-learn transformers so that
they return a dataframe instead of a NumPy array.
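For illustration, here is a rough sketch of what such a wrapper does, written with plain scikit-learn and pandas. DataFrameWrapper is a hypothetical stand-in, not Feature-engine's actual class; it just shows the pattern of transforming selected columns and returning a dataframe:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler

# Hypothetical minimal sketch of the wrapping idea: apply a scikit-learn
# transformer only to selected columns and give back a pandas DataFrame.
class DataFrameWrapper(BaseEstimator, TransformerMixin):
    def __init__(self, transformer, variables):
        self.transformer = transformer
        self.variables = variables  # columns to transform

    def fit(self, X, y=None):
        self.transformer.fit(X[self.variables], y)
        return self

    def transform(self, X):
        X = X.copy()
        # overwrite only the selected columns; the rest pass through untouched
        X[self.variables] = self.transformer.transform(X[self.variables])
        return X  # still a DataFrame, not a numpy array

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
wrapper = DataFrameWrapper(StandardScaler(), variables=["a"])
out = wrapper.fit_transform(df)
```

Column "b" comes back unchanged and "a" comes back standardized, with column names preserved.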
Cheers.
Sole
Hello team,
I am trying to understand why logistic regression returns uncalibrated
probabilities, with values tending toward low probabilities for the positive
(rare) class, when trained on an imbalanced dataset.
I've read a number of articles, all seem to agree that this is the case, many
show
Thank you guys, that was actually very helpful.
Best regards
Sole
Soledad Galli
https://www.trainindata.com/
‐‐‐ Original Message ‐‐‐
On Tuesday, November 17th, 2020 at 10:54 AM, Roman Yurchak
wrote:
> On 17/11/2020 09:57, Sole Galli via scikit-learn wrote:
>
> > And
Hello team,
What is the difference in the implementation of class_weight and sample_weight
in those algorithms that support both, like random forest or logistic
regression?
Do both modify the loss function? In a similar way?
Thank you!
Sole
https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work/30982811#30982811
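As a quick sanity check of the point discussed in that thread: both parameters rescale each sample's contribution to the loss, so a class_weight dict should give the same fit as the equivalent per-sample weights. A small sketch (assuming the default lbfgs solver):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# weight class 1 three times as much via class_weight...
clf_cw = LogisticRegression(class_weight={0: 1, 1: 3}, max_iter=1000).fit(X, y)

# ...or via an explicit sample_weight of 3 on every class-1 row
sw = np.where(y == 1, 3.0, 1.0)
clf_sw = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sw)
```

The two fits solve the same weighted objective, so the coefficients should agree (internally, the effective weight is sample_weight multiplied by the class weight).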
Soledad Galli
https://www.trainindata.com/
‐‐‐ Original Message ‐‐‐
On Thursday, December 3, 2020 11:55 AM, Sole Galli via scikit-learn
wrote:
> Hello team,
>
ption is
>> data-agnostic. Arguably, allowing a dict with actual class values violates
>> the above argument (of not having data-related stuff in init), so I guess
>> that's where the logic ends ;)
>>
>> As to why one would use both, I'm not so sure honestly.
Hello team,
I am reading in some of the original MICE articles that, supposedly, each
variable should be modelled on the other variables in the data, with a
suitable model. So for example, if the variable with NA is binary, it should
be modelled with a classifier, or if continuous, with a regression model.
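For comparison, here is a minimal sketch of scikit-learn's MICE-style imputer. Note one difference from the articles' per-variable setup: IterativeImputer uses a single estimator type for all variables, rather than choosing a classifier or regressor per variable type:

```python
import numpy as np
# IterativeImputer is still experimental and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0],
              [np.nan, 8.0]])

# each variable with NA is modelled on the others, round-robin,
# but always with the same estimator (here BayesianRidge)
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0)
X_imp = imputer.fit_transform(X)
```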
Hello everyone,
I am trying to get Feature-engine transformers to pass the check_estimator
tests, and there is one test that I am not too sure what it is intended for.
The transformers fail check_transformer_data_not_an_array because the input
is a _NotAnArray class, and Feature-engine transfo
why each test is
important, and the consequences of failing this or that test. At least it would
be useful for me :p
Thank you!
Sole
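For reference, this is roughly how the suite in question is run; check_estimator executes the full API-compliance battery, including checks like check_transformer_data_not_an_array, which feeds a list-like _NotAnArray input to verify a transformer doesn't assume its input is a numpy array or dataframe. A minimal sketch on a known-compliant estimator:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.utils.estimator_checks import check_estimator

# runs the whole scikit-learn compliance suite; raises on the first
# failing check, so a compliant estimator passes silently
check_estimator(StandardScaler())
```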
‐‐‐ Original Message ‐‐‐
On Monday, May 10, 2021 3:28 PM, Sole Galli via scikit-learn
wrote:
> Hello everyone,
>
> I am trying to get Featu
The FunctionTransformer will apply the transformation coded in your function
to the entire dataset passed to the transform() method.
I find it hard to see how this could work to add additional columns to the
dataset, but I guess it might depend on how you designed your function.
Did you try passin
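One way it could work, sketched below: if the function itself returns the original data stacked with the new columns, the FunctionTransformer will pass that wider array along (add_log_column is just an illustrative function name):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def add_log_column(X):
    # return the original columns plus the log of the first column
    return np.hstack([X, np.log(X[:, [0]])])

ft = FunctionTransformer(add_log_column)
X = np.array([[1.0, 10.0],
              [np.e, 20.0]])
Xt = ft.fit_transform(X)  # 2 columns in, 3 columns out
```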
Hello community,
Do I understand correctly that Random Forests are trained as 1 vs rest when
the target has more than 2 classes? Say the target takes values 0, 1 and 2;
would the model then train 3 estimators, 1 per class, under the hood?
The predict_proba output is an array with 3 columns, co
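A quick way to check this empirically (a sketch, not from the thread itself): inspect the fitted forest. Each tree handles all 3 classes directly, so the forest holds n_estimators trees rather than one estimator per class, and predict_proba averages the per-tree class probabilities into one column per class:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=150, n_classes=3,
                           n_informative=4, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# 10 trees total (not 30 one-vs-rest models); every tree is multiclass
proba = rf.predict_proba(X)  # shape (150, 3), rows sum to 1
```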
> > On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn
> > scikit-learn@python.org wrote:
> >
> > Hello community,
> >
> > Do I understand correctly that Random Forests are trained as a 1 vs rest
> > when the target has more than 2 classes? Say the tar
>
> Nicolas
>
> On 27/07/2021 10:22, Guillaume Lemaître wrote:
>
>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn
>>> [](mailto:scikit-learn@python.org)
>>> wrote:
>>>
>>> Hello community,
>>>
>>> Do I understan
Hello community,
Say I have a pipeline with 3 data transformations, i.e., SimpleImputer,
OrdinalEncoder and StandardScaler, and a Lasso at the end. And I want to obtain
a copy of the transformed data that would be input to the Lasso.
Is there a way other than selecting all the steps of the pipe
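Pipeline slicing covers this: pipe[:-1] is the sub-pipeline of every step except the last, so transforming with it yields exactly the array the Lasso receives. A sketch (with two transform steps instead of three, to keep the toy data numeric):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),
    ("lasso", Lasso()),
])

X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
pipe.fit(X, y)

# all steps except the final Lasso, already fitted; this is the Lasso's input
X_before_lasso = pipe[:-1].transform(X)
```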
Maybe with numpy.set_printoptions?
See thread here:
https://stackoverflow.com/questions/1987694/how-to-print-the-full-numpy-array-without-truncation
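In short, from that thread: raising the print threshold stops numpy from eliding large arrays with "...":

```python
import numpy as np

# never truncate, no matter how large the array is
np.set_printoptions(threshold=np.inf)

arr = np.arange(2000)
text = np.array_str(arr)  # full listing, no "..." ellipsis
```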
Soledad Galli
https://www.trainindata.com/
Sent with Proton Mail secure email.
--- Original Message ---
On Friday, May 13th, 2022 at 10:35
Did you try:
pipeline.named_steps["the_string_name_for_knn"].kneighbors
?
pipeline should be replaced by the name you gave to your pipeline, and the
string in named_steps is the name you gave to the knn when setting up the pipe.
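A minimal sketch of that suggestion, assuming the step was named "knn" (note the query points must first be mapped into the transformed space the knn was fitted on):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=2)),
])
X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)

# reach the fitted knn inside the pipeline; query in scaled space,
# since that is what the knn step was trained on
X_scaled = pipe.named_steps["scaler"].transform(X)
dist, idx = pipe.named_steps["knn"].kneighbors(X_scaled)
```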
Sole
Sent with Proton Mail secure email.
--- Original Message
Hey,
My understanding is that with sklearn you can compare 2 continuous variables
like this:
mutual_info_regression(data["var1"].to_frame(), data["var"],
discrete_features=[False])
Where var1 and var are continuous.
You can also compare multiple continuous variables against one continuous
va
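A runnable sketch of the two-continuous-variables comparison described above (var1/var2/noise are illustrative names; a strongly related pair should score well above an unrelated one):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.RandomState(0)
var1 = rng.normal(size=500)
var2 = var1 + 0.1 * rng.normal(size=500)  # strongly related to var1
noise = rng.normal(size=500)              # unrelated to var2

# discrete_features=[False] marks the single feature column as continuous
mi_related = mutual_info_regression(var1.reshape(-1, 1), var2,
                                    discrete_features=[False], random_state=0)
mi_noise = mutual_info_regression(noise.reshape(-1, 1), var2,
                                  discrete_features=[False], random_state=0)
```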
Hello,
I would like to obtain final intervals from the decision tree structure. I am
not interested in every node, just the limits that take a sample to a final
decision /leaf.
For example, if the tree structure is this one:
|--- feature_0 <= 0.08
| |--- class: 0
|--- feature_0 > 0.08
| |
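One way to get those limits, sketched below: walk the fitted tree_ structure recursively, narrowing a (low, high) interval for the feature of interest at each split and recording the interval at every leaf. leaf_intervals is a hypothetical helper, not a scikit-learn function:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_intervals(tree, feature=0):
    """Collect the (low, high) interval of `feature` that reaches each leaf."""
    t = tree.tree_
    intervals = []

    def recurse(node, low, high):
        if t.children_left[node] == -1:  # -1 marks a leaf node
            intervals.append((low, high))
            return
        thr = t.threshold[node]
        if t.feature[node] == feature:
            recurse(t.children_left[node], low, min(high, thr))   # x <= thr
            recurse(t.children_right[node], max(low, thr), high)  # x > thr
        else:
            # split on another feature: interval for `feature` is unchanged
            recurse(t.children_left[node], low, high)
            recurse(t.children_right[node], low, high)

    recurse(0, -np.inf, np.inf)
    return intervals

X = np.array([[0.0], [0.05], [0.5], [0.9]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
ivals = leaf_intervals(clf)  # one interval per leaf
```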
Hey team,
I am going over the TargetEncoder documentation and I want to make sure I
understand this correctly.
Is the intention of fit_transform's cross fitting just to understand /
analyse / determine somehow how this transformer would perform?
Because if I got this right, the attribute values (ca
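For what it's worth, the cross-fitting idea can be sketched by hand with plain pandas and KFold: during fit_transform each row is encoded with category means learned on the *other* folds, so a row's own target never leaks into its encoding, while transform on new data uses means from the full training set. This is a conceptual sketch, not TargetEncoder's exact implementation:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame({"cat": ["a", "a", "b", "b", "a", "b"],
                   "y":   [1,   0,   1,   1,   1,   0]})

encoded = np.empty(len(df))
for train_idx, test_idx in KFold(n_splits=3).split(df):
    # means learned only on the other folds...
    fold_means = df.iloc[train_idx].groupby("cat")["y"].mean()
    # ...used to encode the held-out rows
    encoded[test_idx] = df.iloc[test_idx]["cat"].map(fold_means).to_numpy()

# what transform() on new data would use: means from the whole training set
full_means = df.groupby("cat")["y"].mean()
```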
Hi guys,
I'd like to understand why sklearn's implementation of tf-idf is different from
the standard textbook notation as described in the docs:
https://scikit-learn.org/stable/modules/feature_extraction.html#tfidf-term-weighting
Do you have any reference that I could take a look at? I didn't
nraschka.com/)
>
> Staff Research Engineer at Lightning AI, https://lightning.ai
>
> On May 28, 2024 at 9:43 AM -0500, Sole Galli via scikit-learn
> , wrote:
>
>> Hi guys,
>>
>> I'd like to understand why sklearn's implementation of tf-idf is differen
Hello everyone,
I am running an undersampling algorithm from imblearn, and on top of it I want
to train several GBMs (CatBoost, XGBoost, LightGBM, etc.) with
SuccessiveHalving to optimize the hyperparameters.
Based on the imblearn docs, I need to set up the undersampler within a
pipeline, but that's inefficient
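For context on why the docs insist on a pipeline: the undersampler must run inside each training fold only, never on the validation fold, which is what an imblearn Pipeline automates for search objects like SuccessiveHalving. The mechanics can be sketched without imblearn at all (the `undersample` helper below is a hypothetical stand-in for RandomUnderSampler):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def undersample(X, y, rng):
    # keep as many rows of each class as the smallest class has
    n_min = np.bincount(y).min()
    keep = np.hstack([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in np.unique(y)
    ])
    return X[keep], y[keep]

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 80 + [1] * 20)  # imbalanced target

scores = []
for tr, te in StratifiedKFold(n_splits=5).split(X, y):
    # resample ONLY the training fold; score on the untouched validation fold
    X_tr, y_tr = undersample(X[tr], y[tr], rng)
    clf = LogisticRegression().fit(X_tr, y_tr)
    scores.append(clf.score(X[te], y[te]))
```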
Hey team,
In RandomizedSearchCV or SuccessiveHalving, we can pass indexes in the cv
parameter if we want to test the hyperparameters on specific folds.
Say I have a dataset of 12 rows, indexes 1 to 12, and I pass as cv to the
randomized search or SH the following folds:
[1,2] [9, 10, 11,
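For reference, cv does accept an iterable of (train_indices, test_indices) pairs, so the folds can be hand-picked exactly. A sketch (using 0-based indexes, as scikit-learn expects; the fold contents here are illustrative):

```python
import numpy as np
from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.RandomState(0)
X = rng.normal(size=(12, 2))
y = rng.normal(size=12)

# each entry is one fold: (rows to train on, rows to validate on)
folds = [
    (np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11]), np.array([0, 1])),
    (np.array([0, 1, 2, 3, 4, 5, 6, 7]), np.array([8, 9, 10, 11])),
]

search = RandomizedSearchCV(Ridge(), {"alpha": uniform(0.1, 10)},
                            n_iter=3, cv=folds, random_state=0)
search.fit(X, y)  # every candidate is scored on exactly these two folds
```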