> Staff Research Engineer at Lightning AI, https://lightning.ai
>
> On May 28, 2024 at 9:43 AM -0500, Sole Galli via scikit-learn wrote:
>
>> Hi guys,
>>
>> I'd like to understand why sklearn's implementation of tf-idf is different
>> from
Hi guys,
I'd like to understand why sklearn's implementation of tf-idf is different from
the standard textbook notation as described in the docs:
https://scikit-learn.org/stable/modules/feature_extraction.html#tfidf-term-weighting
Do you have any reference that I could take a look at? I didn't
Hey team,
I am going over the TargetEncoder documentation and I want to make sure I
understand this correctly.
Is the intention of fit_transform's cross fitting just to understand/analyse/
determine how this transformer would perform?
Because if I got this right, the attribute values
Hello,
I would like to obtain final intervals from the decision tree structure. I am
not interested in every node, just the limits that take a sample to a final
decision /leaf.
For example, if the tree structure is this one:
|--- feature_0 <= 0.08
| |--- class: 0
|--- feature_0 > 0.08
|
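One possible way to recover the leaf intervals is to walk the fitted tree_ arrays (children_left, children_right, feature, threshold) and tighten per-feature bounds on the way down. This is a sketch against the public tree_ attributes, not an official API:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X_tree = np.array([[0.0], [0.05], [0.2], [0.5], [0.9]])
y_tree = np.array([0, 0, 1, 1, 1])
clf_tree = DecisionTreeClassifier(random_state=0).fit(X_tree, y_tree)
t = clf_tree.tree_

def leaf_intervals(t, n_features):
    """Map each leaf node id to per-feature (low, high) bounds."""
    out = {}

    def recurse(node, bounds):
        if t.children_left[node] == -1:              # leaf node
            out[node] = [tuple(b) for b in bounds]
            return
        f, thr = t.feature[node], t.threshold[node]
        left = [list(b) for b in bounds]
        left[f][1] = min(left[f][1], thr)            # left branch: x[f] <= thr
        recurse(t.children_left[node], left)
        right = [list(b) for b in bounds]
        right[f][0] = max(right[f][0], thr)          # right branch: x[f] > thr
        recurse(t.children_right[node], right)

    recurse(0, [[-np.inf, np.inf] for _ in range(n_features)])
    return out

intervals = leaf_intervals(t, X_tree.shape[1])
```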
Hey,
My understanding is that with sklearn you can compare 2 continuous variables
like this:
mutual_info_regression(data["var1"].to_frame(), data["var"],
discrete_features=[False])
Where var1 and var are continuous.
You can also compare multiple continuous variables against one continuous
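A minimal sketch of that usage (the data and variable names here are made up for illustration): several continuous features against one continuous target in a single call, with discrete_features=False for all columns:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
var1 = rng.normal(size=500)
var2 = rng.normal(size=500)
target_c = var1 + 0.1 * rng.normal(size=500)  # closely tied to var1 only

# several continuous features against one continuous target in one call
X_mi = np.column_stack([var1, var2])
mi = mutual_info_regression(X_mi, target_c, discrete_features=False,
                            random_state=0)
# mi[0] (var1 vs target) should dwarf mi[1] (var2 vs target)
```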
Did you try:
pipeline.named_steps["the_string_name_for_knn"].kneighbors
?
pipeline should be replaced by the name you gave to your pipeline, and the
string in named_steps is the name you gave to the knn step when setting up the
pipe.
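A small self-contained sketch of that suggestion (the step names are made up; note the method is spelled kneighbors, and it expects query points in the transformed space):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_knn = np.random.default_rng(0).normal(size=(20, 3))
y_knn = (X_knn[:, 0] > 0).astype(int)

pipe_knn = Pipeline([("scale", StandardScaler()),
                     ("knn", KNeighborsClassifier(n_neighbors=3))])
pipe_knn.fit(X_knn, y_knn)

# reach the fitted knn step by the name given when building the pipeline;
# kneighbors works on the *transformed* space, so scale the query first
X_scaled = pipe_knn.named_steps["scale"].transform(X_knn[:2])
dist, idx = pipe_knn.named_steps["knn"].kneighbors(X_scaled)
```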
Sole
Sent with Proton Mail secure email.
Maybe with numpy.set_printoptions?
See thread here:
https://stackoverflow.com/questions/1987694/how-to-print-the-full-numpy-array-without-truncation
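A quick sketch of that suggestion: threshold=np.inf disables the "..." truncation when rendering large arrays:

```python
import numpy as np

arr = np.arange(2000)                # repr truncates past threshold=1000
np.set_printoptions(threshold=np.inf)
full = np.array2string(arr)          # now the whole array is rendered
```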
Soledad Galli
https://www.trainindata.com/
--- Original Message ---
On Friday, May 13th, 2022 at
Hello community,
Say I have a pipeline with 3 data transformations, i.e., SimpleImputer,
OrdinalEncoder and StandardScaler, and a Lasso at the end. And I want to obtain
a copy of the transformed data that would be input to the Lasso.
Is there a way other than selecting all the steps of the
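One option, if I'm not mistaken, is Pipeline slicing: pipe[:-1] returns the fitted sub-pipeline of all transformers without the final estimator. A sketch with purely numeric data for brevity (so the OrdinalEncoder is dropped here):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_num = np.array([[1.0, 2.0], [np.nan, 3.0], [3.0, np.nan], [4.0, 5.0]])
y_num = np.array([1.0, 2.0, 3.0, 4.0])

pipe_lasso = Pipeline([("impute", SimpleImputer()),
                       ("scale", StandardScaler()),
                       ("lasso", Lasso(alpha=0.01))])
pipe_lasso.fit(X_num, y_num)

# pipe_lasso[:-1] is the fitted sub-pipeline of transformers only,
# so this is exactly the matrix the Lasso received during fit
X_before_lasso = pipe_lasso[:-1].transform(X_num)
```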
Nicolas
>
> On 27/07/2021 10:22, Guillaume Lemaître wrote:
>
>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn
>>> scikit-learn@python.org
>>> wrote:
>>>
>>> Hello community,
>>>
>>> Do I understand correctly
> > On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn
> > scikit-learn@python.org wrote:
> >
> > Hello community,
> >
> > Do I understand correctly that Random Forests are trained as a 1 vs rest
> > when the target has more than 2 classes?
Hello community,
Do I understand correctly that Random Forests are trained as a 1 vs rest when
the target has more than 2 classes? Say the target takes values 0, 1 and 2,
then the model would train 3 estimators, 1 per class, under the hood?
The predict_proba output is an array with 3 columns,
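A quick way to probe this empirically: my understanding is that the forest's trees are natively multiclass rather than one-vs-rest, which the n_classes_ attribute of each fitted tree reflects (synthetic data for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X_mc, y_mc = make_classification(n_samples=150, n_classes=3,
                                 n_informative=4, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_mc, y_mc)

# every individual tree handles all 3 classes; there is no hidden
# collection of one-vs-rest estimators
assert all(est.n_classes_ == 3 for est in rf.estimators_)

proba_rf = rf.predict_proba(X_mc[:5])  # one column per class
```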
The FunctionTransformer will apply the transformation coded in your function to
the entire dataset passed to the transform() method.
I find it hard to see how this could work to add additional columns to the
dataset, but I guess it might depend on how you designed your function.
Did you try
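For what it's worth, a sketch of one design that does add columns: have the function return the original matrix stacked with the new columns (the function name here is made up):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def add_log_columns(X):
    # keep the original columns and append a log-transformed copy of each
    return np.hstack([X, np.log1p(X)])

ft = FunctionTransformer(add_log_columns)  # function name is illustrative
X_ft = np.array([[1.0, 2.0], [3.0, 4.0]])
Xt_extra = ft.fit_transform(X_ft)          # shape goes from (2, 2) to (2, 4)
```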
of why each test is
important, and the consequences of failing this or that test. At least it would
be useful for me :p
Thank you!
Sole
‐‐‐ Original Message ‐‐‐
On Monday, May 10, 2021 3:28 PM, Sole Galli via scikit-learn
wrote:
> Hello everyone,
>
> I am trying to get Featu
Hello everyone,
I am trying to get Feature-engine transformers to pass the check_estimator
tests, and there is one test that I am not too sure what it is intended for.
The transformers fail the check_transformer_data_not_an_array because the input
is a _NotAnArray class, and Feature-engine
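As a rough illustration of what that check exercises, as I understand it: it feeds fit/transform a container that is array-like but not an ndarray, and a compliant transformer is expected to convert it internally (via check_array). The same behaviour shows up with a plain list:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# the check feeds fit/transform a container that is array-like but not an
# ndarray; a compliant transformer converts it internally (check_array)
X_list = [[1.0], [2.0], [3.0]]          # a plain Python list of lists
Xt_list = StandardScaler().fit_transform(X_list)
```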
Hello team,
I am reading in some of the original MICE articles that, supposedly, each
variable should be modelled on the other ones in the data with a suitable
model. So for example, if the variable with NA is binary, it should be modelled
with classification, or if continuous with a
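A sketch of the corresponding scikit-learn behaviour, as I understand it: IterativeImputer (behind the experimental enable flag) models each incomplete column on the others, but uses a single estimator type for every column rather than switching between classifiers and regressors per variable type as full MICE would:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

X_na = np.array([[1.0, 2.0],
                 [3.0, np.nan],
                 [np.nan, 6.0],
                 [8.0, 9.0]])

# each column with missing values is modelled on the other columns,
# but with the *same* estimator type (here a regressor) for every column
imp = IterativeImputer(estimator=BayesianRidge(), random_state=0)
X_filled = imp.fit_transform(X_na)
```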
gnostic. Arguably, allowing a dict with actual class values violates
>> the above argument (of not having data-related stuff in init), so I guess
>> that's where the logic ends ;)
>>
>> As to why one would use both, I'm not so sure honestly.
>>
>> Nico
https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work/30982811#30982811
Soledad Galli
https://www.trainindata.com/
‐‐‐ Original Message ‐‐‐
On Thursday, December 3, 2020 11:55 AM, Sole Galli via scikit-learn
wrote:
> Hello team,
>
Hello team,
What is the difference in the implementation of class_weight and sample_weight
in those algorithms that support both, like random forest or logistic
regression?
Are both modifying the loss function, and in a similar way?
Thank you!
Sole
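For what it's worth, at least for LogisticRegression the two appear to be equivalent reweightings of the loss: a class_weight dict and per-sample weights built from the same class weights give the same fit. A sketch with made-up data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_cw = rng.normal(size=(200, 2))
y_cw = (X_cw[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# class_weight multiplies each sample's loss term by its class's weight...
clf_cw = LogisticRegression(class_weight={0: 1.0, 1: 3.0}).fit(X_cw, y_cw)

# ...which matches passing the equivalent per-sample weights directly
sw = np.where(y_cw == 1, 3.0, 1.0)
clf_sw = LogisticRegression().fit(X_cw, y_cw, sample_weight=sw)
```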
Thank you guys, that was actually very helpful.
Best regards
Sole
Soledad Galli
https://www.trainindata.com/
‐‐‐ Original Message ‐‐‐
On Tuesday, November 17th, 2020 at 10:54 AM, Roman Yurchak
wrote:
> On 17/11/2020 09:57, Sole Galli via scikit-learn wrote:
>
> > And
Hello team,
I am trying to understand why logistic regression returns uncalibrated
probabilities, with values tending toward low probabilities for the positive
(rare) class, when trained on an imbalanced dataset.
I've read a number of articles, all seem to agree that this is the case, many
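A small sketch of the effect being described, on synthetic data: with an intercept and maximum likelihood, the average predicted probability tracks the base rate, so scores for the rare positive class tend to be low in absolute terms:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_imb, y_imb = make_classification(n_samples=2000, weights=[0.95, 0.05],
                                   random_state=0)
clf_imb = LogisticRegression(max_iter=1000).fit(X_imb, y_imb)
proba_pos = clf_imb.predict_proba(X_imb)[:, 1]

# with ~5% positives, the mean predicted probability sits near the
# prevalence, so most positive cases receive low absolute scores
```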
Did you have a look at the package feature-engine? It has its own imputers and
encoders that allow you to select the columns to transform and returns a
dataframe. It also has a sklearn wrapper that wraps sklearn transformers so that
they return a dataframe instead of a numpy array.
Cheers.
Hi Olivier, Gabriel, and further team,
Thank you so much for your views.
I understand enforcement is an issue. And I don't yet have an answer on whether
and how the license could be enforced.
I also think that this is a second step. First would be making the use of the
software illegal. This would
Hello Scikit-learn team,
I've come across this:
https://twitter.com/tristanharris/status/1277136696568508418?s=12
Basically, it is an initiative to include in software license a prohibition of
use by fossil fuel extractivist companies.
I would like to know your views on this. Is this something