I think that you could use imbalanced-learn regarding the issue that
you have with y.
You should be able to wrap your clustering inside the FunctionSampler (
https://github.com/scikit-learn-contrib/imbalanced-learn/pull/342 - we are
about to merge it).
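A minimal sketch of what that wrapping could look like (using the
fit_resample name from released imbalanced-learn versions; the PR itself
used sample/fit_sample, and the clustering-based filter below is a
made-up example):

import numpy as np
from sklearn.cluster import KMeans
from imblearn import FunctionSampler

def keep_majority_cluster(X, y):
    # Made-up rule: cluster X and keep only samples from the largest
    # cluster, trimming X and y together.
    labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)
    keep = labels == np.bincount(labels).argmax()
    return X[keep], y[keep]

rng = np.random.RandomState(0)
X, y = rng.randn(100, 5), rng.randint(0, 2, size=100)

sampler = FunctionSampler(func=keep_majority_cluster)
X_res, y_res = sampler.fit_resample(X, y)  # X and y shrink together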
On 19 December 2017 at 13:44,
Hey Manuel,
In imbalanced-learn we have an extra type of estimator, named a Sampler,
which can modify X and y at the same time via the new API methods sample
and fit_sample.
Also, we have adopted a modified version of scikit-learn's Pipeline class
where we allow subsequent
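A minimal sketch of such a pipeline with a sampler as an intermediate step
(the step names and the RandomUnderSampler choice are illustrative, not
from this mail):

from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.linear_model import LogisticRegression

# Samplers are allowed as intermediate steps: they resample X and y
# during fit, and are skipped at predict time.
pipe = Pipeline([
    ("sampler", RandomUnderSampler(random_state=0)),
    ("clf", LogisticRegression()),
])
# pipe.fit(X, y); pipe.predict(X_new)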
Eager to learn! Diving into the code right now!
Thanks for the tip!
Manuel
2017-12-19 14:18 GMT+01:00 Guillaume Lemaître:
> I think that you could use imbalanced-learn regarding the issue that
> you have with y.
> You should be able to wrap your clustering inside
Wow, that seems promising. I'll read the imbalanced-learn code with interest.
Thanks for the info!
Manuel
2017-12-19 14:15 GMT+01:00 Christos Aridas:
> Hey Manuel,
>
> In imbalanced-learn we have an extra type of estimator, named a Sampler,
> which can modify X and y,
Hi all,
I am doing text classification. I have around 10 million documents to be
classified into around 7k categories.
Below is the code I am using:
# Importing the libraries
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from
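A hypothetical continuation of the preprocessing those imports suggest
(not from the original mail):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)      # tokenizer models
nltk.download("stopwords", quiet=True)  # stop-word lists

stop = set(stopwords.words("english"))

def preprocess(text):
    # Lowercase, tokenize, and drop non-alphabetic tokens and stop words.
    tokens = word_tokenize(text.lower())
    return [t for t in tokens if t.isalpha() and t not in stop]

print(preprocess("This is a small example document."))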
Hi guys,
I'm currently developing a web interface and a programmatic REST API for
sklearn. I currently have SVM and SVR available, with some parameters like C
and gamma exposed:
- https://github.com/jeff1evesque/machine-learning
I'm working on improving the web interface at the moment.
With so few data points, there is huge uncertainty in the estimation of
prediction accuracy with cross-validation. This isn't a problem with the
method; it is a basic limitation of the small amount of data. I've
written a paper on this problem in the specific context of neuroimaging:
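A quick way to see that uncertainty (synthetic data, sized like a
24-exemplars-per-class problem; the model choice is illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# 48 samples, as in a two-class, 24-per-class design.
X, y = make_classification(n_samples=48, n_features=100, random_state=0)

# The same data, scored with 20 different random CV partitions:
scores = [
    cross_val_score(
        SVC(kernel="linear"), X, y,
        cv=StratifiedKFold(n_splits=8, shuffle=True, random_state=seed),
    ).mean()
    for seed in range(20)
]
print(np.mean(scores), np.std(scores))  # large spread across partitions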
Hello,
I am a researcher in fMRI and am using SVMs to analyze brain data. I am
doing decoding between two classes, each of which has 24 exemplars. I am
comparing two different methods of cross-validation for my data: in one, I
am training on 23 exemplars from each class, and testing on
At a glance, and perhaps not knowing imbalanced-learn well enough, I have
some doubts that it will provide an immediate solution for all your needs.
At the end of the day, the Pipeline keeps its scope relatively tight, but
it should not be so hard to implement something for your own needs if your
Hi JohnMark,
SVMs, by design, are quite sensitive to the addition of single data points
– but only if those data points happen to lie near the margin. I wrote
about some of those types of details here:
https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html
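A toy illustration of that sensitivity (synthetic blobs; not taken from
the handbook chapter):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.8, random_state=0)
svm = SVC(kernel="linear", C=10).fit(X, y)

# Add a point deep inside class 0's cluster, far from the margin:
X2 = np.vstack([X, X[y == 0].mean(axis=0)])
y2 = np.append(y, 0)
svm2 = SVC(kernel="linear", C=10).fit(X2, y2)

# Near-identical boundaries: the added point is not a support vector.
print(svm.coef_)
print(svm2.coef_)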
Hope this helps,
Hi all.
I'm working on text classification, classifying Wikipedia documents. I'm using a
word count approach to extract features from my text, so I obtain a big
vocabulary that contains all the documents' words (train dataset) after
lemmatization and stop-word removal. Now I have 7 features. I
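A sketch of such a word-count extraction with scikit-learn's
CountVectorizer (illustrative data; the original mail shows no code):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["First Wikipedia article text ...", "Second article text ..."]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)   # sparse document-term count matrix
print(len(vec.vocabulary_))   # vocabulary size grows with the corpus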
It depends on what the set of classes is. The best way to find out is to try it...
On 19 December 2017 at 19:36, Luigi Lomasto <
l.loma...@innovationengineering.eu> wrote:
> Hi all.
>
> I'm working on text classification, classifying Wikipedia documents. I'm
> using a word count approach to extract