This would be much clearer if you provided some code, but I think I get
what you're saying.
The final GridSearchCV model is trained on the full training set, so the
fact that it perfectly fits that data with random forests is not altogether
surprising. What you can say about the parameters is…
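On the first point, a minimal sketch (mine, not from the original thread;
GridSearchCV lives in sklearn.grid_search in older releases and
sklearn.model_selection in newer ones):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {'n_estimators': [10, 50]}, cv=3)
search.fit(X, y)
# refit=True (the default) retrains best_estimator_ on all of X, y, so a
# perfect training score from a random forest is unsurprising:
print(search.best_estimator_.score(X, y))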
On 7 May 2016 at 19:12, Matthias Feurer
wrote:
> 1. Return the fit and predict time in `grid_scores_`
>
This has been proposed for many years as part of an overhaul of
grid_scores_. The latest attempt is currently underway at
> I'll attempt a more rigorous test later this week and report
> back. Thanks!
>
> Juan.
>
> On Wed, Apr 13, 2016 at 10:21 AM, Joel Nothman <joel.noth...@gmail.com>
> wrote:
It's hard to believe this is a software problem rather than a data problem.
If your data was accidentally a duplicate of the dataset, you could
certainly get 100%.
On 13 April 2016 at 10:10, Juan Nunez-Iglesias wrote:
> Hallelujah! I'd given up on this thread. Thanks for
Yes, there are no doubt more efficient ways to store forests, but it
seems unlikely to be a worthwhile investment.
I think this is a documentation rather than an engineering issue. We
frequently get issues raised that relate to "size": runtime, memory
consumption, model size on disk,
I think you should submit these changes as a pull request. Thanks, Jared.
On 8 April 2016 at 21:17, Jared Gabor wrote:
> I recently modified the kernel density estimation routines in
> sklearn/neighbors to include optional weighting of the training samples (to
> make
> …classifiers and I'm taking into account only classifiers
> that are returning 'Yes'. So I could make multilabelled classification with
> my own dataset.
>
> I can evaluate precision, recall and f-measure values for each classifier (for
> each category), but how can I test my whole dataset (all cl
OneVsRestClassifier already implements Binary Relevance. What is unclear
about our documentation on model evaluation and metrics?
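For example (a minimal sketch, not from the original thread):

from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.multiclass import OneVsRestClassifier

X, y = make_multilabel_classification(n_classes=3, random_state=0)
# Binary Relevance: one independent binary classifier per label
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(classification_report(y, clf.predict(X)))  # per-label P/R/F1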
On 25 March 2016 at 00:13, Enise Basaran wrote:
> Hi everyone,
>
> I want to learn binary classifier evaluation metrics please. I implemented
I think all the scikit-learn devs know that the serialisation available in
scikit-learn is inadequate, and recommend storing training data and model
parameters.
Designing a serialisation format that is robust to future changes is a huge
engineering effort, and is likely to result in one of: (a) a
And I lied that none of the scikit-learn estimators define their own
get_params. Of course the following do: VotingClassifier, Kernel (and
subclasses), Pipeline and FeatureUnion
On 23 March 2016 at 15:04, Joel Nothman <joel.noth...@gmail.com> wrote:
something like the following may suffice:

def get_params(self, deep=True):
    # start from the parent vectorizer's params, then add the extra one
    out = super(WordCooccurrenceVectorizer, self).get_params(deep=deep)
    out['w2v_clusters'] = self.w2v_clusters
    return out
On 23 March 2016 at 15:01, Joel Nothman <joel.noth...@gmail.com> wrote:
Hi Fred,
We use the __init__ signature to get the list of parameters that (a) can be
set by grid search; (b) need to be copied to a cloned instance of the
estimator (with any fitted model discarded) in constructing ensembles,
cross validation, etc. While none of the scikit-learn library of…
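To illustrate the convention with a toy estimator (a sketch, not from the
thread): every __init__ parameter is stored, unmodified, under an attribute
of the same name, so the default get_params and clone just work:

from sklearn.base import BaseEstimator

class MyEstimator(BaseEstimator):
    def __init__(self, alpha=1.0, w2v_clusters=None):
        # no validation or renaming here; just store the arguments
        self.alpha = alpha
        self.w2v_clusters = w2v_clusters

print(MyEstimator(alpha=0.5).get_params())
# {'alpha': 0.5, 'w2v_clusters': None}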
Currently there is no automatic mechanism for eliminating the generation of
features that are not selected downstream. It needs to be achieved manually.
On 15 March 2016 at 08:05, Philip Tully wrote:
> Hi,
>
> I'm trying to optimize the time it takes to make a prediction with
We should probably be escaping feature names internally. It's easy to
forget that graphviz supports HTML-like markup.
On 14 March 2016 at 08:00, Andreas Mueller wrote:
> Try escaping the &.
>
> On 03/12/2016 02:57 PM, Raphael C wrote:
> > The code snippet should have been
> >
>
> (I hope I got it right this time!)
>
> In any case, I am not finding any literature describing this, and I am
> also not proposing to add it to sickit-learn, just wanted to get some info
> whether this is implemented or not. Thanks! :)
>
>
>
> > On Mar 8, 2016
> This is actually very similar to the F1
> score. But instead of computing the harmonic mean between "precision and
> true positive rate", we compute the harmonic mean between "precision
> and true negative rate".
>
> > On Mar 8, 2016, at 6:40 PM, Joel Nothman <joel.noth...
(Although multioutput accuracy is reasonable to support.)
On 9 March 2016 at 12:29, Joel Nothman <joel.noth...@gmail.com> wrote:
> Firstly, balanced accuracy is a different thing, and yes, it should be
> supported.
>
> Secondly, I am correct in thinking you're talkin
I've not seen this metric used (references?). Am I right in thinking that
in the binary case, this is identical to accuracy? If I predict all
elements to be the majority class, then adding more minority classes into
the problem increases my score. I'm not sure what this metric is getting at.
On 8
What estimator(s) are you searching over? How big is your data?
On 24 February 2016 at 06:15, Stylianos Kampakis <
stylianos.kampa...@gmail.com> wrote:
> Hi everyone,
>
> Sometimes, when I am using random search with n_jobs>1 the processing
> stops. I am on a Mac. I went through some discussions
If not stack overflow, the appropriate venue for such questions is the
scikit-learn-general mailing list.
The current dbscan implementation is by default not memory efficient,
constructing a full pairwise similarity matrix in the case where
kd/ball-trees cannot be used (e.g. with sparse
It's not clear *why* you're doing this. The model will automatically
recluster the subclusters after identifying them, as long as you specify
either a number of clusters or a clustering model to the n_clusters
parameter. Can you fit this post-processing into that "final clustering"
framework?
On
How many distinct words are in your dataset?
On 27 January 2016 at 00:21, Rockenkamm, Christian <
c.rockenk...@stud.uni-goettingen.de> wrote:
> Hello,
>
>
> I have a question concerning the Latent Dirichlet Allocation. The results I
> get from using it are a bit confusing.
>
> At first I use about
safe_sqr applies when its operand may be a sparse matrix. In theory this
could be true of coef_, but I don't think this is tested as often as it
might be.
But, in general, you should not take what is done in any particular piece
of code to be indicative of best practice. There are often multiple
I think you've misunderstood this one, Sören. This sounds like it is a
structured learning problem, where the steps are the "target" of the
learning task, and the result is the input example.
Take, for instance, the natural language processing task of dependency
parsing.
The "result" of some
I have many times committed code and had to fix it for Python 2.6.
FWIW: features that I have had to remove include format strings with
implicit arg numbers, set literals, dict comprehensions, perhaps ordered
dicts / counters. We are already clandestinely using argparse in benchmark
code.
Most of
But check that the version you are using in the appropriate Python instance
is correct. For example:
python -c 'import sklearn; print(sklearn.__version__)'
On 2 December 2015 at 16:24, Sumedh Arani
wrote:
> Greetings!!
>
> I've used pip install --upgrade
Labels weren't available for PRs until relatively recently. I think the
status and its meaning would be clearer with such tags.
On 2 December 2015 at 15:16, Andreas Mueller wrote:
> Yeah that was the intention of [MRG]. Though it might be easier to
> filter by tag.
> No strong
If you are treating your Logistic Regression output as binary (i.e. not
using predict_proba or decision_function), could you please provide the
confusion matrix?
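For reference, a quick sketch of how to produce one (stand-in data; use your
own X, y and fitted model):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.RandomState(0)
X, y = rng.randn(100, 4), rng.randint(0, 2, 100)
clf = LogisticRegression().fit(X, y)
print(confusion_matrix(y, clf.predict(X)))  # rows: true; columns: predicted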
On 26 November 2015 at 05:06, Herbert Schulz wrote:
> Hi, I think I have some misunderstanding due to the
Changes to support this case have recently been merged into master, and an
example is on its way:
https://github.com/scikit-learn/scikit-learn/issues/5589
I think you should be able to run your code by importing GridSearchCV,
cross_val_score and StratifiedShuffleSplit from the new
Yes, simply set n_clusters=KMeans(). In fact, it's a pity we don't have an
example of this feature in the examples gallery and contributions are
welcome!
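A sketch of what that looks like (with made-up data):

import numpy as np
from sklearn.cluster import Birch, KMeans

X = np.random.RandomState(0).rand(500, 2)
# Birch's CF-subclusters are themselves clustered by the KMeans instance
brc = Birch(threshold=0.1, n_clusters=KMeans(n_clusters=5)).fit(X)
print(np.unique(brc.labels_))  # five final clusters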
On 14 October 2015 at 23:27, Dženan Softić wrote:
> Hi,
>
> I would like to change the global step of BIRCH algorithm to
RFECV will select features based on scores on a number of validation sets,
as selected by its cv parameter. As opposed to that StackOverflow query,
RFECV should now support RandomForest and its feature_importances_
attribute.
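A sketch against a recent scikit-learn:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = make_classification(n_features=20, n_informative=5, random_state=0)
# validation-fold scores decide how many features are kept;
# feature_importances_ ranks features at each elimination step
selector = RFECV(RandomForestClassifier(random_state=0), cv=3).fit(X, y)
print(selector.n_features_, selector.support_)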
On 7 October 2015 at 18:16, Raphael C wrote:
> I
See http://scikit-learn.org/stable/auto_examples/plot_roc.html
On 6 October 2015 at 17:56, aravind ramesh wrote:
> Dear All,
>
> I want to compare my new svm model generated with already published model.
>
> I generated required features and got the prediction labels for
Hi Mira,
I think the community is very interested in this work, but you might
consider collaborating with https://github.com/alex-pirozhenko/sklearn-pmml.
Its support for models is limited to trees and their ensembles, but it also
includes a test harness (
In terms of memory: I gather joblib.Parallel is meant to automatically
memmap large arrays (>100MB). However, then each subprocess will extract a
non-contiguous set of samples from the data for training under a
cross-validation regime. Would I be right in thinking that's where the
memory blowout
And anyone looking for a small contribution to make could take on
https://github.com/scikit-learn/scikit-learn/issues/5281
On 22 September 2015 at 10:24, Andreas Mueller wrote:
> The list is currently pretty long:
>
>
A reflective response without a clear opinion:
I'll admit to rarely-if-ever using function versions, and suspect they
frequently have limited utility over the estimator interface. Occasionally
they even wrap the estimator interface, so they're not going to provide the
efficiency advantages Gaël
:33 PM, Joel Nothman joel.noth...@gmail.com
wrote:
+1
On 28 August 2015 at 04:23, Andreas Mueller t3k...@gmail.com wrote:
I think it would be fine to enable it now without support in all solvers.
On 8/27/2015 11:29 AM, Valentin Stolbunov wrote:
Joel, I see you've done some work in that PR
A "Cite me with duecredit" sash on the opposite corner to "Fork me on
github"? ;)
On 30 August 2015 at 14:36, Mathieu Blondel math...@mblondel.org wrote:
On Sun, Aug 30, 2015 at 7:27 AM, Yaroslav Halchenko s...@onerussian.com
wrote:
As long as installation is straightforward, I think it
The randomisation only changes the order of the data, not the set of data
points.
On 27 August 2015 at 22:44, Andrew Howe ahow...@gmail.com wrote:
I'm working through the tutorial, and also experimenting kind of on my
own. I'm on the text analysis example, and am curious about the relative
…them in the other two solvers via the rough
steps I outlined earlier?
On Wed, Aug 26, 2015 at 9:59 PM, Andy t3k...@gmail.com wrote:
I agree. I suspect this was an unintentional omission, in fact.
Apart from which, sample_weight support in liblinear could be merged from
https://github.com/scikit-learn/scikit-learn/pull/2784 which is dormant,
and merely needs some core contributors to show interest in merging it...
On 27
I suspect supporting PMML import is a separate and low-priority project.
Higher priority is support for transformers (in pipelines / feature
unions), other predictors, and tests that verify the model against an
existing PMML predictor.
On 21 August 2015 at 01:37, Dale Smith dsm...@nexidia.com
Frequently the suggestion of supporting PMML or similar is raised, but it's
not clear whether such models would be importable into scikit-learn, or
how to translate scikit-learn transformation pipelines into its notation
without going mad, etc. Still, even a library of exporters for individual
See https://github.com/scikit-learn/scikit-learn/issues/1596
On 19 August 2015 at 16:35, Joel Nothman joel.noth...@gmail.com wrote:
Please make a pull request. This looks like a small and useful change,
consistent with Lasso's support of non-negativity.
On 18 August 2015 at 14:30, Michael Graber michigra...@gmail.com wrote:
Dear all,
I extended the lars_path, Lars and LarsLasso estimators in the
scikit-learn
This is a known scipy deficiency. See
https://github.com/scipy/scipy/pull/4821 and related issues.
On 15 August 2015 at 05:37, Jason Sanchez jason.sanchez.m...@statefarm.com
wrote:
This code raises a PicklingError:
from sklearn.datasets import load_boston
from sklearn.pipeline import
While it's not bad to have more people know the internals of the tree code,
ideally people shouldn't *have* to. Do you have any hints for how
documentation could better serve users to not land in whatever trap you did?
On 15 August 2015 at 16:03, Simon Burton si...@arrowtheory.com wrote:
My
I find that list somewhat obscure, and reading your section on Code
Authorship gives me some sense of why. All of those people have been very
important contributors to the project, and I'd think the absence of Gaël,
Andreas and Olivier alone would be very damaging, if only because of their
…calls during training. But that will most probably be compensated as the
number of queries grows, since 2**b * n_estimators is a constant cost.
I'll send a PR with proper refactoring.
On Sun, Aug 2, 2015 at 6:41 PM, Joel Nothman joel.noth...@gmail.com
wrote:
Thanks, I look forward to this being
…on this, but I think I'll need your or some other contributors' reviews as
well. I'll do this if it's possible.
On Sun, Aug 2, 2015 at 3:50 AM, Joel Nothman joel.noth...@gmail.com
wrote:
@Maheshakya, will you be able to do work in the near future on speeding
up the ascending phase instead? Or should
…the most fundamental component of LSHForest.
On 30 July 2015 at 22:28, Joel Nothman joel.noth...@gmail.com wrote:
What makes you think this is the main bottleneck? While it is not an
insignificant consumer of time, I really doubt this is what's making
scikit-learn's LSH implementation severely underperform with respect to
other implementations.
We need to profile. In order to do that, we need some sensible
…and it makes the searchsorted calls run in log(n / 2**b) time rather than
log(n). It is also much more like traditional LSH. However, it complicates
the code, as we now have to consider two strategies for descent/ascent.
On 30 July 2015 at 21:46, Joel Nothman joel.noth...@gmail.com wrote:
(sorry, I should have said the first b layers, not 2**b layers, producing a
memoization of 2**b offsets)
On 30 July 2015 at 22:22, Joel Nothman joel.noth...@gmail.com wrote:
One approach to fixing the ascending phase would ensure that
_find_matching_indices is only searching over parts
This isn't directly a problem with RFECV, it's a problem with what you
provided as an argument to `scoring`. I suspect you provided a function
with signature fn(y_true, y_pred) -> score, where what is required is a
function fn(estimator, X, y_true) -> score. See…
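That is, something along these lines (a sketch):

from sklearn.metrics import f1_score, make_scorer

# the signature RFECV (and grid search) expects from `scoring`:
def scorer(estimator, X, y_true):
    return f1_score(y_true, estimator.predict(X))

# or, equivalently, wrap a metric function:
scoring = make_scorer(f1_score)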
TfidfVectorizer is just CountVectorizer followed by a TfidfTransformer. The
Tfidf transformation tends to be cheap relative to tokenization which is
independent of what corpus you want to calculate TF.IDF over. If I
understand correctly, you can perform CountVectorizer on all of your
documents…
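Roughly, the idea (a sketch with toy documents):

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = ["the cat sat", "the dog sat", "the cat ran"]
counts = CountVectorizer().fit_transform(docs)  # tokenize and count once
tfidf = TfidfTransformer().fit(counts)          # IDF from the full corpus
X = tfidf.transform(counts)                     # cheap; reuse counts as needed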
oh, I missed that one from Omer Levy's debunking word2vec series. Nice!
On 1 July 2015 at 23:52, Mathieu Blondel math...@mblondel.org wrote:
On Wed, Jul 1, 2015 at 8:43 PM, Dale Smith dsm...@nexidia.com wrote:
Apparently so; here is a python/cython implementation.
It's a problem of excessive memory consumption due to a O(# possible
parameter settings) approach to sampling from discrete parameter grids
without replacement.
The fix was merged into master only hours ago. Please feel free to work
with master, or to cherry-pick febefb0
On 25 June 2015 at
Across models, weights should be implemented such that duplicating samples
would give identical results to corresponding integer weights. That is true
here, to my understanding, if we remove the stochasticity such that all
identical samples have their update occur at once.
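A quick check of that invariant on an estimator that supports sample_weight
(a sketch; agreement is up to solver tolerance):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([0, 0, 1, 1])
# weighting sample 1 by 2 should match duplicating it
a = LogisticRegression().fit(X, y, sample_weight=[1, 2, 1, 1])
b = LogisticRegression().fit(np.vstack([X, X[1:2]]), np.append(y, y[1]))
print(np.allclose(a.coef_, b.coef_, atol=1e-3))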
On 25 June 2015 at
What estimators have predict with multiple args? Without support for same
in cross validation routines and scorers, isn't it easier to write this
functionality in custom code as you need it, leaving the predictor off the
Pipeline?
On 25 June 2015 at 06:06, Michael Kneier michael.kne...@gmail.com
To me, those numbers appear identical at 2 decimal places.
On 17 June 2015 at 23:04, Herbert Schulz hrbrt@gmail.com wrote:
Hello everyone,
I wrote a function to calculate the sensitivity, specificity, balanced
accuracy and accuracy from a confusion matrix.
Now I have a problem, I'm
…, or is the precision in this case the sensitivity?
On 17 June 2015 at 15:29, Andreas Mueller t3k...@gmail.com wrote:
Yeah, that is the rounding from using %.2f in the classification report.
I think it gets a bit noisier when using n_jobs != 1, as verbose is passed
to joblib.Parallel. I agree that it's not a very controllable or
well-documented setting.
On 16 June 2015 at 13:24, Adam Goodkind a.goodk...@gmail.com wrote:
Right. Thank you. I guess I was just overwhelmed by the amount
See the sample_size parameter: silhouette score can be calculated on a
random subset of the data, presumably for efficiency. Feel free to submit a
PR improving the docstring.
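For example (a sketch):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=5000, random_state=0)
labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)
# score a random subsample of 1000 points rather than all 5000
print(silhouette_score(X, labels, sample_size=1000, random_state=0))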
On 16 June 2015 at 13:54, Sebastian Raschka se.rasc...@gmail.com wrote:
Hi, all,
I am a little bit confused about the
Until sample_weight is directly supported in Pipeline, you need to prefix
`sample_weight` with the step name and '__'. So for Pipeline([('a', A()),
('b', B())]), use fit_params={'a__sample_weight': sample_weight,
'b__sample_weight': sample_weight} or similar.
HTH
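In full, a sketch with stand-in data:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X, y = rng.randn(50, 3), rng.randint(0, 2, 50)
w = np.ones(50)
pipe = Pipeline([('scale', StandardScaler()), ('clf', LogisticRegression())])
# the step-name prefix routes the weights to the 'clf' step's fit method
pipe.fit(X, y, clf__sample_weight=w)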
On 10 June 2015 at 03:57, José
Just a quick note that I've been silent lately because I've been Busy With
Life, but also because GitHub was sending notifications to an email address
hosted at my previous employer, which was deactivated a fortnight ago. If there were
issues that sought my particular attention, please let me know.
As at
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
Prior to shuffling, `X` stacks a number of these primary informative
features, redundant linear combinations of these, repeated duplicates
of sampled features, and arbitrary noise for the remaining features (with
label noise added via flip_y); clusters are distributed across classes with
respect to the informative features.
Sorry, I meant https://github.com/scikit-learn/scikit-learn/issues/4301
Sorry, grid search (and similar) does not support clusterers. This probably
should be formally tracked as an issue.
https://github.com/scikit-learn/scikit-learn/issues/4040 might be helpful
to you.
On 18 May 2015 at 11:56, Jitesh Khandelwal jk231...@gmail.com wrote:
I have recently been using
Hi Sam,
I think this could be interesting. You could allow for learning parameters
on each sub-cluster by accepting a transformer as a parameter, then using
sample = sklearn.base.clone(transformer).fit_transform(sample).
I suspect bisecting k-means is notable enough and different enough for
What Sebastian and Ronnie said. Plus: there are multiple off-the-shelf
neural net pull requests in the process of review, notably those by Issam
Laradji for GSoC 2014. Extreme Learning Machines and Multilayer Perceptrons
should be merged Real Soon Now.
On 7 May 2015 at 14:58, Ronnie Ghose
The algorithm isn't the issue so much as defining a metric that measures
the distance or affinity between items, or else finding a way to reduce
your data to a more standard metric space.
I have for instance clustered sets of objects by first minhashing them (an
approximate dim reduction for
Yes, this is not a probabilistic method.
On 29 April 2015 at 14:56, C K Kashyap ckkash...@gmail.com wrote:
Works like a charm. Just noticed though that the max value is sometimes
more than 1.0; is that okay?
Regards,
Kashyap
On Wed, Apr 29, 2015 at 10:12 AM, Joel Nothman joel.noth
…elaborate on the code please?
What would be dataset.target_names and dataset.target in my case -
http://lpaste.net/131649
Regards,
Kashyap
On Wed, Apr 29, 2015 at 3:08 AM, Joel Nothman joel.noth...@gmail.com
wrote:
On Wed, Apr 29, 2015 at 9:45 AM, Joel Nothman joel.noth...@gmail.com
wrote:
Highest ranking topic for each doc is just np.argmax(nmf.transform(tfidf),
axis=1).
This is because nmf.transform
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
This shows the newsgroup name and highest scoring topic for each doc.
zip(np.take(dataset.target_names, dataset.target),
np.argmax(nmf.transform(tfidf), axis=1))
I think something based on this should be added to the example.
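Put together, a sketch of the suggested addition (assuming the example's
usual TF-IDF setup):

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = fetch_20newsgroups(shuffle=True, random_state=1)
tfidf = TfidfVectorizer(max_features=1000,
                        stop_words='english').fit_transform(dataset.data)
nmf = NMF(n_components=10, random_state=1).fit(tfidf)
topics = np.argmax(nmf.transform(tfidf), axis=1)  # top topic per document
pairs = zip(np.take(dataset.target_names, dataset.target), topics)
for group, topic in list(pairs)[:10]:
    print(group, topic)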
On 29 April 2015 at 07:01, Andreas Mueller t3k...@gmail.com wrote:
I assume you have checked that combine_train_test_dataset produces data of
the correct dimensions in both X and y.
I would be very surprised if the problem were not in PAA, so check it
again: make sure that you test that PAA().fit(X1).transform(X2) gives the
transformation of X2. The error seems
I suspect this method is underreported by any particular name, as it's a
straightforward greedy search. It is also very close to what I think many
researchers do in system development or report in system analysis, albeit
with more automation.
In the case of KNN, I would think metric learning
On 17 April 2015 at 13:52, Daniel Vainsencher daniel.vainsenc...@gmail.com
wrote:
On 04/16/2015 05:49 PM, Joel Nothman wrote:
I more or less agree. Certainly we only need to do one searchsorted per
query per tree, and then do linear scans. There is a question of how
close we stay
Although I note that I've got LaTeX compilation errors, so I'm not sure how
Andy compiles this.
On 16 April 2015 at 20:25, Joel Nothman joel.noth...@gmail.com wrote:
I've proposed a better chapter ordering at
https://github.com/scikit-learn/scikit-learn/pull/4602...
On 16 April 2015 at 03:48, Andreas Mueller t3k...@gmail.com wrote:
Hi.
Yes, run make latexpdf in the doc folder.
Best,
Andy
On 04/15/2015 01:11 PM, Tim wrote:
Thanks, Andy!
How do
…for the n_candidates with the lowest Hamming distances.
This should achieve a pretty good sweet spot of performance, with just a
bit of Cython.
Daniel
On 04/16/2015 12:18 AM, Joel Nothman wrote:
Once we're dealing with large enough index and n_candidates, most time
is spent in searchsorted
I agree this is disappointing, and we need to work on making LSHForest
faster. Portions should probably be coded in Cython, for instance, as the
current implementation is a bit circuitous in order to work in numpy. PRs
are welcome.
LSHForest could use parallelism to be faster, but so can (and
Oh. Silly mistake. Doesn't break with the correct patch, now at PR#4604...
On 16 April 2015 at 14:24, Joel Nothman joel.noth...@gmail.com wrote:
…Try d=500, n_points=10…; I don't remember
the switchover point.
The documentation should make this clear, but unfortunately I don't see
that it does.
Except apparently that commit breaks the code... Maybe I've misunderstood
something :(
On 16 April 2015 at 14:18, Joel Nothman joel.noth...@gmail.com wrote:
ball tree is not vectorized in the sense of SIMD, but there is
Python/numpy overhead in LSHForest that is not present in ball tree.
I
Use preprocessing.StandardScaler()'s transform and inverse_transform
methods.
HTH!
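That is (a sketch, with a stand-in array):

import numpy as np
from sklearn import preprocessing

img = np.random.RandomState(0).rand(32, 32)
scaler = preprocessing.StandardScaler().fit(img)
img_scaled = scaler.transform(img)
img_restored = scaler.inverse_transform(img_scaled)  # undoes the scaling
print(np.allclose(img, img_restored))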
On 14 April 2015 at 19:06, Souad Chaabouni chaabouni_so...@yahoo.fr wrote:
Hello,
I'm a beginner.
I have an image which I preprocessed with sklearn:
img_scaled = preprocessing.scale(img)
My question
Ignoring the class label 'O' from evaluation will be possible with #4287
https://github.com/scikit-learn/scikit-learn/pull/4287 merged
On 14 April 2015 at 11:43, namma igloo nammaig...@outlook.com wrote:
I was removing the class 'O' (other) from labels as given in the
python-crfsuite example
Or report macro and micro in classification_report. Micro is equivalent to
accuracy for multiclass without #4287
https://github.com/scikit-learn/scikit-learn/pull/4287.
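For example (a sketch):

from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]
print(f1_score(y_true, y_pred, average='micro'))  # equals accuracy here
print(f1_score(y_true, y_pred, average='macro'))  # unweighted mean over classes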
On 10 April 2015 at 01:00, Andreas Mueller t3k...@gmail.com wrote:
Hi Jack.
You mean in the classification report?
That
Issam Laradji implemented a multilayer perceptron and extreme learning
machines for last year's GSoC. Both are awaiting final reviews before being
merged. They should be functional and can be found in the Issue Tracker.
On 7 April 2015 at 21:09, Vlad Ionescu ionescu.vl...@gmail.com wrote:
On 25 March 2015 at 00:01, Gael Varoquaux gael.varoqu...@normalesup.org
wrote:
To make this more concrete, the MetricLearner().metric_ estimator would
require specialised set_params or clone behaviour, I assume. I.e. it
involves hacking API fundamentals.
It's more a general principle of
- https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Multiple-metric-support-for-CV-and-grid_search-and-other-general-improvements
Possible mentors : Andreas Mueller (amueller) and Joel Nothman
(jnothman)
Any feedback/suggestions/additions/deletions would be awesome
On 24 March 2015 at 23:56, Gael Varoquaux gael.varoqu...@normalesup.org
wrote:
So I just thought: what if metric learners will have an attribute
`metric`
Before adding features and API entries, I'd really like to focus on
having a 1.0 release, with a fixed API that really solves the
Hi Artem, I've taken a look at your proposal. I think this is an
interesting contribution, but I suspect your proposal is far too ambitious:
- The proposal doesn't well account for the need to receive reviews and
alter the PR in accordance. This is especially so because you are
GSOC isn't the best way to get started. We recommend you get to know the
code structure, API and development process by starting with issues
labelled https://github.com/scikit-learn/scikit-learn/labels/Easy. In
general, look through the Issue Tracker and find something of interest, or
which has
Are there any objections on Joel's variant of y? It serves my needs, but
is quite different from what one can usually find in scikit-learn.
FWIW It'll require some changes to cross-validation routines.
On 22 March 2015 at 11:54, Artem barmaley@gmail.com wrote:
Are there any objections
This is off-topic, but I should note that there is a patch at
https://github.com/scikit-learn/scikit-learn/pull/2784 awaiting review for
a while now...
On 20 March 2015 at 08:16, Charles Martin charlesmarti...@gmail.com wrote:
I would like to propose extending the linearSVC package
by
I don't know a lot about metric learning either, but it sounded like from
your initial statement that fit(X, D) where D is the target/known distance
between each point in X might be appropriate. I have no idea if this is how
it is formulated in the literature (your mention of asymmetric metrics