Re: [scikit-learn] help

2018-06-16 Thread Joel Nothman
e same deprecation warning, though I don't understand why as I am using > model_evaluation now. Regardless, I think the problem is fixed. > > > Once again, thank you for your help! > > > Kind regards, > > > Alexandra > -- > *Fra:* scikit

Re: [scikit-learn] Jeff Levesque: association rules

2018-06-11 Thread Joel Nothman
We have definitely discussed association rules in issues before. It's considered out of scope for scikit-learn, except insofar as it is used for learning classification. We haven't yet been convinced that classifiers based on associative learning have enough practical demand to justify their

Re: [scikit-learn] Jeff Levesque: profit functionality

2018-06-11 Thread Joel Nothman
There is a PR for more GLM support (https://github.com/scikit-learn/scikit-learn/pull/9405), but I don't think it will be in the next release. ___ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn

Re: [scikit-learn] PyCM: Multiclass confusion matrix library in Python

2018-06-04 Thread Joel Nothman
> > Thanks for this -- looks useful. I had to write something similar (for >> the binary case) and wish scikit had something like this. > > > Which part of it? I'm not entirely sure I understand what the core > functionality is. > > I think the core efficiently evaluating the full set of metrics

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-17 Thread Joel Nothman
he code, because I > only know how to work with the SciKitLearn package ready. :-( > > Att., > Mauricio Reis > > 2018-05-16 20:33 GMT-03:00 Joel Nothman <joel.noth...@gmail.com>: > >> Implemented in a previous version of #10280 >> <https://github.com/s

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-16 Thread Joel Nothman
Implemented in a previous version of #10280, but removed for now to simplify reviews. If others would like to review #10280, I'm happy to follow up with

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-13 Thread Joel Nothman
Note that this has long been documented under "Memory consumption for large sample sizes" at http://scikit-learn.org/stable/modules/clustering.html#dbscan On 14 May 2018 at 12:59, Joel Nothman <joel.noth...@gmail.com> wrote: > This is quite a common issue with our imple

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-13 Thread Joel Nothman
This is quite a common issue with our implementation of DBSCAN, and improvements to documentation would be very, very welcome. The high memory cost comes from constructing the pairwise radius neighbors for all points. If using a distance metric that cannot be indexed with a KD-tree or Ball Tree,
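
One way to keep the memory cost down, along the lines of the "Memory consumption for large sample sizes" note in the clustering docs, is to precompute a sparse radius-neighbors graph and pass it to DBSCAN with metric="precomputed". A minimal sketch on toy data; the blobs, eps and min_samples values are illustrative assumptions, not taken from the thread:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Toy data (assumption): two tight, well-separated blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 0.1, size=(50, 2))])
eps = 0.5

# Sparse graph that stores only the distances that fall within eps.
neighbors = NearestNeighbors(radius=eps).fit(X)
graph = neighbors.radius_neighbors_graph(X, mode="distance")

# DBSCAN treats entries missing from the sparse graph as farther than eps.
labels_sparse = DBSCAN(eps=eps, min_samples=5,
                       metric="precomputed").fit_predict(graph)

# Reference run directly on the feature matrix.
labels_direct = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
```

On data this small both routes cost about the same; the sparse graph only pays off when most pairs of points are farther apart than eps.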

Re: [scikit-learn] Unable to run make test-coverage

2018-05-10 Thread Joel Nothman
Do you have pytest-cov installed?

Re: [scikit-learn] Retracting model from the 'blackbox' SVM

2018-05-05 Thread Joel Nothman
The coef_ available from LinearSVC will be somewhat indicative of the relative importance of each feature. But you might want to look into our feature selection documentation: http://scikit-learn.org/stable/modules/feature_selection.html
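
A rough sketch of reading coef_ as an importance hint. The synthetic data is an assumption made for illustration (only feature 0 carries signal), and the comparison is only meaningful when the features are on comparable scales:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic data (assumption): the label depends on feature 0 only.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

clf = LinearSVC(random_state=0).fit(X, y)

# Larger absolute weight suggests a more influential feature.
importance = np.abs(clf.coef_).ravel()
ranking = np.argsort(importance)[::-1]  # most influential first
```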

Re: [scikit-learn] K Medoids Clustering Implementation

2018-04-15 Thread Joel Nothman
The current contributor for this is finding it hard to find time to complete the work. I think the remaining issues are quite minor, and we would be keen for someone to take over: respond to my review and hope we get another before next release.

Re: [scikit-learn] K Medoids Clustering Implementation

2018-04-15 Thread Joel Nothman
Did you find https://github.com/scikit-learn/scikit-learn/pull/7694? On 14 April 2018 at 11:35, Zane DuFour wrote: > Is someone working on an implementation of K-Medoids clustering > at the moment? If not, I would > like to

Re: [scikit-learn] Delegating "get_params" and "set_params" to a wrapped estimator when parameter is not defined.

2018-04-15 Thread Joel Nothman
Have you considered whether a mixin is a better model than a wrapper?

Re: [scikit-learn] Run time complexity of algorithms

2018-03-21 Thread Joel Nothman
You may also be interested in the work at https://github.com/scikit-learn/scikit-learn/issues/10289 and perhaps interested in helping give feedback towards finishing it off.

Re: [scikit-learn] Run time complexity of algorithms

2018-03-21 Thread Joel Nothman
If you produce a catalogue of their runtime complexities, it would be great if you could contribute them back to the project's documentation. Thanks! However, I suspect you'll find that the theoretical worst-case asymptotic runtime is often not what you're most interested in.

Re: [scikit-learn] Using KMeans cluster labels in KNN

2018-03-12 Thread Joel Nothman
A meta-estimator for this (generic to which classifier / clusterer) is coded up at https://github.com/scikit-learn/scikit-learn/issues/4543#issuecomment-91073246 We even have a pull request that made an example of this sort of thing at https://github.com/scikit-learn/scikit-learn/pull/6478, but

Re: [scikit-learn] KMeans default distance function

2018-03-10 Thread Joel Nothman
kmeans necessarily uses Euclidean distance, and a patch to the documentation is welcome.

Re: [scikit-learn] help-Renaming features in Sckit-learn's CountVectorizer()

2018-03-05 Thread Joel Nothman
You can effectively merge features through matrix multiplication: multiply the CountVectorizer output by a sparse matrix of shape (n_features_in, n_features_out) which has 1 where the output feature corresponds to an input feature. Your spelling correction then consists of building this mapping
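
The matrix-multiplication trick can be sketched as follows. The `canonical` spelling map and the toy documents are hypothetical stand-ins; a real spelling corrector would supply the mapping:

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

docs = ["color colour", "colour theatre", "theater color color"]
vec = CountVectorizer()
X = vec.fit_transform(docs)          # shape (n_docs, n_features_in)

# Hypothetical mapping from a variant spelling to its canonical form.
canonical = {"colour": "color", "theatre": "theater"}

vocab = vec.vocabulary_              # term -> input column index
out_terms = sorted({canonical.get(term, term) for term in vocab})
out_index = {term: j for j, term in enumerate(out_terms)}

# M[i, j] = 1 where input feature i should be counted as output feature j.
rows = list(vocab.values())
cols = [out_index[canonical.get(term, term)] for term in vocab]
M = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(len(vocab), len(out_terms)))

X_merged = X @ M                     # shape (n_docs, n_features_out)
```

Because M has exactly one 1 per input row, counts of merged spellings are summed and no count is lost.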

Re: [scikit-learn] KMeans cluster

2018-02-14 Thread Joel Nothman
you can repeatedly use n_init=1?

Re: [scikit-learn] Multi-Output Decision Trees for mixed classification-regerssion problems

2018-02-12 Thread Joel Nothman
presuming there are clear applications for this, other models should be able to support mixed targets similarly, like MLP. since we don't really have an API design for this, it might take some time to find consensus on what it should look like. but a PR would be a good way to concretely consider

Re: [scikit-learn] Pipegraph is on its way!

2018-02-07 Thread Joel Nothman
cool! We have been talking for a while about how to pass other things around grid search and other meta-analysis estimators. This injection approach looks pretty neat as a way to express it. Will need to mull on it. On 8 Feb 2018 2:51 am, "Manuel Castejón Limas" wrote:

Re: [scikit-learn] Jeff Levesque: custom json encoder

2018-02-05 Thread Joel Nothman
I think you need to describe what use cases you intend once you've encoded the thing. JSON's pretty generic. You can convert any pickle into JSON, but it'll still have the security and versioning issues of a pickle. You can convert to PMML and convert the XML to JSON, but it'd still be limited by

Re: [scikit-learn] One-hot encoding

2018-02-05 Thread Joel Nothman
(sparse one-hot matrix size). These numbers aren't > exact, but you can see my point. > > Cheers, > Sarah > > On Mon, Feb 5, 2018 at 9:50 PM, Joel Nothman <joel.noth...@gmail.com> > wrote: > >> OneHotEncoder will not magically reduce the size of your input. It will &

Re: [scikit-learn] Need Help with Failing Travis/Appveyor Build

2018-02-05 Thread Joel Nothman
I assume it is not available in all supported versions of numpy. but I can't imagine you need it if we have not used it before! On 6 Feb 2018 2:32 am, "Yacine MAZARI" wrote: > Hello, > > I added some additional unit tests to this PR >

Re: [scikit-learn] One-hot encoding

2018-02-04 Thread Joel Nothman
test = enc.fit_transform([[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]]) >> >>> test >> array([[0., 0., 0., 1., 0., 0., 0., 1., 1.], >> [0., 1., 0., 0., 0., 2., 0., 0., 0.], >>[1., 0., 0., 0., 0., 1., 1., 0., 0.], >>[0., 1., 0., 1., 0., 0.,

Re: [scikit-learn] One-hot encoding

2018-02-04 Thread Joel Nothman
20 million categories, or 20 million categorical variables? OneHotEncoder is pretty efficient if you specify n_values. On 5 February 2018 at 15:10, Sarah Wait Zaranek wrote: > Hello - > > I was just wondering if there was a way to improve performance on the > one-hot

Re: [scikit-learn] CountVectorizer: Additional Feature Suggestion

2018-01-28 Thread Joel Nothman
ument > length. > > Best, > Sebastian > > > On Jan 28, 2018, at 1:29 AM, Joel Nothman <joel.noth...@gmail.com> > wrote: > > > > sklearn.preprocessing.Normalizer allows you to normalize any vector by > its L1 or L2 norm. L1 would be equivalent to "document
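
A small sketch of the Normalizer suggestion, assuming "document length" means the total token count of each document. The toy documents are made up for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

docs = ["a short text", "a much much longer text about text"]

# Counts followed by L1 normalisation: each row becomes term frequencies
# relative to the number of counted tokens in that document.
pipe = make_pipeline(CountVectorizer(), Normalizer(norm="l1"))
X = pipe.fit_transform(docs)

row_sums = np.asarray(X.sum(axis=1)).ravel()
```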

Re: [scikit-learn] sklearn.model_selection.GridSearchCV - unable to use n_jobs>1 on MacOS Sierra python 2.7

2018-01-07 Thread Joel Nothman
What do you mean by "the jobs start to die out one by one"? Surely the jobs should finish and die out one by one...? On 8 January 2018 at 06:35, Sumeet Sandhu wrote: > Hi, > > I was able to run this with n_jobs=-1, and the activity monitor does show > all 8 CPUs

Re: [scikit-learn] clustering on big dataset

2018-01-04 Thread Joel Nothman
dataset of 12*872505 (features, samples). It takes several days to run the > program. Is there any way to speed up the query process of NN? I doubt > query may be too slow. > Thanks for your time. > > On Thu, Jan 4, 2018 at 3:55 AM, Joel Nothman <joel.noth...@gmail.com> > wr

Re: [scikit-learn] Text classification of large dataet

2017-12-20 Thread Joel Nothman
To clarify: You have 2.3M samples How many features? How many active features on average per sample? In 7k classes: multiclass or multilabel? Have you tried limiting the depth of the forest? Have you tried embedding your feature space into a smaller vector (pre-trained embeddings, hashing, lda,

Re: [scikit-learn] Any plans on generalizing Pipeline and transformers?

2017-12-19 Thread Joel Nothman
At a glance, and perhaps not knowing imbalanced-learn well enough, I have some doubts that it will provide an immediate solution for all your needs. At the end of the day, the Pipeline keeps its scope relatively tight, but it should not be so hard to implement something for your own needs if your

Re: [scikit-learn] Feature selection with words.

2017-12-19 Thread Joel Nothman
It depends what the set of classes is. Best way to find out is to try it... On 19 December 2017 at 19:36, Luigi Lomasto < l.loma...@innovationengineering.eu> wrote: > Hi all. > > I’m working for text classification to classify Wikipedia documents. I > using a word count approach to extract

[scikit-learn] FYI: StratifiedKFold(..., shuffle=True) differs in 0.19

2017-12-13 Thread Joel Nothman
It has come to our attention in #10274 that we accidentally changed shuffled StratifiedKFold behaviour in the 0.19.0 release from what had come before. That is, for the same random state, you will get a different cross-validation data

Re: [scikit-learn] Grid search fir multi-label task

2017-12-10 Thread Joel Nothman
for legacy reasons, multilabel targets need to be passed as an array (or a sparse matrix if supported by the classifier). lists of lists are not supported but may be in the near future.
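
If the labels start out as lists of lists, one way to get the required indicator array is MultiLabelBinarizer. A minimal sketch with made-up labels:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up multilabel targets, one list of labels per sample.
y_lists = [["news", "sport"], ["sport"], []]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_lists)   # binary indicator array, one column per label
```

Y can then be passed as the target to a grid search over a classifier that supports multilabel output.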

Re: [scikit-learn] Error while running 'python setup.py build_ext --inplace'

2017-12-06 Thread Joel Nothman
We're biased, but we reckon the skills to make a PR are (a) not insurmountable with a bit of homework; and (b) very worthwhile to have. So try pick it up by yourself, but give us a shout if you're struggling.

Re: [scikit-learn] Error while running 'python setup.py build_ext --inplace'

2017-12-05 Thread Joel Nothman
A PR is welcome if you can improve documentation. Thanks On 6 December 2017 at 04:01, Aniket Meshram wrote: > Yeah. That did it. After updating Cython to latest 0.27.3, the issue is > resolved now. > Thanks all. I guess this should also be updated on the site /

Re: [scikit-learn] Error while running 'python setup.py build_ext --inplace'

2017-12-02 Thread Joel Nothman
There's not enough information there for us to help you. Please provide the full log if possible. Are you sure you want to build from source? On 2 December 2017 at 08:05, Aniket Meshram wrote: > hi, > > I'm following the 'ways to contribute page' > > After forking

Re: [scikit-learn] Issue with Sihouette_samples

2017-11-16 Thread Joel Nothman
https://github.com/scikit-learn/scikit-learn/pull/7177 makes silhouette more memory-efficient. Try that branch? On 17 November 2017 at 05:46, Shiheng Duan wrote: > Hi Luigi, > > Actually my data has 621*1405 points and each point has 12 features. I > made it into a 2-D

Re: [scikit-learn] Custom Distance Metric / Distance Matrix with K-means?

2017-11-14 Thread Joel Nothman
e to work on it again. I > recall the only work left is to ensure the code works with the latest > sklearn version. > > -Timo > > 15.11.2017 1.51 "Joel Nothman" <joel.noth...@gmail.com> kirjoitti: > >> No, it's not applicable to KMeans. There are related algor

Re: [scikit-learn] Custom Distance Metric / Distance Matrix with K-means?

2017-11-14 Thread Joel Nothman
No, it's not applicable to KMeans. There are related algorithms that support custom metrics, e.g. K Medoids (a pull request to scikit-learn is here https://github.com/scikit-learn/scikit-learn/pull/7694 but implementations exist in other libraries). Cheers, Joel

Re: [scikit-learn] Interested to Contribute to Scikit Learn

2017-10-24 Thread Joel Nothman
hello and welcome Nikhil, as described in our contributor guide, which you should read, we would much prefer to make your acquaintance through non-critical contributions. please start by looking for issues labelled as "easy" or "good first issue", and "help wanted" more generally indicated issues

Re: [scikit-learn] Wrong docs of sklearn/neighbours

2017-10-10 Thread Joel Nothman
yes, I think that statement is imprecise, at least in the context of nearest neighbours, and I think it is the kind of statement that is hard to maintain consistent with the library in any case. No issue has been opened to my knowledge. thanks for following up, and feel free to submit a PR even

Re: [scikit-learn] Wrong docs of sklearn/neighbours

2017-10-09 Thread Joel Nothman
I don't know what you're asking. The documentation at http://scikit-learn.org/dev should request that pull request

Re: [scikit-learn] Validating L2 - Least Squares - sum of squares, During a Normalization Function

2017-10-08 Thread Joel Nothman
Ah of course. Thanks.

Re: [scikit-learn] Validating L2 - Least Squares - sum of squares, During a Normalization Function

2017-10-08 Thread Joel Nothman
(normalize(X) * normalize(X)).sum(axis=1) works fine here. But I was unaware of these quirks in Python's implementation of pow: Numpy seems to be consistent in returning nan when a negative float is raised to a non-integer (or equivalent float) power. By only calculating integer powers of
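
A quick check of both points, with a made-up matrix: the squared entries of each L2-normalised row sum to one, and NumPy returns nan when a negative float is raised to a non-integer power.

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[3.0, -4.0], [1.0, 2.0], [-0.5, 0.25]])

Xn = normalize(X)                      # L2 norm per row by default
sum_of_squares = (Xn * Xn).sum(axis=1)

# Negative base with a non-integer exponent: NumPy yields nan.
with np.errstate(invalid="ignore"):
    fractional_power = np.array([-8.0]) ** 0.5
```

Multiplying the array by itself avoids the power operator entirely, so negative entries cause no trouble.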

Re: [scikit-learn] question for using GridSearchCV on LocalOutlierFactor

2017-10-07 Thread Joel Nothman
actually I'm probably wrong there, but you may not be able to use accuracy

Re: [scikit-learn] Using perplexity from LatentDirichletAllocation for cross validation of Topic Models

2017-10-07 Thread Joel Nothman
just a note that if you're using this for topic modelling, perplexity might not be a good choice of objective function. others have been proposed. see the diagnostic functions for MALLET topic modelling for instance.

Re: [scikit-learn] question for using GridSearchCV on LocalOutlierFactor

2017-10-07 Thread Joel Nothman
I don't think LOF is designed to apply to unseen data.

Re: [scikit-learn] Terminating a Pipeline with a NearestNeighbors search

2017-09-16 Thread Joel Nothman
table for my use case. > > Ryan > > On Wed, Sep 13, 2017 at 2:14 PM, Joel Nothman <joel.noth...@gmail.com> > wrote: > >> it's pretty easy to implement this by creating your own Pipeline >> subclass, isn't it? >> >> On 14 Sep 2017 4:55 am, "Gael Varoq

Re: [scikit-learn] Accessing Clustering Feature Tree in Birch

2017-09-16 Thread Joel Nothman
There is no such thing as "the data samples in this cluster". The point of Birch being online is that it loses any reference to the individual samples that contributed to each node, but stores some statistics on their basis. Roman Yurchak has, however, offered a PR where, for the non-online case,

Re: [scikit-learn] Terminating a Pipeline with a NearestNeighbors search

2017-09-13 Thread Joel Nothman
it's pretty easy to implement this by creating your own Pipeline subclass, isn't it? On 14 Sep 2017 4:55 am, "Gael Varoquaux" wrote: > On Wed, Sep 13, 2017 at 02:45:41PM -0400, Andreas Mueller wrote: > > We could add a way to call non-standard methods, but I'm not

Re: [scikit-learn] how to make result less number of group with NearestNeighbors?

2017-09-10 Thread Joel Nothman
Given your related post on the issue tracker, I think you're trying to perform clustering. Use DBSCAN, which is a standard approach to clustering based on neighborhoods within radius. On 10 September 2017 at 14:44, Martin Lee wrote: > nbrs =

Re: [scikit-learn] Problem found when testing DecisionTreeClassifier within the source folder

2017-09-04 Thread Joel Nothman
I suspect this is due to an intricacy of Cython. Despite using relative imports, Cython expects the Criterion instance to come from a package called sklearn, not called sklearn1. On 5 September 2017 at 12:42, hanzi mao wrote: > Hi, > > I am researching on the source code of

Re: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline

2017-08-28 Thread Joel Nothman
=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, >>>> verbose=0, warm_start=False) >>> >>> >>> I wonder what I am missing? >>> >>> Thanks, >>> Raga >>> >>> >>> On Mon, Aug 28, 2017 at

Re: [scikit-learn] Getting weight coefficient of logistic regression from a pipeline

2017-08-27 Thread Joel Nothman
No, we do not have a way to get the coefficients with respect to the input (pre-scaling) space. On 28 August 2017 at 13:20, Raga Markely wrote: > Hello, > > I am wondering if it's possible to get the weight coefficients of logistic > regression from a pipeline? > > For
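
There is no built-in helper, but for the specific case of a StandardScaler followed by a linear model the conversion can be done by hand: since z = (x - mean) / scale, a weight w on z is a weight w / scale on x, and the intercept absorbs the shift. A sketch under that assumption, with synthetic data; it does not generalise to arbitrary preprocessing steps:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption), rescaled so features have very different units.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X = X * np.array([1.0, 10.0, 100.0, 0.1, 5.0]) + 3.0

pipe = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
scaler = pipe.named_steps["standardscaler"]
clf = pipe.named_steps["logisticregression"]

# w . (x - mean) / scale + b  ==  (w / scale) . x + (b - sum(w * mean / scale))
coef_input = clf.coef_ / scaler.scale_
intercept_input = clf.intercept_ - (clf.coef_ * scaler.mean_
                                    / scaler.scale_).sum(axis=1)

# Decision values recomputed from the raw (pre-scaling) inputs.
manual_decision = (X @ coef_input.T + intercept_input).ravel()
```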

Re: [scikit-learn] imbalanced-learn 0.3.0 is chasing scikit-learn 0.19.0

2017-08-24 Thread Joel Nothman
Congratulations Guillaume and the imblearn team! On 25 August 2017 at 10:14, Guillaume Lemaître wrote: > We are excited to announce the new release of the scikit-learn-contrib > imbalanced-learn, already available through conda and pip (cf. the > installation page

Re: [scikit-learn] any interest in incorporating a new Transformer?

2017-08-20 Thread Joel Nothman
> > On Sat, Aug 19, 2017 at 2:47 AM, Joel Nothman <joel.noth...@gmail.com> > wrote: > >> this is the right place to ask, but I'd be more interested to see a >> scikit-learn-compatible implementation available, perhaps in >> scikit-learn-contrib more than to s

Re: [scikit-learn] any interest in incorporating a new Transformer?

2017-08-19 Thread Joel Nothman
this is the right place to ask, but I'd be more interested to see a scikit-learn-compatible implementation available, perhaps in scikit-learn-contrib more than to see it part of the main package... On 19 Aug 2017 2:13 am, "Michael Capizzi" wrote: > Hi all - > >

Re: [scikit-learn] Categorical handling

2017-08-17 Thread Joel Nothman
gist at https://gist.github.com/jnothman/a75bac778c1eb9661017555249e50379 On 18 August 2017 at 01:26, Joel Nothman <joel.noth...@gmail.com> wrote: > I don't consider LabelBinarizer the best workaround. > > Given a Pandas dataframe df, I'd use: > > DictVectorizer().fit_transf

Re: [scikit-learn] Categorical handling

2017-08-17 Thread Joel Nothman
I don't consider LabelBinarizer the best workaround. Given a Pandas dataframe df, I'd use: DictVectorizer().fit_transform(df.to_dict(orient='records')) which will handle encoding strings with one-hot and numerical features as column vectors. Or: class PandasVectorizer(DictVectorizer): def
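
A small sketch of the DictVectorizer route. To stay free of a pandas dependency it starts from a list of dicts, which is the shape that df.to_dict(orient='records') produces; the column names and values are made up:

```python
from sklearn.feature_extraction import DictVectorizer

# Stand-in for df.to_dict(orient='records') on a hypothetical dataframe.
records = [
    {"city": "NY", "age": 30},
    {"city": "SF", "age": 25},
]

vec = DictVectorizer()
X = vec.fit_transform(records)   # sparse matrix

# String values become one-hot columns named "<key>=<value>";
# numeric values pass through as a single column.
feature_names = vec.feature_names_
```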

Re: [scikit-learn] caching transformers during hyper parameter optimization

2017-08-16 Thread Joel Nothman
We certainly considered this over the many years that Pipeline caching has been in the pipeline. Storing the fitted model means we can do both a fit_transform and a transform on new data, and in many cases takes away the pain point of CV over pipelines where downstream steps are varied. What

Re: [scikit-learn] Truncated svd not working for complex matrices

2017-08-10 Thread Joel Nothman
Should we be more explicitly forbidding complex data in most estimators, and perhaps allow it in a few where it is tested (particularly decomposition)? On 11 August 2017 at 01:08, André Melo wrote: > Actually, it makes more sense to change > > B =

Re: [scikit-learn] Help With Text Classification

2017-08-02 Thread Joel Nothman
lize my script. I did not want to abstract > away too much early on since I am in the beginning stages of learning > machine learning and scikit-learn. > > - Daniel > > On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman <joel.noth...@gmail.com> > wrote: > >> Use a

Re: [scikit-learn] Help With Text Classification

2017-08-02 Thread Joel Nothman
Use a Pipeline to help avoid this kind of issue (and others). You might also want to do something like http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html On 3 August 2017 at 12:01, pybokeh wrote: > Hello, > I am studying this example from scikit-learn's

Re: [scikit-learn] Fwd: Custom transformer failing check_estimator test

2017-07-26 Thread Joel Nothman
aybe we should just do a backport for assert_raises and > assert_raises_regex? > > > On 07/25/2017 07:58 PM, Joel Nothman wrote: > > One advantage of moving to pytest is that we can put messages into > pytest.raises, and we should emphasise this in moving the chec

Re: [scikit-learn] Fwd: Custom transformer failing check_estimator test

2017-07-24 Thread Joel Nothman
what is the failing test? please provide the full traceback. On 24 Jul 2017 10:58 pm, "Sam Barnett" wrote: > Dear scikit-learn developers, > > I am developing a transformer, named Sqizer, that has the ultimate goal > of modifying a kernel for use with the sklearn.svm

Re: [scikit-learn] scikit-learn hits 20k github stars

2017-07-22 Thread Joel Nothman
:13 am, "Jacob Schreiber" <jmschreibe...@gmail.com> wrote: > [image: Inline image 1] > > Many thanks to everyone who has worked on and contributed to the project > for the past decade to make it such a great tool! Also a special thanks to > Joel Nothman, who has

Re: [scikit-learn] Max f1 score for soft classifier?

2017-07-17 Thread Joel Nothman
I suppose it would not be hard to build a wrapper that does this, if all we are doing is choosing a threshold. Although a global maximum is not guaranteed without some kind of interpolation over the precision-recall curve. On 18 July 2017 at 02:41, Stuart Reynolds
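
Choosing the threshold from the precision-recall curve can be sketched as below. The scores are toy values, and this only searches the thresholds that occur in the data, so as noted it is not guaranteed to find a global maximum:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy labels and soft scores (assumption).
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# The final (precision=1, recall=0) point has no threshold, so drop it.
p, r = precision[:-1], recall[:-1]
f1 = 2 * p * r / np.maximum(p + r, 1e-12)   # guard against division by zero

best = int(np.argmax(f1))
best_threshold, best_f1 = thresholds[best], f1[best]
```

A wrapper estimator would fit the classifier, run this on held-out scores, and then predict with `scores >= best_threshold`.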

Re: [scikit-learn] Moving average transformer

2017-07-06 Thread Joel Nothman
I agree that this is best handled with a custom transformer, for the reasons cited by Jacob, but also because it sounds like this transformer does not gather statistics from the training data, and so can be implemented with FunctionTransformer On 7 Jul 2017 6:10 am, "Jacob Schreiber"
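
Since nothing is learned from the training data, a plain function wrapped in FunctionTransformer is enough. A sketch, assuming each row is one series and the average runs along the columns; the `moving_average` helper and window size are illustrative, not from the thread:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def moving_average(X, window=3):
    """Average each row over a sliding window (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only windows that lie fully inside the row.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, X)

smoother = FunctionTransformer(moving_average, kw_args={"window": 2})

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 10.0, 10.0, 10.0]])
X_smooth = smoother.fit_transform(X)
```

The number of samples is unchanged, so the transformer can sit inside a Pipeline like any other step.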

Re: [scikit-learn] Fwd: sample_weight parameter is not split when used in GridSearchCV

2017-06-22 Thread Joel Nothman
why are you passing [my_sample_weights] rather than just my_sample_weights? On 23 Jun 2017 7:49 am, "Julio Antonio Soto de Vicente" wrote: > Hi Manuel, > > Are you sure that you are using the latest version (or at least >0.17)? > The code for splitting the sample weights in

Re: [scikit-learn] Need Help Random Forest Imputation Model as in R

2017-06-15 Thread Joel Nothman
Hi Akash, the fancyimpute package (https://pypi.python.org/pypi/fancyimpute) may be of interest. It doesn't implement exactly this, but MICE may be a similar enough technique to give good results. A main difference appears to be that random forest imputation has the notion of proximity weighting,

Re: [scikit-learn] Cross-validation & cross-testing

2017-06-04 Thread Joel Nothman
And when I mean testing it, I mean writing tests that live with the code so that they can be re-executed, and so that someone else can see what your tests assert about your code's correctness. On 5 June 2017 at 11:52, Joel Nothman <joel.noth...@gmail.com> wrote: > Hi Rain, > >

Re: [scikit-learn] Cross-validation & cross-testing

2017-06-04 Thread Joel Nothman
Hi Rain, I would suggest that you start by documenting what your code is meant to do (the structure of the Korjus et al paper makes it pretty difficult to even determine what this technique is, for you to then not to describe it in your own words in your repository), testing it with diverse

Re: [scikit-learn] gridsearchCV able to handle list of input?

2017-04-30 Thread Joel Nothman
) for single in train_input]) GridSearchCV(make_pipeline(tmi, my_predictor), ...) On 1 May 2017 at 13:19, Joel Nothman <joel.noth...@gmail.com> wrote: > Unless I'm mistaken about what we're looking at, you could use something > like: > > class ToMultiInpu

Re: [scikit-learn] gridsearchCV able to handle list of input?

2017-04-30 Thread Joel Nothman
<nofl...@gmail.com> wrote: > How … batchsize could also be 1, I’ve just stored it like that. > > But how do reshape me data to be a matrix.. thats the big question.. is > possible? > > Den 1. maj 2017 kl. 02.21 skrev Joel Nothman <joel.noth...@gmail.com>: > > Do

Re: [scikit-learn] gridsearchCV able to handle list of input?

2017-04-30 Thread Joel Nothman
Do each of your 33 inputs have a batch of size 100? If you reshape your data so that it all fits in one matrix, and then split it back out into its 33 components as the first transformation in a Pipeline, there should be no problem. On 1 May 2017 at 10:17, Joel Nothman <joel.noth...@gmail.

Re: [scikit-learn] gridsearchCV able to handle list of input?

2017-04-30 Thread Joel Nothman
Banks <nofl...@gmail.com>: > > The shapes are > > print len(train_input)print train_input[0].shapeprint train_output.shape > 33(100, 8, 45, 3)(100, 1, 145) > > > 100 is the batch-size.. > > Den 30. apr. 2017 kl. 12.57 skrev Joel Nothman <joel.noth...@gmail.com>

Re: [scikit-learn] gridsearchCV able to handle list of input?

2017-04-30 Thread Joel Nothman
API does not > support multi-input data... > > El 30 abr 2017, a las 12:02, Joel Nothman <joel.noth...@gmail.com> > escribió: > > What are the shapes of train_input and train_output? > > On 30 April 2017 at 12:59, Carlton Banks <nofl...@gmail.com> wrote: > >> I a

Re: [scikit-learn] gridsearchCV able to handle list of input?

2017-04-30 Thread Joel Nothman
What are the shapes of train_input and train_output? On 30 April 2017 at 12:59, Carlton Banks wrote: > I am currently trying to run some gridsearchCV on a keras model which has > multiple inputs. > The inputs is stored in a list in which each entry in the list is a input >

Re: [scikit-learn] What if I don't want performance measures per each outcome class?

2017-04-24 Thread Joel Nothman
"Traditional" sensitivity is defined for binary classification only. Maybe micro-average is what you're looking for, but in the multiclass case without anything more specified, you'll merely be calculating accuracy. Perhaps quantiles of the scores returned by permutation_test_score will give you

Re: [scikit-learn] sklearn - knn sklearn.neighbors kneighbors function producing unexpected result for text analysis?

2017-04-20 Thread Joel Nothman
The problem is the misuse of the label encoder. See https://github.com/scikit-learn/scikit-learn/issues/8767 On 20 April 2017 at 19:58, Alex Garel wrote: > I'm not totally sure of what you're trying to do, but here are some > remarks that may help you: > > 1. in modelfit =

Re: [scikit-learn] sklearn - knn sklearn.neighbors kneighbors function producing unexpected result for text analysis?

2017-04-18 Thread Joel Nothman
towards debugging, perhaps add the return_distances option On 16 Apr 2017 9:19 pm, "Evaristo Caraballo via scikit-learn" < scikit-learn@python.org> wrote: > I have been asked to implement a simple knn for text similarity analysis. > I tried by using sklearn.neighbors module. > The file to be

Re: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release

2017-03-25 Thread Joel Nothman
Andreas Mueller <t3k...@gmail.com> wrote: > >> >> >> On 02/07/2017 09:00 PM, Joel Nothman wrote: >> >> On 12 January 2017 at 08:51, Gael Varoquaux < >> gael.varoqu...@normalesup.org> wrote: >> >>> On Thu, Jan 12, 2017 at 08:41:5

Re: [scikit-learn] Intermediate results using gridsearchCV?

2017-03-19 Thread Joel Nothman
g the result at the end of the run. > > On Sun, 19 Mar 2017 at 11:46 Joel Nothman <joel.noth...@gmail.com> wrote: > >> Not sure what you mean. Have you used cv_results_ >> >> On 18 March 2017 at 08:46, Carlton Banks <nofl...@gmail.com> wrote: >> >>

Re: [scikit-learn] Intermediate results using gridsearchCV?

2017-03-19 Thread Joel Nothman
Not sure what you mean. Have you used cv_results_? On 18 March 2017 at 08:46, Carlton Banks wrote: > Is it possible to receive intermediate the intermediate result of a > gridsearchcv? > > instead getting the final result? > > > >
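
For reference, a minimal sketch of what cv_results_ holds. It is filled in once fit has finished, so it gives per-candidate and per-split scores rather than live progress, which may or may not be what the question was after. The estimator and grid are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)

# One entry per parameter combination, not just the best one.
per_candidate = list(zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]))
```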

Re: [scikit-learn] GridsearchCV

2017-03-15 Thread Joel Nothman
If you're using something like n_jobs=-1, that will explode memory usage in proportion to the number of cores, and particularly so if you're passing the data as a list rather than array and hence can't take advantage of memmapped data parallelism. On 16 March 2017 at 15:46, Carlton Banks

Re: [scikit-learn] Differences between scikit-learn and Spark.ml for regression toy problem

2017-03-15 Thread Joel Nothman
sklearn's (and hence liblinear's) intercept is not being used here, but a feature is added in Python to represent the bias, so it's being regularised in any case. On 16 March 2017 at 14:27, Sebastian Raschka wrote: > I think the liblinear solver (default in

Re: [scikit-learn] best way to scale on the random forest for text w bag of words ...

2017-03-15 Thread Joel Nothman
Trees are not a traditional choice for bag of words models, but you should make sure you are at least using the parameters of the random forest to limit the size (depth, branching) of the trees. On 16 March 2017 at 12:20, Sasha Kacanski wrote: > Hi, > As soon as number of

Re: [scikit-learn] GSoC 2017

2017-02-27 Thread Joel Nothman
Hi Pradeep, we would usually only accept candidates who have shown their proficiency and understanding of our package and processes by making some contributions prior to this stage. you are certainly welcome to aim for GSoC 2018 by beginning to develop your familiarity and rapport now. cheers,

Re: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release

2017-02-07 Thread Joel Nothman
On 12 January 2017 at 08:51, Gael Varoquaux <gael.varoqu...@normalesup.org> wrote: > On Thu, Jan 12, 2017 at 08:41:51AM +1100, Joel Nothman wrote: > > When the two versions deprecation policy was instituted, releases were > much > > more frequent... Is that enough of an ex

Re: [scikit-learn] Need Corresponding indices array of values in each split of a DesicisionTreeClassifier

2017-02-07 Thread Joel Nothman
I don't think putting that array of indices in a visualisation is a great idea! If you use my_tree.apply(X) you will be able to determine which leaf each instance in X lands up at, and potentially trace up the tree from there. On 8 February 2017 at 01:26, Nixon Raj wrote: >
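
The apply approach can be sketched like this. The dataset and depth are arbitrary, and the parent map is built by hand from the fitted tree's children_left and children_right arrays:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
my_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Leaf node id that each sample ends up in.
leaf_ids = my_tree.apply(X)

# Sample indices grouped by leaf.
samples_per_leaf = {int(leaf): np.flatnonzero(leaf_ids == leaf)
                    for leaf in np.unique(leaf_ids)}

# Parent lookup, so a leaf can be traced back up to the root (node 0).
tree = my_tree.tree_
parent = {}
for node in range(tree.node_count):
    for child in (tree.children_left[node], tree.children_right[node]):
        if child != -1:              # -1 marks "no child", i.e. a leaf
            parent[int(child)] = node

def path_to_root(node):
    """Return the node ids from `node` up to the root, inclusive."""
    path = [int(node)]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path
```

The samples at any internal split are then the union of the index arrays of the leaves beneath it.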

Re: [scikit-learn] top N accuracy classification metric

2017-01-21 Thread Joel Nothman
There are metrics with that kind of input in sklearn.metrics.ranking. I don't have the time to look them up now, but there have been proposals and PRs for similar ranking metrics. Please search the issue tracker for related issues. Thanks, Joel On 21 January 2017 at 06:16, Johnson, Jeremiah

Re: [scikit-learn] Identify spectra with "marker"

2017-01-21 Thread Joel Nothman
Wrong mailing list? On 21 January 2017 at 02:52, Sebastian Illner < sebastian.ill...@imtek.uni-freiburg.de> wrote: > Hi guys, > I'm new to NIR-measurement as wenn as chemometrics. My current project > involvs the recognition of determined spectra (of a reference system) among > others. > The

Re: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release

2017-01-11 Thread Joel Nothman
When the two versions deprecation policy was instituted, releases were much more frequent... Is that enough of an excuse? On 12 January 2017 at 03:43, Andreas Mueller wrote: > > > On 01/09/2017 10:15 AM, Gael Varoquaux wrote: > >> instead of setting up a roadmap I would rather

Re: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release

2017-01-09 Thread Joel Nothman
In terms of the bug fixes listed in the change-log, most seem non-urgent. I would consider pulling across #7954, #8006, #8087, #7872, #7983. But I also wonder whether we'd be better off sprinting towards a small 0.19 release. On 9 January 2017 at 20:48, Olivier Grisel

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-08 Thread Joel Nothman
Btw, I may have been unclear in the discussion of overfitting. For *training* the meta-estimator in stacking, it's standard to do something like cross_val_predict on your training set to produce its input features. On 8 January 2017 at 22:42, Thomas Evangelidis wrote: >
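
A sketch of that stacking recipe. The thread concerns several MLPRegressor models; here two cheap regressors stand in for them, and the data is synthetic, both assumptions made to keep the example small:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=120, n_features=6, noise=5.0,
                       random_state=0)

# Stand-ins for the base estimators (assumption).
base_models = [Ridge(alpha=1.0),
               DecisionTreeRegressor(max_depth=4, random_state=0)]

# Out-of-fold predictions: each training sample is predicted by a model
# that did not see it, so the meta-estimator is not fed memorised targets.
meta_features = np.column_stack(
    [cross_val_predict(model, X, y, cv=5) for model in base_models])
meta = LinearRegression().fit(meta_features, y)

# For prediction, the base models are refit on the full training set.
for model in base_models:
    model.fit(X, y)

def stacked_predict(X_new):
    """Feed base-model predictions for X_new into the meta-estimator."""
    features = np.column_stack([m.predict(X_new) for m in base_models])
    return meta.predict(features)
```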

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Joel Nothman
* > There is no problem, in general, with overfitting, as long as your > evaluation of an estimator's performance isn't biased towards the training > set. We've not talked about evaluation.

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Joel Nothman
On 8 January 2017 at 08:36, Thomas Evangelidis wrote: > > > On 7 January 2017 at 21:20, Sebastian Raschka > wrote: > >> Hi, Thomas, >> sorry, I overread the regression part … >> This would be a bit trickier, I am not sure what a good strategy for >>

Re: [scikit-learn] modifying CV score

2017-01-04 Thread Joel Nothman
Well, it returns the equivalent of lambda estimator, X, y: estimator.score(X, y) On 5 January 2017 at 08:47, Jonathan Taylor wrote: > (Think this is right reply to from a digest... If not, apologies) > > Thanks for the pointers. From what I read on the API, I
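
Assuming "it" refers to the scorer used when no scoring is specified, the equivalence can be checked by passing such a callable explicitly. The estimator and dataset are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

def passthrough_scorer(estimator, X, y):
    # Same call signature as any scikit-learn scorer.
    return estimator.score(X, y)

default_scores = cross_val_score(clf, X, y, cv=3)
explicit_scores = cross_val_score(clf, X, y, cv=3,
                                  scoring=passthrough_scorer)
```

Any callable with this signature can be swapped in to modify the CV score, for example to penalise model size.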

Re: [scikit-learn] KNeighborsClassifier and metric='precomputed'

2017-01-02 Thread Joel Nothman
n_indexed means the number of samples in the X passed to fit. It needs to be able to compare each prediction sample with each training sample. On 3 January 2017 at 07:44, Pedro Pazzini wrote: > Hi all! > > I'm trying to use a KNeighborsClassifier with precomputed metric.
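
The shapes involved can be sketched as follows, with random toy data: fit takes the square train-by-train matrix, and predict takes a test-by-train matrix with one column per sample seen in fit.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumption).
rng = np.random.RandomState(0)
X_train = rng.normal(size=(40, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(10, 3))

D_train = pairwise_distances(X_train)           # (n_indexed, n_indexed)
D_test = pairwise_distances(X_test, X_train)    # (n_queries, n_indexed)

knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
knn.fit(D_train, y_train)
pred = knn.predict(D_test)

# Same classifier on the raw features, for comparison (Euclidean default).
reference = (KNeighborsClassifier(n_neighbors=3)
             .fit(X_train, y_train).predict(X_test))
```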
