Re: [Scikit-learn-general] Average Per-Class Accuracy metric

2016-03-08 Thread Joel Nothman
You mean TP / N, not TP / TN. And I think the average per-class accuracy does some weird things. Like:
    true = [1, 1, 1, 0, 0]
    pred = [1, 1, 1, 1, 1]
    a.p.c.a = (3 + 3) / 5 / 2
    true = [1, 1, 1, 0, 2]
    pred = [1, 1, 1, 1, 1]
    a.p.c.a = (4 + 4 + 3) / 5 / 3
I don't think that's very useful. On 9 Marc
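The two figures in the message above can be checked with a minimal sketch. It assumes that "per-class accuracy" means the one-vs-rest accuracy (TP + TN) / N of each class, averaged over all classes; the function name `average_per_class_accuracy` is hypothetical and is not a scikit-learn API.

```python
def average_per_class_accuracy(y_true, y_pred):
    """Mean over classes of the one-vs-rest accuracy (TP + TN) / N.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    classes = sorted(set(y_true) | set(y_pred))
    n_samples = len(y_true)
    per_class = []
    for c in classes:
        # A sample is counted as correct for class c when the true label and
        # the prediction agree on whether the sample belongs to c.
        correct = sum((t == c) == (p == c) for t, p in zip(y_true, y_pred))
        per_class.append(correct / n_samples)
    return sum(per_class) / len(classes)


# First example from the message: (3 + 3) / 5 / 2
score_two_classes = average_per_class_accuracy([1, 1, 1, 0, 0], [1, 1, 1, 1, 1])

# Second example from the message: (4 + 4 + 3) / 5 / 3
score_three_classes = average_per_class_accuracy([1, 1, 1, 0, 2], [1, 1, 1, 1, 1])

print(score_two_classes, score_three_classes)
```

Under this reading, relabelling one of the misclassified samples as a third class raises the score from 3/5 to 11/15 even though the predictions are unchanged.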

Re: [Scikit-learn-general] "In-bag" for RandomForest*

2016-03-08 Thread Mathieu Blondel
If this function is generally useful, it might be a good idea to make it public. Mathieu On Wed, Mar 9, 2016 at 1:29 AM, Ariel Rokem wrote: > > On Mon, Mar 7, 2016 at 8:24 AM, Andreas Mueller wrote: > >> Hi Ariel. >> We are not storing them any more because of memory issues, but you can >> rec

Re: [Scikit-learn-general] Average Per-Class Accuracy metric

2016-03-08 Thread Sebastian Raschka
> Firstly, balanced accuracy is a different thing, and yes, it should be > supported. > Secondly, I am correct in thinking you're talking about multiclass (not > multilabel). Sorry for the confusion, and yes, you are right. I think I have mixed the terms “average per-class accuracy” with “balan

Re: [Scikit-learn-general] Average Per-Class Accuracy metric

2016-03-08 Thread Joel Nothman
Firstly, balanced accuracy is a different thing, and yes, it should be supported. Secondly, I am correct in thinking you're talking about multiclass (not multilabel). However, what you're describing isn't accuracy. It's actually micro-averaged recall, except that your dataset is impossible becaus

Re: [Scikit-learn-general] Average Per-Class Accuracy metric

2016-03-08 Thread Joel Nothman
(Although multioutput accuracy is reasonable to support.) On 9 March 2016 at 12:29, Joel Nothman wrote: > Firstly, balanced accuracy is a different thing, and yes, it should be > supported. > > Secondly, I am correct in thinking you're talking about multiclass (not > multilabel). > > However, w

Re: [Scikit-learn-general] Average Per-Class Accuracy metric

2016-03-08 Thread Sebastian Raschka
I haven’t seen this in practice, yet, either. A colleague was looking for this in scikit-learn recently, and he asked me if I know whether this is implemented or not. I couldn’t find anything in the docs and was just curious about your opinion. However, I just found this entry here on wikipedia:

Re: [Scikit-learn-general] Average Per-Class Accuracy metric

2016-03-08 Thread Joel Nothman
I've not seen this metric used (references?). Am I right in thinking that in the binary case, this is identical to accuracy? If I predict all elements to be the majority class, then adding more minority classes into the problem increases my score. I'm not sure what this metric is getting at. On 8
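Both observations in the message above can be illustrated with a small sketch. It assumes the metric is the mean of the one-vs-rest accuracies (TP + TN) / N; the helper names and the toy label lists are hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_per_class_accuracy(y_true, y_pred):
    """Mean of one-vs-rest accuracies (hypothetical helper, not scikit-learn)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = [
        sum((t == c) == (p == c) for t, p in zip(y_true, y_pred)) / len(y_true)
        for c in classes
    ]
    return sum(scores) / len(classes)


# Five majority-class samples (label 0) plus four minority samples that are
# spread over 1, 2, or 4 minority classes. The classifier always predicts 0.
minority_layouts = {
    1: [1, 1, 1, 1],
    2: [1, 1, 2, 2],
    4: [1, 2, 3, 4],
}

results = {}
for n_minority, minority_labels in minority_layouts.items():
    y_true = [0] * 5 + minority_labels
    y_pred = [0] * len(y_true)
    results[n_minority] = (
        accuracy(y_true, y_pred),
        average_per_class_accuracy(y_true, y_pred),
    )

for n_minority, (acc, apca) in results.items():
    print(n_minority, round(acc, 3), round(apca, 3))
```

With one minority class (the binary case) the two scores coincide, because the one-vs-rest accuracy of either class equals the overall accuracy. As the same four samples are split across more minority classes, plain accuracy stays at 5/9 while the averaged score rises.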

Re: [Scikit-learn-general] Implementation of Bag-of-Features

2016-03-08 Thread Guillaume Lemaître
Regarding the MiniBatchKMeans, I use the following parameters:
    MiniBatchKMeans(n_clusters=nb_words, verbose=1, init='random', batch_size=10 * nb_words, compute_labels=False, reassignment_ratio=0.0, random_state=1, n_init=3)
With 1000 words. I am not sure about the batch size as well as the initial

Re: [Scikit-learn-general] Implementation of Bag-of-Features

2016-03-08 Thread Guillaume Lemaître
Sorry, I was wrong. The MiniBatchKMeans converges after 20 minutes. So for one iteration of the CV, I get something like this: Classification performed
    [[21  2  0]
     [ 0 20  0]
     [ 0  0 23]]
It took 1253.23589396 seconds. It is probably not desirable to have a cross-validation. I don't know if you

Re: [Scikit-learn-general] Implementation of Bag-of-Features

2016-03-08 Thread Andreas Mueller
Hey Guillaume. If it is a couple of hours, I'm not sure it is worth adding. You can probably aggressively subsample or just do fewer iterations (like, one pass over the data). How do you run MiniBatchKMeans? Cheers, Andy On 03/08/2016 03:21 PM, Guillaume Lemaître wrote: Hi, I made a pull-requ

Re: [Scikit-learn-general] Implementation of Bag-of-Features

2016-03-08 Thread Guillaume Lemaître
Hi, I made a pull-request with the draft: https://github.com/scikit-learn/scikit-learn/pull/6509 Extracting the features takes a reasonable amount of time (around 30 sec.). The codebook generation through MiniBatchKMeans is more problematic. I am still running it, but it could take a couple of hours.

Re: [Scikit-learn-general] scikit-learn in Julia

2016-03-08 Thread Andreas Mueller
On 03/07/2016 04:47 PM, Cedric St-Jean wrote: > >> There is already Pandas.jl, Stan.jl, MATLAB.jl and Bokeh.jl following > >> that trend. > >That is interesting. Were they done by people associated with the > >original projects? > > As far as I can tell, no, they weren't. Stan.jl and Bokeh.jl are

Re: [Scikit-learn-general] "In-bag" for RandomForest*

2016-03-08 Thread Ariel Rokem
On Mon, Mar 7, 2016 at 8:24 AM, Andreas Mueller wrote: > Hi Ariel. > We are not storing them any more because of memory issues, but you can > recover them using the random state of the tree: > > https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py#L76 > > > indices

[Scikit-learn-general] Average Per-Class Accuracy metric

2016-03-08 Thread Sebastian Raschka
Hi, I was just wondering why there’s no support for the average per-class accuracy in the scorer functions (if I am not overlooking something). E.g., we have 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted' but I didn’t see an 'accuracy_macro', i.e., (acc.class_1 + acc.class_2 + … + acc.class

Re: [Scikit-learn-general] scikit-learn in Julia

2016-03-08 Thread Cedric St-Jean
>> There is already Pandas.jl, Stan.jl, MATLAB.jl and Bokeh.jl following >> that trend. >That is interesting. Were they done by people associated with the >original projects? As far as I can tell, no, they weren't. Stan.jl and Bokeh.jl are now both recognized (but not explicitly supported) by thei

Re: [Scikit-learn-general] [Matplotlib-users] Scipy2016: call for proposals

2016-03-08 Thread Kyle Kastner
I am on the fence still - internship this summer so I need to check on timing/vacation expectation On Mon, Mar 7, 2016 at 3:09 PM, Jacob Vanderplas wrote: > I'm not going to be able to make it this year, unfortunately. > Jake > > Jake VanderPlas > Senior Data Science Fellow > Director of Re

[Scikit-learn-general] release of scikit-image 0.12

2016-03-08 Thread Emmanuelle Gouillart
Announcement: scikit-image 0.12
===============================
The scikit-image team is very pleased to announce the release of version 0.12 of scikit-image. scikit-image is an image processing toolbox for Python and SciPy, that includes algorithms for segmentation, geometric transformations, co