[Scikit-learn-general] specificity calculation

2015-06-15 Thread Herbert Schulz
Hello,

Is there a function to calculate specificity? I know there is the
classification_report function, but it does not report specificity.

best regards,

Herb
--
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


[Scikit-learn-general] suggestions for unequal group training

2015-06-15 Thread Neal Becker
I am interested in supervised learning for classification where I have
multiple classes, but the training data is highly imbalanced. There may be
thousands of training examples for class A, but only hundreds for class B.
What algorithms or approaches would you suggest?




[Scikit-learn-general] normalize with nan values

2015-06-15 Thread William Correa beltran
Hello, 

I would like to know if there is a way to normalize a numpy array using the
preprocessing.normalize function when some of the values are NaN. I would
prefer not to impute them, just to ignore them during normalization.

Thanks in advance, 
William Correa 


Re: [Scikit-learn-general] specificity calculation

2015-06-15 Thread Andreas Mueller

No, there is not.
PR welcome, I think.
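
Since no built-in function is mentioned in this thread, here is a minimal sketch of how specificity (TN / (TN + FP)) could be derived from sklearn.metrics.confusion_matrix for a binary problem. The helper name specificity_score and its negative_label / positive_label arguments are hypothetical, not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def specificity_score(y_true, y_pred, negative_label=0, positive_label=1):
    """Specificity (true negative rate) = TN / (TN + FP) for a binary problem.

    Hypothetical helper, not a scikit-learn function.
    """
    # Fixing the label order makes the matrix layout [[TN, FP], [FN, TP]].
    cm = confusion_matrix(y_true, y_pred, labels=[negative_label, positive_label])
    tn, fp = cm[0, 0], cm[0, 1]
    # With no negatives in y_true the rate is undefined; 0.0 is returned here
    # purely as a convention of this sketch.
    return tn / (tn + fp) if (tn + fp) > 0 else 0.0


y_true = [0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0]
spec = specificity_score(y_true, y_pred)
print(spec)  # 0.75
```

For binary labels, the same number should also come out of recall_score(y_true, y_pred, pos_label=0), because specificity is the recall of the negative class.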


On 06/15/2015 07:35 AM, Herbert Schulz wrote:

Hello,

Is there a function to calculate specificity? I know there is the
classification_report function, but it does not report specificity.


best regards,

Herb




Re: [Scikit-learn-general] normalize with nan values

2015-06-15 Thread Andreas Mueller

Hey.
Not with scikit-learn, but it should take about three lines of numpy to do
it yourself.
I would replace them with 0 when computing the norm; that is all there
is to it, right?


Andy
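
The suggestion above could look roughly like the following numpy-only sketch. It assumes row-wise L2 normalization (the default behaviour of preprocessing.normalize) and keeps NaN entries as NaN in the output; the function name normalize_rows_ignoring_nan is hypothetical.

```python
import numpy as np


def normalize_rows_ignoring_nan(X):
    """Scale each row to unit L2 norm, computing the norm over non-NaN entries only."""
    X = np.asarray(X, dtype=float)
    # nansum treats NaN as 0, so missing values do not contribute to the norm.
    norms = np.sqrt(np.nansum(X ** 2, axis=1, keepdims=True))
    # Avoid division by zero for rows that are all zero or all NaN.
    norms[norms == 0] = 1.0
    # NaN entries stay NaN in the output because NaN / norm is NaN.
    return X / norms


X = np.array([[3.0, np.nan, 4.0],
              [np.nan, np.nan, 2.0]])
X_normed = normalize_rows_ignoring_nan(X)
print(X_normed)
```

If zeros are preferred over NaN in the result, np.nan_to_num can be applied to the output afterwards.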

On 06/15/2015 10:43 AM, William Correa beltran wrote:

Hello,

I would like to know if there is a way to normalize a numpy array
using the preprocessing.normalize function when some of the values are
NaN. I would prefer not to impute them, just to ignore them during
normalization.


Thanks in advance,
William Correa




Re: [Scikit-learn-general] Incrementally Printing GridSearch Results

2015-06-15 Thread Joel Nothman
I think it gets a bit noisier when using n_jobs != 1, as verbose is passed
to joblib.Parallel. I agree that it's not a very controllable or
well-documented setting.
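
As a rough illustration of the verbose setting discussed in this thread, the sketch below runs a small grid search with verbose=3 and then prints only the parameters and mean scores once fitting is done. It assumes a recent scikit-learn release in which GridSearchCV lives in sklearn.model_selection and exposes cv_results_ (both newer than this thread); the dataset, estimator, and parameter grid are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}

# verbose > 0 prints progress while fitting; higher values print more detail.
# As noted above, with n_jobs != 1 the value is also passed to joblib.Parallel,
# which adds its own messages.
search = GridSearchCV(SVC(), param_grid, cv=3, verbose=3, n_jobs=1)
search.fit(X, y)

# Compact summary after fitting: one line per parameter combination.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(score, 3))
```

This does not filter the live output. It only gives a quieter summary once the search has finished, so verbose can be set to 0 if the incremental messages are not needed.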

On 16 June 2015 at 13:24, Adam Goodkind a.goodk...@gmail.com wrote:

 Right. Thank you. I guess I was just overwhelmed by the amount of data
 pouring in.


 On Sun, Jun 14, 2015 at 4:42 PM, Andreas Mueller t3k...@gmail.com wrote:

  Not really. It only outputs parameters and scores, though, right?
 Well, it prints the parameters when it starts a job and after it finishes
 a job.


 On 06/12/2015 06:57 PM, Adam Goodkind wrote:

 Thanks Andy. I see that I have to set verbose to at least 3 to get the
 scores. However, at that level it prints out a lot. Is there a way to
 refine the output to just the parameters and scores?

  Thanks,
 Adam

 On Wed, Jun 10, 2015 at 3:41 PM, Andreas Mueller t3k...@gmail.com
 wrote:

  Yes, set verbose to a nonzero value.


 On 06/10/2015 03:25 PM, Adam Goodkind wrote:

  Is it possible to print the results of a grid search as each iteration
 is completed?

  Thanks,
 Adam

  --
  *Adam Goodkind *
 adamgoodkind.com http://www.adamgoodkind.com
 @adamgreatkind https://twitter.com/#%21/adamgreatkind


  










Re: [Scikit-learn-general] silhouette_score and silhouette_samples

2015-06-15 Thread Sebastian Raschka
Thanks, Joel, it makes total sense now! Updating the docstring sounds like a 
good idea, I will get to it in the next couple of days.

Best,
Sebastian

 On Jun 16, 2015, at 12:34 AM, Joel Nothman joel.noth...@gmail.com wrote:
 
 See the sample_size parameter: silhouette score can be calculated on a random 
 subset of the data, presumably for efficiency. Feel free to submit a PR 
 improving the docstring.
 
 On 16 June 2015 at 13:54, Sebastian Raschka se.rasc...@gmail.com wrote:
 Hi, all,
 
 I am a little bit confused about the two related metrics silhouette_score and 
 silhouette_samples. The silhouette_samples calculates the silhouette 
 coefficient for each sample and returns an array of those. However, I am 
 wondering if I interpret the silhouette_score correctly. Based on the 
 documentation at 
 http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
  I assume that it's just the average of the silhouette coefficients, which 
 can be confirmed by running, e.g.,
 
 np.mean(silhouette_samples(X, y, metric='euclidean'))
 
 Now, I am wondering why silhouette_score has this additional random_state 
 parameter?
 
 Best,
 Sebastian


Re: [Scikit-learn-general] Incrementally Printing GridSearch Results

2015-06-15 Thread Adam Goodkind
Right. Thank you. I guess I was just overwhelmed by the amount of data
pouring in.

On Sun, Jun 14, 2015 at 4:42 PM, Andreas Mueller t3k...@gmail.com wrote:

  Not really. It only outputs parameters and scores, though, right?
 Well, it prints the parameters when it starts a job and after it finishes
 a job.


 On 06/12/2015 06:57 PM, Adam Goodkind wrote:

 Thanks Andy. I see that I have to set verbose to at least 3 to get the
 scores. However, at that level it prints out a lot. Is there a way to
 refine the output to just the parameters and scores?

  Thanks,
 Adam

 On Wed, Jun 10, 2015 at 3:41 PM, Andreas Mueller t3k...@gmail.com wrote:

  Yes, set verbose to a nonzero value.


 On 06/10/2015 03:25 PM, Adam Goodkind wrote:

  Is it possible to print the results of a grid search as each iteration
 is completed?

  Thanks,
 Adam





-- 
*Adam Goodkind *
adamgoodkind.com http://www.adamgoodkind.com
@adamgreatkind https://twitter.com/#!/adamgreatkind


Re: [Scikit-learn-general] silhouette_score and silhouette_samples

2015-06-15 Thread Joel Nothman
See the sample_size parameter: silhouette score can be calculated on a
random subset of the data, presumably for efficiency. Feel free to submit a
PR improving the docstring.
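
A small sketch of the point about sample_size: without it, silhouette_score should match the mean of silhouette_samples; with it, the score is computed on a random subset, which is where random_state comes in. The synthetic data, the KMeans clustering, and the subset size of 100 are assumptions made only for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Without sample_size, the score is the mean of the per-sample coefficients.
full_score = silhouette_score(X, labels, metric="euclidean")
mean_of_samples = np.mean(silhouette_samples(X, labels, metric="euclidean"))
print(np.isclose(full_score, mean_of_samples))

# With sample_size, the score is computed on a random subset of the data;
# fixing random_state makes that subset, and hence the score, reproducible.
sub_a = silhouette_score(X, labels, sample_size=100, random_state=42)
sub_b = silhouette_score(X, labels, sample_size=100, random_state=42)
print(np.isclose(sub_a, sub_b))
```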

On 16 June 2015 at 13:54, Sebastian Raschka se.rasc...@gmail.com wrote:

 Hi, all,

 I am a little bit confused about the two related metrics silhouette_score
 and silhouette_samples. The silhouette_samples calculates the silhouette
 coefficient for each sample and returns an array of those. However, I am
 wondering if I interpret the silhouette_score correctly. Based on the
 documentation at
 http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
 I assume that it's just the average of the silhouette coefficients, which
 can be confirmed by running, e.g.,

 np.mean(silhouette_samples(X, y, metric='euclidean'))

 Now, I am wondering why silhouette_score has this additional random_state
 parameter?

 Best,
 Sebastian



Re: [Scikit-learn-general] normalize with nan values

2015-06-15 Thread Michael Eickenberg
np.nan_to_num replaces NaNs with zeros. If you want to take into account
the fact that you are normalizing over fewer entries, you need to do

normalize(np.nan_to_num(X)) * np.sqrt(np.isnan(X).sum(0) /
float(X.shape[0]))
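
The following sketch spells out one possible reading of the idea above; it is not the exact expression from the message. It assumes row-wise L2 normalization (the default of normalize) and assumes that the intended correction is to scale each row by the square root of its fraction of observed (non-NaN) entries, so the mask and axis used here are this sketch's choices.

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[3.0, np.nan, 4.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

# NaN -> 0, so missing entries contribute nothing to each row's L2 norm.
X_zero_filled = np.nan_to_num(X)
X_unit = normalize(X_zero_filled)  # defaults: norm="l2", axis=1 (per row)

# Assumed rescaling: shrink each row by sqrt(fraction of observed entries),
# so rows with fewer observed values end up with a proportionally smaller norm.
observed_fraction = (~np.isnan(X)).sum(axis=1) / float(X.shape[1])
X_rescaled = X_unit * np.sqrt(observed_fraction)[:, np.newaxis]

print(np.linalg.norm(X_unit, axis=1))
print(np.linalg.norm(X_rescaled, axis=1))
```

Whether such a rescaling is wanted at all depends on the downstream use; plain normalize(np.nan_to_num(X)) already ignores the missing values when computing each norm.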



On Mon, Jun 15, 2015 at 5:28 PM, Andreas Mueller t3k...@gmail.com wrote:

  Hey.
 Not with scikit-learn, but it should take about three lines of numpy to do it
 yourself.
 I would replace them with 0 when computing the norm; that is all there is to it,
 right?

 Andy


 On 06/15/2015 10:43 AM, William Correa beltran wrote:

  Hello,

  I would like to know if there is a way to normalize a numpy array using
 the preprocessing.normalize function when some of the values are NaN.
 I would prefer not to impute them, just to ignore them during
 normalization.

  Thanks in advance,
 William Correa




[Scikit-learn-general] silhouette_score and silhouette_samples

2015-06-15 Thread Sebastian Raschka
Hi, all,

I am a little bit confused about the two related metrics silhouette_score and 
silhouette_samples. The silhouette_samples calculates the silhouette 
coefficient for each sample and returns an array of those. However, I am 
wondering if I interpret the silhouette_score correctly. Based on the 
documentation at 
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
 I assume that it's just the average of the silhouette coefficients, which can 
be confirmed by running, e.g., 

np.mean(silhouette_samples(X, y, metric='euclidean'))

Now, I am wondering why silhouette_score has this additional random_state 
parameter?

Best,
Sebastian