[Scikit-learn-general] specificity calculation
Hello, is there a function to calculate the specificity? I know there is the classification_report function, but it does not include specificity. Best regards, Herb -- ___ Scikit-learn-general mailing list Scikit-learn-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
[Scikit-learn-general] suggestions for unequal group training
I am interested in supervised learning for classification with multiple classes, but the training data is highly imbalanced: there may be thousands of training examples for class A but only hundreds for class B. What algorithms or approaches would you suggest?
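No reply to this question appears in the thread. One common starting point, offered here as an assumption rather than as advice from the list, is to reweight classes inversely to their frequency: many scikit-learn classifiers (for example `LogisticRegression`, `SVC` and `RandomForestClassifier`) accept `class_weight="balanced"`, which weights each class by n_samples / (n_classes * class_count). The three-class data below is synthetic and made up for illustration; balanced accuracy is used because plain accuracy can look good while ignoring the rare classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, made-up data with three classes in roughly an 85/10/5 split.
X, y = make_classification(n_samples=3000, n_features=10, n_informative=4,
                           n_classes=3, weights=[0.85, 0.10, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# The per-class weights that class_weight="balanced" uses:
# n_samples / (n_classes * count_of_each_class), so rarer classes weigh more.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
print(dict(zip(classes, weights)))

clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

# Balanced accuracy (the mean of per-class recall) is less dominated by the
# majority class than plain accuracy.
score = balanced_accuracy_score(y_test, clf.predict(X_test))
print(score)
```

Resampling the training data is another option, but reweighting keeps every sample and needs no extra dependency.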
[Scikit-learn-general] normalize with nan values
Hello, I would like to know if there is a way to normalize a numpy array using the preprocessing.normalize function when some of the values are NaN. I would prefer not to impute them, just to ignore them during normalization. Thanks in advance, William Correa
Re: [Scikit-learn-general] specificity calculation
No, there is not. PR welcome, I think.

On 06/15/2015 07:35 AM, Herbert Schulz wrote:
> Hello, is there a function to calculate the specificity?
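Until such a function exists, specificity can be derived from the confusion matrix. The sketch below uses a hypothetical helper, `specificity_per_class` (not a scikit-learn API), that computes the one-vs-rest specificity TN / (TN + FP) for each class. For a binary problem, the specificity of the positive class equals the recall of the negative class, so `recall_score(y_true, y_pred, pos_label=0)` gives the same number.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score


def specificity_per_class(y_true, y_pred, labels=None):
    """Hypothetical helper: one-vs-rest specificity TN / (TN + FP) per class."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, cols: predicted
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but actually another class
    fn = cm.sum(axis=1) - tp  # actually the class, but predicted as another class
    tn = total - tp - fp - fn
    return tn / (tn + fp)


y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0]
spec = specificity_per_class(y_true, y_pred)
print(spec)  # approximately [0.667, 0.8]

# Binary check: specificity of class 1 equals recall of class 0.
print(recall_score(y_true, y_pred, pos_label=0))  # 0.8
```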
Re: [Scikit-learn-general] normalize with nan values
Hey. Not with scikit-learn, but it should be about three lines of numpy to do it yourself. I would replace them with 0 when computing the norm; that is all there is to it, right? Andy

On 06/15/2015 10:43 AM, William Correa beltran wrote:
> Hello, I would like to know if there is a way to normalize a numpy array using the preprocessing.normalize function when some of the values are of type nan.
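A few lines of numpy along the lines Andy describes: treat NaNs as 0 only when computing each row's L2 norm, then divide, so the observed entries are scaled while the NaN positions stay NaN (nothing is imputed). This sketch assumes row-wise L2 normalization, which is what `preprocessing.normalize` does by default; the helper name `normalize_ignoring_nan` is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import normalize


def normalize_ignoring_nan(X):
    """Hypothetical helper: row-wise L2 normalization that ignores NaN entries."""
    X = np.asarray(X, dtype=float)
    # Treat NaNs as 0 when computing each row's norm (only observed entries count).
    norms = np.sqrt(np.nansum(X ** 2, axis=1, keepdims=True))
    # Leave rows that are all zero or all NaN unscaled instead of dividing by 0.
    norms[norms == 0] = 1.0
    # NaN / norm is still NaN, so missing values stay missing rather than imputed.
    return X / norms


X = np.array([[3.0, np.nan, 4.0],
              [1.0, 2.0, 2.0]])
Xn = normalize_ignoring_nan(X)
print(Xn)  # row 0 -> [0.6, nan, 0.8]; row 1 -> [1/3, 2/3, 2/3]

# Apart from the NaN positions, this matches normalize() after replacing NaNs with 0.
matches_sklearn = np.allclose(np.nan_to_num(Xn), normalize(np.nan_to_num(X)))
```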
Re: [Scikit-learn-general] Incrementally Printing GridSearch Results
I think it gets a bit noisier when using n_jobs != 1, as verbose is passed to joblib.Parallel. I agree that it's not a very controllable or well-documented setting.

On 16 June 2015 at 13:24, Adam Goodkind a.goodk...@gmail.com wrote:
> Right. Thank you. I guess I was just overwhelmed by the amount of data pouring in.
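Since `verbose` is hard to tune, one workaround (an assumption, not a documented scikit-learn feature for this purpose) is to pass a callable as `scoring`. GridSearchCV calls it as `scorer(estimator, X, y)` on each held-out fold, so it can print only the searched parameters and the fold score as each fit finishes. With `n_jobs=1` the prints come from the main process in order; with parallel workers they may be interleaved or not shown. The helper `make_printing_scorer` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def make_printing_scorer(param_names):
    """Hypothetical helper: a scorer that prints the searched parameters and the fold score."""
    def scorer(estimator, X, y):
        score = estimator.score(X, y)  # mean accuracy for classifiers
        params = {name: estimator.get_params()[name] for name in param_names}
        print(params, "score=%.3f" % score)
        return score
    return scorer


param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    SVC(),
    param_grid,
    scoring=make_printing_scorer(list(param_grid)),
    cv=3,
    n_jobs=1,  # keep scoring in this process so the prints appear in order
)
search.fit(X, y)  # prints one line per (parameter combination, fold)
```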
Re: [Scikit-learn-general] silhouette_score and silhouette_samples
Thanks, Joel, it makes total sense now! Updating the docstring sounds like a good idea; I will get to it in the next couple of days. Best, Sebastian

On Jun 16, 2015, at 12:34 AM, Joel Nothman joel.noth...@gmail.com wrote:
> See the sample_size parameter: silhouette score can be calculated on a random subset of the data, presumably for efficiency. Feel free to submit a PR improving the docstring.
Re: [Scikit-learn-general] Incrementally Printing GridSearch Results
Right. Thank you. I guess I was just overwhelmed by the amount of data pouring in.

-- *Adam Goodkind * adamgoodkind.com http://www.adamgoodkind.com @adamgreatkind https://twitter.com/#!/adamgreatkind

On Sun, Jun 14, 2015 at 4:42 PM, Andreas Mueller t3k...@gmail.com wrote:
> Not really. It only outputs parameters and scores, though, right? Well, it prints the parameters when it starts a job and after it finishes a job.
>
> On 06/12/2015 06:57 PM, Adam Goodkind wrote:
>> Thanks, Andy. I see that I have to set verbose to at least 3 to get the scores. However, at that level it prints out a lot. Is there a way to refine the output to just the parameters and scores? Thanks, Adam
>>
>> On Wed, Jun 10, 2015 at 3:41 PM, Andreas Mueller t3k...@gmail.com wrote:
>>> Yes, set verbose to a nonzero value.
>>>
>>> On 06/10/2015 03:25 PM, Adam Goodkind wrote:
>>>> Is it possible to print the results of a grid search as each iteration is completed? Thanks, Adam
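For the narrower request of seeing only parameters and scores, a quieter option (an assumption, not something suggested in the thread) is to leave `verbose` at 0 and print a summary once the search finishes. In current scikit-learn versions this information lives in the `cv_results_` dictionary, which replaced the older `grid_scores_` attribute. Note that this prints at the end rather than as each fit completes; the iris data and grid are only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}, cv=3, verbose=0)
search.fit(X, y)

results = search.cv_results_
# One line per parameter combination: mean and std of the cross-validated score.
for params, mean, std in zip(results["params"],
                             results["mean_test_score"],
                             results["std_test_score"]):
    print("%.3f (+/- %.3f) for %r" % (mean, std, params))
```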
Re: [Scikit-learn-general] silhouette_score and silhouette_samples
See the sample_size parameter: silhouette score can be calculated on a random subset of the data, presumably for efficiency. Feel free to submit a PR improving the docstring.

On 16 June 2015 at 13:54, Sebastian Raschka se.rasc...@gmail.com wrote:
> Now, I am wondering why silhouette_score has this additional random_state parameter?
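The `sample_size` behaviour described in this reply can be checked directly: with `sample_size=None` (the default) `silhouette_score` uses every sample, and with an integer it scores a random subset of that size, which is the only case where `random_state` matters. The blob data and k-means labels below are synthetic and chosen only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: 1000 points in 3 blobs, clustered with k-means.
X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# sample_size=None (default): every sample is used, random_state is irrelevant.
full = silhouette_score(X, labels)

# sample_size=200: the score is computed on a random subset of 200 samples,
# so random_state controls which subset is drawn.
sub_a = silhouette_score(X, labels, sample_size=200, random_state=42)
sub_b = silhouette_score(X, labels, sample_size=200, random_state=42)  # same subset
sub_c = silhouette_score(X, labels, sample_size=200, random_state=7)   # different subset
print(full, sub_a, sub_c)
```

Subsampling trades a little accuracy for avoiding the full n-by-n distance computation on large datasets.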
Re: [Scikit-learn-general] normalize with nan values
np.nan_to_num replaces NaNs with zeros. If you want to take into account the fact that each row is normalized over fewer entries, you can rescale each row by the square root of its fraction of non-NaN entries (with the default row-wise normalization, axis=1):

    normalize(np.nan_to_num(X)) * np.sqrt((~np.isnan(X)).sum(axis=1, keepdims=True) / float(X.shape[1]))

On Mon, Jun 15, 2015 at 5:28 PM, Andreas Mueller t3k...@gmail.com wrote:
> I would replace them with 0 for computing the norm, that is all there is, right?
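A self-contained sketch of the rescaling idea, assuming the default row-wise L2 normalization. Each row is normalized with NaNs set to 0 and then multiplied by sqrt(k / n), where k is the number of non-NaN entries in that row and n the number of features. The effect is that the mean squared value of a row's observed entries is 1/n, the same as for a complete unit-norm row. The helper name `normalize_nan_rescaled` is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import normalize


def normalize_nan_rescaled(X):
    """Hypothetical helper: L2-normalize rows with NaNs set to 0, then rescale
    each row by sqrt(fraction of observed, i.e. non-NaN, entries)."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    # Fraction of non-NaN entries per row, kept as a column so it broadcasts per row.
    frac_observed = observed.sum(axis=1, keepdims=True) / X.shape[1]
    return normalize(np.nan_to_num(X)) * np.sqrt(frac_observed)


X = np.array([[3.0, np.nan, 4.0, np.nan],
              [1.0, 1.0, 1.0, 1.0]])
Xr = normalize_nan_rescaled(X)
print(Xr)
# Row 0: [0.6, 0, 0.8, 0] scaled by sqrt(2/4); row 1 is complete, so it is unchanged.
```

Unlike dividing by a NaN-aware norm, this returns 0 at the missing positions, because `np.nan_to_num` is applied before normalizing.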
[Scikit-learn-general] silhouette_score and silhouette_samples
Hi, all, I am a little bit confused about the two related metrics silhouette_score and silhouette_samples. silhouette_samples calculates the silhouette coefficient for each sample and returns an array of them. However, I am wondering if I interpret silhouette_score correctly. Based on the documentation at http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html I assume that it's just the average of the silhouette coefficients, which can be confirmed by running, e.g., np.mean(silhouette_samples(X, y, metric='euclidean')) Now, I am wondering why silhouette_score has this additional random_state parameter? Best, Sebastian
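The check described in this message can be written out as follows: with the default `sample_size=None`, `silhouette_score` should equal the mean of the per-sample coefficients from `silhouette_samples`. The blob data and k-means labels are synthetic and chosen only for illustration; `y` holds cluster labels, as in the snippet above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data: 300 points in 3 blobs; y holds the k-means cluster labels.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

per_sample = silhouette_samples(X, y, metric="euclidean")  # one coefficient per sample
overall = silhouette_score(X, y, metric="euclidean")       # sample_size=None: all samples
print(per_sample.shape, overall, np.mean(per_sample))
```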