Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Michael Eickenberg
Here is a possibly useful comment by larsmans on Stack Overflow about exactly this procedure: http://stackoverflow.com/questions/26604175/how-to-predict-a-continuous-dependent-variable-that-expresses-target-class-proba/26614131#comment41846816_26614131
On Mon, Oct 10, 2016 at 4:04 PM, Sean Violant

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Sean Violante
Sorry, yes, there was a misunderstanding. I meant that for each feature configuration you should pass in two rows (one for the positive cases and one for the negative), with the sample weight being the corresponding count for that configuration and class. And I am saying that the total count is important
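A minimal sketch of the two-rows-per-configuration layout described above, assuming NumPy and scikit-learn are installed. The feature values and counts are invented for illustration, and the large `C` is an assumption (not something stated in the thread) to weaken the default L2 penalty.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical aggregated data: one feature configuration per row,
# with the number of positive and negative cases observed for it.
features = np.array([[0.0], [1.0], [2.0], [3.0]])
n_pos = np.array([2, 5, 12, 18])
n_neg = np.array([18, 15, 8, 2])

# Two rows per configuration: one labelled 1, one labelled 0.
X = np.vstack([features, features])
y = np.concatenate([
    np.ones(len(features), dtype=int),
    np.zeros(len(features), dtype=int),
])

# The sample weight is the count for that configuration and class,
# so the total count per configuration is preserved.
sample_weight = np.concatenate([n_pos, n_neg]).astype(float)

# Large C weakens the default L2 regularisation (an assumption).
model = LogisticRegression(C=1e6)
model.fit(X, y, sample_weight=sample_weight)

# Predicted proportion of positives for each configuration.
predicted = model.predict_proba(features)[:, 1]
```

Here `y` stays a plain array of 0s and 1s, and the counts enter only through `sample_weight`.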

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
On 10 October 2016 at 12:22, Sean Violante wrote:
> no (but please check!)
>
> sample weights should be the counts for the respective label (0/1)
>
> [I am actually puzzled about the glm help file - proportions loses how
> often an input data 'row' was present relative to the other - though you

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Sean Violante
No (but please check!). Sample weights should be the counts for the respective label (0/1). [I am actually puzzled about the glm help file: using proportions loses how often an input data 'row' was present relative to the others, though you could do this by repeating the row 'n' times.]
On Mon, Oct 1
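The remark that counts as sample weights can be reproduced by repeating each row 'n' times can be checked numerically. The sketch below uses only NumPy and a hand-written logistic negative log-likelihood; the data, the coefficient vector `beta`, and the helper name `weighted_nll` are all hypothetical.

```python
import numpy as np

def weighted_nll(beta, X, y, w):
    """Weighted negative log-likelihood of a logistic model.

    The first column of X is a column of ones acting as the intercept.
    """
    z = X @ beta
    # log(1 + exp(z)) - y * z is the negative log-likelihood of one row.
    return np.sum(w * (np.logaddexp(0.0, z) - y * z))

# Two feature configurations, each with a positive and a negative row.
X = np.array([[1.0, 0.5], [1.0, 0.5], [1.0, 2.0], [1.0, 2.0]])
y = np.array([1, 0, 1, 0])
counts = np.array([3, 7, 6, 4])  # how often each (configuration, label) occurred

# Expanded data set: each row repeated as often as its count.
X_rep = np.repeat(X, counts, axis=0)
y_rep = np.repeat(y, counts)

beta = np.array([-0.3, 0.8])  # arbitrary coefficients for the comparison
nll_weighted = weighted_nll(beta, X, y, counts)
nll_repeated = weighted_nll(beta, X_rep, y_rep, np.ones(len(y_rep)))
```

Because the two objectives agree for any `beta`, minimising either gives the same fit. This only illustrates the likelihood; it says nothing about how a particular solver or regulariser treats the weights.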

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
How do I use sample_weight for my use case? In my case, is "y" an array of 0s and 1s, and sample_weight then an array of real numbers between 0 and 1, where I should make sure to set sample_weight[i] = 0 when y[i] = 0?
Raphael
On 10 October 2016 at 12:08, Sean Violante wrote:
> should be the sample we

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Sean Violante
It should be the sample_weight argument of fit: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
On Mon, Oct 10, 2016 at 1:03 PM, Raphael C wrote:
> I just noticed this about the glm package in R.
> http://stats.stackexchange.com/a/26779/53128
>
> "
> Th

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
I just noticed this about the glm function in R: http://stats.stackexchange.com/a/26779/53128
"The glm function in R allows 3 ways to specify the formula for a logistic regression model. The most common is that each row of the data frame represents a single observation and the response variable i

[scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
I am trying to perform regression where my dependent variable is constrained to be between 0 and 1. This constraint comes from the fact that it represents a count proportion, that is, counts in some category divided by a total count. In the literature it seems that one common way to tackle this is