Are you sure this is an apples-to-apples comparison? For example, does your SAS process normalize or otherwise transform the data first?
Is the optimization configured similarly in both cases -- same regularization, etc.? Are you sure you are pulling out the intercept correctly? It is a separate value from the weights in Spark's logistic regression model.

On Thu, Dec 18, 2014 at 4:34 PM, Franco Barrientos <franco.barrien...@exalitica.com> wrote:

> Hi all!,
>
> I have a problem with LogisticRegressionWithSGD. When I train a data set
> with one variable (which is the amount of an item) and an intercept, I get
> weights of (-0.4021, -207.1749) for the two features, respectively. This
> doesn't make sense to me, because when I run a logistic regression on the
> same data in SAS I get the weights (-2.6604, 0.000245).
>
> The range of this variable is from 0 to 59102, with a mean of 1158.
>
> The problem comes when I want to calculate the probability for each user
> in the data set: the probability is zero or near zero in many cases,
> because when Spark calculates exp(-1*(-0.4021+(-207.1749)*amount)) the
> result is a huge number, in fact infinity for Spark.
>
> How should I treat this variable? And why did this happen?
>
> Thanks,
>
> Franco Barrientos
> Data Scientist
>
> Málaga #115, Of. 1003, Las Condes.
> Santiago, Chile.
> (+562)-29699649
> (+569)-76347893
>
> franco.barrien...@exalitica.com
> www.exalitica.com
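
For reference, a minimal sketch of the two things raised above -- standardizing the feature and checking the intercept/regularization configuration -- assuming the Spark 1.x MLlib Scala API; the helper trainScaled and the RDD name "data" are illustrative, not from the thread:

  import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
  import org.apache.spark.mllib.feature.StandardScaler
  import org.apache.spark.mllib.regression.LabeledPoint
  import org.apache.spark.rdd.RDD

  // data: RDD[LabeledPoint] with the single "amount" feature (range 0..59102).
  def trainScaled(data: RDD[LabeledPoint]) = {
    // Standardize the feature: with a raw scale of ~1e4, unscaled SGD steps
    // overshoot badly, which is consistent with the huge weight (-207.1749)
    // reported in the thread.
    val scaler = new StandardScaler(withMean = true, withStd = true)
      .fit(data.map(_.features))
    val scaled = data
      .map(p => LabeledPoint(p.label, scaler.transform(p.features)))
      .cache()

    val lr = new LogisticRegressionWithSGD()
    lr.setIntercept(true) // the intercept is fitted and stored separately
    lr.optimizer
      .setNumIterations(200)
      .setStepSize(1.0)
      .setRegParam(0.0) // no regularization, to match an unregularized SAS fit

    val model = lr.run(scaled)
    // Note: the intercept is NOT an element of model.weights in MLlib.
    println(s"weights = ${model.weights}, intercept = ${model.intercept}")

    model.clearThreshold() // predict() then returns probabilities, not 0/1
    model
  }

On the overflow itself: a double overflows just past exp(709), so with the reported weights the exponent 0.4021 + 207.1749*amount exceeds that threshold for any amount above roughly 3.4, which is why nearly every predicted probability collapses to zero. Rescaling the feature (and un-scaling the coefficients afterward, if needed for interpretation) avoids this.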