If you can examine your data matrix and see that fewer than roughly 1/6 of the values are non-zero (so more than 5/6 are zeros), then it's probably worth using sparse vectors. (1/6 is only a rough estimate.)
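To make the rule of thumb concrete, here is a minimal plain-Python sketch for measuring how dense a feature matrix is. The helper names (`density`, `to_sparse_parts`, `prefer_sparse`) and the sample rows are made up for illustration, and the 1/6 cutoff is just the rough estimate above, not a tuned value.

```python
# Hypothetical helpers for illustration; none of these are part of MLlib.

SPARSE_THRESHOLD = 1.0 / 6.0  # rough cutoff from the estimate above, not a hard rule


def density(row):
    """Fraction of entries in `row` that are non-zero."""
    if not row:
        return 0.0
    return sum(1 for v in row if v != 0.0) / len(row)


def to_sparse_parts(row):
    """Return (size, indices, values), keeping only the non-zero entries."""
    indices = [i for i, v in enumerate(row) if v != 0.0]
    values = [row[i] for i in indices]
    return len(row), indices, values


def prefer_sparse(rows, threshold=SPARSE_THRESHOLD):
    """True when the overall fraction of non-zeros is below the threshold."""
    total = sum(len(r) for r in rows)
    nonzero = sum(1 for r in rows for v in r if v != 0.0)
    return total > 0 and nonzero / total < threshold


# Made-up similarity scores: 2 non-zeros out of 16 entries.
rows = [
    [0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0],
]

if prefer_sparse(rows):
    size, indices, values = to_sparse_parts(rows[0])
    print(size, indices, values)
```

The (size, indices, values) triple has the same shape as the arguments to `Vectors.sparse(size, indices, values)` in `pyspark.mllib.linalg`, so the output of `to_sparse_parts` could be passed along to build an MLlib sparse vector.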
Both L1 and L2 regularization are supported. See the guide here:
http://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
and the API docs linked from the menu.

On Fri, Apr 3, 2015 at 1:24 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:
> Hi All
> I am building a logistic regression for matching person data. Let's say
> two person objects are given with their attributes and we need to find the score.
> That means on one side you have 10 million records and on the other side we have 1
> record; we need to tell which one matches with the highest score among 1 million.
>
> I am storing the scores of the similarity algos in a dense matrix and considering
> these as features. I will apply many similarity algos on one attribute.
>
> Should I use sparse or dense? What happens in dense when a score is null or
> when some of the attributes are missing?
>
> Is there any support for regularized logistic regression? Currently I am
> using LogisticRegressionWithSGD.
>
> Regards
> jeetendra
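
On the regularization question, the difference between the two penalties can be sketched in plain Python. This is an illustration of the standard update rules only, not MLlib's code; the function names and numbers are made up. As far as I know, the Python `LogisticRegressionWithSGD.train` takes `regParam` and `regType` arguments to select this, but check the API docs for your Spark version.

```python
# Illustrative single SGD steps; hypothetical names, not MLlib's implementation.

def l2_step(weights, gradient, step_size, reg_param):
    """L2 penalty: every weight is shrunk proportionally, then moved by the gradient."""
    return [w * (1.0 - step_size * reg_param) - step_size * g
            for w, g in zip(weights, gradient)]


def l1_step(weights, gradient, step_size, reg_param):
    """L1 penalty via soft thresholding: small weights are set exactly to zero."""
    shrink = step_size * reg_param
    updated = []
    for w, g in zip(weights, gradient):
        w = w - step_size * g          # plain gradient step first
        if w > shrink:
            updated.append(w - shrink)
        elif w < -shrink:
            updated.append(w + shrink)
        else:
            updated.append(0.0)        # inside the threshold band: zeroed out
    return updated


weights = [1.0, 0.05, -0.5]
zero_gradient = [0.0, 0.0, 0.0]        # isolates the effect of the penalty

print(l2_step(weights, zero_gradient, step_size=1.0, reg_param=0.1))
print(l1_step(weights, zero_gradient, step_size=1.0, reg_param=0.1))
```

With a zero gradient, the L2 step scales every weight down slightly, while the L1 step drives the small middle weight to exactly zero. That is why L1 tends to give sparse models, which may help when many of the similarity features turn out to be uninformative.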