It turns out that the weight was too large (with a mean around 5000 and a
standard deviation around 8000) and caused overflow. After scaling the weight
down to, for example, numbers between 0 and 1, the code converged nicely.
Spark did not report the overflow issue. We actually found it out by run

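
For anyone hitting the same thing, the rescaling could look roughly like the sketch below. It is not the exact code we ran: the helper name `rescale_weight` and the column names `weight` and `weight_scaled` are made up for illustration. It assumes the weight column is numeric, non-negative and non-empty, and it divides by the maximum (rather than min-max scaling) so that the relative proportions between rows are preserved.

```python
def rescale_weight(df, weight_col="weight", out_col="weight_scaled"):
    """Return `df` with `out_col` = `weight_col` divided by its maximum.

    Hypothetical helper: assumes a non-empty, non-negative numeric column,
    so the result falls between 0 and 1.
    """
    # Imported lazily so the helper can be defined without a Spark session.
    from pyspark.sql import functions as F

    # One pass over the data to find the largest weight.
    max_weight = df.agg(F.max(weight_col)).first()[0]

    if max_weight is None or max_weight == 0:
        # Empty column or all-zero weights: nothing sensible to divide by,
        # so copy the column through unchanged.
        return df.withColumn(out_col, F.col(weight_col))

    # Dividing by the maximum keeps the ratios between rows intact.
    return df.withColumn(out_col, F.col(weight_col) / F.lit(max_weight))
```

One would then point the estimator at the scaled column (for example via `weightCol="weight_scaled"`, if the weight is a sample weight) instead of the raw one.
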
The Logistic Regression (LR) offered by Spark has rather limited model
statistics output. I would like to have access to the q-value, AIC, standard
errors, etc. Generalized Linear Regression (GLR) does offer these statistics
in the model output, and can be used as LR if one specifies
family="binomial".
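
As a rough sketch of what that looks like in PySpark (the function name `fit_binomial_glr` and the column names `features` and `label` are placeholders, not from our actual job): fit a `GeneralizedLinearRegression` with `family="binomial"` and read the statistics off `model.summary`. As far as I can tell the summary exposes p-values (`pValues`), standard errors and AIC; I did not see a separate q-value field, so any multiple-testing adjustment would have to be computed from the p-values.

```python
def fit_binomial_glr(train_df, features_col="features", label_col="label"):
    """Fit a binomial GLR (logistic regression) and collect summary statistics.

    Hypothetical helper: assumes `train_df` already has an assembled vector
    column `features_col` and a 0/1 column `label_col`.
    """
    # Imported lazily so the helper can be defined without a Spark session.
    from pyspark.ml.regression import GeneralizedLinearRegression

    glr = GeneralizedLinearRegression(
        family="binomial",   # makes GLR behave as logistic regression
        link="logit",        # canonical link for the binomial family
        featuresCol=features_col,
        labelCol=label_col,
    )
    model = glr.fit(train_df)
    summary = model.summary

    # With fitIntercept=True (the default), the last entry of the standard
    # errors and p-values belongs to the intercept.
    return {
        "coefficients": model.coefficients,
        "intercept": model.intercept,
        "std_errors": summary.coefficientStandardErrors,
        "p_values": summary.pValues,
        "aic": summary.aic,
    }
```
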