[ https://issues.apache.org/jira/browse/SPARK-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang resolved SPARK-18501.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

> SparkR spark.glm error on collinear data
> -----------------------------------------
>
>                 Key: SPARK-18501
>                 URL: https://issues.apache.org/jira/browse/SPARK-18501
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, SparkR
>            Reporter: Yanbo Liang
>            Assignee: Yanbo Liang
>             Fix For: 2.1.0
>
>
> Spark {{GeneralizedLinearRegression}} can handle collinear data, since the
> underlying {{WeightedLeastSquares}} can be solved by local "l-bfgs" (rather
> than "normal"). But the SparkR wrapper {{spark.glm}} throws an error when
> fitting on collinear data:
> {code}
> > df <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
> > model <- spark.glm(df, label ~ features, family = binomial(link = "logit"))
> > summary(model)
> Error in `rownames<-`(`*tmp*`, value = c("(Intercept)", "features_0", :
>   length of 'dimnames' [1] not equal to array extent
> {code}
> After studying this error in depth, I found that it occurs because the
> standard errors of the coefficients, the t values and the p values are not
> available when the underlying {{WeightedLeastSquares}} is solved by local
> "l-bfgs". As a result, building the coefficients matrix fails.
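
The `rownames<-` error quoted in the description can be reproduced in base R without Spark. The sketch below assumes that the summary code always reshapes the returned values into a four-column matrix (estimate, standard error, t value, p value) and then assigns one row name per feature. That reshaping step is an assumption about the wrapper, not something the issue text states. The feature names and numbers are made up for illustration, and the guarded variant at the end is only one possible way to avoid the failure, not necessarily the fix that went into 2.1.0.

```r
# Hypothetical stand-ins for the values returned by the fitted model;
# none of these numbers come from the issue.
features  <- c("(Intercept)", "features_0", "features_1", "features_2")
estimates <- c(0.5, 1.2, -0.3, 0.8)

# Assumption: the values are always reshaped into four columns.
# With only the estimates available, 4 values become a 1 x 4 matrix
# instead of the expected 4 x 4 matrix.
coefs <- matrix(estimates, ncol = 4)

# Assigning 4 row names to a 1-row matrix fails; capture the message.
msg <- tryCatch({
  rownames(coefs) <- features
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Guarded variant: fall back to a single "Estimate" column when the
# standard errors, t values and p values are missing.
has_stats <- length(estimates) == 4 * length(features)
coefs_ok  <- matrix(estimates, ncol = if (has_stats) 4 else 1)
rownames(coefs_ok) <- features
print(dim(coefs_ok))
```

The first assignment fails because the number of row names no longer matches the number of rows. The guarded variant picks the column count from the amount of data actually available, so the row names always line up.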