Hi Team,

I am using multinomial regression in Spark Scala. I want to generate the coefficients and p-values for every category.
For example, given two variables, salary group (dependent variable) and age group (independent variable):

salary-group: 10,000-, 10,000-100,000, 100,000+
age-group: 30-, 30-40, 40+

I am looking to get output like the following: with 10,000- as the baseline, the coefficients and p-values for each category in the salary group.

10,000-100,000
                      coefficient   p-value
  Intercept           ..            ..
  age group 30-40     ..            ..
  age group 40+       ..            ..
  age group 30-       0             0

100,000+
                      coefficient   p-value
  Intercept           ..            ..
  age group 30-40     ..            ..
  age group 40+       ..            ..
  age group 30-       0             0

To do this, I am forced to run a GLM with the binomial family twice. To parallelize the two fits, I am using thread pools, which doesn't seem ideal.

Do you think there is a way to do multinomial logit in Spark Scala? I do see it in SparkR:
https://rdrr.io/cran/SparkR/man/spark.logit.html

Is there a Spark way to run the GLMs in parallel? Something like:

    // fit one binomial GLM per DataFrame
    def glm(df: DataFrame): SparkLogisticRegressionResult = ???
    val dfs: Seq[DataFrame] = ???
    dfs.map(glm)

Thanks a lot for the help!

Regards,
Surya
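To make the `dfs.map(glm)` idea above concrete, here is a minimal sketch of what I have in mind, not a definitive implementation. It assumes each DataFrame already has a 0/1 `label` column and a `features` vector column (for example, one-hot encoded age group with the reference level dropped). `GlmResult` and `fitAll` are hypothetical names I made up for illustration; `GeneralizedLinearRegression` and its training summary are the existing Spark ML classes.

```scala
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.DataFrame
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical holder for what I want back from each fit.
final case class GlmResult(
  coefficients: Array[Double],
  intercept: Double,
  pValues: Array[Double] // with fitIntercept = true, the last entry is the intercept's p-value
)

// Fit one binomial (logit) GLM.
// Assumption: df has a 0/1 "label" column and a "features" vector column.
def glm(df: DataFrame): GlmResult = {
  val model = new GeneralizedLinearRegression()
    .setFamily("binomial")
    .setLink("logit")
    .setLabelCol("label")
    .setFeaturesCol("features")
    .fit(df)
  // p-values come from the training summary (default IRLS solver).
  GlmResult(model.coefficients.toArray, model.intercept, model.summary.pValues)
}

// Run one fit per input on separate driver threads and wait for all of them.
// Results come back in the same order as the inputs.
def fitAll[A, B](inputs: Seq[A])(fit: A => B): Seq[B] = {
  val futures = inputs.map(in => Future(fit(in)))
  Await.result(Future.sequence(futures), Duration.Inf)
}
```

Usage would be `fitAll(dfs)(glm)`. This is still driver-side threading, so it is essentially the thread-pool approach I described; it relies on Spark running jobs submitted from separate threads concurrently within one application. I also see that the Scala `LogisticRegression` accepts `setFamily("multinomial")`, but as far as I can tell its summary does not report p-values, which is why this sketch stays with the binomial GLM.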