Hi Team,

I am using multinomial regression in Spark with Scala. I want to generate the
coefficients and p-values for every category.

For example, given two variables, salary group (dependent variable) and age
group (independent variable):

salary-group: 10,000-, 10,000-100,000, 100,000+
age-group: 30-, 30-40, 40+

I am looking for output like the following. With 10,000- as the baseline, I
want the coefficients and p-values for each remaining category in the salary
group:

10,000-100,000
              coefficient   p-value
Intercept     ..            ..
age group
  30-40       ..            ..
  40+         ..            ..
  30-         0             0

100,000+
              coefficient   p-value
Intercept     ..            ..
age group
  30-40       ..            ..
  40+         ..            ..
  30-         0             0

To do this, I am forced to fit a GLM with the binomial family twice, once for
each non-baseline salary category. To parallelize the two fits, I am using
thread pools, which doesn't seem ideal.
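
For illustration only, here is a minimal sketch of that two-GLM workaround. It is not the actual job: the column names (`salaryGroup`, `ageGroup`), the `GlmResult` case class, the helpers `fitOneVsBaseline` and `fitAll`, and the pool size of 2 are all assumptions. It relies on Spark accepting jobs submitted from several threads of the same application, and on `GeneralizedLinearRegression` exposing p-values through its training summary.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration.Duration

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical result holder. pValues follows Spark's ordering:
// one entry per feature, with the intercept's p-value last.
case class GlmResult(category: String, intercept: Double,
                     coefficients: Array[Double], pValues: Array[Double])

// Fit one binomial GLM: `category` versus `baseline`, with age group 30- as reference.
def fitOneVsBaseline(df: DataFrame, baseline: String, category: String): GlmResult = {
  val subset = df
    .filter(col("salaryGroup").isin(baseline, category))
    .withColumn("label", (col("salaryGroup") === category).cast("double"))
    // Dummy columns for the two non-reference age groups (assumed level names).
    .withColumn("age30to40", (col("ageGroup") === "30-40").cast("double"))
    .withColumn("age40plus", (col("ageGroup") === "40+").cast("double"))

  val assembled = new VectorAssembler()
    .setInputCols(Array("age30to40", "age40plus"))
    .setOutputCol("features")
    .transform(subset)

  val model = new GeneralizedLinearRegression()
    .setFamily("binomial")
    .setLink("logit")
    .setLabelCol("label")
    .setFeaturesCol("features")
    .fit(assembled)

  GlmResult(category, model.intercept, model.coefficients.toArray, model.summary.pValues)
}

// Run both fits concurrently on a small dedicated thread pool.
def fitAll(df: DataFrame): Seq[GlmResult] = {
  implicit val ec: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))
  try {
    val futures = Seq("10,000-100,000", "100,000+").map { c =>
      Future(fitOneVsBaseline(df, "10,000-", c))
    }
    Await.result(Future.sequence(futures), Duration.Inf)
  } finally ec.shutdown()
}
```

Calling `fitAll(df)` on a DataFrame with those two string columns would return one `GlmResult` per non-baseline salary category. Caching `df` beforehand may avoid reading the input twice.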

Is there a way to fit a multinomial logit in Spark with Scala? I do see it in
SparkR: https://rdrr.io/cran/SparkR/man/spark.logit.html

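Is there a Spark way to run the GLMs in parallel? Something like:

// SparkLogisticRegressionResult is a placeholder type for the fitted output
def glm(df: DataFrame): SparkLogisticRegressionResult = ???

val dfs: Seq[DataFrame] = ???
dfs.map(glm)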


Thanks a lot for the help!

Regards,
Surya
