[ https://issues.apache.org/jira/browse/SPARK-26387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-26387.
-------------------------------
    Resolution: Not A Problem

It shouldn't have any effect. But, you might get different results on different runs if you don't fix a seed for k-fold cross validation. Reopen if that's not it, and you can maybe show a reproducer vs 2.4 or master.

> Parallelism seems to cause difference in CrossValidation model metrics
> ----------------------------------------------------------------------
>
>                 Key: SPARK-26387
>                 URL: https://issues.apache.org/jira/browse/SPARK-26387
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 2.3.1, 2.3.2
>            Reporter: Evan Zamir
>            Priority: Major
>
> I can only reproduce this issue when running Spark on different Amazon EMR
> versions, but it seems that between Spark 2.3.1 and 2.3.2 (corresponding to
> EMR versions 5.17/5.18) the presence of the parallelism parameter was causing
> AUC metric to increase. Literally, I run the same exact code with and without
> parallelism and the AUC of my models (logistic regression) are changing
> significantly. I can't find a previous bug report relating to this, so I'm
> posting this as new.
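
To illustrate the point in the resolution about fixing a seed, here is a minimal sketch in plain Python, not Spark code. The helper `assign_folds` is hypothetical and only mimics the idea of randomly assigning rows to k folds; it is not how Spark's CrossValidator splits data internally. It shows that a fixed seed reproduces the same folds on every run, while an unseeded split will almost certainly differ, which by itself can move a metric such as AUC between runs regardless of parallelism.

```python
import random


def assign_folds(n_rows, num_folds, seed=None):
    """Randomly assign each row index to one of num_folds folds.

    Hypothetical helper for illustration only; this is an assumption
    about the general idea, not Spark's actual splitting logic.
    """
    # With seed=None the generator is seeded from system entropy,
    # so each call produces a different assignment.
    rng = random.Random(seed)
    return [rng.randrange(num_folds) for _ in range(n_rows)]


# Same seed: the fold assignment is identical across runs.
a = assign_folds(1000, 3, seed=42)
b = assign_folds(1000, 3, seed=42)
same_seed_match = (a == b)

# No seed: two runs get different folds, so any metric computed
# per fold can differ between runs as well.
c = assign_folds(1000, 3)
d = assign_folds(1000, 3)
unseeded_match = (c == d)

print("seeded runs identical:", same_seed_match)
print("unseeded runs identical:", unseeded_match)
```

When comparing runs with and without the parallelism parameter, setting the cross-validator's seed parameter to the same value in both runs would be one way to rule this effect out before reopening.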