GitHub user harsha2010 commented on a diff in the pull request:
https://github.com/apache/spark/pull/6403#discussion_r31079050
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -176,7 +176,7 @@ final class OneVsRest(override val uid: String)
}
// create k columns, one for each binary classifier.
- val models = Range(0, numClasses).par.map { index =>
+ val models = Range(0, numClasses).map { index =>
--- End diff --
@mengxr, I removed `par` because the underlying classifier may cache a portion
of the dataset. If the binary classifiers train in parallel, we could end up
with multiple cached copies of the dataset in the intermediate stages. (I wasn't
sure this would actually be an issue, since I already cache the multiclass
labeled dataset, but how the underlying classifiers handle caching in this
meta-learner scenario is still a bit unclear to me, so I decided the sequential
version is the lower-risk choice.)
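A minimal plain-Scala sketch of the concern (no Spark involved). It assumes a hypothetical `CachingTrainer` whose `trainOne` "caches" its own copy of the input while fitting. `CachingTrainer`, `trainOne`, `liveCopies`, and `peakCopies` are illustrative names, not Spark APIs. With the sequential `map` from the revised diff, at most one cached copy is alive at any time.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a binary classifier that caches its input
// while training (illustrative only, not a Spark API).
object CachingTrainer {
  val liveCopies = new AtomicInteger(0) // cached copies currently held
  val peakCopies = new AtomicInteger(0) // highest value liveCopies reached

  def trainOne(classIndex: Int, data: Seq[Double]): String = {
    val now = liveCopies.incrementAndGet() // simulate caching the dataset
    peakCopies.accumulateAndGet(now, (a, b) => math.max(a, b))
    try s"model-$classIndex (fit on ${data.size} rows)"
    finally liveCopies.decrementAndGet() // simulate unpersisting after fit
  }
}

object OneVsRestSketch {
  def main(args: Array[String]): Unit = {
    val numClasses = 4
    val data = Seq(0.0, 1.0, 2.0, 3.0)
    // Sequential, as in the revised diff: one binary model at a time.
    val models = Range(0, numClasses).map { index =>
      CachingTrainer.trainOne(index, data)
    }
    println(models.mkString(", "))
    println(s"peak cached copies: ${CachingTrainer.peakCopies.get}")
  }
}
```

Swapping `.map` for `.par.map` (built in on Scala 2.12; on 2.13 it needs the separate `scala-parallel-collections` module) lets several `trainOne` calls overlap. The peak can then exceed 1, which is the duplication the comment is trying to avoid.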