[GitHub] [spark] srowen commented on a change in pull request #24793: [SPARK-27944][ML] Unify the behavior of checking empty output column names

GitBox Fri, 12 Jul 2019 05:58:35 -0700

srowen commented on a change in pull request #24793: [SPARK-27944][ML] Unify 
the behavior of checking empty output column names
URL: https://github.com/apache/spark/pull/24793#discussion_r302969870


 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
 ##########
 @@ -107,9 +107,15 @@ class BisectingKMeansModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
     transformSchema(dataset.schema, logging = true)
-    val predictUDF = udf((vector: Vector) => predict(vector))
-    dataset.withColumn($(predictionCol),
-      predictUDF(DatasetUtils.columnToVector(dataset, getFeaturesCol)))
+    if ($(predictionCol).nonEmpty) {
 
 Review comment:
   Hm, I'd say we don't need this check anywhere the user would have to 
explicitly set the prediction column to empty to get no output; in that case, 
I don't think it's worth checking and warning. I'm neutral on removing the 
other checks, but not against it.
   
   Some checks are OK, like the ones above, since it's easier to accidentally 
get into this situation when there are multiple prediction cols.
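   
   To illustrate the distinction, here is a minimal sketch in plain Scala 
with no Spark dependency. `EmptyOutputColsSketch`, `outputColumns` and 
`shouldWarn` are hypothetical names made up for this example, not Spark APIs, 
and the rule they encode is only one reading of the reasoning above.

```scala
// Hypothetical sketch, not Spark code: output column names are plain strings,
// and an empty string means "do not produce this column".
object EmptyOutputColsSketch {

  // Keep only the output columns the user actually asked for.
  def outputColumns(names: Seq[String]): Seq[String] =
    names.filter(_.nonEmpty)

  // Assumed rule: warn only when a model has several output columns and
  // every one of them is empty. A single column set to "" is taken to be
  // a deliberate choice by the user.
  def shouldWarn(names: Seq[String]): Boolean =
    names.size > 1 && outputColumns(names).isEmpty

  def main(args: Array[String]): Unit = {
    // One output column, explicitly emptied: no warning.
    println(shouldWarn(Seq("")))
    // Several output columns, all empty: warning.
    println(shouldWarn(Seq("", "", "")))
    // Several output columns, one still set: no warning.
    println(shouldWarn(Seq("prediction", "", "")))
  }
}
```

   A model with one output column, as in this file, would never warn under 
that rule, which is why the check seems unnecessary here.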
