Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19993#discussion_r158158375

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala ---
    @@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite {
         require(copyReturnType === obj.getClass,
           s"${clazz.getName}.copy should return ${clazz.getName} instead of ${copyReturnType.getName}.")
       }
    +
    +  /**
    +   * Checks that the class throws an exception when both `inputCols` and `inputCol` are set,
    +   * and when both `outputCols` and `outputCol` are set.
    +   * These checks are performed only if the class extends both `HasInputCols` and `HasInputCol`
    +   * (respectively, both `HasOutputCols` and `HasOutputCol`).
    +   *
    +   * @param paramsClass The class to be checked
    +   * @param spark A `SparkSession` instance to use
    +   */
    +  def checkMultiColumnParams(paramsClass: Class[_ <: Params], spark: SparkSession): Unit = {
    +    import spark.implicits._
    +    // create a fake input Dataset with known column names
    +    val feature1 = Array(-1.0, 0.0, 1.0)
    +    val feature2 = Array(1.0, 0.0, -1.0)
    +    val df = feature1.zip(feature2).toSeq.toDF("feature1", "feature2")
    --- End diff --

    The reason I create the DataFrame inside this method is to control the names of its columns; otherwise we cannot guarantee that those columns exist. The type check is performed later, so it is not a problem here. What do you think?

    I preferred `paramsClass: Class[_ <: Params]` because I need a clean instance for each of the two checks. If an instance were passed in, I could not enforce that it is clean, i.e. that no parameters have already been set, and I would also need to copy it to create new instances, since otherwise the second check would be influenced by the first one. What do you think? Thanks.
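The "clean instance per check" argument above can be illustrated with a minimal, self-contained Scala sketch. This is not Spark code; the `Params` trait, `FakeTransformer`, and `checkWithFreshInstances` below are all hypothetical stand-ins. The point it demonstrates is the one made in the comment: by accepting a `Class[_ <: Params]` rather than an instance, the helper can reflectively create a brand-new object for each check, so the second check cannot be influenced by parameters set during the first.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's Params: a mutable bag of settings.
trait Params {
  private val settings = mutable.Map.empty[String, Any]
  def set(name: String, value: Any): this.type = { settings(name) = value; this }
  def isSet(name: String): Boolean = settings.contains(name)
}

// Stand-in for a transformer under test; must expose a no-arg constructor
// so that it can be instantiated reflectively.
class FakeTransformer extends Params

object CheckDemo {
  // Returns (param set on first instance, param set on second instance).
  def checkWithFreshInstances(paramsClass: Class[_ <: Params]): (Boolean, Boolean) = {
    // First check mutates a freshly created instance...
    val first: Params = paramsClass.getDeclaredConstructor().newInstance()
    first.set("inputCol", "feature1")
    // ...while the second check starts from another clean instance,
    // unaffected by whatever the first check did.
    val second: Params = paramsClass.getDeclaredConstructor().newInstance()
    (first.isSet("inputCol"), second.isSet("inputCol"))
  }
}
```

Here `CheckDemo.checkWithFreshInstances(classOf[FakeTransformer])` reports that the parameter is set on the first instance but not on the second, which is exactly the isolation the `Class[_ <: Params]` signature buys; with a shared instance, the caller would have to `copy()` it before each check to get the same guarantee.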