Github user dongjinleekr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21525#discussion_r216333687

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/Transformer.scala ---
    @@ -116,10 +116,17 @@ abstract class UnaryTransformer[IN, OUT, T <: UnaryTransformer[IN, OUT, T]]
         StructType(outputFields)
       }
     
    +  /**
    +   * Returns [[Metadata]] to be attached to the output column.
    +   */
    +  protected def outputMetadata(outputSchema: StructType, dataset: Dataset[_]): Metadata =
    +    Metadata.empty
    +
       override def transform(dataset: Dataset[_]): DataFrame = {
    -    transformSchema(dataset.schema, logging = true)
    +    val outputSchema = transformSchema(dataset.schema, logging = true)
         val transformUDF = udf(this.createTransformFunc, outputDataType)
    -    dataset.withColumn($(outputCol), transformUDF(dataset($(inputCol))))
    +    val metadata = outputMetadata(outputSchema, dataset)
    --- End diff --
    
    Sorry for the late reply. Here is the answer: **because the ultimate goal is [to make `HashingTF` extend `UnaryTransformer`](https://issues.apache.org/jira/browse/SPARK-13998), not just to attach an attribute**. Yes, you are right: `HashingTF` is an example of how metadata is created and attached to `outputSchema`. However, we need a method that wraps that metadata routine so that `HashingTF extends Transformer with HasInputCol with HasOutputCol` can be replaced by `HashingTF extends UnaryTransformer`. That is why. (Please refer to Joseph K. Bradley's comment at [SPARK-13998](https://issues.apache.org/jira/browse/SPARK-13998).)
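    To illustrate the intent, here is a minimal sketch (hypothetical, not part of the PR) of how a `HashingTF`-like subclass could use the proposed `outputMetadata` hook. It assumes the hook from the diff above exists on `UnaryTransformer`; the class name `ToyHashingTF`, its hashing logic, and `numFeatures` are illustrative only:

    ```scala
    import org.apache.spark.ml.UnaryTransformer
    import org.apache.spark.ml.attribute.AttributeGroup
    import org.apache.spark.ml.linalg.{SQLDataTypes, Vector, Vectors}
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.Dataset
    import org.apache.spark.sql.types.{DataType, Metadata, StructType}

    // Hypothetical subclass: with the proposed hook, the subclass only supplies
    // the transform function, output type, and metadata; the base class's
    // transform() template method attaches the metadata to the output column.
    class ToyHashingTF(override val uid: String)
        extends UnaryTransformer[Seq[String], Vector, ToyHashingTF] {

      def this() = this(Identifiable.randomUID("toyHashingTF"))

      private val numFeatures = 1 << 10  // illustrative fixed size

      // Naive term-frequency hashing into a sparse vector.
      override protected def createTransformFunc: Seq[String] => Vector = { terms =>
        val counts = terms
          .groupBy(t => math.abs(t.hashCode) % numFeatures)
          .map { case (idx, ts) => (idx, ts.size.toDouble) }
        Vectors.sparse(numFeatures, counts.toSeq)
      }

      override protected def outputDataType: DataType = SQLDataTypes.VectorType

      // The proposed hook: build the attribute-group metadata here instead of
      // in a hand-written transform(), as HashingTF does today.
      override protected def outputMetadata(
          outputSchema: StructType,
          dataset: Dataset[_]): Metadata =
        new AttributeGroup($(outputCol), numFeatures).toMetadata()
    }
    ```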