Github user dongjinleekr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21525#discussion_r216333687
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/Transformer.scala ---
    @@ -116,10 +116,17 @@ abstract class UnaryTransformer[IN, OUT, T <: 
UnaryTransformer[IN, OUT, T]]
         StructType(outputFields)
       }
     
    +  /**
    +   * Returns [[Metadata]] to be attached to the output column.
    +   */
    +  protected def outputMetadata(outputSchema: StructType, dataset: 
Dataset[_]): Metadata =
    +    Metadata.empty
    +
       override def transform(dataset: Dataset[_]): DataFrame = {
    -    transformSchema(dataset.schema, logging = true)
    +    val outputSchema = transformSchema(dataset.schema, logging = true)
         val transformUDF = udf(this.createTransformFunc, outputDataType)
    -    dataset.withColumn($(outputCol), transformUDF(dataset($(inputCol))))
    +    val metadata = outputMetadata(outputSchema, dataset)
    --- End diff --
    
    Sorry for the late reply. Here is the answer: **because the ultimate goal is [to make `HashingTF` extend `UnaryTransformer`](https://issues.apache.org/jira/browse/SPARK-13998), not just to attach attributes**. Yes, you are right: `HashingTF` is an example of how metadata is created and attached to `outputSchema`. However, we need a method that wraps that metadata routine so we can replace `HashingTF extends Transformer with HasInputCol with HasOutputCol` with `HashingTF extends UnaryTransformer`. That's why. (Please refer to Joseph K. Bradley's comment on [SPARK-13998](https://issues.apache.org/jira/browse/SPARK-13998).)
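    To illustrate the intent, here is a minimal sketch of the template-method pattern the `outputMetadata` hook enables, written in plain Scala with a stand-in `Metadata` type so it has no Spark dependency. The `UnaryTransformerLike` and `HashingTFLike` names are purely illustrative, not the actual Spark classes:

    ```scala
    object MetadataHookSketch {
      // Stand-in for org.apache.spark.sql.types.Metadata.
      case class Metadata(entries: Map[String, Any])
      object Metadata { val empty: Metadata = Metadata(Map.empty) }

      abstract class UnaryTransformerLike {
        // Default hook: no metadata, mirroring Metadata.empty in the diff above.
        protected def outputMetadata(): Metadata = Metadata.empty

        // The base transform consults the hook, so a subclass attaches
        // attributes by overriding the hook rather than re-implementing
        // transform itself.
        def transform(): Metadata = outputMetadata()
      }

      // A HashingTF-style subclass only overrides the hook.
      class HashingTFLike(numFeatures: Int) extends UnaryTransformerLike {
        override protected def outputMetadata(): Metadata =
          Metadata(Map("numFeatures" -> numFeatures))
      }
    }
    ```

    With such a hook in place, a subclass like `HashingTF` no longer needs its own `transform` override just to carry metadata, which is what makes the move from `Transformer with HasInputCol with HasOutputCol` to `UnaryTransformer` possible.
    
    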


---
