Github user facaiy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18998#discussion_r171412547
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala ---
    @@ -93,11 +97,21 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         val outputSchema = transformSchema(dataset.schema)
    -    val hashingTF = new feature.HashingTF($(numFeatures)).setBinary($(binary))
    -    // TODO: Make the hashingTF.transform natively in ml framework to avoid extra conversion.
    -    val t = udf { terms: Seq[_] => hashingTF.transform(terms).asML }
    +    val hashUDF = udf { (terms: Seq[_]) =>
    +      val ids = terms.map { term =>
    --- End diff --
    
    Sorry, I can't remember all the details exactly, since this PR is quite old. If my memory is correct, the ML implementation was kept consistent with the old MLlib one, as the "TODO" above says:
    > Make the hashingTF.transform natively in ml framework to avoid extra conversion.
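For illustration only, here is a minimal plain-Scala sketch of what hashing terms "natively" could look like. Each term is hashed straight to an index, and counts (or 1.0 when `binary` is set) are accumulated into sorted index/value arrays. That is the shape an `org.apache.spark.ml.linalg` sparse vector takes, so no `mllib` vector has to be built and then converted with `.asML`. `NativeHashingSketch`, `hashTerms`, `indexOf` and `nonNegativeMod` are hypothetical names and are not part of this PR. The hash function is an assumption: it is Scala's `MurmurHash3.stringHash`, not the one Spark's `HashingTF` uses, so the indices will not match Spark's output.

```scala
import scala.collection.mutable

// Hypothetical sketch: hash terms directly into (indices, values) arrays,
// skipping the intermediate mllib vector and the .asML conversion.
object NativeHashingSketch {

  // Non-negative modulo, so negative hash codes still land in [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Assumption: scala.util.hashing.MurmurHash3.stringHash on term.toString.
  // Spark's own HashingTF uses a different Murmur3 variant and seed.
  def indexOf(term: Any, numFeatures: Int): Int =
    nonNegativeMod(scala.util.hashing.MurmurHash3.stringHash(term.toString), numFeatures)

  // Returns indices in ascending order with their term frequencies,
  // or 1.0 per non-empty bucket when binary = true.
  def hashTerms(terms: Seq[Any], numFeatures: Int, binary: Boolean): (Array[Int], Array[Double]) = {
    require(numFeatures > 0, "numFeatures must be positive")
    val counts = mutable.HashMap.empty[Int, Double]
    terms.foreach { term =>
      val i = indexOf(term, numFeatures)
      counts(i) = if (binary) 1.0 else counts.getOrElse(i, 0.0) + 1.0
    }
    val sorted = counts.toSeq.sortBy(_._1)
    (sorted.map(_._1).toArray, sorted.map(_._2).toArray)
  }

  def main(args: Array[String]): Unit = {
    val (indices, values) = hashTerms(Seq("a", "b", "a", "c"), numFeatures = 16, binary = false)
    println(indices.mkString("[", ",", "]") + " " + values.mkString("[", ",", "]"))
  }
}
```

Inside a UDF, the two arrays could then be passed to `Vectors.sparse(numFeatures, indices, values)` from `org.apache.spark.ml.linalg`. Sorting the indices up front matches what a sparse vector expects. Keeping the counting in a plain map also leaves the hash function as the only piece that would have to match the existing MLlib behaviour.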

