[GitHub] spark pull request #20313: [SPARK-22974][ML] Attach attributes to output col...

viirya Tue, 12 Jun 2018 00:21:44 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20313#discussion_r194636521
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
    @@ -264,7 +265,9 @@ class CountVectorizerModel(
     
           Vectors.sparse(dictBr.value.size, effectiveCounts)
         }
    -    dataset.withColumn($(outputCol), vectorizer(col($(inputCol))))
    +    val attrs = vocabulary.map(_ => new NumericAttribute).asInstanceOf[Array[Attribute]]
    --- End diff --
    
    Sorry for replying late. Although I agree that these attributes don't provide much information, I'm wondering whether we could generate them lazily. At this point, we don't know whether a following transformer will need them or not.
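The following minimal, Spark-free Scala sketch contrasts the eager approach in the diff (one attribute per vocabulary term, built immediately) with a lazy alternative. `NumericAttr`, `eagerAttrs` and `LazyAttrGroup` are hypothetical names, not Spark ML's `org.apache.spark.ml.attribute` API. The sketch only assumes that a downstream consumer can request the attributes when it actually needs them. It runs as a Scala script or in the REPL.

```scala
// Hypothetical stand-in for a per-element ML attribute (NOT Spark's NumericAttribute).
final case class NumericAttr(index: Int)

// Eager variant, mirroring the diff: one attribute per vocabulary term is
// allocated immediately, whether or not any later stage ever reads it.
def eagerAttrs(vocabulary: Array[String]): Array[NumericAttr] =
  Array.tabulate(vocabulary.length)(i => NumericAttr(i))

// Lazy variant: keep only the vector size; the per-term attributes are
// generated the first time a consumer asks for them and cached afterwards.
final class LazyAttrGroup(val size: Int) {
  private var builds = 0 // counts how often the array was materialized (demo only)
  lazy val attrs: Array[NumericAttr] = {
    builds += 1
    Array.tabulate(size)(i => NumericAttr(i))
  }
  def materializations: Int = builds
}

val vocabulary = Array("spark", "ml", "attr")
val eager = eagerAttrs(vocabulary)              // allocated up front
val group = new LazyAttrGroup(vocabulary.length)
println(group.materializations)                 // 0: nothing built yet
println(group.attrs.length)                     // 3: built on first access
println(group.materializations)                 // 1: reused on later accesses
```

The design choice is that the lazy group pays the allocation cost only when a consumer reads `attrs`. In Spark ML, the closest analogue is probably an `AttributeGroup` that carries only `numAttributes` rather than an explicit attribute array. This sketch does not say which option the PR ultimately adopted.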


