[GitHub] spark pull request: [SQL] Various DataFrame DSL update.

mengxr Wed, 28 Jan 2015 23:23:30 -0800

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4260#discussion_r23750396
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -133,15 +132,14 @@ class LogisticRegressionModel private[ml] (
       override def transform(dataset: DataFrame, paramMap: ParamMap): DataFrame = {
         transformSchema(dataset.schema, paramMap, logging = true)
         val map = this.paramMap ++ paramMap
    -    val score: Vector => Double = (v) => {
    +    val scoreFunction: Vector => Double = (v) => {
           val margin = BLAS.dot(v, weights)
           1.0 / (1.0 + math.exp(-margin))
         }
         val t = map(threshold)
    -    val predict: Double => Double = (score) => {
    -      if (score > t) 1.0 else 0.0
    -    }
    -    dataset.select($"*", callUDF(score, Column(map(featuresCol))).as(map(scoreCol)))
    -      .select($"*", callUDF(predict, Column(map(scoreCol))).as(map(predictionCol)))
    +    val predictFunction: Double => Double = (score) => { if (score > t) 1.0 else 0.0 }
    +    dataset
    +      .select($"*", callUDF(scoreFunction, col(map(featuresCol))).as(map(scoreCol)))
    --- End diff --
    
    Minor: the name `col` is likely to be used as a matrix column index in ML algorithms, so it may clash with local variables.
    
    This line is still not straightforward to read. I'm thinking of something like the following:
    
    ~~~
    val scoreFunc = UDF((score: Double) => {if (score > t) 1.0 else 0.0})
    dataset.select($"*", scoreFunc(col(map(featuresCol))).as(map(scoreCol)))
    ~~~
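
For readers without a Spark build at hand, the two functions in the diff above can be exercised on their own. The following is only a minimal plain-Scala sketch under stated assumptions: `Array[Double]` stands in for MLlib's `Vector`, a hand-written `dot` stands in for `BLAS.dot`, and the object name, weights, and sample rows are hypothetical. It does not use the DataFrame DSL or the `UDF(...)` wrapper suggested above.

```scala
// Plain-Scala sketch of the score-then-predict logic from the diff.
// Assumptions: Array[Double] replaces MLlib's Vector, and dot() replaces BLAS.dot.
object ScoreThenPredict {
  // Hand-written dot product over two equally sized arrays.
  def dot(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "dimension mismatch")
    x.zip(y).map { case (a, b) => a * b }.sum
  }

  // Builds the scoring function: logistic sigmoid of the margin v . weights.
  def scoreFunction(weights: Array[Double]): Array[Double] => Double =
    v => 1.0 / (1.0 + math.exp(-dot(v, weights)))

  // Builds the prediction function: 1.0 when the score exceeds threshold t, else 0.0.
  def predictFunction(t: Double): Double => Double =
    score => if (score > t) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    val weights = Array(1.0, -1.0) // hypothetical model weights
    val score = scoreFunction(weights)
    val predict = predictFunction(0.5) // hypothetical threshold
    val rows = Seq(Array(2.0, 0.0), Array(0.0, 2.0), Array(1.0, 1.0))
    // Apply the two functions in sequence, as the two select calls do per row.
    rows.foreach { v =>
      val s = score(v)
      println(f"score=$s%.4f prediction=${predict(s)}")
    }
  }
}
```

Because the comparison is a strict `score > t`, a score exactly equal to the threshold maps to 0.0, which is the case for the third sample row (margin 0, score 0.5).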


