Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15314#discussion_r81784677
  
    --- Diff: 
mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
    @@ -61,6 +61,29 @@ object MLTestingUtils extends SparkFunSuite {
           "Column label must be of type NumericType but was actually of type 
StringType"))
       }
     
    +  def checkWeightColNumericTypes[M <: Model[M], T <: Estimator[M]](
    --- End diff --
    
    What do you think about merging the weight col check into the existing 
`checkNumericTypes` method? We can check if the estimator passed in accepts 
weight col and if it does we can check that along with label col at the same 
time. That way if either of them fails then the whole test will fail. To be 
more concrete, something like this inside `checkNumericType`:
    
    ````scala
    val actuals = dfs.keys.filter(_ != DoubleType).map { t =>
          estimator match {
            case weighted: Estimator[M] with HasWeightCol =>
              weighted.set(weighted.weightCol, "weight")
                .fit(dfs(t).withColumn("weight", rand(seed = 42)))
            case _: Estimator[M] => estimator.fit(dfs(t))
            case _ => throw new Exception()
          }
        }
    ````
    
    This will allow us to avoid doing a bunch of extra fit calls which are 
expensive, and we won't have to create any new methods. For testing that 
strings throw an error, we can check label col and weight col separately inside 
the method. What are your thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to