Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5810#discussion_r29467818
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala ---
    @@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata extends Logging {
           numTrees: Int,
           featureSubsetStrategy: String): DecisionTreeMetadata = {
     
    -    val numFeatures = input.take(1)(0).features.size
         val numExamples = input.count()
    +    require(numExamples > 0, s"DecisionTree requires size of input RDD > 0, " +
    --- End diff --
    
    You should use `isEmpty` rather than count the whole data set. Does this 
help much? You get an exception either way, although this does make the message 
nicer, at the cost of non-trivial extra work.
    
    At this stage, wouldn't the size already have to be positive? Have you 
encountered this in real life?
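
For illustration only, here is a minimal sketch of the kind of `isEmpty`-based check suggested above. It is not the code from the pull request: the object name `EmptyInputCheck`, the helper `requireNonEmpty`, the error message, and the local `SparkContext` setup are all hypothetical, and the sketch assumes a Spark version that provides `RDD.isEmpty()` (1.3 or later).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object EmptyInputCheck {
  // Hypothetical helper, not part of the PR: fail fast on an empty RDD.
  def requireNonEmpty[T](input: RDD[T]): Unit = {
    // isEmpty() only needs to find a single element, so it can stop early
    // instead of scanning every partition the way count() does.
    require(!input.isEmpty(), "DecisionTree requires a non-empty input RDD.")
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val conf = new SparkConf().setAppName("EmptyInputCheck").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // A non-empty RDD passes the check silently.
      requireNonEmpty(sc.parallelize(Seq(1, 2, 3)))

      // An empty RDD makes require throw IllegalArgumentException.
      val rejected =
        try { requireNonEmpty(sc.parallelize(Seq.empty[Int])); false }
        catch { case _: IllegalArgumentException => true }
      println(s"empty input rejected: $rejected")
    } finally {
      sc.stop()
    }
  }
}
```

If the count is needed later anyway, as `numExamples` appears to be in the surrounding code, the extra `isEmpty` pass may not save much; the sketch only shows the cheaper check in isolation.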


