Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2607#issuecomment-59589703
  
    @manishamde  Sorry for the delay; the code is looking good!  I made some 
small comments inline.  My main overall comment is about specifying parameters. 
 How would it be if we started mimicking the coming API update (as much as 
possible)?  Parameter specification would work as follows:
    
    In DecisionTree, add a static “defaultParams” method so users can 
construct a tree.Strategy instance without having to worry about importing 
Strategy (and remembering its name).  Likewise for GradientBoosting.
    
    Change GradientBoostingStrategy to store tree params in a field 
weakLearnerParams: tree.Strategy
    
    Here’s the use pattern I envision:
    
    val treeParams = DecisionTree.defaultParams()
    treeParams.maxDepth = ...
    val boostingParams = GradientBoosting.defaultParams()
    boostingParams.weakLearnerParams = treeParams
    val model = GradientBoosting.train(myData, boostingParams)
    
    This API should work for Scala and Python right away.  (Though a Python API 
can be another PR.)
    
    For Java, this API should almost work; I believe the only issue will be 
setting fields which take special types (e.g., quantileCalculationStrategy and 
categoricalFeaturesInfo).  For those, there is a nice annotation you can use 
which will automatically add getParamName and setParamName methods for Java 
users to call, and you can override them as needed.  For the special params 
like categoricalFeaturesInfo, you can overload them with versions which take 
Java-friendly types (such as a Java map for categoricalFeaturesInfo and a 
string for quantileCalculationStrategy).  Here’s the BeanProperty doc:
    [http://www.scala-lang.org/api/current/scala/beans/BeanProperty.html]
    
    Does that sound reasonable?
    
    Let me know when it’s ready for another pass and for testing.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to