GitHub user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/7099#issuecomment-117384810
> Are the *TrainingResults and Results classes too specialized for
> LinearRegressionModel? Where would be an appropriate level of abstraction?
It's OK to add abstractions now or later, and those abstractions can be
private at first if we're uncertain about the API. However, public methods and
classes can't be changed once they're released, so we do need to think those
through carefully.
> Any thoughts on RDDs versus DataFrames? If using DataFrames, suggested
> schemas for each intermediate step? Also, how to create a "local DataFrame"
> without a sqlContext?
I think we should use DataFrames instead of RDDs wherever possible within the
spark.ml APIs. Hopefully the schema will be clear from the source of the data
(e.g., following the input and output schema of the model.transform method).
You can create a SQLContext from the given DataFrame's SparkContext.
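A minimal sketch of that suggestion, using the Spark 1.x API current at the
time of this PR. The helper name `localResultsFrame` and the example metric
columns are hypothetical, for illustration only; the point is that a
SQLContext is reachable from any DataFrame via its underlying RDD's
SparkContext:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper: given a DataFrame passed into a spark.ml component,
// build a small "local" DataFrame (e.g. per-iteration training metrics)
// without requiring the caller to hand us a SQLContext explicitly.
def localResultsFrame(df: DataFrame): DataFrame = {
  // Reach the SparkContext through the DataFrame's RDD view, then
  // construct a SQLContext from it (Spark 1.x-style).
  val sqlContext = new SQLContext(df.rdd.sparkContext)
  import sqlContext.implicits._
  // Example schema (assumed): one row per training iteration.
  Seq((0, 0.50), (1, 0.25)).toDF("iteration", "loss")
}
```

In later Spark 1.x releases, `SQLContext.getOrCreate(df.rdd.sparkContext)`
avoids allocating a second context when one already exists.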