GitHub user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/7099#issuecomment-117384810
> Are the *TrainingResults and Results classes too specialized for
> LinearRegressionModel? Where would be an appropriate level of abstraction?
It's OK to add abstractions now or later, and those abstractions can be
private at first if we're uncertain about the API. However, public methods and
classes can't be changed once they're released, so we do need to think those
through carefully.
> Any thoughts on RDDs versus DataFrames? If using DataFrames, suggested
> schemas for each intermediate step? Also, how to create a "local DataFrame"
> without a sqlContext?
I think we should use DataFrames instead of RDDs wherever possible within the
spark.ml APIs. Hopefully the schema will be clear from the source of the data
(e.g., following the input and output schema of the model.transform method).
You can create a SQLContext from the given DataFrame's SparkContext.
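A minimal sketch of that suggestion, using the Spark 1.x API current at the
time of this PR. The helper name `localResultsFrame` and the example metric
columns are hypothetical, for illustration only; the point is that a
SQLContext is reachable from any DataFrame via its underlying RDD's
SparkContext:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper: given a DataFrame passed into a spark.ml component,
// build a small "local" DataFrame (e.g. per-iteration training metrics)
// without requiring the caller to hand us a SQLContext explicitly.
def localResultsFrame(df: DataFrame): DataFrame = {
  // Reach the SparkContext through the DataFrame's RDD view, then
  // construct a SQLContext from it (Spark 1.x-style).
  val sqlContext = new SQLContext(df.rdd.sparkContext)
  import sqlContext.implicits._
  // Example schema (assumed): one row per training iteration.
  Seq((0, 0.50), (1, 0.25)).toDF("iteration", "loss")
}
```

In later Spark 1.x releases, `SQLContext.getOrCreate(df.rdd.sparkContext)`
avoids allocating a second context when one already exists.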