[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user falaki commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35918173

--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
           })
+
+#' Get statistics on a model.
+#'
+#' Returns statistics on a model produced by glm(), similarly to R's summary().
+#'
+#' @param model A fitted MLlib model
+#' @return data.frame containing model statistics
+#' @rdname glm
+#' @export
+#' @examples
+#'\dontrun{
+#' model <- glm(y ~ x, trainingData)
+#' summary(model)
+#'}
+setMethod("summary", signature(object = "PipelineModel"),
--- End diff --

I think this should be `summary.glm`.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914268

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -56,17 +56,26 @@ class OneHotEncoder(override val uid: String) extends Transformer
     new BooleanParam(this, "dropLast", "whether to drop the last category")
   setDefault(dropLast -> true)
 
+  /**
+   * Param for the output attr prefix. If not specified, a prefix will be automatically generated.
+   * @group param
+   */
+  final val outputAttrPrefix: Param[String] =
--- End diff --

We try to keep ML attribute handling under the hood, so this parameter might be surprising to many users. Maybe we don't need to keep the prefix and `_is_` in the generated feature names. Instead of `country_is_US`, we can have just `US` as the feature name, but under the group `country` or whatever the output column name is.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914239

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -41,6 +42,17 @@ class VectorAssembler(override val uid: String)
 
   def this() = this(Identifiable.randomUID("vecAssembler"))
 
+  /**
+   * Whether to rewrite vector attribute names.
+   * @group param
+   */
+  final val rewriteAttributeNames: BooleanParam =
--- End diff --

Similar arguments apply here. It would be nice if we could keep ML attributes under the hood. I think the major problem is that we tied the feature name (or feature group name) to the column name, and it is hard to keep good column names during transformation. If, in `OneHotEncoder`, we don't add the group name to the feature name, the attribute transformation is

~~~
country -> OneHotEncoder -> country. : [US, CA, ...] -> VectorAssembler -> [country.US, country.CA]
~~~
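The naming scheme proposed in the comment above can be sketched in plain Scala without Spark. This is only an illustration under assumptions: `AttrGroup`, `oneHotEncode`, and `assemble` are hypothetical names rather than Spark APIs, and the `.` separator is taken from the example in the comment, not from the merged code.

```scala
// Hypothetical stand-in for an ML attribute group: a group (column) name
// plus the names of the individual features inside it.
final case class AttrGroup(name: String, attrNames: Seq[String])

object AttributeNamingSketch {
  // One-hot encoding keeps the bare category labels as feature names and
  // records the column name only once, as the group name.
  def oneHotEncode(column: String, categories: Seq[String]): AttrGroup =
    AttrGroup(column, categories)

  // Assembling several groups into one vector qualifies each feature
  // name with its group name so that names stay unique across groups.
  def assemble(groups: Seq[AttrGroup]): Seq[String] =
    groups.flatMap(g => g.attrNames.map(a => s"${g.name}.$a"))

  def main(args: Array[String]): Unit = {
    val country = oneHotEncode("country", Seq("US", "CA"))
    // prints [country.US, country.CA]
    println(assemble(Seq(country)).mkString("[", ", ", "]"))
  }
}
```

The point of the design is that the encoder no longer needs a user-visible prefix parameter: the group name travels with the column, and only the assembling step combines it with the feature name.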
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914255

--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
           })
+
+#' Get statistics on a model.
+#'
+#' Returns statistics on a model produced by glm(), similarly to R's summary().
+#'
+#' @param model A fitted MLlib model
+#' @return data.frame containing model statistics
+#' @rdname glm
+#' @export
+#' @examples
+#'\dontrun{
+#' model <- glm(y ~ x, trainingData)
+#' summary(model)
+#'}
+setMethod("summary", signature(object = "PipelineModel"),
+          function(object) {
+            features <- callJStatic("org.apache.spark.ml.api.r.SparkRWrappers",
+                                   "getModelFeatures", object@model)
+            weights <- callJStatic("org.apache.spark.ml.api.r.SparkRWrappers",
+                                  "getModelWeights", object@model)
+            stats <- data.frame(unlist(features), unlist(weights))
--- End diff --

In R, `summary` returns a named list instead of a data.frame. The list contains a field called `coefficients`, which is an R matrix with row and column names. We should implement `summary` the same way.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914277

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -300,7 +303,8 @@ class LinearRegressionTrainingSummary private[regression] (
     predictions: DataFrame,
     predictionCol: String,
     labelCol: String,
-    val objectiveHistory: Array[Double])
+    val objectiveHistory: Array[Double],
+    val featuresCol: StructField)
--- End diff --

We can find the schema from `predictions`.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914262

--- Diff: mllib/src/main/scala/org/apache/spark/ml/attribute/AttributeGroup.scala ---
@@ -165,6 +165,11 @@ class AttributeGroup private (
   /** Converts to a StructField. */
   def toStructField(): StructField = toStructField(Metadata.empty)
 
+  override def toString: String = {
--- End diff --

Is it required? Please keep this PR minimal. You can create another PR if `toString` is useful for AttributeGroup. (Btw, we can reuse the JSON serialization in `toString`.)
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914249

--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
           })
+
+#' Get statistics on a model.
--- End diff --

`statistics` -> `summary`. We also need a better description of the return value. For example, this is R's doc of `summary.glm` (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/summary.glm.html):

~~~
summary.glm returns an object of class summary.glm, a list with components
...
coefficients    the matrix of coefficients, standard errors, z-values and p-values. Aliased coefficients are omitted.
...
~~~
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126510085 [Test build #39112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39112/consoleFull) for PR 7771 at commit [`a5ca93b`](https://github.com/apache/spark/commit/a5ca93b82bf5cca737b2590602b77de343d2922e).
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35929481

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala ---
@@ -17,7 +17,7 @@
 
 package org.apache.spark.ml.feature
 
-import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.{ArrayBuffer, Set => MutableSet}
--- End diff --

Use `mutable.Set`.
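For readers unfamiliar with the convention being requested, the style looks roughly like the following. This is a small sketch, not code from the PR: `ImportStyleSketch` and `distinctTerms` are made-up names for illustration.

```scala
// Import the package itself instead of renaming Set on import, so every
// use site spells out that the collection is mutable.
import scala.collection.mutable

object ImportStyleSketch {
  // Returns the terms in first-seen order with duplicates removed.
  def distinctTerms(terms: Seq[String]): Seq[String] = {
    val seen = mutable.Set.empty[String]
    val out = mutable.ArrayBuffer.empty[String]
    for (t <- terms) {
      // add returns true only when the element was not already present
      if (seen.add(t)) out += t
    }
    out.toSeq
  }
}
```

Writing `mutable.Set` at the use site avoids both shadowing the default immutable `Set` and introducing a project-specific alias such as `MutableSet`.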
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126515330 Merged build started.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126515314 Merged build triggered.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126509444 Merged build triggered.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126509452 Merged build started.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126515665 [Test build #39117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39117/consoleFull) for PR 7771 at commit [`ccd54c3`](https://github.com/apache/spark/commit/ccd54c300928403efe6826995d1bdd0746a993f7).
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35929697

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -300,7 +303,8 @@ class LinearRegressionTrainingSummary private[regression] (
     predictions: DataFrame,
--- End diff --

Do not do the projection when creating this summary. Then we only need to remember the feature column name.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35930269

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -167,4 +165,10 @@ class OneHotEncoder(override val uid: String) extends Transformer
   }
 
   override def copy(extra: ParamMap): OneHotEncoder = defaultCopy(extra)
+
+  private def toOutputAttrName(index: Int): String = toOutputAttrName(index.toString)
+
+  private def toOutputAttrName(value: String): String = {
--- End diff --

The `toOutputAttrName` methods are not necessary.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126517613 LGTM. Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7771
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928619

--- Diff: mllib/src/main/scala/org/apache/spark/ml/attribute/AttributeGroup.scala ---
@@ -165,6 +165,11 @@ class AttributeGroup private (
   /** Converts to a StructField. */
   def toStructField(): StructField = toStructField(Metadata.empty)
 
+  override def toString: String = {
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928599

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -41,6 +42,17 @@ class VectorAssembler(override val uid: String)
 
   def this() = this(Identifiable.randomUID("vecAssembler"))
 
+  /**
+   * Whether to rewrite vector attribute names.
+   * @group param
+   */
+  final val rewriteAttributeNames: BooleanParam =
--- End diff --

Done (assuming the OneHotEncoder behavior change is ok).
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928607

--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
           })
+
+#' Get statistics on a model.
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928626

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -56,17 +56,26 @@ class OneHotEncoder(override val uid: String) extends Transformer
     new BooleanParam(this, "dropLast", "whether to drop the last category")
   setDefault(dropLast -> true)
 
+  /**
+   * Param for the output attr prefix. If not specified, a prefix will be automatically generated.
+   * @group param
+   */
+  final val outputAttrPrefix: Param[String] =
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928666

--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
           })
+
+#' Get statistics on a model.
+#'
+#' Returns statistics on a model produced by glm(), similarly to R's summary().
+#'
+#' @param model A fitted MLlib model
+#' @return data.frame containing model statistics
+#' @rdname glm
+#' @export
+#' @examples
+#'\dontrun{
+#' model <- glm(y ~ x, trainingData)
+#' summary(model)
+#'}
+setMethod("summary", signature(object = "PipelineModel"),
--- End diff --

That one is specific to the `summary.glm` class. I think we need to override `summary` to have it work for `PipelineModel`.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928613

--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
           })
+
+#' Get statistics on a model.
+#'
+#' Returns statistics on a model produced by glm(), similarly to R's summary().
+#'
+#' @param model A fitted MLlib model
+#' @return data.frame containing model statistics
+#' @rdname glm
+#' @export
+#' @examples
+#'\dontrun{
+#' model <- glm(y ~ x, trainingData)
+#' summary(model)
+#'}
+setMethod("summary", signature(object = "PipelineModel"),
+          function(object) {
+            features <- callJStatic("org.apache.spark.ml.api.r.SparkRWrappers",
+                                   "getModelFeatures", object@model)
+            weights <- callJStatic("org.apache.spark.ml.api.r.SparkRWrappers",
+                                  "getModelWeights", object@model)
+            stats <- data.frame(unlist(features), unlist(weights))
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928631

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -300,7 +303,8 @@ class LinearRegressionTrainingSummary private[regression] (
     predictions: DataFrame,
     predictionCol: String,
     labelCol: String,
-    val objectiveHistory: Array[Double])
+    val objectiveHistory: Array[Double],
+    val featuresCol: StructField)
--- End diff --

I don't think so, since we project only `predictionCol` and `labelCol`.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126522217

[Test build #39117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39117/console) for PR 7771 at commit [`ccd54c3`](https://github.com/apache/spark/commit/ccd54c300928403efe6826995d1bdd0746a993f7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126522281 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126516125

[Test build #39112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39112/console) for PR 7771 at commit [`a5ca93b`](https://github.com/apache/spark/commit/a5ca93b82bf5cca737b2590602b77de343d2922e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126516296 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35931506

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -167,4 +165,10 @@ class OneHotEncoder(override val uid: String) extends Transformer
   }
 
   override def copy(extra: ParamMap): OneHotEncoder = defaultCopy(extra)
+
+  private def toOutputAttrName(index: Int): String = toOutputAttrName(index.toString)
+
+  private def toOutputAttrName(value: String): String = {
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35931502

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -300,7 +303,8 @@ class LinearRegressionTrainingSummary private[regression] (
     predictions: DataFrame,
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35931515

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala ---
@@ -17,7 +17,7 @@
 
 package org.apache.spark.ml.feature
 
-import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.{ArrayBuffer, Set => MutableSet}
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126167104 Merged build triggered.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126167119 Merged build started.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126167173 [Test build #38960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38960/consoleFull) for PR 7771 at commit [`2772111`](https://github.com/apache/spark/commit/27721112709b2df9254f14e878bd14604e5d442c).
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126170375 [Test build #38960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38960/console) for PR 7771 at commit [`2772111`](https://github.com/apache/spark/commit/27721112709b2df9254f14e878bd14604e5d442c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7771#issuecomment-126170419 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...
GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/7771

[SPARK-9463] [ML] Expose model coefficients with names in SparkR RFormula

Preview:
```
summary(m)
            features coefficients
1        (Intercept)    1.6765001
2       Sepal_Length    0.3498801
3 Species.versicolor   -0.9833885
4  Species.virginica   -1.0075104
```

Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit

cc @mengxr

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark summary

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7771.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7771

commit 8c539aa87cf304b874db3e600ee1416b642b4697
Author: Eric Liang e...@databricks.com
Date: 2015-07-29T22:56:49Z

    first pass

commit 3c55024282fbb14c903e2f2a40e099d6684af2c3
Author: Eric Liang e...@databricks.com
Date: 2015-07-30T02:21:48Z

    working

commit 7c247d4f5c5f9486c3543d0fc24938cd9d70c935
Author: Eric Liang e...@databricks.com
Date: 2015-07-30T02:23:59Z

    Merge branch 'master' into summary

    Conflicts:
    R/pkg/inst/tests/test_mllib.R

commit 70483efe5c220fbfbf9a57bee893a94bc63eadb0
Author: Eric Liang e...@databricks.com
Date: 2015-07-30T02:47:25Z

    fix test