[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread falaki
Github user falaki commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35918173
  
--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
   function(object, newData) {
     return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
   })
+
+#' Get statistics on a model.
+#'
+#' Returns statistics on a model produced by glm(), similarly to R's summary().
+#'
+#' @param model A fitted MLlib model
+#' @return data.frame containing model statistics
+#' @rdname glm
+#' @export
+#' @examples
+#'\dontrun{
+#' model <- glm(y ~ x, trainingData)
+#' summary(model)
+#'}
+setMethod("summary", signature(object = "PipelineModel"),
--- End diff --

I think this should be `summary.glm`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914268
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -56,17 +56,26 @@ class OneHotEncoder(override val uid: String) extends Transformer
     new BooleanParam(this, "dropLast", "whether to drop the last category")
   setDefault(dropLast -> true)
 
+  /**
+   * Param for the output attr prefix. If not specified, a prefix will be automatically generated.
+   * @group param
+   */
+  final val outputAttrPrefix: Param[String] =
--- End diff --

We try to keep ML attribute handling under the hood, so this parameter 
might be surprising to many users. Maybe we don't need to keep the prefix and 
`_is_` in the generated feature names. Instead of `country_is_US`, we can have 
just `US` as the feature name, but under the group `country`, or whatever the 
output column name is.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914239
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -41,6 +42,17 @@ class VectorAssembler(override val uid: String)
 
   def this() = this(Identifiable.randomUID("vecAssembler"))
 
+  /**
+   * Whether to rewrite vector attribute names.
+   * @group param
+   */
+  final val rewriteAttributeNames: BooleanParam =
--- End diff --

Similar arguments here. It would be nice if we can keep ML attributes under 
the hood. I think the major problem is that we tied the feature name (or feature 
group name) to the column name, and it is hard to keep good column names during 
transformation. If `OneHotEncoder` doesn't add the group name to the feature 
name, the attribute transformation is

~~~
country -> OneHotEncoder -> country. : [US, CA, ...] -> VectorAssembler -> [country.US, country.CA]
~~~
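
The flow above can be sketched in plain Scala. This is only an illustration under stated assumptions, not Spark's implementation: `Group`, `oneHotEncode`, and `assemble` are hypothetical stand-ins for the ML attribute machinery, used here to show that the encoder keeps bare category names and the assembler is the step that qualifies them with the group name.

```scala
// Hypothetical stand-ins for ML attribute metadata; these are not Spark classes.
object AttributeNamingSketch {
  /** A named group of feature attributes, e.g. group "country" with names US, CA. */
  final case class Group(name: String, attrNames: Seq[String])

  /** Encoding step: category values become attribute names, with no prefix added. */
  def oneHotEncode(column: String, categories: Seq[String]): Group =
    Group(column, categories)

  /** Assembling step: qualify each attribute with its group name while merging. */
  def assemble(groups: Seq[Group]): Seq[String] =
    groups.flatMap(g => g.attrNames.map(a => s"${g.name}.$a"))
}
```

With this split, a user-facing prefix parameter on the encoder is not needed, because the qualified name is derived from the group at assembly time.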





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914255
  
--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
   function(object, newData) {
     return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
   })
+
+#' Get statistics on a model.
+#'
+#' Returns statistics on a model produced by glm(), similarly to R's summary().
+#'
+#' @param model A fitted MLlib model
+#' @return data.frame containing model statistics
+#' @rdname glm
+#' @export
+#' @examples
+#'\dontrun{
+#' model <- glm(y ~ x, trainingData)
+#' summary(model)
+#'}
+setMethod("summary", signature(object = "PipelineModel"),
+  function(object) {
+    features <- callJStatic("org.apache.spark.ml.api.r.SparkRWrappers",
+                            "getModelFeatures", object@model)
+    weights <- callJStatic("org.apache.spark.ml.api.r.SparkRWrappers",
+                           "getModelWeights", object@model)
+    stats <- data.frame(unlist(features), unlist(weights))
--- End diff --

In R, `summary` returns a named list instead of a data.frame. It contains a 
field called `coefficients`, which is an R matrix with row/col names. We should 
implement `summary` the same way.
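
As a rough picture of the shape being asked for, here is a small Scala sketch of a named collection holding a `coefficients` table with row and column names. Everything in it is an assumption made for illustration: `NamedMatrix`, `coefficients`, and `summary` are hypothetical names, and the single column label `Estimate` is borrowed from R's convention rather than taken from this PR.

```scala
// Hypothetical model of "a named list with a coefficients matrix"; not SparkR code.
object CoefficientTableSketch {
  /** A tiny stand-in for an R matrix with row and column names. */
  final case class NamedMatrix(
      rowNames: Seq[String],
      colNames: Seq[String],
      values: Seq[Seq[Double]]) {
    /** Look up a cell by row name and column name. */
    def apply(row: String, col: String): Double =
      values(rowNames.indexOf(row))(colNames.indexOf(col))
  }

  /** Pair parallel feature-name and weight sequences into a one-column table. */
  def coefficients(features: Seq[String], weights: Seq[Double]): NamedMatrix = {
    require(features.length == weights.length, "features and weights must align")
    NamedMatrix(features, Seq("Estimate"), weights.map(w => Seq(w)))
  }

  /** The summary as named components, here with a single `coefficients` entry. */
  def summary(features: Seq[String], weights: Seq[Double]): Map[String, NamedMatrix] =
    Map("coefficients" -> coefficients(features, weights))
}
```

Keeping names on both axes lets callers index by feature name instead of by position, which is the property the R matrix provides.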





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914277
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -300,7 +303,8 @@ class LinearRegressionTrainingSummary 
private[regression] (
 predictions: DataFrame,
 predictionCol: String,
 labelCol: String,
-val objectiveHistory: Array[Double])
+val objectiveHistory: Array[Double],
+val featuresCol: StructField)
--- End diff --

We can find the schema from `predictions`.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914262
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/attribute/AttributeGroup.scala ---
@@ -165,6 +165,11 @@ class AttributeGroup private (
   /** Converts to a StructField. */
   def toStructField(): StructField = toStructField(Metadata.empty)
 
+  override def toString: String = {
--- End diff --

Is it required? Please keep this PR minimal. You can create another PR if 
`toString` is useful for AttributeGroup. (Btw, we can reuse the JSON 
serialization in `toString`.)





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35914249
  
--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
   function(object, newData) {
     return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
   })
+
+#' Get statistics on a model.
--- End diff --

`statistics` -> `summary`. We also need to have a better description of the 
return value. For example, this is R's doc of `summary.glm` 
(https://stat.ethz.ch/R-manual/R-devel/library/stats/html/summary.glm.html):

~~~
summary.glm returns an object of class summary.glm, a list with components
...
coefficients    the matrix of coefficients, standard errors, z-values 
                and p-values. Aliased coefficients are omitted.
...
~~~





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126510085
  
  [Test build #39112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39112/consoleFull) for PR 7771 at commit [`a5ca93b`](https://github.com/apache/spark/commit/a5ca93b82bf5cca737b2590602b77de343d2922e).





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35929481
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -17,7 +17,7 @@
 
 package org.apache.spark.ml.feature
 
-import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.{ArrayBuffer, Set => MutableSet}
--- End diff --

use `mutable.Set`
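
For readers unfamiliar with the convention, a minimal sketch follows: import the `mutable` package once and qualify the collection at the use site, rather than renaming `Set` on import. The object and method names (`MutableSetImportSketch`, `distinctTerms`) are made up for this example and do not come from `RFormula.scala`.

```scala
import scala.collection.mutable

object MutableSetImportSketch {
  /** Collect the distinct terms of a sequence using a mutable set internally. */
  def distinctTerms(terms: Seq[String]): Set[String] = {
    // `mutable.Set` makes mutability visible where the set is created,
    // while the unqualified `Set` keeps referring to the immutable default.
    val seen = mutable.Set.empty[String]
    terms.foreach(t => seen += t)
    seen.toSet // hand back an immutable snapshot
  }
}
```

Because only the package is imported, the immutable `Set` is not shadowed anywhere in the file.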





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126515330
  
Merged build started.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126515314
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126509444
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126509452
  
Merged build started.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126515665
  
  [Test build #39117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39117/consoleFull) for PR 7771 at commit [`ccd54c3`](https://github.com/apache/spark/commit/ccd54c300928403efe6826995d1bdd0746a993f7).





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35929697
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -300,7 +303,8 @@ class LinearRegressionTrainingSummary 
private[regression] (
 predictions: DataFrame,
--- End diff --

Do not do projection when creating this summary. Then we only need to 
remember the feature column name.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35930269
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -167,4 +165,10 @@ class OneHotEncoder(override val uid: String) extends 
Transformer
   }
 
   override def copy(extra: ParamMap): OneHotEncoder = defaultCopy(extra)
+
+  private def toOutputAttrName(index: Int): String = 
toOutputAttrName(index.toString)
+
+  private def toOutputAttrName(value: String): String = {
--- End diff --

`toOutputAttrName`s are not necessary





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126517613
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7771





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928619
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/attribute/AttributeGroup.scala ---
@@ -165,6 +165,11 @@ class AttributeGroup private (
   /** Converts to a StructField. */
   def toStructField(): StructField = toStructField(Metadata.empty)
 
+  override def toString: String = {
--- End diff --

done





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928599
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -41,6 +42,17 @@ class VectorAssembler(override val uid: String)
 
   def this() = this(Identifiable.randomUID("vecAssembler"))
 
+  /**
+   * Whether to rewrite vector attribute names.
+   * @group param
+   */
+  final val rewriteAttributeNames: BooleanParam =
--- End diff --

Done (assuming the OneHotEncoder behavior change is ok).





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928607
  
--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
   function(object, newData) {
     return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
   })
+
+#' Get statistics on a model.
--- End diff --

done





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928626
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -56,17 +56,26 @@ class OneHotEncoder(override val uid: String) extends Transformer
     new BooleanParam(this, "dropLast", "whether to drop the last category")
   setDefault(dropLast -> true)
 
+  /**
+   * Param for the output attr prefix. If not specified, a prefix will be automatically generated.
+   * @group param
+   */
+  final val outputAttrPrefix: Param[String] =
--- End diff --

done





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928666
  
--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
   function(object, newData) {
     return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
   })
+
+#' Get statistics on a model.
+#'
+#' Returns statistics on a model produced by glm(), similarly to R's summary().
+#'
+#' @param model A fitted MLlib model
+#' @return data.frame containing model statistics
+#' @rdname glm
+#' @export
+#' @examples
+#'\dontrun{
+#' model <- glm(y ~ x, trainingData)
+#' summary(model)
+#'}
+setMethod("summary", signature(object = "PipelineModel"),
--- End diff --

That one is specific to the `summary.glm` class; I think we need to override 
`summary` to make it work for `PipelineModel`.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928613
  
--- Diff: R/pkg/R/mllib.R ---
@@ -71,3 +71,27 @@ setMethod("predict", signature(object = "PipelineModel"),
   function(object, newData) {
     return(dataFrame(callJMethod(object@model, "transform", newData@sdf)))
   })
+
+#' Get statistics on a model.
+#'
+#' Returns statistics on a model produced by glm(), similarly to R's summary().
+#'
+#' @param model A fitted MLlib model
+#' @return data.frame containing model statistics
+#' @rdname glm
+#' @export
+#' @examples
+#'\dontrun{
+#' model <- glm(y ~ x, trainingData)
+#' summary(model)
+#'}
+setMethod("summary", signature(object = "PipelineModel"),
+  function(object) {
+    features <- callJStatic("org.apache.spark.ml.api.r.SparkRWrappers",
+                            "getModelFeatures", object@model)
+    weights <- callJStatic("org.apache.spark.ml.api.r.SparkRWrappers",
+                           "getModelWeights", object@model)
+    stats <- data.frame(unlist(features), unlist(weights))
--- End diff --

done





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35928631
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -300,7 +303,8 @@ class LinearRegressionTrainingSummary 
private[regression] (
 predictions: DataFrame,
 predictionCol: String,
 labelCol: String,
-val objectiveHistory: Array[Double])
+val objectiveHistory: Array[Double],
+val featuresCol: StructField)
--- End diff --

I don't think so, since we project only predictionCol and labelCol.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126522217
  
  [Test build #39117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39117/console) for PR 7771 at commit [`ccd54c3`](https://github.com/apache/spark/commit/ccd54c300928403efe6826995d1bdd0746a993f7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126522281
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126516125
  
  [Test build #39112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39112/console) for PR 7771 at commit [`a5ca93b`](https://github.com/apache/spark/commit/a5ca93b82bf5cca737b2590602b77de343d2922e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126516296
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35931506
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -167,4 +165,10 @@ class OneHotEncoder(override val uid: String) extends 
Transformer
   }
 
   override def copy(extra: ParamMap): OneHotEncoder = defaultCopy(extra)
+
+  private def toOutputAttrName(index: Int): String = 
toOutputAttrName(index.toString)
+
+  private def toOutputAttrName(value: String): String = {
--- End diff --

done





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35931502
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -300,7 +303,8 @@ class LinearRegressionTrainingSummary private[regression] (
 predictions: DataFrame,
--- End diff --

done





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-30 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/7771#discussion_r35931515
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala ---
@@ -17,7 +17,7 @@
 
 package org.apache.spark.ml.feature
 
-import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.{ArrayBuffer, Set => MutableSet}
--- End diff --

done





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126167104
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126167119
  
Merged build started.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126167173
  
[Test build #38960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38960/consoleFull) for PR 7771 at commit [`2772111`](https://github.com/apache/spark/commit/27721112709b2df9254f14e878bd14604e5d442c).





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126170375
  
[Test build #38960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38960/console) for PR 7771 at commit [`2772111`](https://github.com/apache/spark/commit/27721112709b2df9254f14e878bd14604e5d442c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7771#issuecomment-126170419
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9463] [ML] Expose model coefficients wi...

2015-07-29 Thread ericl
GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/7771

[SPARK-9463] [ML] Expose model coefficients with names in SparkR RFormula

Preview:

```
> summary(m)
            features coefficients
1        (Intercept)    1.6765001
2       Sepal_Length    0.3498801
3 Species.versicolor   -0.9833885
4  Species.virginica   -1.0075104

```
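
To make the preview concrete, here is a minimal sketch of how an intercept and per-feature weights might be paired with display names to produce rows like the ones above. It is an illustration only, not the code from this pull request: `CoefficientSummary` and `namedCoefficients` are hypothetical names, and the sketch assumes the feature names and weights arrive in the same order.

```scala
// Hypothetical helper for illustration; not the implementation in this PR.
object CoefficientSummary {

  /** Pairs the intercept and each weight with its display name, intercept first. */
  def namedCoefficients(
      featureNames: Seq[String],
      weights: Seq[Double],
      intercept: Double): Seq[(String, Double)] = {
    // Assumption: names and weights are aligned index by index.
    require(featureNames.length == weights.length,
      s"expected ${weights.length} names, got ${featureNames.length}")
    ("(Intercept)" -> intercept) +: featureNames.zip(weights)
  }

  def main(args: Array[String]): Unit = {
    // Values copied from the preview in the PR description.
    val rows = namedCoefficients(
      Seq("Sepal_Length", "Species.versicolor", "Species.virginica"),
      Seq(0.3498801, -0.9833885, -1.0075104),
      1.6765001)
    // Print one numbered row per coefficient.
    rows.zipWithIndex.foreach { case ((name, value), i) =>
      println(f"${i + 1}%d $name%20s $value%12.7f")
    }
  }
}
```

The length check matters because a one-hot encoded column such as `Species` expands into several features, so a mismatch between names and weights would otherwise silently mislabel coefficients.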

Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit

cc @mengxr 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark summary

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7771.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7771


commit 8c539aa87cf304b874db3e600ee1416b642b4697
Author: Eric Liang e...@databricks.com
Date:   2015-07-29T22:56:49Z

first pass

commit 3c55024282fbb14c903e2f2a40e099d6684af2c3
Author: Eric Liang e...@databricks.com
Date:   2015-07-30T02:21:48Z

working

commit 7c247d4f5c5f9486c3543d0fc24938cd9d70c935
Author: Eric Liang e...@databricks.com
Date:   2015-07-30T02:23:59Z

Merge branch 'master' into summary

Conflicts:
R/pkg/inst/tests/test_mllib.R

commit 70483efe5c220fbfbf9a57bee893a94bc63eadb0
Author: Eric Liang e...@databricks.com
Date:   2015-07-30T02:47:25Z

fix test



