[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-30 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-117312797
  
Merged into master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7065


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116460292
  
  [Test build #35962 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35962/consoleFull)
 for   PR 7065 at commit 
[`4afae45`](https://github.com/apache/spark/commit/4afae457a220b016c1a787748b29d73c0dd15c3d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116488507
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116488468
  
  [Test build #35962 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35962/console)
 for   PR 7065 at commit 
[`4afae45`](https://github.com/apache/spark/commit/4afae457a220b016c1a787748b29d73c0dd15c3d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class PCA (override val uid: String) extends Estimator[PCAModel] with 
PCAParams `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116458114
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116458262
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116241394
  
  [Test build #35929 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35929/console)
 for   PR 7065 at commit 
[`e9effd7`](https://github.com/apache/spark/commit/e9effd738f5df8adeaca283747b5fac175c3c91c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class PCA (override val uid: String) extends Estimator[PCAModel] with 
PCAParams `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/7065

[SPARK-8664] [ML] Add PCA transformer

Add PCA transformer for ML pipeline


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-8664

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7065.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7065


commit e9effd738f5df8adeaca283747b5fac175c3c91c
Author: Yanbo Liang yblia...@gmail.com
Date:   2015-06-28T08:15:38Z

Add PCA transformer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116224957
  
  [Test build #35929 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35929/consoleFull)
 for   PR 7065 at commit 
[`e9effd7`](https://github.com/apache/spark/commit/e9effd738f5df8adeaca283747b5fac175c3c91c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/7065#discussion_r33420498
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -68,7 +68,7 @@ class PCA(val k: Int) {
  * @param k number of principal components.
  * @param pc a principal components Matrix. Each column is one principal 
component.
  */
-class PCAModel private[mllib] (val k: Int, val pc: DenseMatrix) extends 
VectorTransformer {
+class PCAModel private[spark] (val k: Int, val pc: DenseMatrix) extends 
VectorTransformer {
--- End diff --

Because test case of 
ml.feature.PCASuite(https://github.com/apache/spark/pull/7065/files#diff-e1593bb9e311c3f2a2ea49cce20ed671R34)
 use the constructor, so I change it to spark private like Word2VecModel.
There are different access permission of constructors in mllib.feature, 
some are private[spark] while others are public. I think it's confusion and 
need to uniform in a separate task.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116224612
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116224581
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116241503
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7065#discussion_r33435077
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.ml._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.feature
+import org.apache.spark.mllib.linalg.{Vector, VectorUDT}
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Params for [[PCA]] and [[PCAModel]].
+ */
+private[feature] trait PCAParams extends Params with HasInputCol with 
HasOutputCol {
+
+  /**
+   * The number of principal components.
+   * @group param
+   */
+  final val k: IntParam = new IntParam(this, k, the number of principal 
components)
+
+  /** @group getParam */
+  def getK: Int = $(k)
+
+}
+
+/**
+ * :: Experimental ::
+ * PCA trains a model to project vectors to a low-dimensional space using 
PCA.
+ */
+@Experimental
+class PCA (override val uid: String) extends Estimator[PCAModel] with 
PCAParams {
+
+  def this() = this(Identifiable.randomUID(PCA))
--- End diff --

`PCA` - pca. Treat the id as a variable name.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7065#issuecomment-116443860
  
LGTM except a minor inline comment. Please create JIRAs and submit PRs for 
the following:

1. Java unit test
1. Python API
1. set ML attributes in the PCA output column

Thanks!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org