[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-117312797 Merged into master. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7065
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116460292 [Test build #35962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35962/consoleFull) for PR 7065 at commit [`4afae45`](https://github.com/apache/spark/commit/4afae457a220b016c1a787748b29d73c0dd15c3d).
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116488507 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116488468

[Test build #35962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35962/console) for PR 7065 at commit [`4afae45`](https://github.com/apache/spark/commit/4afae457a220b016c1a787748b29d73c0dd15c3d).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class PCA (override val uid: String) extends Estimator[PCAModel] with PCAParams`
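For readers unfamiliar with the `Estimator` pattern named in the class signature above, the following is a minimal usage sketch, not code from this pull request. It assumes a recent Spark release, where `SparkSession` and `org.apache.spark.ml.linalg.Vectors` are available (the code under review predates both and used `org.apache.spark.mllib.linalg`). The object name `PcaSketch`, the column names, and the sample data are illustrative only.

```scala
import org.apache.spark.ml.feature.PCA
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

// Illustrative sketch; names and data are assumptions, not part of the PR.
object PcaSketch {

  // Fits a PCA model on a tiny in-memory dataset and returns the input
  // rows with an extra column holding the projected vectors.
  def project(spark: SparkSession): DataFrame = {
    val rows = Seq(
      Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
      Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0),
      Vectors.dense(1.0, 1.0, 0.0, 0.0, 3.0),
      Vectors.dense(0.0, 2.0, 1.0, 5.0, 1.0)
    ).map(Tuple1.apply)
    val df = spark.createDataFrame(rows).toDF("features")

    // PCA is an Estimator: fit() returns a PCAModel, which is a Transformer.
    val model = new PCA()
      .setInputCol("features")
      .setOutputCol("pcaFeatures")
      .setK(2) // number of principal components to keep
      .fit(df)

    model.transform(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pca-sketch")
      .getOrCreate()
    try project(spark).select("pcaFeatures").show(truncate = false)
    finally spark.stop()
  }
}
```

Running this requires Spark's `spark-mllib` artifact on the classpath; each output vector has `k` entries.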
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116458114 Merged build triggered.
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116458262 Merged build started.
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116241394

[Test build #35929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35929/console) for PR 7065 at commit [`e9effd7`](https://github.com/apache/spark/commit/e9effd738f5df8adeaca283747b5fac175c3c91c).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class PCA (override val uid: String) extends Estimator[PCAModel] with PCAParams`
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/7065

    [SPARK-8664] [ML] Add PCA transformer

    Add PCA transformer for ML pipeline

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-8664

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7065.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7065

commit e9effd738f5df8adeaca283747b5fac175c3c91c
Author: Yanbo Liang yblia...@gmail.com
Date:   2015-06-28T08:15:38Z

    Add PCA transformer
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116224957 [Test build #35929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35929/consoleFull) for PR 7065 at commit [`e9effd7`](https://github.com/apache/spark/commit/e9effd738f5df8adeaca283747b5fac175c3c91c).
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7065#discussion_r33420498

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
    @@ -68,7 +68,7 @@ class PCA(val k: Int) {
      * @param k number of principal components.
      * @param pc a principal components Matrix. Each column is one principal component.
      */
    -class PCAModel private[mllib] (val k: Int, val pc: DenseMatrix) extends VectorTransformer {
    +class PCAModel private[spark] (val k: Int, val pc: DenseMatrix) extends VectorTransformer {
    --- End diff --

    The test case in ml.feature.PCASuite (https://github.com/apache/spark/pull/7065/files#diff-e1593bb9e311c3f2a2ea49cce20ed671R34) uses the constructor, so I changed it to private[spark], like Word2VecModel. Constructor access levels in mllib.feature are inconsistent: some are private[spark] while others are public. I think this is confusing and should be unified in a separate task.
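The effect of widening the constructor from `private[mllib]` to `private[spark]` can be illustrated with a small stand-alone sketch. The package `demo.spark` and the names `ModelLike`, `NarrowModelLike`, and `SuiteLike` are hypothetical stand-ins that only mirror the `org.apache.spark` layout; none of this is Spark's code.

```scala
// Hypothetical packages mirroring the org.apache.spark layout; not Spark's code.
package demo.spark {

  package mllib.feature {
    // Constructor visible to all code under demo.spark (like private[spark]).
    class ModelLike private[spark] (val k: Int)

    // Constructor visible only to code under demo.spark.mllib (like private[mllib]).
    class NarrowModelLike private[mllib] (val k: Int)

    object NarrowModelLike {
      // A factory inside mllib is still allowed to call the narrow constructor.
      def create(k: Int): NarrowModelLike = new NarrowModelLike(k)
    }
  }

  package ml.feature {
    // Plays the role of a test suite living in the sibling ml package.
    object SuiteLike {
      // Compiles: demo.spark.ml.feature is inside demo.spark.
      def build(): Int = new demo.spark.mllib.feature.ModelLike(3).k

      // Would NOT compile here, because ml.feature is outside demo.spark.mllib:
      //   new demo.spark.mllib.feature.NarrowModelLike(3)
      def buildNarrow(): Int = demo.spark.mllib.feature.NarrowModelLike.create(3).k
    }
  }
}

object AccessDemo extends App {
  println(demo.spark.ml.feature.SuiteLike.build())
  println(demo.spark.ml.feature.SuiteLike.buildNarrow())
}
```

Code outside `demo.spark` can call neither constructor, which is why this kind of change keeps the constructor out of the public API while letting a sibling package's tests use it.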
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116224612 Merged build started.
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116224581 Merged build triggered.
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116241503 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7065#discussion_r33435077

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
    @@ -0,0 +1,130 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.feature
    +
    +import org.apache.spark.annotation.Experimental
    +import org.apache.spark.ml._
    +import org.apache.spark.ml.param._
    +import org.apache.spark.ml.param.shared._
    +import org.apache.spark.ml.util.Identifiable
    +import org.apache.spark.mllib.feature
    +import org.apache.spark.mllib.linalg.{Vector, VectorUDT}
    +import org.apache.spark.sql._
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types.{StructField, StructType}
    +
    +/**
    + * Params for [[PCA]] and [[PCAModel]].
    + */
    +private[feature] trait PCAParams extends Params with HasInputCol with HasOutputCol {
    +
    +  /**
    +   * The number of principal components.
    +   * @group param
    +   */
    +  final val k: IntParam = new IntParam(this, "k", "the number of principal components")
    +
    +  /** @group getParam */
    +  def getK: Int = $(k)
    +
    +}
    +
    +/**
    + * :: Experimental ::
    + * PCA trains a model to project vectors to a low-dimensional space using PCA.
    + */
    +@Experimental
    +class PCA (override val uid: String) extends Estimator[PCAModel] with PCAParams {
    +
    +  def this() = this(Identifiable.randomUID("PCA"))
    --- End diff --

    `"PCA"` -> `"pca"`. Treat the id as a variable name.
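To see why the reviewer asks for a lowercase prefix, it helps to look at what a generated uid might look like. The sketch below is a hypothetical stand-in, not Spark's `Identifiable`: it assumes the uid is the prefix, an underscore, and a random suffix taken from a UUID. The object names `UidSketch` and `UidDemo` are illustrative.

```scala
import java.util.UUID

// Hypothetical stand-in for a uid generator; the suffix format is an assumption.
object UidSketch {
  // Builds "<prefix>_<12 random hex characters>" from a fresh UUID.
  def randomUID(prefix: String): String =
    prefix + "_" + UUID.randomUUID().toString.takeRight(12)
}

object UidDemo extends App {
  // With a lowercase prefix the result reads like a variable name, e.g. pca_1a2b3c4d5e6f.
  val uid = UidSketch.randomUID("pca")
  println(uid)
}
```

Because the prefix ends up verbatim in the identifier, a prefix such as `pca` keeps generated ids in the same style as ordinary value names.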
[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7065#issuecomment-116443860

LGTM except a minor inline comment. Please create JIRAs and submit PRs for the following:

1. Java unit test
2. Python API
3. Set ML attributes in the PCA output column

Thanks!!