[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72577494 [Test build #26574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26574/consoleFull) for PR 4301 at commit [`7db28fb`](https://github.com/apache/spark/commit/7db28fbb6551d804436c52bcbff4067f7d8fac95). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23976526

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -149,6 +165,18 @@ private[clustering] object PowerIterationClustering extends Logging {
       }
       /**
    +   * Generates the degree vector as the vertex properties (v0) to start power iteration.
    +   *
    +   * @param g a graph representing the normalized affinity matrix (W)
    +   * @return a graph with edges representing W and vertices representing the degree vector
    +   */
    +  def initDegreeVector(g: Graph[Double, Double]): Graph[Double, Double] = {
    --- End diff --

    Does it need to be? `randomInit` is not private either, and both are inside the private object `PowerIterationClustering`.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23976829

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -149,6 +165,18 @@ private[clustering] object PowerIterationClustering extends Logging {
       }
       /**
    +   * Generates the degree vector as the vertex properties (v0) to start power iteration.
    +   *
    +   * @param g a graph representing the normalized affinity matrix (W)
    +   * @return a graph with edges representing W and vertices representing the degree vector
    +   */
    +  def initDegreeVector(g: Graph[Double, Double]): Graph[Double, Double] = {
    +    val sum = g.vertices.values.sum()
    --- End diff --

    Agree. I have put that in the doc.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72587034 LGTM. Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4301
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23980652

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -149,6 +165,18 @@ private[clustering] object PowerIterationClustering extends Logging {
       }
       /**
    +   * Generates the degree vector as the vertex properties (v0) to start power iteration.
    +   *
    +   * @param g a graph representing the normalized affinity matrix (W)
    +   * @return a graph with edges representing W and vertices representing the degree vector
    +   */
    +  def initDegreeVector(g: Graph[Double, Double]): Graph[Double, Double] = {
    --- End diff --

    No, my bad ...
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23941409

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -149,6 +165,18 @@ private[clustering] object PowerIterationClustering extends Logging {
       }
       /**
    +   * Generates the degree vector as the vertex properties (v0) to start power iteration.
    +   *
    +   * @param g a graph representing the normalized affinity matrix (W)
    +   * @return a graph with edges representing W and vertices representing the degree vector
    +   */
    +  def initDegreeVector(g: Graph[Double, Double]): Graph[Double, Double] = {
    +    val sum = g.vertices.values.sum()
    --- End diff --

    Is it accurate to call the method "degree"? We use the diagonals of `D` but they are not node degrees.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23941402

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -70,6 +72,17 @@ class PowerIterationClustering private[clustering] (
       }
       /**
    +   * Set the initialization method
    +   */
    +  def setInitialization(method: String): this.type = {
    +    this.initMethod = method match {
    +      case "random" | "degree" => method
    +      case _ => throw new IllegalArgumentException("Incorrect initialization method")
    --- End diff --

    Include `method` in the error message.
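A minimal sketch of what this review comment asks for, written in plain Scala with no Spark dependency. The names `InitModeCheck` and `validate` are hypothetical and exist only for illustration; the accepted values "random" and "degree" are the ones in the diff under review.

```scala
object InitModeCheck {
  // Accepted values, taken from the diff under review.
  val supported: Set[String] = Set("random", "degree")

  // Returns the method unchanged when it is supported. Otherwise throws,
  // putting the rejected value into the message so the caller can see
  // exactly what was passed in.
  def validate(method: String): String = {
    if (supported.contains(method)) {
      method
    } else {
      val choices = supported.toSeq.sorted.mkString(", ")
      throw new IllegalArgumentException(
        "Incorrect initialization method: '" + method + "'. Supported: " + choices)
    }
  }
}
```

With the value in the message, a misspelling such as `"degre"` shows up directly in the exception text instead of having to be traced back through the call site.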
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23941396

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -43,10 +43,12 @@ class PowerIterationClusteringModel(
      *
      * @param k Number of clusters.
      * @param maxIterations Maximum number of iterations of the PIC algorithm.
    + * @param initMethod Initialization method.
      */
     class PowerIterationClustering private[clustering] (
         private var k: Int,
    -    private var maxIterations: Int) extends Serializable {
    +    private var maxIterations: Int,
    +    private var initMethod: String = "random") extends Serializable {
    --- End diff --

    We don't put the default values here but in the auxiliary constructor (for Java compatibility when we make it public). Remember to document the default value in the auxiliary constructor.
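The pattern described in this comment can be sketched as follows. `PicConfig` is a hypothetical stand-in, not the Spark class, and the values chosen for `k` and `maxIterations` are placeholders assumed for the example; only the "random" default comes from the diff. Scala default parameter values are compiled to synthetic methods that are awkward to use from Java, whereas a no-argument auxiliary constructor is an ordinary constructor Java callers can invoke.

```scala
// Hypothetical stand-in for the class under review, not the Spark source.
class PicConfig private (
    private var k: Int,
    private var maxIterations: Int,
    private var initMethod: String) extends Serializable {

  /**
   * Auxiliary constructor that carries the defaults
   * (placeholder values: k = 2, maxIterations = 100, initMethod = "random").
   */
  def this() = this(2, 100, "random")

  def getK: Int = k
  def getMaxIterations: Int = maxIterations
  def getInitMethod: String = initMethod
}
```

From Java this is simply `new PicConfig()`, and the Scaladoc on the auxiliary constructor is where the defaults are documented.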
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23941407

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -149,6 +165,18 @@ private[clustering] object PowerIterationClustering extends Logging {
       }
       /**
    +   * Generates the degree vector as the vertex properties (v0) to start power iteration.
    +   *
    +   * @param g a graph representing the normalized affinity matrix (W)
    +   * @return a graph with edges representing W and vertices representing the degree vector
    +   */
    +  def initDegreeVector(g: Graph[Double, Double]): Graph[Double, Double] = {
    --- End diff --

    Should be private.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23941399

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -70,6 +72,17 @@ class PowerIterationClustering private[clustering] (
       }
       /**
    +   * Set the initialization method
    +   */
    +  def setInitialization(method: String): this.type = {
    --- End diff --

    Shall we change the name to `setInitializationMode` to match `KMeans`?
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4301#discussion_r23941398

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ---
    @@ -70,6 +72,17 @@ class PowerIterationClustering private[clustering] (
       }
       /**
    +   * Set the initialization method
    --- End diff --

    Mention the available choices in the doc.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72586516 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26574/ Test PASSed.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72586512 [Test build #26574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26574/consoleFull) for PR 4301 at commit [`7db28fb`](https://github.com/apache/spark/commit/7db28fbb6551d804436c52bcbff4067f7d8fac95). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72421316 [Test build #26494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26494/consoleFull) for PR 4301 at commit [`19cf94e`](https://github.com/apache/spark/commit/19cf94ecfd6d879cbceb52f0abc0a32461e7d871). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72421323 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26494/ Test PASSed.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72364545 Looks like the test failed on an unrelated unit test. Please test it again.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72364465 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26473/ Test FAILed.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4301#issuecomment-72364459

    [Test build #26473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26473/consoleFull) for PR 4301 at commit [`ec88567`](https://github.com/apache/spark/commit/ec88567259f68ced0e6fb7392b300848074a4489).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `val elem = sarray (class $`
       * `val elem = sexternalizable object (class $`
       * `val elem = sobject (class $`
       * ` implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal `
       * `class IsotonicRegressionModel (`
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72362826 [Test build #26473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26473/consoleFull) for PR 4301 at commit [`ec88567`](https://github.com/apache/spark/commit/ec88567259f68ced0e6fb7392b300848074a4489). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/4301

    [SPARK-5512][Mllib] Run the PIC algorithm with initial vector suggested by the PIC paper

    As suggested by the Power Iteration Clustering paper, it is useful to set the initial vector v0 to the degree vector d. This PR tries to add a run method that does so.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 pic_degreevector

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4301.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4301

commit ec88567259f68ced0e6fb7392b300848074a4489
Author: Liang-Chi Hsieh vii...@gmail.com
Date: 2015-02-01T12:20:45Z

    Run the PIC algorithm with degree vector d as suggested by the PIC paper.
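To make the idea in this description concrete, here is a rough sketch in plain Scala (no Spark or GraphX) of a degree-style initial vector. It assumes, as a reading of the PIC paper rather than of this patch, that v0(i) is the i-th row sum of the affinity matrix divided by the sum of all entries. `DegreeVectorSketch` and `degreeVector` are hypothetical names.

```scala
object DegreeVectorSketch {
  // affinity(i)(j) is the similarity between points i and j.
  // Returns v0 with v0(i) = rowSum(i) / totalSum, so the entries sum to 1.
  def degreeVector(affinity: Array[Array[Double]]): Array[Double] = {
    val rowSums = affinity.map(_.sum)
    val total = rowSums.sum
    require(total > 0.0, "affinity matrix must have a positive total weight")
    rowSums.map(_ / total)
  }
}
```

For a three-point matrix whose row sums are 2, 1 and 1, this yields (0.5, 0.25, 0.25). The patch itself works on a GraphX `Graph[Double, Double]` instead of a dense array, so this only illustrates the arithmetic.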
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4301#issuecomment-72373979

    @viirya If that works better than a randomized vector in general, we can replace the current initialization. We set it to a random vector to guarantee that it is far from the first eigenvector. If we want to keep both, instead of adding a new method, we can make the switch an option:

    ~~~
    val pic = new PowerIterationClustering()
      .setInitialization("random") // or "degree"
      .run(...)
    ~~~
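One way the option-based switch proposed here could look, as a self-contained sketch. `PicOptionSketch` and `initialVector` are hypothetical and not the Spark API; the L1 normalization of both starting vectors and the use of Gaussian draws for the random case are assumptions made for the example.

```scala
import scala.util.Random

// Hypothetical stand-in for the clustering class; not the Spark implementation.
class PicOptionSketch {
  private var initMethod: String = "random"

  // Fluent setter: returning this.type lets calls be chained.
  def setInitialization(method: String): this.type = {
    require(Set("random", "degree").contains(method),
      "Incorrect initialization method: " + method)
    initMethod = method
    this
  }

  // Picks the starting vector v0 according to the configured option.
  // rowSums(i) is the i-th row sum of the affinity matrix; the result is
  // scaled so that the absolute values of its entries sum to 1.
  def initialVector(rowSums: Array[Double], seed: Long): Array[Double] = {
    val raw = initMethod match {
      case "degree" => rowSums.clone()
      case _ =>
        val rng = new Random(seed)
        Array.fill(rowSums.length)(rng.nextGaussian())
    }
    val norm = raw.map(v => math.abs(v)).sum
    require(norm > 0.0, "initial vector must not be all zeros")
    raw.map(_ / norm)
  }
}
```

A caller would chain it as `new PicOptionSketch().setInitialization("degree").initialVector(rowSums, seed)`, which keeps a single entry point and leaves "random" as the behavior when nothing is set.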
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72414811 @mengxr I think it is better to keep both and leave it as an option users can switch.
[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4301#issuecomment-72415114 [Test build #26494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26494/consoleFull) for PR 4301 at commit [`19cf94e`](https://github.com/apache/spark/commit/19cf94ecfd6d879cbceb52f0abc0a32461e7d871). * This patch merges cleanly.