[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-12-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4622





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-12-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-168112592
  
I'm going to close this pull request. If this is still relevant and you are 
interested in pushing it forward, please open a new pull request. Thanks!






[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-07-14 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-121487220
  
@mengxr any plan to revisit this and merge it?





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r30020688
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,497 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return an empty RDD.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments into cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] are stored in an 
[[Array]], it could consume too much
+   * memory, or even run out of memory, when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r30052128
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,497 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return an empty RDD.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments into cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] are stored in an 
[[Array]], it could consume too much
+   * memory, or even run out of memory, when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29949043
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,497 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return an empty RDD.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments into cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] are stored in an 
[[Array]], it could consume too much
+   * memory, or even run out of memory, when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29884174
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,496 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return an empty RDD.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments into cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] are stored in an 
[[Array]], it could consume too much
+   * memory, or even run out of memory, when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29883577
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,496 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return an empty RDD.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments into cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] are stored in an 
[[Array]], it could consume too much
+   * memory, or even run out of memory, when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99804275
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99804294
  
Merged build started.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99804679
  
  [Test build #32103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32103/consoleFull)
 for   PR 4622 at commit 
[`0c7a26f`](https://github.com/apache/spark/commit/0c7a26f56e85febfe1edac0d85251de9b0bf91e0).





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29896816
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,497 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return an empty RDD.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments into cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] are stored in an 
[[Array]], it could consume too much
+   * memory, or even run out of memory, when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99837389
  
  [Test build #32103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32103/consoleFull)
 for   PR 4622 at commit 
[`0c7a26f`](https://github.com/apache/spark/commit/0c7a26f56e85febfe1edac0d85251de9b0bf91e0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaAffinityPropagation `
  * `case class AffinityPropagationAssignment(val id: Long, val exemplar: 
Long, val member: Long)`
  * `case class AffinityPropagationCluster(val id: Long, val exemplar: 
Long, val members: Array[Long])`
  * `class AffinityPropagationModel(`






[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99837449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32103/
Test PASSed.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99837445
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29738697
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return an [[Array]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
--- End diff --

 I have the same concern @mengxr has. Can that particular cluster contain so 
many vertices that we run out of memory? This could return `RDD[Long]`, but with 
a data structure of `(vertex id, cluster id)`, it seems to require two passes.
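
A minimal sketch of that two-pass lookup (a hypothetical helper, assuming the 
assignments are available as an `RDD[(Long, Long)]` of `(vertex id, cluster 
id)` pairs rather than this PR's API):

```scala
import org.apache.spark.rdd.RDD

// Hypothetical two-pass lookup over (vertex id, cluster id) pairs.
def findClusterMembers(assignments: RDD[(Long, Long)], vertexID: Long): RDD[Long] = {
  // Pass 1: look up the cluster id of the query vertex.
  val ids = assignments.filter(_._1 == vertexID).map(_._2).collect()
  if (ids.nonEmpty) {
    // Pass 2: keep every vertex assigned to that cluster id.
    assignments.filter(_._2 == ids(0)).map(_._1)
  } else {
    assignments.sparkContext.emptyRDD[Long]
  }
}
```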





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29738695
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return an [[Array]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
+val cluster = clusters.filter(_.members.contains(vertexID)).collect()
+if (cluster.nonEmpty) {
+  cluster(0).members
+} else {
+  null
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val clusterIds = clusters.flatMap { cluster =>
+  if (cluster.members.contains(vertexID)) {
+Seq(cluster.id)
+  } else {
+Seq()
+  }
+}.collect()
+if (clusterIds.nonEmpty) {
+  clusterIds(0)
+} else {
+  -1
+}
+  } 
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == e.similarity && availability == e.availability &&
+  responsibility == e.responsibility
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * The data stored in each vertex on the graph
+ */
+private case class VertexData(availability: Double, responsibility: Double)
+
+/**
+ * :: Experimental ::
+ *
+ * Affinity propagation (AP) is a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP was developed
+ * by [[http://doi.org/10.1126/science.1136800 Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ * @param lambda lambda parameter used in the messaging iteration loop
+ * @param normalization Indication of performing normalization
+ * @param symmetric Indication of using symmetric similarity input
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29738770
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/clustering/AffinityPropagationSuite.scala
 ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.scalatest.FunSuite
+
+import org.apache.spark.graphx.{Edge, Graph}
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
+
+class AffinityPropagationSuite extends FunSuite with MLlibTestSparkContext 
{
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  test("affinity propagation") {
+/*
+ We use the following graph to test AP.
+
+ 15-14 -13  12
+ . \  /
+ 4 . 3 . 2  
+ |   .   |
+ 5   0 . 1  10
+ | \ .   .
+ 6   7 . 8 - 9 - 11
+ */
+
--- End diff --

Minor: we use the following style
```
/**
 * .
 * 
 */
```
for comments





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99354269
  
General question. In Euclidean space, the negative squared error is used as 
the similarity. If we want to use affinity propagation to cluster lots of 
samples in Euclidean space, it's impossible to create all the pairwise 
similarity data even if it's symmetric. What's the criterion for filtering 
out those pairs which have very low similarity? Also, it's impossible to 
compute all the pairs of an `RDD[Vector]` since it's an O(N^2) operation, so 
how do people address this in practice? 

I really like this algorithm, but I still have concerns about how people can 
use it in practice. 

Thanks.
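
One common workaround (my assumption, not something this PR provides) is to 
sparsify the similarity graph by keeping only each point's k most similar 
neighbors. A brute-force sketch with hypothetical inputs `points` and `k`; the 
pair generation here is still O(N^2), so at scale an approximate 
nearest-neighbor method would replace `cartesian`:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Keep only the k most similar neighbors per point; similarity is the
// negative squared Euclidean distance, as in Frey and Dueck's paper.
def sparseSimilarities(points: RDD[(Long, Vector)], k: Int): RDD[(Long, Long, Double)] = {
  points.cartesian(points)
    .filter { case ((i, _), (j, _)) => i != j }  // drop self-pairs
    .map { case ((i, vi), (j, vj)) => (i, (j, -Vectors.sqdist(vi, vj))) }
    .groupByKey()
    .flatMap { case (i, neighbors) =>
      // keep the k largest similarities for point i
      neighbors.toSeq.sortBy(-_._2).take(k).map { case (j, s) => (i, j, s) }
    }
}
```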





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29777663
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return an [[Array]] that contains vertex ids in the same cluster as the 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
+val cluster = clusters.filter(_.members.contains(vertexID)).collect()
+if (cluster.nonEmpty) {
+  cluster(0).members
+} else {
+  null
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val clusterIds = clusters.flatMap { cluster =>
+  if (cluster.members.contains(vertexID)) {
+Seq(cluster.id)
+  } else {
+Seq()
+  }
+}.collect()
+if (clusterIds.nonEmpty) {
+  clusterIds(0)
+} else {
+  -1
+}
+  } 
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == e.similarity && availability == e.availability &&
+  responsibility == e.responsibility
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * The data stored in each vertex on the graph
+ */
+private case class VertexData(availability: Double, responsibility: Double)
+
+/**
+ * :: Experimental ::
+ *
+ * Affinity propagation (AP) is a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP was developed
+ * by [[http://doi.org/10.1126/science.1136800 Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ * @param lambda lambda parameter used in the messaging iteration loop
+ * @param normalization Indication of performing normalization
+ * @param symmetric Indication of using symmetric similarity input
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99576733
  
Merged build started.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99565738
  
@viirya Maybe you can comment on this in the documentation and in the code 
comments; it would be useful for people trying to understand the 
use-case.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99577439
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32017/
Test FAILed.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99577433
  
  [Test build #32017 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32017/consoleFull)
 for   PR 4622 at commit 
[`e062a94`](https://github.com/apache/spark/commit/e062a94e580e51d7cb5ff7b3d0e93be9c3edc704).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaAffinityPropagation `
  * `case class AffinityPropagationAssignment(val id: Long, val exemplar: 
Long, val member: Long)`
  * `case class AffinityPropagationCluster(val id: Long, val exemplar: 
Long, val members: Array[Long])`
  * `class AffinityPropagationModel(`
  * `class JoinedRow6 extends Row `
  * `case class WindowSpecDefinition(`
  * `case class WindowSpecReference(name: String) extends WindowSpec`
  * `sealed trait FrameBoundary `
  * `case class ValuePreceding(value: Int) extends FrameBoundary `
  * `case class ValueFollowing(value: Int) extends FrameBoundary `
  * `case class SpecifiedWindowFrame(`
  * `trait WindowFunction extends Expression `
  * `case class UnresolvedWindowFunction(`
  * `case class UnresolvedWindowExpression(`
  * `case class WindowExpression(`
  * `case class WithWindowDefinition(`
  * `case class Window(`
  * `case class Window(`
  * `  case class ComputedWindow(`






[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99577436
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99576709
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29791336
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,496 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster member.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments to cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] is an [[Array]], it 
could consume too much
+   * memory even run out of memory when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99577109
  
  [Test build #32017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32017/consoleFull)
 for   PR 4622 at commit 
[`e062a94`](https://github.com/apache/spark/commit/e062a94e580e51d7cb5ff7b3d0e93be9c3edc704).





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29791211
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,496 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster member.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments to cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] is an [[Array]], it 
could consume too much
+   * memory even run out of memory when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29780015
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[Array]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
+val cluster = clusters.filter(_.members.contains(vertexID)).collect()
+if (cluster.nonEmpty) {
+  cluster(0).members
+} else {
+  null
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val clusterIds = clusters.flatMap { cluster =>
+  if (cluster.members.contains(vertexID)) {
+Seq(cluster.id)
+  } else {
+Seq()
+  }
+}.collect()
+if (clusterIds.nonEmpty) {
+  clusterIds(0)
+} else {
+  -1
+}
+  } 
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == e.similarity && availability == e.availability &&
+  responsibility == e.responsibility
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * The data stored in each vertex on the graph
+ */
+private case class VertexData(availability: Double, responsibility: Double)
+
+/**
+ * :: Experimental ::
+ *
+ * Affinity propagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by [[http://doi.org/10.1126/science.1136800 Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ * @param lambda lambda parameter used in the messaging iteration loop
+ * @param normalization Indication of performing normalization
+ * @param symmetric Indication of using symmetric similarity input
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29777568
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[Array]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
+val cluster = clusters.filter(_.members.contains(vertexID)).collect()
+if (cluster.nonEmpty) {
+  cluster(0).members
+} else {
+  null
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val clusterIds = clusters.flatMap { cluster =>
+  if (cluster.members.contains(vertexID)) {
+Seq(cluster.id)
+  } else {
+Seq()
+  }
+}.collect()
+if (clusterIds.nonEmpty) {
+  clusterIds(0)
+} else {
+  -1
+}
+  } 
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == e.similarity && availability == e.availability &&
+  responsibility == e.responsibility
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * The data stored in each vertex on the graph
+ */
+private case class VertexData(availability: Double, responsibility: Double)
+
+/**
+ * :: Experimental ::
+ *
+ * Affinity propagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by [[http://doi.org/10.1126/science.1136800 Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ * @param lambda lambda parameter used in the messaging iteration loop
+ * @param normalization Indication of performing normalization
+ * @param symmetric Indication of using symmetric similarity input
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29778597
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[Array]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
+val cluster = clusters.filter(_.members.contains(vertexID)).collect()
+if (cluster.nonEmpty) {
+  cluster(0).members
+} else {
+  null
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val clusterIds = clusters.flatMap { cluster =>
+  if (cluster.members.contains(vertexID)) {
+Seq(cluster.id)
+  } else {
+Seq()
+  }
+}.collect()
+if (clusterIds.nonEmpty) {
+  clusterIds(0)
+} else {
+  -1
+}
+  } 
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == e.similarity && availability == e.availability &&
+  responsibility == e.responsibility
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * The data stored in each vertex on the graph
+ */
+private case class VertexData(availability: Double, responsibility: Double)
+
+/**
+ * :: Experimental ::
+ *
+ * Affinity propagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by [[http://doi.org/10.1126/science.1136800 Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ * @param lambda lambda parameter used in the messaging iteration loop
+ * @param normalization Indication of performing normalization
+ * @param symmetric Indication of using symmetric similarity input
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-99541423
  
@dbtsai Thanks for the comments and the question.
I don't have a very good answer to the question. We have used a threshold to 
filter out the insignificant similarities in large-scale data. As you said, it 
is impossible to compute all the pairs. However, the data we process is very 
high dimensional and very sparse, so the all-pair computation can be greatly 
reduced by only considering pairs whose corresponding dimensions have values 
greater than zero (or above a defined threshold).
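
For illustration only, such a pre-filtering step might look like the sketch 
below; the `(i, j, s)` similarity triplets and the `threshold` parameter are 
assumptions for this example, not code from this PR:

```scala
import org.apache.spark.rdd.RDD

// Sketch: drop insignificant pairs before running AP. Similarities whose
// magnitude falls below the threshold are removed, which sparsifies the
// graph the algorithm has to iterate over.
def filterSimilarities(
    similarities: RDD[(Long, Long, Double)],
    threshold: Double): RDD[(Long, Long, Double)] = {
  similarities.filter { case (_, _, s) => math.abs(s) > threshold }
}
```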






[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-06 Thread duncanfinney
Github user duncanfinney commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29823539
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,496 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param member cluster member.
+ */
+@Experimental
+case class AffinityPropagationAssignment(val id: Long, val exemplar: Long, 
val member: Long)
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster member.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param assignments the cluster assignments of AffinityPropagation 
clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val assignments: RDD[AffinityPropagationAssignment]) extends 
Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  lazy val k: Long = assignments.map(_.id).distinct.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[RDD]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): RDD[Long] = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assignments.filter(_.id == assign(0).id).map(_.member)
+} else {
+  assignments.sparkContext.emptyRDD[Long]
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val assign = assignments.filter(_.member == vertexID).collect()
+if (assign.nonEmpty) {
+  assign(0).id
+} else {
+  -1
+}
+  } 
+
+  /**
+   * Turn cluster assignments to cluster representations 
[[AffinityPropagationCluster]].
+   * @return a [[RDD]] that contains all clusters generated by Affinity 
Propagation. Because the
+   * cluster members in [[AffinityPropagationCluster]] is an [[Array]], it 
could consume too much
+   * memory even run out of memory when you call collect() on the returned 
[[RDD]].
+   */
+  def fromAssignToClusters(): RDD[AffinityPropagationCluster] = {
+assignments.map { assign => ((assign.id, assign.exemplar), 
assign.member) }
+  .aggregateByKey(mutable.Set[Long]())(
+seqOp = (s, d) => s ++ mutable.Set(d),
+combOp = (s1, s2) => s1 ++ s2
+  ).map(kv => new AffinityPropagationCluster(kv._1._1, kv._1._2, 
kv._2.toArray))
+  }
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29677618
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
--- End diff --

Is it reasonable to assume that `k` is small but the number of vertices is 
large? Then storing members as `Array[Long]` may run out of memory. We can 
store `id` and `exemplar` on the driver and the cluster assignments 
distributively as `RDD[(Long, Int)]` (vertex id, cluster id). Lookup becomes 
less expensive in this setup.

Btw, is it sufficient to use `Int` for cluster id? It won't provide much 
information if AP outputs more than `Int.MaxValue` clusters.
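
A minimal sketch of the suggested layout, with hypothetical names 
(`SketchAPModel`, `exemplars`, `assignments`, and `clusterOf` are 
illustrative, not this PR's API):

```scala
import org.apache.spark.rdd.RDD

// Exemplars live on the driver (small k); assignments stay distributed.
class SketchAPModel(
    val exemplars: Array[Long],         // index = cluster id, value = exemplar vertex
    val assignments: RDD[(Long, Int)])  // (vertex id, cluster id)
  extends Serializable {

  // Cluster id of a vertex. With a hash-partitioned assignments RDD,
  // lookup only visits the partition holding the key instead of scanning
  // every cluster's member array.
  def clusterOf(vertexId: Long): Option[Int] =
    assignments.lookup(vertexId).headOption
}
```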





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29680209
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
--- End diff --

`getK` -> `k`





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29680338
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[Array]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
+val cluster = clusters.filter(_.members.contains(vertexID)).collect()
+if (cluster.nonEmpty) {
+  cluster(0).members
+} else {
+  null
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
--- End diff --

This could be simplified if we store cluster assignment in `RDD[(Long, 
Int)]`.
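
Under that layout, the lookup might reduce to something like this sketch 
(assuming a hypothetical `assignments: RDD[(Long, Int)]` field):

```scala
def findClusterID(vertexID: Long): Long =
  assignments.lookup(vertexID)   // Seq of cluster ids; empty if the vertex is unknown
    .headOption
    .map(_.toLong)
    .getOrElse(-1L)
```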





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29682634
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[Array]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
+val cluster = clusters.filter(_.members.contains(vertexID)).collect()
+if (cluster.nonEmpty) {
+  cluster(0).members
+} else {
+  null
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val clusterIds = clusters.flatMap { cluster =>
+  if (cluster.members.contains(vertexID)) {
+Seq(cluster.id)
+  } else {
+Seq()
+  }
+}.collect()
+if (clusterIds.nonEmpty) {
+  clusterIds(0)
+} else {
+  -1
+}
+  } 
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == e.similarity && availability == e.availability &&
+  responsibility == e.responsibility
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * The data stored in each vertex on the graph
+ */
+private case class VertexData(availability: Double, responsibility: Double)
+
+/**
+ * :: Experimental ::
+ *
+ * Affinity propagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by [[http://doi.org/10.1126/science.1136800 Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ * @param lambda lambda parameter used in the messaging iteration loop
+ * @param normalization Indication of performing normalization
+ * @param symmetric Indication of using symmetric similarity input
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-05-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r29682601
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,475 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param id cluster id.
+ * @param exemplar cluster exemplar.
+ * @param members cluster members.
+ */
+@Experimental
+case class AffinityPropagationCluster(val id: Long, val exemplar: Long, 
val members: Array[Long])
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the clusters of AffinityPropagation clustering results.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: RDD[AffinityPropagationCluster]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  lazy val getK: Long = clusters.count()
+ 
+  /**
+   * Find the cluster the given vertex belongs
+   * @param vertexID vertex id.
+   * @return a [[Array]] that contains vertex ids in the same cluster of 
given vertexID. If
+   * the given vertex doesn't belong to any cluster, return null.
+   */
+  def findCluster(vertexID: Long): Array[Long] = {
+val cluster = clusters.filter(_.members.contains(vertexID)).collect()
+if (cluster.nonEmpty) {
+  cluster(0).members
+} else {
+  null
+}
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs to
+   * @param vertexID vertex id.
+   * @return the cluster id that the given vertex belongs to. If the given 
vertex doesn't belong to
+   * any cluster, return -1.
+   */
+  def findClusterID(vertexID: Long): Long = {
+val clusterIds = clusters.flatMap { cluster =>
+  if (cluster.members.contains(vertexID)) {
+Seq(cluster.id)
+  } else {
+Seq()
+  }
+}.collect()
+if (clusterIds.nonEmpty) {
+  clusterIds(0)
+} else {
+  -1
+}
+  } 
+}
+
+/**
+ * The message exchanged on the node graph
+ */
+private case class EdgeMessage(
+similarity: Double,
+availability: Double,
+responsibility: Double) extends Equals {
+  override def canEqual(that: Any): Boolean = {
+that match {
+  case e: EdgeMessage =>
+similarity == e.similarity && availability == e.availability &&
+  responsibility == e.responsibility
+  case _ =>
+false
+}
+  }
+}
+
+/**
+ * The data stored in each vertex on the graph
+ */
+private case class VertexData(availability: Double, responsibility: Double)
+
+/**
+ * :: Experimental ::
+ *
+ * Affinity propagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by [[http://doi.org/10.1126/science.1136800 Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ * @param lambda lambda parameter used in the messaging iteration loop
+ * @param normalization Indication of performing normalization
+ * @param symmetric Indication of using symmetric similarity input
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation 

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91784107
  
  [Test build #30067 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30067/consoleFull)
 for   PR 4622 at commit 
[`97cef01`](https://github.com/apache/spark/commit/97cef01f1e9f1bd2fcaa2d7718302c8ae79cf25a).





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91795886
  
  [Test build #30067 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30067/consoleFull)
 for   PR 4622 at commit 
[`97cef01`](https://github.com/apache/spark/commit/97cef01f1e9f1bd2fcaa2d7718302c8ae79cf25a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaAffinityPropagation `
  * `case class AffinityPropagationCluster(val id: Long, val exemplar: 
Long, val members: Array[Long])`
  * `class AffinityPropagationModel(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91795890
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30067/
Test PASSed.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91319694
  
@mengxr The model is now distributed. Edge messages and vertex data are 
defined as private case classes too. I will try to add Java examples later.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91304199
  
Thanks @vanzin, I think it might be a problem. I decided to move to a custom 
method for median computation.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91322337
  
  [Test build #29956 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29956/consoleFull)
 for   PR 4622 at commit 
[`d70b70a`](https://github.com/apache/spark/commit/d70b70a4b2813ac8219cfe5f5e8c4c3e41f48c4b).





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91342743
  
  [Test build #29956 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29956/consoleFull)
 for   PR 4622 at commit 
[`d70b70a`](https://github.com/apache/spark/commit/d70b70a4b2813ac8219cfe5f5e8c4c3e41f48c4b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AffinityPropagationModel(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91337081
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29955/
Test FAILed.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91337063
  
  [Test build #29955 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29955/consoleFull)
 for   PR 4622 at commit 
[`6ad4905`](https://github.com/apache/spark/commit/6ad4905b0de5ac6ca9602ea30ce515bd75151221).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AffinityPropagationModel(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91299289
  
> it has to add spark-hive as dependency of MLlib

spark-hive can be disabled (in fact it has to be explicitly enabled, it's 
disabled by default), while mllib cannot, so I guess that is a non-starter.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91321346
  
  [Test build #29955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29955/consoleFull)
 for   PR 4622 at commit 
[`6ad4905`](https://github.com/apache/spark/commit/6ad4905b0de5ac6ca9602ea30ce515bd75151221).





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91342764
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29956/
Test FAILed.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-09 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91229215
  
@mengxr I have added a new function to automatically calculate the 
preferences used in the AP algorithm. In fact, it is just the median of the 
similarities.

Because I use Hive's `percentile_approx` function to calculate the median 
value, spark-hive has to be added as a dependency of MLlib. I am not sure 
whether this is a good idea. What do you think?

If it is not, where would be a better place to put this function so that 
users can still access it? Or would it be better to implement a custom 
method to compute the median value?
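
A minimal sketch, assuming nothing about the PR's code, of how the median 
preference could be computed directly on the similarity RDD, avoiding the 
spark-hive dependency; the `(i, j, s_ij)` tuple layout matches the `run()` 
signature quoted later in the thread:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper: exact median of the s_ij values via a full sort.
def medianPreference(similarities: RDD[(Long, Long, Double)]): Double = {
  val sorted = similarities.map(_._3).sortBy(identity).zipWithIndex()
    .map { case (v, idx) => (idx, v) }  // key each value by its sorted rank
  val n = sorted.count()
  if (n % 2 == 1) {
    sorted.lookup(n / 2).head
  } else {
    (sorted.lookup(n / 2 - 1).head + sorted.lookup(n / 2).head) / 2.0
  }
}
```

This trades the fast-but-approximate `percentile_approx` for an exact 
median at the cost of a distributed sort.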


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90995035
  
  [Test build #29876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29876/consoleFull) for PR 4622 at commit [`250b6a4`](https://github.com/apache/spark/commit/250b6a40206b4a540432bb340ef598bef0dd457d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90987246
  
  [Test build #29872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29872/consoleFull) for PR 4622 at commit [`f031efd`](https://github.com/apache/spark/commit/f031efdb45dd7bd879297d4a9d7b31e9d9a8804f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90992381
  
  [Test build #29872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29872/consoleFull) for PR 4622 at commit [`f031efd`](https://github.com/apache/spark/commit/f031efdb45dd7bd879297d4a9d7b31e9d9a8804f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **removes the following dependencies:**
   * `RoaringBitmap-0.4.5.jar`
   * `activation-1.1.jar`
   * `akka-actor_2.10-2.3.4-spark.jar`
   * `akka-remote_2.10-2.3.4-spark.jar`
   * `akka-slf4j_2.10-2.3.4-spark.jar`
   * `aopalliance-1.0.jar`
   * `arpack_combined_all-0.1.jar`
   * `avro-1.7.6.jar`
   * `breeze-macros_2.10-0.11.1.jar`
   * `breeze_2.10-0.11.1.jar`
   * `chill-java-0.5.0.jar`
   * `chill_2.10-0.5.0.jar`
   * `commons-beanutils-1.7.0.jar`
   * `commons-beanutils-core-1.8.0.jar`
   * `commons-cli-1.2.jar`
   * `commons-codec-1.10.jar`
   * `commons-collections-3.2.1.jar`
   * `commons-compress-1.4.1.jar`
   * `commons-configuration-1.6.jar`
   * `commons-digester-1.8.jar`
   * `commons-httpclient-3.1.jar`
   * `commons-io-2.1.jar`
   * `commons-lang-2.5.jar`
   * `commons-lang3-3.3.2.jar`
   * `commons-math-2.1.jar`
   * `commons-math3-3.1.1.jar`
   * `commons-net-2.2.jar`
   * `compress-lzf-1.0.0.jar`
   * `config-1.2.1.jar`
   * `core-1.1.2.jar`
   * `curator-client-2.4.0.jar`
   * `curator-framework-2.4.0.jar`
   * `curator-recipes-2.4.0.jar`
   * `gmbal-api-only-3.0.0-b023.jar`
   * `grizzly-framework-2.1.2.jar`
   * `grizzly-http-2.1.2.jar`
   * `grizzly-http-server-2.1.2.jar`
   * `grizzly-http-servlet-2.1.2.jar`
   * `grizzly-rcm-2.1.2.jar`
   * `groovy-all-2.3.7.jar`
   * `guava-14.0.1.jar`
   * `guice-3.0.jar`
   * `hadoop-annotations-2.2.0.jar`
   * `hadoop-auth-2.2.0.jar`
   * `hadoop-client-2.2.0.jar`
   * `hadoop-common-2.2.0.jar`
   * `hadoop-hdfs-2.2.0.jar`
   * `hadoop-mapreduce-client-app-2.2.0.jar`
   * `hadoop-mapreduce-client-common-2.2.0.jar`
   * `hadoop-mapreduce-client-core-2.2.0.jar`
   * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
   * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
   * `hadoop-yarn-api-2.2.0.jar`
   * `hadoop-yarn-client-2.2.0.jar`
   * `hadoop-yarn-common-2.2.0.jar`
   * `hadoop-yarn-server-common-2.2.0.jar`
   * `ivy-2.4.0.jar`
   * `jackson-annotations-2.4.0.jar`
   * `jackson-core-2.4.4.jar`
   * `jackson-core-asl-1.8.8.jar`
   * `jackson-databind-2.4.4.jar`
   * `jackson-jaxrs-1.8.8.jar`
   * `jackson-mapper-asl-1.8.8.jar`
   * `jackson-module-scala_2.10-2.4.4.jar`
   * `jackson-xc-1.8.8.jar`
   * `jansi-1.4.jar`
   * `javax.inject-1.jar`
   * `javax.servlet-3.0.0.v201112011016.jar`
   * `javax.servlet-3.1.jar`
   * `javax.servlet-api-3.0.1.jar`
   * `jaxb-api-2.2.2.jar`
   * `jaxb-impl-2.2.3-1.jar`
   * `jcl-over-slf4j-1.7.10.jar`
   * `jersey-client-1.9.jar`
   * `jersey-core-1.9.jar`
   * `jersey-grizzly2-1.9.jar`
   * `jersey-guice-1.9.jar`
   * `jersey-json-1.9.jar`
   * `jersey-server-1.9.jar`
   * `jersey-test-framework-core-1.9.jar`
   * `jersey-test-framework-grizzly2-1.9.jar`
   * `jets3t-0.7.1.jar`
   * `jettison-1.1.jar`
   * `jetty-util-6.1.26.jar`
   * `jline-0.9.94.jar`
   * `jline-2.10.4.jar`
   * `jodd-core-3.6.3.jar`
   * `json4s-ast_2.10-3.2.10.jar`
   * `json4s-core_2.10-3.2.10.jar`
   * `json4s-jackson_2.10-3.2.10.jar`
   * `jsr305-1.3.9.jar`
   * `jtransforms-2.4.0.jar`
   * `jul-to-slf4j-1.7.10.jar`
   * `kryo-2.21.jar`
   * `log4j-1.2.17.jar`
   * `lz4-1.2.0.jar`
   * `management-api-3.0.0-b012.jar`
   * `mesos-0.21.0-shaded-protobuf.jar`
   * `metrics-core-3.1.0.jar`
   * `metrics-graphite-3.1.0.jar`
   * `metrics-json-3.1.0.jar`
   * `metrics-jvm-3.1.0.jar`
   * `minlog-1.2.jar`
   * `netty-3.8.0.Final.jar`
   * `netty-all-4.0.23.Final.jar`
   * `objenesis-1.2.jar`
   * `opencsv-2.3.jar`
   * `oro-2.0.8.jar`
   * `paranamer-2.6.jar`
   * `parquet-column-1.6.0rc3.jar`
   * `parquet-common-1.6.0rc3.jar`
   * `parquet-encoding-1.6.0rc3.jar`
   * `parquet-format-2.2.0-rc1.jar`
   * `parquet-generator-1.6.0rc3.jar`
   * `parquet-hadoop-1.6.0rc3.jar`
   * `parquet-jackson-1.6.0rc3.jar`
   * `protobuf-java-2.4.1.jar`
   * `protobuf-java-2.5.0-spark.jar`
   * `py4j-0.8.2.1.jar`
   * `pyrolite-2.0.1.jar`
   * `quasiquotes_2.10-2.0.1.jar`
   * `reflectasm-1.07-shaded.jar`
   * `scala-compiler-2.10.4.jar`
   * `scala-library-2.10.4.jar`
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90992384
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29872/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91016629
  
  [Test build #29876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29876/consoleFull) for PR 4622 at commit [`250b6a4`](https://github.com/apache/spark/commit/250b6a40206b4a540432bb340ef598bef0dd457d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AffinityPropagationModel(`

 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-91016654
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29876/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r27893371
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r27896111
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90644122
  
  [Test build #29798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29798/consoleFull) for PR 4622 at commit [`d762697`](https://github.com/apache/spark/commit/d762697e4140781b2dc8882c4d654e19ce5bb80b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90643243
  
@mengxr I have updated for the small comments. For the bigger ones, I will 
update later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r27893434
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
--- End diff --

ok, that would work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90644488
  
  [Test build #29798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29798/consoleFull) for PR 4622 at commit [`d762697`](https://github.com/apache/spark/commit/d762697e4140781b2dc8882c4d654e19ce5bb80b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AffinityPropagationModel(`

 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90644493
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29798/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r27893348
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90652236
  
  [Test build #29799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29799/consoleFull) for PR 4622 at commit [`68947bd`](https://github.com/apache/spark/commit/68947bd619650c5b25ef574d155a24bf97f0356a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90694626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29799/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-90694610
  
  [Test build #29799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29799/consoleFull) for PR 4622 at commit [`68947bd`](https://github.com/apache/spark/commit/68947bd619650c5b25ef574d155a24bf97f0356a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AffinityPropagationModel(`

 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r27930452
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r27930450
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-05 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-89915614
  
@viirya Any plan to update this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-04-05 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-89915963
  
@mengxr I will update this soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-75284420
  
@mengxr have we decided not to include this algorithm? If so, please let me 
know, and I will close this PR and maintain it as a third-party package. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-75288067
  
I didn't see `cartesian` in your code, so the complexity is really `O(nnz * 
k)` rather than `O(n^2 * k)`, correct? That is the same complexity as PIC/PageRank. 
If this is the case, let's check the code.
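
As a rough illustration (not taken from this PR), a messaging pass written 
with GraphX's `aggregateMessages` emits a constant number of messages per 
stored edge, which is why each iteration costs `O(nnz)` and `k` iterations 
cost `O(nnz * k)`; the vertex/edge types and update rule below are 
placeholders, not the PR's actual messages:

```scala
import org.apache.spark.graphx.{EdgeContext, Graph}

// Hypothetical single pass: one message per stored edge (nonzero similarity),
// so per-iteration work is proportional to nnz, as with PIC/PageRank.
def onePass(g: Graph[Double, Double]): Graph[Double, Double] = {
  val best = g.aggregateMessages[Double](
    (ctx: EdgeContext[Double, Double, Double]) => ctx.sendToDst(ctx.attr),
    (a, b) => math.max(a, b)  // merge messages by keeping the maximum
  )
  g.outerJoinVertices(best) { (_, old, incoming) => incoming.getOrElse(old) }
}
```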


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092347
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092363
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092359
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster to which the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the id of the cluster to which the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092309
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
--- End diff --

`Option` is not Java friendly. Would `-1` work here?
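
A minimal sketch of the Java-friendly alternative suggested above, returning `-1`
instead of `None` when the vertex is not found (the loop shape follows the quoted
diff; the sentinel value is only the reviewer's suggestion, not code from the PR):

~~~scala
// Hypothetical variant of findClusterID without scala.Option, so Java
// callers can simply test the result against -1.
def findClusterID(vertexID: Long): Int = {
  var i = 0
  clusters.foreach { cluster =>
    if (cluster.contains(vertexID)) {
      return i  // non-local return exits findClusterID with the index
    }
    i += 1
  }
  -1
}
~~~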





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092338
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092313
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
--- End diff --

`AffinityPropagation` -> `Affinity propagation`.





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092304
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
--- End diff --

So we store the entire clustering result on the driver. How large could it be?
It would be nice if we could store the model in a distributed fashion. In PIC, the result
is stored as `RDD[(Long, Int)]`. Please also consider Java users. Both `Seq` 
and `Set` are Scala collections. The best way to check this is to provide a 
Java example.
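
For comparison, a rough sketch of the distributed layout this comment points to,
keeping assignments as an `RDD[(Long, Int)]` as PIC does instead of Scala
collections on the driver (class and method names below are illustrative only):

~~~scala
import org.apache.spark.rdd.RDD

// Illustrative model shape: one (vertexID, clusterID) pair per vertex,
// left distributed rather than collected to the driver.
class DistributedAPModel(val assignments: RDD[(Long, Int)]) extends Serializable {

  // Number of clusters, derived from the distinct cluster ids.
  def k: Int = assignments.map(_._2).distinct().count().toInt

  // Cluster id of a vertex, or -1 if the vertex is unknown (Java friendly).
  def clusterOf(vertexID: Long): Int =
    assignments.lookup(vertexID).headOption.getOrElse(-1)
}
~~~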





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092334
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092345
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092331
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092352
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092326
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092329
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092321
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
--- End diff --

Put backticks around `0.5`. Otherwise, ScalaDoc will only show "... lambda: 0." as the preview.
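
Sketched on the quoted doc comment, the requested fix would look roughly like
this (also reading "an AP instance"):

~~~scala
/** Constructs an AP instance with default parameters: {maxIterations: 100, lambda: `0.5`,
 *  normalization: false}.
 */
~~~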





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092362
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
+  if (cluster.contains(vertexID)) {
+return Some(i)
+  }
+  i += 1
+})
+None 
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the 
concept of message passing
+ * between data points. Unlike clustering algorithms such as k-means or 
k-medoids, AP does not
+ * require the number of clusters to be determined or estimated before 
running it. AP is developed
+ * by 
[[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey 
and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+private var maxIterations: Int,
+private var lambda: Double,
+private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs a AP instance with default parameters: {maxIterations: 
100, lambda: 0.5,
+   *normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = 
false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+ 
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+this.maxIterations
+  }
+ 
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+this.lambda = lambda
+this
+  }
+ 
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+this.lambda
+  }
+ 
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+this.normalization = normalization
+this
+  }
+ 
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+this.normalization
+  }
+ 
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the 
similarity matrix, which
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092300
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
--- End diff --

Please only import `mutable` and use `mutable.Map` and `mutable.Set` in the 
code. Sometimes, mixing the default `Map` with mutable `Map` causes problems 
that are hard to debug.
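
A short sketch of the requested import style, with the collections qualified at
each use site:

~~~scala
import scala.collection.mutable

// Qualified names make the mutability explicit and avoid accidentally
// mixing these with the immutable defaults.
val members = mutable.Set.empty[Long]
val messages = mutable.Map.empty[Long, Double]
~~~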





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092308
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
--- End diff --

Document the behavior if the given vertex doesn't belong to any cluster.
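
The quoted `findCluster` indexes element `(0)` of the filtered result, so it
throws when the vertex is in no cluster; one way to meet this request is to
document (and make explicit) that case, for example:

~~~scala
/**
 * Find the cluster the given vertex belongs to.
 *
 * @throws NoSuchElementException if the vertex belongs to no cluster.
 */
def findCluster(vertexID: Long): Set[Long] = {
  clusters.find(_.contains(vertexID)).getOrElse(
    throw new NoSuchElementException(s"Vertex $vertexID belongs to no cluster."))
}
~~~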





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092310
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+val clusters: Seq[Set[Long]],
+val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Set the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+clusters.filter(_.contains(vertexID))(0)
+  } 
+ 
+  /**
+   * Find the cluster id the given vertex belongs
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+var i = 0
+clusters.foreach(cluster => {
--- End diff --

use this style for consistency:

~~~
clusters.foreach { cluster =>
   ...
}
~~~





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092320
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+    val clusters: Seq[Set[Long]],
+    val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs to
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+    clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the cluster id the given vertex belongs to
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+    var i = 0
+    clusters.foreach(cluster => {
+      if (cluster.contains(vertexID)) {
+        return Some(i)
+      }
+      i += 1
+    })
+    None
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the concept of message
+ * passing between data points. Unlike clustering algorithms such as k-means or k-medoids,
+ * AP does not require the number of clusters to be determined or estimated before running
+ * it. AP was developed by
+ * [[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
--- End diff --

`(Wikipedia)` -> `Affinity propagation (Wikipedia)`





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092316
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+    val clusters: Seq[Set[Long]],
+    val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs to
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+    clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the cluster id the given vertex belongs to
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+    var i = 0
+    clusters.foreach(cluster => {
+      if (cluster.contains(vertexID)) {
+        return Some(i)
+      }
+      i += 1
+    })
+    None
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the concept of message
+ * passing between data points. Unlike clustering algorithms such as k-means or k-medoids,
+ * AP does not require the number of clusters to be determined or estimated before running
+ * it. AP was developed by
+ * [[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey and Dueck]].
--- End diff --

Use DOI link: http://doi.org/10.1126/science.1136800





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092333
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+    val clusters: Seq[Set[Long]],
+    val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs to
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+    clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the cluster id the given vertex belongs to
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+    var i = 0
+    clusters.foreach(cluster => {
+      if (cluster.contains(vertexID)) {
+        return Some(i)
+      }
+      i += 1
+    })
+    None
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the concept of message
+ * passing between data points. Unlike clustering algorithms such as k-means or k-medoids,
+ * AP does not require the number of clusters to be determined or estimated before running
+ * it. AP was developed by
+ * [[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+    private var maxIterations: Int,
+    private var lambda: Double,
+    private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 100, lambda: 0.5,
+   * normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+    this.maxIterations = maxIterations
+    this
+  }
+
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+    this.maxIterations
+  }
+
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+    this.lambda = lambda
+    this
+  }
+
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+    this.lambda
+  }
+
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+    this.normalization = normalization
+    this
+  }
+
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+    this.normalization
+  }
+
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the similarity matrix, which
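The quoted scaladoc leaves `lambda` undefined; in Frey and Dueck's formulation it is the damping factor: each responsibility/availability message is blended with its value from the previous iteration to avoid oscillation. A minimal sketch of that update rule (names are illustrative, not from the patch):

~~~
// Damped message update from Frey & Dueck's affinity propagation:
// keep a lambda fraction of the previous iteration's message and take
// a (1 - lambda) fraction of the freshly computed one.
def damp(oldMsg: Double, newMsg: Double, lambda: Double): Double =
  lambda * oldMsg + (1.0 - lambda) * newMsg
~~~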
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092335
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+    val clusters: Seq[Set[Long]],
+    val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs to
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+    clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the cluster id the given vertex belongs to
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+    var i = 0
+    clusters.foreach(cluster => {
+      if (cluster.contains(vertexID)) {
+        return Some(i)
+      }
+      i += 1
+    })
+    None
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the concept of message
+ * passing between data points. Unlike clustering algorithms such as k-means or k-medoids,
+ * AP does not require the number of clusters to be determined or estimated before running
+ * it. AP was developed by
+ * [[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+    private var maxIterations: Int,
+    private var lambda: Double,
+    private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 100, lambda: 0.5,
+   * normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+    this.maxIterations = maxIterations
+    this
+  }
+
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+    this.maxIterations
+  }
+
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+    this.lambda = lambda
+    this
+  }
+
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+    this.lambda
+  }
+
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+    this.normalization = normalization
+    this
+  }
+
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+    this.normalization
+  }
+
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the similarity matrix, which
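One detail the similarity-matrix input leaves implicit: in AP the self-similarities s(k, k), the "preferences", steer how many exemplars emerge, and a common default is the median of the input similarities. A hedged sketch of seeding a uniform preference into the (i, j, s,,ij,,) RDD (helper name and approach are illustrative, not from the patch):

~~~
import org.apache.spark.rdd.RDD

// Append a common self-similarity ("preference") entry for every vertex
// that appears in the pairwise similarities.
def withPreferences(
    similarities: RDD[(Long, Long, Double)],
    preference: Double): RDD[(Long, Long, Double)] = {
  val vertices = similarities.flatMap { case (i, j, _) => Seq(i, j) }.distinct()
  similarities ++ vertices.map(k => (k, k, preference))
}
~~~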
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4622#discussion_r25092341
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/AffinityPropagation.scala
 ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.Map
+import scala.collection.mutable.Set
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.graphx._
+import org.apache.spark.graphx.impl.GraphImpl
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ *
+ * Model produced by [[AffinityPropagation]].
+ *
+ * @param clusters the vertexIDs of each cluster.
+ * @param exemplars the vertexIDs of all exemplars.
+ */
+@Experimental
+class AffinityPropagationModel(
+    val clusters: Seq[Set[Long]],
+    val exemplars: Seq[Long]) extends Serializable {
+
+  /**
+   * Get the number of clusters
+   */
+  def getK(): Int = clusters.size
+
+  /**
+   * Find the cluster the given vertex belongs to
+   */
+  def findCluster(vertexID: Long): Set[Long] = {
+    clusters.filter(_.contains(vertexID))(0)
+  }
+
+  /**
+   * Find the cluster id the given vertex belongs to
+   */
+  def findClusterID(vertexID: Long): Option[Int] = {
+    var i = 0
+    clusters.foreach(cluster => {
+      if (cluster.contains(vertexID)) {
+        return Some(i)
+      }
+      i += 1
+    })
+    None
+  } 
+}
+
+/**
+ * :: Experimental ::
+ *
+ * AffinityPropagation (AP), a graph clustering algorithm based on the concept of message
+ * passing between data points. Unlike clustering algorithms such as k-means or k-medoids,
+ * AP does not require the number of clusters to be determined or estimated before running
+ * it. AP was developed by
+ * [[http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf Frey and Dueck]].
+ *
+ * @param maxIterations Maximum number of iterations of the AP algorithm.
+ *
+ * @see [[http://en.wikipedia.org/wiki/Affinity_propagation (Wikipedia)]]
+ */
+@Experimental
+class AffinityPropagation private[clustering] (
+    private var maxIterations: Int,
+    private var lambda: Double,
+    private var normalization: Boolean) extends Serializable {
+
+  import org.apache.spark.mllib.clustering.AffinityPropagation._
+
+  /** Constructs an AP instance with default parameters: {maxIterations: 100, lambda: 0.5,
+   * normalization: false}.
+   */
+  def this() = this(maxIterations = 100, lambda = 0.5, normalization = false)
+
+  /**
+   * Set maximum number of iterations of the messaging iteration loop
+   */
+  def setMaxIterations(maxIterations: Int): this.type = {
+    this.maxIterations = maxIterations
+    this
+  }
+
+  /**
+   * Get maximum number of iterations of the messaging iteration loop
+   */
+  def getMaxIterations(): Int = {
+    this.maxIterations
+  }
+
+  /**
+   * Set lambda of the messaging iteration loop
+   */
+  def setLambda(lambda: Double): this.type = {
+    this.lambda = lambda
+    this
+  }
+
+  /**
+   * Get lambda of the messaging iteration loop
+   */
+  def getLambda(): Double = {
+    this.lambda
+  }
+
+  /**
+   * Set whether to do normalization or not
+   */
+  def setNormalization(normalization: Boolean): this.type = {
+    this.normalization = normalization
+    this
+  }
+
+  /**
+   * Get whether to do normalization or not
+   */
+  def getNormalization(): Boolean = {
+    this.normalization
+  }
+
+  /**
+   * Run the AP algorithm.
+   *
+   * @param similarities an RDD of (i, j, s,,ij,,) tuples representing the similarity matrix, which
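Putting the quoted API together, usage would presumably look like the sketch below (hedged: the tail of `run` and the exact model shape are cut off in the quotes above, and `sc` is assumed to be an existing SparkContext):

~~~
val similarities: RDD[(Long, Long, Double)] = sc.parallelize(Seq(
  (0L, 1L, -1.0), (0L, 2L, -2.5), (1L, 2L, -0.8)))

val model = new AffinityPropagation()
  .setMaxIterations(100)
  .setLambda(0.5)
  .setNormalization(false)
  .run(similarities)

println(s"k = ${model.getK()}, exemplars = ${model.exemplars}")
~~~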
   

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-20 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-75297368
  
@viirya I made a pass on the code. Some high-level feedback:

1. What is the expected size of the model? Should we store it distributively?
2. Defining a private case class to replace `Seq(Double, Double, Double)` would make the code much easier to read (one possible shape is sketched below).
3. Please try to add a Java example, which helps test Java API compatibility.

I will go through the algorithm part again after 2).
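On point 2, a hypothetical shape for that case class; the field names are a guess at what the three doubles represent in AP and are not from the patch:

~~~
private[clustering] case class APMessage(
    similarity: Double,
    responsibility: Double,
    availability: Double)
~~~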





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74653282
  
  [Test build #27628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27628/consoleFull)
  for PR 4622 at commit
  [`6cddeb2`](https://github.com/apache/spark/commit/6cddeb2f655fb477d23c1fbe5bf0230e2b97bdce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AffinityPropagationModel(`





