Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3100#discussion_r20204348
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartition.scala ---
    @@ -21,63 +21,93 @@ import scala.reflect.{classTag, ClassTag}
     
     import org.apache.spark.graphx._
     import org.apache.spark.graphx.util.collection.GraphXPrimitiveKeyOpenHashMap
    +import org.apache.spark.util.collection.BitSet
     
     /**
    - * A collection of edges stored in columnar format, along with any vertex attributes referenced. The
    - * edges are stored in 3 large columnar arrays (src, dst, attribute). The arrays are clustered by
    - * src. There is an optional active vertex set for filtering computation on the edges.
    + * A collection of edges, along with referenced vertex attributes and an optional active vertex set
    + * for filtering computation on the edges.
    + *
    + * The edges are stored in columnar format in `localSrcIds`, `localDstIds`, and `data`. All
    + * referenced global vertex ids are mapped to a compact set of local vertex ids according to the
    + * `global2local` map. Each local vertex id is a valid index into `vertexAttrs`, which stores the
    + * corresponding vertex attribute, and `local2global`, which stores the reverse mapping to global
    + * vertex id. The global vertex ids that are active are optionally stored in `activeSet`.
    + *
    + * The edges are clustered by source vertex id, and the mapping from global vertex id to the index
    + * of the corresponding edge cluster is stored in `index`.
      *
      * @tparam ED the edge attribute type
      * @tparam VD the vertex attribute type
      *
    - * @param srcIds the source vertex id of each edge
    - * @param dstIds the destination vertex id of each edge
    + * @param localSrcIds the local source vertex id of each edge as an index into `local2global` and
    + *   `vertexAttrs`
    + * @param localDstIds the local destination vertex id of each edge as an index into `local2global`
    + *   and `vertexAttrs`
      * @param data the attribute associated with each edge
    - * @param index a clustered index on source vertex id
    - * @param vertices a map from referenced vertex ids to their corresponding attributes. Must
    - *   contain all vertex ids from `srcIds` and `dstIds`, though not necessarily valid attributes for
    - *   those vertex ids. The mask is not used.
    + * @param index a clustered index on source vertex id as a map from each global source vertex id to
    + *   the offset in the edge arrays where the cluster for that vertex id begins
    + * @param global2local a map from referenced vertex ids to local ids which index into vertexAttrs
    + * @param local2global an array of global vertex ids where the offsets are local vertex ids
    + * @param vertexAttrs an array of vertex attributes where the offsets are local vertex ids
      * @param activeSet an optional active vertex set for filtering computation on the edges
      */
     private[graphx]
     class EdgePartition[
         @specialized(Char, Int, Boolean, Byte, Long, Float, Double) ED: ClassTag, VD: ClassTag](
    -    val srcIds: Array[VertexId] = null,
    -    val dstIds: Array[VertexId] = null,
    -    val data: Array[ED] = null,
    -    val index: GraphXPrimitiveKeyOpenHashMap[VertexId, Int] = null,
    -    val vertices: VertexPartition[VD] = null,
    -    val activeSet: Option[VertexSet] = None
    -  ) extends Serializable {
    +    localSrcIds: Array[Int],
    +    localDstIds: Array[Int],
    +    data: Array[ED],
    +    index: GraphXPrimitiveKeyOpenHashMap[VertexId, Int],
    +    global2local: GraphXPrimitiveKeyOpenHashMap[VertexId, Int],
    +    local2global: Array[VertexId],
    +    vertexAttrs: Array[VD],
    +    activeSet: Option[VertexSet])
    +  extends Serializable {
     
    -  /** Return a new `EdgePartition` with the specified edge data. */
    -  def withData[ED2: ClassTag](data_ : Array[ED2]): EdgePartition[ED2, VD] = {
    -    new EdgePartition(srcIds, dstIds, data_, index, vertices, activeSet)
    -  }
    +  private def this() = this(null, null, null, null, null, null, null, null)
    --- End diff --
    
    Note that this is used for serialization.
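The new doc comment describes edges stored column-wise, with global vertex ids remapped to compact local ids and edges clustered by source id. Below is a minimal Scala sketch of that layout, under stated assumptions: it uses a plain `scala.collection.mutable.HashMap` in place of GraphX's `GraphXPrimitiveKeyOpenHashMap`. `LocalEdgeLayout` and `build` are hypothetical names, not part of this PR. `vertexAttrs` and `activeSet` are omitted.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Hypothetical, simplified stand-in for the layout described in the EdgePartition doc comment.
// Assumption: mutable.HashMap replaces GraphX's GraphXPrimitiveKeyOpenHashMap.
final case class LocalEdgeLayout[ED](
    localSrcIds: Array[Int],                  // local source id of each edge
    localDstIds: Array[Int],                  // local destination id of each edge
    data: Array[ED],                          // attribute of each edge
    index: mutable.HashMap[Long, Int],        // global src id -> offset where its cluster begins
    global2local: mutable.HashMap[Long, Int], // global vertex id -> compact local id
    local2global: Array[Long])                // local id (array offset) -> global vertex id

object LocalEdgeLayout {
  def build[ED: ClassTag](edges: Seq[(Long, Long, ED)]): LocalEdgeLayout[ED] = {
    // Cluster the edges by source vertex id (sortBy is stable).
    val sorted = edges.sortBy(_._1).toArray
    val n = sorted.length
    val localSrcIds = new Array[Int](n)
    val localDstIds = new Array[Int](n)
    val data = new Array[ED](n)
    val index = mutable.HashMap.empty[Long, Int]
    val global2local = mutable.HashMap.empty[Long, Int]
    val local2global = mutable.ArrayBuffer.empty[Long]

    // Assign the next dense local id the first time a global id is seen.
    def toLocal(gid: Long): Int =
      global2local.getOrElseUpdate(gid, { local2global += gid; local2global.size - 1 })

    var i = 0
    while (i < n) {
      val (src, dst, attr) = sorted(i)
      // Record the offset of the first edge in each source cluster.
      if (i == 0 || sorted(i - 1)._1 != src) index(src) = i
      localSrcIds(i) = toLocal(src)
      localDstIds(i) = toLocal(dst)
      data(i) = attr
      i += 1
    }
    LocalEdgeLayout(localSrcIds, localDstIds, data, index, global2local, local2global.toArray)
  }
}
```

Local ids are dense and start at 0, so an array indexed by local id (such as `vertexAttrs`) can be read directly with `localSrcIds(i)`. `local2global` recovers the original id.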


