Yeah, you can store it in your custom object as well, just like you are storing the adjacency list.
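For illustration, a minimal sketch of what such a custom Writable could look like. The class and field names (GraphCustomObject, type, adjacencyList, partitionId) are just taken from the earlier mail; this is an untested sketch, not a definitive implementation:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Writable;

// Sketch of the custom value object discussed in this thread.
// Field names are illustrative assumptions.
public class GraphCustomObject implements Writable {

    private int type;                         // 0 = from edge file, 1 = from partition file
    private List<Long> adjacencyList = new ArrayList<>();
    private long partitionId;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(type);
        out.writeInt(adjacencyList.size());
        for (long v : adjacencyList) {
            out.writeLong(v);
        }
        out.writeLong(partitionId);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        type = in.readInt();
        int size = in.readInt();
        adjacencyList = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            adjacencyList.add(in.readLong());
        }
        partitionId = in.readLong();
    }

    public void setType(int type) { this.type = type; }
    public int getType() { return type; }
    public List<Long> getAdjacencyList() { return adjacencyList; }
    public void setPartitionId(long partitionId) { this.partitionId = partitionId; }
    public long getPartitionId() { return partitionId; }
}

Carrying the partition id in the same object means the Job 1 reducer can emit it together with the assembled adjacency list. A rough sketch of the Job 1 mappers and reducer is pasted below the quoted thread.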
On Wed, Jun 24, 2015 at 10:10 PM, Ravikant Dindokar <[email protected]> wrote:

> Hi Harshit,
>
> Is there any way to retain the partition id for each vertex in the
> adjacency list?
>
> Thanks
> Ravikant
>
> On Wed, Jun 24, 2015 at 7:55 PM, Ravikant Dindokar <[email protected]> wrote:
>
>> Thanks Harshit
>>
>> On Wed, Jun 24, 2015 at 5:35 PM, Harshit Mathur <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> This may be the solution (I hope I understood the problem correctly).
>>>
>>> Job 1:
>>>
>>> You need two mappers, one reading from the edge file and the other
>>> reading from the partition file, say EdgeFileMapper and
>>> PartitionFileMapper, plus a common reducer.
>>> You can have a custom Writable (say GraphCustomObject) holding the
>>> following:
>>> 1) type: which mapper the object came from
>>> 2) adjacency vertex list: the list of adjacent vertices
>>> 3) partition id: to hold the partition id
>>>
>>> The output key and value of the EdgeFileMapper will be:
>>> key => vertexId
>>> value => {type=edgefile; adjacencyVertex; partitionId=0 (not present
>>> in this file)}
>>>
>>> The output of the PartitionFileMapper will be:
>>> key => vertexId
>>> value => {type=partitionfile; adjacencyVertex=0; partitionId}
>>>
>>> So in the reducer, for each vertexId we can have the complete
>>> GraphCustomObject populated:
>>> vertexId => {complete adjacency vertex list, partitionId}
>>>
>>> The output of this reducer will be:
>>> key => partitionId
>>> value => {adjacencyVertexList, vertexId}
>>> This will be stored as the output of Job 1.
>>>
>>> Job 2:
>>> This job will read the output generated by the previous job and use an
>>> identity mapper, so in the reducer we will have:
>>> key => partitionId
>>> value => list of all the adjacency vertex lists along with their vertexIds
>>>
>>> I know my explanation seems a bit messy, sorry for that.
>>>
>>> BR,
>>> Harshit
>>>
>>> On Wed, Jun 24, 2015 at 12:05 PM, Ravikant Dindokar <[email protected]> wrote:
>>>
>>>> Hi Hadoop user,
>>>>
>>>> I want to use Hadoop to perform operations on graph data.
>>>> I have two files:
>>>>
>>>> 1. Edge list file
>>>> This file contains one line for each edge in the graph.
>>>> Sample:
>>>> 1 2 (here 1 is the source and 2 is the sink node of the edge)
>>>> 1 5
>>>> 2 3
>>>> 4 2
>>>> 4 3
>>>> 5 6
>>>> 5 4
>>>> 5 7
>>>> 7 8
>>>> 8 9
>>>> 8 10
>>>>
>>>> 2. Partition file
>>>> This file contains one line for each vertex. Each line has two
>>>> values: the first number is <vertex id> and the second is <partition id>.
>>>> Sample: <vertex id> <partition id>
>>>> 2 1
>>>> 3 1
>>>> 4 1
>>>> 5 2
>>>> 6 2
>>>> 7 2
>>>> 8 1
>>>> 9 1
>>>> 10 1
>>>>
>>>> The edge list file is 32 GB, while the partition file is 10 GB.
>>>> (The sizes are so large that a map/reduce task can read only the
>>>> partition file. I have a 20-node cluster with 24 GB of memory per node.)
>>>>
>>>> My aim is to get all vertices (along with their adjacency lists) that
>>>> have the same partition id into one reducer, so that I can perform
>>>> further analytics on a given partition in that reducer.
>>>>
>>>> Is there any way in Hadoop to join these two files in the mapper so
>>>> that I can map based on the partition id?
>>>>
>>>> Thanks
>>>> Ravikant
>>>
>>> --
>>> Harshit Mathur

--
Harshit Mathur
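For completeness, here is a rough sketch of how the Job 1 mappers and reducer described above could look, assuming the GraphCustomObject sketch from earlier in this mail and plain text input files. In the driver, both mappers would be attached to their input paths with MultipleInputs and the map output value class set to GraphCustomObject. The class names and parsing here are my own assumptions, not tested code:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class PartitionJoin {

    // Reads "src dst" lines from the edge file and emits one neighbour per record.
    public static class EdgeFileMapper
            extends Mapper<LongWritable, Text, LongWritable, GraphCustomObject> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().trim().split("\\s+");
            GraphCustomObject value = new GraphCustomObject();
            value.setType(0);                                 // 0 = edge file record
            value.getAdjacencyList().add(Long.parseLong(parts[1]));
            ctx.write(new LongWritable(Long.parseLong(parts[0])), value);
        }
    }

    // Reads "vertexId partitionId" lines from the partition file.
    public static class PartitionFileMapper
            extends Mapper<LongWritable, Text, LongWritable, GraphCustomObject> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().trim().split("\\s+");
            GraphCustomObject value = new GraphCustomObject();
            value.setType(1);                                 // 1 = partition file record
            value.setPartitionId(Long.parseLong(parts[1]));
            ctx.write(new LongWritable(Long.parseLong(parts[0])), value);
        }
    }

    // Joins the two record types for each vertex and re-keys the output by
    // partition id, so Job 2 can group whole partitions in a single reduce call.
    public static class JoinReducer
            extends Reducer<LongWritable, GraphCustomObject, LongWritable, Text> {
        @Override
        protected void reduce(LongWritable vertexId,
                              Iterable<GraphCustomObject> values, Context ctx)
                throws IOException, InterruptedException {
            long partitionId = 0;
            StringBuilder adjacency = new StringBuilder();
            for (GraphCustomObject v : values) {
                if (v.getType() == 1) {
                    partitionId = v.getPartitionId();
                } else {
                    for (long neighbour : v.getAdjacencyList()) {
                        adjacency.append(neighbour).append(' ');
                    }
                }
            }
            ctx.write(new LongWritable(partitionId),
                      new Text(vertexId.get() + "\t" + adjacency.toString().trim()));
        }
    }
}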
