But in the reducer for Job 1, you have:

vertexId => {adjacencyVertex complete list, partitionId=0}

so the partition IDs of the vertices in the adjacency list are not available. Essentially, what I am trying to get as output is <vertex_id, partitionId>, <list>, where each element of the list is of type <vertex_id, partitionId>. Can this be achieved in a single map-reduce job?

Thanks,
Ravikant

On Thu, Jun 25, 2015 at 9:25 AM, Harshit Mathur wrote:
> Yeah, you can store it as well in your custom object, like you are storing
> the adjacency list.
>
> On Wed, Jun 24, 2015 at 10:10 PM, Ravikant Dindokar wrote:
>
>> Hi Harshit,
>>
>> Is there any way to retain the partition ID for each vertex in the
>> adjacency list?
>>
>> Thanks
>> Ravikant
>>
>> On Wed, Jun 24, 2015 at 7:55 PM, Ravikant Dindokar wrote:
>>
>>> Thanks Harshit
>>>
>>> On Wed, Jun 24, 2015 at 5:35 PM, Harshit Mathur wrote:
>>>
>>>> Hi,
>>>>
>>>> This may be the solution (I hope I understood the problem correctly).
>>>>
>>>> Job 1:
>>>>
>>>> You need to have two Mappers, one reading from the Edge File and the other
>>>> reading from the Partition File.
>>>> Say, EdgeFileMapper and PartitionFileMapper, and a common Reducer.
>>>> Now you can have a custom writable (say GraphCustomObject) holding the
>>>> following:
>>>> 1) type: indicates which mapper the object came from
>>>> 2) adjacency vertex list: list of adjacent vertices
>>>> 3) partition ID: holds the partition ID
>>>>
>>>> Now the output key and value of the EdgeFileMapper will be:
>>>> key => vertexId
>>>> value => {type=edgefile; adjacencyVertex, partitionId=0} (the partition ID is not
>>>> present in this file)
>>>>
>>>> The output of the PartitionFileMapper will be:
>>>> key => vertexId
>>>> value => {type=partitionfile; adjacencyVertex=0, partitionId}
>>>>
>>>> So in the Reducer, for each vertexId we can have the complete
>>>> GraphCustomObject populated:
>>>> vertexId => {adjacencyVertex complete list, partitionId=0}
>>>>
>>>> The output of this reducer will be:
>>>> key => partitionId
>>>> value => {adjacencyVertexList, vertexId}
>>>> This will be stored as the output of Job 1.
>>>>
>>>> Job 2:
>>>> This job will read the output generated by the previous job and use an
>>>> identity Mapper, so in the reducer we will have:
>>>> key => partitionId
>>>> value => list of all the adjacency vertex lists, along with their vertexIds
>>>>
>>>> I know my explanation seems a bit messy, sorry for that.
>>>>
>>>> BR,
>>>> Harshit
>>>>
>>>>
>>>> On Wed, Jun 24, 2015 at 12:05 PM, Ravikant Dindokar wrote:
>>>>
>>>>> Hi Hadoop users,
>>>>>
>>>>> I want to use Hadoop to perform operations on graph data.
>>>>> I have two files:
>>>>>
>>>>> 1. Edge list file
>>>>> This file contains one line for each edge in the graph.
>>>>> Sample:
>>>>> 1 2 (here 1 is the source and 2 is the sink node of the edge)
>>>>> 1 5
>>>>> 2 3
>>>>> 4 2
>>>>> 4 3
>>>>> 5 6
>>>>> 5 4
>>>>> 5 7
>>>>> 7 8
>>>>> 8 9
>>>>> 8 10
>>>>>
>>>>> 2. Partition file
>>>>> This file contains one line for each vertex. Each line has
>>>>> two values: the first number is the <vertex id> and the second is the <partition id>.
>>>>> Sample: <vertex id> <partition id>
>>>>> 2 1
>>>>> 3 1
>>>>> 4 1
>>>>> 5 2
>>>>> 6 2
>>>>> 7 2
>>>>> 8 1
>>>>> 9 1
>>>>> 10 1
>>>>>
>>>>> The edge list file is 32 GB, while the partition file is 10 GB.
>>>>> (The size is so large that map/reduce can read only the partition file. I
>>>>> have a 20-node cluster with 24 GB of memory per node.)
>>>>>
>>>>> My aim is to get all vertices (along with their adjacency lists) that
>>>>> have the same partition ID into one reducer, so that I can perform further
>>>>> analytics on a given partition in the reducer.
>>>>>
>>>>> Is there any way in Hadoop to join these two files in the mapper,
>>>>> so that I can map based on the partition ID?
>>>>>
>>>>> Thanks
>>>>> Ravikant
>>>>>
>>>>
>>>> --
>>>> Harshit Mathur
>>>>
>>>
>>
>
> --
> Harshit Mathur
>
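
The reduce-side join Harshit describes for Job 1 can be sketched as a local, in-memory Java simulation. This is not Hadoop API code: the map, shuffle and reduce phases are plain methods so the data flow can be run on the sample files. `GraphRecord` is a hypothetical stand-in for the GraphCustomObject writable, and `UNKNOWN_PARTITION = -1` is an assumed marker for vertices with no line in the partition file (vertex 1 in the sample). In a real job the two mappers would typically be attached to their input paths with `MultipleInputs`, and the order of values reaching the reducer would not be guaranteed, so the reducer below does not rely on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local simulation of Job 1 (reduce-side join on vertexId). Not Hadoop code.
public class Job1JoinSketch {

    /** Which mapper produced a record; mirrors the "type" field of GraphCustomObject. */
    public enum Source { EDGE_FILE, PARTITION_FILE }

    /** Hypothetical stand-in for the GraphCustomObject writable. */
    public record GraphRecord(Source type, List<Long> adjacency, int partitionId) {}

    /** One row of Job 1 output: key = partitionId, value = {vertexId, adjacency list}. */
    public record Job1Output(int partitionId, long vertexId, List<Long> adjacency) {}

    /** Assumed sentinel for vertices that have no line in the partition file. */
    public static final int UNKNOWN_PARTITION = -1;

    /** Shuffle stand-in: groups map output by key (vertexId), sorted like Hadoop keys. */
    static final class Shuffle {
        final Map<Long, List<GraphRecord>> groups = new TreeMap<>();

        void emit(long key, GraphRecord value) {
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
    }

    /** EdgeFileMapper: "src sink" -> (src, {EDGE_FILE, [sink], 0}). */
    static void edgeFileMap(String line, Shuffle out) {
        String[] f = line.trim().split("\\s+");
        out.emit(Long.parseLong(f[0]),
                new GraphRecord(Source.EDGE_FILE, List.of(Long.parseLong(f[1])), 0));
    }

    /** PartitionFileMapper: "vertex partition" -> (vertex, {PARTITION_FILE, [], partition}). */
    static void partitionFileMap(String line, Shuffle out) {
        String[] f = line.trim().split("\\s+");
        out.emit(Long.parseLong(f[0]),
                new GraphRecord(Source.PARTITION_FILE, List.of(), Integer.parseInt(f[1])));
    }

    /** Common reducer: merges every record of one vertex, in whatever order they arrive. */
    static Job1Output reduce(long vertexId, List<GraphRecord> values) {
        List<Long> adjacency = new ArrayList<>();
        int partitionId = UNKNOWN_PARTITION;
        for (GraphRecord r : values) {
            if (r.type() == Source.EDGE_FILE) {
                adjacency.addAll(r.adjacency());   // one neighbour per edge-file record
            } else {
                partitionId = r.partitionId();     // taken from the partition-file record
            }
        }
        return new Job1Output(partitionId, vertexId, adjacency);
    }

    /** Runs map, shuffle and reduce over in-memory copies of both input files. */
    public static List<Job1Output> runJob1(List<String> edgeLines, List<String> partitionLines) {
        Shuffle shuffle = new Shuffle();
        edgeLines.forEach(l -> edgeFileMap(l, shuffle));
        partitionLines.forEach(l -> partitionFileMap(l, shuffle));
        List<Job1Output> result = new ArrayList<>();
        shuffle.groups.forEach((v, vals) -> result.add(reduce(v, vals)));
        return result;
    }

    public static void main(String[] args) {
        List<String> edges = List.of("1 2", "1 5", "2 3", "4 2", "4 3", "5 6",
                "5 4", "5 7", "7 8", "8 9", "8 10");
        List<String> partitions = List.of("2 1", "3 1", "4 1", "5 2", "6 2",
                "7 2", "8 1", "9 1", "10 1");
        runJob1(edges, partitions).forEach(System.out::println);
    }
}
```

With the sample files, vertex 5 comes out as partition 2 with adjacency [6, 4, 7]. Each neighbour is still only a bare vertex ID, which is the gap raised at the top of the thread.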

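Job 2 (identity map, then grouping on partitionId) can be sketched the same way. Again this is a plain-Java simulation rather than Hadoop code. `VertexRow` mirrors one row of Job 1 output, and `PartitionSummary` (vertex, edge and internal-edge counts) is a hypothetical placeholder for the "further analytics" mentioned in the original question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Local simulation of Job 2: identity map, shuffle on partitionId, one reduce call per partition.
public class Job2GroupSketch {

    /** One row of Job 1 output: key = partitionId, value = {vertexId, adjacency list}. */
    public record VertexRow(int partitionId, long vertexId, List<Long> adjacency) {}

    /** Hypothetical per-partition result standing in for "further analytics". */
    public record PartitionSummary(int partitionId, int vertices, int edges, int internalEdges) {}

    /** Identity mapper + shuffle: rows with the same partitionId land in one group. */
    public static Map<Integer, List<VertexRow>> shuffleByPartition(List<VertexRow> rows) {
        Map<Integer, List<VertexRow>> groups = new TreeMap<>();
        for (VertexRow row : rows) {
            groups.computeIfAbsent(row.partitionId(), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    /**
     * Reducer for one partition. It sees every vertex of the partition, so it can tell
     * whether a neighbour belongs to the same partition; for any other neighbour it only
     * has the bare vertex ID, not that neighbour's partition ID.
     */
    public static PartitionSummary reduce(int partitionId, List<VertexRow> rows) {
        Set<Long> members = new HashSet<>();
        for (VertexRow r : rows) {
            members.add(r.vertexId());
        }
        int edges = 0;
        int internal = 0;
        for (VertexRow r : rows) {
            for (long sink : r.adjacency()) {
                edges++;
                if (members.contains(sink)) {
                    internal++;   // sink is in this partition
                }
            }
        }
        return new PartitionSummary(partitionId, rows.size(), edges, internal);
    }

    public static List<PartitionSummary> runJob2(List<VertexRow> job1Output) {
        List<PartitionSummary> out = new ArrayList<>();
        shuffleByPartition(job1Output).forEach((p, rows) -> out.add(reduce(p, rows)));
        return out;
    }

    public static void main(String[] args) {
        // Job 1 output for the sample graph; vertex 1 is left out because it has no partition line.
        List<VertexRow> rows = List.of(
                new VertexRow(1, 2, List.of(3L)), new VertexRow(1, 3, List.of()),
                new VertexRow(1, 4, List.of(2L, 3L)), new VertexRow(2, 5, List.of(6L, 4L, 7L)),
                new VertexRow(2, 6, List.of()), new VertexRow(2, 7, List.of(8L)),
                new VertexRow(1, 8, List.of(9L, 10L)), new VertexRow(1, 9, List.of()),
                new VertexRow(1, 10, List.of()));
        runJob2(rows).forEach(System.out::println);
    }
}
```

For the sample data, partition 1 receives vertices 2, 3, 4, 8, 9 and 10, and partition 2 receives 5, 6 and 7. In partition 2 the edges 5 -> 4 and 7 -> 8 leave the partition, and the reducer sees only the IDs 4 and 8, not their partition IDs.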