Hi Jae,

Thanks so much for pointing out that it wasn't directed. I made the changes and made a directed graph and connected components now works :)

Thanks,
Ghufran

On Wed, Apr 16, 2014 at 7:31 PM, Yu, Jaewook <[email protected]> wrote:

> Ghufran,
>
> The Youtube community dataset (com-youtube.ungraph.txt.gz
> <https://snap.stanford.edu/data/bigdata/communities/com-youtube.ungraph.txt.gz>)
> [1] is formatted as directed graph although the description says it’s
> undirected graph. With some minor changes in your conversion program, you
> should be able to generate a proper undirected adjacency list.
>
> Hope this will help.
>
> Thanks,
> Jae
>
> [1] https://snap.stanford.edu/data/com-Youtube.html
>
> *From:* Yu, Jaewook [mailto:[email protected]]
> *Sent:* Wednesday, April 16, 2014 11:00 AM
> *To:* [email protected]
> *Subject:* RE: Running ConnectedComponents in a cluster.
>
> Hi Ghufran,
>
> Have you verified the neighbors of each vertex actually exist? From your
> adjacency list, for example, 278447 278447 532613, is the neighbor’s vertex
> id 532613 valid?
>
> Thanks,
> Jae
>
> *From:* ghufran malik [mailto:[email protected]]
> *Sent:* Wednesday, April 16, 2014 9:22 AM
> *To:* [email protected]
> *Subject:* Running ConnectedComponents in a cluster.
>
> Hi,
>
> I have set up Giraph on my university cluster of computers (Giraph
> 1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.3.1). I've successfully run the
> connected components algorithm on a very small test dataset using 4 workers
> and it produced the expected output.
>
> dataset:
>
> vertex id, vertex value, neighbours....
>
> 0 0 1
> 1 1 0 2 3
> 2 2 1 3
> 3 3 1 2
>
> output:
>
> 1 0
> 0 0
> 3 0
> 2 0
>
> However, when I tried to run this algorithm on a larger dataset (a
> reformatted version of com-youtube.ungraph from Stanford SNAP to match the
> IntIntNullTextVertexInputFormat), it successfully completes but incorrect
> output is produced. It seems to just output the vertex id with its original
> value (its vertex id is its original value that I set).
>
> A snippet of the dataset is provided:
>
> vertex id, vertex value, neighbours....
>
> .......
> 278447 278447 532613
> 278449 278449 305447 324115 414238
> 83899 83899 153460 172614 176613 211448
> 773749 773749 845366
> 773748 773748 960388
> .......
>
> output produced:
>
> .............
> 73132 73132
> 831308 831308
> 199788 199788
> 763644 763644
> 300572 300572
> .............
>
> There's not one vertex value that isn't the same as its original vertex
> ID.
>
> The computation also stops after superstep 0 is done and goes no further,
> whereas my smaller dataset completes 3 supersteps.
>
> Does anyone have an idea as to why this is?
>
> Kind regards,
>
> Ghufran
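
For anyone hitting the same problem, here is a minimal sketch of the kind of conversion discussed in this thread. It is not the actual conversion program used above. It assumes the input is an edge list with one whitespace-separated "from to" pair per line and "#" comment lines, and that each edge appears only once, so both directions have to be emitted. The function names (edges_to_adjacency, format_vertex_lines, dangling_neighbours) are made up for illustration. The output follows the "vertex id, vertex value, neighbours" layout shown in the thread, with the vertex value initialised to the vertex id.

```python
from collections import defaultdict


def edges_to_adjacency(edge_lines):
    """Build a symmetric adjacency map from 'u v' edge lines.

    Assumption: blank lines and lines starting with '#' are not edges.
    """
    adj = defaultdict(set)
    for line in edge_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = (int(tok) for tok in line.split()[:2])
        # Record the edge in both directions so that every neighbour
        # also exists as a vertex with its own adjacency line.
        adj[u].add(v)
        adj[v].add(u)
    return adj


def format_vertex_lines(adj):
    """Return 'id value neighbour...' lines, with value set to id."""
    lines = []
    for vid in sorted(adj):
        fields = [vid, vid] + sorted(adj[vid])
        lines.append(" ".join(str(f) for f in fields))
    return lines


def dangling_neighbours(vertex_lines):
    """Return neighbour ids that never appear as a vertex id themselves."""
    declared, referenced = set(), set()
    for line in vertex_lines:
        tokens = [int(tok) for tok in line.split()]
        if not tokens:
            continue
        declared.add(tokens[0])
        referenced.update(tokens[2:])  # tokens[1] is the vertex value
    return referenced - declared


if __name__ == "__main__":
    # Hypothetical edge list matching the small test graph from the thread.
    sample_edges = ["# from to", "0 1", "1 2", "1 3", "2 3"]
    for out_line in format_vertex_lines(edges_to_adjacency(sample_edges)):
        print(out_line)
```

Running dangling_neighbours over the converted file before submitting the job is a quick way to do the check Jae suggested: a non-empty result means some neighbour ids have no vertex line of their own.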
