I would appreciate it if you could lend me your assistance with another
problem of mine.
I have an implementation of the TriangleCounting algorithm that runs
correctly on the smaller dataset I used to test ConnectedComponents, but it
fails on the larger dataset.
The map task seems to fail and I do not know why. The full output is below.
14/04/17 16:12:30 INFO
job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
writeHaltInstructions: To halt after next superstep execute:
'bin/halt-application --zkServer ricotta.eecs.qmul.ac.uk:2181 --zkNode
/_hadoopBsp/job_1381849812331_2770/_haltComputation'
14/04/17 16:12:31 INFO mapreduce.Job: Running job: job_1381849812331_2770
14/04/17 16:12:31 INFO job.JobProgressTracker: Data from 3 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 2 - 141.96MB,
average 142.66MB
14/04/17 16:12:32 INFO mapreduce.Job: Job job_1381849812331_2770 running in
uber mode : false
14/04/17 16:12:32 INFO mapreduce.Job: map 100% reduce 0%
14/04/17 16:12:36 INFO job.JobProgressTracker: Data from 3 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 2 - 141.96MB,
average 142.66MB
14/04/17 16:12:41 INFO job.JobProgressTracker: Data from 1 workers -
Compute superstep 1: 0 out of 378222 vertices computed; 0 out of 3
partitions computed; min free memory on worker 2 - 24.77MB, average 103.6MB
14/04/17 16:12:46 INFO job.JobProgressTracker: Data from 3 workers -
Compute superstep 1: 0 out of 1134723 vertices computed; 0 out of 9
partitions computed; min free memory on worker 1 - 22.5MB, average 23.36MB
14/04/17 16:12:48 INFO mapreduce.Job: Job job_1381849812331_2770 failed
with state FAILED due to: Task failed task_1381849812331_2770_m_000002
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/04/17 16:12:48 INFO mapreduce.Job: Counters: 46
  File System Counters
    FILE: Number of bytes read=0
    FILE: Number of bytes written=143668
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=37028489
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=3
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=0
  Job Counters
    Failed map tasks=1
    Launched map tasks=3
    Other local map tasks=3
    Total time spent by all maps in occupied slots (ms)=24219
    Total time spent by all reduces in occupied slots (ms)=0
  Map-Reduce Framework
    Map input records=2
    Map output records=0
    Input split bytes=88
    Spilled Records=0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=22209
    CPU time spent (ms)=77200
    Physical memory (bytes) snapshot=659660800
    Virtual memory (bytes) snapshot=1657229312
    Total committed heap usage (bytes)=372899840
  Giraph Stats
    Aggregate edges=0
    Aggregate finished vertices=0
    Aggregate sent message message bytes=0
    Aggregate sent messages=0
    Aggregate vertices=0
    Current master task partition=0
    Current workers=0
    Last checkpointed superstep=0
    Sent message bytes=0
    Sent messages=0
    Superstep=0
  Giraph Timers
    Initialize (ms)=0
    Setup (ms)=0
    Shutdown (ms)=0
    Total (ms)=0
  Zookeeper base path
    /_hadoopBsp/job_1381849812331_2770=0
  Zookeeper halt node
    /_hadoopBsp/job_1381849812331_2770/_haltComputation=0
  Zookeeper server:port
    ricotta.eecs.qmul.ac.uk:2181=0
  File Input Format Counters
    Bytes Read=0
  File Output Format Counters
    Bytes Written=0
Thanks,
Ghufran
On Thu, Apr 17, 2014 at 4:21 PM, ghufran malik <[email protected]> wrote:
> Oh, whoops! Yes, I meant I changed it to an undirected format!
>
>
> On Thu, Apr 17, 2014 at 4:11 PM, ghufran malik <[email protected]> wrote:
>
>> Hi Jae,
>>
>> Thanks so much for pointing out that it wasn't directed. I made the
>> changes and made a directed graph and connected components now works :)
>>
>> Thanks,
>> Ghufran
>>
>>
>> On Wed, Apr 16, 2014 at 7:31 PM, Yu, Jaewook <[email protected]> wrote:
>>
>>> Ghufran,
>>>
>>>
>>>
>>> The Youtube community dataset (com-youtube.ungraph.txt.gz,
>>> https://snap.stanford.edu/data/bigdata/communities/com-youtube.ungraph.txt.gz)
>>> [1] is formatted as a directed graph, although the description says it's an
>>> undirected graph. With some minor changes in your conversion program, you
>>> should be able to generate a proper undirected adjacency list.
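>>>
>>> For example, here is a minimal sketch of such a converter in plain Java.
>>> It assumes the raw SNAP file has one whitespace-separated
>>> source/destination pair per line with "#" comment lines; the class name
>>> is illustrative, not from your code:
>>>
>>> import java.io.*;
>>> import java.util.*;
>>>
>>> // Reads a directed edge list, adds each edge in both directions, and
>>> // writes "id value neighbour..." lines (value = id), the layout shown
>>> // in your IntIntNullTextVertexInputFormat snippet.
>>> public class UndirectedConverter {
>>>     public static void main(String[] args) throws IOException {
>>>         Map<Integer, TreeSet<Integer>> adj =
>>>                 new TreeMap<Integer, TreeSet<Integer>>();
>>>         BufferedReader in = new BufferedReader(new FileReader(args[0]));
>>>         String line;
>>>         while ((line = in.readLine()) != null) {
>>>             line = line.trim();
>>>             if (line.isEmpty() || line.startsWith("#")) continue; // SNAP comments
>>>             String[] p = line.split("\\s+");
>>>             int src = Integer.parseInt(p[0]);
>>>             int dst = Integer.parseInt(p[1]);
>>>             neighbours(adj, src).add(dst);
>>>             neighbours(adj, dst).add(src); // reverse edge makes it undirected
>>>         }
>>>         in.close();
>>>         PrintWriter out = new PrintWriter(new FileWriter(args[1]));
>>>         for (Map.Entry<Integer, TreeSet<Integer>> e : adj.entrySet()) {
>>>             StringBuilder sb = new StringBuilder();
>>>             sb.append(e.getKey()).append(' ').append(e.getKey()); // value = id
>>>             for (int n : e.getValue()) sb.append(' ').append(n);
>>>             out.println(sb);
>>>         }
>>>         out.close();
>>>     }
>>>
>>>     private static TreeSet<Integer> neighbours(
>>>             Map<Integer, TreeSet<Integer>> adj, int id) {
>>>         TreeSet<Integer> s = adj.get(id);
>>>         if (s == null) { s = new TreeSet<Integer>(); adj.put(id, s); }
>>>         return s;
>>>     }
>>> }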
>>>
>>>
>>>
>>> Hope this will help.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Jae
>>>
>>>
>>>
>>> [1] https://snap.stanford.edu/data/com-Youtube.html
>>>
>>>
>>>
>>> From: Yu, Jaewook [mailto:[email protected]]
>>> Sent: Wednesday, April 16, 2014 11:00 AM
>>> To: [email protected]
>>> Subject: RE: Running ConnectedComponents in a cluster.
>>>
>>>
>>>
>>> Hi Ghufran,
>>>
>>>
>>>
>>> Have you verified that the neighbors of each vertex actually exist? For
>>> example, in the adjacency-list line 278447 278447 532613, is the
>>> neighbor’s vertex id 532613 valid? A quick check like the one sketched
>>> below would tell you.
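>>>
>>> A minimal sketch in plain Java, assuming the adjacency-list format from
>>> your snippet (id, value, then neighbours); the class name is illustrative:
>>>
>>> import java.io.*;
>>> import java.util.*;
>>>
>>> // Flags every neighbour id that never appears as a vertex id.
>>> public class NeighbourCheck {
>>>     public static void main(String[] args) throws IOException {
>>>         Set<Integer> ids = new HashSet<Integer>();
>>>         List<int[]> refs = new ArrayList<int[]>(); // {vertex, neighbour}
>>>         BufferedReader in = new BufferedReader(new FileReader(args[0]));
>>>         String line;
>>>         while ((line = in.readLine()) != null) {
>>>             line = line.trim();
>>>             if (line.isEmpty()) continue;
>>>             String[] p = line.split("\\s+");
>>>             int id = Integer.parseInt(p[0]);
>>>             ids.add(id);
>>>             // p[1] is the vertex value; neighbours start at p[2]
>>>             for (int i = 2; i < p.length; i++)
>>>                 refs.add(new int[] { id, Integer.parseInt(p[i]) });
>>>         }
>>>         in.close();
>>>         for (int[] r : refs)
>>>             if (!ids.contains(r[1]))
>>>                 System.out.println(r[0] + " references missing neighbour " + r[1]);
>>>     }
>>> }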
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Jae
>>>
>>>
>>>
>>>
>>>
>>> From: ghufran malik [mailto:[email protected]]
>>> Sent: Wednesday, April 16, 2014 9:22 AM
>>> To: [email protected]
>>> Subject: Running ConnectedComponents in a cluster.
>>>
>>>
>>>
>>> Hi,
>>>
>>> I have set up Giraph on my university cluster (Giraph
>>> 1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.3.1). I've successfully run the
>>> connected components algorithm on a very small test dataset using 4
>>> workers, and it produced the expected output.
>>>
>>>
>>> dataset:
>>>
>>> vertex id, vertex value, neighbours....
>>>
>>> 0 0 1
>>> 1 1 0 2 3
>>> 2 2 1 3
>>> 3 3 1 2
>>>
>>> output:
>>> 1 0
>>> 0 0
>>> 3 0
>>> 2 0
>>>
>>>
>>>
>>> However, when I tried to run this algorithm on a larger dataset (a
>>> reformatted version of com-youtube.ungraph from Stanford SNAP, converted
>>> to match IntIntNullTextVertexInputFormat), it completes successfully but
>>> produces incorrect output. It seems to just output each vertex id with
>>> its original value (I set each vertex's value to its own id).
>>>
>>> A snippet of the dataset is provided:
>>>
>>> vertex id, vertex value, neighbours....
>>> .......
>>> 278447 278447 532613
>>> 278449 278449 305447 324115 414238
>>> 83899 83899 153460 172614 176613 211448
>>> 773749 773749 845366
>>> 773748 773748 960388
>>> .......
>>>
>>> output produced:
>>> .............
>>> 73132 73132
>>> 831308 831308
>>> 199788 199788
>>> 763644 763644
>>> 300572 300572
>>> .............
>>>
>>> Not one vertex value differs from its original vertex id.
>>>
>>> The computation also stops after superstep 0 and goes no further,
>>> whereas on my smaller dataset it completes 3 supersteps.
>>>
>>> Does anyone have any idea as to why this is?
>>>
>>> Kind regards,
>>>
>>> Ghufran
>>>
>>>
>>>
>>
>>
>