Eric,

Can you please check it again? In both logs you attached, we are waiting on 
worker 13 to send data, so neither of them can be worker 13's log.

Maja

From: Eric Kimbrel <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 16, 2013 2:15 PM
To: "[email protected]" <[email protected]>
Subject: Re: Broadcast of large aggregated value is slow.

One of the attached logs is from worker 13. During this time period it is 
waiting for an aggregator request so that it can start the superstep.


Eric Kimbrel
Software Engineer I Data Fusion & Analytics
Sotera Defense Solutions, Inc.
o: 360-516-6621
c: 360-990-1873
e: [email protected]<mailto:[email protected]>
w: www.potomacfusion.com | www.soteradefense.com
Agility. Ingenuity. Integrity.


From: Maja Kabiljo <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 16, 2013 2:11 PM
To: "[email protected]" <[email protected]>
Subject: Re: Broadcast of large aggregated value is slow.
Resent-From: <[email protected]>

Eric,

Can you please take a look at the logs of one of the listed workers (13, 34, 
38, 50, 48, 52, 58, 56) and tell us what they are doing? A worker waiting on 
aggregators can have different causes; it doesn't necessarily mean that sending 
aggregators is slow. For example, it can mean that some workers finished 
computing before others and are now waiting for the others to finish and send 
their data.
How big are the aggregators you are using?
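A minimal way to see why a long wait at the barrier does not by itself prove that sending is slow: model the barrier as releasing only when the last expected task delivers its data. The class name `PermitsBarrierModel`, the task ids 1-4, and the arrival times are hypothetical and purely illustrative; this is a toy model, not Giraph's `TaskIdsPermitsBarrier` implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical model of a permits barrier: a waiting worker may proceed only
 * after every expected task has delivered its data. Illustrative only; this is
 * not Giraph's TaskIdsPermitsBarrier.
 */
public class PermitsBarrierModel {

    /** Task ids whose data has not arrived by time t (seconds since superstep start). */
    static Set<Integer> outstandingAt(Map<Integer, Double> arrivalSecs, double t) {
        Set<Integer> waiting = new TreeSet<>();
        for (Map.Entry<Integer, Double> e : arrivalSecs.entrySet()) {
            if (e.getValue() > t) {
                waiting.add(e.getKey());
            }
        }
        return waiting;
    }

    /** The barrier releases when the slowest task's data arrives. */
    static double releaseTime(Map<Integer, Double> arrivalSecs) {
        double latest = 0.0;
        for (double s : arrivalSecs.values()) {
            latest = Math.max(latest, s);
        }
        return latest;
    }

    public static void main(String[] args) {
        // Made-up arrival times: tasks 1-3 deliver within seconds, task 4 delivers late.
        // The arrival time alone does not say whether task 4 was slow to send
        // or still busy computing.
        Map<Integer, Double> arrivals =
            new TreeMap<>(Map.of(1, 11.0, 2, 12.0, 3, 14.0, 4, 600.0));
        System.out.println(outstandingAt(arrivals, 15.0)); // [4]
        System.out.println(releaseTime(arrivals));         // 600.0
    }
}
```

Whether task 4 is late because its transfer is slow or because it is still computing, the waiting worker sees the same thing: one id stuck in the outstanding set until it finally arrives. Only the outstanding task's own log can separate the two cases.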

Thanks,
Maja

From: Eric Kimbrel <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 16, 2013 2:00 PM
To: "[email protected]" <[email protected]>
Subject: Re: Broadcast of large aggregated value is slow.

From the attached logs in the original post, you can see that both workers use 
about 4 seconds of compute time in superstep 4, but they complete superstep 4 
about 10 minutes apart.



From: Eric Kimbrel <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 16, 2013 1:50 PM
To: "[email protected]" <[email protected]>
Subject: Broadcast of large aggregated value is slow.


I have a Giraph job in which the master reads a chunk of a file from HDFS and 
then uses an aggregator to broadcast the data to all vertices. No other 
messages are sent, and no vertices aggregate values; only the master does.

In the attached logs you can see that broadcasting the data to all vertices is 
slow and seems to hang somewhere. The majority of workers appear to receive the 
data within 10-15 seconds, but then nothing happens for around 10 minutes. A 
log snippet is shown below.

Is there a known reason why transmitting this data during the synchronization 
is taking so long, or anything that can be done to speed it up?


2013-05-16 11:09:03,041 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 30 more tasks to send their aggregator data
2013-05-16 11:09:14,444 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 10 more tasks to send their aggregator 
data, task ids: [13, 20, 22, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:09:25,190 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:09:45,191 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:05,191 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:15,192 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:35,193 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:55,193 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:05,194 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:25,195 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:45,196 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:12:05,196 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:12:15,197 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:12:35,198 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:12:55,198 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:13:05,199 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:13:25,200 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:13:45,201 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:14:05,201 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:14:15,202 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:14:35,203 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:14:55,204 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:15:15,205 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:15:35,205 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:15:45,206 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:16:05,207 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:16:25,208 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:16:45,208 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:16:55,209 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:17:15,210 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:17:35,210 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:17:45,211 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:18:05,212 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:18:19,841 INFO 
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window 
metrics MBytes/sec sent = 0, MBytes/sec received = 0.027, MBytesSent = 0.0006, 
MBytesReceived = 15.4028, ave sent req MBytes = 0, ave received req MBytes = 
0.0034, secs waited = 571.136
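As a rough check on the log above, the sketch below measures how long the barrier stayed at the same eight outstanding tasks and recomputes the average receive rate from the `RequestDecoder` metrics (MBytesReceived = 15.4028, secs waited = 571.136). It assumes each log record has been rejoined onto a single line (the email wrapped them), and the class name `BarrierStallCheck` is hypothetical; it is a log-reading aid, not part of Giraph. Requires Java 9+ for `List.of`.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the barrier and RequestDecoder log lines. */
public class BarrierStallCheck {

    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Captures the timestamp and the "Waiting for N more tasks" count.
    // Assumes one log record per line.
    private static final Pattern WAITING =
        Pattern.compile("^(\\S+ \\S+) .*Waiting for (\\d+) more tasks");

    /** Seconds between the first and last line reporting exactly `count` outstanding tasks. */
    static double stallSeconds(List<String> lines, int count) {
        LocalDateTime first = null;
        LocalDateTime last = null;
        for (String line : lines) {
            Matcher m = WAITING.matcher(line);
            if (m.find() && Integer.parseInt(m.group(2)) == count) {
                LocalDateTime t = LocalDateTime.parse(m.group(1), TS);
                if (first == null) {
                    first = t;
                }
                last = t;
            }
        }
        return first == null ? 0.0 : Duration.between(first, last).toMillis() / 1000.0;
    }

    /** Average receive rate over the wait window, in MB/s. */
    static double avgRateMBps(double mbytesReceived, double secsWaited) {
        return mbytesReceived / secsWaited;
    }

    public static void main(String[] args) {
        String prefix = " INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: "
            + "waitForRequiredPermits: Waiting for ";
        List<String> lines = List.of(
            "2013-05-16 11:09:14,444" + prefix + "10 more tasks to send their aggregator data",
            "2013-05-16 11:09:25,190" + prefix + "8 more tasks to send their aggregator data",
            "2013-05-16 11:18:05,212" + prefix + "8 more tasks to send their aggregator data");
        System.out.printf("stalled on 8 tasks for %.3f s%n", stallSeconds(lines, 8));
        // Figures copied from the RequestDecoder line in the log.
        System.out.printf("avg receive rate %.4f MB/s%n", avgRateMBps(15.4028, 571.136));
    }
}
```

With the first and last eight-task timestamps from the excerpt, the stall comes out to about 520 s, and 15.4028 MB over 571.136 s is roughly 0.027 MB/s, matching the logged rate. A rate that low over a window that long suggests the worker spent most of the window idle rather than transferring, which is consistent with either a slow sender or a sender that had not yet finished its own work.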





