[jira] [Commented] (GIRAPH-46) Race condition on superstep 1 with RPC servers not started by the time that requests are sent

2011-10-01 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118871#comment-13118871 ]

Hudson commented on GIRAPH-46:
--

Integrated in Giraph-trunk-Commit #12 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/12/])
GIRAPH-46: Race condition on superstep 1 with RPC servers not started
by the time that requests are sent. (aching)

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178065
Files : 
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java


> Race condition on superstep 1 with RPC servers not started by the time that 
> requests are sent
> -
>
> Key: GIRAPH-46
> URL: https://issues.apache.org/jira/browse/GIRAPH-46
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.70.0
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Fix For: 0.70.0
>
> Attachments: diff.txt
>
>
> Hi,
> Occasionally (maybe one time in four), my Giraph job fails with the
> RuntimeException below. According to the code, it should never happen:
> if (msgMap == null) {
>     // should never happen after constructor
>     throw new RuntimeException(
>         "sendMessage: msgMap did not exist for " + addr +
>         " for vertex " + destVertex);
> }
> This happens during superstep 1 (the second superstep). My application
> actually *adds* edges in superstep 1 (to make every out-edge also an
> in-edge of the destination), but since I am running on only 3 workers,
> I would be surprised if any worker had not been registered in the RPC
> layer initially.
> One hypothesis is that Hadoop does something unexpected, because one of
> my servers was under heavy load. Could Hadoop have launched another
> worker to replace a slow one? Can that happen?
> java.lang.RuntimeException: sendMessage: msgMap did not exist for [hostname].ml.cmu.edu:30003 for vertex 875713
> at org.apache.giraph.comm.BasicRPCCommunications.sendMessageReq(BasicRPCCommunications.java:825)
> at org.apache.giraph.graph.BasicVertex.sendMsg(BasicVertex.java:179)
> at edu.cmu.selectlab.BP.BinaryBPVertex.compute(BinaryBPVertex.java:94)
> at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:624)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
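The failure reported above, where a sender looks up a per-peer message map that has not been created yet, can be illustrated with a small hypothetical Java sketch. The actual fix in r1178065 changed GraphMapper.java and is not reproduced here. PeerRegistry, registerPeer, awaitAllPeers and pendingCount are invented names, not Giraph APIs, and the CountDownLatch barrier is only one assumed way to keep a superstep from sending messages before every peer's RPC server has registered.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Giraph code): per-peer message maps are created
 * only when a peer's RPC server registers, so sending before registration
 * reproduces the "msgMap did not exist" failure. A latch lets a worker wait
 * until all expected peers are registered before it starts sending.
 */
class PeerRegistry {
    // Peer address -> queued outgoing messages for that peer.
    private final Map<String, Queue<String>> msgMaps = new ConcurrentHashMap<>();
    // Counts down once per distinct peer that registers.
    private final CountDownLatch allRegistered;

    PeerRegistry(int expectedPeers) {
        this.allRegistered = new CountDownLatch(expectedPeers);
    }

    /** Assumed to be called once a peer's RPC server is confirmed up. */
    void registerPeer(String addr) {
        // putIfAbsent returns null only for the first registration, so a
        // duplicate registration does not count down the latch twice.
        if (msgMaps.putIfAbsent(addr, new ConcurrentLinkedQueue<>()) == null) {
            allRegistered.countDown();
        }
    }

    /** Waits for every expected peer; returns false on timeout or interrupt. */
    boolean awaitAllPeers(long timeout, TimeUnit unit) {
        try {
            return allRegistered.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    /** Mirrors the check quoted in the report: unknown peer means failure. */
    void sendMessage(String addr, long destVertex, String msg) {
        Queue<String> msgMap = msgMaps.get(addr);
        if (msgMap == null) {
            throw new RuntimeException("sendMessage: msgMap did not exist for "
                + addr + " for vertex " + destVertex);
        }
        msgMap.add(destVertex + ":" + msg);
    }

    /** Number of messages queued for a peer (0 if it never registered). */
    int pendingCount(String addr) {
        Queue<String> q = msgMaps.get(addr);
        return q == null ? 0 : q.size();
    }
}
```

With two expected peers, a send to a peer that has not registered yet throws the same RuntimeException as in the report; once both peers have registered, awaitAllPeers returns true and the send is queued. Waiting on a barrier trades a short startup delay for never observing a half-built peer map.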

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-46) Race condition on superstep 1 with RPC servers not started by the time that requests are sent

2011-10-01 Thread Jakob Homan (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118865#comment-13118865 ]

Jakob Homan commented on GIRAPH-46:
---

+1
