[https://issues.apache.org/jira/browse/GIRAPH-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118871#comment-13118871]
Hudson commented on GIRAPH-46:
--
Integrated in Giraph-trunk-Commit #12 (See
[https://builds.apache.org/job/Giraph-trunk-Commit/12/])
GIRAPH-46: Race condition on superstep 1 with RPC servers not started
by the time that requests are sent. (aching)
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178065
Files :
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
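The commit title describes an ordering problem: no worker should send superstep requests until every worker's RPC server has started. The patch itself (diff.txt, touching GraphMapper.java) is not reproduced here. What follows is only a minimal in-process sketch of that ordering. It assumes threads stand in for workers and a shared map of inboxes stands in for the started RPC servers. ServerStartBarrier, runSuperstep and the inbox map are hypothetical names, not Giraph APIs. Across separate map tasks, the in-process CountDownLatch would have to be replaced by a distributed barrier.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical sketch (not Giraph code): each worker thread first "starts its
 * RPC server" by registering an inbox, then waits on a latch until every
 * worker has done the same, and only then sends messages to its peers.
 */
class ServerStartBarrier {
  /** Runs numWorkers workers; each sends one message to every worker. Returns messages delivered. */
  static int runSuperstep(int numWorkers) {
    Map<Integer, ConcurrentLinkedQueue<Integer>> inboxes = new ConcurrentHashMap<>();
    CountDownLatch serversStarted = new CountDownLatch(numWorkers);
    Thread[] workers = new Thread[numWorkers];
    for (int id = 0; id < numWorkers; id++) {
      final int self = id;
      workers[id] = new Thread(() -> {
        // 1. "Start the server": register this worker's inbox, then signal readiness.
        inboxes.put(self, new ConcurrentLinkedQueue<>());
        serversStarted.countDown();
        try {
          // 2. Block until every worker has registered. Without this wait a fast
          //    worker could look up the inbox of a peer that has not started yet.
          serversStarted.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        // 3. Only now send one message to every worker (including itself).
        for (int peer = 0; peer < numWorkers; peer++) {
          ConcurrentLinkedQueue<Integer> inbox = inboxes.get(peer);
          if (inbox == null) {
            throw new IllegalStateException("no server registered for worker " + peer);
          }
          inbox.add(self);
        }
      });
      workers[id].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while joining workers", e);
      }
    }
    int delivered = 0;
    for (ConcurrentLinkedQueue<Integer> inbox : inboxes.values()) {
      delivered += inbox.size();
    }
    return delivered;
  }
}
```

With three workers, each worker sends one message to every worker, itself included, so runSuperstep(3) delivers 9 messages. Removing the await() call reopens a window in which a fast worker can find no inbox for a slower peer.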
> Race condition on superstep 1 with RPC servers not started by the time that
> requests are sent
> -
>
>                 Key: GIRAPH-46
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-46
>             Project: Giraph
>          Issue Type: Bug
>    Affects Versions: 0.70.0
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>            Priority: Minor
>             Fix For: 0.70.0
>
>         Attachments: diff.txt
>
>
> Hi,
> Occasionally (maybe one time in four), my Giraph run fails with the
> RuntimeException below. According to the code, it should never happen:
>
>   if (msgMap == null) {
>     // should never happen after constructor
>     throw new RuntimeException(
>         "sendMessage: msgMap did not exist for " + addr +
>         " for vertex " + destVertex);
>   }
>
> This happens during superstep 1 (the second superstep). My application actually
> *adds* edges on superstep 1 (to make every out-edge also an in-edge of the
> destination), but since I am running on only 3 workers, I would be surprised
> if any of them had not been registered in the RPC layer from the start.
> One hypothesis is that Hadoop does something funny, because one of my servers
> was under heavy load. Maybe Hadoop launched another worker to replace a slow
> worker? Can that happen?
> java.lang.RuntimeException: sendMessage: msgMap did not exist for [hostname].ml.cmu.edu:30003 for vertex 875713
>     at org.apache.giraph.comm.BasicRPCCommunications.sendMessageReq(BasicRPCCommunications.java:825)
>     at org.apache.giraph.graph.BasicVertex.sendMsg(BasicVertex.java:179)
>     at edu.cmu.selectlab.BP.BinaryBPVertex.compute(BinaryBPVertex.java:94)
>     at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:624)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
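The guard in the report only holds if a message buffer already exists for every destination address before the first send. Below is a minimal sketch of that assumption. The names MessageBuffers and pending are hypothetical, and the class simplifies rather than reproduces BasicRPCCommunications. Buffers are created in the constructor for the worker addresses known at that moment. A send to any address that was not yet known therefore fails with the same RuntimeException as in the stack trace above. Destination vertex ids stand in for real message payloads.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified stand-in for per-destination message buffers
 * (not Giraph's BasicRPCCommunications). Vertex ids stand in for messages.
 */
class MessageBuffers {
  // One outgoing buffer per destination RPC address ("host:port").
  private final Map<String, List<Long>> outMessages = new ConcurrentHashMap<>();

  /** Creates a buffer for every worker address known when the constructor runs. */
  MessageBuffers(Collection<String> knownWorkerAddrs) {
    for (String addr : knownWorkerAddrs) {
      outMessages.put(addr, new ArrayList<>());
    }
  }

  /** Queues a message for addr; fails like the report if addr was never registered. */
  void sendMessage(String addr, long destVertex) {
    List<Long> msgMap = outMessages.get(addr);
    if (msgMap == null) {
      // "Should never happen" only if every worker was known at construction time.
      throw new RuntimeException(
          "sendMessage: msgMap did not exist for " + addr
              + " for vertex " + destVertex);
    }
    synchronized (msgMap) {
      msgMap.add(destVertex);
    }
  }

  /** Number of messages queued for addr (0 if addr is unknown). */
  int pending(String addr) {
    List<Long> msgMap = outMessages.get(addr);
    if (msgMap == null) {
      return 0;
    }
    synchronized (msgMap) {
      return msgMap.size();
    }
  }
}
```

In this model the exception is not a corrupted map but a send that ran before the destination was known, which matches the race described in the GIRAPH-46 title.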
--
This message is automatically generated by JIRA.