[jira] [Commented] (GIRAPH-154) Worker ports are not synched properly with its peers
[ https://issues.apache.org/jira/browse/GIRAPH-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232015#comment-13232015 ]

Hudson commented on GIRAPH-154:
---

Integrated in Giraph-trunk-Commit #86 (See [https://builds.apache.org/job/Giraph-trunk-Commit/86/])
GIRAPH-154: Worker ports are not synched properly with its peers (Zhiwei Gu via aching). (Revision 1301962)

Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301962
Files :
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/TryMultiRpcBindingPortsTest.java

> Worker ports are not synched properly with its peers
>
> Key: GIRAPH-154
> URL: https://issues.apache.org/jira/browse/GIRAPH-154
> Project: Giraph
> Issue Type: Bug
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Zhiwei Gu
> Assignee: Zhiwei Gu
> Attachments: GIRAPH-154.patch
>
> When a worker tries multiple ports to set up the RPC server, the final port
> is not synced with its peer workers properly, which results in peer workers
> sending messages to the default port.
> Here are some logs:
>
> Base port: 34900
>
> log for worker 161:
>
> IPC Server handler 98 on 36061: starting
> BasicRPCCommunications: Started RPC communication server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061 with 100 handlers and 199 flush threads on bind attempt 1
> IPC Server handler 99 on 36061: starting
> setup: Registering health of this worker...
> getJobState: Job state already exists (/_hadoopBsp/job_201203130609_14838/_masterJobState)
> getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
> getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
> registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161 and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, MRpartition=161, port=35061)
> process: partitionAssignmentsReadyChanged (partitions are assigned)
> startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 0 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 1 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 2 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 3 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 4 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 5 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 6 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 7 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 8 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 9 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 10 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 11 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 12 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 13 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 14 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 15 time(s).
> Retrying connect to server: gsta32
[jira] [Commented] (GIRAPH-154) Worker ports are not synched properly with its peers
[ https://issues.apache.org/jira/browse/GIRAPH-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232012#comment-13232012 ]

Avery Ching commented on GIRAPH-154:

Nice work Zhiwei (+1), I verified it as well and committed. Will close once Hudson verifies as well.

> Worker ports are not synched properly with its peers
>
> Key: GIRAPH-154
> URL: https://issues.apache.org/jira/browse/GIRAPH-154
> Project: Giraph
> Issue Type: Bug
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Zhiwei Gu
> Assignee: Zhiwei Gu
> Attachments: GIRAPH-154.patch
>
> When a worker tries multiple ports to set up the RPC server, the final port
> is not synced with its peer workers properly, which results in peer workers
> sending messages to the default port.
> Here are some logs:
>
> Base port: 34900
>
> log for worker 161:
>
> IPC Server handler 98 on 36061: starting
> BasicRPCCommunications: Started RPC communication server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061 with 100 handlers and 199 flush threads on bind attempt 1
> IPC Server handler 99 on 36061: starting
> setup: Registering health of this worker...
> getJobState: Job state already exists (/_hadoopBsp/job_201203130609_14838/_masterJobState)
> getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
> getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
> registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161 and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, MRpartition=161, port=35061)
> process: partitionAssignmentsReadyChanged (partitions are assigned)
> startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 0 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 1 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 2 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 3 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 4 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 5 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 6 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 7 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 8 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 9 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 10 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 11 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 12 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 13 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 14 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 15 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 16 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 17 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 18 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 19 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 20 time(s).
> Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
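
The port numbers in the quoted log are consistent with a scheme in which a worker first tries base port + task ID (34900 + 161 = 35061) and moves up by 1000 on each failed bind (36061 on bind attempt 1), while the health node still advertises the first candidate. The following is only a rough sketch of that reading, not the code from GIRAPH-154.patch: the class name, method names, and the 1000 step are assumptions inferred from the log, and a plain ServerSocket stands in for the Hadoop RPC server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical sketch; none of these names come from Giraph itself. */
class PortSyncSketch {
    /** Assumed step between bind attempts, inferred from 35061 -> 36061 in the log. */
    static final int PORT_INCREMENT = 1000;

    /** Port tried on a given bind attempt: base port + task ID + attempt * increment. */
    static int candidatePort(int basePort, int taskId, int bindAttempt) {
        return basePort + taskId + bindAttempt * PORT_INCREMENT;
    }

    /** Tries successive candidate ports and returns the socket that was actually bound. */
    static ServerSocket bindWithRetries(int basePort, int taskId, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(candidatePort(basePort, taskId, attempt)));
                return socket;
            } catch (IOException e) {
                // Port unavailable: release the socket and move on to the next candidate.
                socket.close();
                last = e;
            }
        }
        throw new IOException("no free port after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws IOException {
        int basePort = 34900;
        int taskId = 161;
        System.out.println(candidatePort(basePort, taskId, 0)); // 35061
        System.out.println(candidatePort(basePort, taskId, 1)); // 36061
        try (ServerSocket server = bindWithRetries(basePort, taskId, 5)) {
            // Peers should be told the port that was really bound,
            // not the first candidate computed before any retry.
            int advertisedPort = server.getLocalPort();
            System.out.println("advertise port " + advertisedPort);
        }
    }
}
```

Under this reading, the symptom in the log (peers retrying 35061 while the server listens on 36061) would arise whenever the advertised value is computed before the bind loop rather than read back from the bound server afterwards.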