[jira] [Updated] (GIRAPH-185) Improve concurrency of putMsg / putMsgList
[ https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Wang updated GIRAPH-185: --- Attachment: GIRAPH-185.patch Fixed the problem and wait for review. Improve concurrency of putMsg / putMsgList -- Key: GIRAPH-185 URL: https://issues.apache.org/jira/browse/GIRAPH-185 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.2.0 Reporter: Bo Wang Assignee: Bo Wang Fix For: 0.2.0 Attachments: GIRAPH-185.patch Original Estimate: 2h Remaining Estimate: 2h Currently in putMsg / putMsgList, a synchronized closure is used to protect the whole transientInMessages when adding the new message. This lock prevents other concurrent calls to putMsg/putMsgList and increases the response time. We should use fine-grain locks to allow high concurrency in message communication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: Change coarse-grain lock to fine-grain lock by pre-populating the message map so that each entry for a vertex can be locked individually and the hazard condition is also avoided.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4840/ --- Review request for giraph. Summary --- Improve concurrency of putMsg / putMsgList. This addresses bug GIRAPH-185. https://issues.apache.org/jira/browse/GIRAPH-185 Diffs - http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1328747 http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1328747 http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1328747 Diff: https://reviews.apache.org/r/4840/diff Testing --- All test passed. Wait for review. Thanks. Thanks, Bo
[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList
[ https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259408#comment-13259408 ] jirapos...@reviews.apache.org commented on GIRAPH-185: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4840/ --- Review request for giraph. Summary --- Improve concurrency of putMsg / putMsgList. This addresses bug GIRAPH-185. https://issues.apache.org/jira/browse/GIRAPH-185 Diffs - http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1328747 http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1328747 http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1328747 Diff: https://reviews.apache.org/r/4840/diff Testing --- All test passed. Wait for review. Thanks. Thanks, Bo Improve concurrency of putMsg / putMsgList -- Key: GIRAPH-185 URL: https://issues.apache.org/jira/browse/GIRAPH-185 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.2.0 Reporter: Bo Wang Assignee: Bo Wang Fix For: 0.2.0 Attachments: GIRAPH-185.patch Original Estimate: 2h Remaining Estimate: 2h Currently in putMsg / putMsgList, a synchronized closure is used to protect the whole transientInMessages when adding the new message. This lock prevents other concurrent calls to putMsg/putMsgList and increases the response time. We should use fine-grain locks to allow high concurrency in message communication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList
[ https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259412#comment-13259412 ] Avery Ching commented on GIRAPH-185: Thanks for looking at this Bo. I have a couple of questions/comments. 1) Do you have any idea what kind of performance gain there is? Can you run a few experiments? 2) The one downside is memory related. By pre-allocating a list for every vertex, we are going to use memory, whether the vertex will receive a message or not. I thought you might be looking into a higher level concurrency by using ConcurrentHashMap or something like that for transientInMessages? Improve concurrency of putMsg / putMsgList -- Key: GIRAPH-185 URL: https://issues.apache.org/jira/browse/GIRAPH-185 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.2.0 Reporter: Bo Wang Assignee: Bo Wang Fix For: 0.2.0 Attachments: GIRAPH-185.patch Original Estimate: 2h Remaining Estimate: 2h Currently in putMsg / putMsgList, a synchronized closure is used to protect the whole transientInMessages when adding the new message. This lock prevents other concurrent calls to putMsg/putMsgList and increases the response time. We should use fine-grain locks to allow high concurrency in message communication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList
[ https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259704#comment-13259704 ] Claudio Martella commented on GIRAPH-185: - I agree with Avery. Last time I modified this code, I populated the transientInMessages datastructure so that it contained lists only for vertices that were receiving messages. You could do this by using a ConcurrentHashMap and by using the putIfAbsent() method. public void add(Object key, Object val) { Queue q = map.get(key); if (q == null) { q = new ConcurrentLinkedQueue(); Queue temp = map.putIfAbsent(q); if (temp != null) q = temp; } q.add(val); } This will populate the datastructure only for vertices that have messages AND by using the ConcurrentLinkedQueue we'd also get rid of the other synchronization block. What do you think? Improve concurrency of putMsg / putMsgList -- Key: GIRAPH-185 URL: https://issues.apache.org/jira/browse/GIRAPH-185 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.2.0 Reporter: Bo Wang Assignee: Bo Wang Fix For: 0.2.0 Attachments: GIRAPH-185.patch Original Estimate: 2h Remaining Estimate: 2h Currently in putMsg / putMsgList, a synchronized closure is used to protect the whole transientInMessages when adding the new message. This lock prevents other concurrent calls to putMsg/putMsgList and increases the response time. We should use fine-grain locks to allow high concurrency in message communication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList
[ https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259798#comment-13259798 ] Bo Wang commented on GIRAPH-185: Thanks for the comments, Avery and Claudio. I am measuring the performance, but I found the #Sent messages (reported by counter) is always zero in both the original and changed version. Is that a bug? In the output of Quick Start Guide, it is also zero. I think ConcurrentHashMap is a good way to go. But for the current approach, I think the memory overhead won't be much since most of the vertices will receive messages except for those isolated ones and the one whose neighbors are all inactive. In comparison, ConcurrentHashMap is much larger than HashMap. An empty ConcurrentHashMap takes around 1673 bytes while an empty HashMap only takes 137 bytes. Another reason to use pre-population is to avoid the time spent on allocating new message lists and inserting them into the map. We just need to clear the list before the next super-step. What's your opinion? Thanks, Bo Improve concurrency of putMsg / putMsgList -- Key: GIRAPH-185 URL: https://issues.apache.org/jira/browse/GIRAPH-185 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.2.0 Reporter: Bo Wang Assignee: Bo Wang Fix For: 0.2.0 Attachments: GIRAPH-185.patch Original Estimate: 2h Remaining Estimate: 2h Currently in putMsg / putMsgList, a synchronized closure is used to protect the whole transientInMessages when adding the new message. This lock prevents other concurrent calls to putMsg/putMsgList and increases the response time. We should use fine-grain locks to allow high concurrency in message communication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList
[ https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259841#comment-13259841 ] Avery Ching commented on GIRAPH-185: Since we only allocate one ConcurrentHashMap per worker, the empty overhead isn't a concern. If however, the per element memory cost of a entry into the concurrent hash map is much more expensive then I would definitely be worried. We can also tune the concurrency level (default 16) to a reasonable tradeoff. Improve concurrency of putMsg / putMsgList -- Key: GIRAPH-185 URL: https://issues.apache.org/jira/browse/GIRAPH-185 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.2.0 Reporter: Bo Wang Assignee: Bo Wang Fix For: 0.2.0 Attachments: GIRAPH-185.patch Original Estimate: 2h Remaining Estimate: 2h Currently in putMsg / putMsgList, a synchronized closure is used to protect the whole transientInMessages when adding the new message. This lock prevents other concurrent calls to putMsg/putMsgList and increases the response time. We should use fine-grain locks to allow high concurrency in message communication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (GIRAPH-23) Giraph causes capacity scheduler to report crazy statistics
[ https://issues.apache.org/jira/browse/GIRAPH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan resolved GIRAPH-23. --- Resolution: Not A Problem We tracked this down to the mapred.job.map.memory.mb and mapred.job.reducer.memory.mb settings which have to be set for 1.0 but not for 0.20. Setting them causes the 0.20 JobTracker to go a bit crazy while the job is running, but this is a Hadoop problem, not a Giraph one. Giraph causes capacity scheduler to report crazy statistics --- Key: GIRAPH-23 URL: https://issues.apache.org/jira/browse/GIRAPH-23 Project: Giraph Issue Type: Bug Environment: Hadoop 20.2, non-secure with capacity scheduler Reporter: Jakob Homan Not sure why, but all our Giraph jobs create crazy values for the scheduler in terms of number of mappers: {noformat}51 running map tasks using -52224 map slots, 0 running reduce tasks using 0 reduce slots. {noformat} and this trickles out to the whole cluster: {noformat}Used capacity: -58229 (-12468.7% of Capacity){noformat} These numbers don't appear to affect the job and the correct themselves a short time after the Giraph job finishes running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira