[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151910#comment-13151910 ] Arun Suresh commented on GIRAPH-77:
This looks closely related to [GIRAPH-76|https://issues.apache.org/jira/browse/GIRAPH-76], since I will be refactoring GraphMapper. I can take this up as well if you haven't already started.
Coordinator should expose a web interface with progress, vertex region assignments, etc. Key: GIRAPH-77 URL: https://issues.apache.org/jira/browse/GIRAPH-77 Project: Giraph Issue Type: New Feature Reporter: Jakob Homan
It would be nice if the coordinator worker had a web interface that showed progress, splits, etc. during job execution. Right now it would duplicate information currently being exposed through task status, but with the move to YARN, it will be a necessity. It would be great if we could do this in a modern way to avoid the screen-scraping, etc. currently used to get information from most other Hadoop projects' web interfaces. The coordinator could announce its address at the beginning or via status updates.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-94) Loading vertex ranges from HBase
Loading vertex ranges from HBase Key: GIRAPH-94 URL: https://issues.apache.org/jira/browse/GIRAPH-94 Project: Giraph Issue Type: New Feature Reporter: Claudio Martella Assignee: Claudio Martella
Loading vertices from an HTable would be an option. A possible schema for storing the graph would be Hexastore (http://www.vldb.org/pvldb/1/1453965.pdf). Also, since vertices to which messages are sent get created on the fly (if they don't already exist), we could potentially have an HBaseVertex that fetches its adjacency list and vertex value from HBase. That would be a kind of lazy-load approach, if you can define the initial split as an HBase query.
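The lazy-load idea can be sketched in Java as follows. This is a minimal sketch under stated assumptions. `EdgeStore` is a hypothetical lookup standing in for an HBase read of one vertex row, and `LazyVertex` is an illustrative name, not a Giraph or HBase class. The adjacency list is fetched on first access and cached afterwards.

```java
import java.util.List;

/** Hypothetical backing store; an HBase-backed version would read one row per vertex. */
interface EdgeStore {
  List<Long> fetchOutEdges(long vertexId);
}

/** Vertex whose adjacency list is loaded on first access instead of at input time. */
final class LazyVertex {
  private final long id;
  private final EdgeStore store;
  private List<Long> outEdges; // null until the first call to getOutEdges()

  LazyVertex(long id, EdgeStore store) {
    this.id = id;
    this.store = store;
  }

  long getId() {
    return id;
  }

  boolean isLoaded() {
    return outEdges != null;
  }

  List<Long> getOutEdges() {
    if (outEdges == null) {
      outEdges = store.fetchOutEdges(id); // one remote read, then served from memory
    }
    return outEdges;
  }
}
```

This design pays the read cost only for vertices that are actually touched. That matters most for computations that visit a small part of the graph. The trade-off is one random read per touched vertex instead of a single bulk sequential load.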
[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151965#comment-13151965 ] Hudson commented on GIRAPH-91:
Integrated in Giraph-trunk-Commit #36 (See [https://builds.apache.org/job/Giraph-trunk-Commit/36/])
GIRAPH-91: Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings). (aching)
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1203130
Files :
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/graph
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java
Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings) Key: GIRAPH-91 URL: https://issues.apache.org/jira/browse/GIRAPH-91 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-91.diff
The current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and 20 GB heaps.
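To make the memory argument concrete, compare the two layouts. A `HashMap<Long, Float>` keeps a boxed key, a boxed value and an entry object for every edge. Parallel primitive arrays keep only 12 bytes of payload per edge: an 8-byte `long` plus a 4-byte `float`.

The class below is a hypothetical sketch of the primitive-array layout. It assumes `long` vertex ids and `float` edge values. It is not the `EdgeListVertex` committed in this change, whose internals are not shown here.

```java
import java.util.Arrays;

/**
 * Hypothetical compact edge storage: target ids and edge values sit in parallel
 * primitive arrays instead of one boxed key, boxed value and map entry per edge.
 */
final class PrimitiveEdgeList {
  private long[] targets = new long[4];
  private float[] values = new float[4];
  private int size = 0;

  void addEdge(long target, float value) {
    if (size == targets.length) {
      // Double both arrays together so appends stay amortized O(1).
      targets = Arrays.copyOf(targets, size * 2);
      values = Arrays.copyOf(values, size * 2);
    }
    targets[size] = target;
    values[size] = value;
    size++;
  }

  int size() {
    return size;
  }

  long targetAt(int i) {
    checkIndex(i);
    return targets[i];
  }

  float valueAt(int i) {
    checkIndex(i);
    return values[i];
  }

  /** Linear scan; returns null when there is no edge to {@code target}. */
  Float getEdgeValue(long target) {
    for (int i = 0; i < size; i++) {
      if (targets[i] == target) {
        return values[i];
      }
    }
    return null;
  }

  private void checkIndex(int i) {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException("edge index " + i + ", size " + size);
    }
  }
}
```

Lookups here are linear; a sorted variant could use binary search instead. Linear lookup suits algorithms such as PageRank, which iterate over every edge each superstep rather than looking edges up by target.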
[jira] [Created] (GIRAPH-95) vertex resolution expects MutableVertex instead of BasicVertex
vertex resolution expects MutableVertex instead of BasicVertex Key: GIRAPH-95 URL: https://issues.apache.org/jira/browse/GIRAPH-95 Project: Giraph Issue Type: Bug Components: graph Reporter: Claudio Martella
At the beginning of the superstep, when a message is sent to a non-existent vertex, the new vertex is created. The new vertex's id is set through setVertexId(), which belongs to MutableVertex. It should use initialize() instead. See BspRPCCommunication:948 (on my local trunk).
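The suggested fix can be sketched with simplified hypothetical interfaces, `SimpleBasicVertex` and `VertexResolverSketch`. These stand in for Giraph's real `BasicVertex`/`MutableVertex` signatures, which are more generic. The resolver only calls `initialize()`, which every vertex supports, so it no longer requires the new vertex to be mutable.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Simplified stand-in for the contract every vertex supports (not Giraph's real signature). */
interface SimpleBasicVertex {
  void initialize(long id, double value, Map<Long, Float> edges, List<Double> messages);

  long getVertexId();
}

/** Hypothetical resolver step that creates a vertex for a message sent to a missing id. */
final class VertexResolverSketch {
  private VertexResolverSketch() {}

  static SimpleBasicVertex createMissingVertex(
      Supplier<? extends SimpleBasicVertex> factory, long id, List<Double> messages) {
    SimpleBasicVertex vertex = factory.get();
    // Set id, an assumed default value of 0.0, empty edges and the pending messages
    // through the basic contract, instead of casting to a mutable type for setVertexId().
    vertex.initialize(id, 0.0, Collections.emptyMap(), messages);
    return vertex;
  }
}
```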
[jira] [Created] (GIRAPH-96) Support for Graphs with Huge adjacency lists
Support for Graphs with Huge adjacency lists Key: GIRAPH-96 URL: https://issues.apache.org/jira/browse/GIRAPH-96 Project: Giraph Issue Type: Improvement Reporter: Arun Suresh
Currently the vertex initialize() method is passed the complete adjacency list as a HashMap. All the current concrete implementations of Vertex iterate over the adjacency list and create new data structures within the Vertex instance to hold/manipulate the adjacency list. This would cease to be feasible once the size of the adjacency list becomes really huge. I propose storing the adjacency list and all vertex information (and incoming messages?) in a distributed data store such as HBase. The adjacency list can be lazily loaded via HBase Scans. I was thinking of an HBase schema where the row ID is a concatenation of VertexID+OutboundVertexId, with a single column containing the edge.
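The proposed schema can be sketched as follows. The sketch assumes fixed-width, non-negative 8-byte `long` vertex ids; the issue does not fix an id type.

A `TreeMap` with an unsigned byte comparator stands in for an HBase table, since HBase orders row keys by unsigned lexicographic byte comparison. Scanning the key range from the 8-byte prefix of `v` up to (but excluding) the prefix of `v + 1` then returns exactly vertex `v`'s out-edges. The helpers `rowKey` and `scanOutEdges` are hypothetical. The edge value is modelled as a `Float` for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Sketch of a VertexID+OutboundVertexId row-key layout, simulated with a sorted map. */
final class AdjacencyRowKeys {
  private AdjacencyRowKeys() {}

  /** Unsigned lexicographic byte order, the order HBase uses for row keys. */
  static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length); // a shorter prefix sorts first
  };

  /** Big-endian id; sorts numerically only for non-negative ids. */
  static byte[] prefix(long vertexId) {
    return ByteBuffer.allocate(8).putLong(vertexId).array();
  }

  /** Row key = 8-byte source id followed by 8-byte target id. */
  static byte[] rowKey(long vertexId, long targetId) {
    return ByteBuffer.allocate(16).putLong(vertexId).putLong(targetId).array();
  }

  /** Emulates a scan from prefix(v) inclusive to prefix(v + 1) exclusive. */
  static List<Long> scanOutEdges(NavigableMap<byte[], Float> table, long vertexId) {
    List<Long> targets = new ArrayList<>();
    Map<byte[], Float> range =
        table.subMap(prefix(vertexId), true, prefix(vertexId + 1), false);
    for (byte[] key : range.keySet()) {
      targets.add(ByteBuffer.wrap(key, 8, 8).getLong()); // decode the target half
    }
    return targets;
  }
}
```

Because every edge is its own row, a huge adjacency list is spread over many rows. Those rows can be streamed in key order rather than materialized in one HashMap.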
[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists
[ https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152087#comment-13152087 ] Arun Suresh commented on GIRAPH-96:
Looks like Claudio beat me to a similar suggestion: [GIRAPH-94|https://issues.apache.org/jira/browse/GIRAPH-94]. My proposal was more for a standard means of storing vertex/adjacency list information. The Giraph framework would handle the storage and would expose APIs that the vertex reader can use to store the information as it reads the graph. The user would then not be required to subclass a Vertex class and implement the initialize() method. All adjacency list/vertex manipulation would go through the common data store.
Support for Graphs with Huge adjacency lists Key: GIRAPH-96 URL: https://issues.apache.org/jira/browse/GIRAPH-96 Project: Giraph Issue Type: Improvement Reporter: Arun Suresh
Currently the vertex initialize() method is passed the complete adjacency list as a HashMap. All the current concrete implementations of Vertex iterate over the adjacency list and create new data structures within the Vertex instance to hold/manipulate the adjacency list. This would cease to be feasible once the size of the adjacency list becomes really huge. I propose storing the adjacency list and all vertex information (and incoming messages?) in a distributed data store such as HBase. The adjacency list can be lazily loaded via HBase Scans. I was thinking of an HBase schema where the row ID is a concatenation of VertexID+OutboundVertexId, with a single column containing the edge.
[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists
[ https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152101#comment-13152101 ] Gianmarco De Francisci Morales commented on GIRAPH-96:
In my opinion this would make things too complex. I wouldn't like to have to keep my graph in HBase to run Giraph on it. Also, this makes HBase a dependency. I agree that this is a nice option to have, but I wouldn't make it the default. Finally, a similar goal could be attained by streaming edges to disk and reading them back with sequential scans when performing supersteps. This requires no network round trips and should be much faster. You just need good out-of-core data structures and algorithms.
Support for Graphs with Huge adjacency lists Key: GIRAPH-96 URL: https://issues.apache.org/jira/browse/GIRAPH-96 Project: Giraph Issue Type: Improvement Components: bsp Affects Versions: 0.70.0 Reporter: Arun Suresh
Currently the vertex initialize() method is passed the complete adjacency list as a HashMap. All the current concrete implementations of Vertex iterate over the adjacency list and create new data structures within the Vertex instance to hold/manipulate the adjacency list. This would cease to be feasible once the size of the adjacency list becomes really huge. I propose storing the adjacency list and all vertex information (and incoming messages?) in a distributed data store such as HBase. The adjacency list can be lazily loaded via HBase Scans. I was thinking of an HBase schema where the row ID is a concatenation of VertexID+OutboundVertexId, with a single column containing the edge.
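Gianmarco's alternative, streaming edges to disk and reading them back sequentially each superstep, can be sketched with plain JDK I/O. The fixed-size record layout (source `long`, target `long`, value `float`) and the `EdgeSpill` name are assumptions for illustration. Only one record is held in memory during a scan.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical out-of-core edge spill: append records once, then scan them sequentially. */
final class EdgeSpill {
  /** Called once per edge during a scan. */
  interface EdgeVisitor {
    void visit(long source, long target, float value);
  }

  private EdgeSpill() {}

  /** Writes one fixed-size record (long, long, float) per edge. */
  static void write(Path file, long[] sources, long[] targets, float[] values)
      throws IOException {
    if (sources.length != targets.length || targets.length != values.length) {
      throw new IllegalArgumentException("edge arrays must have equal length");
    }
    try (DataOutputStream out =
        new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
      for (int i = 0; i < sources.length; i++) {
        out.writeLong(sources[i]);
        out.writeLong(targets[i]);
        out.writeFloat(values[i]);
      }
    }
  }

  /** Streams all records in file order; memory use does not grow with the edge count. */
  static long scan(Path file, EdgeVisitor visitor) throws IOException {
    long count = 0;
    try (DataInputStream in =
        new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
      while (true) {
        long source;
        try {
          source = in.readLong();
        } catch (EOFException endOfFile) {
          return count; // no more records (assumes the file was fully written)
        }
        // Arguments are evaluated left to right, so the target is read before the value.
        visitor.visit(source, in.readLong(), in.readFloat());
        count++;
      }
    }
  }
}
```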
[jira] [Updated] (GIRAPH-68) Implement a Graph Generator
[ https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyunsik Choi updated GIRAPH-68:
Attachment: GIRAPH-68_2.patch
Avery, thank you for the review. I think the GraphGenerator is necessary to test the IO-related subsystems as a whole. For example, *InputFormat and Partitioners can be exercised with generated data sets instead of PseudoRandomVertexInputFormat. As you mentioned, I modified PageRank/RandomMessageBenchmark to use a specified InputFormat and input path. If the input format and input path are not given, they fall back to the current behavior of using PseudoRandomVertexInputFormat.
Implement a Graph Generator Key: GIRAPH-68 URL: https://issues.apache.org/jira/browse/GIRAPH-68 Project: Giraph Issue Type: New Feature Components: benchmark Affects Versions: 0.70.0 Reporter: Hyunsik Choi Assignee: Hyunsik Choi Attachments: GIRAPH-68_1.patch, GIRAPH-68_2.patch
To provide users with benchmark environments and to thoroughly test the input/output system of Giraph, we need a graph generator. We will enable the graph generator to generate various kinds of graph data sets by specifying a VertexInputFormat and a VertexOutputFormat.
[jira] [Commented] (GIRAPH-68) Implement a Graph Generator
[ https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152185#comment-13152185 ] Hyunsik Choi commented on GIRAPH-68:
I missed the javadoc. I will reattach the patch with javadoc included.
Implement a Graph Generator Key: GIRAPH-68 URL: https://issues.apache.org/jira/browse/GIRAPH-68 Project: Giraph Issue Type: New Feature Components: benchmark Affects Versions: 0.70.0 Reporter: Hyunsik Choi Assignee: Hyunsik Choi Attachments: GIRAPH-68_1.patch, GIRAPH-68_2.patch
To provide users with benchmark environments and to thoroughly test the input/output system of Giraph, we need a graph generator. We will enable the graph generator to generate various kinds of graph data sets by specifying a VertexInputFormat and a VertexOutputFormat.
[jira] [Resolved] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching resolved GIRAPH-91.
Resolution: Fixed
Thanks for the quick review, Claudio. Hudson +1'ed it as well, resolving.
Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings) Key: GIRAPH-91 URL: https://issues.apache.org/jira/browse/GIRAPH-91 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-91.diff
The current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and 20 GB heaps.
[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists
[ https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152203#comment-13152203 ] Avery Ching commented on GIRAPH-96:
The general issue of overloading our memory and working out of core has been discussed a little in GIRAPH-45 as well. I suppose you could implement a BasicVertex that loads everything on demand from HBase, but I suspect it would be a little slow, depending on the application.
Support for Graphs with Huge adjacency lists Key: GIRAPH-96 URL: https://issues.apache.org/jira/browse/GIRAPH-96 Project: Giraph Issue Type: Improvement Components: bsp Affects Versions: 0.70.0 Reporter: Arun Suresh
Currently the vertex initialize() method is passed the complete adjacency list as a HashMap. All the current concrete implementations of Vertex iterate over the adjacency list and create new data structures within the Vertex instance to hold/manipulate the adjacency list. This would cease to be feasible once the size of the adjacency list becomes really huge. I propose storing the adjacency list and all vertex information (and incoming messages?) in a distributed data store such as HBase. The adjacency list can be lazily loaded via HBase Scans. I was thinking of an HBase schema where the row ID is a concatenation of VertexID+OutboundVertexId, with a single column containing the edge.
[jira] [Commented] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt
[ https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152284#comment-13152284 ] Attila Csordas commented on GIRAPH-86:
It didn't help yesterday, but I checked out the latest code today and now rat:check passes. This time I cannot build the project due to: {noformat}Failed to execute goal on project giraph: Could not resolve dependencies for project org.apache.giraph:giraph:jar:0.70: Could not find artifact org.apache.hadoop:hadoop-common:jar:0.21.0-dev-SNAPSHOT -> [Help 1]{noformat} Do we need that dependency at all?
Simplify boolean expressions in ZooKeeperExt::createExt Key: GIRAPH-86 URL: https://issues.apache.org/jira/browse/GIRAPH-86 Project: Giraph Issue Type: Improvement Affects Versions: 0.70.0 Reporter: Jakob Homan Assignee: Attila Csordas Labels: newbie Attachments: GIRAPH-86.patch, pom.diff
In ZooKeeperExt::createExt there are two instances of {{recursive==false}} that can be simplified to {{!recursive}}.
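The GIRAPH-86 change is a behavior-preserving rewrite. The hypothetical methods below put the two forms side by side. They do not reproduce the actual `createExt` body.

```java
/** Hypothetical helpers showing the GIRAPH-86 rewrite; behavior is unchanged. */
final class RecursiveFlagExample {
  private RecursiveFlagExample() {}

  /** Before: a boolean compared against a literal. */
  static boolean isNonRecursiveBefore(boolean recursive) {
    return recursive == false;
  }

  /** After: negate the flag directly. */
  static boolean isNonRecursiveAfter(boolean recursive) {
    return !recursive;
  }
}
```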
[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists
[ https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152291#comment-13152291 ] Claudio Martella commented on GIRAPH-96:
It is indeed a nice discussion. The amount of data to be read is the same after all, but we're talking about random I/O here. It's a possibility. Also, to answer Gianmarco, the idea of having an HBase InputReader is the same as the current discussion on supporting Hive, Pig and HCatalog. If you store your data in HBase it can be quite useful, as much as it is not for MR. The lazy approach could be worth investigating, and in any case it would only be necessary with huge graphs or, as in my case, with computations that don't necessarily touch the whole graph. Good out-of-core data structures/maps are hard to find; maybe LinkedIn's Krati or LevelDB (but I guess we'd have license issues there).
Support for Graphs with Huge adjacency lists Key: GIRAPH-96 URL: https://issues.apache.org/jira/browse/GIRAPH-96 Project: Giraph Issue Type: Improvement Components: bsp Affects Versions: 0.70.0 Reporter: Arun Suresh
Currently the vertex initialize() method is passed the complete adjacency list as a HashMap. All the current concrete implementations of Vertex iterate over the adjacency list and create new data structures within the Vertex instance to hold/manipulate the adjacency list. This would cease to be feasible once the size of the adjacency list becomes really huge. I propose storing the adjacency list and all vertex information (and incoming messages?) in a distributed data store such as HBase. The adjacency list can be lazily loaded via HBase Scans. I was thinking of an HBase schema where the row ID is a concatenation of VertexID+OutboundVertexId, with a single column containing the edge.
[jira] [Commented] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt
[ https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152328#comment-13152328 ] Jakob Homan commented on GIRAPH-86:
+1. I've committed this. Not sure why you ended up with a reference to Hadoop 0.21. That's an unstable, unsupported version. I can't find a reference to it in our pom. In terms of the rat check, rather than making rat ignore the file, we should really just fix GIRAPH-20, which is triggering this. Thanks for the contribution, Attila!
Simplify boolean expressions in ZooKeeperExt::createExt Key: GIRAPH-86 URL: https://issues.apache.org/jira/browse/GIRAPH-86 Project: Giraph Issue Type: Improvement Affects Versions: 0.70.0 Reporter: Jakob Homan Assignee: Attila Csordas Labels: newbie Fix For: 0.70.0 Attachments: GIRAPH-86.patch, pom.diff
In ZooKeeperExt::createExt there are two instances of {{recursive==false}} that can be simplified to {{!recursive}}.
[jira] [Commented] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt
[ https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152337#comment-13152337 ] Hudson commented on GIRAPH-86:
Integrated in Giraph-trunk-Commit #37 (See [https://builds.apache.org/job/Giraph-trunk-Commit/37/])
GIRAPH-86. Simplify boolean expressions in ZooKeeperExt::createExt. Contributed by Attila Csordas.
jghoman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1203369
Files :
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java
Simplify boolean expressions in ZooKeeperExt::createExt Key: GIRAPH-86 URL: https://issues.apache.org/jira/browse/GIRAPH-86 Project: Giraph Issue Type: Improvement Affects Versions: 0.70.0 Reporter: Jakob Homan Assignee: Attila Csordas Labels: newbie Fix For: 0.70.0 Attachments: GIRAPH-86.patch, pom.diff
In ZooKeeperExt::createExt there are two instances of {{recursive==false}} that can be simplified to {{!recursive}}.
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152348#comment-13152348 ] Jakob Homan commented on GIRAPH-77:
I've got a bit of code, but enough to block progress. I can see this as separate work from 76. I was looking at starting up a webserver on the coordinator to track all the info that's normally difficult to get during job execution. If this comes up as part of 76, go for it.
Coordinator should expose a web interface with progress, vertex region assignments, etc. Key: GIRAPH-77 URL: https://issues.apache.org/jira/browse/GIRAPH-77 Project: Giraph Issue Type: New Feature Reporter: Jakob Homan
It would be nice if the coordinator worker had a web interface that showed progress, splits, etc. during job execution. Right now it would duplicate information currently being exposed through task status, but with the move to YARN, it will be a necessity. It would be great if we could do this in a modern way to avoid the screen-scraping, etc. currently used to get information from most other Hadoop projects' web interfaces. The coordinator could announce its address at the beginning or via status updates.
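One way to meet the "modern, no screen-scraping" goal is for the coordinator to serve machine-readable JSON. Below is a minimal sketch using the JDK's built-in `com.sun.net.httpserver.HttpServer`. The `/status` path, the JSON field names and the `CoordinatorStatusServer` class are assumptions, not an agreed design.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

/** Hypothetical coordinator status endpoint returning JSON rather than HTML. */
final class CoordinatorStatusServer {
  private CoordinatorStatusServer() {}

  /** Machine-readable progress snapshot; field names are illustrative. */
  static String renderStatus(long superstep, int workersDone, int workersTotal) {
    return "{\"superstep\":" + superstep
        + ",\"workersDone\":" + workersDone
        + ",\"workersTotal\":" + workersTotal + "}";
  }

  /** Serves whatever statusJson currently returns at /status. */
  static HttpServer start(int port, Supplier<String> statusJson) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/status", exchange -> {
      byte[] body = statusJson.get().getBytes(StandardCharsets.UTF_8);
      exchange.getResponseHeaders().set("Content-Type", "application/json");
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    return server;
  }
}
```

Starting the server with port 0 lets the operating system choose a free port. `server.getAddress().getPort()` then yields the address the coordinator could announce at startup or through status updates, and `server.stop(0)` shuts the server down. Clients would GET `/status` and parse JSON instead of scraping HTML.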
[jira] [Updated] (GIRAPH-92) Need outputformat for just vertex ID and value
[ https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated GIRAPH-92:
Attachment: GIRAPH-92-2.patch
Updated patch per Avery's comments. Will commit this.
Need outputformat for just vertex ID and value Key: GIRAPH-92 URL: https://issues.apache.org/jira/browse/GIRAPH-92 Project: Giraph Issue Type: New Feature Components: lib Affects Versions: 0.70.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.70.0 Attachments: GIRAPH-92-2.patch, GIRAPH-92.patch
We should have a text output format that just spits out the vertex id and value without its edges: {noformat}index.html 0.9423{noformat} This would be particularly helpful for further processing by, for instance, Pig.
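The requested output is one line per vertex, holding only the id and the value. The sketch below assumes a tab separator, which is what Pig's default `PigStorage` loader expects. It is a hypothetical formatting helper, independent of Giraph's actual output-format classes.

```java
import java.util.Map;

/** Hypothetical formatter: one "id<TAB>value" line per vertex, edges omitted. */
final class IdValueLineFormatter {
  private IdValueLineFormatter() {}

  static String formatLine(Object vertexId, Object vertexValue) {
    return vertexId + "\t" + vertexValue;
  }

  /** Formats every entry in the map's iteration order, one line each. */
  static String formatAll(Map<?, ?> vertexValues) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<?, ?> entry : vertexValues.entrySet()) {
      sb.append(formatLine(entry.getKey(), entry.getValue())).append('\n');
    }
    return sb.toString();
  }
}
```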
[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?
[ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152356#comment-13152356 ] Jakob Homan commented on GIRAPH-83:
bq. I think it should probably actually be called Vertex, because everything is a BasicVertex currently, so it makes sense instead to say everything is a Vertex.
Absolutely agreed.
bq. We can factor off lots of stuff into other classes, but the question comes down to how does the user writing their algorithm get access to them? How is it all wired together? You want compute() to get passed some state that you have right when you need it, instead of either going with inheritance or composition? That could be nice, I think, as long as we package it all up into a minimal set of *Context-like objects to carry around.
Correct, this is what I'm getting at.
bq. In what way are the out edges of a vertex managed by the framework currently?
In that Vertex is responsible for maintaining the destEdgeMap for an implementation of Vertex, rather than implementers having to do this themselves. For each compute invocation, the vertex shouldn't assume anything about its outgoing edges, as they may have been mutated since the last call.
Is Vertex correct yet? Key: GIRAPH-83 URL: https://issues.apache.org/jira/browse/GIRAPH-83 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan
I'm seeing a number of people run into oddities with Vertex and am thinking we may not have it quite correct yet...
[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages
[ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152360#comment-13152360 ] Jakob Homan commented on GIRAPH-45:
On a side note, is it worth considering making messages immutable (or providing a separate annotation for immutable ones)? This would help with message de-duplication, which could be a significant help in some algorithms. One would only need to keep one copy of the message going to a particular worker, regardless of the number of vertices it is bound for.
Improve the way to keep outgoing messages Key: GIRAPH-45 URL: https://issues.apache.org/jira/browse/GIRAPH-45 Project: Giraph Issue Type: Improvement Components: bsp Reporter: Hyunsik Choi Assignee: Hyunsik Choi
As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential out-of-memory problem when the rate of message generation is higher than the rate of message flushing (or the network bandwidth). To overcome this problem, we need a more eager strategy for flushing messages, or some approach to spill messages to disk. The link below is Dmitriy's suggestion: https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
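If messages were immutable, a sender could transmit one copy per destination worker together with the list of target vertices, instead of one copy per vertex. The sketch below assumes a hypothetical `workerOf` partitioning function (here simply the id modulo the worker count). The class names are illustrative, and an immutable `String` stands in for the message type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One shared (immutable) message plus the vertices on one worker that should receive it. */
final class DedupedBatch {
  final String message;
  final List<Long> targets = new ArrayList<>();

  DedupedBatch(String message) {
    this.message = message;
  }
}

final class MessageDedup {
  private MessageDedup() {}

  /** Hypothetical partitioner: non-negative id modulo the number of workers. */
  static int workerOf(long vertexId, int numWorkers) {
    return (int) Math.floorMod(vertexId, (long) numWorkers);
  }

  /** Groups one message by destination worker: one copy per worker, not per vertex. */
  static Map<Integer, DedupedBatch> groupByWorker(
      String message, List<Long> targetVertices, int numWorkers) {
    Map<Integer, DedupedBatch> byWorker = new TreeMap<>();
    for (long target : targetVertices) {
      byWorker
          .computeIfAbsent(workerOf(target, numWorkers), w -> new DedupedBatch(message))
          .targets.add(target);
    }
    return byWorker;
  }
}
```

Sharing one message object across many targets is only safe because it cannot be mutated. That is the property the immutability annotation would guarantee.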
[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?
[ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152373#comment-13152373 ] Jake Mannix commented on GIRAPH-83:
bq. In that Vertex is responsible for maintaining the destEdgeMap for an implementation of Vertex, rather than implementers having to do this themselves. For each compute invocation, the vertex shouldn't assume anything about its outgoing edges, as they may have been mutated since the last call.
You mean that in the current Vertex class, we have the map of edges right there? It's not really in the framework, it's in the superclass, but OK: you're saying we *shouldn't* take care of this bookkeeping, and should always leave it up to the implementations (like the way that LongDoubleFloatDoubleVertex does it with primitives)? Or that there should be some other structure that handles them?
Is Vertex correct yet? Key: GIRAPH-83 URL: https://issues.apache.org/jira/browse/GIRAPH-83 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan
I'm seeing a number of people run into oddities with Vertex and am thinking we may not have it quite correct yet...
[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?
[ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152383#comment-13152383 ] Jakob Homan commented on GIRAPH-83:
I'm saying we should be responsible for maintaining it (since we have to mutate it), but that _maybe_ it shouldn't be in Vertex itself, just to have a cleaner delineation. But Avery makes a good point and I'm not completely sold on this aspect myself. How many different memory-efficient implementations of Vertex can we expect to have?
Is Vertex correct yet? Key: GIRAPH-83 URL: https://issues.apache.org/jira/browse/GIRAPH-83 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan
I'm seeing a number of people run into oddities with Vertex and am thinking we may not have it quite correct yet...
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152386#comment-13152386 ] Jakob Homan commented on GIRAPH-77:
s/enough/not enough/g. oy.
Coordinator should expose a web interface with progress, vertex region assignments, etc. Key: GIRAPH-77 URL: https://issues.apache.org/jira/browse/GIRAPH-77 Project: Giraph Issue Type: New Feature Reporter: Jakob Homan
It would be nice if the coordinator worker had a web interface that showed progress, splits, etc. during job execution. Right now it would duplicate information currently being exposed through task status, but with the move to YARN, it will be a necessity. It would be great if we could do this in a modern way to avoid the screen-scraping, etc. currently used to get information from most other Hadoop projects' web interfaces. The coordinator could announce its address at the beginning or via status updates.
[jira] [Assigned] (GIRAPH-84) Simplify boolean expressions in BspRecordReader
[ https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Csordas reassigned GIRAPH-84: Assignee: Attila Csordas Simplify boolean expressions in BspRecordReader --- Key: GIRAPH-84 URL: https://issues.apache.org/jira/browse/GIRAPH-84 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Assignee: Attila Csordas Labels: newbie Twice in BspRecordReader, boolean expressions are evaluated with == and can be simplified to one-liners or plain variable evaluation.
[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader
[ https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152469#comment-13152469 ] Shaunak Kashyap commented on GIRAPH-84: --- I believe that style goes against the coding conventions for this project (see http://svn.apache.org/repos/asf/incubator/giraph/trunk/CODE_CONVENTIONS). I wonder if using the ternary operator like so is acceptable: {code} return (seenRecord ? 1f : 0f); {code}
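A minimal Java sketch of the two simplifications discussed for GIRAPH-84, assuming the reader hands out a single placeholder record and derives its progress from a `seenRecord` flag. Only `seenRecord` and the ternary come from the thread; the class name and method bodies are hypothetical, and the Hadoop `RecordReader` superclass is left out so the sketch compiles on its own.

```java
/**
 * Illustrative stand-in for the boolean logic discussed in GIRAPH-84.
 * Hypothetical: the real BspRecordReader extends Hadoop's RecordReader,
 * which is omitted here. Only the seenRecord flag and the ternary come
 * from the discussion; the method bodies are assumptions.
 */
public class BooleanSimplificationSketch {
  /** Whether the single placeholder record has already been handed out. */
  private boolean seenRecord = false;

  /** Style the issue flags: a boolean compared with == inside if/else. */
  public float getProgressVerbose() {
    if (seenRecord == true) {
      return 1f;
    } else {
      return 0f;
    }
  }

  /** Ternary form proposed in the comment: same result, one expression. */
  public float getProgress() {
    return seenRecord ? 1f : 0f;
  }

  /** Style the issue flags: returns true the first time, false afterwards. */
  public boolean nextKeyValueVerbose() {
    if (seenRecord == false) {
      seenRecord = true;
      return true;
    }
    return false;
  }

  /** Same behavior using plain variable evaluation instead of == false. */
  public boolean nextKeyValue() {
    boolean hasRecord = !seenRecord;
    seenRecord = true;
    return hasRecord;
  }
}
```

Each pair returns the same values; the simplified versions only replace the `== true` and `== false` comparisons with the ternary and a negated local variable.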
[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages
[ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152477#comment-13152477 ] Claudio Martella commented on GIRAPH-45: I was struggling with this, and the annotation could actually be the elegant solution. Avery asked for a review of this a month ago. As a general approach, we could clone those that are not annotated. Improve the way to keep outgoing messages - Key: GIRAPH-45 URL: https://issues.apache.org/jira/browse/GIRAPH-45 Project: Giraph Issue Type: Improvement Components: bsp Reporter: Hyunsik Choi Assignee: Hyunsik Choi As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential problem that can cause out-of-memory errors when messages are generated faster than they are flushed (or faster than the network bandwidth allows). To overcome this problem, we need a more eager strategy for flushing messages or some approach to spill messages to disk. The link below is Dmitriy's suggestion: https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
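The annotation idea and the memory concern can be combined in one hedged sketch: messages whose type carries a marker annotation are buffered as-is, all others are copied (the caller might reuse the object), and the buffer is flushed as soon as it reaches a threshold instead of growing without limit. `ImmutableMessage`, `Message`, `copy()` and `Buffer` are hypothetical names chosen for illustration, not Giraph APIs, and spilling to disk is not shown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch of an outgoing-message buffer for the GIRAPH-45 discussion.
 * All type names here are hypothetical, not Giraph APIs.
 */
public class OutgoingMessageBufferSketch {

  /** Hypothetical marker: instances never change after send, so no copy is needed. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface ImmutableMessage {}

  /** Hypothetical message contract: copy() returns an independent instance. */
  public interface Message {
    Message copy();
  }

  /** Example mutable message that a caller might reuse between sends. */
  public static final class MutableMessage implements Message {
    public int value;
    public MutableMessage(int value) { this.value = value; }
    @Override public Message copy() { return new MutableMessage(value); }
  }

  /** Example message whose type opts out of defensive copying. */
  @ImmutableMessage
  public static final class FixedMessage implements Message {
    public final int value;
    public FixedMessage(int value) { this.value = value; }
    @Override public Message copy() { return new FixedMessage(value); }
  }

  /** Buffers messages and hands them to a sink once a threshold is reached. */
  public static final class Buffer {
    private final List<Message> pending = new ArrayList<>();
    private final int flushThreshold;
    private final Consumer<List<Message>> sink;
    private int copies = 0;

    public Buffer(int flushThreshold, Consumer<List<Message>> sink) {
      this.flushThreshold = flushThreshold;
      this.sink = sink;
    }

    public void add(Message message) {
      // Copy unless the type is annotated, because the caller may mutate
      // and resend the same object before the buffer is flushed.
      if (message.getClass().isAnnotationPresent(ImmutableMessage.class)) {
        pending.add(message);
      } else {
        pending.add(message.copy());
        copies++;
      }
      // Flush eagerly so the pending list never exceeds the threshold.
      if (pending.size() >= flushThreshold) {
        flush();
      }
    }

    public void flush() {
      if (!pending.isEmpty()) {
        sink.accept(new ArrayList<>(pending));
        pending.clear();
      }
    }

    public int pendingCount() { return pending.size(); }

    public int copyCount() { return copies; }
  }
}
```

Checking for the annotation at runtime keeps the default safe (copy everything) while letting message types that are known to be immutable skip the extra allocation.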
[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader
[ https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152488#comment-13152488 ] Avery Ching commented on GIRAPH-84: --- Ternary is fine with me. I think we use it in the codebase. We should probably add it to the coding conventions... unless someone objects.
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152583#comment-13152583 ] Hyunsik Choi commented on GIRAPH-77: I also think this feature is necessary, because we will no longer depend on MapReduce once Giraph is ported to YARN.
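One way for the coordinator to expose progress without screen-scraping is a small endpoint that returns JSON. The sketch below assumes a hypothetical `/progress` path and hypothetical `Progress` fields, and uses the JDK's built-in `com.sun.net.httpserver.HttpServer` only to stay dependency-free; it is not a proposed Giraph design.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

/**
 * Sketch of a machine-readable status endpoint for the coordinator.
 * The /progress path and the Progress fields are illustrative assumptions.
 */
public class CoordinatorStatusSketch {

  /** Hypothetical snapshot of job progress. */
  public static final class Progress {
    public final long superstep;
    public final int workersDone;
    public final int workersTotal;

    public Progress(long superstep, int workersDone, int workersTotal) {
      this.superstep = superstep;
      this.workersDone = workersDone;
      this.workersTotal = workersTotal;
    }
  }

  /** Renders progress as JSON so clients can parse it instead of scraping HTML. */
  public static String toJson(Progress p) {
    return "{\"superstep\":" + p.superstep
        + ",\"workersDone\":" + p.workersDone
        + ",\"workersTotal\":" + p.workersTotal + "}";
  }

  /** Answers requests to /progress on the given port (0 picks a free port). */
  public static HttpServer start(int port, Supplier<Progress> source) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/progress", exchange -> {
      // Take a fresh snapshot on every request.
      byte[] body = toJson(source.get()).getBytes(StandardCharsets.UTF_8);
      exchange.getResponseHeaders().set("Content-Type", "application/json");
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    return server;
  }
}
```

The coordinator could then publish `server.getAddress()` at startup or in its status updates, as the issue suggests.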